From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 12109 invoked by uid 1002); 23 Jun 2003 04:44:03 -0000
Mailing-List: contact gentoo-dev-help@gentoo.org; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@gentoo.org
Received: (qmail 3789 invoked from network); 23 Jun 2003 04:44:02 -0000
Date: Mon, 23 Jun 2003 14:43:01 +1000
From: Martin Pool
To: gentoo-dev@gentoo.org
Message-ID: <20030623044300.GA20153@vexed.ozlabs.hp.com>
Reply-To: gentoo-dev@gentoo.org
References: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="SLDf9lqlvOQaIe6s"
Content-Disposition: inline
In-Reply-To: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
X-GPG: 1024D/A0B3E88B: AFAC578F 1841EE6B FD95E143 3C63CA3F A0B3E88B
User-Agent: Mutt/1.5.4i
Sender: Martin Pool
Subject: Re: [gentoo-dev] Re: proposed md5sum change
X-Archives-Salt: 110136ca-84da-4dc4-8c12-ee3ac368f3ba
X-Archives-Hash: 2abe0e399af7a80b6cd5ed28c64ae392

--SLDf9lqlvOQaIe6s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 22 Jun 2003, bdharring wrote:

> >The uncompressed form is the natural and efficient place to do delta
> >compression.

> Agreed, although I would posit that decompressing a large bzip2 for
> md5suming in memory makes it a substantially longer affair then if
> you just md5'd the compressed tarball. On my personal system,
> compressed=>3-5s, bzip2 decompressing piped to md5 = 1-2 minutes.
> More below...

Yes, if the user is downloading a compressed form, then it makes sense
to calculate the hash of the compressed form when checking if e.g.
they got an interrupted or corrupt download.

But aside from that, including the time to decompress as a cost of
checking the MD5 sum is a furphy.
It has to be decompressed at some point, whether to patch it or to
build it. You can check the MD5 sum then. Note that xdelta patches in
fact include the MD5 checksum of the output file, so checking it
separately is a bit redundant.

> >.zip, .rpm, or self-extracting .exe files can also be uncompressed and
> >diffd, at least in principle.

> Summing it up, if we can pull it apart and get the uncompressed data,
> we md5 that data. If we can't, well I've yet to see any diff prog
> (aside from xdelta's lackluster gzip support) that even does
> decompression of data, so it's a non-issue for the moment...

Yes, if we can decompress it then we do. Otherwise we just do the
xdelta across the whole file. In either case, if the delta is
ridiculously large, then we discard it.

> I'd agree. My understanding for why the deltup format, from what I've
> gathered trolling the forums, jjw's attempting to build his own
> differencing/encoding setup which is a fair amount of work speaking
> from experience.

I think the right thing is to use the VCDIFF format, which allows
standard expression of deltas regardless of the algorithm that
generates them. I understand that xdelta is moving towards this, and
librsync will too eventually.

> A side note for doing gentoo delta patching is that (imo) it ought
> to in some form provide for standard diff's since any version
> patches that are distributed currently are typically diff (look at
> the kernel for instance).

That would be OK, but I'm actually inclined to think that it would be
better to recode diffs into xdeltas. An xdelta is often 5-10x smaller
than a compressed diff, because it doesn't include redundant context.
Diffs are great for humans or for fuzzy merges; as a delta-compression
mechanism they're pretty lame.
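
To make two of the points above concrete (the MD5 sum of the
uncompressed data can be computed during a decompression that has to
happen anyway, and a delta that comes out too large is simply
discarded), here is a minimal Python sketch. The function names, the
chunk size, and the 0.5 size-ratio cutoff are assumptions chosen for
illustration only; nothing here is taken from Portage, xdelta, or
deltup.

```python
import bz2
import hashlib
import io


def md5_of_uncompressed(fileobj, chunk_size=64 * 1024):
    """Return the MD5 hex digest of the data inside a bzip2 stream.

    The data is hashed chunk by chunk as it is decompressed, so the
    check rides along with a decompression that is needed anyway.
    """
    digest = hashlib.md5()
    with bz2.open(fileobj, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delta_is_worthwhile(delta_size, full_size, max_ratio=0.5):
    """Hypothetical policy: keep a delta only if it is clearly smaller
    than fetching the whole file. max_ratio is an assumed cutoff."""
    return delta_size <= full_size * max_ratio


if __name__ == "__main__":
    payload = b"example tarball contents\n" * 1000
    compressed = bz2.compress(payload)
    # Two different checks: the first hash covers the .bz2 bytes as
    # downloaded, the second covers the uncompressed payload.
    print(hashlib.md5(compressed).hexdigest())
    print(md5_of_uncompressed(io.BytesIO(compressed)))
```

The hash of the compressed bytes is still the one to use for spotting
an interrupted or corrupt download; the hash of the uncompressed data
is the one that stays meaningful once deltas are applied to the
uncompressed form.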

-- 
Martin

--SLDf9lqlvOQaIe6s
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE+9oVUPGPKP6Cz6IsRArp+AJ4tbQVaebReW7kszacld8doR9KQ/ACgpMem
9SBi9jpGXNmmJnxLxXGMP5o=
=YJc4
-----END PGP SIGNATURE-----

--SLDf9lqlvOQaIe6s--