From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gentoo-dev-return-3923-arch-gentoo-dev=gentoo.org@gentoo.org>
Received: (qmail 12109 invoked by uid 1002); 23 Jun 2003 04:44:03 -0000
Mailing-List: contact gentoo-dev-help@gentoo.org; run by ezmlm
Precedence: bulk
List-Post: <mailto:gentoo-dev@gentoo.org>
List-Help: <mailto:gentoo-dev-help@gentoo.org>
List-Unsubscribe: <mailto:gentoo-dev-unsubscribe@gentoo.org>
List-Subscribe: <mailto:gentoo-dev-subscribe@gentoo.org>
List-Id: Gentoo Linux mail <gentoo-dev.gentoo.org>
X-BeenThere: gentoo-dev@gentoo.org
Received: (qmail 3789 invoked from network); 23 Jun 2003 04:44:02 -0000
Date: Mon, 23 Jun 2003 14:43:01 +1000
From: Martin Pool <mbp@samba.org>
To: gentoo-dev@gentoo.org
Message-ID: <20030623044300.GA20153@vexed.ozlabs.hp.com>
Reply-To: gentoo-dev@gentoo.org
References: <pan.2003.06.23.02.06.32.327923@sourcefrog.net> <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="SLDf9lqlvOQaIe6s"
Content-Disposition: inline
In-Reply-To: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
X-GPG: 1024D/A0B3E88B: AFAC578F 1841EE6B FD95E143 3C63CA3F A0B3E88B
User-Agent: Mutt/1.5.4i
Sender: Martin Pool <mbp@vexed.ozlabs.hp.com>
Subject: Re: [gentoo-dev] Re: proposed md5sum change
X-Archives-Salt: 110136ca-84da-4dc4-8c12-ee3ac368f3ba
X-Archives-Hash: 2abe0e399af7a80b6cd5ed28c64ae392

--SLDf9lqlvOQaIe6s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 22 Jun 2003, bdharring <bdharring@wisc.edu> wrote:

> >The uncompressed form is the natural and efficient place to do delta
> >compression.

> Agreed, although I would posit that decompressing a large bzip2 for
> md5suming in memory makes it a substantially longer affair than if
> you just md5'd the compressed tarball.  On my personal system,
> compressed => 3-5s, bzip2 decompressing piped to md5 = 1-2 minutes.
> More below...

Yes, if the user is downloading a compressed form, then it makes sense
to calculate the hash of the compressed form when checking if
e.g. they got an interrupted or corrupt download.

But aside from that, counting the time to decompress as a cost of
checking the MD5 sum is a furphy.  The file has to be decompressed at
some point anyway, whether to patch it or to build it.  You can check
the MD5 sum then.
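
As a rough sketch of what "check it then" could look like, the
following Python hashes the uncompressed bytes while they are being
unpacked, so the archive is only read and decompressed once.  The
function name and chunk size are made up for illustration; this is not
how Portage does it.

```python
import bz2
import hashlib
import io


def decompress_and_hash(compressed_stream, sink, chunk_size=64 * 1024):
    """Decompress a bzip2 stream into `sink` and return the MD5 hex
    digest of the uncompressed bytes, computed in the same pass."""
    digest = hashlib.md5()
    # BZ2File accepts an already-open binary file object.
    with bz2.BZ2File(compressed_stream) as plain:
        while True:
            data = plain.read(chunk_size)
            if not data:
                break
            digest.update(data)  # hash the uncompressed bytes...
            sink.write(data)     # ...while handing them to the consumer
    return digest.hexdigest()


if __name__ == "__main__":
    payload = b"example tarball contents\n" * 1000
    source = io.BytesIO(bz2.compress(payload))
    unpacked = io.BytesIO()
    print(decompress_and_hash(source, unpacked))
```

Here `sink` stands in for whatever consumes the uncompressed data (a
file on disk, or the input of the patch step).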

Note that xdelta patches in fact include the MD5 checksum of the
output file, so checking it is a bit redundant.

> >.zip, .rpm, or self-extracting .exe files can also be uncompressed and
> >diffd, at least in principle.
> Summing it up, if we can pull it apart and get the uncompressed data,
> we md5 that data.  If we can't, well I've yet to see any diff prog
> (aside from xdelta's lackluster gzip support) that even does
> decompression of data, so it's a non-issue for the moment...

Yes, if we can decompress it then we do.  Otherwise we just do the
xdelta across the whole file.  In either case, if the delta is
ridiculously large, then we discard it.
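
A minimal sketch of that policy in Python follows.  The helper names
and the 0.5 size ratio are assumptions picked for illustration (nothing
above fixes a number for "ridiculously large"), and only bzip2 and gzip
are recognised here, by their magic bytes.

```python
import bz2
import gzip
import zlib

# Hypothetical cut-off: drop a delta bigger than half of the new file.
MAX_DELTA_RATIO = 0.5


def delta_basis(raw):
    """Return (form, data): the uncompressed payload when `raw` can be
    decompressed, otherwise the raw bytes unchanged."""
    if raw.startswith(b"BZh"):  # bzip2 magic
        try:
            return "uncompressed", bz2.decompress(raw)
        except (OSError, ValueError):
            pass  # looked like bzip2 but was not; fall through
    if raw.startswith(b"\x1f\x8b"):  # gzip magic
        try:
            return "uncompressed", gzip.decompress(raw)
        except (OSError, EOFError, zlib.error):
            pass
    return "raw", raw


def worth_keeping(delta_size, target_size, max_ratio=MAX_DELTA_RATIO):
    """Decide whether a delta is small enough to be worth shipping."""
    return delta_size <= max_ratio * target_size
```

The delta itself would be computed by an external tool over whatever
`delta_basis` returns; that step is left out of the sketch.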

> I'd agree.  My understanding of why the deltup format exists, from
> what I've gathered trolling the forums, is that jjw's attempting to
> build his own differencing/encoding setup, which is a fair amount of
> work speaking from experience.

I think the right thing is to use the VCDIFF format, which allows
standard expression of deltas regardless of the algorithm that
generates them.  I understand that xdelta is moving towards this and
librsync will too eventually.

> A side note for doing gentoo delta patching is that (imo) it ought
> to in some form provide for standard diff's since any version
> patches that are distributed currently are typically diff (look at
> the kernel for instance).

That would be OK, but I'm actually inclined to think that it would be
better to recode diffs into xdeltas.  xdeltas are often 5-10x smaller
than a compressed diff, because they don't include redundant context.

diffs are great for humans or for fuzzy merges.  As a
delta-compression mechanism they're pretty lame.
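
To make the "redundant context" point concrete, here is a toy
comparison in Python.  It is only an illustration: the copy/add
instruction list is a simplified stand-in for what a binary delta
carries, not xdelta's algorithm or the VCDIFF wire format, and all
function names are hypothetical.

```python
import difflib


def copy_add_delta(old, new):
    """Describe `new` as instructions against `old`:
    ("copy", offset, length) reuses bytes the receiver already has,
    ("add", data) carries bytes that are genuinely new."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))
        # "delete" emits nothing: those old bytes are simply never copied.
    return ops


def apply_delta(old, ops):
    """Rebuild the new file from the old file plus the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops):
    """Count the bytes of new data the delta has to carry."""
    return sum(len(op[1]) for op in ops if op[0] == "add")


if __name__ == "__main__":
    old_lines = [f"line {i}\n" for i in range(50)]
    new_lines = list(old_lines)
    new_lines[25] = "line 25 changed\n"

    diff_text = "".join(
        difflib.unified_diff(old_lines, new_lines, "old", "new"))
    old = "".join(old_lines).encode()
    new = "".join(new_lines).encode()
    ops = copy_add_delta(old, new)

    assert apply_delta(old, ops) == new
    print("unified diff bytes:", len(diff_text))
    print("literal bytes in delta:", literal_bytes(ops))
```

The unified diff repeats the removed line and the surrounding context
lines, while the copy/add form only ships the changed bytes plus a few
offsets.  How big the gap is on real tarballs depends on the encoding
and the data, so this toy says nothing about the actual ratio.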

-- 
Martin

--SLDf9lqlvOQaIe6s
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE+9oVUPGPKP6Cz6IsRArp+AJ4tbQVaebReW7kszacld8doR9KQ/ACgpMem
9SBi9jpGXNmmJnxLxXGMP5o=
=YJc4
-----END PGP SIGNATURE-----

--SLDf9lqlvOQaIe6s--