From: bdharring
Date: Sun, 22 Jun 2003 23:00:30 -0500
To: gentoo-dev@gentoo.org, mbp@samba.org
Subject: Re: [gentoo-dev] Re: proposed md5sum change
Message-id: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>

Responses/cheer-leading littered liberally below...

On Sunday, June 22, 2003, at 09:41 PM, Martin Pool wrote:

> On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:
>
>> Hola all,
>> Straight to the point, I propose instead of md5summing the compressed
>> distfile, we md5sum the actual data, the tarball.
>
> Speaking as somebody who has worked on rsync and librsync: I agree, I
> think that would be a big improvement.

Heh, small world. I'd actually read about the original complaint in
Tridgell's thesis while researching delta compression for my own
little prog...

> The uncompressed form is the natural and efficient place to do delta
> compression.

Agreed, although I would posit that decompressing a large bzip2 in
memory for md5summing makes it a substantially longer affair than if
you just md5'd the compressed tarball. On my personal system:
compressed => 3-5s, bzip2 decompression piped to md5 => 1-2 minutes.
More below...

> Seemant Kuleen wrote:
>
>> Now, the promised concern bit.
>> Unfortunately, while the majority of the packages do come in a
>> compressed tarball format, there are many (enough to make it a
>> corner case of some concern) packages which do not. Off the top of
>> my head, I can think of .Z (forget which package), .rpm
>> (redhat-artwork), .bin (realplayer). And in some cases, we just get
>> an uncompressed README file in the SRC_URI (or the wacom.c file in
>> xfree, though I'm not certain of it right this moment).
>
> .Z files can be uncompressed and handled as for gzip (I think gzip
> handles them in fact.)
>
> .zip, .rpm, or self-extracting .exe files can also be uncompressed and
> diffed, at least in principle.

Summing it up: if we can pull it apart and get the uncompressed data,
we md5 that data. If we can't, well, I've yet to see any diff prog
(aside from xdelta's lackluster gzip support) that even does
decompression of data, so it's a non-issue for the moment...

> Experience on Debian has shown that compiled binaries in general do
> not delta-compress very well, so I think not being able to uncompress
> them is not a terrible thing.

Horribly badly, actually. The problem being, of course, that you
change offset x and everything after x is different... 'tis the reason
I was looking at md5ing the data, since to get any decent delta
compression you have to decompress... but you likely know that, so
I'll shut up now.

> The point:
>
> Gentoo should distribute the md5sums for both the compressed and
> uncompressed forms of packages. They are checked in that order;
> either is sufficient.

That would solve the initial complaint about speed I mentioned above.
I like it, and it's a general solution that gives the user more
control over how their distfiles are stored (aside from making delta
compression much easier to do).

> Regular non-delta downloads will proceed as usual, and the md5sum can
> be checked immediately after download. There is no added cost.
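For illustration, a minimal Python sketch of that check order
(compressed sum first, uncompressed sum as the fallback, either one
sufficient). This is not Portage code: `verify_distfile`, `OPENERS`
and the suffix handling are names made up for the example, and it
assumes both sums are supplied by the caller as hex strings.

```python
import bz2
import gzip
import hashlib

# Hypothetical suffix-to-opener table for this sketch; not taken from Portage.
OPENERS = {".gz": gzip.open, ".tgz": gzip.open, ".bz2": bz2.open}


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 of a binary file-like object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_distfile(path, compressed_md5, uncompressed_md5):
    """Check the compressed sum first, then the uncompressed one.

    Either match is sufficient. Returns "compressed", "uncompressed",
    or None when neither sum matches.
    """
    # Cheap check: hash the file exactly as it sits on disk.
    with open(path, "rb") as raw:
        if md5_of_stream(raw) == compressed_md5:
            return "compressed"
    # Slower fallback: hash the decompressed stream without storing it.
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            with opener(path, "rb") as data:
                if md5_of_stream(data) == uncompressed_md5:
                    return "uncompressed"
            break
    return None
```

The fallback only runs when the cheap check fails, so a regular
download never pays the decompression cost; a distfile that was
recompressed locally still verifies through the second branch.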
>
> Patch downloads can be done by
>
>  - download xdelta
>  - uncompress old file, pipe it into 'xdelta patch', store the result
>  - check result against uncompressed MD5sum
>
> As far as I can see this removes any need for a special deltup file
> format. Just simply send xdeltas.

I'd agree. My understanding of why the deltup format exists, from what
I've gathered trolling the forums, is that jjw is attempting to build
his own differencing/encoding setup, which (speaking from experience)
is a fair amount of work.

A side note for doing gentoo delta patching: (imo) it ought to provide
for standard diffs in some form, since the version patches that are
currently distributed are typically diffs (look at the kernel, for
instance).

Either way, back to adult swim...
~brian

--
gentoo-dev@gentoo.org mailing list
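
P.S. A rough Python sketch of the three patch steps quoted above, for
illustration only. The delta step is left as a caller-supplied
function (`apply_delta` is a hypothetical name) rather than a real
xdelta invocation, the old distfile is assumed to be bzip2-compressed,
and everything is held in memory, which a real tool would likely
avoid for large tarballs.

```python
import bz2
import hashlib


def rebuild_and_verify(old_path, apply_delta, expected_md5):
    """Uncompress the old distfile, patch it, and check the result.

    apply_delta is a hypothetical callable (bytes -> bytes) standing in
    for whatever delta tool is used; expected_md5 is the hex MD5 of the
    new *uncompressed* tarball.
    """
    # Step 1: uncompress the old file.
    with bz2.open(old_path, "rb") as old_file:
        old_data = old_file.read()
    # Step 2: apply the delta to the uncompressed data.
    new_data = apply_delta(old_data)
    # Step 3: check the result against the uncompressed MD5 sum.
    if hashlib.md5(new_data).hexdigest() != expected_md5:
        raise ValueError("uncompressed MD5 mismatch after patching")
    return new_data
```

Because the check is on the uncompressed data, the rebuilt tarball
can be stored with whatever compression the user prefers afterwards.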