From: bdharring <bdharring@wisc.edu>
To: gentoo-dev@gentoo.org, mbp@samba.org
Subject: Re: [gentoo-dev] Re: proposed md5sum change
Date: Sun, 22 Jun 2003 23:00:30 -0500
Message-ID: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
In-Reply-To: <pan.2003.06.23.02.06.32.327923@sourcefrog.net>
Responses/cheer-leading littered liberally below...
On Sunday, June 22, 2003, at 09:41 PM, Martin Pool wrote:
> On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:
>
>> Hola all,
>> Straight to the point, I propose instead of md5summing the compressed
>> distfile, we md5sum the actual data, the tarball.
>
> Speaking as somebody who has worked on rsync and librsync: I agree, I
> think that would be a big improvement.
Heh, small world. I'd actually read the original complaint about this
in Tridgell's thesis while researching delta compression for my own
little prog...
> The uncompressed form is the natural and efficient place to do delta
> compression.
Agreed, although I would posit that decompressing a large bzip2 in
memory just to md5sum it makes for a substantially longer affair than
md5ing the compressed tarball. On my personal system, md5ing the
compressed file takes 3-5s, while piping bzip2 decompression into md5
takes 1-2 minutes. More below...
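To make that cost comparison concrete, here is a minimal Python sketch (an illustration only, not anything Portage does) that md5s a .tar.bz2 twice: once over the compressed bytes and once over the bzip2-decompressed stream, timing each pass. It reads in fixed-size chunks so memory stays flat and the difference it measures is the decompression work itself. The function names and the 64 KiB chunk size are arbitrary choices of mine.

```python
import bz2
import hashlib
import time

CHUNK_SIZE = 1 << 16  # 64 KiB reads; arbitrary, keeps memory use flat


def md5_of_stream(fileobj):
    """md5 a binary file object chunk by chunk instead of reading it all."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def time_both_sums(path):
    """Return (compressed_md5, secs, uncompressed_md5, secs) for a .bz2 file."""
    start = time.perf_counter()
    with open(path, "rb") as raw:
        compressed_md5 = md5_of_stream(raw)
    mid = time.perf_counter()
    # bz2.open decompresses on the fly; this is the slow pass
    with bz2.open(path, "rb") as tarball:
        uncompressed_md5 = md5_of_stream(tarball)
    end = time.perf_counter()
    return compressed_md5, mid - start, uncompressed_md5, end - mid
```

Pointed at a big distfile, the second timing should dominate, which is the 3-5s versus 1-2 minute gap above.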
> Seemant Kuleen wrote:
>
>> Now, the promised concern bit. Unfortunately, while the majority of the
>> packages do come in a compressed tarball format, there are many (enough to
>> make it a corner case of some concern) packages which do not. Off the top
>> of my head, I can think of .Z (forget which package), .rpm
>> (redhat-artwork), .bin (realplayer). And in some cases, we just get an
>> uncompressed README file in the SRC_URI (or the wacom.c file in xfree,
>> though I'm not certain of it right this moment).
>
> .Z files can be uncompressed and handled as for gzip (I think gzip
> handles them in fact.)
>
> .zip, .rpm, or self-extracting .exe files can also be uncompressed and
> diffed, at least in principle.
Summing it up: if we can pull it apart and get at the uncompressed data,
we md5 that data. If we can't, well, I've yet to see any diff prog
(aside from xdelta's lackluster gzip support) that even decompresses
data, so it's a non-issue for the moment...
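That rule ("decompress if we know how, otherwise md5 the file as-is") might look roughly like the Python sketch below. It is hypothetical: the suffix table covers only what the standard library can unpack (gzip and bzip2). Python's gzip module does not read .Z (compress/LZW) files even though the gzip command does, and .zip/.rpm/.bin would need format-specific handling, so all of those fall through to hashing the raw bytes.

```python
import bz2
import gzip
import hashlib

# Hypothetical suffix -> decompressor table (stdlib formats only).
# Anything unlisted (.Z, .zip, .rpm, .bin, plain README ...) is hashed raw.
OPENERS = (
    (".gz", gzip.open),   # also matches .tar.gz
    (".tgz", gzip.open),
    (".bz2", bz2.open),   # also matches .tar.bz2
    (".tbz2", bz2.open),
)


def content_md5(path):
    """Return (md5 hex digest, True if the digest is of decompressed data)."""
    opener, decompressed = open, False
    for suffix, candidate in OPENERS:
        if path.endswith(suffix):
            opener, decompressed = candidate, True
            break
    digest = hashlib.md5()
    with opener(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest(), decompressed
```

The boolean lets a caller tell whether it got the "data" sum or just fell back to the file sum.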
>
> Experience on Debian has shown that compiled binaries in general do
> not delta-compress very well, so I think not being able to uncompress
> them is not a terrible thing.
Horribly, actually. The problem, of course, is that if you change
something at offset x, everything after x is different... 'tis the
reason I was looking at md5ing the uncompressed data, since to get any
decent delta compression you have to decompress first... but you likely
know that, so I'll shut up now.
>
> The point:
>
> Gentoo should distribute the md5sums for both the compressed and
> uncompressed forms of packages. They are checked in that order;
> either is sufficient.
That would solve the speed complaint I mentioned above. I like it: it's
a general solution that gives the user more control over how their
distfiles are stored (aside from making delta compression much easier
to do).
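As a rough illustration of "checked in that order; either is sufficient", here is a hypothetical Python check (the names and return values are mine, not Portage's). It tries the cheap compressed sum first and only decompresses if that fails. The `opener` argument is whatever decompresses the file as it is stored locally, so a user who recompressed a .tar.bz2 as gzip would fail the first check but pass the second.

```python
import hashlib


def _md5(fileobj):
    """Chunked md5 of a binary file object."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(1 << 16), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_distfile(path, compressed_md5, uncompressed_md5, opener=None):
    """Check the compressed sum first, then the uncompressed one.

    opener decompresses the file as stored locally (e.g. gzip.open or
    bz2.open); None means there is no uncompressed form to try.
    Returns "compressed", "uncompressed", or None if neither matched.
    """
    with open(path, "rb") as f:
        if _md5(f) == compressed_md5:
            return "compressed"  # cheap path: no decompression needed
    if opener is None:
        return None
    try:
        with opener(path, "rb") as f:
            if _md5(f) == uncompressed_md5:
                return "uncompressed"
    except (OSError, EOFError):
        # not valid (or truncated) data for this decompressor
        pass
    return None
```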
>
> Regular non-delta downloads will proceed as usual, and the md5sum can
> be checked immediately after download. There is no added cost.
>
> Patch downloads can be done by
>
> - download xdelta
> - uncompress old file, pipe it into 'xdelta patch', store the result
> - check result against uncompressed MD5sum
>
> As far as I can see this removes any need for a special deltup file
> format. Just simply send xdeltas.
I'd agree. From what I've gathered trolling the forums, the reason for
the deltup format is that jjw is attempting to build his own
differencing/encoding setup, which, speaking from experience, is a fair
amount of work. A side note for Gentoo delta patching: (imo) it ought to
support standard diffs in some form, since any version patches that are
distributed currently are typically diffs (look at the kernel, for
instance).
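Martin's patch steps (uncompress the old file, apply the delta, check the result against the uncompressed md5) could be sketched as below. This is hypothetical: `apply_patch` is a stand-in callable for actually running 'xdelta patch', the download step is left out, and everything is held in memory for brevity where a real tool would stream into xdelta.

```python
import hashlib


def rebuild_from_delta(old_path, old_opener, apply_patch, expected_md5, out_path):
    """Uncompress the old distfile, patch it, and verify before writing.

    apply_patch is a hypothetical stand-in for 'xdelta patch': it takes the
    old uncompressed bytes and returns the new uncompressed bytes.
    """
    # 1. uncompress the old distfile we already have
    with old_opener(old_path, "rb") as f:
        old_data = f.read()
    # 2. apply the downloaded delta
    new_data = apply_patch(old_data)
    # 3. check against the uncompressed md5; refuse to write a bad result
    actual = hashlib.md5(new_data).hexdigest()
    if actual != expected_md5:
        raise ValueError(f"md5 mismatch: got {actual}, expected {expected_md5}")
    with open(out_path, "wb") as out:
        out.write(new_data)
    return actual
```

Note that no special container format shows up anywhere: the only metadata needed is the uncompressed md5 already proposed above.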
Either way, back to adult swim...
~brian
--
gentoo-dev@gentoo.org mailing list