public inbox for gentoo-dev@lists.gentoo.org
From: bdharring <bdharring@wisc.edu>
To: gentoo-dev@gentoo.org, mbp@samba.org
Subject: Re: [gentoo-dev] Re: proposed md5sum change
Date: Sun, 22 Jun 2003 23:00:30 -0500
Message-ID: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
In-Reply-To: <pan.2003.06.23.02.06.32.327923@sourcefrog.net>

Responses/cheer-leading littered liberally below...

On Sunday, June 22, 2003, at 09:41 PM, Martin Pool wrote:

> On Wed, 11 Jun 2003 11:02:02 -0500, Brian Harring wrote:
>
>> Hola all,
>> Straight to the point, I propose instead of md5summing the compressed
>> distfile, we md5sum the actual data, the tarball.
>
> Speaking as somebody who has worked on rsync and librsync: I agree, I
> think that would be a big improvement.
Heh, small world.  I'd actually read the original complaint about it 
in Tridgell's thesis while researching delta compression for my 
own little prog...

> The uncompressed form is the natural and efficient place to do delta
> compression.
Agreed, although I would posit that decompressing a large bzip2 in 
memory for md5summing makes it a substantially longer affair than if 
you just md5'd the compressed tarball.  On my personal system, 
compressed => 3-5s, bzip2 decompressing piped to md5 => 1-2 minutes.  
More below...
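
To make the comparison concrete, here is a rough Python sketch of the 
two checks.  It is not anything portage does today; the helper names 
(md5_of_stream, md5_of_bz2_payload) and the 64 KiB chunk size are made 
up for illustration.  The second function has to run the whole bzip2 
decompression just to feed the hash, which is where the extra time goes.

```python
import bz2
import hashlib
import io

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def md5_of_stream(stream):
    """Return the hex MD5 of everything readable from a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(CHUNK), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_bz2_payload(stream):
    """Return the hex MD5 of the data inside a bzip2 stream.

    The data is decompressed chunk by chunk, so the whole tarball never
    has to sit in memory at once.
    """
    with bz2.open(stream, "rb") as payload:
        return md5_of_stream(payload)


if __name__ == "__main__":
    tarball = b"pretend this is a tarball\n" * 1000  # stand-in data
    distfile = bz2.compress(tarball)
    print("compressed:  ", md5_of_stream(io.BytesIO(distfile)))
    print("uncompressed:", md5_of_bz2_payload(io.BytesIO(distfile)))
```

Both helpers take an open binary file, so a real distfile would be 
passed as open(path, "rb").
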
> Seemant Kulleen wrote:
>
>> Now, the promised concern bit.  Unfortunately, while the majority of the
>> packages do come in a compressed tarball format, there are many (enough to
>> make it a corner case of some concern) packages which do not.  Off the top
>> of my head, I can think of .Z (forget which package), .rpm
>> (redhat-artwork), .bin (realplayer).  And in some cases, we just get an
>> uncompressed README file in the SRC_URI (or the wacom.c file in xfree,
>> though I'm not certain of it right this moment).
>
> .Z files can be uncompressed and handled as for gzip (I think gzip
> handles them in fact.)
>
> .zip, .rpm, or self-extracting .exe files can also be uncompressed and
> diffed, at least in principle.
Summing it up: if we can pull it apart and get the uncompressed data, 
we md5 that data.  If we can't, well, I've yet to see any diff prog 
(aside from xdelta's lackluster gzip support) that even does 
decompression of data, so it's a non-issue for the moment...
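
A minimal Python sketch of that rule, assuming we recognise formats by 
their leading magic bytes.  Only gzip and bzip2 are covered because 
those are what the standard library can unpack; .Z, .rpm and friends 
would need extra tooling and simply fall through here.  The name 
payload_md5 is hypothetical, and the whole file is held in memory to 
keep the example short.

```python
import bz2
import gzip
import hashlib
import zlib

# Leading bytes of each format mapped to a one-shot decompressor.
DECOMPRESSORS = {
    b"\x1f\x8b": gzip.decompress,  # gzip
    b"BZh": bz2.decompress,        # bzip2
}


def payload_md5(blob):
    """Return the hex MD5 of the uncompressed data, or None.

    None means "we could not pull it apart", in which case only the
    checksum of the file as distributed is available.
    """
    for magic, decompress in DECOMPRESSORS.items():
        if blob.startswith(magic):
            try:
                return hashlib.md5(decompress(blob)).hexdigest()
            except (OSError, EOFError, ValueError, zlib.error):
                return None  # looked compressed but did not decode
    return None  # plain file or a format this sketch does not know
```
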
>
> Experience on Debian has shown that compiled binaries in general do
> not delta-compress very well, so I think not being able to uncompress
> them is not a terrible thing.
Horribly, actually.  The problem being, of course, that if you change 
offset x, everything after x is different... 'tis the reason I was 
looking at md5ing the data, since to get any decent delta compression 
you have to decompress... but you likely know that so I'll shut up now.
>
> The point:
>
> Gentoo should distribute the md5sums for both the compressed and
> uncompressed forms of packages.  They are checked in that order;
> either is sufficient.
That would solve the initial complaint I had mentioned about speed 
above.  I like it, and it's a general solution allowing the user more 
control over how their distfiles are stored (aside from making delta 
compression much easier to do).
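
As a rough Python sketch of that check order (assuming bzip2 distfiles; 
verify_distfile and its arguments are names invented for the example, 
not an existing portage interface): the cheap hash of the file as 
stored is tried first, and the decompress-and-hash path only runs if 
that fails.

```python
import bz2
import hashlib


def verify_distfile(blob, compressed_md5, uncompressed_md5):
    """Return which recorded checksum matched, or None if neither did.

    blob is the distfile as stored on disk; the two MD5 arguments are
    the hex digests that would be published for the package.
    """
    # Cheap check first: hash the bytes exactly as downloaded.
    if hashlib.md5(blob).hexdigest() == compressed_md5:
        return "compressed"
    # Fall back to the tarball inside; this is the slow path.
    try:
        tarball = bz2.decompress(blob)
    except (OSError, ValueError):
        return None  # not valid bzip2, so nothing left to try
    if hashlib.md5(tarball).hexdigest() == uncompressed_md5:
        return "uncompressed"
    return None
```

A distfile recompressed locally at a different bzip2 level would fail 
the first check but still pass the second, which is the extra freedom 
in how distfiles are stored.
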
>
> Regular non-delta downloads will proceed as usual, and the md5sum can
> be checked immediately after download.  There is no added cost.
>
> Patch downloads can be done by
>
>  - download xdelta
>  - uncompress old file, pipe it into 'xdelta patch', store the result
>  - check result against uncompressed MD5sum
>
> As far as I can see this removes any need for a special deltup file
> format.  Just simply send xdeltas.
I'd agree.  My understanding of why the deltup format exists, from what 
I've gathered trolling the forums, is that jjw is attempting to build 
his own differencing/encoding setup, which is a fair amount of work, 
speaking from experience.  A side note on gentoo delta patching: (imo) 
it ought to provide for standard diffs in some form, since any version 
patches that are distributed currently are typically diffs (look at 
the kernel for instance).
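
For what it's worth, the three patch-download steps quoted above could 
be sketched in Python roughly like this.  The apply_delta argument is 
a hypothetical hook standing in for whatever does the patching (xdelta, 
plain patch, ...); nothing here calls a real tool, and the old distfile 
is assumed to be bzip2.

```python
import bz2
import hashlib


def rebuild_from_delta(old_distfile, delta, uncompressed_md5, apply_delta):
    """Patch the old tarball and accept the result only if its MD5 matches.

    apply_delta(old_bytes, delta_bytes) -> new_bytes is a placeholder;
    a real setup might shell out to a delta tool at that point.
    """
    old_tarball = bz2.decompress(old_distfile)     # uncompress old file
    new_tarball = apply_delta(old_tarball, delta)  # apply downloaded delta
    # Check the result against the published uncompressed checksum.
    if hashlib.md5(new_tarball).hexdigest() != uncompressed_md5:
        raise ValueError("patched tarball does not match uncompressed MD5")
    return new_tarball
```

Since the check is done on the uncompressed result, the same function 
works whether the delta was an xdelta or a standard diff.
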
Either way, back to adult swim...
~brian


--
gentoo-dev@gentoo.org mailing list