public inbox for gentoo-dev@lists.gentoo.org
* Re: [gentoo-dev] Re: proposed md5sum change
  @ 2003-06-23  4:43 99%   ` Martin Pool
From: Martin Pool @ 2003-06-23  4:43 UTC
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]

On 22 Jun 2003, bdharring <bdharring@wisc.edu> wrote:

> >The uncompressed form is the natural and efficient place to do delta
> >compression.

> Agreed, although I would posit that decompressing a large bzip2 for
> md5summing in memory makes it a substantially longer affair than if
> you just md5'd the compressed tarball.  On my personal system,
> compressed => 3-5s, bzip2 decompressing piped to md5 => 1-2 minutes.
> More below...

Yes, if the user is downloading a compressed form, then it makes sense
to calculate the hash of the compressed form when checking if
e.g. they got an interrupted or corrupt download.  

But aside from that, counting the time to decompress as a cost of
checking the MD5 sum is a furphy.  The tarball has to be decompressed at
some point, whether to patch it or to build it, so you can check the MD5
sum then.
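
A minimal sketch of that point, assuming Python's standard bz2 and
hashlib modules and a single-stream .bz2 file. The function names are
hypothetical. It hashes the compressed bytes and the decompressed bytes
in one pass, so the uncompressed checksum comes along with a
decompression that has to happen anyway:

```python
import bz2
import hashlib


def read_chunks(path, chunk_size=1 << 16):
    """Yield a file's raw bytes in fixed-size chunks."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return
            yield chunk


def md5_compressed_and_plain(chunks):
    """Return (md5 of compressed bytes, md5 of decompressed bytes).

    `chunks` is any iterable of bytes holding one bzip2 stream.
    """
    compressed = hashlib.md5()
    plain = hashlib.md5()
    decomp = bz2.BZ2Decompressor()
    for chunk in chunks:
        compressed.update(chunk)
        # Feeding data after end-of-stream raises EOFError, so stop
        # decompressing once the stream is complete.
        if not decomp.eof:
            plain.update(decomp.decompress(chunk))
    return compressed.hexdigest(), plain.hexdigest()
```

For a file on disk this would be called as
md5_compressed_and_plain(read_chunks("foo.tar.bz2")), where the file
name is only a placeholder.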

Note that xdelta patches in fact include the MD5 checksum of the
output file, so checking it is a bit redundant.

> >.zip, .rpm, or self-extracting .exe files can also be uncompressed and
> >diffd, at least in principle.
> Summing it up, if we can pull it apart and get the uncompressed data, 
> we md5 that data.  If we can't, well I've yet to see any diff prog 
> (aside from xdelta's lackluster gzip support) that even does 
> decompression of data, so it's a non-issue for the moment...

Yes, if we can decompress it then we do.  Otherwise we just do the
xdelta across the whole file.  In either case, if the delta is
ridiculously large, then we discard it.
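
The policy above can be sketched roughly as follows. This is an
illustration only: the function names and the max_ratio threshold are
assumptions (the message gives no number for "ridiculously large"), and
only gzip and bzip2 are recognised, by their magic bytes.

```python
import bz2
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def payload_for_delta(data):
    """Return the bytes a delta should be computed over.

    If the data looks like gzip or bzip2 and decompresses cleanly, use
    the decompressed form; otherwise fall back to the whole file.
    """
    try:
        if data.startswith(GZIP_MAGIC):
            return gzip.decompress(data)
        if data.startswith(BZIP2_MAGIC):
            return bz2.decompress(data)
    except (OSError, EOFError, ValueError, zlib.error):
        pass  # not really compressed, or damaged: delta the raw bytes
    return data


def worth_keeping(delta_size, full_size, max_ratio=0.5):
    """Keep a delta only if it is clearly smaller than the full file.

    max_ratio is a hypothetical cut-off, not a value from the thread.
    """
    return delta_size < full_size * max_ratio
```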

> I'd agree.  My understanding of why the deltup format exists, from what
> I've gathered trolling the forums, is that jjw is attempting to build
> his own differencing/encoding setup, which is a fair amount of work,
> speaking from experience.

I think the right thing is to use the VCDIFF format, which allows
standard expression of deltas regardless of the algorithm that
generates them.  I understand that xdelta is moving towards this and
librsync will too eventually.
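
To illustrate why a common delta format can be independent of the
algorithm: a delta can be expressed as a list of "copy from the source"
and "add these literal bytes" instructions, and the decoder only needs
to understand those instructions. The sketch below is a toy, not the
VCDIFF wire format; it uses difflib.SequenceMatcher as a stand-in
matcher, and the tuple layout and function names are assumptions made
for the example.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Encode target as COPY/ADD instructions relative to source."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in the source: reference them.
            ops.append(("COPY", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes only in the target: carry them literally.
            ops.append(("ADD", target[j1:j2]))
        # "delete" needs no instruction: those bytes are simply not copied.
    return ops


def apply_delta(source, ops):
    """Rebuild the target from the source and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

Any other matcher could replace SequenceMatcher in make_delta without
touching apply_delta, which is the property a shared format is meant to
give.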

> A side note for doing gentoo delta patching is that (imo) it ought
> to in some form provide for standard diff's since any version
> patches that are distributed currently are typically diff (look at
> the kernel for instance).

That would be OK, but I'm actually inclined to think that it would be
better to recode diffs into xdeltas.  xdeltas are often 5-10x smaller
than a compressed diff, because they don't include redundant context.

diffs are great for humans or for fuzzy merges.  As a
delta-compression mechanism they're pretty lame.
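
A rough way to see the redundant context in a diff, using Python's
difflib. This only measures how many bytes of a unified diff are
unchanged context lines versus changed lines; it does not reproduce
xdelta or the 5-10x figure, and the function name is hypothetical.

```python
import difflib


def diff_byte_breakdown(old_lines, new_lines, context=3):
    """Return (context_bytes, change_bytes) for a unified diff."""
    context_bytes = 0
    change_bytes = 0
    diff = difflib.unified_diff(old_lines, new_lines, "old", "new", n=context)
    for index, line in enumerate(diff):
        if index < 2 or line.startswith("@@"):
            continue  # skip the ---/+++ file headers and hunk headers
        if line.startswith(" "):
            context_bytes += len(line)  # unchanged line repeated for context
        else:
            change_bytes += len(line)   # a '+' or '-' line
    return context_bytes, change_bytes
```

With a one-line change in a 20-line file, most of the diff body is
context that the receiver already has in the old file; a binary delta
refers to that data by offset instead of repeating it.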

-- 
Martin 


2003-06-23  2:41     [gentoo-dev] Re: proposed md5sum change Martin Pool
2003-06-23  4:00     ` bdharring
2003-06-23  4:43 99%   ` Martin Pool
