From: Martin Pool <mbp@samba.org>
To: gentoo-dev@gentoo.org
Subject: Re: [gentoo-dev] Re: proposed md5sum change
Date: Mon, 23 Jun 2003 14:43:01 +1000
Message-ID: <20030623044300.GA20153@vexed.ozlabs.hp.com>
In-Reply-To: <3AE46E76-A52F-11D7-8DBC-00306580AC5C@wisc.edu>
On 22 Jun 2003, bdharring <bdharring@wisc.edu> wrote:
> >The uncompressed form is the natural and efficient place to do delta
> >compression.
> Agreed, although I would posit that decompressing a large bzip2 for
> md5summing in memory makes it a substantially longer affair than if
> you just md5'd the compressed tarball. On my personal system:
> compressed => 3-5 s, bzip2 decompression piped to md5 => 1-2 minutes.
> More below...
Yes, if the user is downloading a compressed form, then it makes sense
to calculate the hash of the compressed form when checking if
e.g. they got an interrupted or corrupt download.
But aside from that, counting the time to decompress as a cost of
checking the MD5 sum is a furphy. It has to be decompressed at some
point, whether to patch it or to build it, and you can check the MD5
sum then.
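To make that concrete, here is a minimal Python sketch (not Portage code; `extract_and_hash` is a hypothetical name) that computes the MD5 of both the compressed download and the decompressed data in the same pass that writes out the uncompressed bytes, so the uncompressed sum costs no extra decompression. It assumes a single-stream bzip2 file; multi-stream files (e.g. from pbzip2) would need the loop to start a new decompressor on `unused_data`.

```python
import bz2
import hashlib
from typing import BinaryIO, Tuple


def extract_and_hash(src: BinaryIO, dst: BinaryIO,
                     chunk_size: int = 1 << 16) -> Tuple[str, str]:
    """Decompress a single-stream bzip2 file from src into dst.

    Returns (md5 of the compressed bytes, md5 of the uncompressed bytes),
    both computed during the one decompression pass.
    """
    compressed_md5 = hashlib.md5()
    plain_md5 = hashlib.md5()
    decomp = bz2.BZ2Decompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        compressed_md5.update(chunk)      # sum of what was downloaded
        data = decomp.decompress(chunk)
        plain_md5.update(data)            # sum of what we build or patch
        dst.write(data)
    if not decomp.eof:
        raise ValueError("truncated bzip2 stream")
    return compressed_md5.hexdigest(), plain_md5.hexdigest()
```

The compressed digest still catches an interrupted download cheaply; the uncompressed one falls out of the extraction that has to happen anyway.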
Note that xdelta patches in fact include the MD5 checksum of the
output file, so checking it is a bit redundant.
> >.zip, .rpm, or self-extracting .exe files can also be uncompressed and
> >diffed, at least in principle.
> Summing it up, if we can pull it apart and get the uncompressed data,
> we md5 that data. If we can't, well, I've yet to see any diff prog
> (aside from xdelta's lackluster gzip support) that even does
> decompression of data, so it's a non-issue for the moment...
Yes, if we can decompress it then we do. Otherwise we just do the
xdelta across the whole file. In either case, if the delta is
ridiculously large, then we discard it.
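A rough Python sketch of that policy. The gzip (`1f 8b`) and bzip2 (`BZh`) magic numbers are real; `delta_input`, `keep_delta` and the 0.5 cutoff standing in for "ridiculously large" are hypothetical choices for illustration, and other containers (.zip, .rpm, ...) simply fall through to the raw case here. The delta is judged against the compressed size of the new file, since that is what the user would otherwise download.

```python
import bz2
import gzip
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"

# Hypothetical cutoff: a delta over half the size of the full
# compressed download is not worth shipping.
MAX_DELTA_RATIO = 0.5


def delta_input(blob: bytes) -> Tuple[str, bytes]:
    """Return (kind, bytes to feed to the delta encoder)."""
    if blob.startswith(GZIP_MAGIC):
        return "gzip", gzip.decompress(blob)
    if blob.startswith(BZIP2_MAGIC):
        return "bzip2", bz2.decompress(blob)
    # Can't (or don't know how to) decompress: delta the whole file as-is.
    return "raw", blob


def keep_delta(delta: bytes, new_compressed: bytes,
               max_ratio: float = MAX_DELTA_RATIO) -> bool:
    """Keep the delta only if it is clearly smaller than the full download."""
    return len(delta) <= max_ratio * len(new_compressed)
```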
> I'd agree. My understanding of why the deltup format exists, from what
> I've gathered trolling the forums, is that jjw is attempting to build
> his own differencing/encoding setup, which, speaking from experience,
> is a fair amount of work.
I think the right thing is to use the VCDIFF format, which allows
standard expression of deltas regardless of the algorithm that
generates them. I understand that xdelta is moving towards this and
librsync will too eventually.
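For anyone unfamiliar with it, VCDIFF (RFC 3284) describes a delta as a sequence of instructions that ADD literal bytes, COPY a range already known to the decoder, or RUN a repeated byte. The toy Python model below keeps only ADD and COPY-from-source, finds matches with `difflib.SequenceMatcher` rather than xdelta's algorithm, and does not produce the RFC's binary encoding; it only illustrates that the instruction stream is independent of how the matches were found.

```python
import difflib
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Copy:
    offset: int   # start position in the source (old) file
    length: int


@dataclass
class Add:
    data: bytes   # literal bytes not found in the source


Instruction = Union[Copy, Add]


def encode(source: bytes, target: bytes) -> List[Instruction]:
    """Express target as COPYs from source plus literal ADDs."""
    ops: List[Instruction] = []
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(Copy(i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(Add(target[j1:j2]))
        # "delete": source bytes that are simply never copied
    return ops


def decode(source: bytes, ops: List[Instruction]) -> bytes:
    """Rebuild the target by interpreting the instructions in order."""
    out = bytearray()
    for op in ops:
        if isinstance(op, Copy):
            out += source[op.offset:op.offset + op.length]
        else:
            out += op.data
    return bytes(out)
```

Because the decoder only interprets instructions, a smarter encoder (one that allows out-of-order or repeated copies, as xdelta does) could emit the same format without any change on the applying side.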
> A side note on Gentoo delta patching: (imo) it ought to provide for
> standard diffs in some form, since the version patches distributed
> currently are typically diffs (look at the kernel, for instance).
That would be OK, but I'm actually inclined to think that it would be
better to recode diffs into xdeltas. xdeltas are often 5-10x smaller
than a compressed diff, because they don't include redundant context.
Diffs are great for humans or for fuzzy merges. As a
delta-compression mechanism they're pretty lame.
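A quick way to see where the context overhead comes from, as a Python sketch: it compares a unified diff (three lines of context by default) with a line-based COPY/ADD encoding of the same change. The record format in the hypothetical `copy_add_bytes` is made up for illustration, and the ratio it gives depends entirely on the input; it is not meant to reproduce the 5-10x figure above, which refers to real xdelta output against compressed diffs.

```python
import difflib
import zlib


def unified_diff_bytes(old: str, new: str, context: int = 3) -> bytes:
    """An ordinary unified diff, context lines included."""
    lines = difflib.unified_diff(old.splitlines(keepends=True),
                                 new.splitlines(keepends=True),
                                 "a/file", "b/file", n=context)
    return "".join(lines).encode()


def copy_add_bytes(old: str, new: str) -> bytes:
    """Made-up line-based delta: unchanged runs become short
    'C start count' records; only new lines are stored literally."""
    a = old.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(f"C {i1} {i2 - i1}\n")
        elif tag in ("replace", "insert"):
            out.append(f"A {j2 - j1}\n" + "".join(b[j1:j2]))
    return "".join(out).encode()


if __name__ == "__main__":
    old = "".join(f"line {i}: some source text here\n" for i in range(100))
    new = old.replace("line 50: some source text here\n", "line 50: CHANGED\n")
    for name, blob in (("diff", unified_diff_bytes(old, new)),
                       ("copy/add", copy_add_bytes(old, new))):
        print(name, len(blob), "bytes,", len(zlib.compress(blob, 9)), "compressed")
```

The trade-off is the one described above: the COPY records point at exact line offsets in the old file, so unlike a diff with context they cannot be applied with fuzz to a file that has drifted.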
--
Martin