public inbox for gentoo-dev@lists.gentoo.org
From: "Tiziano Müller" <dev-zero@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Sun, 08 Feb 2009 12:51:19 +0100
Message-ID: <1234093879.24784.2819.camel@localhost>
In-Reply-To: <498E9EFE.2030807@gentoo.org>

[-- Attachment #1: Type: text/plain, Size: 3713 bytes --]

On Sunday, 08.02.2009, at 00:59 -0800, Zac Medico wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Tiziano Müller wrote:
> > On Saturday, 07.02.2009, at 15:23 -0800, Zac Medico wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> Tiziano Müller wrote:
> >>> On Monday, 02.02.2009, at 12:34 -0800, Zac Medico wrote:
> >>>> For the digest format, I suggest that we use the leftmost 10
> >>>> hexadecimal digits of the SHA-1 digest. The rationale for limiting
> >>>> it to 10 digits (out of 40) is to save space. Due to the avalanche
> >>>> effect [2], 10 digits should be sufficient to ensure that problems
> >>>> resulting from hash collisions are extremely unlikely.
> >>> I'd recommend prefixing the digest with a "{TYPE}" (as is done for hashed
> >>> passwords) to be able to change the digest algorithm as needed
> >>> (especially with regard to the current SHA successor competition).
> >>> This allows a future package manager which might use SHA-3 for hashing
> >>> (once it's released) to still check old digests. Furthermore it would
> >>> allow for easier transition and only needs a definition of allowed
> >>> hashes instead of a specific one.
> >> I like that idea. That way it's not necessary to bump the EAPI in
> >> order to change the hash function. So, a typical DIGESTS value might
> >> look like this:
You would still have to bump the EAPI if you wanted to use a new hash
that isn't available yet (like SHA-3). The advantage of recording which
hash was used is that new package managers (PMs) can still handle an old
metadata cache.
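
A minimal sketch of a type-prefixed DIGESTS value, assuming the layout
from the example in this thread (a hash-type tag followed by
space-separated digests) and the 10-hex-digit truncation from the
original proposal. What each digest covers (here, the raw bytes of one
file) is an assumption, and `ALLOWED`, `make_digests` and
`cache_is_valid` are hypothetical names, not part of any package manager.

```python
import hashlib

# Hypothetical table of hash-type tags this package manager understands,
# mapped to hashlib algorithm names. A newer PM adds entries here and
# keeps accepting caches written with the older tags.
ALLOWED = {"SHA1": "sha1", "SHA256": "sha256"}

TRUNCATE = 10  # leftmost hex digits kept, as in the original proposal

def digest(data: bytes, hash_type: str = "SHA1") -> str:
    """Return the leftmost TRUNCATE hex digits of data's digest."""
    return hashlib.new(ALLOWED[hash_type], data).hexdigest()[:TRUNCATE]

def make_digests(files: list[bytes], hash_type: str = "SHA1") -> str:
    """Build a DIGESTS-style value: the type tag, then one digest per file."""
    return " ".join([hash_type] + [digest(f, hash_type) for f in files])

def cache_is_valid(value: str, files: list[bytes]) -> bool:
    """Recompute digests with the algorithm named by the tag and compare."""
    hash_type, *stored = value.split()
    if hash_type not in ALLOWED:
        return False  # unknown algorithm: treat the cache entry as stale
    if len(stored) != len(files):
        return False
    return all(digest(f, hash_type) == d for f, d in zip(files, stored))
```

In this sketch a package manager that later learns a new algorithm only
has to add an entry to `ALLOWED`; entries written with an older tag still
validate, and an unknown tag simply marks the entry stale.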

> >>
> >> SHA1 02021be38b a28b191904 3992945426 6ec21b29a3
> > 
> > Having slept on it again, I don't think that truncating a hash is a good
> > idea (truncating it from 40 to 10 digits makes the possibility of
> > collisions much much higher).
> 
> The probability of collision is much higher, but it's still
> relatively small. Given the "avalanche effect" that is typical of
> cryptographic hash functions, it's extremely unlikely that collision
> will occur in such a way that it will cause a problem for cache
> validation.
As I understand it, the "avalanche effect" is a property a hash function
needs so that collisions cannot be calculated easily (it plays the role
that diffusion plays in ciphers): a small change to the input should
change as many digits of the hash as possible. But when somebody patches
an eclass, the changes are not necessarily small, so the only thing that
counts is the probability of a collision.
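
To put rough numbers on that probability, a sketch that treats a
truncated digest as a uniformly random value with 4 bits per hex digit
(an idealization, not a property proven for SHA-1). The function names
are hypothetical.

```python
import math

def undetected_change(hex_digits: int) -> float:
    """Chance that one modified file still matches its stored truncated
    digest, assuming digest values are uniformly random."""
    return 2.0 ** (-4 * hex_digits)

def any_collision(n: int, hex_digits: int) -> float:
    """Birthday approximation: chance that at least two of n distinct
    inputs share a truncated digest, 1 - exp(-n(n-1) / 2^(bits+1))."""
    bits = 4 * hex_digits
    # -expm1(-x) computes 1 - exp(-x) accurately even for tiny x
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))
```

For the proposed 10 digits, `undetected_change(10)` is 2^-40, about
9.1e-13 per check, versus 2^-160 for all 40 digits. `any_collision`
models a different question: whether some pair among `n` distinct inputs
shares a digest.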

> 
> > But if you want to go this way, I'd say you should use something like
> > SHA1t (t for truncated) to make sure we can use full hashes once we feel
> > it's appropriate.
> 
> We could, but I think SHA1 would also be fine since one can infer
> from the length of the string that it's been truncated.
No, guessing is a bad idea here, because a digest could also end up
truncated through faulty metadata. But the main motivation is that
anyone reading "SHA1" expects a full SHA-1 hash, which this isn't.
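
A sketch of parsing with explicit tags instead of inferring truncation
from the length, assuming the `SHA1t` tag suggested above and a DIGESTS
layout of a tag followed by space-separated lowercase hex digests.
`DIGEST_TYPES` and `parse_digests` are hypothetical. A digest whose
length does not match its tag is reported as malformed rather than
reinterpreted, so a full SHA-1 damaged down to 10 digits is not mistaken
for a valid truncated one.

```python
# Hypothetical tag table: each tag names an algorithm and the exact number
# of hex digits a digest carrying that tag must have.
DIGEST_TYPES = {
    "SHA1": ("sha1", 40),   # full SHA-1
    "SHA1t": ("sha1", 10),  # explicitly truncated SHA-1
}

# Lowercase hex, as produced by hashlib's hexdigest()
HEX = set("0123456789abcdef")

def parse_digests(value: str) -> tuple[str, list[str]]:
    """Split a DIGESTS-style value into (algorithm, digests), raising
    ValueError when a digest's length does not match its tag."""
    tag, *digests = value.split()
    if tag not in DIGEST_TYPES:
        raise ValueError(f"unknown digest type {tag!r}")
    algorithm, length = DIGEST_TYPES[tag]
    for d in digests:
        if len(d) != length or not set(d) <= HEX:
            raise ValueError(f"malformed {tag} digest {d!r}")
    return algorithm, digests
```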

But if your goal is to reduce the size of the metadata cache, why store
the hashes of the eclasses in each ebuild's metadata rather than in a
separate directory? They have to be the same for every ebuild, don't
they? If the average number of eclasses per ebuild is greater than 4,
you could even store the full hash and still use less space than with
truncated hashes for all eclasses.
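
The factor of 4 comes from the digest lengths: four 10-digit truncated
SHA-1 digests take as many characters as one full 40-digit digest. A
back-of-the-envelope sketch of both layouts, counting hex characters
only; the function names and every count in the usage lines are made-up
placeholders, not measurements of the tree.

```python
def per_ebuild_chars(ebuilds: int, avg_eclasses: float, digits: int = 10) -> float:
    """Eclass digests repeated in every ebuild's cache entry
    (10-digit truncated digests by default)."""
    return ebuilds * avg_eclasses * digits

def separate_dir_chars(eclasses: int, digits: int = 40) -> int:
    """One digest per eclass, stored once outside the ebuild entries
    (full 40-digit SHA-1 by default)."""
    return eclasses * digits

# Break-even for a single ebuild: four truncated digests vs. one full one.
assert per_ebuild_chars(1, 4) == separate_dir_chars(1)  # 40 == 40

# Hypothetical counts, for illustration only.
print(per_ebuild_chars(20000, 3))  # prints 600000
print(separate_dir_chars(300))     # prints 12000
```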

-- 
-------------------------------------------------------
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
  Samba, PostgreSQL, CPP, Python, sysadmin
E-Mail     : dev-zero@gentoo.org
GnuPG FP   : F327 283A E769 2E36 18D5  4DE2 1B05 6A63 AE9C 1E30

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]


Thread overview: 45+ messages
2009-02-02 20:34 [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation Zac Medico
2009-02-06  9:55 ` [gentoo-dev] " Markus Ullmann
2009-02-07  1:13   ` Zac Medico
2009-02-07 22:31 ` [gentoo-dev] " Tiziano Müller
2009-02-07 23:23   ` Zac Medico
2009-02-08  8:07     ` Tiziano Müller
2009-02-08  8:59       ` Zac Medico
2009-02-08 11:51         ` Tiziano Müller [this message]
2009-02-08 20:36           ` Zac Medico
2009-02-08 21:48             ` Tiziano Müller
2009-02-08 22:14               ` Zac Medico
2009-02-08 22:18     ` Ciaran McCreesh
2009-02-08 22:43       ` Zac Medico
2009-02-08 22:47         ` Ciaran McCreesh
2009-02-08 23:03           ` Zac Medico
2009-02-08 23:10             ` Ciaran McCreesh
2009-02-08 23:27               ` Zac Medico
2009-02-08 23:30                 ` Ciaran McCreesh
2009-02-08 23:40                   ` Zac Medico
2009-02-09 12:30                     ` Petteri Räty
2009-02-09 13:59                       ` Ciaran McCreesh
2009-02-09 14:15                         ` Petteri Räty
2009-02-09 14:18                           ` Ciaran McCreesh
2009-02-09 14:21                           ` Rémi Cardona
2009-02-09 20:19                       ` Zac Medico
2009-02-09 18:43         ` Ciaran McCreesh
2009-02-09 20:02           ` Zac Medico
2009-02-09 15:22     ` Tiziano Müller
2009-02-09 19:55       ` Zac Medico
2009-02-10 12:20         ` Brian Harring
2009-02-10 12:52           ` Nirbheek Chauhan
2009-02-10 20:55           ` Zac Medico
2009-02-11  9:00             ` Brian Harring
2009-02-11 10:01               ` Zac Medico
2009-02-14 13:18                 ` Brian Harring
2009-02-14 20:16                   ` Zac Medico
2009-02-15 22:51     ` Zac Medico
2009-02-15 23:15       ` Ciaran McCreesh
2009-02-15 23:26         ` Zac Medico
2009-02-15 23:30           ` Ciaran McCreesh
2009-02-15 23:56             ` Zac Medico
2009-02-16  0:06               ` Ciaran McCreesh
2009-02-16  0:48                 ` Zac Medico
2009-02-16  0:53                   ` Ciaran McCreesh
2009-02-16  0:54                   ` Zac Medico
