Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation

public inbox for gentoo-dev@lists.gentoo.org

From: "Tiziano Müller" <dev-zero@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Sun, 08 Feb 2009 22:48:57 +0100
Message-ID: <1234129737.18160.191.camel@localhost>
In-Reply-To: <498F423A.8040604@gentoo.org>


On Sunday, 08.02.2009, at 12:36 -0800, Zac Medico wrote:
> 
> Tiziano Müller wrote:
> > On Sunday, 08.02.2009, at 00:59 -0800, Zac Medico wrote:
> >>
> >> Tiziano Müller wrote:
> >>> On Saturday, 07.02.2009, at 15:23 -0800, Zac Medico wrote:
> >>>>
> >>>> Tiziano Müller wrote:
> >>>>> On Monday, 02.02.2009, at 12:34 -0800, Zac Medico wrote:
> >>>> I like that idea. That way it's not necessary to bump the EAPI in
> >>>> order to change the hash function. So, a typical DIGESTS value might
> >>>> look like this:
> > You still have to bump the EAPI in case you want to use a new hash not
> > already available now (like SHA-3). The advantage of noting the used
> > hash is that new PMs can handle old metadata cache.
> 
> That's true.
> 
> >>>> SHA1 02021be38b a28b191904 3992945426 6ec21b29a3
> >>> Having slept on it, I don't think that truncating a hash is a good
> >>> idea (truncating it from 40 to 10 digits makes collisions much
> >>> more likely).
> >> The probability of collision is much higher, but it's still
> >> relatively small. Given the "avalanche effect" that is typical of
> >> cryptographic hash functions, it's extremely unlikely that collision
> >> will occur in such a way that it will cause a problem for cache
> >> validation.
> > The "avalanche effect" as I understood it is required for a hash
> > function to avoid simple calculations of collisions (what the diffusion
> > is for crypto algorithms). So, small changes should affect as many
> > numbers in the hash as possible. But you don't have only small changes
> > here in case somebody patches an eclass, so, the only thing which counts
> > is the probability of a collision.
> 
> Well, the avalanche effect helps in the sense that the leftmost 10
> digits would serve approximately as well as any other 10 digits out
> of all of them. But you're right about the probability of a
> collision being what really matters. With 10 hex digits, we've got a
> space of 16^10 = 1.1e12 possible combinations. Given a space that
> large, the probability of a collision is pretty small.
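
To put rough numbers on the figures quoted above, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the function
names are hypothetical, the digest is assumed to behave like a uniformly
random value, and the pairwise figure uses the standard birthday-bound
approximation rather than anything proposed in this thread.

```python
import math


def changed_file_miss_probability(hex_digits: int) -> float:
    """Chance that a changed file yields the same truncated digest as before.

    Assumes the truncated digest is uniformly distributed over
    16**hex_digits values (an idealisation of a cryptographic hash).
    """
    return 1 / 16 ** hex_digits


def any_pair_collision_probability(num_files: int, hex_digits: int) -> float:
    """Birthday-bound approximation for any two of num_files digests colliding.

    Uses p ~= 1 - exp(-n * (n - 1) / (2 * N)) with N = 16**hex_digits.
    """
    space = 16 ** hex_digits
    exponent = num_files * (num_files - 1) / (2 * space)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print("digest space for 10 hex digits:", 16 ** 10)
    print("miss probability for one changed file:",
          changed_file_miss_probability(10))
    # 1000 is an arbitrary illustrative count, not a measured figure.
    print("any-pair collision among 1000 files:",
          any_pair_collision_probability(1000, 10))
```

For cache validation the first figure is the relevant one: an entry is only
wrongly kept if the new contents of one specific file happen to produce the
same truncated digest as the old contents.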
> 
> >>> But if you want to go this way, I'd say you should use something like
> >>> SHA1t (t for truncated) to make sure we can use full hashes once we feel
> >>> it's appropriate.
> >> We could, but I think SHA1 would also be fine since one can infer
> >> from the length of the string that it's been truncated.
> > No, guessing is a bad thing here because it could be truncated because
> > of faulty metadata. But the main motivation is that if you write SHA1
> > everyone reading it expects it to be a full SHA1 hash, which it isn't.
> 
> Well, if the metadata is faulty then the digests are unlikely to
> match and the cache will be discarded anyway as invalid. However, I
> think your point is still somewhat valid, so SHA1t is fine with me
> if that makes more people happy. Does anyone else have a preference
> here?
> 
> > But if your target is to reduce the size of the metadata cache, why
> > store the hashes of the eclasses in the ebuild's metadata and not in a
> > separate dir? They have to be the same for every ebuild, don't they?
> > In case you have an average number of eclasses which is bigger than 4,
> > you can even store the full hash with less space used than with
> > truncated hashes for all eclasses.
> 
> The problem with having eclass integrity data shared in a separate
> file is that it creates a requirement for all cache entries which
> reference the same eclasses to be consistent with one another. This
> means that a single cache entry can no longer be updated atomically.
> For example, before updating the shared eclass integrity data, you'd
> want to make sure that you first discard all of the cache entries
> which reference it. Although it can be done this way, I think it's
> much more convenient to have all of the integrity data encapsulated
> within each individual cache entry.
OK, let me see if I understand this: since parts of a metadata entry
(such as the DEPEND/RDEPEND variables) depend on the contents of the
eclasses as they were when the cache entry was generated, you want to
store each eclass's hash in the ebuild's entry so that the entry gets
invalidated once an eclass changes. Is that correct?
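
In other words, something along the lines of the following Python sketch.
It is a hedged illustration of the idea only, not the actual cache format
or any package manager's code: the entry layout, the helper names and the
10-digit truncation are all assumptions made for the example.

```python
import hashlib

# Assumed truncation length, matching the 10-digit example in this thread.
DIGEST_LEN = 10


def digest(data: bytes, length: int = DIGEST_LEN) -> str:
    """Truncated SHA-1 hex digest of data (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()[:length]


def build_entry(metadata: dict, ebuild: bytes, eclasses: dict) -> dict:
    """Build a self-contained cache entry.

    eclasses maps each inherited eclass name to its current contents.
    All integrity data lives inside the entry itself, so the entry can
    be written or replaced without touching any shared file.
    """
    return {
        "metadata": metadata,
        "ebuild_digest": digest(ebuild),
        "eclass_digests": {name: digest(src) for name, src in eclasses.items()},
    }


def entry_is_valid(entry: dict, ebuild: bytes, eclasses: dict) -> bool:
    """Return True only if the ebuild and every recorded eclass are unchanged."""
    if entry["ebuild_digest"] != digest(ebuild):
        return False
    for name, recorded in entry["eclass_digests"].items():
        current = eclasses.get(name)
        # A missing or modified eclass invalidates the entry.
        if current is None or digest(current) != recorded:
            return False
    return True
```

With this layout, patching one eclass invalidates exactly those entries that
recorded its digest, and each entry can be regenerated independently of the
others.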


-- 
-------------------------------------------------------
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
  Samba, PostgreSQL, CPP, Python, sysadmin
E-Mail     : dev-zero@gentoo.org
GnuPG FP   : F327 283A E769 2E36 18D5  4DE2 1B05 6A63 AE9C 1E30


