From: Tiziano Müller
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Sun, 08 Feb 2009 22:48:57 +0100
Message-Id: <1234129737.18160.191.camel@localhost>
In-Reply-To: <498F423A.8040604@gentoo.org>
References: <498758E6.5080609@gentoo.org> <1234045916.24784.1373.camel@localhost> <498E17E6.8060407@gentoo.org> <1234080464.24784.2517.camel@localhost> <498E9EFE.2030807@gentoo.org> <1234093879.24784.2819.camel@localhost> <498F423A.8040604@gentoo.org>
Organization: Gentoo

On Sunday, 08.02.2009 at 12:36 -0800, Zac Medico wrote:
> Tiziano Müller wrote:
> > On Sunday, 08.02.2009 at 00:59 -0800, Zac Medico wrote:
> >> Tiziano Müller wrote:
> >>> On Saturday, 07.02.2009 at 15:23 -0800, Zac Medico wrote:
> >>>> Tiziano Müller wrote:
> >>>>> On Monday, 02.02.2009 at 12:34 -0800, Zac Medico wrote:
> >>>> I like that idea. That way it's not necessary to bump the EAPI in
> >>>> order to change the hash function. So, a typical DIGESTS value might
> >>>> look like this:
> > You still have to bump the EAPI in case you want to use a new hash not
> > already available now (like SHA-3). The advantage of noting the used
> > hash is that new PMs can handle old metadata cache.
>
> That's true.
>
> >>>> SHA1 02021be38b a28b191904 3992945426 6ec21b29a3
> >>> Having slept on it, I don't think that truncating a hash is a good
> >>> idea (truncating it from 40 to 10 digits makes the possibility of
> >>> collisions much, much higher).
> >> The probability of collision is much higher, but it's still
> >> relatively small. Given the "avalanche effect" that is typical of
> >> cryptographic hash functions, it's extremely unlikely that a collision
> >> will occur in such a way that it will cause a problem for cache
> >> validation.
> > The "avalanche effect" as I understood it is required for a hash
> > function to avoid simple calculations of collisions (what the diffusion
> > is for crypto algorithms). So, small changes should affect as many
> > digits in the hash as possible. But you don't have only small changes
> > here in case somebody patches an eclass, so the only thing which counts
> > is the probability of a collision.
>
> Well, the avalanche effect helps in the sense that the leftmost 10
> digits would serve approximately as well as any other 10 digits out
> of all of them. But you're right about the probability of a
> collision being what really matters. With 10 hex digits, we've got a
> space of 16^10 = 1.1e12 possible combinations. Given a space that
> large, the probability of a collision is pretty small.
>
> >>> But if you want to go this way, I'd say you should use something like
> >>> SHA1t (t for truncated) to make sure we can use full hashes once we feel
> >>> it's appropriate.
> >> We could, but I think SHA1 would also be fine since one can infer
> >> from the length of the string that it's been truncated.
> > No, guessing is a bad thing here because it could be truncated because
> > of faulty metadata. But the main motivation is that if you write SHA1,
> > everyone reading it expects it to be a full SHA1 hash, which it isn't.
>
> Well, if the metadata is faulty then the digests are unlikely to
> match and the cache will be discarded anyway as invalid. However, I
> think your point is still somewhat valid, so SHA1t is fine with me
> if that makes more people happy. Does anyone else have a preference
> here?
>
> > But if your target is to reduce the size of the metadata cache, why
> > store the hashes of the eclasses in the ebuild's metadata and not in a
> > separate dir? They have to be the same for every ebuild, don't they?
> > In case you have an average number of eclasses which is bigger than 4,
> > you can even store the full hash with less space used than with
> > truncated hashes for all eclasses.
>
> The problem with having eclass integrity data shared in a separate
> file is that it creates a requirement for all cache entries which
> reference the same eclasses to be consistent with one another. This
> means that a single cache entry can no longer be updated atomically.
> For example, before updating the shared eclass integrity data, you'd
> want to make sure that you first discard all of the cache entries
> which reference it. Although it can be done this way, I think it's
> much more convenient to have all of the integrity data encapsulated
> within each individual cache entry.

Ok, let me see if I get this:
Since parts of the content of a metadata entry (like the DEPEND/RDEPEND
vars) depend on the contents of the eclass as it was at the time the
cache entry was generated, you want to store the eclass's hash in the
ebuild entry to make sure the entry gets invalidated once the eclass
changes.
Is that correct?

-- 
-------------------------------------------------------
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
  Samba, PostgreSQL, CPP, Python, sysadmin
E-Mail   : dev-zero@gentoo.org
GnuPG FP : F327 283A E769 2E36  18D5 4DE2 1B05 6A63 AE9C 1E30
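
To make the scheme under discussion concrete, here is a rough sketch of per-entry validation with truncated digests. It is only an illustration, not what Portage does: the SHA1t tag and the 10-digit length come from this thread, while the function names, the in-memory byte strings standing in for the ebuild and eclass files, and the choice to treat an unknown tag as stale are assumptions made for the example.

```python
import hashlib
from typing import Sequence

DIGEST_LEN = 10   # hex digits kept per digest, as in the example DIGESTS value
TAG = "SHA1t"     # "t" for truncated, as proposed in the thread


def truncated_sha1(data: bytes, length: int = DIGEST_LEN) -> str:
    """Return the leftmost `length` hex digits of the SHA-1 of `data`."""
    return hashlib.sha1(data).hexdigest()[:length]


def make_digests(sources: Sequence[bytes]) -> str:
    """Build a DIGESTS-style string: algorithm tag, then one digest per source."""
    return " ".join([TAG] + [truncated_sha1(s) for s in sources])


def entry_is_valid(digests: str, sources: Sequence[bytes]) -> bool:
    """Compare the digests stored in a cache entry with the current sources."""
    parts = digests.split()
    if not parts or parts[0] != TAG:
        return False  # assumption: unknown or missing tag means "regenerate"
    return parts[1:] == [truncated_sha1(s) for s in sources]


# Hypothetical contents of an ebuild and of one eclass it inherits.
ebuild = b'inherit foo\nDEPEND="dev-lang/python"\n'
eclass = b'RDEPEND="app-misc/bar"\n'

# Stored inside the cache entry when the entry is generated.
digests = make_digests([ebuild, eclass])

print(digests)
print(entry_is_valid(digests, [ebuild, eclass]))         # sources unchanged
print(entry_is_valid(digests, [ebuild, eclass + b"#"]))  # eclass was patched

# Size of the space for one 10-digit digest (the 16^10 figure quoted above).
space = 16 ** DIGEST_LEN
print(space)
```

Because every entry carries its own digests, each entry can be checked and rewritten on its own, without consulting or locking any shared eclass data.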