Message-ID: <4991E9D7.6080706@gentoo.org>
Date: Tue, 10 Feb 2009 12:55:51 -0800
From: Zac Medico
Reply-to: gentoo-dev@lists.gentoo.org
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
References: <498758E6.5080609@gentoo.org> <1234045916.24784.1373.camel@localhost> <498E17E6.8060407@gentoo.org> <1234192940.18160.1011.camel@localhost> <49908A3D.4050403@gentoo.org> <20090210122046.GD4076@hrair>
In-Reply-To: <20090210122046.GD4076@hrair>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brian Harring wrote:
> On Mon, Feb 09, 2009 at 11:55:41AM -0800, Zac Medico wrote:
>> All that I can say right now is that I recall questions about it in
>> the past from overlay maintainers (I don't have a list) and the
>> funtoo project is the only one which I can name offhand.
>>
>> However, the ability to distribute cache via a vcs is only an
>> ancillary feature which is made possible by the DIGESTS data. The
>> DIGESTS data is useful regardless of the protocol that is used to
>> distribute the cache, since it allows the cache to be properly
>> validated for integrity. So, the real primary reason for introducing
>> the DIGESTS data is to provide a proper solution for cases like bug
>> #139134 [1] in which invalid metadata cache goes undetected.
>
> I'm sorry, but this proposal smells something awful. Because of the
> mtime requirement on cache entries you're proposing jamming another
> 1.4MB into the cache for validation purposes (which should be 4x that
> since a full checksum really should be in there) while trying to
> maintain compatibility.

As I've said before [1], 10 hex digits give 1.1e12 possible
combinations and that's probably sufficient for the given application.

> Frankly, forget compatibility- the current format could stand to die.
> The repository format is an ever growing mess- leave it as is and
> work on cutting over to something sane.

Changing the repository layout is a pretty radical thing to do.
You're welcome to start a new subject for that if you'd like, but I'd
prefer to keep the scope of this thread focussed on the cache format
for the existing repository layout.

> Overlay maintainers who want the latest/greatest obviously can convert
> over also; one would hope there would be enough cleanup to make it
> worth their time.
>
> As for the nasty gentoo-x86 compatibility, basically, do the
> following:
>
> 1) maintain the existing cvs repo as is
> 2) iron out what cleanup/restructuring is desired. glep55 being
> jammed in here is a potential for example. Nail down the new repo
> format basically (with an eye for translating the cvs repo to it on
> the fly).
> 3) use an eclass index holding the checksums, w/ the cache entries
> referencing the index numbers rather (sorting the index by
> consumption, meaning the more ebuilds using it the lower the index):
> this brings the cache addition down to around 285KB (acceptable imo)
> while giving full flexibility in the checksums available for eclasses.
> This is assuming the current flat_list format is still in use in the
> new repo...

As previously discussed [2], having shared integrity data (as you
suggest) has implications in terms of reduced simplicity and
robustness. My intention is for the cache format to be both simple
and robust. It may require some extra space in order to achieve
these goals, but I think it's well worth it.

When accessing a given cache entry, it's very important that the
package manager be able to reliably validate its integrity (given
that the package manager has no control over the implementation
details of the cache generation infrastructure), and I believe that
the proposed DIGESTS data will solve this problem in a simple and
robust manner.

[1] http://archives.gentoo.org/gentoo-dev/msg_d92eddd796dcc7b9272cc8b8a5a9ca18.xml
[2] http://archives.gentoo.org/gentoo-dev/msg_94a65c9f395706a112ec903b611aad0e.xml
- --
Thanks,
Zac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAkmR6dYACgkQ/ejvha5XGaNlkwCeLA+roi+zg392R4HsWIuXIGrK
nw4AoNztwEEioDDqPkVTv3pFKRrYUXKv
=TRW8
-----END PGP SIGNATURE-----
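
The 1.1e12 figure in the message follows from 16^10. The Python sketch below checks that arithmetic and illustrates, under stated assumptions, how a cache entry might be compared against a 10-hex-digit digest prefix. The message does not specify the hash function or the DIGESTS field layout, so the use of SHA-1 and the names `digest_space`, `truncated_digest`, and `entry_is_valid` are hypothetical and chosen only for illustration.

```python
import hashlib

HEX_DIGITS = 10  # truncation length mentioned in the message


def digest_space(hex_digits: int = HEX_DIGITS) -> int:
    """Number of distinct values a digest truncated to hex_digits can take."""
    return 16 ** hex_digits


def truncated_digest(data: bytes, hex_digits: int = HEX_DIGITS) -> str:
    """First hex_digits characters of a hex digest of data.

    Assumption: SHA-1 is used here only as an example hash; the message
    does not say which algorithm DIGESTS would record.
    """
    return hashlib.sha1(data).hexdigest()[:hex_digits]


def entry_is_valid(source: bytes, recorded: str) -> bool:
    """Compare a recorded truncated digest against the current source bytes."""
    return truncated_digest(source, len(recorded)) == recorded


if __name__ == "__main__":
    print(digest_space())  # 16**10, roughly 1.1e12
    recorded = truncated_digest(b"example ebuild contents")
    print(entry_is_valid(b"example ebuild contents", recorded))
    print(entry_is_valid(b"modified ebuild contents", recorded))
```

A stale cache entry would be detected whenever the recorded prefix no longer matches the digest of the current ebuild or eclass contents; the chance of an accidental match between two unrelated inputs is about 1 in 1.1e12 under the usual assumption that the hash output is uniformly distributed.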