Date: Tue, 10 Feb 2009 04:20:46 -0800
From: Brian Harring
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Message-ID: <20090210122046.GD4076@hrair>
References: <498758E6.5080609@gentoo.org> <1234045916.24784.1373.camel@localhost> <498E17E6.8060407@gentoo.org> <1234192940.18160.1011.camel@localhost> <49908A3D.4050403@gentoo.org>
In-Reply-To: <49908A3D.4050403@gentoo.org>

On Mon, Feb 09, 2009 at 11:55:41AM -0800, Zac Medico wrote:
> All that I can say right now is that I recall questions about it in
> the past from overlay maintainers (I don't have a list) and the
> funtoo project is the only one which I can name offhand.
>
> However, the ability to distribute cache via a vcs is only an
> ancillary feature which is made possible by the DIGESTS data. The
> DIGESTS data is useful regardless of the protocol that is used to
> distribute the cache, since it allows the cache to be properly
> validated for integrity. So, the real primary reason for introducing
> the DIGESTS data is to provide a proper solution for cases like bug
> #139134 [1] in which invalid metadata cache goes undetected.

I'm sorry, but this proposal smells something awful.
Because of the mtime requirement on cache entries, you're proposing
jamming another 1.4MB into the cache for validation purposes (which
should be 4x that, since a full checksum really should be in there)
while trying to maintain compatibility.

Frankly, forget compatibility; the current format could stand to die.
The repository format is an ever-growing mess. Leave it as is and
work on cutting over to something sane.

Overlay maintainers who want the latest/greatest obviously can convert
over also; one would hope there would be enough cleanup to make it
worth their time.

As for the nasty gentoo-x86 compatibility, basically, do the
following:

1) maintain the existing cvs repo as is
2) iron out what cleanup/restructuring is desired. Jamming glep55 in
   here is one potential example. Basically, nail down the new repo
   format (with an eye for translating the cvs repo to it on the fly).
3) use an eclass index holding the checksums, w/ the cache entries
   referencing index numbers rather than repeating the checksums
   (sort the index by consumption, meaning the more ebuilds use an
   eclass, the lower its index): this brings the cache addition down
   to around 285KB (acceptable imo) while giving full flexibility in
   the checksums available for eclasses. This is assuming the current
   flat_list format is still in use in the new repo...
4) drop the mtime requirement on cache entries and just bump the mtime
   forward whenever an entry is updated (bug 139134 goes away),
   jamming in an ebuild checksum of some sort.
5) rsync nodes are required to have 10GB of storage available, so
   storage shouldn't be an issue, but all nodes must be updated to
   sync both the old and *new* format.
6) suffer through cvs for a year (or whatever time frame), converting
   folks over to the new url.
7) kill the old format after whatever period is deemed best
   (potentially leaving a README telling folks how to update if
   they're seriously behind).
8) convert the cvs repo to the new format, tear down the
   transformation bits.

Yes, the plan above is coarse, but there aren't any glaring holes as
far as I can see. It does place restrictions on the repo format
chosen, but careful choices in the new format (heavy format
versioning) should make this sort of issue less of a pain down the
line.

At the very least, doing a different repo format for repos/overlays
stored in a vcs that doesn't track mtime would solve their issues. It
also has the nice benefit of not making the repo more bloated for the
99% of folk who didn't even hit the issues spawning this.

If gentoo-x86 is left as is, bug 139134 can be headed off w/out
jamming a new metadata key in; to be clear, I'm likely going to
"Special Hell" for suggesting this, but if mtime/size on the new cache
entry is the same as on the old one, append a space to the value in
the description field.

All sane managers ought to be doing basic cleanup of that value
anyways in their data layer (let alone at the UI level), but it's
enough to make rsync behave.

So... flame away.
~brian
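
For what it's worth, the eclass index from step 3 could look roughly
like the sketch below. This is only an illustration under assumptions:
the function name build_eclass_index, the in-memory dict inputs, and
the use of SHA-1 are all hypothetical placeholders, not anything
portage or the proposal actually specifies.

```python
import hashlib
from collections import Counter


def build_eclass_index(eclass_data, ebuild_eclasses):
    """Hypothetical sketch of an eclass checksum index.

    eclass_data:     eclass name -> eclass file contents (bytes)
    ebuild_eclasses: ebuild name -> list of eclass names it inherits
                     (every name must be a key of eclass_data)

    Returns (index, refs): index is a list of (name, checksum) pairs,
    refs maps each ebuild to the index numbers of its eclasses.
    """
    # Count how many ebuilds consume each eclass.
    usage = Counter(
        name for used in ebuild_eclasses.values() for name in set(used)
    )
    # Most-consumed eclasses get the lowest index numbers; ties are
    # broken by name so the ordering is deterministic.
    ordered = sorted(eclass_data, key=lambda name: (-usage[name], name))
    # SHA-1 is an assumption; any full checksum would slot in here.
    index = [
        (name, hashlib.sha1(eclass_data[name]).hexdigest())
        for name in ordered
    ]
    position = {name: i for i, name in enumerate(ordered)}
    # A cache entry stores small integers instead of full checksums.
    refs = {
        ebuild: sorted(position[name] for name in set(used))
        for ebuild, used in ebuild_eclasses.items()
    }
    return index, refs
```

The point of sorting by consumption is that the eclasses referenced
most often end up with the shortest numbers, so the per-entry overhead
stays small while each checksum is stored only once in the index.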