From: Brian Harring <ferringb@gmail.com>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] DIGESTS metadata variable for cache validation
Date: Wed, 11 Feb 2009 01:00:41 -0800
Message-ID: <20090211090041.GA3680@hrair.corp.631h.metaweb.com>
In-Reply-To: <4991E9D7.6080706@gentoo.org>
On Tue, Feb 10, 2009 at 12:55:51PM -0800, Zac Medico wrote:
> Brian Harring wrote:
> > On Mon, Feb 09, 2009 at 11:55:41AM -0800, Zac Medico wrote:
> >> All that I can say right now is that I recall questions about it in
> >> the past from overlay maintainers (I don't have a list) and the
> >> funtoo project is the only one which I can name offhand.
> >>
> >> However, the ability to distribute cache via a vcs is only an
> >> ancillary feature which is made possible by the DIGESTS data. The
> >> DIGESTS data is useful regardless of the protocol that is used to
> >> distribute the cache, since it allows the cache to be properly
> >> validated for integrity. So, the real primary reason for introducing
> >> the DIGESTS data is to provide a proper solution for cases like bug
> >> #139134 [1] in which invalid metadata cache goes undetected.
> >
> > I'm sorry, but this proposal smells something awful. Because of the
> > mtime requirement on cache entries you're proposing jamming another
> > 1.4MB into the cache for validation purposes (which should be 4x that
> > since a full checksum really should be in there) while trying to
> > maintain compatibility.
>
> As I've said before [1], 10 hex digits gives 1.1e12 possible
> combinations and that's probably sufficient for the given application.
And as I said before, I don't agree with you on it (repeating it over
and over isn't going to convince the other side either).
The 1.4MB is more of a concern than arguments over avalanche, I might
add.
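The figures in this exchange reduce to simple arithmetic. A minimal sketch, assuming SHA-1 as the underlying digest (this message doesn't name the algorithm) and hypothetical helper names: a 10-hex-digit truncation has 16^10 possible values, and a full 40-hex-digit SHA-1 is 4x larger per entry, which is where "4x that" (roughly 5.6MB instead of 1.4MB) comes from.

```python
import hashlib

# Number of distinct values a 10-hex-digit truncated digest can take.
TRUNCATED_HEX_DIGITS = 10
space = 16 ** TRUNCATED_HEX_DIGITS  # 1_099_511_627_776, i.e. ~1.1e12

# Assumption: SHA-1. Its digest is 20 bytes = 40 hex digits, so storing the
# full hexdigest instead of a 10-digit truncation costs 4x the bytes.
FULL_SHA1_HEX_DIGITS = hashlib.sha1().digest_size * 2
size_ratio = FULL_SHA1_HEX_DIGITS // TRUNCATED_HEX_DIGITS


def truncated_digest(data: bytes, digits: int = TRUNCATED_HEX_DIGITS) -> str:
    """Hypothetical helper: first `digits` hex chars of a SHA-1 digest."""
    return hashlib.sha1(data).hexdigest()[:digits]


def entry_matches(recorded: str, current_content: bytes) -> bool:
    """Compare a recorded (truncated) digest against current file content."""
    return truncated_digest(current_content, len(recorded)) == recorded


if __name__ == "__main__":
    print(f"{space:.3e} possible values")
    print(f"full SHA-1 is {size_ratio}x larger: ~{1.4 * size_ratio:.1f}MB vs 1.4MB")
```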
> > Frankly, forget compatibility- the current format could stand to die.
> > The repository format is an ever growing mess- leave it as is and
> > work on cutting over to something sane.
>
> Changing the repository layout is a pretty radical thing to do.
> You're welcome to start a new subject for that if you'd like but I'd
> prefer to keep the scope of this thread focussed on the cache format
> for the existing repository layout.
Vacuous argument: you focus on the 'layout' part rather than the
repository as a whole, which is what I implied. You're stating that one
should not discuss changing the repository standard/spec while arguing
that repealing the requirement that cache entry mtimes match ebuild
mtimes (part of the repository spec) should be the point of discussion.
The daft thing about this is that w/ effectively atomic sync (if the
sync fails then mark the repo as screwed up till a sync completes),
the current cache format can *still* do validation- no clue if
paludis has it, but at least pkgcore and portage can handle this via
awareness of the eclass stacking.
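A minimal sketch of that kind of validation, assuming a hypothetical in-memory view of a cache entry (the ebuild mtime plus the mtime of every eclass in its inherit chain) and a hypothetical `.sync-in-progress` marker that is created before a sync and removed only once it completes. This is not the real flat_list layout or any PM's actual code; it only shows that mtime checks plus eclass-stacking awareness plus "distrust the repo until a sync completes" are enough.

```python
import os
from dataclasses import dataclass, field

# Hypothetical marker: written before a sync starts, removed only after the
# sync finishes. Its presence means the tree may be half-updated.
SYNC_MARKER = ".sync-in-progress"


@dataclass
class CacheEntry:
    # Hypothetical view of a cache entry: the ebuild mtime it was generated
    # against, plus the mtime of each eclass in the inherit chain.
    ebuild_mtime: int
    eclass_mtimes: dict = field(default_factory=dict)


def repo_is_consistent(repo_root: str) -> bool:
    return not os.path.exists(os.path.join(repo_root, SYNC_MARKER))


def entry_is_valid(repo_root: str, ebuild_path: str,
                   entry: CacheEntry, eclass_dir: str) -> bool:
    if not repo_is_consistent(repo_root):
        return False  # treat the whole repo as untrusted until a sync completes
    if int(os.stat(ebuild_path).st_mtime) != entry.ebuild_mtime:
        return False
    # Eclass stacking: every inherited eclass must be unchanged as well.
    for name, mtime in entry.eclass_mtimes.items():
        path = os.path.join(eclass_dir, name + ".eclass")
        try:
            if int(os.stat(path).st_mtime) != mtime:
                return False
        except FileNotFoundError:
            return False
    return True
```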
So for git VCSes bundling metadata (storing generated content in the
mainline VCS is a bad idea anyway), your proposal allows them to use a
cache. Every other distribution mechanism already works fine, yet it
winds up paying the cost for that corner case. The 80 paying for the
20 isn't the normal form of the 80/20 rule ;)
Note that proper PM implementations *still* have to set the cache
entries' mtimes for backwards compatibility w/ older PMs that don't
support this new unversioned change, muddying the implementation
even further.
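To make that concrete, a minimal sketch (hypothetical function name; the line-per-key body is illustrative, not the real flat_list layout) of what a writer still has to do for older PMs even if entries also carry digests: copy the ebuild's mtime onto the cache entry.

```python
import os


def write_cache_entry(cache_path: str, ebuild_path: str, lines: list[str]) -> None:
    """Write a cache entry and stamp it with the ebuild's mtime.

    Older package managers accept an entry only if its mtime equals the
    ebuild's mtime, so this step stays mandatory for compatibility.
    """
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    tmp = cache_path + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(lines) + "\n")
    st = os.stat(ebuild_path)
    # Copy atime/mtime with nanosecond precision onto the new entry.
    os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
    os.replace(tmp, cache_path)  # rename keeps the mtime we just set
```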
I reiterate, this belongs in a separate repository format, along w/
the rest of the unversioned repository changes you've been pushing
(profile package.mask breaking all non-portage PMs is a perfect
example).
> > Overlay maintainers who want the latest/greatest obviously can convert
> > over also; one would hope there would be enough cleanup to make it
> > worth their time.
> >
> > As for the nasty gentoo-x86 compatibility, basically, do the
> > following:
> >
> > 1) maintain the existing cvs repo as is
> > 2) iron out what cleanup/restructuring is desired. glep55 being
> > jammed in here is a potential for example. Nail down the new repo
> > format basically (with an eye for translating the cvs repo to it on
> > the fly).
> > 3) use an eclass index holding the checksums, w/ the cache entries
> > referencing the index numbers instead (sorting the index by
> > consumption, meaning the more ebuilds using it the lower the index):
> > this brings the cache addition down to around 285KB (acceptable imo)
> > while giving full flexibility in the checksums available for eclasses.
> > This is assuming the current flat_list format is still in use in the
> > new repo...
>
> As previously discussed [2], having shared integrity data (as you
> suggest) has implications in terms of reduced simplicity and robustness.
The complexity argument is a white elephant. Rsync is the sole
transport that has atomicity issues; the rest don't (when you check
out from vcs, you get an exact rev effectively). Rsync generation
ought to be preparing the new snapshot then swapping it in, and if I
recall correctly that's exactly what osprey does now (or whatever node
y'all are using for generating gentoo-x86 these days).
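The "prepare the new snapshot, then swap it in" step can be sketched as follows. This is a minimal illustration, not a description of what osprey actually runs: it assumes the published tree is reached through a hypothetical `current` symlink, and `build` is a hypothetical callable that fully populates a directory.

```python
import os
import tempfile


def publish_snapshot(base_dir: str, build) -> str:
    """Build a complete snapshot off to the side, then swap it in atomically.

    Readers go through base_dir/current, so they see either the old snapshot
    or the new one, never a half-written tree.
    """
    new_dir = tempfile.mkdtemp(prefix="snapshot-", dir=base_dir)
    build(new_dir)  # ebuilds + cache are complete before anything is exposed
    link = os.path.join(base_dir, "current")
    tmp_link = link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted publish
    os.symlink(new_dir, tmp_link)
    os.replace(tmp_link, link)  # rename(2) over the old symlink is atomic on POSIX
    return new_dir
```

Old snapshot directories are left in place here; pruning them once no reader can still be using them is a separate step.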
The point there is that there are specific steps taken preparing the
repo- those steps already ensure the snapshot/rev is complete prior to
being available, so there isn't real potential of catching it
mid-update. Via that existing machinery, a shared index is *no issue*;
the one spot it rears its head is during a failed/partial sync (the
repo should not be used in such a state, since stale cache is the
least concern at that point).
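For reference, the shared eclass index from step 3 above can be sketched like this. It is a minimal illustration under stated assumptions: SHA-1 as the checksum, plain dicts standing in for the repository, and a hypothetical function name. Eclasses are ordered by how many ebuilds inherit them (ties broken by name), so the most common ones get the smallest index numbers, and each cache entry stores only those integers.

```python
import hashlib
from collections import Counter


def build_eclass_index(eclass_sources: dict, ebuild_inherits: dict):
    """Hypothetical shared eclass index.

    eclass_sources: {eclass_name: bytes of the eclass file}
    ebuild_inherits: {cpv: [eclass names inherited, in order]}
    Returns (index, refs): index is a list of (name, checksum) sorted by
    consumption; refs maps each cpv to its list of index numbers.
    """
    # Count each eclass once per ebuild that inherits it.
    usage = Counter(name for chain in ebuild_inherits.values() for name in set(chain))
    ordered = sorted(eclass_sources, key=lambda n: (-usage[n], n))
    index = [(name, hashlib.sha1(eclass_sources[name]).hexdigest()) for name in ordered]
    position = {name: i for i, (name, _) in enumerate(index)}
    # Cache entries carry small integers instead of full checksums.
    refs = {cpv: [position[n] for n in chain] for cpv, chain in ebuild_inherits.items()}
    return index, refs
```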
From where I'm sitting, the changes you've either slipped into the
repository format or want to slip in completely ignore the past
history of breakage from doing such things, and ignore why EAPI
exists these days.
The reason a new format is realistically needed here is that existing
PMS compliant implementations will wind up accessing the repo and
behaving as if everything is fine and dandy without knowing the rules
used to read no longer match the rules used to generate the repo.
This is a no-go, for the same reason EAPI awareness had to sit for a
long ass time to ensure EAPI=1 would be properly masked by EAPI-aware PMs.
Either way, for changes like this (or package.mask as a directory, since
I'm still annoyed by that), the repo needs to be marked in some way, a
versioned format specifically, so that when stuff like this is added
the PM can handle it gracefully instead of doing the wrong thing.
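What "handle it gracefully" could look like on the PM side, as a minimal sketch: the marker file name `metadata/repo-format`, its one-line contents, and the set of supported versions are all hypothetical, since the message only argues that the repo must be marked. An unmarked repo is read as the legacy format; anything unknown is refused rather than misread.

```python
import os

# Hypothetical: formats this package manager knows how to read.
SUPPORTED_FORMATS = {"0", "1"}
# Hypothetical marker file; nothing here names an actual file.
FORMAT_FILE = os.path.join("metadata", "repo-format")


class UnsupportedRepoFormat(Exception):
    pass


def check_repo_format(repo_root: str) -> str:
    path = os.path.join(repo_root, FORMAT_FILE)
    try:
        with open(path) as f:
            version = f.read().strip()
    except FileNotFoundError:
        version = "0"  # unmarked repos are treated as the legacy format
    if version not in SUPPORTED_FORMATS:
        # Refuse rather than guess: the rules used to generate this repo
        # may not match the rules this PM would use to read it.
        raise UnsupportedRepoFormat(f"{repo_root}: format {version!r} not supported")
    return version
```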
~brian