From: Denis Dupeyron <calchan@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Cc: ferringb@gmail.com
Subject: Re: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)
Date: Mon, 11 Jan 2010 15:35:51 -0700
Message-ID: <7c612fc61001111435s5caaf80dx4629d81447ab52e8@mail.gmail.com>
In-Reply-To: <20091026015005.GA12250@hrair.hsd1.ca.comcast.net>
Brian,
On Sun, Oct 25, 2009 at 6:50 PM, Brian Harring <ferringb@gmail.com> wrote:
> The proposal is pretty simple; if code modifies the vdb in any
> fashion, it needs to update the mtime on a file named
> '.modification_time' in the root of the vdb.
>
> For example-
>
> 1) ${PACKAGE_MANAGER} fires up, builds a pkg. It's now ready to
> install it.
> 2) this step isn't strictly required, but is a zero cost safety
> measure- prior to modifying the vdb, it updates the timestamp. The
> reason for doing this is to protect against the manager blowing up in
> some fashion and not updating the timestamp- there still is a window
> if the manager breaks down during merging but it's far reduced.
> 3) manager does its thing to the livefs, and to the vdb.
> 4) once finished, again, updates the timestamp.
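
For illustration only, steps 2 to 4 above could be sketched in Python
roughly as follows. The helper names (touch_vdb_timestamp, modify_vdb)
and the mutate callback are hypothetical and do not come from any
package manager's code; only the '.modification_time' file name is
taken from the proposal. Updating the timestamp in a finally block,
so that it also happens when the merge fails, is an assumption of this
sketch rather than part of the proposal.

```python
from pathlib import Path

# File name from the proposal; everything else here is hypothetical.
STAMP_NAME = ".modification_time"


def touch_vdb_timestamp(vdb_root):
    """Set the mtime of <vdb_root>/.modification_time to now.

    The file is created if it does not exist yet. Returns the new mtime.
    """
    stamp = Path(vdb_root) / STAMP_NAME
    stamp.touch(exist_ok=True)  # bumps mtime when the file already exists
    return stamp.stat().st_mtime


def modify_vdb(vdb_root, mutate):
    """Run mutate(vdb_root), touching the timestamp before and after."""
    touch_vdb_timestamp(vdb_root)      # step 2: safety touch up front
    try:
        mutate(vdb_root)               # step 3: change livefs and vdb
    finally:
        touch_vdb_timestamp(vdb_root)  # step 4: touch again when done
```

A manager would wrap every write to the vdb in something like
modify_vdb("/var/db/pkg", do_merge), where do_merge stands in for its
own merge routine.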
>
> This isn't an incredibly complex change. What it enables however is
> package managers to get serious about optimizing access to the vdb.
> For example for the 3 managers:
>
> paludis:
> installed-cache currently needs to be manually run by the user;
> specifically, the user is responsible for regenerating this cache if
> they use a non-paludis manager to modify the VDB. This can be
> automated via checking the vdb timestamp against a stored copy of
> the vdb timestamp at the time of the cache generation.
>
> portage:
> portage maintains a set of denormalized caches of the vdb- it however
> has to do validation of those caches on each access, meaning quite a
> few stats. Same thing, can compare timestamp from current vdb to when
> it was generated to identify if it is no longer authoritative.
>
> pkgcore:
> pkgcore maintains a denormalized old style virtuals cache- same thing
> w/ portage, it has to do validation (stat'ing) whenever it uses that
> cache to ensure the data is accurate. Same thing, can compare
> timestamp from current vdb to when it was generated to identify if it
> is no longer authoritative.
>
> The existing vdb caching could all be modified to use this timestamp.
> One stat in the best (common) case, instead of having to either scan
> the whole vdb each time or doing a subset of stats.
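
As a rough sketch of the single-stat check described above, assuming
the cache stores the timestamp's mtime at generation time: the function
names below are hypothetical and are not taken from paludis, portage or
pkgcore. Treating a missing timestamp file as "cache not valid" and
comparing for exact equality (instead of "newer than") are assumptions
made for this example.

```python
import os

# File name from the proposal; the function names are hypothetical.
STAMP_NAME = ".modification_time"


def vdb_mtime_ns(vdb_root):
    """Return the timestamp file's mtime in nanoseconds, or None if absent."""
    try:
        return os.stat(os.path.join(vdb_root, STAMP_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_current(vdb_root, recorded_mtime_ns):
    """True if the vdb timestamp still matches the value stored with the cache.

    This costs one stat call. Any difference, older or newer, marks the
    cache as stale, and so does a missing timestamp file.
    """
    current = vdb_mtime_ns(vdb_root)
    return current is not None and current == recorded_mtime_ns
```

When a cache is generated, the manager would store vdb_mtime_ns() next
to it, then call cache_is_current() before trusting the cache and fall
back to a full scan when it returns False.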
>
> This change enables further caching/denormalization of the vdb data
> while maintaining the old format- basically, it allows the manager to
> build out a helluva lot faster access to the vdb while keeping on
> disk compatibility in /var/db/pkg.
>
>
> Now unfortunately since the vdb is not format versioned in any
> fashion, to get this timestamp we have to do the following-
>
> 1) nudge everyone who has code poking into the vdb to update their
> code to update the timestamp
> 2) sit on our hands for N months until such time we've deemed
> "everyone we care about has upgraded"
> 3) push out a new release, and start pushing out versions of the
> managers/vdb consumers that use this timestamp instead of just
> updating it.
>
> For anyone who has been around gentoo for a couple of years, this is a
> pretty familiar pattern- eapi, profile changes, etc, all go through
> this unfortunately.
>
>
> That's the core of the proposal; there is a ticket open
> ( http://bugs.gentoo.org/290428 ) regarding this although there is
> some debate from ciaran which I'll try to now summarize, along w/ the
> counterarguments.
>
> 1) do a new vdb.
> Counter: this mechanism provides a way to synchronize the new vdb
> while maintaining the old during its transition period, so this is
> needed anyways. Further, pinning all of our optimization hopes on a
> new vdb is daft- it's been discussed for 5+ years now and still
> hasn't materialized (pkgcore has been able to have a new vdb for
> several years, but without a synchronization mechanism it would
> require locking users into the new format and locking out old
> consumers of the vdb- an unfriendly choice to push on users, hence
> never being implemented).
>
> 2) code that hasn't been updated to adjust the timestamp, but is still
> in use after the transition period will break things.
> Counter: nature of any modification of this sort, frankly the gains
> outweigh the costs of users being ridiculously out of date. Not
> saying it's perfect, but until someone comes up with a proposal that
> versions every PMS component (meaning PMS has to start documenting
> the VDB), it's what we have if we wish to move forward in
> refactoring.
>
> 3) the correct approach is to require users to tell each manager that
> changes have occurred outside its purview (run paludis
> --regenerate-installed-cache after every time you invoke pmerge or
> emerge).
> Counter: that's rather unfriendly to users, and isn't what
> pkgcore/portage do. Further, it's historically the opposite of the
> norm- consider the ebuild cache (we do validation as we go there,
> instead of expecting users to do an emerge --regen every time they
> modify an ebuild).
>
>
> That's roughly the three points raised; there is some minor quibbling
> that mtime cannot be trusted, but that's mostly a variation of #2.
This looks like a good idea to me. I see that at least some of it has
been implemented in portage, and I suspect in pkgcore too. However,
it's not obvious to me that all the code is ready, and I don't see any
real specs, docs, etc. You're a seasoned slacker^Wdeveloper so you
know the drill. I will add this as a topic for the open floor
discussion for January, but don't expect us to vote on it before we
have all of the above. Now, it might be that this whole thing is held
back by a more philosophical question, in which case feel free to
propose it for addition to the (preferably February) agenda.
I'm a bit surprised by how little discussion this topic has
generated. I know there is a bug about this and that there was some
action there, but still. I think that getting the above material ready
(specs, doc, PMS?, whatever) has a good chance of triggering
additional discussion.
Feel free to contact me in case you need help.
Denis.