From: Brian Harring <ferringb@gmail.com>
To: gentoo-dev@lists.gentoo.org
Cc: zmedico@gentoo.org, solar@gentoo.org,
ciaran.mccreesh@googlemail.com, fuzzyray@gentoo.org
Subject: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)
Date: Sun, 25 Oct 2009 18:50:05 -0700
Message-ID: <20091026015005.GA12250@hrair.hsd1.ca.comcast.net>
First of all, feel free to forward this to anyone who is responsible
for code packaged in the tree that accesses the vdb (/var/db/pkg) in
some fashion.
The proposal is pretty simple; if code modifies the vdb in any
fashion, it needs to update the mtime on a file named
'.modification_time' in the root of the vdb.
For example-
1) ${PACKAGE_MANAGER} fires up, builds a pkg. It's now ready to
install it.
2) this step isn't strictly required, but is a zero cost safety
measure- prior to modifying the vdb, it updates the timestamp. The
reason for doing this is to protect against the manager blowing up in
some fashion and not updating the timestamp- there still is a window
if the manager breaks down during merging but it's far reduced.
3) manager does its thing to the livefs, and to the vdb.
4) once finished, again, updates the timestamp.
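To make steps 2-4 concrete, here is a rough Python sketch. The helper
names (touch_vdb_timestamp, vdb_modification) are made up for
illustration and are not taken from any manager; only the file name
'.modification_time' comes from the proposal. Touching again when the
merge fails is my assumption, on the theory that the vdb may be half
modified at that point.

```python
import contextlib
import os

VDB_TIMESTAMP = ".modification_time"  # file name from the proposal


def touch_vdb_timestamp(vdb_root):
    """Create the timestamp file if needed and set its mtime to now."""
    path = os.path.join(vdb_root, VDB_TIMESTAMP)
    # Append mode creates the file without truncating an existing one.
    with open(path, "a"):
        pass
    os.utime(path, None)  # None means "use the current time"
    return path


@contextlib.contextmanager
def vdb_modification(vdb_root):
    """Hypothetical wrapper: touch before (step 2) and after (step 4)."""
    touch_vdb_timestamp(vdb_root)
    try:
        yield  # step 3: the caller modifies the livefs and the vdb here
    finally:
        # Assumption: also touch on failure, since a partial merge may
        # already have changed the vdb.
        touch_vdb_timestamp(vdb_root)
```

A manager would then wrap its merge in "with
vdb_modification('/var/db/pkg'):" and do the actual work inside the
block.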
This isn't an incredibly complex change. What it enables however is
package managers to get serious about optimizing access to the vdb.
For example for the 3 managers:
paludis:
installed-cache currently needs to be run manually by the user;
specifically, the user is responsible for regenerating this cache if
they use a non-paludis manager to modify the VDB. This can be
automated by checking the vdb timestamp against a stored copy of the
vdb timestamp from the time of the cache generation.
portage:
portage maintains a set of denormalized caches of the vdb- it however
has to do validation of those caches on each access, meaning quite a
few stats. Same thing, it can compare the current vdb timestamp to
the one recorded when the cache was generated to identify if the
cache is no longer authoritative.
pkgcore:
pkgcore maintains a denormalized old style virtuals cache- same thing
w/ portage, it has to do validation (stat'ing) whenever it uses that
cache to ensure the data is accurate. Same thing, it can compare the
current vdb timestamp to the one recorded when the cache was
generated to identify if the cache is no longer authoritative.
The existing vdb caching could all be modified to use this timestamp.
One stat in the best (common) case, instead of having to either scan
the whole vdb each time or do a subset of stats.
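The single-stat check could look something like the following. This
is only a sketch: the function names are hypothetical, and comparing
for equality (rather than "newer than") is my choice, so that a clock
stepping backwards also invalidates the cache. A missing timestamp
file is treated as "cannot trust the cache".

```python
import os

VDB_TIMESTAMP = ".modification_time"  # file name from the proposal


def vdb_mtime_ns(vdb_root):
    """Return the timestamp file's mtime in ns, or None if it is missing."""
    try:
        return os.stat(os.path.join(vdb_root, VDB_TIMESTAMP)).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_current(vdb_root, recorded_mtime_ns):
    """One stat: is the cache built at recorded_mtime_ns still valid?"""
    current = vdb_mtime_ns(vdb_root)
    # No timestamp file means nobody is maintaining it, so never trust
    # the cache. Any difference at all, older or newer, invalidates it.
    return current is not None and current == recorded_mtime_ns
```

The manager stores the value of vdb_mtime_ns() alongside its cache
when it generates it, and hands that value back to cache_is_current()
on the next run.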
This change enables further caching/denormalization of the vdb data
while maintaining the old format- basically, it allows the manager to
build out a helluva lot faster access to the vdb while keeping on
disk compatibility in /var/db/pkg.
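As an illustration of that kind of denormalized cache, here is a
minimal sketch that keeps a JSON list of installed category/package
entries next to the timestamp it was built against. Everything here
(the function names, the JSON layout, the cache location) is invented
for the example; real managers cache far more than directory names,
and the scan below ignores details such as in-progress merge
directories.

```python
import json
import os

VDB_TIMESTAMP = ".modification_time"  # file name from the proposal


def scan_vdb(vdb_root):
    """Slow path: walk the category/package directories of the vdb."""
    pkgs = []
    for cat in sorted(os.listdir(vdb_root)):
        cat_path = os.path.join(vdb_root, cat)
        if cat.startswith(".") or not os.path.isdir(cat_path):
            continue
        for pkg in sorted(os.listdir(cat_path)):
            if os.path.isdir(os.path.join(cat_path, pkg)):
                pkgs.append(f"{cat}/{pkg}")
    return pkgs


def installed_packages(vdb_root, cache_path):
    """Return installed packages, using the cache when it is current."""
    try:
        stamp = os.stat(os.path.join(vdb_root, VDB_TIMESTAMP)).st_mtime_ns
    except FileNotFoundError:
        return scan_vdb(vdb_root)  # no timestamp: caching is not safe
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if isinstance(cached, dict) and cached.get("stamp") == stamp:
            return cached["packages"]  # fast path: one stat, one read
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and rebuild
    pkgs = scan_vdb(vdb_root)
    with open(cache_path, "w") as f:
        json.dump({"stamp": stamp, "packages": pkgs}, f)
    return pkgs
```

The timestamp is read before the scan on purpose: if another manager
modifies the vdb while the scan is running, the stored stamp will not
match the newer one and the next call rebuilds the cache.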
Now unfortunately since the vdb is not format versioned in any
fashion, to get this timestamp we have to do the following-
1) nudge everyone who has code poking into the vdb to update their
code to update the timestamp
2) sit on our hands for N months until such time we've deemed
"everyone we care about has upgraded"
3) push out a new release, and start pushing out versions of the
managers/vdb consumers that use this timestamp instead of just
updating it.
For anyone who has been around gentoo for a couple of years, this is a
pretty familiar pattern- eapi, profile changes, etc, all go through
this unfortunately.
That's the core of the proposal; there is a ticket open
( http://bugs.gentoo.org/290428 ) regarding this although there is
some debate from ciaran which I'll try to now summarize, along w/ the
counterarguments.
1) do a new vdb.
Counter: this mechanism provides a way to synchronize the new vdb
while maintaining the old during its transition period, so this is
needed anyways. Further, pinning all of our optimization hopes on a
new vdb is daft- it's been discussed for 5+ years now and still
hasn't materialized (pkgcore has been able to have a new vdb for
several years, but without a synchronization mechanism it would
require locking users into the new format and locking out old
consumers of the vdb- an unfriendly choice to push on users, hence
never being implemented).
2) code that hasn't been updated to adjust the timestamp, but is still
in use after the transition period will break things.
Counter: that's the nature of any modification of this sort; frankly,
the gains outweigh the costs of users being ridiculously out of date. Not
saying it's perfect, but until someone comes up with a proposal that
versions every PMS component (meaning PMS has to start documenting
the VDB), it's what we have if we wish to move forward in
refactoring.
3) the correct approach is to require users to tell each manager that
changes have occurred outside its purview (run paludis
--regenerate-installed-cache after every time you invoke pmerge or
emerge).
Counter: that's rather unfriendly to users, and isn't what
pkgcore/portage do. Further, it's historically the opposite of the
norm- consider the ebuild cache (we do validation as we go there,
instead of expecting users to do an emerge --regen every time they
modify an ebuild).
That's roughly the three points raised; there is some minor quibbling
that mtime cannot be trusted, but that's mostly a variation of #2.
Feel free to dig into the bug for exact specifics, or wait for
ciaran's reply to this post.
So... thoughts?
~harring