public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [RFC] Overlays and Metadata Cache
@ 2009-06-20 16:46 Patrick Lauer
  2009-06-20 17:09 ` Fabian Groffen
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Patrick Lauer @ 2009-06-20 16:46 UTC
  To: gentoo-dev

Hello everybody,

Those of us using overlays might have noticed that they can seriously slow 
down dependency calculation. This is mostly because they lack a metadata 
cache.
For overlay maintainers, providing a metadata cache is quite tricky: to 
be really consistent and useful, it would have to be regenerated after every 
commit.  That's quite easy to forget or get wrong.

So I sat down, brained some thoughts and played around a bit. Here's what I 
came up with:

* server-side each overlay is checked out
* for every overlay in our list:
	- we add it to make.conf explicitly (avoids any spillover effects)
	- we let egencache generate a metadata cache for that repository
* we rsync the repositories with metadata to a different directory

The last step is just there to get rid of all the "unneeded" data like .svn 
directories and can be used to selectively exclude other data that is in the 
repo but not needed for end-users. Plus it reduces inconsistent data when a 
client copies the data while the metadata cache is being generated.
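
As a rough sketch of that export step, the snippet below only builds an rsync 
command line that copies a checkout into a separate directory while skipping 
VCS bookkeeping. The paths and the exclude list are assumptions for 
illustration, not part of the proposal; -a, --delete and --exclude are 
standard rsync options.

```python
import shlex

# VCS bookkeeping directories that end-users do not need (illustrative list).
VCS_EXCLUDES = [".svn/", ".git/", ".hg/", "_darcs/", "CVS/"]


def export_command(checkout, export_dir, extra_excludes=()):
    """Build the rsync command that copies one overlay checkout to export_dir."""
    cmd = ["rsync", "-a", "--delete"]
    for pattern in list(VCS_EXCLUDES) + list(extra_excludes):
        cmd.append("--exclude=" + pattern)
    # Trailing slashes make rsync copy the directory contents,
    # not the directory itself.
    cmd += [checkout.rstrip("/") + "/", export_dir.rstrip("/") + "/"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; the command is printed, not executed.
    example = export_command("/var/overlays/checkout/foo",
                             "/var/overlays/export/foo")
    print(" ".join(shlex.quote(arg) for arg in example))
```

Anything else that should stay out of the export (docs, scripts, ...) could 
be passed via extra_excludes.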

egencache creates the per-repository cache in metadata/cache, so it is nicely 
bundled and won't interfere with anything else.
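
To make the generation step a bit more concrete, here is a minimal sketch 
that builds the egencache invocation for one repository and names the 
directory where the result is expected. It assumes the overlay is already 
known to portage under the given repository name (for example via the 
make.conf step above); the repository name, path and job count are 
placeholders.

```python
import os


def egencache_command(repo_name, jobs=2):
    """Build the egencache call that (re)generates the cache for one repository."""
    return ["egencache", "--update",
            "--repo=" + repo_name,
            "--jobs=" + str(jobs)]


def cache_dir(overlay_path):
    """Directory inside the overlay where the per-repository cache ends up."""
    return os.path.join(overlay_path, "metadata", "cache")


if __name__ == "__main__":
    # Hypothetical overlay name and path; nothing is executed here.
    print(egencache_command("foo"))
    print(cache_dir("/var/overlays/checkout/foo"))
```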

So now we have all repositories, with metadata, in one place. We can start an 
rsync daemon sharing the parent directory. For users this makes things easier 
- instead of needing cvs, svn, git, darcs, hg, etc., they only need rsync 
(which they already have installed!)

Layman gets easier too - it just needs to understand the rsync protocol and 
select the right directory(s).
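
On the client side this would boil down to one rsync call per selected 
overlay. A hedged sketch: the host name, the module name "overlays" and the 
target directory are made up for illustration, since no such server exists 
yet.

```python
def sync_commands(host, module, overlays, target_root):
    """Build one rsync command per overlay the user selected."""
    cmds = []
    for name in overlays:
        # One subdirectory per overlay below the shared parent directory.
        source = "rsync://%s/%s/%s/" % (host, module, name)
        dest = "%s/%s/" % (target_root.rstrip("/"), name)
        cmds.append(["rsync", "-a", "--delete", source, dest])
    return cmds


if __name__ == "__main__":
    # Hypothetical server and paths; the commands are printed, not executed.
    for cmd in sync_commands("rsync.example.org", "overlays",
                             ["foo", "bar"], "/var/overlays"):
        print(" ".join(cmd))
```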

The only issue I have found with this idea relates to eclasses - overriding 
in-tree eclasses to be precise. The problem there is that it invalidates in-
tree metadata and potentially affects other overlays too. So that's a bit of a 
bummer, but then I wonder how common that case is.
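
One way to find out how common that case is would be to scan each overlay for 
eclass files that share a name with an in-tree eclass. A minimal sketch, 
assuming the usual layout with an eclass/ directory at the top of each 
repository; the function name is made up, and a name clash is only a hint 
that an override exists, not proof that it changes any metadata.

```python
import os


def overridden_eclasses(main_tree, overlay):
    """Return eclass file names present in both the main tree and the overlay."""
    def eclass_names(repo):
        directory = os.path.join(repo, "eclass")
        if not os.path.isdir(directory):
            # Many overlays ship no eclasses at all.
            return set()
        return {name for name in os.listdir(directory)
                if name.endswith(".eclass")}

    return sorted(eclass_names(main_tree) & eclass_names(overlay))
```

Overlays for which this returns an empty list could be served with a cache 
without worrying about this issue.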

For performance, the difference is noticeable. As a very rough pointer it 
takes me ~15 minutes for "emerge -puNDv world" with three overlays and no 
metadata cache and about 75 seconds with metadata cache. That's of course a 
"worst case" scenario.
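
Put as a ratio, using only the two rough timings above:

```python
uncached_seconds = 15 * 60  # ~15 minutes without metadata cache
cached_seconds = 75         # ~75 seconds with metadata cache

speedup = uncached_seconds / cached_seconds
print("speedup: %.0fx" % speedup)  # prints "speedup: 12x"
```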

Generating the metadata cache isn't that expensive - it took about 45 minutes 
to initially check out almost everything layman provided and then about an 
hour for the first run. Consecutive runs should be much faster and can be run 
in parallel per overlay (at least in theory). So unless I missed something 
really big and really obvious it should be "small enough" to be run every 
hour or even more often.
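
The per-overlay parallelism could look roughly like the sketch below. It is 
untested against real overlays: whether concurrent egencache runs on 
different repositories are safe in practice is exactly the "in theory" part, 
and the worker count is an arbitrary assumption. The run parameter exists 
only so the command runner can be swapped out for a dry run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def regenerate(repo_name, run=subprocess.call):
    """Run egencache for one repository and return (name, exit status)."""
    status = run(["egencache", "--update", "--repo=" + repo_name])
    return repo_name, status


def regenerate_all(repo_names, workers=4, run=subprocess.call):
    """Regenerate several overlays concurrently.

    Threads are enough here because the real work happens in the
    egencache child processes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda name: regenerate(name, run), repo_names)
        return dict(results)
```

Overlays with a non-zero exit status could then be left out of the export 
until the next run.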

Advantages are:
- less deps for layman (if it is adapted)
- less complexity client-side
- faster sync performance - svn and git in particular transfer far more than 
needed; the initial checkout of one overlay was >35M of data for a few dozen 
ebuilds
- less load server-side. Rsync is easy to replicate and relatively cheap. 
Popular overlays will appreciate the reduced traffic :)
- faster dependency calculation
and a few I have already forgotten.

Disadvantages are:
- syncing the main tree can invalidate most of the metadata cache (changed 
eclasses etc), so you need to sync the overlays at the same time
- the eclass override situation I mentioned earlier
- slower propagation of updates (right now users can check out immediately 
after a commit, with this indirection there would be a delay of 30min+)

If I don't get distracted I might set up a proof of concept public rsync 
server providing the main repo plus all overlays I can throw in, but it'd have 
a low initial update frequency (6h to daily).

Your thoughts, opinions and other input are appreciated.

Patrick





Thread overview: 11+ messages
2009-06-20 16:46 [gentoo-dev] [RFC] Overlays and Metadata Cache Patrick Lauer
2009-06-20 17:09 ` Fabian Groffen
2009-06-20 18:16 ` Zac Medico
2009-06-20 18:22 ` Ciaran McCreesh
2009-06-20 18:40   ` Patrick Lauer
2009-06-20 19:00     ` Ciaran McCreesh
2009-06-21  8:43       ` Patrick Lauer
2009-06-21 14:26         ` Ciaran McCreesh
2009-06-21 15:00           ` Patrick Lauer
2009-06-21 15:20             ` Ciaran McCreesh
2009-06-21 18:09             ` Zac Medico
