public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [RFC] Overlays and Metadata Cache
@ 2009-06-20 16:46 Patrick Lauer
  2009-06-20 17:09 ` Fabian Groffen
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Patrick Lauer @ 2009-06-20 16:46 UTC
  To: gentoo-dev

Hello everybody,

Those of us using overlays might have noticed that they can seriously slow 
down dependency calculation. This is mostly because of the lack of a metadata 
cache.
For overlay maintainers, providing a metadata cache is quite tricky because, 
to be really consistent and useful, it would have to be regenerated after 
every commit.  That's quite easy to forget or get wrong.

So I sat down, brained some thoughts and played around a bit. Here's what I 
came up with:

* server-side each overlay is checked out
* for every overlay in our list:
	- we add it to make.conf explicitly (avoids any spillover effects)
	- we let egencache generate a metadata cache for that repository
* we rsync the repositories with metadata to a different directory

The last step is just there to get rid of all the "unneeded" data like .svn 
directories and can be used to selectively exclude other data that is in the 
repo but not needed for end-users. Plus it reduces inconsistent data when a 
client copies the data while the metadata cache is being generated.

egencache creates the per-repository cache in metadata/cache, so it is nicely 
bundled and won't interfere with anything else.
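
As a rough sketch of the server-side steps above (not the script actually
used), the per-overlay commands could be assembled like this. The overlay
name, the paths and the exclude list are made-up examples, and the sketch
assumes each overlay is already configured as a repository so that egencache
can find it by name; the make.conf step is not modelled.

```python
from pathlib import PurePosixPath


def egencache_command(repo_name, jobs=1):
    """Command line asking egencache to refresh one repository's cache."""
    return ["egencache", "--update", f"--repo={repo_name}", f"--jobs={jobs}"]


def export_command(checkout_dir, export_root, repo_name,
                   excludes=(".svn", ".git", "CVS")):
    """rsync command copying a checkout (now with metadata/cache) to the
    export directory, leaving VCS bookkeeping directories behind."""
    src = str(PurePosixPath(checkout_dir)) + "/"
    dst = str(PurePosixPath(export_root) / repo_name) + "/"
    cmd = ["rsync", "-a", "--delete"]
    cmd += [f"--exclude={name}" for name in excludes]
    return cmd + [src, dst]


def plan(overlays, export_root):
    """Return the ordered list of commands for all overlays.

    `overlays` maps a repository name to its checkout directory
    (both hypothetical here).
    """
    steps = []
    for name, checkout_dir in overlays.items():
        steps.append(egencache_command(name))
        steps.append(export_command(checkout_dir, export_root, name))
    return steps


if __name__ == "__main__":
    for step in plan({"sunrise": "/srv/checkouts/sunrise"}, "/srv/export"):
        print(" ".join(step))
```

The functions only build argument lists, so the plan can be inspected before
anything is run; each list could then be handed to subprocess.run().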

So now we have all repositories, with metadata, in one place. We can start an 
rsync daemon sharing the parent directory. For users this makes things easier 
- instead of needing cvs, svn, git, darcs, hg, etc. etc. they only need rsync 
(which they already have installed!)

Layman gets easier too - it just needs to understand the rsync protocol and 
select the right directory(s).

The only issue I have found with this idea relates to eclasses - overriding 
in-tree eclasses to be precise. The problem there is that it invalidates in-
tree metadata and potentially affects other overlays too. So that's a bit of a 
bummer, but then I wonder how common that case is.

For performance, the difference is noticeable. As a very rough pointer it 
takes me ~15 minutes for "emerge -puNDv world" with three overlays and no 
metadata cache and about 75 seconds with metadata cache. That's of course a 
"worst case" scenario.

Generating the metadata cache isn't that expensive - it took about 45 minutes 
to initially check out almost everything layman provided and then about an 
hour for the first run. Consecutive runs should be much faster and can be run 
in parallel per overlay (at least in theory). So unless I missed something 
really big and really obvious, it should be "small enough" to be run every 
hour or even more often.
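
One way the "in parallel per overlay" idea might look, as a minimal sketch:
a thread pool hands each overlay to a worker function. The worker
(`regenerate_one`) is a placeholder supplied by the caller; in practice it
would presumably launch the cache generation for that overlay as a
subprocess, which is why threads are enough here.

```python
from concurrent.futures import ThreadPoolExecutor


def regenerate_all(overlays, regenerate_one, max_workers=4):
    """Run `regenerate_one(name)` for every overlay, several at a time.

    Returns a dict mapping overlay name -> whatever the worker returned.
    An exception raised by any worker is re-raised here.
    """
    overlays = list(overlays)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = list(pool.map(regenerate_one, overlays))
    return dict(zip(overlays, results))


if __name__ == "__main__":
    # Stand-in worker that only reports what it would do.
    print(regenerate_all(["sunrise", "java-overlay"],
                         lambda name: f"would regenerate {name}"))
```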

Advantages are:
- fewer deps for layman (if it is adapted)
- less complexity client-side
- faster sync performance - svn and git especially transfer way too much; the 
initial checkout of one overlay was >35M of data for a few dozen ebuilds
- less load server-side. Rsync is easy to replicate and relatively cheap. 
Popular overlays will appreciate the reduced traffic :)
- faster dependency calculation
and a few I have already forgotten.

Disadvantages are:
- syncing the main tree can invalidate most of the metadata cache (changed 
eclasses etc), so you need to sync the overlays at the same time
- the eclass override situation I mentioned earlier
- slower update time (right now users can check out immediately after a 
commit; with this indirection there would be a delay of 30min or more)
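
To illustrate why the first two points bite, here is a deliberately
simplified model of cache validation. It is not Portage's actual on-disk
format or code: the assumption is only that a cache entry remembers a digest
of the ebuild and of every eclass it inherited, and is reused only while all
of them still match.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Toy cache entry: digests recorded when the cache was generated."""
    ebuild_digest: str
    eclass_digests: dict = field(default_factory=dict)  # eclass name -> digest


def is_valid(entry, ebuild_digest, current_eclasses):
    """True if the ebuild and all inherited eclasses are unchanged.

    `current_eclasses` maps eclass name -> digest as seen on the client,
    after any main-tree sync or eclass override has been applied.
    """
    if entry.ebuild_digest != ebuild_digest:
        return False
    for name, digest in entry.eclass_digests.items():
        if current_eclasses.get(name) != digest:
            return False
    return True


if __name__ == "__main__":
    entry = CacheEntry("ebuild-v1", {"eutils": "rev-a"})
    print(is_valid(entry, "ebuild-v1", {"eutils": "rev-a"}))  # unchanged tree
    print(is_valid(entry, "ebuild-v1", {"eutils": "rev-b"}))  # eclass changed
```

In this model a main-tree sync that touches a widely inherited eclass, or a
local override of that eclass, makes every overlay entry recorded against the
old digest unusable until the overlay cache is regenerated and synced again.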

If I don't get distracted I might set up a proof of concept public rsync 
server providing the main repo plus all overlays I can throw in, but it'd have 
a low initial update frequency (6h to daily).

Your thoughts, opinions and other input are appreciated.

Patrick




* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-20 16:46 [gentoo-dev] [RFC] Overlays and Metadata Cache Patrick Lauer
@ 2009-06-20 17:09 ` Fabian Groffen
  2009-06-20 18:16 ` Zac Medico
  2009-06-20 18:22 ` Ciaran McCreesh
  2 siblings, 0 replies; 11+ messages in thread
From: Fabian Groffen @ 2009-06-20 17:09 UTC
  To: gentoo-dev

Just an FYI

On 20-06-2009 18:46:33 +0200, Patrick Lauer wrote:
> If I don't get distracted I might set up a proof of concept public
> rsync server providing the main repo plus all overlays I can throw in,
> but it'd have a low initial update frequency (6h to daily).

Note that the Prefix rsync tree is generated sort of like you described,
with some extra voodoo to insert news and GLSAs.  An update takes
about 2 minutes, most of it spent running cvs update and svn update.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-20 16:46 [gentoo-dev] [RFC] Overlays and Metadata Cache Patrick Lauer
  2009-06-20 17:09 ` Fabian Groffen
@ 2009-06-20 18:16 ` Zac Medico
  2009-06-20 18:22 ` Ciaran McCreesh
  2 siblings, 0 replies; 11+ messages in thread
From: Zac Medico @ 2009-06-20 18:16 UTC
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Patrick Lauer wrote:
> The only issue I have found with this idea relates to eclasses - overriding 
> in-tree eclasses to be precise. The problem there is that it invalidates in-
> tree metadata and potentially affects other overlays too. So that's a bit of a 
> bummer, but then I wonder how common that case is.

It seems like it should only be a problem for people who use
eclass-overrides in /etc/portage/repos.conf [1] (this is not
default). People who do that are on their own anyway, because that's
what triggers bug #124041 [2].

In the absence of eclass-overrides in /etc/portage/repos.conf,
everything should be fine. Any eclasses that are intended to be
shared between repos can be configured by those repos via
layout.conf [3]. This allows for consistent distribution of metadata
cache, which also allows for consistent repoman results as discussed
in the "QA Overlay Layout support" thread [4].
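
As a small illustration of the repos.conf case, the sketch below scans
repos.conf-style text for repositories that set eclass-overrides. It treats
the file as plain ini syntax and reads it with Python's configparser, which
is an assumption made for the example and not a description of how Portage
loads it; the repository names and paths in the usage example are made up.

```python
import configparser


def repos_with_eclass_overrides(repos_conf_text):
    """Return {repo section: [overriding repos]} for sections that set
    eclass-overrides, either directly or via [DEFAULT]."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repos_conf_text)
    found = {}
    for section in parser.sections():
        value = parser.get(section, "eclass-overrides", fallback="").split()
        if value:
            found[section] = value
    return found


if __name__ == "__main__":
    example = """
[gentoo]
location = /usr/portage

[local]
location = /usr/local/portage
eclass-overrides = java-overlay java-experimental
"""
    print(repos_with_eclass_overrides(example))
```

An empty result would correspond to the default situation described above,
where pregenerated metadata cache stays consistent.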

[1]
http://dev.gentoo.org/~zmedico/portage/doc/man/portage.5.html#repos.conf
[2] http://bugs.gentoo.org/show_bug.cgi?id=124041
[3] http://blogs.gentoo.org/zmedico/2009/04/20/overlay_layout_conf
[4]
http://archives.gentoo.org/gentoo-dev/msg_33c61550b4ed2b7b25dd5a4110e1ec81.xml

- --
Thanks,
Zac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAko9J2YACgkQ/ejvha5XGaPIaQCgq4fCUtdsusIMEjtS6XbXYPzb
ZKoAn3SWop6OFLJQNm+9ZOcwyLM9dehE
=hqgh
-----END PGP SIGNATURE-----




* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-20 16:46 [gentoo-dev] [RFC] Overlays and Metadata Cache Patrick Lauer
  2009-06-20 17:09 ` Fabian Groffen
  2009-06-20 18:16 ` Zac Medico
@ 2009-06-20 18:22 ` Ciaran McCreesh
  2009-06-20 18:40   ` Patrick Lauer
  2 siblings, 1 reply; 11+ messages in thread
From: Ciaran McCreesh @ 2009-06-20 18:22 UTC
  To: gentoo-dev

On Sat, 20 Jun 2009 18:46:33 +0200
Patrick Lauer <patrick@gentoo.org> wrote:
> Generating the metadata cache isn't that expensive - it took about 45
> minutes to initially check out almost everything layman provided and
> then about an hour for the first run. Consecutive runs should be much
> faster and can be run in parallel per overlay (at least in theory).
> So unless I missed something really big really obvious it should be
> "small enough" to be run every hour or even faster.

Have you thought about the security implications of this? How much do
you trust the people running the overlays listed in layman?

-- 
Ciaran McCreesh


* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-20 18:22 ` Ciaran McCreesh
@ 2009-06-20 18:40   ` Patrick Lauer
  2009-06-20 19:00     ` Ciaran McCreesh
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Lauer @ 2009-06-20 18:40 UTC
  To: gentoo-dev

On Saturday 20 June 2009 20:22:22 Ciaran McCreesh wrote:
> On Sat, 20 Jun 2009 18:46:33 +0200
>
> Patrick Lauer <patrick@gentoo.org> wrote:
> > Generating the metadata cache isn't that expensive - it took about 45
> > minutes to initially check out almost everything layman provided and
> > then about an hour for the first run. Consecutive runs should be much
> > faster and can be run in parallel per overlay (at least in theory).
> > So unless I missed something really big really obvious it should be
> > "small enough" to be run every hour or even faster.
>
> Have you thought about the security implications of this?
Yes.
> How much do you trust the people running the overlays listed in layman?
VirtualBox. 




* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-20 18:40   ` Patrick Lauer
@ 2009-06-20 19:00     ` Ciaran McCreesh
  2009-06-21  8:43       ` Patrick Lauer
  0 siblings, 1 reply; 11+ messages in thread
From: Ciaran McCreesh @ 2009-06-20 19:00 UTC
  To: gentoo-dev

On Sat, 20 Jun 2009 20:40:17 +0200
Patrick Lauer <patrick@gentoo.org> wrote:
> > Have you thought about the security implications of this?
> Yes.
>
> > How much do you trust the people running the overlays listed in
> > layman?
>
> VirtualBox. 

And how do you use VirtualBox to prevent one malicious person from
running arbitrary code on the system of anyone using any layman overlay?

-- 
Ciaran McCreesh


* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-20 19:00     ` Ciaran McCreesh
@ 2009-06-21  8:43       ` Patrick Lauer
  2009-06-21 14:26         ` Ciaran McCreesh
  0 siblings, 1 reply; 11+ messages in thread
From: Patrick Lauer @ 2009-06-21  8:43 UTC
  To: gentoo-dev

On Saturday 20 June 2009 21:00:46 Ciaran McCreesh wrote:
> On Sat, 20 Jun 2009 20:40:17 +0200
>
> Patrick Lauer <patrick@gentoo.org> wrote:
> > > Have you thought about the security implications of this?
> >
> > Yes.
> >
> > > How much do you trust the people running the overlays listed in
> > > layman?
> >
> > VirtualBox.
>
> And how do you use VirtualBox to prevent one malicious person from
> running arbitrary code on the system of anyone using any layman overlay?

Ah. I thought you were referring to the issues involved in sourcing ebuilds. 

But as you shift the discussion now ... well ... right now we allow almost 
everyone to add an overlay to the layman config. So we trust overlay 
maintainers not to screw users.

The metadata cache is "inert" in the sense that it isn't executable code (and 
if anyone tries to execute it ... "You're doing it wrong" comes to mind), so 
adding it does not pessimize the situation.

So how do we guarantee that overlay maintainers (many of whom aren't even devs 
and thus might not be subjectively held to the same standards) don't screw 
users?

Hmm. I can't think of any sane way to prevent people from writing bad ebuilds. 
And I also can't think of a reliable method to detect such or prevent users 
from trying to use them. In short, we just have to trust people.
As a sidenote, we just randomly trust devs too. And it usually works ...




* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-21  8:43       ` Patrick Lauer
@ 2009-06-21 14:26         ` Ciaran McCreesh
  2009-06-21 15:00           ` Patrick Lauer
  0 siblings, 1 reply; 11+ messages in thread
From: Ciaran McCreesh @ 2009-06-21 14:26 UTC
  To: gentoo-dev

On Sun, 21 Jun 2009 10:43:27 +0200
Patrick Lauer <patrick@gentoo.org> wrote:
> > > > How much do you trust the people running the overlays listed in
> > > > layman?
> > >
> > > VirtualBox.
> >
> > And how do you use VirtualBox to prevent one malicious person from
> > running arbitrary code on the system of anyone using any layman
> > overlay?
> 
> Ah. I thought you were referring to the issues involved in sourcing
> ebuilds. 

I am.

> But as you shift the discussion now ... well ... right now we allow
> almost everyone to add an overlay to the layman config. So we trust
> overlay maintainers not to screw users.
> 
> The metadata cache is "inert" in the sense that it isn't executable
> code (and if anyone tries to execute it ... "You're doing it wrong"
> comes to mind"), so adding it does not pessimize the situation.

But generating that cache means running code, and one of the things
that code could do is modify every overlay distributed by the box in
question such that anyone using any of those overlays will run
arbitrary code whenever they do emerge -p world.

> Hmm. I can't think of any sane way to prevent people from writing bad
> ebuilds. And I also can't think of a reliable method to detect such
> or prevent users from trying to use them. In short, we just have to
> trust people. As a sidenote, we just randomly trust devs too. And it
> usually works ...

There's a big difference between the levels of verification done for
developers and that which is done for overlay maintainers. Currently,
any overlay maintainer can root any box on which their overlay is used
(whether or not anything from that overlay is installed). You're
escalating this to any layman-listed overlay maintainer being able to
root any box using any layman-listed overlay.

-- 
Ciaran McCreesh


* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-21 14:26         ` Ciaran McCreesh
@ 2009-06-21 15:00           ` Patrick Lauer
  2009-06-21 15:20             ` Ciaran McCreesh
  2009-06-21 18:09             ` Zac Medico
  0 siblings, 2 replies; 11+ messages in thread
From: Patrick Lauer @ 2009-06-21 15:00 UTC
  To: gentoo-dev


> > The metadata cache is "inert" in the sense that it isn't executable
> > code (and if anyone tries to execute it ... "You're doing it wrong"
> > comes to mind"), so adding it does not pessimize the situation.
>
> But generating that cache means running code, and one of the things
> that code could do is modify every overlay distributed by the box in
> question such that anyone using any of those overlays will run
> arbitrary code whenever they do emerge -p world.

Good, this means we have to isolate it so that only each overlay itself exists 
in an environment that generates the metadata cache. A bit bothersome, but 
nothing more than adding a line or two to the script(s) that drive(s) this 
process.

> > Hmm. I can't think of any sane way to prevent people from writing bad
> > ebuilds. And I also can't think of a reliable method to detect such
> > or prevent users from trying to use them. In short, we just have to
> > trust people. As a sidenote, we just randomly trust devs too. And it
> > usually works ...
>
> There's a big difference between the levels of verification done for
> developers and that which is done for overlay maintainers. Currently,
> any overlay maintainer can root any box on which their overlay is used
> (whether or not anything from that overlay is installed). You're
> escalating this to any layman-listed overlay maintainer being able to
> root any box using any layman-listed overlay.

Right, that would be silly. So ... we can restrict the whole concept to 
official overlays if we want (trust and all that), and we can keep separate 
environments per overlay to avoid cross-contamination. Which keeps us about as 
exposed as the status quo, but we make updates and dep calculation faster (at 
least for those overlays that are in a sane condition).

Y'know, what would be even more fun than this pingpong would be a consistent 
counterproposal from you. Point out all the issues at once instead of one per 
email and all that. Like this we're at the third or fourth iteration, people 
(including me) get bored with the whole thread and some half-baked thing gets 
implemented because only three people manage to read the mail thread to its 
end ... 




* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-21 15:00           ` Patrick Lauer
@ 2009-06-21 15:20             ` Ciaran McCreesh
  2009-06-21 18:09             ` Zac Medico
  1 sibling, 0 replies; 11+ messages in thread
From: Ciaran McCreesh @ 2009-06-21 15:20 UTC
  To: gentoo-dev

On Sun, 21 Jun 2009 17:00:01 +0200
Patrick Lauer <patrick@gentoo.org> wrote:
> > But generating that cache means running code, and one of the things
> > that code could do is modify every overlay distributed by the box in
> > question such that anyone using any of those overlays will run
> > arbitrary code whenever they do emerge -p world.
> 
> Good, this means we have to isolate it so that only each overlay
> itself exists in an environment that generates the metadata cache. A
> bit bothersome, but nothing more than adding a line or two to the
> script(s) that drive(s) this process.

So your process would be to clone a new VM for every overlay, add the
overlay into that VM, do the generation and then grab the cache out of
the VM? Ouch.

> > There's a big difference between the levels of verification done for
> > developers and that which is done for overlay maintainers.
> > Currently, any overlay maintainer can root any box on which their
> > overlay is used (whether or not anything from that overlay is
> > installed). You're escalating this to any layman-listed overlay
> > maintainer being able to root any box using any layman-listed
> > overlay.
> 
> Right, that would be silly. So ... we can restrict the whole concept
> to official overlays if we want (trust and all that), and we can keep
> separate environments per overlay to avoid cross-contamination. Which
> keeps us about as exposed as the status quo, but we make updates and
> dep calculation faster (at least for those overlays that are in a
> sane condition).

That's getting towards a more reasonable proposal. Although then if you
decide you trust official overlays, the cross-contamination thing only
needs to protect against accidental screw-ups, so you don't need the
whole VM mess at all.

> Y'know, what would be even more fun than this pingpong would be a
> consistent counterproposal from you. Point out all the issues at once
> instead of one per email and all that.

I don't necessarily have a counterproposal. I don't think it's a bad
idea in principle, I just don't think you've thought through the
security implications. Once you do that, I don't see a way of doing this
sensibly for overlays you don't absolutely totally trust.

> Like this we're at the third or fourth iteration, people (including
> me) get bored with the whole thread and some half-baked thing gets
> implemented because only three people manage to read the mail thread
> to its end ... 

You end up with half-baked proposals if you jump straight into doing
something without repeatedly going over an idea until all the kinks are
worked out. Some of the flaws in the proposal aren't immediately
obvious and only come out after discussion.

I realise you believe I'm perfect and all-knowing, and can instantly
spot every single way your idea is flawed and immediately come up with
a perfect alternative, and I don't blame you for that belief. However,
a viable solution to this when untrusted overlays are involved doesn't
immediately spring to mind, and if such a solution exists, experience
suggests it'll only come about through possibly lengthy discussion, so
I'd rather not prejudice you with my preconceptions. If you're not
prepared to spend time discussing this, you'll definitely end up with
something that's either highly limited in scope or that opens up a
whole new avenue of abuse.

-- 
Ciaran McCreesh


* Re: [gentoo-dev] [RFC] Overlays and Metadata Cache
  2009-06-21 15:00           ` Patrick Lauer
  2009-06-21 15:20             ` Ciaran McCreesh
@ 2009-06-21 18:09             ` Zac Medico
  1 sibling, 0 replies; 11+ messages in thread
From: Zac Medico @ 2009-06-21 18:09 UTC
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Patrick Lauer wrote:
>>> The metadata cache is "inert" in the sense that it isn't executable
>>> code (and if anyone tries to execute it ... "You're doing it wrong"
>>> comes to mind"), so adding it does not pessimize the situation.
>> But generating that cache means running code, and one of the things
>> that code could do is modify every overlay distributed by the box in
>> question such that anyone using any of those overlays will run
>> arbitrary code whenever they do emerge -p world.
> 
> Good, this means we have to isolate it so that only each overlay itself exists 
> in an environment that generates the metadata cache. A bit bothersome, but 
> nothing more than adding a line or two to the script(s) that drive(s) this 
> process.

If you create a user with a separate uid for each overlay, then
that will probably provide a sufficient level of privilege isolation.
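
A minimal sketch of what per-overlay uids could look like in a driver
script. The "ovl-" account naming scheme is invented for the example, the
accounts are assumed to exist already, and the `user=` argument of
subprocess.run() needs Python 3.9 or later plus enough privileges (typically
root) to switch users.

```python
import re
import subprocess


def overlay_user(overlay_name, prefix="ovl-"):
    """Map an overlay name to a (hypothetical) dedicated account name."""
    safe = re.sub(r"[^a-z0-9_-]", "_", overlay_name.lower())
    return prefix + safe


def run_as_overlay_user(overlay_name, command):
    """Run `command` (an argument list) under the overlay's own account.

    Raises CalledProcessError if the command exits non-zero.
    """
    return subprocess.run(command, user=overlay_user(overlay_name), check=True)


if __name__ == "__main__":
    print(overlay_user("Sunrise"))
```

For this to isolate anything, each overlay's checkout and export directory
would also have to be writable only by its own account, so that code run
while generating one overlay's cache cannot touch another overlay's files.
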
- --
Thanks,
Zac
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAko+d2MACgkQ/ejvha5XGaPzJQCeIg2d8MVhJTyhZWKCQGtZnY3V
Dk8An0f8WnJL/lb7iJZzlB+hxQDfNLTG
=pXrm
-----END PGP SIGNATURE-----


