public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
@ 2012-07-26 18:26 Rich Freeman
  2012-07-26 18:40 ` Michael Mol
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Rich Freeman @ 2012-07-26 18:26 UTC
  To: gentoo-dev

I've been messing around with namespaces and some of what systemd has
been doing with them, and I have an idea for a portage feature.

But before doing a brain dump of ideas, how useful would it be to have
a FEATURE for portage to do a limited-visibility build?  That is, the
build would be run in an environment where the root filesystem appears
to contain everything in a DEPEND (including @system currently) and
nothing else?  It might be useful both in development/testing, and
also in production use (not sure how performance would work in the
real world - I was able in a script to get it to build an environment
in a few seconds for a few packages).

A really crazy idea would be to try to run packages in a similar
environment, but I think that needs better kernel/etc level support
since the performance hit would be much more noticeable, except for
things like daemons that only start once.

Implementing it wouldn't necessarily be hard - just create a tmpfs
under /var/tmp/portage, unshare off a new mount namespace, and
read-only bind-mount everything needed from the root filesystem
(including /var/tmp/portage/...), and chroot into it.  When the build
is done the process governing it terminates and the kernel wipes out
all the mounts and then portage unmounts the tmpfs.  You wouldn't need
to use a tmpfs for the build - it would actually be zero-size as
reported by df since it just contains a bazillion bind mounts, though
all those mounts would consume slab memory.

Thoughts?

Rich



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
@ 2012-07-26 18:40 ` Michael Mol
  2012-07-26 19:43   ` Rich Freeman
  2012-07-26 18:59 ` Michael Orlitzky
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Michael Mol @ 2012-07-26 18:40 UTC
  To: gentoo-dev

On Thu, Jul 26, 2012 at 2:26 PM, Rich Freeman <rich0@gentoo.org> wrote:
> I've been messing around with namespaces and some of what systemd has
> been doing with them, and I have an idea for a portage feature.
>
> But before doing a brain dump of ideas, how useful would it be to have
> a FEATURE for portage to do a limited-visibility build?  That is, the
> build would be run in an environment where the root filesystem appears
> to contain everything in a DEPEND (including @system currently) and
> nothing else?  It might be useful both in development/testing, and
> also in production use (not sure how performance would work in the
> real world - I was able in a script to get it to build an environment
> in a few seconds for a few packages).

I very much like this, as it'd greatly simplify identifying any
unintended or unrecognized dependencies in my code. Furthermore, if
the mechanism for identifying and declaring specified-required content
can be generalized, this would make distributed builds potentially
more efficient by allowing the dispatching host to distribute the
necessary header and library files to the machines doing the building.
(Really, this observation is more about simply making the information
available; distcc could consume that information if someone chose to
do the work to add that functionality.)

> A really crazy idea would be to try to run packages in a similar
> environment, but I think that needs better kernel/etc level support
> since the performance hit would be much more noticeable, except for
> things like daemons that only start once.
>
> Implementing it wouldn't necessarily be hard - just create a tmpfs
> under /var/tmp/portage, unshare off a new mount namespace, and
> read-only bind-mount everything needed from the root filesystem
> (including /var/tmp/portage/...), and chroot into it.  When the build
> is done the process governing it terminates and the kernel wipes out
> all the mounts and then portage unmounts the tmpfs.  You wouldn't need
> to use a tmpfs for the build - it would actually be zero-size as
> reported by df since it just contains a bazillion bind mounts, though
> all those mounts would consume slab memory.
>
> Thoughts?

You've done 90% of the conceptual work needed for an idea I had; I've
wanted to do something similar at the 'make' level, to make
identifying and fixing broken parallel build systems easier. If I
could limit a make instance to only be able to see consequences of
jobs it declared as dependencies, that'd go a long way. I was going to
go by way of FUSE for this, though...

-- 
:wq



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
  2012-07-26 18:40 ` Michael Mol
@ 2012-07-26 18:59 ` Michael Orlitzky
  2012-07-26 21:45 ` Alec Warner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Michael Orlitzky @ 2012-07-26 18:59 UTC
  To: gentoo-dev

On 07/26/12 14:26, Rich Freeman wrote:
> I've been messing around with namespaces and some of what systemd has
> been doing with them, and I have an idea for a portage feature.
> 
> But before doing a brain dump of ideas, how useful would it be to have
> a FEATURE for portage to do a limited-visibility build?  That is, the
> build would be run in an environment where the root filesystem appears
> to contain everything in a DEPEND (including @system currently) and
> nothing else?  It might be useful both in development/testing, and
> also in production use (not sure how performance would work in the
> real world - I was able in a script to get it to build an environment
> in a few seconds for a few packages).

The Cabal build system for Haskell packages does this and it's extremely
useful. It prevents me from forgetting dependencies like 'directory',
'time', etc. that I use without thinking.

  runghc Setup.hs build
  Building lwn-epub-0.0...
  Preprocessing executable 'lwn-epub' for lwn-epub-0.0...

  src/LWN/Article.hs:12:8:
      Could not find module `System.Directory'
      It is a member of the hidden package `directory-1.1.0.2'.
      Perhaps you need to add `directory' to the build-depends in your
      .cabal file.
      Use -v to see a list of the files searched for.



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:40 ` Michael Mol
@ 2012-07-26 19:43   ` Rich Freeman
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Freeman @ 2012-07-26 19:43 UTC
  To: gentoo-dev

On Thu, Jul 26, 2012 at 2:40 PM, Michael Mol <mikemol@gmail.com> wrote:
> (Really, this observation is more about simply making the information
> available; distcc could consume that information if someone chose to
> do the work to add that functionality.)

Well, I'm not sure how to get the info out of the internals of portage
itself (I have to imagine it would be fast since portage has to know
about it already), but for a list of packages you can xargs them into
qlist to get a list of all files, and then pipe that into a script
that will populate a chroot for you.

Quick script for those curious to try it out:

mkdir newroot
mount -t tmpfs none newroot
cd newroot
unshare -m /bin/bash
echo "list of packages" | xargs qlist | linkfile
(poke around)
exit
(note that bind mounts are gone)
cd ..
umount newroot ; rmdir newroot
(note that all traces gone)

Contents of linkfile script:
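#!/bin/bash
# Read file paths from stdin (one per line) and bind-mount each under the cwd.
while IFS= read -r f; do
    install -D /dev/null "./$f"   # empty placeholder to mount over
    mount -n --bind "$f" "./$f"
done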

(That -n is important if you don't want to muck up your /etc/mtab )
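
The script above makes ordinary read-write bind mounts. A minimal sketch
of the read-only variant from the original proposal follows, assuming the
usual two-step idiom (bind first, then remount the bind read-only).
plan_ro_binds is a hypothetical helper, not part of portage; it only
prints the commands so they can be reviewed before being run as root
inside the unshared namespace.

```shell
# Hypothetical helper: read absolute paths from stdin and print the commands
# that would bind-mount each one read-only under the directory given as $1.
plan_ro_binds() {
    root=$1
    while IFS= read -r f; do
        # Empty placeholder file to mount over.
        printf 'install -D /dev/null "%s"\n' "$root$f"
        # Plain bind mount first; -n keeps it out of /etc/mtab.
        printf 'mount -n --bind "%s" "%s"\n' "$f" "$root$f"
        # Then flip just this bind mount to read-only.
        printf 'mount -n -o remount,ro,bind "%s"\n' "$root$f"
    done
}

# Example: print the plan for two files without touching the system.
printf '%s\n' /bin/ls /usr/include/stdio.h | plan_ro_binds newroot
```

Printing instead of executing keeps the sketch safe to try as a normal
user; piping the output into a root shell would apply it.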

BTW, unshare is a fun command to play around with if you've never used
namespaces.  You can do things like replace commands by bind-mounting
right over them.

Rich



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
  2012-07-26 18:40 ` Michael Mol
  2012-07-26 18:59 ` Michael Orlitzky
@ 2012-07-26 21:45 ` Alec Warner
  2012-07-26 22:35 ` Zac Medico
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alec Warner @ 2012-07-26 21:45 UTC
  To: gentoo-dev

On Thu, Jul 26, 2012 at 8:26 PM, Rich Freeman <rich0@gentoo.org> wrote:
> I've been messing around with namespaces and some of what systemd has
> been doing with them, and I have an idea for a portage feature.
>
> But before doing a brain dump of ideas, how useful would it be to have
> a FEATURE for portage to do a limited-visibility build?  That is, the
> build would be run in an environment where the root filesystem appears
> to contain everything in a DEPEND (including @system currently) and
> nothing else?  It might be useful both in development/testing, and
> also in production use (not sure how performance would work in the
> real world - I was able in a script to get it to build an environment
> in a few seconds for a few packages).

You mean like cowbuilder?

http://wiki.debian.org/cowbuilder

>
> A really crazy idea would be to try to run packages in a similar
> environment, but I think that needs better kernel/etc level support
> since the performance hit would be much more noticeable, except for
> things like daemons that only start once.
>
> Implementing it wouldn't necessarily be hard - just create a tmpfs
> under /var/tmp/portage, unshare off a new mount namespace, and
> read-only bind-mount everything needed from the root filesystem
> (including /var/tmp/portage/...), and chroot into it.  When the build
> is done the process governing it terminates and the kernel wipes out
> all the mounts and then portage unmounts the tmpfs.  You wouldn't need
> to use a tmpfs for the build - it would actually be zero-size as
> reported by df since it just contains a bazillion bind mounts, though
> all those mounts would consume slab memory.
>
> Thoughts?
>
> Rich
>



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
                   ` (2 preceding siblings ...)
  2012-07-26 21:45 ` Alec Warner
@ 2012-07-26 22:35 ` Zac Medico
  2012-07-27 15:37   ` Rich Freeman
  2012-07-27  0:31 ` Gregory M. Turner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Zac Medico @ 2012-07-26 22:35 UTC
  To: gentoo-dev

On 07/26/2012 11:26 AM, Rich Freeman wrote:
> Implementing it wouldn't necessarily be hard - just create a tmpfs
> under /var/tmp/portage, unshare off a new mount namespace, and
> read-only bind-mount everything needed from the root filesystem
> (including /var/tmp/portage/...), and chroot into it.  When the build
> is done the process governing it terminates and the kernel wipes out
> all the mounts and then portage unmounts the tmpfs.  You wouldn't need
> to use a tmpfs for the build - it would actually be zero-size as
> reported by df since it just contains a bazillion bind mounts, though
> all those mounts would consume slab memory.

It seems like you might need some kind of copy-on-write support, at
least to run pkg_setup. Apparently cowbuilder uses cow hardlinks for
that. Another way would be to use fiemap (cp --reflink).
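
A minimal sketch of the reflink approach, assuming GNU cp and a build
root that sits on the same filesystem as the originals. cow_copy is a
hypothetical helper. With --reflink=auto, cp shares data blocks where the
filesystem supports it and falls back to an ordinary copy elsewhere.

```shell
# Hypothetical helper: copy the absolute path $1 into the build root $2,
# sharing data blocks with the original where the filesystem allows it.
cow_copy() {
    src=$1
    root=$2
    # Recreate the parent directory inside the build root.
    mkdir -p "$root$(dirname "$src")"
    # Later writes to the copy never touch the original file.
    cp --reflink=auto --preserve=mode,timestamps "$src" "$root$src"
}
```

For example, "cow_copy /etc/passwd /var/tmp/portage/newroot" would give
pkg_setup a private, writable copy of that file.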
-- 
Thanks,
Zac



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
                   ` (3 preceding siblings ...)
  2012-07-26 22:35 ` Zac Medico
@ 2012-07-27  0:31 ` Gregory M. Turner
  2012-07-27 15:04 ` [gentoo-dev] " Michael Palimaka
  2012-07-31 14:48 ` [gentoo-dev] " "Paweł Hajdan, Jr."
  6 siblings, 0 replies; 17+ messages in thread
From: Gregory M. Turner @ 2012-07-27  0:31 UTC
  To: gentoo-dev

On 7/26/2012 11:26 AM, Rich Freeman wrote:
> I've been messing around with namespaces and some of what systemd has
> been doing with them, and I have an idea for a portage feature.
>
> But before doing a brain dump of ideas, how useful would it be to have
> a FEATURE for portage to do a limited-visibility build?  That is, the
> build would be run in an environment where the root filesystem appears
> to contain everything in a DEPEND (including @system currently) and
> nothing else?  It might be useful both in development/testing, and
> also in production use (not sure how performance would work in the
> real world - I was able in a script to get it to build an environment
> in a few seconds for a few packages).

In practice I think it's going to be very hard to make this work in a 
platform-independent way; however in principle this is a ridiculously 
sexy idea that has crossed my mind more than once.

The challenge is that it requires either

   o Building very large sandboxes on a per-package basis

or

   o Python-level access to unionfs/aufs-style COW features.

Imagine the tree of dependencies which would need to be thrown together 
for, e.g., kmail or firefox!  This makes the former approach seem damn
nearly infeasible.  The latter approach holds more promise, I think, but 
represents a pretty big development effort.

Still.... very sexy idea, if a python-fs-layering API could be coded up.

One thing to consider: even if it does work, continuing to support the 
"old" way without fancy COW features is going to be required if portage 
is still going to support Gentoo/Alt in all of its flavors (either that, 
or unionfs/aufs features would need to be coded up for all those 
platforms that lack them).

-gmt



* [gentoo-dev] Re: Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
                   ` (4 preceding siblings ...)
  2012-07-27  0:31 ` Gregory M. Turner
@ 2012-07-27 15:04 ` Michael Palimaka
  2012-07-31 14:48 ` [gentoo-dev] " "Paweł Hajdan, Jr."
  6 siblings, 0 replies; 17+ messages in thread
From: Michael Palimaka @ 2012-07-27 15:04 UTC
  To: gentoo-dev

Autodep[1][2] is a current implementation of this idea, with library 
hook and FUSE options.

Would definitely love to see more development in this area. :)

[1]: https://dev.gentoo.org/~neurogeek/guidexml/
[2]: http://git.overlays.gentoo.org/gitweb/?p=proj/autodep.git;a=summary




* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 22:35 ` Zac Medico
@ 2012-07-27 15:37   ` Rich Freeman
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Freeman @ 2012-07-27 15:37 UTC
  To: gentoo-dev

On Thu, Jul 26, 2012 at 6:35 PM, Zac Medico <zmedico@gentoo.org> wrote:
>
> It seems like you might need some kind of copy-on-write support, at
> least to run pkg_setup. Apparently cowbuilder uses cow hardlinks for
> that. Another way would be to use fiemap (cp --reflink).

Reflinks would be a much clearer implementation if you can assume
everything is on a single COW filesystem.

However, that seems like a bit of a strong restriction to have.
Cowbuilder seems to use hard links which are also limited to the same
filesystem, and it seems to use its own private build image besides.

I was thinking mainly in terms of giving limited visibility only to
those stages which should have it - the setup/postinst/etc phases
probably should have access to the real root.

A more ambitious undertaking would be to extend this to running
applications and not just building them. That is clearly beyond
portage (other than maybe maintaining the list of files requiring
runtime access), and would probably require either a namespace
extension to ld.so, use of MAC, or changes to the kernel itself.  One
implementation might be auto-creating SELinux policies at install time
based on declared RDEPENDS.

Ideally I'd love to see something like this be usable on an end-user
system - and not just be a QA tool.  Thanks to those who chimed in
with similar projects - glad to see some work already done in this
area.

Rich



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
                   ` (5 preceding siblings ...)
  2012-07-27 15:04 ` [gentoo-dev] " Michael Palimaka
@ 2012-07-31 14:48 ` "Paweł Hajdan, Jr."
  2012-07-31 14:55   ` Michael Mol
  6 siblings, 1 reply; 17+ messages in thread
From: "Paweł Hajdan, Jr." @ 2012-07-31 14:48 UTC
  To: gentoo-dev


On 7/26/12 8:26 PM, Rich Freeman wrote:
> I've been messing around with namespaces and some of what systemd has
> been doing with them, and I have an idea for a portage feature.
> 
> But before doing a brain dump of ideas, how useful would it be to have
> a FEATURE for portage to do a limited-visibility build?  That is, the
> build would be run in an environment where the root filesystem appears
> to contain everything in a DEPEND (including @system currently) and
> nothing else?

I was thinking about something similar too. In my opinion it's a great
feature. If/when there are any bugs to get this implemented, please let
me know.

A possible alternative implementation would be to make the sandbox deny
access to anything outside DEPEND. One totally crazy idea to make that
fast is to use extended attributes (portage would record which package a file
belongs to when merging the file). Another possible solution is using a
cache.




* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 14:48 ` [gentoo-dev] " "Paweł Hajdan, Jr."
@ 2012-07-31 14:55   ` Michael Mol
  2012-07-31 14:56     ` Ian Stakenvicius
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Mol @ 2012-07-31 14:55 UTC
  To: gentoo-dev

On Tue, Jul 31, 2012 at 10:48 AM, "Paweł Hajdan, Jr."
<phajdan.jr@gentoo.org> wrote:
> On 7/26/12 8:26 PM, Rich Freeman wrote:
>> I've been messing around with namespaces and some of what systemd has
>> been doing with them, and I have an idea for a portage feature.
>>
>> But before doing a brain dump of ideas, how useful would it be to have
>> a FEATURE for portage to do a limited-visibility build?  That is, the
>> build would be run in an environment where the root filesystem appears
>> to contain everything in a DEPEND (including @system currently) and
>> nothing else?
>
> I was thinking about something similar too. In my opinion it's a great
> feature. If/when there are any bugs to get this implemented, please let
> me know.
>
> A possible alternative implementation would be to make the sandbox deny
> access to anything outside DEPEND. One totally crazy idea to make that
> fast is to use extended attributes (portage would record which package a file
> belongs to when merging the file). Another possible solution is using a
> cache.

We already have the ability to run commands like 'equery b $somefile'
to map a file back to a package, so the data for a filesystem helper
should already be available in whatever database equery is using.

-- 
:wq



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 14:55   ` Michael Mol
@ 2012-07-31 14:56     ` Ian Stakenvicius
  2012-07-31 19:16       ` Rich Freeman
  2012-07-31 19:24       ` Michael Mol
  0 siblings, 2 replies; 17+ messages in thread
From: Ian Stakenvicius @ 2012-07-31 14:56 UTC
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 31/07/12 10:55 AM, Michael Mol wrote:
> On Tue, Jul 31, 2012 at 10:48 AM, "Paweł Hajdan, Jr." 
> <phajdan.jr@gentoo.org> wrote:
>> On 7/26/12 8:26 PM, Rich Freeman wrote:
>>> I've been messing around with namespaces and some of what
>>> systemd has been doing with them, and I have an idea for a
>>> portage feature.
>>> 
>>> But before doing a brain dump of ideas, how useful would it be
>>> to have a FEATURE for portage to do a limited-visibility build?
>>> That is, the build would be run in an environment where the
>>> root filesystem appears to contain everything in a DEPEND
>>> (including @system currently) and nothing else?
>> 
>> I was thinking about something similar too. In my opinion it's a
>> great feature. If/when there are any bugs to get this
>> implemented, please let me know.
>> 
>> A possible alternative implementation would be to make the
>> sandbox deny access to anything outside DEPEND. One totally crazy
>> idea to make that fast is to use extended attributes (portage would
>> record which package a file belongs to when merging the file).
>> Another possible solution is using a cache.
> 
> We already have the ability to run commands like 'equery b
> $somefile' to map a file back to a package, so the data for a
> filesystem helper should already be available in whatever database
> equery is using.
> 

Although that is true, it would be -WAY- too slow to generate said
list via equery/q* helpers; I think that's where the
extended-attributes and/or cache idea comes into play.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iF4EAREIAAYFAlAX8jUACgkQ2ugaI38ACPAm8wEAlfvF3KgQi5ZsH7FbCfALxOn0
hF9Y+vhH8I5Ki0NUbAYA/0uDWlPlx2RIpK8Z7B8E/n//Fuii8ZFppVC440g3djjT
=/xMA
-----END PGP SIGNATURE-----



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 14:56     ` Ian Stakenvicius
@ 2012-07-31 19:16       ` Rich Freeman
  2012-07-31 19:27         ` Michał Górny
  2012-07-31 19:24       ` Michael Mol
  1 sibling, 1 reply; 17+ messages in thread
From: Rich Freeman @ 2012-07-31 19:16 UTC
  To: gentoo-dev

On Tue, Jul 31, 2012 at 10:56 AM, Ian Stakenvicius <axs@gentoo.org> wrote:
>
> Although that is true, it would be -WAY- too slow to generate said
> list via equery/q* helpers; I think that's where the
> extended-attributes and/or cache idea comes into play.

I agree.  This needs to be high-performance when it comes to
individual file access.  If it takes 10 seconds per build to populate
some database or set up a bazillion bind mounts that isn't the end of
the world, but if it takes an extra 0.1 seconds every time a file is
read that could add up VERY fast on a large build.

Ideally I'd like to see the same thing extended to run-time, and short
of writing some entirely new security model into the kernel or taking
namespaces to a whole new level, part of me thinks that
auto-generating SELinux policies might be the solution, so that the
existing mechanism can be extended.

The mad scientist in me keeps thinking up crazy schemes so that
package collisions can be handled by each package just seeing whatever
it wants to see - maybe the entire filesystem looks different
depending on what app you use.  Then I realize that bash is an
application, and how on earth would a human make sense of a system
where no file has any stable identifier other than maybe a
content-hashed key.  Then that makes me wonder why we link to
libraries by filename anyway, when we could just give each library a
GUID and version, and maybe a more general identifier for cases where
you have alternate implementations.

But, as long as we're still just running Gentoo on Unix-like OSes
maybe tweaking the jail is a good place to start...

Rich



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 14:56     ` Ian Stakenvicius
  2012-07-31 19:16       ` Rich Freeman
@ 2012-07-31 19:24       ` Michael Mol
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Mol @ 2012-07-31 19:24 UTC
  To: gentoo-dev

On Tue, Jul 31, 2012 at 10:56 AM, Ian Stakenvicius <axs@gentoo.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 31/07/12 10:55 AM, Michael Mol wrote:
>> On Tue, Jul 31, 2012 at 10:48 AM, "Paweł Hajdan, Jr."
>> <phajdan.jr@gentoo.org> wrote:
>>> On 7/26/12 8:26 PM, Rich Freeman wrote:
>>>> I've been messing around with namespaces and some of what
>>>> systemd has been doing with them, and I have an idea for a
>>>> portage feature.
>>>>
>>>> But before doing a brain dump of ideas, how useful would it be
>>>> to have a FEATURE for portage to do a limited-visibility build?
>>>> That is, the build would be run in an environment where the
>>>> root filesystem appears to contain everything in a DEPEND
>>>> (including @system currently) and nothing else?
>>>
>>> I was thinking about something similar too. In my opinion it's a
>>> great feature. If/when there are any bugs to get this
>>> implemented, please let me know.
>>>
>>> A possible alternative implementation would be to make the
>>> sandbox deny access to anything outside DEPEND. One totally crazy
>>> idea to make that fast is to use extended attributes (portage would
>>> record which package a file belongs to when merging the file).
>>> Another possible solution is using a cache.
>>
>> We already have the ability to run commands like 'equery b
>> $somefile' to map a file back to a package, so the data for a
>> filesystem helper should already be available in whatever database
>> equery is using.
>>
>
> Although that is true, it would be -WAY- too slow to generate said
> list via equery/q* helpers; I think that's where the
> extended-attributes and/or cache idea comes into play.

Yeah, I was thinking you could use the equery database to initially
fill the cache. Spawning an equery instance for every file access
would be absolute madness. I have enough entropy problems on my
system.
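
For what it's worth, the data that equery and qlist read lives in the
installed-package database under /var/db/pkg, where each package
directory holds a CONTENTS file of "obj", "sym" and "dir" entries. A
minimal sketch of filling a path-to-package cache from it in one pass
follows. build_index is a hypothetical helper, and the parsing assumes
the usual "obj <path> <md5> <mtime>" and "sym <path> -> <target> <mtime>"
line layouts.

```shell
# Hypothetical helper: walk a package database rooted at $1 and print one
# "path<TAB>category/package-version" line per installed file or symlink.
build_index() {
    vdb=$1
    for contents in "$vdb"/*/*/CONTENTS; do
        [ -f "$contents" ] || continue
        pkgdir=${contents%/CONTENTS}
        pkg=${pkgdir#"$vdb"/}
        awk -v pkg="$pkg" '
            # Regular file: drop the type prefix and the md5/mtime suffix.
            $1 == "obj" { sub(/^obj /, ""); sub(/ [^ ]+ [^ ]+$/, "")
                          print $0 "\t" pkg; next }
            # Symlink: keep only the link path, not its target.
            $1 == "sym" { sub(/^sym /, ""); sub(/ -> .*$/, "")
                          print $0 "\t" pkg; next }
        ' "$contents"
    done
}
```

Running "build_index /var/db/pkg" once and saving the output would give
the cache its initial contents without spawning equery per file.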

-- 
:wq



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 19:16       ` Rich Freeman
@ 2012-07-31 19:27         ` Michał Górny
  2012-07-31 23:57           ` vivo75
  0 siblings, 1 reply; 17+ messages in thread
From: Michał Górny @ 2012-07-31 19:27 UTC
  To: gentoo-dev; +Cc: rich0


On Tue, 31 Jul 2012 15:16:34 -0400
Rich Freeman <rich0@gentoo.org> wrote:

> On Tue, Jul 31, 2012 at 10:56 AM, Ian Stakenvicius <axs@gentoo.org>
> wrote:
> >
> > Although that is true, it would be -WAY- too slow to generate said
> > list via equery/q* helpers; I think that's where the
> > extended-attributes and/or cache idea comes into play.
> 
> I agree.  This needs to be high-performance when it comes to
> individual file access.  If it takes 10 seconds per build to populate
> some database or set up a bazillion bind mounts that isn't the end of
> the world, but if it takes an extra 0.1 seconds every time a file is
> read that could add up VERY fast on a large build.

I'd be more worried about resources, and whether the kernel will
actually be able to handle a bazillion bind mounts. And if it can, whether
that won't actually cause more overhead than copying the whole system to
some kind of tmpfs.

> 
> Ideally I'd like to see the same thing extended to run-time, and short
> of writing some entirely new security model into the kernel or taking
> namespaces to a whole new level, part of me thinks that
> auto-generating SELinux policies might be the solution, so that the
> existing mechanism can be extended.
> 
> The mad scientist in me keeps thinking up crazy schemes so that
> package collisions can be handled by each package just seeing whatever
> it wants to see - maybe the entire filesystem looks different
> depending on what app you use.  Then I realize that bash is an
> application, and how on earth would a human make sense of a system
> where no file has any stable identifier other than maybe a
> content-hashed key.  Then that makes me wonder why we link to
> libraries by filename anyway, when we could just give each library a
> GUID and version, and maybe a more general identifier for cases where
> you have alternate implementations.
> 
> But, as long as we're still just running Gentoo on Unix-like OSes
> maybe tweaking the jail is a good place to start...
> 
> Rich
> 



-- 
Best regards,
Michał Górny



* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 19:27         ` Michał Górny
@ 2012-07-31 23:57           ` vivo75
  2012-08-07 18:05             ` Rich Freeman
  0 siblings, 1 reply; 17+ messages in thread
From: vivo75 @ 2012-07-31 23:57 UTC
  To: gentoo-dev; +Cc: Michał Górny, rich0

On 31/07/2012 21:27, Michał Górny wrote:
> On Tue, 31 Jul 2012 15:16:34 -0400
> Rich Freeman<rich0@gentoo.org>  wrote:
>
>> On Tue, Jul 31, 2012 at 10:56 AM, Ian Stakenvicius<axs@gentoo.org>
>> wrote:
>>> Although that is true, it would be -WAY- too slow to generate said
>>> list via equery/q* helpers; I think that's where the
>>> extended-attributes and/or cache idea comes into play.
>> I agree.  This needs to be high-performance when it comes to
>> individual file access.  If it takes 10 seconds per build to populate
>> some database or set up a bazillion bind mounts that isn't the end of
>> the world, but if it takes an extra 0.1 seconds every time a file is
>> read that could add up VERY fast on a large build.
> I'd be more worried about resources, and whether the kernel will
> actually be able to handle a bazillion bind mounts. And if it can, whether
> that won't actually cause more overhead than copying the whole system to
> some kind of tmpfs.
If testing shows that bind mounts are too heavy, we could resort to
LD_PRELOADing a library that filters access to the disk, or rework
sandbox to also hide some files without raising errors. With an
appropriate database (sys-apps/mlocate comes to mind), every access will
have negligible additional cost compared to that of rotational disks.
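
A rough sketch of the lookup side, assuming the per-build list of
visible paths has been written out once, one path per line. is_visible
and the allowlist file are hypothetical. A real sandbox hook would keep
the set in memory rather than call grep on every open, so this only
illustrates the policy, not the performance.

```shell
# Hypothetical policy check: succeed only if path $1 appears, as a whole
# line, in the allowlist file $2 generated at the start of the build.
is_visible() {
    # -F: fixed string, -x: match the entire line, -q: exit status only.
    grep -Fxq -- "$1" "$2"
}

# Example: a header from a declared dependency passes, anything else fails.
allow=$(mktemp)
printf '%s\n' /usr/include/stdio.h /usr/lib/libc.so > "$allow"
is_visible /usr/include/stdio.h "$allow" && echo "visible"
is_visible /usr/include/undeclared.h "$allow" || echo "hidden"
rm -f "$allow"
```

Matching whole lines means a listed file does not implicitly expose its
parent directory or sibling files.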
>> Ideally I'd like to see the same thing extended to run-time, and short
>> of writing some entirely new security model into the kernel or taking
>> namespaces to a whole new level, part of me thinks that
>> auto-generating SELinux policies might be the solution, so that the
>> existing mechanism can be extended.
>>
>> The mad scientist in me keeps thinking up crazy schemes so that
>> package collisions can be handled by each package just seeing whatever
>> it wants to see - maybe the entire filesystem looks different
>> depending on what app you use.  Then I realize that bash is an
>> application, and how on earth would a human make sense of a system
>> where no file has any stable identifier other than maybe a
>> content-hashed key.  Then that makes me wonder why we link to
>> libraries by filename anyway, when we could just give each library a
>> GUID and version, and maybe a more general identifier for cases where
>> you have alternate implementations.
>>
>> But, as long as we're still just running Gentoo on Unix-like OSes
>> maybe tweaking the jail is a good place to start...
>>
>> Rich
>>
>
>




* Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
  2012-07-31 23:57           ` vivo75
@ 2012-08-07 18:05             ` Rich Freeman
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Freeman @ 2012-08-07 18:05 UTC (permalink / raw
  To: vivo75@gmail.com; +Cc: gentoo-dev

On Tue, Jul 31, 2012 at 7:57 PM, vivo75@gmail.com <vivo75@gmail.com> wrote:
> Il 31/07/2012 21:27, Michał Górny ha scritto:
>> I'd be more afraid about resources, and whether the kernel will
>> actually be able to handle a bazillion bind mounts. And if so, whether
>> it won't actually cause more overhead than copying the whole system to
>> some kind of tmpfs.
>
> If testing shows that bind mounts are too heavy, we could resort to an
> LD_PRELOAD library that filters access to the disk,
> or rework sandbox to also hide some files without raising errors.
> With an appropriate database (sys-apps/mlocate comes to mind), every access
> will have negligible additional cost compared to that of rotational disks.

So, while I suspect that bind mount overhead won't actually be that
bad, I'm also thinking that extending the role of sandbox as has
already been suggested might be the simpler solution (and it works on
other kernels as well).  I'd still like a run-time solution some day,
but that would probably require SELinux and seems like a much more
ambitious project, and we'll probably get quite a bit of QA value out
of a sandbox solution.

I think the right solution is to not use external utilities unless
they can be linked in - at least not for anything running in sandbox.
We're most likely talking about a VERY high volume of file opens, and
we can't be spawning processes every time that happens, let alone
running bash scripts or whatever.

So, here is my design concept (which had a little help from my LUG - PLUG):

1.  At the start of the build, portage generates a list of files that
are legitimate dependencies - anything in DEPEND or @system.  This can
be done by parsing the /var/db/pkg files (I assume portage has some
internal API for this already).

2.  Portage or a helper program (whatever is fastest) calls stat on
each file to obtain the device and inode IDs.  Maybe long-term we
might consider caching these (but I'm not sure how stable they are).

3.  The list of valid device/inode IDs is passed to sandbox somehow
(maybe in a file).  Sandbox creates a data structure in memory
containing them for rapid access (btree or such).

4.  When sandbox intercepts a file open request, it checks the file
inode against the list and allows/denies accordingly.
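
A minimal sketch of steps 1-4 in Python, to make the data flow concrete.
Everything named here (parse_contents, build_allowed, is_allowed) is
hypothetical and is not portage or sandbox API.  It assumes the package
database CONTENTS files use lines of the form "dir <path>" and
"obj <path> <md5> <mtime>", and it uses a plain Python set where step 3
suggests a btree.  "sym" entries are skipped on purpose.

```python
import os


def parse_contents(text):
    """Step 1: yield installed paths from the text of one CONTENTS file.

    Assumed line shapes: 'dir <path>' and 'obj <path> <md5> <mtime>'.
    'sym' lines are skipped in this sketch.
    """
    for line in text.splitlines():
        kind, _, rest = line.partition(" ")
        if kind == "dir":
            yield rest
        elif kind == "obj":
            # The path may contain spaces; md5 and mtime are the last two fields.
            yield rest.rsplit(" ", 2)[0]


def build_allowed(paths):
    """Steps 2-3: stat each path and collect (device, inode) pairs in a set."""
    allowed = set()
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            continue  # listed in the database but missing on disk
        allowed.add((st.st_dev, st.st_ino))
    return allowed


def is_allowed(path, allowed):
    """Step 4: allow the open only if the file's inode is in the set."""
    try:
        st = os.stat(path)
    except OSError:
        return False
    return (st.st_dev, st.st_ino) in allowed
```

In real use the check would live inside sandbox's C wrappers, not in
Python; the sketch only shows that the per-open cost is one stat call
plus one set lookup.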

That said, after doing a quick pass at the sandbox source it seems
like it already is designed to restrict read access, but it uses
canonical filenames to do so. I'm not sure if those are going to be
reliable, especially if a filesystem contains bind mounts.  Since it
is already checking a read list, if we thought that mechanism would be
robust and fast, we could just remove SANDBOX_READ="/" from
/etc/sandbox.d/00default and then load in whatever we want afterwards.
I need to spend more time grokking the current source.  I'd think that
using inode numbers as a key would be faster than determining a
canonical file name on every file access, but if sandbox is already
doing the latter then obviously it isn't that much overhead.
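
To illustrate the concern about canonical filenames, here is a small
Python sketch comparing the two kinds of key.  A hard link stands in
for a bind mount, since it needs no root privileges; the assumption is
that both cases give one file two names that realpath does not collapse.
The helper names are made up for the example.

```python
import os
import tempfile


def name_key(path):
    """Key a file by its canonical filename."""
    return os.path.realpath(path)


def inode_key(path):
    """Key a file by its (device, inode) pair."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo():
    """Return (names_match, inodes_match) for two names of one file."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "libfoo.so")
        alias = os.path.join(d, "alias.so")
        with open(original, "w"):
            pass
        os.link(original, alias)  # second name for the same inode
        names_match = name_key(original) == name_key(alias)
        inodes_match = inode_key(original) == inode_key(alias)
        return names_match, inodes_match
```

With a name-based read list, the alias would have to be listed
separately; with an inode-based list, both names pass or fail together.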

The other thing I'm not sure about here is symlinks.  If a symlink is
contained in a dependency, but the linked file is not, can that file
be used by a package?  I suppose the reverse is also a concern - if a
file is accessed through a symlink that isn't part of a dependency,
but the file it is pointing to is, is that a problem?  I'm wondering
if there is any eselect logic that could cause problems here.  When
calling stat we can choose whether to dereference symlinks.
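
The choice between dereferencing and not dereferencing can be sketched
in Python with os.stat (follows symlinks) and os.lstat (does not).  The
file names and the one-entry allowed set are invented for the example;
which of the two answers sandbox should act on is exactly the open
question above.

```python
import os
import tempfile


def keys_for(path):
    """Return the (dev, ino) of the link itself and of what it points to."""
    link = os.lstat(path)    # does not dereference
    target = os.stat(path)   # dereferences
    return (link.st_dev, link.st_ino), (target.st_dev, target.st_ino)


def demo():
    """Check a symlink against a list that contains only its target."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "gcc-4.6")
        link = os.path.join(d, "gcc")
        with open(target, "w"):
            pass
        os.symlink(target, link)
        st = os.stat(target)
        allowed = {(st.st_dev, st.st_ino)}  # only the target is a dependency
        link_key, target_key = keys_for(link)
        return link_key in allowed, target_key in allowed
```

Keyed on lstat, the symlink is rejected even though its target is
allowed; keyed on stat, the access goes through and the symlink's own
ownership is never consulted.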



end of thread, other threads:[~2012-08-07 18:06 UTC | newest]

Thread overview: 17+ messages
2012-07-26 18:26 [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds Rich Freeman
2012-07-26 18:40 ` Michael Mol
2012-07-26 19:43   ` Rich Freeman
2012-07-26 18:59 ` Michael Orlitzky
2012-07-26 21:45 ` Alec Warner
2012-07-26 22:35 ` Zac Medico
2012-07-27 15:37   ` Rich Freeman
2012-07-27  0:31 ` Gregory M. Turner
2012-07-27 15:04 ` [gentoo-dev] " Michael Palimaka
2012-07-31 14:48 ` [gentoo-dev] " "Paweł Hajdan, Jr."
2012-07-31 14:55   ` Michael Mol
2012-07-31 14:56     ` Ian Stakenvicius
2012-07-31 19:16       ` Rich Freeman
2012-07-31 19:27         ` Michał Górny
2012-07-31 23:57           ` vivo75
2012-08-07 18:05             ` Rich Freeman
2012-07-31 19:24       ` Michael Mol
