public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Gentoo stats server/client @ 2009-06-21
@ 2009-06-21  1:26 Sebastian Pipping
  2009-06-21  3:34 ` Robin H. Johnson
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Pipping @ 2009-06-21  1:26 UTC
  To: gentoo-dev

I've been working on the first Gentoo-specific data collecting bytes
today. As smolt is written in Python, using Portage's Python API was an
easy choice.  Here's an excerpt of the data sets I've been working with
today, along with their processing status:


Collected and auto-filtered:

    -   gentoo_overlays
            list of installed overlays
    -   gentoo_global_use
    -   gentoo_global_keywords
            i.e. ACCEPT_KEYWORDS

Collected, auto-filtering to be done:

    -   gentoo_compile_flags
            i.e. CXXFLAGS + CFLAGS + LDFLAGS
    -   gentoo_mirrors


What do I mean by auto-filtering?  Auto-filtering works to protect the
user's privacy.  It's the process of comparing his local settings
against the knowledge base of the Gentoo system:  Every part of his
config that's outside of that larger set is stripped away, because
publishing that information could hurt his privacy.  To make this more
concrete:


For Overlays ..
    we filter out overlays not located below /usr/local/portage/layman/.

For global use flags ..
    we filter out stuff that's not described in
    /usr/portage/profiles/use.desc or
    /usr/portage/profiles/use.local.desc
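
A minimal sketch of the use-flag filtering described above, not the
actual smolt code. It assumes the usual line formats "flag - description"
(use.desc) and "category/package:flag - description" (use.local.desc);
the function names parse_use_desc and filter_use_flags are hypothetical.

```python
def parse_use_desc(lines, local=False):
    """Return the set of flag names documented in use.desc-style lines."""
    flags = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split(" - ", 1)[0].strip()
        if local:
            # use.local.desc entries look like "category/package:flag - text"
            name = name.split(":", 1)[-1]
        flags.add(name)
    return flags


def filter_use_flags(user_flags, known_flags):
    """Keep only flags (with an optional leading '-') that are publicly documented."""
    return [flag for flag in user_flags if flag.lstrip("-") in known_flags]


known = parse_use_desc(["# comment", "X - Add support for X11", "ssl - Add SSL support"])
known |= parse_use_desc(["app-misc/foo:bar - Enable bar"], local=True)
print(filter_use_flags(["X", "-ssl", "my-secret-flag", "bar"], known))
```

Anything the user invented locally never appears in either file, so it
is dropped before the data leaves the machine.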


If you would like to see the code of today in action grab gentoo.py from
http://git.goodpoint.de/?p=smolt-gentoo.git;a=blob_plain;f=client/distros/gentoo/gentoo.py;hb=b9742d88c8216b2989fba327bd2e34972c68dcb5
and run it through "python gentoo.py"



Sebastian






* Re: [gentoo-dev] Gentoo stats server/client @ 2009-06-21
  2009-06-21  1:26 [gentoo-dev] Gentoo stats server/client @ 2009-06-21 Sebastian Pipping
@ 2009-06-21  3:34 ` Robin H. Johnson
  2009-06-21 14:55   ` Sebastian Pipping
  2009-07-01  3:25   ` Sebastian Pipping
  0 siblings, 2 replies; 10+ messages in thread
From: Robin H. Johnson @ 2009-06-21  3:34 UTC
  To: gentoo-dev


This isn't meant to shoot stuff down, but more to suggest other places that
filtering is probably going to be needed, based on some "advanced" [1] usage of
Gentoo.

On Sun, Jun 21, 2009 at 03:26:56AM +0200, Sebastian Pipping wrote:
> What do I mean by auto-filtering?  Auto-filtering works to protect the
> user's privacy.  It's the process of comparing his local settings
> against the knowledge base of the Gentoo system:  Every part of his
> config that's outside of that larger set is stripped away, because
> publishing that information could hurt his privacy.  To make this more
> concrete:
I really need to get around to publishing one of my sekrit projects,
"managed-portage", which I might as well start to describe here, as it's nearly
ready. It's not so much a direct codebase for use, but a guideline on how to
manage sets of machines that may match in certain dimensions only: location,
purpose, hardware type [2]

The entire managed-portage system works with stacked profiles, and
various degrees of partial inheritance, so machines can end up with very
different views of the package trees.

Relevant to this, I might not want to disclose my profile inheritance
tree. Here's one of them for you:
/etc/make.profile
/etc/managed-portage/hosts/build_webdb/make.profile
/etc/managed-portage/common/post/make.profile
/etc/managed-portage/class/webdb/make.profile
/etc/managed-portage/class/db/make.profile
/etc/managed-portage/class/web/make.profile
/etc/managed-portage/common/pre/make.profile
/etc/managed-portage/location/surrey/make.profile
/etc/managed-portage/hwtype/nehalem/make.profile
/usr/portage/profiles/default/linux/amd64/2008.0

> For Overlays ..
>     we filter out overlays not located below /usr/local/portage/layman/.
This is going to be fail.
1. That's not the only location used for layman.
- At home: /code/gentoo/layman/ 
- At work: /usr/local/portage-layman/
- Gentoo Infra: /usr/portage/local/layman/

2. Just because an overlay is distributed by layman does NOT mean that
   it's safe to disclose its existence. Within Gentoo infra, we do
   this in layman.cfg:
overlays  : http://www.gentoo.org/proj/en/overlays/layman-global.txt
            file:///etc/layman/infra-overlays.xml

While I don't mind disclosing the list of overlays we have in infra,
other large-scale use of layman might not be happy to disclose it.
If it came from the layman-global.txt, sure, it might be ok, but see if there's
a way to filter out others.

3. For one of my work overlays, we have a custom category called
   'ih-int', for our internal ebuilds (some just meta ebuild, others
   full applications). I might not want to disclose just those package names.

Footnotes:
[1] 
By "advanced", I mean stuff that I haven't seen used by many users, but have
seen in large-scale business usage of Gentoo.

[2]
Hardware type is very fine-grained for my use:
- Usually pairs of motherboard+cpu combinations.
- Multiple generations of Opterons.
- Multiple generations of Xeons.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85



* Re: [gentoo-dev] Gentoo stats server/client @ 2009-06-21
  2009-06-21  3:34 ` Robin H. Johnson
@ 2009-06-21 14:55   ` Sebastian Pipping
  2009-06-21 20:55     ` Sebastian Pipping
                       ` (2 more replies)
  2009-07-01  3:25   ` Sebastian Pipping
  1 sibling, 3 replies; 10+ messages in thread
From: Sebastian Pipping @ 2009-06-21 14:55 UTC
  To: gentoo-dev

First, thanks for sharing your concerns and setup bits.
That's the right thing at the right time.



Robin H. Johnson wrote:
> Relevant to this, I might not want to disclose my profile inheritance
> tree. Here's one of them for you:
> /etc/make.profile
> /etc/managed-portage/hosts/build_webdb/make.profile
> /etc/managed-portage/common/post/make.profile
> /etc/managed-portage/class/webdb/make.profile
> /etc/managed-portage/class/db/make.profile
> /etc/managed-portage/class/web/make.profile
> /etc/managed-portage/common/pre/make.profile
> /etc/managed-portage/location/surrey/make.profile
> /etc/managed-portage/hwtype/nehalem/make.profile
> /usr/portage/profiles/default/linux/amd64/2008.0

Which of these is the target of the /etc/make.profile link?
The last one?  My current approach resolves the soft link and
cuts off the profiles dir prefix.  So in case it's the last for
you that would be

  default/linux/amd64/2008.0

To auto-filter profiles, would parsing profiles.desc work?
Would a synced CVS checkout of
<http://sources.gentoo.org/viewcvs.py/gentoo-x86/profiles/>
give anything more that I could or should use?


>> For Overlays ..
>>     we filter out overlays not located below /usr/local/portage/layman/.
> This is going to be fail.
> 1. That's not the only location used for layman.
> - At home: /code/gentoo/layman/ 
> - At work: /usr/local/portage-layman/
> - Gentoo Infra: /usr/portage/local/layman/
> 
> 2. Just because an overlay is distributed by layman does NOT mean that
>    it's safe to disclose the existence of, within Gentoo infra, we do
>    this in layman.cfg:
> overlays  : http://www.gentoo.org/proj/en/overlays/layman-global.txt
>             file:///etc/layman/infra-overlays.xml

I see.  How about this approach instead:

- Get list of overlays from layman-global.txt, through either

  A) Download and keep a snapshot of layman-global.txt in sync ourselves

  B) Use heuristic on layman's cache

     - Resolve ${cache} from /etc/layman/layman.cfg

     - Parse all ${cache}/cache_*.xml files using the Layman API

     - Compare the list of overlays each file provides against
       a hardcoded snapshot of overlay names ("akoya alexxy arcon ..")

     - Assume the file with the highest count of matches to be
       layman-global.txt if the count is >=50% of the number of
       hardcoded overlays

- Take the official tree and global overlays (overlays from
  layman-global.txt) into account for statistics

  - Resolve ${storage} from /etc/layman/layman.cfg

  - Include ebuilds from ${storage}/{global,overlays,here} and
    /usr/portage/
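
Resolving ${storage} and ${cache} could look roughly like the sketch
below. It uses Python's standard configparser rather than the Layman
API mentioned above, and it assumes that layman.cfg keeps its settings
in a [MAIN] section using "key : value" lines with %(name)s
interpolation. SAMPLE_CFG and resolve_layman_paths are hypothetical
names; the sample values are made up for illustration.

```python
import configparser

# Illustrative stand-in for the contents of /etc/layman/layman.cfg (assumed layout).
SAMPLE_CFG = """\
[MAIN]
storage    : /usr/local/portage/layman
cache      : %(storage)s/cache
local_list : %(storage)s/overlays.xml
"""


def resolve_layman_paths(cfg_text):
    """Return the interpolated storage, cache and local_list settings."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    main = parser["MAIN"]
    # Indexing a section applies %(name)s interpolation automatically.
    return {key: main[key] for key in ("storage", "cache", "local_list")}


print(resolve_layman_paths(SAMPLE_CFG))
```

Reading the paths from the config instead of hardcoding
/usr/local/portage/layman/ covers the alternative layman locations
listed earlier in the thread.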

What it does not catch is people putting their own ebuilds
right into the main tree.  As they lose them all on the next sync,
are we safe to assume that no one really does that?
If not, are there alternatives to comparing to a synced checkout
of gentoo-x86 (either rsync or CVS)?

Any concerns or ideas for improvement?


> While I don't mind disclosing the list of overlays we have in infra,
> other large-scale use of layman might not be happy to disclose it.

Agreed.


> 3. For one of my work overlays, we have a custom category called
>    'ih-int', for our internal ebuilds (some just meta ebuild, others
>    full applications). I might not want to disclose just those package names.

Right.  With the approach described above the whole overlay is ignored.



Sebastian




* Re: [gentoo-dev] Gentoo stats server/client @ 2009-06-21
  2009-06-21 14:55   ` Sebastian Pipping
@ 2009-06-21 20:55     ` Sebastian Pipping
  2009-06-21 21:43     ` [gentoo-dev] " Duncan
  2009-06-21 22:09     ` [gentoo-dev] " Robin H. Johnson
  2 siblings, 0 replies; 10+ messages in thread
From: Sebastian Pipping @ 2009-06-21 20:55 UTC
  To: gentoo-dev; +Cc: wrobel

Sebastian Pipping wrote:
>   A) Download and keep a snapshot of layman-global.txt in sync ourselves
> 
>   B) Use heuristic on layman's cache
> 
>      - Resolve ${cache} from /etc/layman/layman.cfg
> 
>      - Parse all ${cache}/cache_*.xml files using the Layman API
> 
>      - Compare the list of overlays each file provides against
>        a hardcoded snapshot of overlay names ("akoya alexxy arcon ..")
> 
>      - Assume the file with the highest count of matches for
>        layman-global.txt if the count is >=50 of the number hardcoded
>        overlays

forget about (B).  as we're dealing with privacy a 100% approach
is much better than some heuristic.

i'll keep an extra copy of layman-global.txt in sync and
use modification timestamps on any file in layman's cache dir
as trigger to re-sync with layman-global.txt from the web.
that's an optimized version of (A) to me.
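
The timestamp trigger described above might be sketched as follows.
This is only an illustration under assumptions: the snapshot is a local
file, the cache is a flat directory, and the names newest_mtime and
snapshot_is_stale are hypothetical. The actual download step is left
out.

```python
import os


def newest_mtime(directory):
    """Return the newest modification time of any regular file in directory.

    Returns 0.0 when the directory contains no regular files.
    """
    times = [
        entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file()
    ]
    return max(times, default=0.0)


def snapshot_is_stale(cache_dir, snapshot_path):
    """True if the local layman-global.txt copy should be fetched again."""
    if not os.path.exists(snapshot_path):
        return True  # no snapshot yet, so a first download is needed
    # Any cache file newer than the snapshot means layman has synced since.
    return newest_mtime(cache_dir) > os.path.getmtime(snapshot_path)
```

The client would call snapshot_is_stale() before each run and download
layman-global.txt again only when it returns True.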



sebastian




* [gentoo-dev]  Re: Gentoo stats server/client @ 2009-06-21
  2009-06-21 14:55   ` Sebastian Pipping
  2009-06-21 20:55     ` Sebastian Pipping
@ 2009-06-21 21:43     ` Duncan
  2009-07-01  2:54       ` Sebastian Pipping
  2009-06-21 22:09     ` [gentoo-dev] " Robin H. Johnson
  2 siblings, 1 reply; 10+ messages in thread
From: Duncan @ 2009-06-21 21:43 UTC
  To: gentoo-dev

Sebastian Pipping <webmaster@hartwork.org> posted
4A3E49C6.5070804@hartwork.org, excerpted below, on  Sun, 21 Jun 2009
16:55:02 +0200:

> What it does not catch is people putting their own ebuilds right into
> the main tree.  As they lose them all on the next sync are we safe to
> assume that no one really does that? If not are there alternatives to
> comparing to a synced checkout of gentoo-x86 (either rsync or CVS)?

Note that one can set PORTAGE_RSYNC_EXTRA_OPTS in make.conf, with the 
contents being added to the normal rsync command.  Looking at the rsync 
manpage, there's the --exclude-from=/path/to/exclude.file option.  That 
file is then examined for a list of files and directories to exclude from 
the rsync.  AFAIK there are other command-line options that accomplish 
the same thing, but listing the exclusions directly on the command line.

It is thus possible to exclude any directory or file that would otherwise 
be synced.  I use that here to exclude my src dir, as I prefer that name 
to the Gentoo standard distdir.  However, it should be obvious that 
anything a sysadmin wishes to add to the exclude list will then not be 
synced, and it'd be entirely possible for a sysadmin to decide to use 
that to store their own ebuilds directly in the tree.
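
A stats client could at least detect this situation by looking at the
value of PORTAGE_RSYNC_EXTRA_OPTS. The sketch below assumes the value
has already been read from make.conf as a string; rsync_excludes is a
hypothetical helper, and only the --exclude and --exclude-from options
are handled.

```python
import shlex


def rsync_excludes(extra_opts):
    """Return (patterns, files) named by --exclude / --exclude-from options."""
    patterns, files = [], []
    tokens = shlex.split(extra_opts)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        # Check the longer option name first so it is not mistaken for --exclude.
        for option, bucket in (("--exclude-from", files), ("--exclude", patterns)):
            if token.startswith(option + "="):
                bucket.append(token[len(option) + 1:])
                break
            if token == option and i + 1 < len(tokens):
                bucket.append(tokens[i + 1])  # value given as a separate word
                i += 1
                break
        i += 1
    return patterns, files


print(rsync_excludes("--timeout=180 --exclude-from=/etc/portage/rsync_excludes"))
```

If either list is non-empty, the main tree may contain files that never
came from the public rsync mirrors.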

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* Re: [gentoo-dev] Gentoo stats server/client @ 2009-06-21
  2009-06-21 14:55   ` Sebastian Pipping
  2009-06-21 20:55     ` Sebastian Pipping
  2009-06-21 21:43     ` [gentoo-dev] " Duncan
@ 2009-06-21 22:09     ` Robin H. Johnson
  2009-06-21 23:36       ` [gentoo-dev] " Duncan
  2009-07-01  3:12       ` [gentoo-dev] " Sebastian Pipping
  2 siblings, 2 replies; 10+ messages in thread
From: Robin H. Johnson @ 2009-06-21 22:09 UTC
  To: gentoo-dev


On Sun, Jun 21, 2009 at 04:55:02PM +0200, Sebastian Pipping wrote:
> Robin H. Johnson wrote:
> > Relevant to this, I might not want to disclose my profile inheritance
> > tree. Here's one of them for you:
> > /etc/make.profile
> > /etc/managed-portage/hosts/build_webdb/make.profile
> > /etc/managed-portage/common/post/make.profile
> > /etc/managed-portage/class/webdb/make.profile
> > /etc/managed-portage/class/db/make.profile
> > /etc/managed-portage/class/web/make.profile
> > /etc/managed-portage/common/pre/make.profile
> > /etc/managed-portage/location/surrey/make.profile
> > /etc/managed-portage/hwtype/nehalem/make.profile
> > /usr/portage/profiles/default/linux/amd64/2008.0
> Which of these is the target of the /etc/make.profile link?
> The last one?  My current approach resolves the soft link and
> cuts of the profiles dir prefix.  So in case it's the last for
> you that would be

$MP = /etc/managed-portage.

There is a symlink right now, but there might not be in future.
/etc/make.profile/parents -> $MP/hosts/build_webdb/make.profile

$MP/hosts/build_webdb/make.profile/parents:
$MP/common/pre/make.profile
$MP/location/surrey/make.profile
$MP/class/webdb/make.profile
$MP/hwtype/nehalem/make.profile
$MP/common/post/make.profile

$MP/class/webdb/make.profile/parents:
$MP/class/db/make.profile
$MP/class/web/make.profile

$MP/hwtype/nehalem/make.profile/parents:
/usr/portage/profiles/default/linux/amd64/2008.0

The following have no parents:
$MP/class/db/make.profile
$MP/class/web/make.profile
$MP/common/pre/make.profile
$MP/common/post/make.profile
$MP/location/surrey/make.profile

> To auto-filter profiles would parsing profiles.desc work?
> Would a synced CVS checkout of
> <http://sources.gentoo.org/viewcvs.py/gentoo-x86/profiles/>
> give anything more that I could or should use?
I'm wondering how profiles should be reported. Rather than just the
endpoint, I'm thinking that we should resolve them and generate a list,
like the above, then explicitly whiteout the non-public ones.
So in the above, you'd report:
===
(censored) X 13
default/linux/amd64/2008.0
===

The resolving can be terminated at each profile that is listed in
profiles.desc, so you can just report default/linux/amd64/2008.0 and not
all the profiles that make that up.
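
The resolve-and-whiteout idea might be sketched like this. It is an
illustration only: the parents dict stands in for reading each
profile's parents file, public_profiles stands in for the entries of
profiles.desc, and the function name report_profiles is hypothetical.

```python
def report_profiles(start, parents, public_prefix, public_profiles):
    """Walk the profile inheritance graph and return a reportable list.

    Profiles listed in public_profiles are reported by name and not
    resolved any further; everything else is reported as "(censored)".
    """
    report = []
    seen = set()

    def visit(profile):
        if profile in seen:
            return  # guard against cycles and repeated parents
        seen.add(profile)
        if profile.startswith(public_prefix):
            name = profile[len(public_prefix):]
            if name in public_profiles:
                report.append(name)  # public: report it and stop descending
                return
        report.append("(censored)")
        for parent in parents.get(profile, []):
            visit(parent)

    visit(start)
    return report


MP = "/etc/managed-portage"
parents = {
    MP + "/hosts/build_webdb/make.profile": [
        MP + "/common/pre/make.profile",
        MP + "/location/surrey/make.profile",
        MP + "/class/webdb/make.profile",
        MP + "/hwtype/nehalem/make.profile",
        MP + "/common/post/make.profile",
    ],
    MP + "/class/webdb/make.profile": [
        MP + "/class/db/make.profile",
        MP + "/class/web/make.profile",
    ],
    MP + "/hwtype/nehalem/make.profile": [
        "/usr/portage/profiles/default/linux/amd64/2008.0",
    ],
}
report = report_profiles(
    MP + "/hosts/build_webdb/make.profile",
    parents,
    "/usr/portage/profiles/",
    {"default/linux/amd64/2008.0"},
)
print(report)
```

Only the position and number of private profiles is disclosed, never
their paths.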


> I see.  How about this approach instead:
> - Get list of overlays from layman-global.txt, through either
>   A) Download and keep a snapshot of layman-global.txt in sync ourselves
Just A, per your other email.

> - Take the official tree and globa overlays (overlays from
>   layman-global.txt) into  account for statistics
> 
>   - Resolve ${storage} from /etc/layman/layman.cfg
> 
>   - Include ebuilds from ${storage}/{global,overlays,here} and
>     /usr/portage/
> 
> What it does not catch is people putting their own ebuilds
> right into the main tree.  As they lose them all on the next sync
> are we safe to assume that no one really does that?
> If not are there alternatives to comparing to a synced checkout
> of gentoo-x86 (either rsync or CVS)?
Why does the raw content of the trees matter?
I can see the source (which tree) of a given package that is already
installed mattering, but not the raw content of the tree.

> Any concerns or ideas for improvement?
/usr/portage might NOT be from the public rsync.
- Many devs have it straight from CVS.
- Infra has it stripped of a lot of GUI packages (like gnome, kde etc).

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85



* [gentoo-dev]  Re: Gentoo stats server/client @ 2009-06-21
  2009-06-21 22:09     ` [gentoo-dev] " Robin H. Johnson
@ 2009-06-21 23:36       ` Duncan
  2009-07-01  3:12       ` [gentoo-dev] " Sebastian Pipping
  1 sibling, 0 replies; 10+ messages in thread
From: Duncan @ 2009-06-21 23:36 UTC
  To: gentoo-dev

"Robin H. Johnson" <robbat2@gentoo.org> posted
robbat2-20090621T215006-739572854Z@orbis-terrarum.net, excerpted below, on
 Sun, 21 Jun 2009 15:09:41 -0700:

>> Any concerns or ideas for improvement?
> /usr/portage might NOT be from the public rsync. - Many devs have it
> straight from CVS. - Infra has it stripped of a lot of GUI packages
> (like gnome, kde etc).

Plus, the PORTDIR setting in /etc/make.conf controls where portage looks 
for the tree.  It doesn't have to be /usr/portage.  Here, it's /p (which 
is a symlink to /str/portage).  Now I do happen to have /usr/portage as a 
symlink to /p, just in case something stupid still has the location hard-
coded, but not everyone will have a /usr/portage at all, and in theory, 
it could in fact be a private overlay or even a user's private home dir.

Of course there's also the small detail that not everyone even uses 
portage, as there are two other PMS (Package Manager Specification) 
compatible package managers in the tree, paludis and pkgcore.  But 
portage is still the official Gentoo package manager, and support for the 
others can always be added later, if desired.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* Re: [gentoo-dev]  Re: Gentoo stats server/client @ 2009-06-21
  2009-06-21 21:43     ` [gentoo-dev] " Duncan
@ 2009-07-01  2:54       ` Sebastian Pipping
  0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Pipping @ 2009-07-01  2:54 UTC
  To: gentoo-dev

Duncan wrote:
> Note that one can set PORTAGE_RSYNC_EXTRA_OPTS in make.conf, with the 
> contents being added to the normal rsync command.  Looking at the rsync 
> manpage, there's the --exclude-from=/path/to/exclude.file option.
> [..]
> However, it should be obvious that 
> anything a sysadmin wishes to add to the exclude list will then not be 
> synced, and it'd be entirely possible for a sysadmin to decide to use 
> that to store their own ebuilds directly in the tree.

Thanks for pointing that out; added to my todo list.



Sebastian





* Re: [gentoo-dev] Gentoo stats server/client @ 2009-06-21
  2009-06-21 22:09     ` [gentoo-dev] " Robin H. Johnson
  2009-06-21 23:36       ` [gentoo-dev] " Duncan
@ 2009-07-01  3:12       ` Sebastian Pipping
  1 sibling, 0 replies; 10+ messages in thread
From: Sebastian Pipping @ 2009-07-01  3:12 UTC
  To: gentoo-dev

Robin H. Johnson wrote:
> I'm wondering how profiles should be reported. Rather than just the
> endpoint, I'm thinking that we should resolve them and generate a list,
> like the above, then explicitly whiteout the non-public ones.
> So in the above, you'd report:
> ===
> (censored) X 13
> default/linux/amd64/2008.0
> ===
> 
> The resolving can be terminated at each profile that is listed in
> profiles.desc, so you can just report default/linux/amd64/2008.0 and not
> all the profiles that make that up.

I'm not sure about that.  It feels like overkill to me.
I think for now I'll stick to

 1) $p = resolve the symlink
 2) match $p against /usr/portage/profiles/profiles.desc
 3) use $p on match, ~"custom" else
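
The three steps above can be sketched as follows. This is an
illustration under assumptions, not the smolt code: profiles.desc lines
are taken to have the form "arch profile status" with "#" comments, and
the names public_profiles and reportable_profile are hypothetical.

```python
import os


def public_profiles(profiles_desc_lines):
    """Return the profile paths listed in profiles.desc-style lines."""
    names = set()
    for line in profiles_desc_lines:
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) >= 2:
            names.add(fields[1])  # second column is the profile path
    return names


def reportable_profile(link, profiles_dir, known):
    """Return the public profile name behind link, or "custom"."""
    target = os.path.realpath(link)           # 1) resolve the symlink
    base = os.path.realpath(profiles_dir)
    name = os.path.relpath(target, base)      # cut off the profiles dir prefix
    if name in known:                         # 2) match against profiles.desc
        return name                           # 3) use it on a match ...
    return "custom"                           #    ... "custom" otherwise
```

A call such as reportable_profile("/etc/make.profile",
"/usr/portage/profiles", known) would then yield either a name from
profiles.desc or the placeholder, so private profile paths are never
reported.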



Sebastian





* Re: [gentoo-dev] Gentoo stats server/client @ 2009-06-21
  2009-06-21  3:34 ` Robin H. Johnson
  2009-06-21 14:55   ` Sebastian Pipping
@ 2009-07-01  3:25   ` Sebastian Pipping
  1 sibling, 0 replies; 10+ messages in thread
From: Sebastian Pipping @ 2009-07-01  3:25 UTC
  To: gentoo-dev

Robin H. Johnson wrote:
> 1. That's not the only location used for layman.
> - At home: /code/gentoo/layman/ 
> - At work: /usr/local/portage-layman/
> - Gentoo Infra: /usr/portage/local/layman/
> 
> 2. Just because an overlay is distributed by layman does NOT mean that
>    it's safe to disclose the existence of, within Gentoo infra, we do
>    this in layman.cfg:
> overlays  : http://www.gentoo.org/proj/en/overlays/layman-global.txt
>             file:///etc/layman/infra-overlays.xml
> 
> While I don't mind disclosing the list of overlays we have in infra,
> other large-scale use of layman might not be happy to disclose it.
> If it came from the layman-global.txt, sure, it might be ok, but see if there's
> a way to filter out others.

I have implemented a more sophisticated algorithm by now:

  def is_global(overlay_location):
      name = overlay_name(overlay_location)
      return overlay_location.startswith(layman_storage_path) \
          and same_repository(
              available_installed_overlay_dict[name],
              global_overlay_dict[name])

The dictionaries in the last two lines are maps from name to url:
- available_installed_overlay_dict = from %(local_list)s
- global_overlay_dict              = from layman-global.txt
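
For readers without the repository at hand, here is a self-contained
sketch of the same check. The helpers overlay_name and same_repository
are not shown in this mail, so the versions below are guesses (directory
basename, and URL comparison ignoring a trailing slash); the example
URLs are made up. Unlike the snippet above, this variant takes the
dictionaries as parameters and treats a missing name as "not global".

```python
import os


def overlay_name(overlay_location):
    """Guess: the overlay name is the last path component."""
    return os.path.basename(overlay_location.rstrip("/"))


def same_repository(url_a, url_b):
    """Guess: two URLs match if they differ at most by a trailing slash."""
    return url_a.rstrip("/") == url_b.rstrip("/")


def is_global(overlay_location, storage_path, installed, global_overlays):
    """True if the overlay lives under layman's storage and matches the global list."""
    name = overlay_name(overlay_location)
    if not overlay_location.startswith(storage_path.rstrip("/") + "/"):
        return False  # not managed by layman at all
    if name not in installed or name not in global_overlays:
        return False  # unknown locally or not in layman-global.txt
    return same_repository(installed[name], global_overlays[name])


storage = "/usr/local/portage/layman"
installed = {
    "sunrise": "git://example.org/sunrise.git/",
    "secret": "git://example.org/secret.git",
}
global_overlays = {"sunrise": "git://example.org/sunrise.git"}
print(is_global(storage + "/sunrise", storage, installed, global_overlays))
print(is_global(storage + "/secret", storage, installed, global_overlays))
```

Comparing URLs as well as names means a private overlay that merely
reuses a public overlay's name is still treated as private.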

Please let me know what you think about that approach.
To check it out live grab the two python files from
http://git.goodpoint.de/?p=smolt-gentoo.git;a=tree;f=client/distros/gentoo;hb=refs/heads/gentoo
and run

  $ python gentoo.py | fgrep 'OVERLAYS'

For my current setup it prints

  NUMBER_OF_PRIVATE_OVERLAYS = 2
  NUMBER_OF_PUBLIC_OVERLAYS = 7
  OVERLAYS = ['toolchain', 'rbu', 'sunrise', 'zugaina',
    'berkano', 'python-testing', 'python-experimental']



Sebastian





