public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Gentoo package statistics -- GSoC 2011
@ 2011-06-08 14:36 Vikraman
  2011-06-08 15:19 ` "Paweł Hajdan, Jr."
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Vikraman @ 2011-06-08 14:36 UTC
  To: gentoo-dev; +Cc: antarus, chris

Hi everyone,

I'm working on the `Package statistics` project this year. Till now, I
have managed to write a client and server[0] to collect the following
information from hosts:

* Uname, portage profile, timestamp of portage tree
* ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
* ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
* Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
  and Build time for each installed package

Is there a need to collect files installed by a package? Doesn't PFL[1]
already provide that?

Please provide some feedback on what other data should be collected, etc.

Also, I'm starting work on the webUI, and would like some
recommendations for stats pages, such as:

* Packages installed sorted by users
* Top arches, keywords, profiles
* Most enabled, disabled useflags per package/globally

[0]
http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
[1] http://www.portagefilelist.de/index.php/Main_Page

-- 
Vikraman


* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 14:36 [gentoo-dev] Gentoo package statistics -- GSoC 2011 Vikraman
@ 2011-06-08 15:19 ` "Paweł Hajdan, Jr."
  2011-06-08 15:48   ` Hans de Graaff
                     ` (2 more replies)
  2011-06-08 15:19 ` Gilles Dartiguelongue
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 13+ messages in thread
From: "Paweł Hajdan, Jr." @ 2011-06-08 15:19 UTC
  To: gentoo-dev

On 6/8/11 4:36 PM, Vikraman wrote:
> I'm working on the `Package statistics` project this year. Till now, I
> have managed to write a client and server[0] to collect the following
> information from hosts:

Excellent, good luck with the idea! I think that better information
about how Gentoo is actually used will greatly help improve it.

> Is there a need to collect files installed by a package ? Doesn't PFL[1]
> already provide that ?

Well, PFL is not an official Gentoo project. It might be useful, but I
wouldn't say it's a priority.

> Please provide some feedback on what other data should be collected, etc.

In my opinion it's *not* about collecting as much data as possible. I
think it's most important to get the core functionality working really
well, and to convince as large a percentage of users as possible to enable
reporting the statistics (to make the results - hopefully - accurately
represent the user base). Please note that in some cases it may mean
collecting _less_ data, or thinking more about the privacy of the users.

For me, as a developer, even a list of packages sorted by popularity
(aka Debian/Ubuntu popcon) would be very useful.

Ah, and maybe files in /etc/portage: package.keywords and so on. It
could be useful to see what people are masking/unmasking, that may be an
indication of stale stabilizations or brokenness hitting the tree.
Anyway, I'd call it an enhancement.

> Also, I'm starting work on the webUI, and would like some
> recommendations for stats pages, such as:
> 
> * Packages installed sorted by users

Cool!

> * Top arches, keywords, profiles

And percentage of ~arch vs arch users?

> * Most enabled, disabled useflags per package/globally

Also great, especially the per-package variant. It would also be useful to
have per-profile data, to better tune the profile defaults.

> [0]
> http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02

I took a quick look at the code. Some random comments:

- it uses the portage Python API a lot. But that API is not stable, or at
least not guaranteed to be stable. Have you considered using helpers like
portageq (or eventually enhancing those helpers)?

- make the licensing super-clear (a LICENSE file, possibly some header
in every source file, and so on)

- how about submitting the data over HTTPS and not HTTP to better help
privacy?

- don't leave exception handling as a TODO; it should be a part of your
design, not an afterthought

- instead of or in addition to the setup.txt file, how about just
writing the real setup.py file for distutils?
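
To illustrate the portageq suggestion in the list above, here is a minimal
Python sketch that shells out to `portageq envvar` instead of importing the
portage module. The function names (`run_portageq`, `query_envvar`,
`collect_envvars`) and the injectable `runner` parameter are hypothetical and
are not taken from the gentoostats code; only the `portageq envvar NAME`
invocation itself is an existing command.

```python
import subprocess


def run_portageq(args):
    """Run portageq with the given argument list and return its stdout."""
    result = subprocess.run(
        ["portageq", *args],
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout


def query_envvar(name, runner=run_portageq):
    """Return the value of one Portage variable such as ARCH or CHOST.

    `runner` is a hypothetical hook so the function can be exercised
    without a Portage installation.
    """
    return runner(["envvar", name]).strip()


def collect_envvars(names, runner=run_portageq):
    """Return a dict mapping each requested variable name to its value."""
    return {name: query_envvar(name, runner) for name in names}
```

On a Gentoo host, `collect_envvars(["ARCH", "CHOST", "CFLAGS"])` would query
each variable through the helper; a failing portageq call surfaces as
`subprocess.CalledProcessError`, which the caller has to handle explicitly.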



* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 14:36 [gentoo-dev] Gentoo package statistics -- GSoC 2011 Vikraman
  2011-06-08 15:19 ` "Paweł Hajdan, Jr."
@ 2011-06-08 15:19 ` Gilles Dartiguelongue
  2011-06-08 17:35 ` Николай Антонов
  2011-06-10 23:10 ` Sebastian Pipping
  3 siblings, 0 replies; 13+ messages in thread
From: Gilles Dartiguelongue @ 2011-06-08 15:19 UTC
  To: gentoo-dev; +Cc: antarus, chris

Wasn't there a project like this a couple of years ago which tried to
use a cross-distro tool?

-- 
Gilles Dartiguelongue <eva@gentoo.org>
Gentoo





* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 15:19 ` "Paweł Hajdan, Jr."
@ 2011-06-08 15:48   ` Hans de Graaff
  2011-06-08 18:01   ` Vikraman
  2011-06-08 18:28   ` Donnie Berkholz
  2 siblings, 0 replies; 13+ messages in thread
From: Hans de Graaff @ 2011-06-08 15:48 UTC
  To: gentoo-dev

On Wed, 2011-06-08 at 17:19 +0200, "Paweł Hajdan, Jr." wrote:

> In my opinion it's *not* about collecting as much data as possible. I
> think it's most important to get the core functionality working really
> well, and convincing as large percentage of users as possible to enable
> reporting the statistics (to make the results - hopefully - accurately
> represent the user base). Please note that in some cases it may mean
> collecting _less_ data, or thinking more about the privacy of the users.

+1 on this. Taking it to the extreme, I'd rather see a properly implemented
architecture that is installed on >50% of Gentoo systems and reports only
the arch than something that collects a lot more data and is
installed on 50 machines. Once the framework is in place and there is
user uptake, it is easy to slowly extend the statistics collection
and gather more useful data.

> For me, as a developer, even a list of packages sorted by popularity
> (aka Debian/Ubuntu popcon) would be very useful.

That would be useful.

> Ah, and maybe files in /etc/portage: package.keywords and so on. It
> could be useful to see what people are masking/unmasking, that may be an
> indication of stale stabilizations or brokenness hitting the tree.
> Anyway, I'd call it an enhancement.

I'd rather not see this in the initial GSoC project if that means we'll
sacrifice a big rollout.

Kind regards,

Hans


* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 14:36 [gentoo-dev] Gentoo package statistics -- GSoC 2011 Vikraman
  2011-06-08 15:19 ` "Paweł Hajdan, Jr."
  2011-06-08 15:19 ` Gilles Dartiguelongue
@ 2011-06-08 17:35 ` Николай Антонов
  2011-06-08 18:07   ` Vikraman
  2011-06-10 23:10 ` Sebastian Pipping
  3 siblings, 1 reply; 13+ messages in thread
From: Николай Антонов @ 2011-06-08 17:35 UTC
  To: gentoo-dev

On 08.06.2011 18:36, Vikraman wrote:
> Hi everyone,
> 
> I'm working on the `Package statistics` project this year. Till now, I
> have managed to write a client and server[0] to collect the following
> information from hosts:
> 
> * Uname, portage profile, timestamp of portage tree
> * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
> * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
>   and Build time for each installed package
> 

Maybe collect hardware info and kernel configs too?
For example cpuinfo, lspci and lsusb(?).

I think that 1-3 months after installing Gentoo, the user could (should)
"receive" a news item about participating in the `Package statistics`
project. This news item could contain short instructions on how to install
and configure this tool. It could also be mentioned in other Gentoo
projects (for example, in a short wiki page).

And where can I find ebuilds for the `Package statistics` project?

Sorry for my English... and good luck!




* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 15:19 ` "Paweł Hajdan, Jr."
  2011-06-08 15:48   ` Hans de Graaff
@ 2011-06-08 18:01   ` Vikraman
  2011-06-08 18:55     ` Hans de Graaff
  2011-06-08 18:28   ` Donnie Berkholz
  2 siblings, 1 reply; 13+ messages in thread
From: Vikraman @ 2011-06-08 18:01 UTC
  To: gentoo-dev

On Wed, Jun 08, 2011 at 05:19:33PM +0200, "Paweł Hajdan, Jr." wrote:
> On 6/8/11 4:36 PM, Vikraman wrote:
> > I'm working on the `Package statistics` project this year. Till now, I
> > have managed to write a client and server[0] to collect the following
> > information from hosts:
> 
> Excellent, good luck with the idea! I think that better information
> about how Gentoo is actually used will greatly help improving it.
> 

Well, that information cannot be collected automatically, can it?

> > Is there a need to collect files installed by a package ? Doesn't PFL[1]
> > already provide that ?
> 
> Well, PFL is not an official Gentoo project. It might be useful, but I
> wouldn't say it's a priority.
> 
> > Please provide some feedback on what other data should be collected, etc.
> 
> In my opinion it's *not* about collecting as much data as possible. I
> think it's most important to get the core functionality working really
> well, and convincing as large percentage of users as possible to enable
> reporting the statistics (to make the results - hopefully - accurately
> represent the user base). Please note that in some cases it may mean
> collecting _less_ data, or thinking more about the privacy of the users.
> 
> For me, as a developer, even a list of packages sorted by popularity
> (aka Debian/Ubuntu popcon) would be very useful.
> 
> Ah, and maybe files in /etc/portage: package.keywords and so on. It
> could be useful to see what people are masking/unmasking, that may be an
> indication of stale stabilizations or brokenness hitting the tree.
> Anyway, I'd call it an enhancement.
> 
> > Also, I'm starting work on the webUI, and would like some
> > recommendations for stats pages, such as:
> > 
> > * Packages installed sorted by users
> 
> Cool!
> 
> > * Top arches, keywords, profiles
> 
> And percentage of ~arch vs arch users?
> 
> > * Most enabled, disabled useflags per package/globally
> 
> Also great, especially the per-package variant. It'd be also useful to
> have per-profile data, to better tune the profile defaults.
> 
> > [0]
> > http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
> 
> I took a quick look at the code. Some random comments:
> 
> - it uses portage Python API a lot. But it's not stable, or at least not
> guaranteed to be stable. Have you considered using helpers like portageq
> (or eventually enhancing those helpers)?
> 
> - make the licensing super-clear (a LICENSE file, possibly some header
> in every source file, and so on)
> 
> - how about submitting the data over HTTPS and not HTTP to better help
> privacy?

Fair points, thanks!

> 
> - don't leave exception handling as a TODO; it should be a part of your
> design, not an afterthought
> 
> - instead of or in addition to the setup.txt file, how about just
> writing the real setup.py file for distutils?
> 

Yes, these are part of my sub-goals for next week.

-- 
Vikraman


* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 17:35 ` Николай Антонов
@ 2011-06-08 18:07   ` Vikraman
  2011-06-08 19:54     ` Francisco Blas Izquierdo Riera (klondike)
  0 siblings, 1 reply; 13+ messages in thread
From: Vikraman @ 2011-06-08 18:07 UTC
  To: gentoo-dev

On Wed, Jun 08, 2011 at 09:35:26PM +0400, Николай Антонов wrote:
> On 08.06.2011 18:36, Vikraman wrote:
> > Hi everyone,
> > 
> > I'm working on the `Package statistics` project this year. Till now, I
> > have managed to write a client and server[0] to collect the following
> > information from hosts:
> > 
> > * Uname, portage profile, timestamp of portage tree
> > * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
> > * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
> > * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
> >   and Build time for each installed package
> > 
> 
> May be collect hardware info & kernel configs too?
> For example cpuinfo, lspci and lsusb(?).

That's not part of package statistics. There's the smolt project for
hardware statistics.

> 
> I think, that after 1-3 month after installing gentoo, user can(should)
> "receive" newsitem about participating in `Package statistics` project.
> This newsitem can contains short instruction how-to install and
> configure this tool. And even in other gentoo projects(for example write
> short wiki page)
> 
> And, where can I found ebuilds to the `Package statistics` project?

The server hasn't been deployed yet, and ebuilds will be available soon!
> 
> Sory for my english... and Good luck!
> 

-- 
Vikraman


* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 15:19 ` "Paweł Hajdan, Jr."
  2011-06-08 15:48   ` Hans de Graaff
  2011-06-08 18:01   ` Vikraman
@ 2011-06-08 18:28   ` Donnie Berkholz
  2 siblings, 0 replies; 13+ messages in thread
From: Donnie Berkholz @ 2011-06-08 18:28 UTC
  To: gentoo-dev

On 17:19 Wed 08 Jun     , "Paweł Hajdan, Jr." wrote:
> On 6/8/11 4:36 PM, Vikraman wrote:
> > I'm working on the `Package statistics` project this year. Till now, I
> > have managed to write a client and server[0] to collect the following
> > information from hosts:
> 
> Excellent, good luck with the idea! I think that better information
> about how Gentoo is actually used will greatly help improving it.
> 
> > Is there a need to collect files installed by a package ? Doesn't PFL[1]
> > already provide that ?
> 
> Well, PFL is not an official Gentoo project. It might be useful, but I
> wouldn't say it's a priority.

I would love to see it happen, but it's more important to roll out a 
minimal working solution now and add on later.

By combining installed files with USE flag settings, this project could 
actually attempt to factor out which USE flags result in which files in 
an automatic fashion. That would address one of the biggest objections 
many people have had to such a package-to-file search engine.
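
As a rough illustration of that idea, the Python sketch below attributes a
file to a USE flag when the file appears in every report that has the flag
enabled and in no report that has it disabled. The report format (a set of
enabled flags plus a set of installed paths, one pair per reporting host for
a single package version) and the name `files_by_use_flag` are assumptions
made for this example, not something the project defines.

```python
def files_by_use_flag(reports):
    """Map each USE flag to the files seen only when that flag is enabled.

    `reports` is a list of (enabled_flags, installed_files) pairs for one
    package version, one pair per reporting host (assumed format).
    """
    all_flags = set()
    for flags, _files in reports:
        all_flags |= set(flags)

    result = {}
    for flag in sorted(all_flags):
        with_flag = [set(files) for flags, files in reports if flag in flags]
        without_flag = [set(files) for flags, files in reports if flag not in flags]
        if not without_flag:
            # The flag is enabled in every report, so there is no baseline
            # to compare against and nothing can be attributed to it.
            continue
        common_with = set.intersection(*with_flag)
        seen_without = set.union(*without_flag)
        result[flag] = common_with - seen_without
    return result
```

This is only a heuristic: flags that are always enabled together cannot be
told apart, so the result gets more reliable as more hosts with different
flag combinations report.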

It would also be pretty useful for some other GSoC projects, like the 
ebuild generator and the auto dependency scanner.

-- 
Thanks,
Donnie

Donnie Berkholz
Sr. Developer, Gentoo Linux
Blog: http://dberkholz.com


* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 18:01   ` Vikraman
@ 2011-06-08 18:55     ` Hans de Graaff
  0 siblings, 0 replies; 13+ messages in thread
From: Hans de Graaff @ 2011-06-08 18:55 UTC
  To: gentoo-dev

On Wed, 2011-06-08 at 23:31 +0530, Vikraman wrote:

> > Excellent, good luck with the idea! I think that better information
> > about how Gentoo is actually used will greatly help improving it.
> > 
> 
> Well, that information cannot be collected automatically, can it ?

You could pop up a window at random times and ask the user. So it can be
done. Whether it's a good idea …

Hans


* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 18:07   ` Vikraman
@ 2011-06-08 19:54     ` Francisco Blas Izquierdo Riera (klondike)
  2011-06-08 20:26       ` ross smith
  0 siblings, 1 reply; 13+ messages in thread
From: Francisco Blas Izquierdo Riera (klondike) @ 2011-06-08 19:54 UTC
  To: gentoo-dev

On 08/06/11 20:07, Vikraman wrote:
> On Wed, Jun 08, 2011 at 09:35:26PM +0400, Николай Антонов wrote:
>> On 08.06.2011 18:36, Vikraman wrote:
>>> Hi everyone,
>>>
>>> I'm working on the `Package statistics` project this year. Till now, I
>>> have managed to write a client and server[0] to collect the following
>>> information from hosts:
>>>
>>> * Uname, portage profile, timestamp of portage tree
>>> * ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
>>> * ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
>>> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
>>>   and Build time for each installed package
>>>
>> May be collect hardware info & kernel configs too?
>> For example cpuinfo, lspci and lsusb(?).
> That's not part of package statistics. There's the smolt project for
> hardware statistics.
Well, there is another reason why you don't want to log that:
hardened users. Not having access to the kernel .config helps make
the system more resilient to some attacks; as a result, many hardened
users are adamant about not having their .config files published.



* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 19:54     ` Francisco Blas Izquierdo Riera (klondike)
@ 2011-06-08 20:26       ` ross smith
  0 siblings, 0 replies; 13+ messages in thread
From: ross smith @ 2011-06-08 20:26 UTC
  To: gentoo-dev

>>> May be collect hardware info & kernel configs too?
>>> For example cpuinfo, lspci and lsusb(?).
>> That's not part of package statistics. There's the smolt project for
>> hardware statistics.
> Well there is another reason about why you don't want' to log that:
> Hardened users. Not having access to the kernel .config helps in making
> the system more resilient to some attacks, as a result many hardened
> users are very stubborn in not having the .config files published.

I would really like to see a nice way to set what information I want
sent. Perhaps a config file in /etc? Also, an option to see what
is being sent would be great. :)
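
One way this could look, sketched in Python under assumptions: an INI-style
file with a `[report]` section that switches individual fields on or off,
plus a preview function that prints exactly what would be submitted. The
section name, the field names, and the function names are all hypothetical;
the real client may choose a different format and location under /etc.

```python
import configparser
import json

# Hypothetical example configuration; field names are illustrative only.
EXAMPLE_CONFIG = """
[report]
arch = yes
use = yes
cflags = no
"""


def load_enabled_fields(config_text):
    """Return the set of field names the user has switched on."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["report"]
    return {name for name in section if section.getboolean(name)}


def filter_payload(payload, enabled):
    """Drop every field from the payload that the user has not enabled."""
    return {key: value for key, value in payload.items() if key in enabled}


def preview(payload, enabled):
    """Return the data that would be sent, as readable JSON."""
    return json.dumps(filter_payload(payload, enabled), indent=2, sort_keys=True)
```

Printing `preview(payload, load_enabled_fields(EXAMPLE_CONFIG))` before any
network access would give users the "see what is being sent" option, and
filtering on the client side means disabled fields never leave the machine.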

I look forward to contributing my machine's info.

-Ross




* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-08 14:36 [gentoo-dev] Gentoo package statistics -- GSoC 2011 Vikraman
                   ` (2 preceding siblings ...)
  2011-06-08 17:35 ` Николай Антонов
@ 2011-06-10 23:10 ` Sebastian Pipping
  2011-06-11  2:06   ` Vikraman
  3 siblings, 1 reply; 13+ messages in thread
From: Sebastian Pipping @ 2011-06-10 23:10 UTC
  To: gentoo-dev

On 06/08/2011 04:36 PM, Vikraman wrote:
> * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
>   and Build time for each installed package

How many operations do you expect at the SQL level for a submission with
1000 packages? Will that be around 1000 inserts?

Best,



Sebastian




* Re: [gentoo-dev] Gentoo package statistics -- GSoC 2011
  2011-06-10 23:10 ` Sebastian Pipping
@ 2011-06-11  2:06   ` Vikraman
  0 siblings, 0 replies; 13+ messages in thread
From: Vikraman @ 2011-06-11  2:06 UTC
  To: gentoo-dev

On Sat, Jun 11, 2011 at 01:10:36AM +0200, Sebastian Pipping wrote:
> On 06/08/2011 04:36 PM, Vikraman wrote:
> > * Repository, Keyword, Useflags (plus,minus,unset), Counter, Size,
> >   and Build time for each installed package
> 
> How many operations do you expect for a submissions with 1000 packages
> on SQL level?  Will that be around 1000 inserts?
> 

One insert for each package entry, and one insert for every useflag.
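
In other words, a submission with P packages carrying F USE flags in total
costs P + F row inserts. The Python sketch below shows that count using an
in-memory SQLite database; the two-table schema, the table and column names,
and the choice of SQLite are assumptions made to keep the example
self-contained, not the server's actual schema or database.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE useflags (package_id INTEGER, flag TEXT)")
    return conn


def store_submission(conn, packages):
    """Insert one row per package and one row per USE flag.

    `packages` maps a package name to its list of USE flags.
    Returns the total number of rows inserted.
    """
    cur = conn.cursor()
    inserted = 0
    for name, useflags in packages.items():
        cur.execute("INSERT INTO packages (name) VALUES (?)", (name,))
        package_id = cur.lastrowid
        inserted += 1
        # executemany batches the per-flag inserts for this package.
        cur.executemany(
            "INSERT INTO useflags (package_id, flag) VALUES (?, ?)",
            [(package_id, flag) for flag in useflags],
        )
        inserted += len(useflags)
    conn.commit()  # a single commit for the whole submission
    return inserted
```

Wrapping the whole submission in one transaction, as sketched here, keeps the
cost of the many small inserts down compared with committing after each row.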

> Best,
> 
> 
> 
> Sebastian
> 

-- 
Vikraman


Thread overview: 13+ messages
2011-06-08 14:36 [gentoo-dev] Gentoo package statistics -- GSoC 2011 Vikraman
2011-06-08 15:19 ` "Paweł Hajdan, Jr."
2011-06-08 15:48   ` Hans de Graaff
2011-06-08 18:01   ` Vikraman
2011-06-08 18:55     ` Hans de Graaff
2011-06-08 18:28   ` Donnie Berkholz
2011-06-08 15:19 ` Gilles Dartiguelongue
2011-06-08 17:35 ` Николай Антонов
2011-06-08 18:07   ` Vikraman
2011-06-08 19:54     ` Francisco Blas Izquierdo Riera (klondike)
2011-06-08 20:26       ` ross smith
2011-06-10 23:10 ` Sebastian Pipping
2011-06-11  2:06   ` Vikraman
