public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] RFP: System to account users configurations
From: Rufiao @ 2002-06-16 20:16 UTC
  To: gentoo-dev

As stated in bug #3778 (http://bugs.gentoo.org/show_bug.cgi?id=3778):

1. Rationale

This system is inspired by Debian's popularity-contest package
(http://packages.debian.org/stable/misc/popularity-contest.html) with some
important enhancements. The key idea is to give the Gentoo community a way
to track the most used packages, hardware configurations, kernel versions,
compile flags and profiles. Additionally, this system aims to provide the
following advantages:

- Allow the creation of CD layouts which include the most used packages for
  each profile
- Allow the developers to investigate the most used configurations and focus
  on them when setting priorities, writing documentation, choosing standard
  kernel configurations, etc.
- Give some figures about the number of active users in the community

2. Description

The system comprises two subsystems:

- A client-side system that runs periodically through cron, gathers
  information from the user's configuration and posts it to the server
  system through HTTP. This system does not require any user intervention
  beyond the initial configuration.

- A server-side system running on the gentoo.org domain that receives the
  information provided by the users, stores it in a database and builds
  statistics from it. It also provides a web front-end to query the
  database.

The following information will be processed by the system:

- Packages installed, including their versions (as in 
  `qpkg -nc -I -v` from the gentoolkit package)
- Flags in make.conf (as in 
  `egrep "^(USE|CHOST|CFLAGS|CXXFLAGS)" /etc/make.conf`)
- CPU info (as in `egrep "^(model name|cpu MHz)" /proc/cpuinfo`)
- System memory (as in `egrep "^MemTotal:" /proc/meminfo`)
- PCI devices (as in 
  `lspci | colrm 1 8 | sed 's/\(.*\)(.*/\1/'` from the pciutils package)
- USB devices (as in `lsusb | grep iProduct | colrm 1 28` from the
  usbutils package)
- Kernel version (as in `uname -r`)
- Profile being used (as in
  `ls -ld /etc/make.profile | awk '{print $NF}' | awk -F/ '{print $NF}'`)
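A minimal sketch of how the client could turn three of the sources listed
above into structured data before posting. It is written in Python purely
for illustration, and the function names are hypothetical. Unlike the egrep
pattern, it only accepts plain NAME=value lines from make.conf, and a value
continued over several lines would be cut short.

```python
import re


def parse_make_conf(text):
    """Return the USE/CHOST/CFLAGS/CXXFLAGS assignments found in make.conf text."""
    flags = {}
    for line in text.splitlines():
        match = re.match(r'^(USE|CHOST|CFLAGS|CXXFLAGS)=(.*)$', line)
        if match:
            # Drop surrounding whitespace and quotes; variables are not expanded.
            flags[match.group(1)] = match.group(2).strip().strip('"\'')
    return flags


def parse_cpuinfo(text):
    """Return one dict per 'model name' entry, with 'cpu MHz' when present."""
    cpus = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "model name":
            cpus.append({"model name": value.strip()})
        elif key == "cpu MHz" and cpus:
            cpus[-1]["cpu MHz"] = float(value.strip())
    return cpus


def parse_memtotal(text):
    """Return MemTotal in kB from /proc/meminfo text, or None if absent."""
    match = re.search(r'^MemTotal:\s+(\d+)\s*kB', text, re.MULTILINE)
    return int(match.group(1)) if match else None
```

On a real machine the inputs would come from reading /etc/make.conf,
/proc/cpuinfo and /proc/meminfo; taking text as an argument keeps the
parsing separate from file access.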

On the client side, the procedure to provide data to the system is the
following:

- The user emerges the package, which:
  - Sets a crontab entry so the system runs periodically, possibly
    requiring user intervention to specify when the system should run
  - Points to a URL (in the gentoo.org domain) for signup
- The user goes to the provided URL, which asks for the user's e-mail
  address and asks the user to transcribe a random 4-letter message, shown
  as an image, into a text box. These requirements are meant to ensure, as
  far as possible, the authenticity of the data and to prevent automated
  signups
- The server-side system e-mails the user a key, which must be placed in
  the config file
- To post the information to the server-side system, the client-side system
  can use the proxy settings defined in /etc/make.conf
- When the server-side system receives the first set of data, it e-mails
  a message to the user to confirm that the system is running fine

Note that the machine is not guaranteed to have internet connectivity when
the client runs. In this case, it may keep periodically checking in the
background for a route to the server.
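The posting step with its "keep checking for a route" behaviour could be
sketched as below. This is only an illustration in Python: the form field
name "key", the retry count and the delay are assumptions, and no server
URL is defined by this proposal, so the caller has to supply one.

```python
import time
import urllib.parse
import urllib.request


def http_send(url, key, report):
    """POST the report as form fields; raises OSError (URLError) if unreachable."""
    data = urllib.parse.urlencode({"key": key, **report}).encode("utf-8")
    request = urllib.request.Request(url, data=data)  # data present => POST
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def post_with_retry(send, attempts=5, delay=600, sleep=time.sleep):
    """Call send() until it succeeds or attempts run out; True on success."""
    for attempt in range(1, attempts + 1):
        try:
            send()
            return True
        except OSError:
            # No route to the server (or another network error): wait and retry,
            # but do not sleep after the final failed attempt.
            if attempt < attempts:
                sleep(delay)
    return False
```

A cron job would call something like
`post_with_retry(lambda: http_send(url, key, report))`. Passing `sleep` in
makes the retry loop easy to exercise without actually waiting.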

The following variables can be set in the config file:

- Key: as discussed above
- Acknowledge flag: send an e-mail to the user every time a set of data from
  him is processed (defaults to false)
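The config file format is not specified here. Assuming an INI-style file
with a hypothetical [survey] section and option names "key" and
"acknowledge", reading the two variables might look like this in Python:

```python
import configparser


def load_config(text):
    """Parse the client config; section and option names are assumptions."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["survey"]
    return {
        "key": section.get("key", ""),
        # The acknowledge flag defaults to false, as described above.
        "acknowledge": section.getboolean("acknowledge", fallback=False),
    }
```

With only `key = abc123` in the file, the acknowledge flag comes back as
False, matching the stated default.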



* Re: [gentoo-dev] RFP: System to account users configurations
From: Faust Tanasescu @ 2002-06-16 21:12 UTC
  To: gentoo-dev



>From: Rufiao <rufiao@gmx.net>
>Reply-To: gentoo-dev@gentoo.org
>To: gentoo-dev@gentoo.org
>Subject: [gentoo-dev] RFP: System to account users configurations
>Date: Sun, 16 Jun 2002 17:16:21 -0300
>
>On the client side, the procedure to provide data to the system is the
>following:
>
>- The user emerges the package, which:
>   - Sets a crontab entry so the system runs periodically, possibly
>     requiring user intervention to specify when the system should run
>   - Points to a URL (in the gentoo.org domain) for signup
>- The user goes to the provided URL, which asks for the user's e-mail
>   address and asks the user to transcribe a random 4-letter message, shown
>   as an image, into a text box. These requirements are meant to ensure, as
>   far as possible, the authenticity of the data and to prevent automated
>   signups

Users are required to 1) want to participate in this survey, 2) say when the
information grab should run, 3) go to a URL to subscribe to the service,
4) get a magic key from the server, 5) set up the client system and 6) check
that it runs well.

We don't have many users, and the setup is too complicated for my taste for
something that brings nothing to me as a Gentoo user. And we want people to
use this: the more, the better.
Speaking as a Gentoo user, if a system like this were available I would not
bother installing it. The setup is way too lengthy and I get nothing out of
it as an individual.

I propose making this whole process a lot simpler for the client. What we
must keep in mind is that no system is perfect, and we should not fall into
paranoia. I therefore propose shortening the setup of this survey system to
something smaller:

1) The user emerges the package.
2) They are asked when the collection should run.

And that's it.

Now, how to keep people from abusing this system is a whole new question,
and I think we should treat it separately. However, I'd like to propose
something for that as well.

It's the server's duty to protect itself from idiots. When a client connects
to the server to upload its information file, the server sends the client a
unique key that expires after a week or a couple of days, depending on how
often we want input. If the client tries to send input again it could of
course remove the key file and claim it's new to the service; that's why the
submitter's IP address needs to be recorded for first-time users as well.
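One possible reading of this scheme, sketched in Python: an upload that
presents a key is accepted only once that key has expired, and an upload
without a key is accepted only if the same IP address has not made a
keyless upload within the key lifetime. The class name, method names and
the one-week default are hypothetical.

```python
import secrets
import time


class KeyRegistry:
    """In-memory sketch of server-issued, expiring upload keys."""

    def __init__(self, lifetime=7 * 24 * 3600, clock=time.time):
        self.lifetime = lifetime
        self.clock = clock
        self.keys = {}            # key -> expiry timestamp
        self.keyless_uploads = {} # ip -> time of last upload made without a key

    def submit(self, ip, key=None):
        """Return (accepted, new_key); new_key is None when rejected."""
        now = self.clock()
        if key is not None:
            expiry = self.keys.get(key)
            if expiry is None or now < expiry:
                return (False, None)  # unknown key, or previous upload too recent
            del self.keys[key]
        else:
            last = self.keyless_uploads.get(ip)
            if last is not None and now - last < self.lifetime:
                return (False, None)  # "new" client from a recently seen address
            self.keyless_uploads[ip] = now
        new_key = secrets.token_hex(16)
        self.keys[new_key] = now + self.lifetime
        return (True, new_key)
```

A real server would keep this state in its database rather than in memory.
The IP check only slows down the trick of deleting the key file; it does
not prevent it.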

Of course the system is not perfect. The idiot could change his IP address,
no problem: he could disconnect from and reconnect to his ISP or something
similar, but that would be really stupid. I don't think that many people
would actually attempt that.

I think that if anyone ever does attempt this, it will be because our user
base has grown very, very large, and his impact on our system would be
minimal.


This is just an idea. I'm sure there are better ones.




* Re: [gentoo-dev] RFP: System to account users configurations
From: Rufiao @ 2002-06-16 23:11 UTC
  To: gentoo-dev

The abuse of this kind of system should be taken into account, since it may be quite easy for someone to create a bot (or whatever) capable of feeding the system with fake data and, as a consequence, destroying its reputation.

However, I agree this issue should not complicate the system setup. There are problems with the approach I've described, in particular for users who maintain more than a couple of Gentoo boxes (it may be inconvenient even for people who run more than one machine, since one key per machine is necessary).

Debian's popularity-contest uses SMTP as its transport, both to avoid the need for a constant internet connection and to have some means of establishing the identity of every contributing machine. I'm not sure SMTP can help with identifying users at all, and it may complicate the setup even more for users who don't have a local MTA spool set up (and who want to participate but don't have constant connectivity), so I've discarded it.

Also, using the machines' IP addresses as a measure of abuse (by investigating how many posts occur for a given address) may lead to bad results, since some users have more than one machine behind a 1:n NAT.

In the end, it may be better to simply avoid the signup and use a 'loose' approach: ask for the user's e-mail address, to be used only if abuse is detected. Of course a 'bad' user could provide a fake e-mail address, but in that case, after abuse is detected and an attempt to contact the user fails, all the data he provided can be set to be automatically rejected by the server-side system.
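The 'loose' approach could be sketched on the server side as follows. This is only an illustration in Python; the class and method names are hypothetical, and how abuse is detected in the first place is left open.

```python
class AbuseFilter:
    """Reject data tied to an e-mail address once contact has failed."""

    def __init__(self):
        self.rejected = set()

    @staticmethod
    def _normalize(email):
        # Simplification: treat addresses case-insensitively.
        return email.strip().lower()

    def flag(self, email, contact_succeeded):
        """Call after abuse is detected and the user has been e-mailed."""
        if not contact_succeeded:
            self.rejected.add(self._normalize(email))

    def accepts(self, email):
        """True if data submitted under this address should still be stored."""
        return self._normalize(email) not in self.rejected
```

An address is only blocked when the contact attempt fails, so a user who answers the e-mail keeps contributing.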

But there may well be a better approach to this whole problem. Any thoughts?


* Re: [gentoo-dev] RFP: System to account users configurations
From: Faust Tanasescu @ 2002-06-17  0:01 UTC
  To: gentoo-dev

I'm thinking of lots of glue: a Perl script for the client and an HTTPS
server on gentoo.org to allow SSL (Secure Sockets Layer) communication
between client and server. It's a fresh approach to solve just this
problem... Well, fresh is relative here ;)

Here's a link
http://developer.netscape.com/docs/manuals/security/sslin/contents.htm




* Re: [gentoo-dev] RFP: System to account users configurations
From: Rufiao @ 2002-06-17  0:12 UTC
  To: gentoo-dev

Using HTTPS is not a big deal, but how would it help with this problem?

On Sun, 16 Jun 2002 20:01:20 -0400
"Faust Tanasescu" <faust_tanasescu@hotmail.com> wrote:

> I'm thinking of lots of glue, a perl script for client and https server on 
> gentoo.org to allow SSL (secure socket layer) communication between  
> client/server. It's a fresh approach to solve just this problem... Well 
> fresh is relative here ;)
> 
> Here's a link
> http://developer.netscape.com/docs/manuals/security/sslin/contents.htm



* Re: [gentoo-dev] RFP: System to account users configurations
From: George Shapovalov @ 2002-06-18 10:37 UTC
  To: gentoo-dev

Hi guys.

Nice to see voting/user feedback discussed here!
A few months ago I spent some time thinking about a similar issue. I would
like to point out this link
http://www.its.caltech.edu/~georges/gentoo/epsp/vote0.html
where I present a few thoughts about a possible voting system.
I should note immediately that I was "designing" that system (well, the "0"
in the file name should give you an idea about its status :)) for a very
specific purpose: to enhance quality control over ebuilds by allowing all
users to cast votes indicating ebuild stability and (optionally) popularity.
Accumulating additional information may provide nice statistics. However, I
would like to add one "feature request" to Rufiao's proposal, namely the
ability to configure what kinds of information are collected and sent
upstream.

As I see it, there are two possible approaches to the design of a voting
system:
1. active system - passive users
2. passive system - active users

In reality any reasonable voting system should include elements of both; the
question is more about proportions :). In that text I was leaning more
towards the second option. You will find a few arguments behind my thinking
there. Nonetheless, the system I ended up describing seems to be very
similar to the one proposed by Rufiao :), including the concerns about using
IPs for identification and the requirement to register, though the overall
procedure looks a bit simpler.

Along the same lines, there are two "boundary" positions WRT how much
information is collected and processed:
1. Collect info about individual systems in a central location and use that
to build statistics (pretty much necessary for the 1st approach).
2. Only keep statistical info centrally and update it when a user votes. This
may play well with the 2nd approach if done correctly.
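
To make the second position concrete, here is a small Python sketch of a
server that keeps only running totals and never stores an individual
system's report. The class and method names are hypothetical.

```python
from collections import Counter


class AggregateStats:
    """Keep only per-package totals; individual reports are discarded."""

    def __init__(self):
        self.package_counts = Counter()
        self.reports = 0

    def add_report(self, packages):
        # Count each package at most once per report.
        self.package_counts.update(set(packages))
        self.reports += 1

    def most_used(self, n):
        """Return the n most reported packages as (name, count) pairs."""
        return self.package_counts.most_common(n)
```

Because nothing ties a total back to a sender, an earlier report cannot be
replaced or withdrawn later. That is the trade-off against the first
position, which can recompute its statistics from the stored per-system
records.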

As was pointed out, the second position raises abuse concerns. However, I
would still prefer such an approach if some care could turn up a reasonably
secure model. At least it would be worth trying as a first implementation,
as it is much easier on resources and simpler to implement.

Sorry about this rough posting; I just wanted to bring up that link in case
you are able to find anything useful there :). I will try to get back to
this topic and maybe write something more detailed :).

George

