public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] Re: New idea: network eclean
From: Brian Dolbec @ 2010-03-19 15:09 UTC
  To: gentoo-soc

> Dmitry Bashkatov wrote:
>
> Yes, I have already written a script. But it would be better to include
> this functionality in eclean.

I am just completing a modular re-write of eclean.  It will be included
in the new gentoolkit-0.3.0_rc10 release coming out any time now.

There is now a place in the code that accepts external functions for
additional checks that determine whether a file is to be cleaned. The
only thing missing is a way to pass them in for the search, possibly a
configuration or command-line option.
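A minimal sketch of how such a hook could be used, assuming (hypothetically) that each external check is a callable that receives a distfile name and returns True when the file must be kept. `find_cleanable` and the check signature are illustrative names only, not eclean's actual API. Files are treated as cleanable unless some check protects them.

```python
from typing import Callable, Iterable, List

# Hypothetical check signature: return True if the file must be KEPT.
KeepCheck = Callable[[str], bool]


def find_cleanable(distfiles: Iterable[str],
                   keep_checks: List[KeepCheck]) -> List[str]:
    """Return the distfiles that no external check claims to need.

    Every file is considered cleanable by default; a single check
    returning True is enough to protect it.
    """
    cleanable = []
    for name in sorted(distfiles):
        if not any(check(name) for check in keep_checks):
            cleanable.append(name)
    return cleanable


if __name__ == "__main__":
    # Example: a network check reporting files still used by clients.
    used_on_clients = {"foo-1.0.tar.gz"}
    checks = [lambda f: f in used_on_clients]
    print(find_cleanable(["foo-1.0.tar.gz", "bar-2.1.tar.bz2"], checks))
    # prints ['bar-2.1.tar.bz2']
```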

I would be interested in your script to see how it may fit with the new
eclean.
-- 
Brian Dolbec <brian.dolbec@gmail.com>

* Re: [gentoo-soc] Re: New idea: network eclean
From: Dmitry Bashkatov @ 2010-03-19 20:28 UTC
  To: gentoo-soc

2010/3/19 Brian Dolbec <brian.dolbec@gmail.com>:
> I would be interested in your script to see how it may fit with the new
> eclean.

Here it is! The file /etc/enetclean.hosts must list all hosts that
share the distfiles. The format is <host> or <user>@<host>. The script
doesn't delete anything; it just prints the files that can be deleted.
To delete them, use "enetclean | xargs rm -f".
It does not yet handle distfiles located somewhere other than the
default /usr/portage/distfiles, and it doesn't handle errors such as a
failed connection. But in any case, I think this script will be
completely rewritten.
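Since eclean itself is Python, here is a rough Python sketch of the host-file parsing described above. It assumes the `<host>` / `<user>@<host>` format from the message and, as an added assumption not stated there, skips blank lines and `#` comments. `parse_hosts` is a hypothetical helper, not taken from the attached script.

```python
from typing import List, Optional, Tuple


def parse_hosts(text: str) -> List[Tuple[Optional[str], str]]:
    """Parse enetclean.hosts-style content into (user, host) pairs.

    Each entry is "<host>" or "<user>@<host>"; user is None when it is
    omitted. Skipping blank lines and '#' comments is an assumption.
    """
    entries: List[Tuple[Optional[str], str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "@" in line:
            user, _, host = line.partition("@")
            entries.append((user, host))
        else:
            entries.append((None, line))
    return entries


if __name__ == "__main__":
    sample = "builder1\nroot@builder2\n\n# laptop\n"
    print(parse_hosts(sample))
    # prints [(None, 'builder1'), ('root', 'builder2')]
```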



* Re: [gentoo-soc] Re: New idea: network eclean
From: Dmitry Bashkatov @ 2010-03-19 20:29 UTC
  To: gentoo-soc

And the script =)

[-- Attachment #2: enetclean.tar.gz --]
[-- Type: application/x-gzip, Size: 447 bytes --]

* Re: [gentoo-soc] Re: New idea: network eclean
From: Brian Dolbec @ 2010-03-20 18:07 UTC
  To: gentoo-soc

Dmitry, eclean is written in Python, the same as portage. To integrate
with eclean, your code would also need to be written in Python.
Unfortunately, your script would not fit.

There are 2 pieces of information that would be required from the client
PCs: the installed package list and the exclude file, the exclude file
being a debatable one. Since portage now requires at least python-2.6,
it should be possible for the network eclean module to import portage
from the client machine and obtain the installed package list. From
there, it would be added to a global installed-packages list, which
would then be used to remove from the cleaning list any source files
those packages claim to own. I think all other info and checks would be
handled by the eclean app running on the server.
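A rough sketch of the accumulation step, assuming each client's installed package list has already been fetched as CPV strings (for example from that client's `cpv_all()`). The network transport is deliberately left out, and `accumulate_installed` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def accumulate_installed(per_client: Dict[str, Iterable[str]]) -> Set[str]:
    """Merge the installed CPVs of every client into one global set.

    A CPV installed on several clients appears only once, so its
    source files need to be resolved only once afterwards.
    """
    global_installed: Set[str] = set()
    for cpvs in per_client.values():
        global_installed.update(cpvs)
    return global_installed


if __name__ == "__main__":
    clients = {
        "client1": ["app-misc/foo-1.0", "dev-libs/bar-2.0"],
        "client2": ["dev-libs/bar-2.0", "app-misc/baz-0.5"],
    }
    print(sorted(accumulate_installed(clients)))
    # prints ['app-misc/baz-0.5', 'app-misc/foo-1.0', 'dev-libs/bar-2.0']
```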

As for the exclude file, it may need to be transferred and parsed to
accumulate the results (trickier, due to possible conflicts).
Alternatively, that control might be better left only on the server.

Eclean would not need to be installed on the client machines at all.

P.S. eclean assumes all files are dirty and in need of cleaning unless
proven otherwise (due to the dynamic nature of the tree).
-- 
Brian Dolbec <brian.dolbec@gmail.com>

* Re: [gentoo-soc] Re: New idea: network eclean
From: Dmitry Bashkatov @ 2010-03-21 19:10 UTC
  To: gentoo-soc

Thanks for the explanation, Brian. I was misleaded that you were interested in
any working solution, including a bash script.



* Re: [gentoo-soc] Re: New idea: network eclean
From: Brian Dolbec @ 2010-03-21 23:59 UTC
  To: gentoo-soc

On Sun, 2010-03-21 at 22:10 +0300, Dmitry Bashkatov wrote:
> Thanks for the explanation, Brian. I was misleaded that you were interested in
> any working solution, including a bash script.
> 
Dmitry: I was not misleading you (I think you meant "misunderstood"), and
yes, I was interested in your script. You had not stated it was a bash
script, so it could have been a Python script. I needed to look at it to
see what it did, how it worked, and whether it might be usable. I do not
consider myself an expert, so I welcome other ideas and methods.

I am interested in working with you, if you would like to work on it, in
a way that would integrate with the current Python code base. But, as
was also stated, this is too small a job for a SoC project. I think it
would take about 2 days at most, including a few hours of initial
coding, lots of testing and debugging, creating unit tests, updating man
pages, ...

I found that your script, along with Nirbheek's idea of running eclean on
each machine and then finding the common files, is a simple but poor way
of doing it. It is poor because the distfiles are NFS shared and each
instance of eclean accesses those files, which is an unnecessary use of
resources for files that are largely common to all or nearly all
systems. The other issue is that a large part of the search for files to
clean means accessing the portage tree to obtain the source file names
for installed packages through portage function calls. The tree is most
likely being shared too, which again unnecessarily uses resources for
the packages and versions in common. Now imagine an install with 100
clients using the same distfiles and portage tree server, all doing that
for 1,000 installed ebuilds. It would be a tremendous waste of
resources, not to mention a huge increase in run time.
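To put rough numbers on that scenario: 100 clients with 1,000 installed ebuilds each means on the order of 100 × 1,000 = 100,000 package lookups against the shared tree when every client runs eclean independently, whereas a single coordinated pass only needs one lookup per distinct CPV. The sketch below counts both; the package overlap in the example (900 shared packages, 100 unique per client) is made up purely for illustration.

```python
from typing import Dict, Set, Tuple


def lookup_counts(per_client: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (per-client lookups, deduplicated lookups).

    Per-client: every client looks up every one of its own packages.
    Deduplicated: each distinct CPV is looked up once in total.
    """
    naive = sum(len(cpvs) for cpvs in per_client.values())
    distinct = len(set().union(*per_client.values()))
    return naive, distinct


if __name__ == "__main__":
    # Illustrative only: 900 packages common to all 100 clients,
    # plus 100 packages unique to each (1,000 installed per client).
    shared = {f"cat/common-{i}" for i in range(900)}
    clients = {
        f"client{c}": shared | {f"cat/own{c}-{i}" for i in range(100)}
        for c in range(100)
    }
    print(lookup_counts(clients))  # prints (100000, 10900)
```

How much is saved depends entirely on how much the clients' package sets overlap.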

I have been reviewing the newly rewritten code again, and my original
place/method of adding network support is not the best way to do it. It
is still easy to modify the correct location in the code with a slightly
different approach. First, the currently available eclean versions were
flawed: if an ebuild version or a complete package was deleted from the
tree, eclean did not check the installed package db for the "SRC_URI" in
order to match the source filename(s) to an installed package. It would
therefore delete an installed package's sources if that ebuild was no
longer in the tree. That has been fixed in my rewritten version. So, to
continue that methodology, it must be assumed that, in the worst case,
the NFS distfiles and portage tree server is a minimal server system and
most of the installed packages' sources are used by the clients (not by
the server). So there are 2 key pieces of data needed from the imported
portage instances on each client, making the tasks:

1) Get and accumulate the installed package list via
vardb.dbapi.cpv_all() for each client.

2) After accumulating a complete list, determine which packages are
unique to which clients, and task those clients with retrieving the
"SRC_URI" and, optionally, the "RESTRICT" info from their installed dbs.

3) Depending on the number of clients, decide how to split up the
packages in common, task each client with a portion of the cpvs for the
"SRC_URI" and "RESTRICT" info, and accumulate the results.

4) Pass that info into the DistfilesSearch class and run
findDistfiles(), which will then determine the files to be cleaned and
continue with normal operation.
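A sketch of how steps 1-3 could divide the lookups, assuming each client's installed CPVs have already been fetched as a set. Step 3 leaves the splitting strategy open; the greedy "least-loaded owning client" rule below is just one possible choice, and `plan_lookups` is a hypothetical name.

```python
from typing import Dict, List, Set


def plan_lookups(per_client: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Assign every installed CPV to exactly one client for the
    SRC_URI/RESTRICT lookup, so no CPV is queried twice."""
    # Step 1: record which clients have each CPV installed.
    owners: Dict[str, List[str]] = {}
    for client in sorted(per_client):
        for cpv in per_client[client]:
            owners.setdefault(cpv, []).append(client)

    plan: Dict[str, Set[str]] = {client: set() for client in per_client}

    # Step 2: a CPV unique to one client can only be looked up there.
    for cpv, clients in owners.items():
        if len(clients) == 1:
            plan[clients[0]].add(cpv)

    # Step 3: spread shared CPVs, giving each to the owning client with
    # the smallest workload so far (ties broken by client name).
    for cpv in sorted(owners):
        clients = owners[cpv]
        if len(clients) > 1:
            target = min(clients, key=lambda c: (len(plan[c]), c))
            plan[target].add(cpv)
    return plan


if __name__ == "__main__":
    installed = {
        "client-a": {"app-misc/one-1.0", "dev-libs/shared-1.0",
                     "dev-libs/shared-2.0"},
        "client-b": {"app-misc/two-1.0", "dev-libs/shared-1.0",
                     "dev-libs/shared-2.0"},
    }
    for client, cpvs in sorted(plan_lookups(installed).items()):
        print(client, sorted(cpvs))
    # client-a ['app-misc/one-1.0', 'dev-libs/shared-1.0']
    # client-b ['app-misc/two-1.0', 'dev-libs/shared-2.0']
```

A simple round-robin split of the shared CPVs would also work. The greedy rule just tries to keep the per-client workloads even when the unique package counts differ a lot.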

This would offload some or most of the portage system calls to the
clients and prevent the sources of any installed deprecated packages or
versions from being deleted. It should also eliminate repeating the same
information look-ups on each machine. The client machines would not
require eclean to be installed, and it could quite possibly even be
blocked from being installed.
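The protection of installed-but-removed packages comes from reading SRC_URI out of the installed package db rather than the tree. The sketch below relies only on `cpv_all()` and `aux_get(cpv, keys)`, which portage's dbapi classes provide. Everything else (`installed_distfiles`, `src_uri_filenames`, the fake db) is hypothetical. The SRC_URI parsing is deliberately simplified: USE-conditional groups are flattened so every alternative is kept, which errs toward not deleting files. Real code should use portage's own dependency-string parser instead.

```python
from typing import List, Protocol, Set


class DbapiLike(Protocol):
    """The two dbapi methods this sketch relies on."""

    def cpv_all(self) -> List[str]: ...

    def aux_get(self, cpv: str, keys: List[str]) -> List[str]: ...


def src_uri_filenames(src_uri: str) -> Set[str]:
    """Simplified SRC_URI -> distfile names (keeps every alternative)."""
    tokens = src_uri.split()
    names: Set[str] = set()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Skip group parentheses and USE conditionals like "doc?".
        if tok in ("(", ")") or tok.endswith("?"):
            i += 1
            continue
        # "uri -> newname" stores the file under newname.
        if i + 2 < len(tokens) and tokens[i + 1] == "->":
            names.add(tokens[i + 2])
            i += 3
            continue
        names.add(tok.rsplit("/", 1)[-1])
        i += 1
    return names


def installed_distfiles(vardb: DbapiLike) -> Set[str]:
    """Collect distfiles referenced by the *installed* package db, so
    sources of packages removed from the tree stay protected."""
    needed: Set[str] = set()
    for cpv in vardb.cpv_all():
        (src_uri,) = vardb.aux_get(cpv, ["SRC_URI"])
        needed |= src_uri_filenames(src_uri)
    return needed


if __name__ == "__main__":
    class FakeVardb:
        """Stand-in for portage's installed-package dbapi."""

        def __init__(self, data):
            self.data = data

        def cpv_all(self):
            return list(self.data)

        def aux_get(self, cpv, keys):
            return [self.data[cpv]]

    db = FakeVardb({
        "app-misc/foo-1.0": "mirror://gentoo/foo-1.0.tar.gz",
        "app-misc/bar-2.0": "doc? ( http://example.org/bar-doc.zip ) "
                            "http://example.org/v2.0.tgz -> bar-2.0.tgz",
    })
    print(sorted(installed_distfiles(db)))
    # prints ['bar-2.0.tgz', 'bar-doc.zip', 'foo-1.0.tar.gz']
```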


As I previously stated, I think it would be best to only consider the
server's distfiles.exclude file. If any files are to be protected in a
networked environment such as this, it should be done by an
administrator who is authorized to run eclean. If a client system needed
to protect some sources from being deleted, then that should be done on
the server by an administrator. I am not certain of this, but I believe
permissions would probably be set to prevent a client from deleting
files on the server supplying the shared distfiles.

I am open to thoughts and suggestions, so if anyone sees any flaws in
my logic, please speak up :)  Also, I am not very experienced with NFS
shares and larger networked installations, nor do I have such a system
for thorough testing. If someone has a small, yet large enough, system
that could be used for proper testing (in -p, --pretend mode of course),
please also speak up.
-- 
Brian Dolbec <brian.dolbec@gmail.com>

* Re: [gentoo-soc] Re: New idea: network eclean
From: Dmitry Bashkatov @ 2010-03-22 20:07 UTC
  To: gentoo-soc

2010/3/22 Brian Dolbec <brian.dolbec@gmail.com>:
> Dmitry: I was not misleading you (I think you meant "misunderstood"), and
> yes, I was interested in your script.
Yes, sorry, my English is not very good yet. =)

> I am interested in working with you, if you would like to work on it, in
> a way that would integrate with the current Python code base.
I love Gentoo and I want to contribute in any way I can. Thanks for your offer.

But I was aiming to participate in GSoC. Right now I am diving into
program projects, so I don't have any free time. If you want, I can help
you later. It would be a useful experience for me.

> I found that your script, along with Nirbheek's idea of running eclean on
> each machine and then finding the common files, is a simple but poor way
> of doing it. It is poor because the distfiles are NFS shared and each
> instance of eclean accesses those files, which is an unnecessary use of
> resources for files that are largely common to all or nearly all
> systems. The other issue is that a large part of the search for files to
> clean means accessing the portage tree to obtain the source file names
> for installed packages through portage function calls. The tree is most
> likely being shared too, which again unnecessarily uses resources for
> the packages and versions in common. Now imagine an install with 100
> clients using the same distfiles and portage tree server, all doing that
> for 1,000 installed ebuilds. It would be a tremendous waste of
> resources, not to mention a huge increase in run time.
Totally agree with you!

> If someone has a small, yet large enough, system
> that could be used for proper testing (in -p, --pretend mode of course),
> please also speak up.
I have such a system and am interested in testing.


