* [gentoo-soc] Project Grumpy - weekly report #1
From: Priit Laes @ 2010-06-01 7:10 UTC
To: gentoo-soc; +Cc: leio, ferringb
This is weekly progress report no. 1 for Project Grumpy.

As this is the first publicly visible announcement, I am also going to
give a short overview of the project itself.

The aim of this project is to create a database containing various
developer-related metadata about packages in the Gentoo Portage tree.
The metadata that we are going to store can be used for different
purposes; some examples include upstream version checks and sending
notifications to developers who are interested in a given package.
Eventually the project should also provide a nice web and API interface
for accessing this data.

The project's semi-official IRC channel is #gentoo-grumpy on the
Freenode network. Just step in and say "Hi!" :)

Last week's progress report
===========================
My first week went a bit slowly because I had some "unfinished business"
to wrap up, and also because of two exams (which went fine).

The core issue I wrestled with this week was how to keep the Portage
tree and the database contents in sync, i.e. when an ebuild is modified,
removed or added, how to make sure that the database contents correspond
to the Portage tree.

The solution that I came up with is to use a simple daemon that logs
changes to the Portage tree and modifies the database contents when it
is appropriate. Appropriate here means that we shouldn't log updates
while the tree itself is being updated, as that might be unsafe (e.g. a
package rename). So currently it seems that the daemon also has to
initiate the rsync process and push the updates into the database after
rsync has finished successfully. (You can already see how all kinds of
weird corner cases start popping up :P )
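
The "sync first, then touch the database" rule described above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: sync_then_update and apply_changes are hypothetical names,
and the real rsync command line is not given in this report, so a
stand-in command is used.

```python
import subprocess
import sys


def sync_then_update(sync_cmd, apply_changes):
    """Run the sync command; call apply_changes() only if it exits with 0."""
    result = subprocess.run(sync_cmd)
    if result.returncode != 0:
        # The tree may be half-updated, so leave the database alone.
        return False
    apply_changes()
    return True


if __name__ == "__main__":
    # Stand-in for the real rsync invocation (an assumption, not the
    # project's actual command).
    fake_sync = [sys.executable, "-c", "raise SystemExit(0)"]
    sync_then_update(fake_sync, lambda: print("pushing queued changes"))
```

Gating on the exit status keeps the database from being updated from a
partially synced tree; anything more refined (retries, locking) is left
out of this sketch.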
My current approach to logging is to use the inotify [1] framework,
present in the Linux kernel since 2.6.13 (sorry BSD users, but this is
Gentoo Linux after all), with the help of pyinotify [2].
So far there's only one drawback to using inotify: by default the kernel
allows only 8192 watches per user (and Portage contains a lot of
directories), so in order to use this approach one has to bump the
number of watches using the /proc/sys/fs/inotify/max_user_watches
tunable. 81920 has worked fine so far on my machine ;)
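
Since inotify watches are not recursive, every directory in the tree
needs its own watch. A quick way to check whether the limit is high
enough might look like the sketch below. The helper names are
hypothetical, and /usr/portage is an assumed tree location that may
differ on your system.

```python
import os

LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root itself plus every directory below it (one watch each)."""
    return 1 + sum(len(dirnames) for _, dirnames, _ in os.walk(root))


def read_watch_limit(path=LIMIT_PATH):
    """Return the current limit, or None if the tunable is not available."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    tree = "/usr/portage"  # assumed location of the Portage tree
    print("directories:", count_directories(tree))
    print("watch limit:", read_watch_limit())
```

If the directory count is close to or above the limit, the tunable has
to be raised before a watcher can cover the whole tree.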
There was also a secondary approach, suggested by my mentor Leio, to
parse rsync log files, but I am a bit reluctant about this idea.

Anyway, I'll leave this idea simmering here for a while, and unless
someone comes up with a better one (yes, I have also thought about
scanning the whole Portage tree every x hours), I'm going to implement
the daemon.

Plans for current week
======================
As I currently consider the core issue solved, the next issue I have to
solve is how to take an ebuild, extract information from it and store
it in the database. (Hint: pkgcore)

I'm not going to take on bigger tasks because I still have one quite
hard exam (thermodynamics and statistical physics) on the 4th of June.
And if I pass, it is the last one.

PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
my system to SQLAlchemy-0.6.

[1] http://en.wikipedia.org/wiki/Inotify
[2] http://trac.dbzteam.org/pyinotify
Päikest,
Priit Laes :)
* Re: [gentoo-soc] Project Grumpy - weekly report #1
From: Domen Kožar @ 2010-06-01 7:16 UTC
To: gentoo-soc
Hey, this seems like a very interesting project! Good luck (comments inline below).
On Tue, 2010-06-01 at 10:10 +0300, Priit Laes wrote:
[...]
> As I currently consider the core issue solved, the next issue I have to
> solve is how to take an ebuild, extract information about it and store
> it in database. (Hint: pkgcore)
In what language are you writing the code?
>
> I'm not going to take on bigger tasks because I still have one quite
> hard exam (thermodynamics and statistical physics) on the 4th of June.
> And if I pass, it is the last one.
>
> PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
> my system to SQLAlchemy-0.6.
I suggest you use buildout to deploy Zine; it is a fairly good
alternative to virtualenv.
>
> [1] http://en.wikipedia.org/wiki/Inotify
> [2] http://trac.dbzteam.org/pyinotify
>
> Päikest,
> Priit Laes :)
>
* Re: [gentoo-soc] Project Grumpy - weekly report #1
From: Domen Kožar @ 2010-06-01 7:19 UTC
To: gentoo-soc
Also, good luck with thermodynamics;)
On Tue, 2010-06-01 at 10:10 +0300, Priit Laes wrote:
[...]
* Re: [gentoo-soc] Project Grumpy - weekly report #1
From: Arun Raghavan @ 2010-06-01 8:11 UTC
To: gentoo-soc
On 1 June 2010 12:40, Priit Laes <plaes@plaes.org> wrote:
[...]
> Anyway, I'll leave this idea simmering here for a while and unless
> someone comes up with a better idea (Yes, I have also thought about
> scanning whole portage tree every x-hours), I'm going to implement the
> daemon.
My 2p - I don't see a major advantage of a monitoring daemon over a
periodic full scan, or caching the mtimes. IMO the latter options
are better on the grounds that you're not introducing an additional
entity (Occam's Razor, KISS, etc.), and afaics there is no
significant gain from the daemon: inotify watches are not going to
give you immediate updates, since your tree updates will be periodic
anyway.
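
For illustration, the "caching the mtimes" option could be sketched
like this. It is a rough sketch only: snapshot and diff are hypothetical
names, only files ending in .ebuild are considered, and it assumes that
a sync changes the mtime of every file whose contents changed.

```python
import os


def snapshot(root):
    """Map each .ebuild path (relative to root) to its mtime in nanoseconds."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".ebuild"):
                full = os.path.join(dirpath, name)
                mtimes[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return mtimes


def diff(old, new):
    """Return (added, removed, modified) as sorted lists of relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, modified
```

Taking one snapshot before a sync and one after it, then passing both
to diff, yields the three lists that would need to be applied to the
database, with no long-running process involved.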
--
Arun Raghavan
http://arunraghavan.net/
(Ford_Prefect | Gentoo) & (arunsr | GNOME)
* Re: [gentoo-soc] Project Grumpy - weekly report #1
From: Nirbheek Chauhan @ 2010-06-01 10:16 UTC
To: gentoo-soc
On Tue, Jun 1, 2010 at 1:41 PM, Arun Raghavan <arunissatan@gmail.com> wrote:
> On 1 June 2010 12:40, Priit Laes <plaes@plaes.org> wrote:
> [...]
>> Anyway, I'll leave this idea simmering here for a while and unless
>> someone comes up with a better idea (Yes, I have also thought about
>> scanning whole portage tree every x-hours), I'm going to implement the
>> daemon.
>
> My 2p - I don't see a major advantage of a monitoring daemon over a
> periodic full-scan, or caching the mtimes. IMO that the latter options
> are better on the grounds that you're not introducing an additional
> entity (Occam's Razor, KISS, etc. etc.), and afaics there is no
> significant gain of the daemon - inotify watches are not going to
> imply immediate updates since your tree updates will be periodic
> anyway.
>
I agree with Arun here: you can easily schedule the server to run the
grumpy update whenever an eix-sync (or cvs up) is done. There is no
point in running a monitoring daemon when you know exactly when
something will be updated.
--
~Nirbheek Chauhan
Gentoo GNOME+Mozilla Team