public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] Project Grumpy - weekly report #1
@ 2010-06-01  7:10 Priit Laes
From: Priit Laes @ 2010-06-01  7:10 UTC
  To: gentoo-soc; +Cc: leio, ferringb

This is weekly progress report no. 1 for Project Grumpy.

As this is the first publicly visible announcement, I am also going to
give a short overview of the project itself.

The aim of this project is to create a database containing various
developer-related metadata about packages in the Gentoo Portage tree.
The metadata we are going to store can be used for different purposes;
examples include upstream version checks and notifying developers who
are interested in a particular package. Eventually the project should
also provide a nice web and API interface for accessing this data.

The project's semi-official IRC channel is #gentoo-grumpy on the
Freenode network. Just step in and say "Hi!" :)

Last week's progress report
===========================

My first week went a bit slowly due to having some "unfinished business"
that I needed to finish, and also because of two exams (which went
fine).

The core issue I wrestled with this week was how to keep the Portage
tree and the database contents in sync, i.e. when an ebuild is modified,
removed or added, how to make sure that the database contents
correspond to the Portage contents.

The solution that I came up with is to use a simple daemon that logs
changes to the Portage tree and modifies the database contents when it
is appropriate. Appropriate here means that we shouldn't apply updates
to the database while the tree itself is being updated, as that might be
unsafe (e.g. in the middle of a package rename). So currently it seems
that the daemon also has to initiate the rsync process itself and push
the updates into the database only after rsync has finished
successfully. (You can already see how all kinds of weird corner cases
start popping up :P )
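
The "collect during the sync, push afterwards" idea can be sketched
roughly as below. This is only an illustration, not the daemon's actual
code: the ChangeJournal class, its method names and the three change
kinds are all made up for this example. The intent is that inotify event
handlers call record(), and the daemon calls flush() once rsync has
exited.

```python
class ChangeJournal:
    """Hypothetical buffer for tree changes seen while a sync is running."""

    def __init__(self):
        # path -> "added" | "modified" | "removed"
        self._pending = {}

    def record(self, path, kind):
        """Remember one change, collapsing repeated events for the same path."""
        previous = self._pending.get(path)
        if previous == "added" and kind == "removed":
            # Created and deleted within one batch: nothing to store.
            del self._pending[path]
        elif previous == "added" and kind == "modified":
            # Still a brand new file as far as the database is concerned.
            pass
        elif previous == "removed" and kind == "added":
            # Deleted and recreated: treat it as a modification.
            self._pending[path] = "modified"
        else:
            self._pending[path] = kind

    def flush(self, sync_succeeded):
        """Return the batch to apply, or keep it if the sync failed."""
        if not sync_succeeded:
            return {}
        batch, self._pending = self._pending, {}
        return batch
```

Keeping the batch after a failed sync means the next successful run
still sees those changes; whether that is the right policy for every
corner case is an open question.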

My current approach to logging is to use the inotify [1] framework,
present in the Linux kernel since 2.6.13 (sorry BSD users, but this is
Gentoo Linux after all), with the help of pyinotify [2].
So far there's only one drawback to using inotify: by default the kernel
allows only 8192 watches per user, but Portage contains a lot of
directories, each of which needs its own watch. So in order to use this
approach one has to raise the limit using the
/proc/sys/fs/inotify/max_user_watches tunable. 81920 has worked fine on
my machine so far ;)
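
A quick way to check whether the limit is high enough is to count the
directories in the tree and compare that with the tunable. The sketch
below uses only the standard library; the function names are made up for
this example, and the tree location has to be passed in by the caller.

```python
import os


def count_directories(root):
    """Count root plus every directory below it (one inotify watch each)."""
    # os.walk yields exactly one tuple per directory it visits.
    return sum(1 for _ in os.walk(root))


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the current watch limit, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def watches_fit(root, limit):
    """True if watching every directory under root stays within limit."""
    return limit is not None and count_directories(root) <= limit
```

For example, watches_fit(tree_root, read_watch_limit()) tells whether
the tunable needs to be raised before the daemon starts. Other watches
held by the same user count against the same limit, so some headroom is
a good idea.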

There was also a secondary approach suggested by my mentor Leio, namely
to parse rsync log files, but I am a bit reluctant about this idea.
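
For comparison, here is roughly what the log-parsing approach could look
like. It assumes rsync is run with --itemize-changes (-i), which is not
something settled in this report; the function name and the
added/modified/removed labels are invented for the example. Since the
width of the itemized column differs between rsync versions, only its
first few characters are inspected.

```python
def parse_itemized_line(line):
    """Map one itemized rsync line to (kind, path), or None if irrelevant."""
    parts = line.rstrip("\n").split(None, 1)
    if len(parts) != 2:
        return None
    code, path = parts
    if path.endswith("/"):
        # Directory entries are skipped in this sketch.
        return None
    if code == "*deleting":
        return ("removed", path)
    # First character: transfer direction, second character: file type.
    if len(code) < 3 or code[0] not in "<>" or code[1] != "f":
        return None
    if code[2] == "+":
        # A run of '+' marks a newly created file.
        return ("added", path)
    return ("modified", path)
```

A line such as ">f+++++++++ app-misc/foo/foo-1.0.ebuild" (a made-up
package) would come back as an "added" entry, while attribute-only
lines starting with "." are ignored.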

Anyway, I'll leave this idea simmering here for a while and unless
someone comes up with a better idea (yes, I have also thought about
scanning the whole Portage tree every x hours), I'm going to implement
the daemon.

Plans for current week 
======================

As I currently consider the core issue solved, the next issue I have to
solve is how to take an ebuild, extract information from it and store
it in the database. (Hint: pkgcore)
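
Even before pkgcore enters the picture, the path of a changed file
already identifies the package, because ebuilds live at
category/package/package-version.ebuild inside the tree. The helper
below is a hypothetical standard-library sketch of that first step; it
does not use pkgcore and does not validate the version string.

```python
def split_ebuild_path(relative_path):
    """Split 'category/package/package-version.ebuild' into three parts.

    Returns (category, package, version), or None for anything else.
    """
    parts = relative_path.split("/")
    if len(parts) != 3 or not parts[2].endswith(".ebuild"):
        return None
    category, package, filename = parts
    stem = filename[:-len(".ebuild")]
    # The directory name is the package name, so the version is whatever
    # follows "<package>-" in the file name.
    prefix = package + "-"
    if not stem.startswith(prefix) or len(stem) == len(prefix):
        return None
    return (category, package, stem[len(prefix):])
```

Relying on the directory name avoids having to guess where the package
name ends and the version begins. Everything inside the ebuild itself
(dependencies, keywords and so on) still needs a real parser.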

I'm not going to take on bigger tasks because I still have one quite
hard exam (thermodynamics and statistical physics) on the 4th of June.
And if I pass, it is the last one.

PS. Sorry, no blog yet. I was using Zine, but it broke after I updated 
my system to SQLAlchemy-0.6.

[1] http://en.wikipedia.org/wiki/Inotify
[2] http://trac.dbzteam.org/pyinotify

Päikest,
Priit Laes :)


