public inbox for gentoo-qa@lists.gentoo.org
* [gentoo-qa] [GSoC-status] Collagen - database schema and further changes
@ 2009-06-26 12:51 Stanislav Ochotnicky
  2009-07-03 15:41 ` [gentoo-qa] " Stanislav Ochotnicky
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-06-26 12:51 UTC (permalink / raw
  To: gentoo-qa; +Cc: gentoo-soc

So here is another (if a bit late) status update for Tree-wide collision
checking and files database.

I don't plan any major architectural changes from this point
on (I will update the docs on soc.gentooexperimental.org during the
weekend). We have matchbox as the master server and tinderboxes as compile
slaves. The previously mentioned binary host is not implemented at all yet,
since we want to get to actually compiling stuff as soon as possible and
speed is a bit down the list for now.

We have a basic database model ready for storing the information collected
by tinderboxes (doc/ddl.sql is a dump of the PostgreSQL database, the
model is on the gentooexperimental web).
A few changes are not included there yet, such as the
tinderbox slave table with information about the slaves. There will
definitely be more changes to the DDL as we go, but hopefully nothing major.

I hit a few minor issues with creating the chroot for compilation. The whole
process goes like this:
(not chrooted yet)
 * We get information about use flags/dependencies etc for the package
 * Call external shell script to prepare chroot and mount proc and dev
 * chroot and call portage.doebuild(...)
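
A minimal Python sketch of the three steps above. The function and
parameter names (build_in_chroot, enter_chroot, prepare_script, build)
are hypothetical and not taken from collagen. The real build step would
be the portage.doebuild(...) call; it is passed in as a callable here so
that the flow can be tried without root privileges.

```python
import os
import subprocess


def enter_chroot(path):
    """Change the root directory, then move to the new '/' (needs root)."""
    os.chroot(path)
    os.chdir("/")


def build_in_chroot(package, chroot_dir, prepare_script, build,
                    run=subprocess.check_call, enter=enter_chroot):
    """Run the three steps in order and return whatever build() returns."""
    # Step 1 (not chrooted yet): gather use flags, dependencies etc.
    # A plain dict stands in for the data that would come from portage.
    info = {"package": package}

    # Step 2: the external shell script prepares the chroot and mounts
    # proc and dev inside it.
    run([prepare_script, chroot_dir])

    # Step 3: chroot and build; in collagen this is portage.doebuild(...).
    enter(chroot_dir)
    return build(info)
```

Passing stand-ins for run, enter and build makes it easy to check the
ordering of the steps in a test without touching the real system.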

Now the external shell script I created uses an official stage file to
create the base chroot, then rsyncs /usr/portage into the chroot. From this
point on, further customization of BASE chroot is possible. The issue is
that we need to have the same version of portage in BASE_CHROOT as we have
on the tinderbox, otherwise things can get really ugly. The chroot
preparation script will therefore see some changes. I am looking into
options for making sure that everything is set up correctly. One easy
possibility is to manually change BASE_CHROOT after the basic setup by the
script. A better solution is to integrate catalyst into chroot creation.


Now it's one big puzzle with one bit missing here, one bit missing
there. But it's slowly starting to come together. Fortunately I have
tried most things as small POCs and I am starting to see light at the
end of the tunnel (pretty far away but visible).


P.S. In case it's not so obvious, repository is here: 
        git://git.overlays.gentoo.org/proj/collagen.git

-- 
Stanislav Ochotnicky
Working for Gentoo Linux http://www.gentoo.org
Implementing Tree-wide collision checking and provided files database
http://soc.gentooexperimental.org/projects/show/collision-database
Blog: http://inputvalidation.blogspot.com/search/label/gsoc


jabber: sochotnicky@gmail.com
icq: 74274152
PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc


* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
  2009-06-26 12:51 [gentoo-qa] [GSoC-status] Collagen - database schema and further changes Stanislav Ochotnicky
@ 2009-07-03 15:41 ` Stanislav Ochotnicky
  2009-07-09 21:36   ` Stanislav Ochotnicky
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-07-03 15:41 UTC (permalink / raw
  To: gentoo-qa; +Cc: gentoo-soc


I decided to post my next status report as a reply to my previous post, so
that no unnecessary threads are created.


Over the course of the last week the bulk of my work was around installing
dependencies, logging and fixing bugs in the chroot creation script.

Installing dependencies is a bit hacky right now, since I am using
emerge.emerge_main() to install them. This means that I don't have to
repeat the work of emerge, such as searching the dep tree.

I'll show this simple hack on the following ascii-non-art package hierarchy:

 A->B1->C1
 |  | ->C2
 |
 |->B2->C3

Package A is the one we want to install; B1 and B2 are its dependencies
(we can read them from the ebuild of package A). We could walk the
hierarchy, but that may not be necessary since we were asked only to
try and compile package A. Therefore, to install packages B1 and B2 we
simply ask emerge (and it will resolve their deps for us). Then we install
package A ourselves (by using portage.doebuild). If that fails, then
something is probably wrong in ebuild for package A.
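
The split described above can be sketched in a few lines of Python. This
is only an illustration: plan_install and the dict layout are
hypothetical, and the real code hands the dependency list to
emerge.emerge_main() and the target to portage.doebuild.

```python
def plan_install(target, direct_deps):
    """Split the work: emerge gets the direct deps, we build the target.

    direct_deps maps a package to the dependencies listed in its ebuild.
    Deeper levels (C1, C2, C3) are left out on purpose, because emerge
    resolves them on its own when asked for B1 and B2.
    """
    return {
        "emerge": list(direct_deps.get(target, [])),
        "doebuild": target,
    }


# The hierarchy from the drawing above.
hierarchy = {"A": ["B1", "B2"], "B1": ["C1", "C2"], "B2": ["C3"]}
plan = plan_install("A", hierarchy)
```

With this plan, a failure in the "doebuild" step points at package A
itself, because its dependencies were already installed by emerge.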

Creation of the chroot environment for package installation got a lot of
bugfixes too. It's still not as good as it should be, and there is always a
need to manually "synchronize" the internal version of emerge with the one
outside of the chroot. I now use a lot of -o bind for mounting
subdirectories in chroot, this is speeding up stuff quite a bit.
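
As a rough sketch of the bind-mount approach: the helper below only
builds the mount commands, it does not run them. The function name and
the example directories are assumptions for illustration, not the list
collagen actually uses.

```python
import posixpath


def bind_mount_commands(chroot_dir, host_dirs):
    """Return one 'mount -o bind' command per host directory.

    Each host directory is mounted at the same path inside the chroot,
    so nothing has to be copied.
    """
    commands = []
    for host_dir in host_dirs:
        target = posixpath.join(chroot_dir, host_dir.lstrip("/"))
        commands.append(["mount", "-o", "bind", host_dir, target])
    return commands


# Example directories, chosen only for illustration.
commands = bind_mount_commands("/var/chroot", ["/usr/portage", "/dev"])
```

Each command can then be passed to subprocess.check_call by a process
with sufficient privileges.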

As far as logging is concerned, I am using the standard Python logging
module, nothing fancy. But it works, and compile machines can now report
errors in a more human-readable form, not just build.log. For example:
"Unable to emerge package A-1.2.3 with deps B1-2.0,B2-2.0"

I also had to extend the data that is transferred between matchbox and
tinderboxes, since I realized that otherwise I would not be able to fill in
the information that needs to be present when inserting data into the
database. This part is not finished yet, so the tinderbox currently sends no
data to matchbox. This regression should be fixed today.

On that note, what I plan to work on during weekend/next week:
 * start inserting data into database (therefore actually create app->db
   layer)
 * now that we really have functioning compilation/dep resolving try to
   install more packages. Therefore create list of 10-20 packages.
   dev-util/git (and its subversion[-dso] dep), postfix/sendmail blockers
   come to mind. If you have more ideas for combinations that usually
   cause problems, I'd love some input
 * Improve logging in a few places




* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
  2009-07-03 15:41 ` [gentoo-qa] " Stanislav Ochotnicky
@ 2009-07-09 21:36   ` Stanislav Ochotnicky
  2009-07-17 13:39     ` Stanislav Ochotnicky
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-07-09 21:36 UTC (permalink / raw
  To: gentoo-qa; +Cc: gentoo-soc

Heya everyone,

another (almost) week went by so here is another status report. 

As I stated in my last report, one of the key goals for this week was the db
layer for storing information retrieved by tinderboxes. I was looking
into using various ORM frameworks. It was suggested to me to try Django
and I thought "Hey, that's not even an ORM framework, but a web framework".

Well, one part of my project is creating a web interface for the database at
a later stage. So in the spirit of not doing the same thing twice I looked
into using the ORM part of Django. And guess what? It is doable, and a basic
implementation is in the devel branch of my repo.

There were certain caveats of course. Django is designed to work
for web applications, not as a general-purpose ORM framework. So when
using its ORM part without the rest of Django, I have to take care of
DB exceptions and rollback of transactions myself. I soon realized
I was doing the same thing in every db function I was writing, so I ended
up writing a decorator in Python (finally had a reason! :-) ). It
looks something like this:

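--- CODE

from functools import wraps

# Assumption: both helpers are importable from django.db, as in the
# Django 1.x releases of 2009.
from django.db import reset_queries, _rollback_on_exception


def dbquery(f):
    @wraps(f)  # keep the name and docstring of the wrapped function
    def decor(*args, **kwargs):
        reset_queries()  # clear the query log kept by the connection
        try:
            return f(*args, **kwargs)
        except Exception:
            _rollback_on_exception()
            raise  # re-raise the original exception with its traceback
    return decor

@dbquery
def add_package(...):
    ...

--- CODE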

This way we can be sure that failed transactions are rolled back. 

Because I am using Django to generate SQL now, the original database
schema that I committed to the repository some time ago is now deprecated.
We can generate the database (and initial data) by using the django-admin
syncdb command now.

This approach seems fairly good so far, since everything was set up by
code that fits on one screen. I only wish using only a small part of
Django was less painful.


And now comes the big part: actually populating the database with some
meaningful data. I did some work on that part. Last week there were some
modifications to the protocol I was using between Matchbox and Tinderboxes.
Most of the changes touched code dealing with log/environment
collection and the fact that we are compiling inside a chroot
environment.

I also added support for packages that require certain use flags to be
enabled/disabled for their dependencies. A good example is dev-util/git
(requires dev-util/subversion[-dso]). For git this is however only
RDEPEND (runtime dependency) so compilation doesn't depend on it.
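
To illustrate what such a use dependency carries, here is a small Python
sketch that splits an atom into the package and its required flag
states. split_use_deps is a hypothetical helper, not portage API, and it
only understands the plain "flag" and "-flag" forms, none of the
conditional variants.

```python
def split_use_deps(atom):
    """Return (package, enabled_flags, disabled_flags) for a simple atom."""
    if not atom.endswith("]") or "[" not in atom:
        # No use dependency attached to this atom.
        return atom, [], []
    package, _, flags = atom[:-1].partition("[")
    enabled, disabled = [], []
    for flag in flags.split(","):
        flag = flag.strip()
        if flag.startswith("-"):
            disabled.append(flag[1:])  # flag must be off in the dependency
        elif flag:
            enabled.append(flag)       # flag must be on in the dependency
    return package, enabled, disabled


parsed = split_use_deps("dev-util/subversion[-dso]")
```

For the git example, the result says that subversion has to be installed
with the dso flag disabled before the dependency counts as satisfied.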

Since we already have a db for storing results, and the support is there,
let's compile some packages! For next week I plan to finally add proper
testing to the mix. Instead of compiling fortune-mod over and over,
Matchbox will also ask for other packages to be compiled. This will bring
out some more problems that will need to be fixed, I am sure. At least
that's the idea...


* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
  2009-07-09 21:36   ` Stanislav Ochotnicky
@ 2009-07-17 13:39     ` Stanislav Ochotnicky
  2009-07-25 22:50       ` Stanislav Ochotnicky
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-07-17 13:39 UTC (permalink / raw
  To: gentoo-qa; +Cc: gentoo-soc


YAWR (Yet Another Weekly Report) is here and yet again I am feeling like an
intruder on these quiet lists :-)


So what was going on over the last week? Well, not that much, since I had
a visitor for the weekend and the first day of the week.

The bulk of the work went into making testing easier, documenting
installation procedures and creating startup scripts for collagen
components.

We also started compiling more packages and collecting build
errors/contents of these packages. Right now a lot of the errors come from
problems with collagen itself (for example it managed to unmerge sed from
the chroot, not a great idea :-) ).

Now we are slowly entering the stage where most of the work will be directed
towards fixing stuff up so that the errors that remain are real ebuild
problems. I'd like to apologize to Andrey Kislyuk, my mentor, for
not doing this sooner. I realized too late that "we want to start
compiling packages as soon as possible" really meant "AS SOON AS
POSSIBLE". Thanks for being patient.


* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
  2009-07-17 13:39     ` Stanislav Ochotnicky
@ 2009-07-25 22:50       ` Stanislav Ochotnicky
  2009-07-31  9:44         ` Stanislav Ochotnicky
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-07-25 22:50 UTC (permalink / raw
  To: gentoo-qa; +Cc: gentoo-soc

Hi everyone,

first, thanks for the responses to my last email. I was kind of joking
with the intruder part, but it still is nice to have feedback.

Now to the good news. We had some nice results this week: the first
discovered ebuild error. Apparently lxde-base/lxsession was missing
intltool in DEPEND.

Apart from that everything went as planned. I was focusing on fixing
errors in collagen and in the building of packages. To speed up testing we
started building/using binary packages (with --usepkg --buildpkg). This
will have to be improved later when we start playing with use flags
more, but for now it will do.

I also fixed the problem with unmerging system packages (collagen now skips
unmerging of packages in "system" set).
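
The fix amounts to a simple filter. A minimal sketch, assuming the
members of the "system" set are already available as a list of package
names (packages_to_unmerge and both arguments are hypothetical names):

```python
def packages_to_unmerge(installed, system_set):
    """Return the installed packages that are safe to unmerge.

    Anything that belongs to the system set is skipped, so the chroot
    keeps basic tools such as sed.
    """
    protected = set(system_set)
    return [pkg for pkg in installed if pkg not in protected]


removable = packages_to_unmerge(
    ["sys-apps/sed", "games-misc/fortune-mod"],
    ["sys-apps/sed"],
)
```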

The rest of the changes went into improving the information we get when
compiling packages (debugging info mostly).

I guess this will be it for now, plans for the following week are as
follows:
 * fix a few more outstanding bugs bugging me :-)
 * compile even more packages and start filling database up
 * categorize at least a few ebuild problems

See ya later alligators,


* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
  2009-07-25 22:50       ` Stanislav Ochotnicky
@ 2009-07-31  9:44         ` Stanislav Ochotnicky
  2009-08-07  8:12           ` Stanislav Ochotnicky
       [not found]           ` <20090807081410.GB29277@w0rm.ynet.sk>
  0 siblings, 2 replies; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-07-31  9:44 UTC (permalink / raw
  To: gentoo-qa; +Cc: gentoo-soc

Wow,

another week behind me. And quite productive one if you ask me.

I recently pushed yesterday's changes to the public repo (so far
only to the devel branch, which might get rebased, so tread carefully :-)
). There is a lot of cleanup work to be done, but I can say that
we have a working base now.

So what exactly was going on over the last week? I fixed
a bunch of bugs (and the two remaining ones I plan to fix today). You
can see more information about that in the redmine bug tracker on
gentooexperimental.org. Once I fix the remaining bugs I plan to add more
to the bugtracker :-)

Some work went into making tinderboxes able to recover from problems
so that they can run without supervision. All around, exception
handling and error logging are not perfect, but a lot better than a
week ago.

I also started filling up the database with information yesterday. It went
even smoother than I expected; I only had a few typos in my code :-) All
in all it took about 2 hours to make it all work.

I have three more ebuild candidates for fixing (not confirmed yet):
 * dev-java/kaffe doesn't list x11-libs/libXtst in DEPEND
 * x11-libs/libXaw is missing x11-libs/libXext and x11-proto/xextproto
   in DEPEND
 * games-roguelike/tome ebuilds all have the same typo. Can you spot
   it?

I'll give you short example so you don't have to look it up:
DEPEND="${REDEPEND}
        x11-misc/makedepend"

For a moment I actually thought that portage had some bug because it
didn't return the proper DEPEND packages... until I saw that typo later on.
Maybe this could be checked in repoman somehow?
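
One way such a check could work, sketched in Python: collect the
variables an ebuild assigns, collect the ones it references, and report
references that were never assigned. This is not an existing repoman
check; the function name, the regular expressions and the "known"
parameter (for variables that portage or eclasses predefine) are all
assumptions for illustration.

```python
import re

# Matches $NAME and ${NAME...} style references.
VAR_REF = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)")
# Matches simple NAME= or NAME+= assignments at the start of a line.
VAR_DEF = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\+?=", re.MULTILINE)


def undefined_references(ebuild_text, known=()):
    """Return referenced variable names that are never assigned.

    known lists names defined elsewhere (for example P or PV), which
    would otherwise be reported as false positives.
    """
    defined = set(VAR_DEF.findall(ebuild_text)) | set(known)
    result = []
    for name in VAR_REF.findall(ebuild_text):
        if name not in defined and name not in result:
            result.append(name)
    return result
```

Run on the snippet above together with an RDEPEND assignment, it would
report REDEPEND as referenced but never defined. A real check would need
a far longer list of predefined names to stay quiet on correct ebuilds.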

So moving on to plan for today and the next week:
 * report bugs for mentioned ebuilds
 * fix main bugs remaining in collagen
 * start to refactor code to make it more pretty for later audit :-)
 * start creating web interface for file database using Django

So long and thanks for all the fish,


* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
  2009-07-31  9:44         ` Stanislav Ochotnicky
@ 2009-08-07  8:12           ` Stanislav Ochotnicky
       [not found]           ` <20090807081410.GB29277@w0rm.ynet.sk>
  1 sibling, 0 replies; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-08-07  8:12 UTC (permalink / raw
  To: gentoo-qa

And here I am with another weekly report,

this week has been mostly about integration, deployment arrangements and
web development.

I also fixed two main remaining bugs, that is:
 * installation of packages even when KEYWORDS didn't have ARCH or ~ARCH in them.
 * nested use dependencies with '||' caused problems
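
The first fix in the list above boils down to a KEYWORDS check before
installing. A minimal sketch, with is_keyworded as a hypothetical helper
that takes the raw KEYWORDS string of an ebuild:

```python
def is_keyworded(keywords, arch):
    """True if KEYWORDS contains ARCH or ~ARCH for the given arch."""
    accepted = {arch, "~" + arch}
    return any(keyword in accepted for keyword in keywords.split())
```

A package whose KEYWORDS only lists other architectures, or lists the
arch as "-x86", is then skipped instead of being installed.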

The web development is quite easy now, although I admit that my
templates are more or less bare html django templates without any fancy
ajax or similar web 2.0 stuff :-)

For now we can easily:
 * list the contents of a certain package version (actually of one compiled
   instance of that package version). If there was an error compiling,
   we can see build logs etc. instead of the contents.
 * list only packages that were problematic (first category list, then
   package list)
 * search for the packages a certain path is in

Monday is the suggested "pencils down" date, but since I haven't deployed
collagen yet this will apparently not be our pencils down.

For the last week(end) I plan to polish up collagen, especially the
documentation about workarounds and various hacks, so that we will all
know what the current limits are (there are quite a few, but they are
concentrated in a few places that can be easily improved).

Enjoy your weekend,


* [gentoo-qa] Re: [GSoC-status] Collagen - database schema and further changes
       [not found]           ` <20090807081410.GB29277@w0rm.ynet.sk>
@ 2009-08-15 21:10             ` Stanislav Ochotnicky
  0 siblings, 0 replies; 8+ messages in thread
From: Stanislav Ochotnicky @ 2009-08-15 21:10 UTC (permalink / raw
  To: gentoo-soc; +Cc: gentoo-qa

My final (GSoC) report on collagen is here.

First, what was going on this past week... I've focused on adding
documentation (docstrings, comments) and a little bit of refactoring.
Then there were quite a few modifications to simplify installation and
now I can say that a simple:

 # python setup.py install

plus a few configuration steps afterwards will do the whole installation
procedure. Alternatively it's possible to use the bundled ebuild instead of
the setup.py script. This was actually the first time I've created a
setup.py script and I have to admit that for simple stuff (like collagen)
it's quite easy to create one from scratch in a few minutes.

So to the main summary of the project: what works, what doesn't really
work but is planned for the post-GSoC era, and so on.

We have a working end-to-end system for automatic distributed
compilation of packages from the portage tree, with information stored
in a database. The information is the contents of packages or, in case
of compilation failure, build.log, environment, emerge --info and the
application log for good measure. We were able to catch a few bugs in
ebuilds already, even with limited resources (meaning I was mostly
compiling in a virtualbox on one of my machines). Even if QA like
this was not originally the main goal of collagen, it might later turn
out to be exactly that. Bug hunting monster :-)

As I mentioned in last week's report, the web interface for this was
done in Django and I've implemented 3 basic use-cases for showing
data in the database. If you've worked with Django before then you know
it's quite easy to add more functionality.

At least one more important use-case to be implemented is "is this package
colliding with something else?" or even better "show me groups of packages
colliding with each other". The data is there and the database schema is
able to support these use cases, so this will definitely be implemented
(although presumably after GSoC).
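
To make the second use-case concrete, here is a sketch of how colliding
groups could be computed from package contents held in memory. It does
not use the collagen schema or Django; find_collisions and the sample
file lists in the usage note are made up for illustration.

```python
from collections import defaultdict


def find_collisions(contents):
    """Group colliding packages.

    contents maps a package name to the paths it installs. The result
    maps each group of packages (a frozenset) to the sorted list of
    paths that every package in the group installs.
    """
    owners = defaultdict(set)
    for package, paths in contents.items():
        for path in paths:
            owners[path].add(package)

    groups = defaultdict(list)
    for path, packages in owners.items():
        if len(packages) > 1:  # more than one owner means a collision
            groups[frozenset(packages)].append(path)
    return {group: sorted(paths) for group, paths in groups.items()}
```

Given a postfix and a sendmail entry that both list /usr/sbin/sendmail,
the result contains one group with exactly those two packages. The same
grouping could later be expressed as a query against the database.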

This is where we come to the "what now" part. Collagen as a whole didn't
get a lot of testing yet, and I'd love to try it on more than one virtualbox
machine. There are a lot of places where performance could be improved
(read: where we would not reinstall already installed packages and such).
Another unimplemented idea was using remote binary hosts. However, I
really think that this would make much more sense if it was implemented
together with improved binary support in portage. Then there is
authentication of tinderboxes to matchbox for the communication between
them. Right now any machine able to connect to the matchbox server could
probably run python code as the user running matchbox. I haven't tried, but
the python documentation for the pickle module is pretty clear about the
insecurity of this approach (I was assured this won't be a problem after
auth is
done though). 
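
The pickle concern is easy to demonstrate with a harmless payload. In
the sketch below, which assumes the receiving side calls pickle.loads on
data from the network, unpickling does not rebuild the object that was
sent. It calls whatever callable the sender named. The class name is
hypothetical, and int() stands in for something hostile such as
os.system.

```python
import pickle


class Innocent(object):
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call int('42')".
        # A hostile sender could name any importable callable here.
        return (int, ("42",))


payload = pickle.dumps(Innocent())

# The receiver only sees bytes, yet loading them runs the call above.
result = pickle.loads(payload)
```

Here result is the integer 42 rather than an Innocent instance, which
shows that the sender decides what code runs on the receiver.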

All in all collagen is far from being a finished, polished piece of
software right now. But at least in my opinion it has already shown that it
could be useful, and improvements can now be done in quite a modular way.
One exception is of course the database schema, but even changes there
are nothing to be afraid of (I personally tried, it's possible without
destroying the data :-) ). Expect a few more updates on
soc.gentooexperimental bugtracking/documents. Then 

Now let me say a big thank you to my mentor weaver and the whole
Gentoo team. A few examples:
 * weaver - for changing our meeting time quite a few times because of me,
            and for the patience.
 * zmedico - for being portage king, always ready to help out and even
             explain 'why', not just how
 * robbat2|na - for becoming base unit of problem solving recently.
 * .* - for answering quite a few of my questions and giving me ideas.
         Most of you probably never noticed that you helped me
         (sometimes even I didn't notice at the time :-) ).

I hope I'll work with you to further improve collagen and after
that, other Gentoo projects. I am not going anywhere yet, as long
as you won't chase me away with a stick :-) For me this is not really
over, I'll chill out a little bit of course, but if at all possible I'd
like to continue contributing.
