* [gentoo-qa] [GSoC-status] Tree-wide collision checking and files database
@ 2009-06-12 13:32 Stanislav Ochotnicky
From: Stanislav Ochotnicky @ 2009-06-12 13:32 UTC
To: gentoo-qa; +Cc: gentoo-soc
Hi everyone,
some of you already know that work on the GSoC project "Tree-wide
collision checking and provided files database" started a few weeks ago.
For the rest, here is a short introduction to the project (collagen) and
its goals.
Collagen aims to improve the quality of ebuilds in the portage tree. It
does this by compiling as many ebuilds as possible, specifically taking
into account the various atoms in the DEPEND variable. For example, if a
package's ebuild states that it needs =dev-libs/glib-2*, that package
should be compilable with every version of glib-2* in portage (taking
keywords into account). Therefore collagen will install one version of
glib-2*, then the ebuild in question, collect information, and uninstall
both the ebuild and that glib version. It repeats this process for every
glib-2* in the tree.
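The cycle described above can be sketched roughly as follows. This is only
an illustration of the ordering of steps, not collagen's actual code: the
function name plan_builds, the action labels and the package
app-misc/example are all made up for the example, and nothing here calls
portage.

```python
from typing import Iterator, List, Tuple

Step = Tuple[str, str]  # (action, package atom)


def plan_builds(package: str, dep_versions: List[str]) -> Iterator[Step]:
    """Yield the install/build/collect/uninstall cycle for each dependency version."""
    for dep in dep_versions:
        yield ("install", dep)        # put exactly one dependency version in place
        yield ("install", package)    # build the ebuild under test against it
        yield ("collect", package)    # record build log, installed files, etc.
        yield ("uninstall", package)
        yield ("uninstall", dep)      # return to the starting state


# Hypothetical package name and version list, for illustration only.
versions = ["=dev-libs/glib-2.18.4-r1", "=dev-libs/glib-2.18.4-r2"]
for action, atom in plan_builds("app-misc/example", versions):
    print(action, atom)
```

Each dependency version costs five steps, which is why the number of
version combinations matters so much for the total run time.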
The original idea was to have two sides:
* master server (matchbox)
* slaves compiling packages (tinderboxes)
The master server decides what needs to be compiled (automatically or
semi-automatically). A tinderbox asks for a job, and the master provides
a package name (and optionally a version). The tinderbox then tries to
compile the package with different sets of dependencies, reporting the
results to Matchbox.
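To make the request/report exchange concrete, here is a minimal in-memory
sketch. It assumes a simple first-in, first-out job queue and leaves out
networking entirely; the class and method names (Matchbox, request_job,
report) are invented for illustration and are not the project's real
interface.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Job = Tuple[str, Optional[str]]  # (package name, optional version)


class Matchbox:
    """In-memory stand-in for the master server (no networking)."""

    def __init__(self, jobs: List[Job]) -> None:
        self._pending: Deque[Job] = deque(jobs)
        self.results: Dict[Job, List[Tuple[str, bool]]] = {}

    def request_job(self) -> Optional[Job]:
        """Called by a tinderbox; returns None when nothing is left to build."""
        return self._pending.popleft() if self._pending else None

    def report(self, job: Job, dep_set: str, success: bool) -> None:
        """Store one build result for a given set of dependencies."""
        self.results.setdefault(job, []).append((dep_set, success))


# A tinderbox would loop like this until the master has no more work.
master = Matchbox([("dev-libs/glib", None), ("app-misc/example", "1.0")])
job = master.request_job()
while job is not None:
    master.report(job, "hypothetical-dependency-set", True)
    job = master.request_job()
```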
It seems that the whole process could be sped up by hosting binary
packages on one central server (binary host). Obviously various builds
of the same package would be created, so unique names could be
generated by hashing some metadata into part of the filename. My first
thought is to use the USE flags and DEPEND as the metadata to hash.
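A rough sketch of such a naming scheme is shown below. The choice of
SHA-1, the 12-character truncation and the exact file name layout are
assumptions made for the example; the proposal only says that USE flags
and DEPEND would feed the hash.

```python
import hashlib
from typing import Iterable


def binpkg_name(pkg_version: str, use_flags: Iterable[str],
                depend: Iterable[str]) -> str:
    """Build a file name whose hash part depends only on USE flags and DEPEND."""
    # Sort both inputs so the same set always produces the same hash,
    # regardless of the order in which the flags were listed.
    metadata = "USE=" + " ".join(sorted(set(use_flags)))
    metadata += "\nDEPEND=" + " ".join(sorted(set(depend)))
    digest = hashlib.sha1(metadata.encode("utf-8")).hexdigest()[:12]
    return f"{pkg_version}-{digest}.tbz2"


# Hypothetical flags and dependency atoms, for illustration only.
print(binpkg_name("glib-2.18.4-r2", ["doc", "debug"], [">=dev-libs/libfoo-1.2"]))
```

Sorting before hashing matters: two tinderboxes that enable the same
flags in a different order should still agree on the file name.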
So far two other projects have come to light as possible sources of
inspiration and/or collaboration:
* catalyst (mainly tinderbox generating part)
* AutotuA (automatic generic job framework)
AutotuA in particular seems like a good candidate for merging.
It doesn't seem feasible to compile every package with every version of
every dependency, so I'd especially like to ask for your opinion on
this part. One idea I had was to restrict testing to the highest
revision of a given version. For example, if we have glib-2.18.4-r1 and
glib-2.18.4-r2, we will only test against glib-2.18.4-r2 and assume that
r1 would be OK too (or that users would upgrade, since it's a bugfix
release).
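As a sketch of this filter, the snippet below keeps only the highest -rN
revision for each package version. It works on plain name strings with a
regular expression and does not use portage's own version comparison; the
function name highest_revisions is made up, and a missing revision suffix
is assumed to mean r0.

```python
import re
from typing import Dict, List

# Split "name-version[-rN]" into the part before the revision and N itself.
_REVISION = re.compile(r"^(?P<base>.+?)(?:-r(?P<rev>\d+))?$")


def highest_revisions(packages: List[str]) -> List[str]:
    """Keep only the highest -rN revision for each package version."""
    best: Dict[str, int] = {}
    for name in packages:
        match = _REVISION.match(name)
        if match is None:  # only happens for an empty string
            continue
        base = match.group("base")
        rev = int(match.group("rev") or 0)  # no suffix is treated as r0
        best[base] = max(rev, best.get(base, -1))
    return [base if rev == 0 else f"{base}-r{rev}" for base, rev in best.items()]


print(highest_revisions(["glib-2.18.4-r1", "glib-2.18.4-r2"]))
# prints ['glib-2.18.4-r2']
```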
Another approach to optimizing the use of resources would be to keep a
priority list of the packages that need the most testing. I imagine this
could be built by analyzing logs from Gentoo mirrors and figuring out
which packages are downloaded most frequently.
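A minimal sketch of such a ranking follows. The log format is purely an
assumption ("<timestamp> <category/package> <file name>" per line); real
mirror logs would first need to be mapped from file names to packages,
which is not shown here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def download_priority(log_lines: Iterable[str],
                      top: int = 10) -> List[Tuple[str, int]]:
    """Rank packages by how often they appear in a mirror log."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        # Assumed log format: "<timestamp> <category/package> <file name>".
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts.most_common(top)


# Made-up log lines, for illustration only.
log = [
    "t1 dev-libs/glib glib-2.18.4.tar.bz2",
    "t2 app-misc/example example-1.0.tar.gz",
    "t3 dev-libs/glib glib-2.18.4.tar.bz2",
]
print(download_priority(log))
```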
We would probably need at least one tinderbox per glibc version, if I am
not mistaken, since glibc cannot be freely up/downgraded.
This email is meant just as a teaser; more information (data model, UML
diagrams) is available on the project website (look for Documents):
http://soc.gentooexperimental.org/projects/show/collision-database
I'd love to hear suggestions, opinions and criticism. You can use this
thread, or the various options on gentooexperimental.org.
--
Stanislav Ochotnicky
Working for Gentoo Linux http://www.gentoo.org
Implementing Tree-wide collision checking and provided files database
http://soc.gentooexperimental.org/projects/show/collision-database
Blog: http://inputvalidation.blogspot.com/search/label/gsoc
jabber: sochotnicky@gmail.com
icq: 74274152
PGP: https://dl.getdropbox.com/u/165616/sochotnicky-key.asc
* [gentoo-qa] Re: [gentoo-soc] [GSoC-status] Tree-wide collision checking and files database
From: Stanislav Ochotnicky @ 2009-06-12 19:13 UTC
To: gentoo-soc; +Cc: gentoo-qa
On 10:05 Fri 12 Jun , Jeremy Olexa wrote:
> On Fri, Jun 12, 2009 at 8:32 AM, Stanislav
> Ochotnicky<sochotnicky@gmail.com> wrote:
> > Hi everyone,
> >
> > some of you already know that work on GSoC project "Tree-wide collision
> > checking and provided files database" has been started a few weeks ago.
> > For the rest, I will make a short introduction and goals of this
> > project (collagen).
> >
> > Collagen aims to improve quality of ebuilds in portage tree. It does
> > this by compiling as many ebuilds as possible. It specifically takes
> > into account various atoms in DEPEND variable. For example if package
> > ebuild states that it needs =dev-libs/glib-2*, that package should be
> > compilable with every version of glib-2* in portage (taking into account
> > keywords). Therefore collagen will install one version of glib-2*, then
> > ebuild in question, collect information, uninstall ebuild and first
> > glib version. It repeats this process for every glib-2* in the tree.
>
> Testing against every version of the deps as required seems like it is
> diverging from the original "Tree-wide collision checking and provided
> files database" - Would you say that the goal of this project is
> becoming more QA orientated? Something like: "Matchbox: A tinderboxen
> master server to provide QA for ebuilds"
Yes, that's true, this project is moving towards QA (it was proposed by
a QA developer after all). The SoC list was only CCed; the To: was
gentoo-qa. I should have warned about that in my original email so that
responses wouldn't get lost between the two lists. Added QA to CC. I
believe crossposting is OK in this case. I apologize if it is not.
This is from one of my discussions with weaver (my mentor):
<quote mode=summary>
Say package X depends on libfoo, and there are libfoo-1.2.3 and
libfoo-1.3.4 in portage, as well as libfoo-1.0.1 in the attic
(http://sources.gentoo.org/viewcvs.py/gentoo-x86/ - view dead files).
But the dev was a bit sloppy and didn't check with libfoo-1.2.3, and that
version has a file collision with X, while libfoo-1.3.4 doesn't. X also
fails to compile with libfoo-1.0.1, even though some people might still
have that on their systems. So you have some build failures/collisions
here that are QA problems which should be caught, and they can only be
caught by iterating through all versions of the dependencies.
</quote>
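The file-collision part of that scenario comes down to comparing the sets
of installed paths. The sketch below assumes the file lists have already
been collected by a tinderbox; the function name find_collisions and the
paths are invented for the example.

```python
from typing import Dict, Set, Tuple


def find_collisions(installed: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the paths that two different packages both install."""
    collisions: Dict[Tuple[str, str], Set[str]] = {}
    names = sorted(installed)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = installed[first] & installed[second]
            if shared:
                collisions[(first, second)] = shared
    return collisions


# Hypothetical file lists: X is checked against one libfoo version at a time.
x_files = {"/usr/bin/x", "/usr/share/foo/data"}
print(find_collisions({"X": x_files,
                       "libfoo-1.2.3": {"/usr/lib/libfoo.so", "/usr/share/foo/data"}}))
print(find_collisions({"X": x_files,
                       "libfoo-1.3.4": {"/usr/lib/libfoo.so"}}))
```

Only the first call reports a collision, which matches the case where the
problem shows up with one dependency version but not with another.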
> If you were strictly collision checking, then you don't care about
> every version of glib-2* you only care about the package in question
> and what installed files it provides. However for the provided files,
> you do care about every version of glib-2*, not for the other package,
> but to list the installed files of glib-2*
Yes, that's true. Strictly speaking, collisions could be caught easily by
compiling every package (usually with as many USE flags enabled as
possible). However, it would be nice to catch ebuilds that don't specify
correct versions in DEPEND. Now that I think about it, it may be a good
idea to let matchbox specify whether a tinderbox should try to compile
against every version of the dependencies or just one. By default we
would only check against the latest version, but specific packages could
be configured to check every version of their dependencies.
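One way this per-package switch could look is sketched below. The policy
values and the function name versions_to_test are assumptions for the
example, and the input list is assumed to be sorted from oldest to newest.

```python
from typing import Dict, List

ALL_VERSIONS = "all"
LATEST_ONLY = "latest"


def versions_to_test(package: str, available: List[str],
                     policy: Dict[str, str]) -> List[str]:
    """Pick dependency versions according to the per-package policy."""
    # Packages without an explicit entry fall back to the latest version only.
    if policy.get(package, LATEST_ONLY) == ALL_VERSIONS:
        return list(available)
    return available[-1:]  # assumes `available` is sorted oldest to newest


# Hypothetical policy: only app-misc/example gets the exhaustive treatment.
policy = {"app-misc/example": ALL_VERSIONS}
glibs = ["glib-2.18.4-r1", "glib-2.18.4-r2"]
print(versions_to_test("app-misc/example", glibs, policy))
print(versions_to_test("app-misc/other", glibs, policy))
```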
> After writing that down, I can see why you want to compile, check,
> uninstall, re-compile, repeat...but I worry about how efficient it is
> and what ways to improve that.
That's my concern too. That's why I wanted to use a central binary host.
Every compiled package could then be reused across all tinderboxes (with
the same architecture, of course).
I was counting on them being on a high-speed network connection (ideally
LAN).
There is one more good thing about always starting with nothing but a
bare system. If something is missing in DEPEND, we will catch it easily.
> >
> > Original idea was to have two sides:
> > * master server (matchbox)
> > * slaves compiling packages (tinderboxes)
> >
> > Master server decides what needs to be compiled (automatically or
> > semi-automatically). Tinderbox asks for job, master provides package
> > name (and optionally version). Tinderbox then goes and tries to compile
> > package with different sets of dependencies reporting results to
> > Matchbox.
> >
> > It seems that whole process could be sped up by hosting binary
> > packages on one central server (Binary host). Obviously various versions
> > of the same package would be created and therefore unique names could be
> > created by using some metadata to create hash part of filename. On a
> > first thought I would use USE flags and DEPEND as metadata to hash.
>
> This is a cool aspect of the project, I hope you can work with solar
> and zmedico to improve binpkgs. USE flags seem to be the trouble spot
> of binpkgs.
That's my other concern. I know that there was a GSoC project to improve
binary support in portage. Merging the two projects into one would not
achieve much IMO. However, I am sure that to a certain degree I can make
further work easier, and I will do my best to do so. So: make things as
simple as possible, but not simpler :-)
> >
> > So far two other projects came to light as possible source of
> > inspiration and/or collaboration:
> > * catalyst (mainly tinderbox generating part)
> > * AutotuA (automatic generic job framework)
> >
> > Especially AutotuA seems like good candidate for merging.
> >
> > It doesn't seem possible to compile every project with every version of
> > every dependency, therefore I'd like to ask for your opinion especially
> > about this part. One idea I had was to restrict testing to highest build
> > number for given version. For example we have:
> > glib-2.18.4-r1 and glib-2.18.4-r2, therefore we will only test against
> > glib-2.18.4-r2 and will assume that r1 would be OK too (or users would
> > upgrade since it's a bugfix release)
>
> IMO, you have two choices. Latest stable or latest ~arch. Stable users
> will not upgrade from glib-2.18.4-r1 to -r2 until -r2 is stable so
> that argument is out.
I should have checked the ebuilds before posting, I guess :-) That was
meant as an example. For the sake of argument, consider them both arch
(not ~arch).
> >
> > Another approach to optimizing use of resources would be to have a
> > priority list of packages that need most testing. I imagine this could
> > be created by analyzing logs from gentoo mirrors, and figuring out which
> > packages are downloaded most frequently.
>
> Mirror log analysis is a fundamentally hard thing to do given the vast
> network of mirrors that we have.
Of course. It's not meant to be precise, just an approximation of which
packages are favourites. Even then I realize there will always be
regional tendencies. Oh well...masses :-)
> >
> > We would probably need at least one tinderbox per glibc version if I am
> > not mistaken since this cannot be freely up/downgraded.
>
> It's free to upgrade ;) Can't downgrade. Given how large the glibc
> tracker bugs get, I don't think this project should use the latest
> glibc available. Unless you are trying to hunt down bugs, but I think
> you will get buried with compile failures. If the goal of this project
> is to data mine the installed package's information, that is not
> dependent on a glibc version. Please think about this some more before
> going down that road, I want this project to be successful ;)
Right. That was a suggestion for QA to think about. I would like to
think of this as my opportunity to give something back to Gentoo
after all these years as a user (taking, not giving much back). So it
all boils down to what Gentoo (the QA team) needs to improve Gentoo
further. If it is enough to test the latest stable glibc, then that's how
it's going to be done, perhaps with the option to change this behaviour
in the future.
--
Stanislav Ochotnicky