[gentoo-portage-dev] Current portage well designed, but badly used


* [gentoo-portage-dev] Current portage well designed, but badly used
@ 2004-11-27 23:10 Gustavo Barbieri
  2004-11-27 23:48 ` Michael Tindal
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Gustavo Barbieri @ 2004-11-27 23:10 UTC
  To: gentoo-portage-dev

Hello,

I'm playing with portage and noticed it's well designed, but there are
some mistakes in its usage at the moment. For example:

Categories are mixed: net-www/apache and net-www/mod_* (apache
modules) share net-www/, although there is a more fitting category,
www-apache/, for the modules. This is one example; there are more.
Is there any plan to fix them in the next portage releases?

Some packages use version numbers padded with zeros. That's good for
listing with shell functions, but it's bad because you can't convert
them to numbers and then back to strings. For example:
mail-mta/nullmailer-1.00_rc7-r4. If you convert it to numbers, 1.00
becomes 1.0 and you can't map it back to the ebuild.
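
A minimal sketch of one way to avoid the lossy round trip: keep every
version component as the original string and derive numbers only for
comparison. The names split_version, join_version and numeric_key and
the regex are hypothetical, not portage's own version code, and the
pattern only covers simple versions like the one above.

```python
import re

# Hypothetical pattern for versions shaped like "1.00_rc7-r4";
# real ebuild version syntax has more cases than this sketch handles.
_VERSION_RE = re.compile(
    r"^(?P<numbers>\d+(?:\.\d+)*)"
    r"(?:_(?P<suffix>[a-z]+)(?P<suffix_num>\d*))?"
    r"(?:-r(?P<revision>\d+))?$"
)

def split_version(version):
    """Split a version string into parts without losing the original text."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("unsupported version: %r" % version)
    # Components stay strings, so the padding in "00" is preserved.
    return {
        "numbers": match.group("numbers").split("."),
        "suffix": match.group("suffix"),
        "suffix_num": match.group("suffix_num") or "",
        "revision": match.group("revision"),
    }

def join_version(parts):
    """Rebuild the exact original string from split_version() output."""
    text = ".".join(parts["numbers"])
    if parts["suffix"] is not None:
        text += "_" + parts["suffix"] + parts["suffix_num"]
    if parts["revision"] is not None:
        text += "-r" + parts["revision"]
    return text

def numeric_key(parts):
    """One-way numeric view of the dotted part, for comparisons only.

    This is a simplification and not portage's comparison rules.
    """
    return tuple(int(n) for n in parts["numbers"])
```

Because the stored components stay strings, "1.00" and "1.0" remain
distinct and each one maps back to its own ebuild file name.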

Portage provides metadata.xml, cool. But it's hardly used :(
metadata.xml seems to provide tags for maintainers, changelogs and
long descriptions, but many (most?) packages don't use them.

The portage library is too heavy and complicated, and it makes things
slow. Heavy and complicated is what I noticed from (trying to) look at
the source; slow is what I see in usage. For example:

time emerge # without parameters
real    0m0.614s
user    0m0.487s
sys     0m0.046s

time emerge -pv world # 16 packages to be upgraded
real    0m22.664s
user    0m12.423s
sys     0m1.130s

It's too much; look at Debian's apt, it's fast. And I can't see why
portage is slow.
Forgive me if I'm wrong, but portage just needs to parse
/var/lib/portage/world (237 entries in my case), then for each entry
check whether a greater version exists and, if so, check its
dependencies. Why 22 seconds? A hand-made one takes less than 1.
Also, a brief explanation on why I was playing with portage and some
requests: I'm coding (for fun, no plan to get in a production state)
yet another graphical package manager atop portage with the newbie in
mind. But to achieve my goal I need:

   - a fast portage. I'm now writing a module to do this for me (see
more below), at least the basics, like getting package information,
versions, ... and if possible resolving primary dependencies (just to
show them to the user in a "Dependencies" tab, hidden by default).

   - more metadata: if possible, a list of URLs to screenshots (most
packages have a screenshots section). If the URL links to an HTML
page, provide a threshold image size, so it connects and downloads
every image bigger than that... cached, of course.

   - portage acting as a daemon that queues requests and fetches
packages. If portage could be a daemon with 3 threads (one that
downloads packages, one that compiles, and one that manages the others
and accepts requests), then it could schedule downloads to maximize
throughput, downloading smaller packages first while respecting
dependencies, and compile while downloading, waiting until packages
are there. The "emerge" command would just send commands to it. It
would be handy since compile times are huge.
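
To make the idea concrete, here is a rough sketch of the
fetch-while-compiling part only, written against today's Python
standard library (queue, threading). run_pipeline and its fetch/build
callables are hypothetical stand-ins, not portage code; the caller
plays the managing role, and the packages are assumed to arrive
already in dependency order. Smallest-first scheduling is not shown.

```python
import queue
import threading

def run_pipeline(packages, fetch, build):
    """Fetch and build concurrently; return build results in input order."""
    fetched = queue.Queue()
    results = []

    def fetch_worker():
        for pkg in packages:
            fetch(pkg)             # download one package
            fetched.put(pkg)       # hand it over to the builder
        fetched.put(None)          # sentinel: nothing more to fetch

    def build_worker():
        while True:
            pkg = fetched.get()    # blocks until a package has been fetched
            if pkg is None:
                break
            results.append(build(pkg))

    threads = [threading.Thread(target=fetch_worker),
               threading.Thread(target=build_worker)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

While the builder works on one package, the fetcher is already
downloading the next one, so the build only waits when the download
has not finished yet.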

About the fast portage: I know portage is a complex monster and the
heart of gentoo; if it breaks, everything breaks. But how about a
python module to be used by other packages that just want to view the
portage tree and its packages? If this module eventually works as
expected and has every current portage feature, it could replace the
old one.
   I started to code my own "fast portage", but some things are tricky
to do, and I want to know how you do them: how do you parse ebuilds to
get USE, DESCRIPTION, SLOT, DEPEND, ... ?
   If you want to know why my implementation is fast: I use lazy
evaluation as far as possible. For example, I load every package, but
attributes such as available versions, installed versions and status
are only calculated on demand; I use python property() and
setters/getters for that. Since you'll hardly ever use every attribute
of every package, it loads much faster.
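
A minimal sketch of that lazy-attribute pattern with property(). The
Package class and the loader callable here are hypothetical
illustrations, not the code from the tarball below.

```python
class Package(object):
    """Package whose version list is computed only on first access."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader    # hypothetical callable: name -> list of versions
        self._versions = None    # nothing loaded yet

    def _get_versions(self):
        # Run the expensive lookup once, then reuse the stored result.
        if self._versions is None:
            self._versions = self._loader(self.name)
        return self._versions

    versions = property(_get_versions)


calls = []

def fake_loader(name):
    """Stand-in for an expensive tree lookup; records each call."""
    calls.append(name)
    return ["1.0", "1.1"]

pkg = Package("app-misc/example", fake_loader)
```

Creating pkg does not call the loader at all; the first read of
pkg.versions does, and later reads reuse the stored list.
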
   I have preliminary code here:
http://ltc08.ic.unicamp.br/~gustavo/packagemanager.tar.bz2, but some
modifications I made were lost in a power outage + xfs... I just have
the .pyc, if someone knows how to get the .py back...

-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
  ICQ#: 17249123
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net

--
gentoo-portage-dev@gentoo.org mailing list


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri
@ 2004-11-27 23:48 ` Michael Tindal
  2004-11-28 17:08   ` Gustavo Barbieri
  2004-11-28  3:41 ` Luke-Jr
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Michael Tindal @ 2004-11-27 23:48 UTC
  To: gentoo-portage-dev

Gustavo Barbieri wrote:

>Hello,
>
>I'm playing with portage and noticed it's well designed, but there are
>some mistakes in its usage at the moment. For example:
>
>Categories are mixed: there is a net-www/apache and net-www/mod_*
>(apache modules), but there is a more convenient category www-apache/
>for them. This is one example, there are more mistakes.  There is any
>plan to fix them in next portage releases?
>
>Some packages use numbering version padded with zero, that's good to
>list with shell functions, but it's bad because you can't change them
>to numbers and them back to string. For example:
>mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
>becomes 1.0 and you can't map back to the ebuild.
>
>Portage provides metadata.xml, cool. But it's hardly used :(
>metadata.xml seems to provide tags for maintainers, changelogs and
>long description, many (most?) packages don't use them.
>  
>
Everything you mentioned is irrelevant to sys-apps/portage itself; it
concerns the tree.
The categories you mentioned are the responsibility of individual
maintainers, not the portage team.
To see who is responsible for a category/package/ebuild, use
metadata.xml. Packages don't need to use metadata.xml directly because
it is just that, metadata. It provides the information you just
listed, and it serves its purpose well.

>The portage library is too heavy, complicated and make things slow.
>Heavy and complicated I noticed from (trying to) look at the source,
>slow by usage. For example:
>
>time emerge # without parameters
>real    0m0.614s
>user    0m0.487s
>sys     0m0.046s
>
>time emerge -pv world # 16 packages to be upgraded
>real    0m22.664s
>user    0m12.423s
>sys     0m1.130s
>
>It's too much, look at debian apt, it's fast. And I can't see why
>portage is slow.
>Forgive me if I'm wrong, but portage just need to parse
>/var/lib/portage/world (237 entries in my case), them for each check
>if there is any other version greater than and if so check for
>dependencies. Why 22seconds? A hand made take less than 1.
>
>  
>
You can't compare apt to portage; don't even try to go down this
route. It's like comparing apples to lemons.
Portage is slow, but it is being fixed. CVS portage, for one, is a
whole lot faster.

>Also, a brief explanation on why I was playing with portage and some
>requests: I'm coding (for fun, no plan to get in a production state)
>yet another graphical package manager atop portage with the newbie in
>mind. But to achieve my goal I need:
>
>   - a fast portage. Now I'm doing a module to do this for me (see
>more above), at least the basics, like get package information,
>versions, ... and if possible resolve primary dependencies (just to
>show to user in a tab "Dependencies", hidden by default).
>
>   - more meta data, if possible a list of urls to screenshots (most
>packages have a screenshots section), if the url links to an html,
>provide a threshold of images size to get, so it connects and
>downloads every image bigger than it... cached of course.
>  
>
This is unnecessary; it would add extra bloat to the tree and more
complexity for our dev team.
If you want something like this, your best bet would be to submit a
patch for the functionality on bugs.gentoo.org, but I wouldn't be
surprised if it wasn't accepted.

>   - portage to act as a daemon, queue requests  and fetch packages.
>If portage could be a daemon with 3 threads: one that download
>packages, one that compiles and one to manage the other and accept
>requests; then it could schedule download to maximize download
>throughput, downloading smaller packages first while respecting
>dependencies, compile while download and wait until packages are there
>and the "emerge" command just send commands to it.  It would be handy
>since compiling times are huge.
>  
>
There is a daemonized ebuild.sh (correct me if I'm wrong, ferringb) in
portage CVS, although it doesn't use threads, because there is no way
to kill processes (wget, etc.) spawned from within a thread, so you'd
have stale processes after Ctrl+C'ing portage.

>About the fast portage: I know portage is a complex monster and is the
>heart of gentoo, if it breaks, everything breaks. But how about a
>python module to be used by other packages that just want to view the
>portage and its packages. If eventually this module works as expected
>and have every current portage feature, it could replace the old one.
>  
>
Jstubbs is working on an API that will make its way into a later
revision of portage. As far as parsing ebuilds goes, they are sourced
directly from bash.

-- 
Michael Tindal (urilith)
Gentoo Linux Developer
python | dotnet | apache

-- The best way to create is to destroy.



* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-27 23:48 ` Michael Tindal
@ 2004-11-28 17:08   ` Gustavo Barbieri
  2004-11-28 17:31     ` Andrew Gaffney
  2004-11-30 14:22     ` Brian Harring
  0 siblings, 2 replies; 25+ messages in thread
From: Gustavo Barbieri @ 2004-11-28 17:08 UTC
  To: gentoo-portage-dev

On Sat, 27 Nov 2004 17:48:54 -0600, Michael Tindal <urilith@gentoo.org> wrote:
> Gustavo Barbieri wrote:
> 
> 
> 
> >Hello,
> >
> >I'm playing with portage and noticed it's well designed, but there are
> >some mistakes in its usage at the moment. For example:
> >
> >Categories are mixed: there is a net-www/apache and net-www/mod_*
> >(apache modules), but there is a more convenient category www-apache/
> >for them. This is one example, there are more mistakes.  There is any
> >plan to fix them in next portage releases?
> >
> >Some packages use numbering version padded with zero, that's good to
> >list with shell functions, but it's bad because you can't change them
> >to numbers and them back to string. For example:
> >mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> >becomes 1.0 and you can't map back to the ebuild.
> >
> >Portage provides metadata.xml, cool. But it's hardly used :(
> >metadata.xml seems to provide tags for maintainers, changelogs and
> >long description, many (most?) packages don't use them.
> >
> >
> All of this you mentioned is really irrelevant to sys-apps/portage, but
> more relevant to the tree.
> The categories you mentioned are held resposible by individual
> maintainers, not the portage team.
> To see who is responsible for the category/package/ebuild, you use the
> metadata.xml.  Packages dont
> need to use the metadata.xml directly because it is just that,
> metadata.  It is used to provide the information
> you just stated, and it serves its purpose well.

I just think that if metadata.xml were filled in with long
descriptions and maintainers' emails, it would help people out there.
OK, packages don't need to use it directly, but tools might want to
show users more info about something.

Talking about metadata, why are HOMEPAGE and DESCRIPTION in ebuilds
and not in metadata.xml? IMHO they're not used to build the package in
any way. Maybe if we moved that (always filled in) information to
metadata.xml, people would fill in the other fields there.

Also, you said that this is not relevant to the portage application
but to the portage tree. Where can I talk to the portage tree
maintainers? If I need to patch the entire tree with metadata.xml and
stuff like that, it will be a huge amount of work, but if the portage
maintainers asked the package maintainers to do it for their next
releases, many people would each do a small job, which is easier than
a small group doing many jobs.


> >The portage library is too heavy, complicated and make things slow.
> >Heavy and complicated I noticed from (trying to) look at the source,
> >slow by usage. For example:
> >
> >time emerge # without parameters
> >real    0m0.614s
> >user    0m0.487s
> >sys     0m0.046s
> >
> >time emerge -pv world # 16 packages to be upgraded
> >real    0m22.664s
> >user    0m12.423s
> >sys     0m1.130s
> >
> >It's too much, look at debian apt, it's fast. And I can't see why
> >portage is slow.
> >Forgive me if I'm wrong, but portage just need to parse
> >/var/lib/portage/world (237 entries in my case), them for each check
> >if there is any other version greater than and if so check for
> >dependencies. Why 22seconds? A hand made take less than 1.
> >
> >
> >
> You can't compare apt to portage, dont even try to go down this route.
> Its like comparing apples to lemons.
> Portage is slow, but it is being fixed.  CVS portage for one is a whole
> lot faster.

I'll look at CVS, but I don't see why portage needs to be slow. As you
said, it's being fixed.

And I can't see such a difference between portage and apt in the area
where portage is slow. OK, apt uses a db and doesn't need to check use
flags, but they're orders of magnitude apart. Even lemons and apples
aren't that different ;)


> >Also, a brief explanation on why I was playing with portage and some
> >requests: I'm coding (for fun, no plan to get in a production state)
> >yet another graphical package manager atop portage with the newbie in
> >mind. But to achieve my goal I need:
> >
> >   - a fast portage. Now I'm doing a module to do this for me (see
> >more above), at least the basics, like get package information,
> >versions, ... and if possible resolve primary dependencies (just to
> >show to user in a tab "Dependencies", hidden by default).
> >
> >   - more meta data, if possible a list of urls to screenshots (most
> >packages have a screenshots section), if the url links to an html,
> >provide a threshold of images size to get, so it connects and
> >downloads every image bigger than it... cached of course.
> >
> >
> This is uncessary and would add extra bloat to the tree, and adds more
> complexity to our dev team.
> If you want something like this your best bet would be to provide a
> patch for its functionality on
> bugs.gentoo.org, but I wouldnt be surprised if it wasnt accepted.

I mean I want this in metadata.xml, not in ebuilds or the like... how
can this add complexity for the dev team? If you mean writing the xml
parser, it's easy and I can send the patches. Also, I can provide
tools that check whether URLs still exist, like the homepage and
screenshots.


> >   - portage to act as a daemon, queue requests  and fetch packages.
> >If portage could be a daemon with 3 threads: one that download
> >packages, one that compiles and one to manage the other and accept
> >requests; then it could schedule download to maximize download
> >throughput, downloading smaller packages first while respecting
> >dependencies, compile while download and wait until packages are there
> >and the "emerge" command just send commands to it.  It would be handy
> >since compiling times are huge.
> >
> >
> There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> portage CVS, although it
> doesnt use threads, because there is no way to kill processes (wget,
> etc.) spawned from within
> a thread, so youd have stale processes after Ctrl+C'ing portage.

Great!
BTW, by threads I meant the concept of more than one thing running in
parallel; they don't need to be posix threads, they can be processes
or even one process using select().


> >About the fast portage: I know portage is a complex monster and is the
> >heart of gentoo, if it breaks, everything breaks. But how about a
> >python module to be used by other packages that just want to view the
> >portage and its packages. If eventually this module works as expected
> >and have every current portage feature, it could replace the old one.
> >
> >
> Jstubbs is working on an api that will make its way into a later
> revision of portage.  As far as parsing
> ebuilds, they are sourced directly from bash.

Is there any explanation/roadmap/design I can look at? Does Jstubbs
read this list? What are his goals, and how does he want to achieve
them?

About parsing of ebuilds, what do I need to source before the ebuild
itself? I mean, to get things like "inherit" working.

-- 
Gustavo Sverzut Barbieri

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28 17:08   ` Gustavo Barbieri
@ 2004-11-28 17:31     ` Andrew Gaffney
  2004-11-28 17:56       ` Gustavo Barbieri
  2004-11-30 14:22     ` Brian Harring
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Gaffney @ 2004-11-28 17:31 UTC
  To: gentoo-portage-dev

Gustavo Barbieri wrote:
> Talking about metadata, why does HOMEPAGE and DESCRIPTION are in
> ebuilds and not in metadata.xml, IMHO they're not used to build the
> package in any way.   Maybe if we move those (always filled)
> information to metadata.xml, people would fill other fields there.

HOMEPAGE and DESCRIPTION have been in ebuilds for a *long* time,
whereas metadata.xml is a fairly recent addition.

> Also, you said that this is irrelevant to the portage application, but
> to the portage tree. Where can I talk to portage tree maintainers? If
> I need to patch the entire portage with metadata.xml and stuff like
> that, it will be an huge work, but if portage maintainers ask the
> package maintainers to do it for next releases, many people would do
> small jobs, easier than small group doing many jobs.

Almost every dev is a portage tree maintainer. There is no master tree authority 
that all ebuilds must pass through before hitting the tree.

-- 
Andrew Gaffney
Gentoo Linux Developer
Installer Project


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28 17:31     ` Andrew Gaffney
@ 2004-11-28 17:56       ` Gustavo Barbieri
  0 siblings, 0 replies; 25+ messages in thread
From: Gustavo Barbieri @ 2004-11-28 17:56 UTC
  To: gentoo-portage-dev

On Sun, 28 Nov 2004 11:31:52 -0600, Andrew Gaffney <agaffney@gentoo.org> wrote:
> Gustavo Barbieri wrote:
> > Talking about metadata, why does HOMEPAGE and DESCRIPTION are in
> > ebuilds and not in metadata.xml, IMHO they're not used to build the
> > package in any way.   Maybe if we move those (always filled)
> > information to metadata.xml, people would fill other fields there.
> 
> HOMEPAGE and DESCRIPTION have been in ebuilds for a *long* time where
> metadata.xml is a fairly recent addition.
> 
> > Also, you said that this is irrelevant to the portage application, but
> > to the portage tree. Where can I talk to portage tree maintainers? If
> > I need to patch the entire portage with metadata.xml and stuff like
> > that, it will be an huge work, but if portage maintainers ask the
> > package maintainers to do it for next releases, many people would do
> > small jobs, easier than small group doing many jobs.
> 
> Almost every dev is a portage tree maintainer. There is no master tree authority
> that all ebuilds must pass through before hitting the tree.

Okay, but could you at least put a paragraph in some gentoo weekly
news? Something quick, just to mention metadata.xml.

-- 
Gustavo Sverzut Barbieri

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28 17:08   ` Gustavo Barbieri
  2004-11-28 17:31     ` Andrew Gaffney
@ 2004-11-30 14:22     ` Brian Harring
  2004-11-30 14:53       ` Jason Stubbs
  2004-12-01  3:55       ` Gustavo Barbieri
  1 sibling, 2 replies; 25+ messages in thread
From: Brian Harring @ 2004-11-30 14:22 UTC
  To: gentoo-portage-dev

On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > >The portage library is too heavy, complicated and make things slow.
> > >Heavy and complicated I noticed from (trying to) look at the source,
> > >slow by usage.
You *really* should explain how it's heavy and complicated. 
Generalizations don't help to improve it :)

> > >time emerge # without parameters
> > >real    0m0.614s
> > >user    0m0.487s
> > >sys     0m0.046s
> > >
> > >time emerge -pv world # 16 packages to be upgraded
> > >real    0m22.664s
> > >user    0m12.423s
> > >sys     0m1.130s
There's quite a large difference between just importing portage and
actually parsing your profile and determining what your use flags are
(profiles can define defaults, and some use flags are set based on
packages being installed, depending on the profile; perl, for
example). That, plus walking/building the depgraph, querying the
cache, locking, etc.

> > >
> > >It's too much, look at debian apt, it's fast. And I can't see why
> > >portage is slow.
> > >Forgive me if I'm wrong, but portage just need to parse
> > >/var/lib/portage/world (237 entries in my case), them for each check
> > >if there is any other version greater than and if so check for
> > >dependencies. Why 22seconds? A hand made take less than 1.
It also checks the first-level depends of all packages in world. So
that list just got larger :)
Regarding debian apt, it's likely apples/oranges as urilith stated.
One thing to note is that afaik, debian dependencies lack versions-
they're basically a flat namespace.

For example, if a dpkg states it deps on mysql, there is mysql.
Singular. With portage, well, it needs to determine what version is
available based upon keywords, package.mask, and the user's
/etc/portage/package.keywords (and other things).

Note I work on portage, not dpkg/apt.  So I could be talking out of my
ass there...

> I'll look at CVS, but I don't see why portage need to be slow. As you
> said, it's being fixed.
Elaborate on how it's slow. There are various algorithms/processes
that you could be referring to. Rough cvs improvements: a 33% bash
sourcing improvement. For those thinking parsing bash == slow portage,
that's not the case. Users *never* see portage sourcing ebuilds for
their keys (DEPEND, DESCRIPTION, etc) unless they're using overlay
ebuilds, and the ebuilds in the overlay are _only_ sourced when
they've changed. The improvements in bash sourcing speed in cvs were
A) intended to fix env handling for ebuilds (ancillary benefit), and
B) to speed up regen for devs and the server that generates the
metacache for rsync users.

So no, bash isn't really what's slowing things down.  :)

If you're referring to the nice long pause after sync'ing, that's the
transfer of the cache from ${PORTDIR}/metadata/cache to
/var/cache/edb/dep/${PORTDIR}; that's been sped up also. The pause
might potentially be eliminated as well, although that would require
ensuring the tree is readonly- that's another can of worms.

Aside from that, there is searching speed, which is slowed down a bit
by the current use of locks in the default portage_db_flat ebuild
metadata cache. Additionally, portage_db_flat uses separate files for
each cpv (category/package-version), so there is considerable overhead
from opening/closing a crapload of files.

Using portage_db_anydbm improves this, although it has a few issues of
its own.

If you're referring to doing a search based on description, well, the
cache backend as mentioned above slows things down pretty majorly.
Even with anydbm, it still has to proceed cpv by cpv- basically walk
the entire cache, *while* verifying the cache isn't stale- eg, check
the stored mtime, and compare it to the ebuild's mtime.
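
As an illustration of the per-entry cost, a staleness check of this
kind can be sketched as below. The is_stale function, the dict-shaped
cache entry and its "mtime" key are assumptions made for the example,
not portage's actual cache format; the point is only that every entry
costs one stat() call on the ebuild.

```python
import os

def is_stale(cache_entry, ebuild_path):
    """Return True when cached metadata no longer matches the ebuild file."""
    try:
        # One stat() per ebuild: this is the extra IO paid for each cpv.
        current_mtime = int(os.stat(ebuild_path).st_mtime)
    except OSError:
        # The ebuild is gone or unreadable, so the entry cannot be trusted.
        return True
    # "mtime" is a hypothetical key holding the mtime seen at cache time.
    return cache_entry.get("mtime") != current_mtime
```

Walking the whole cache for a description search repeats this check
for every cpv, which is why skipping it matters.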

Things could be sped up by treating the tree/cache on disk as
readonly- this is something being bantered about, and may happen.
Treating the tree/cache as readonly means we don't have to do any
locking in the cache, nor staleness checks (less IO).

> 
> and I can't see that difference between portage and apt in the area
> portage is slow, ok apt uses a db and don't need to check use flags,
> but they're orders of magnitude different. Even lemons and apples are
> that different ;)
See above.

> > >   - portage to act as a daemon, queue requests  and fetch packages.
> > >If portage could be a daemon with 3 threads: one that download
> > >packages, one that compiles and one to manage the other and accept
> > >requests; then it could schedule download to maximize download
> > >throughput,
parallel fetch is in cvs already.  Portage 2.0.51 already supports this
in a way, 
(emerge -f targets &> /dev/null &); emerge targets

> > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> > portage CVS, although it
> > doesnt use threads, because there is no way to kill processes (wget,
> > etc.) spawned from within
> > a thread, so youd have stale processes after Ctrl+C'ing portage.
Doesn't apply in this case, daemonized ebuild.sh just speeds up bash
sourcing which most users won't see.  Devs, on the other hand see it
since they use cvs- no rsyncing of a pregenerated cache.

> 
> Great!
> BTW, with threads I meant the concept of more than one thing running
> in parallel, don't need to be posix threads, can be process or even
> one process using select()
Currently implemented via fork.  Doing long running threads in python is
a bit trickier than you might suspect (tried that route, stopping a
thread w/out having it check up every 5 seconds is pretty fricking
hard/annoying).
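
For reference, the "check up every so often" pattern looks roughly
like this with threading.Event from the Python standard library.
StoppableWorker is a hypothetical class written for this example, not
portage code. Python offers no supported way to kill a thread from
outside, so the worker has to poll a flag itself.

```python
import threading

class StoppableWorker(threading.Thread):
    """Worker that exits once stop() has been requested."""

    def __init__(self, poll_interval=0.05):
        threading.Thread.__init__(self)
        self._stop_requested = threading.Event()
        self._poll_interval = poll_interval
        self.iterations = 0

    def run(self):
        # The thread must check the flag itself between units of work.
        while not self._stop_requested.is_set():
            self.iterations += 1    # stand-in for one unit of real work
            # Sleep until the next poll, but wake up early if stop() is called.
            self._stop_requested.wait(self._poll_interval)

    def stop(self):
        """Ask the worker to finish and wait until it has exited."""
        self._stop_requested.set()
        self.join()
```

A unit of work that blocks for a long time (a download, a compile)
still delays the stop, which is one reason separate processes are
easier to manage here.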

> > Jstubbs is working on an api that will make its way into a later
> > revision of portage.  As far as parsing
> > ebuilds, they are sourced directly from bash.
> 
> There is any explanation/roadmap/design I can look at? Jstubbs reads
> this list? What's his goals, how he want to achieve it?
He'd have to state his goals-
offhand, afaik he threw some of his goals into
http://dev.gentoo.org/~jstubbs/portage/goals.txt
along with a collection of mine and some of genone's (Marius Mauch).

> 
> About parsing of ebuilds, what do I need to source before the ebuild
> itself? I mean, to get things like "inherit" working.
All of ebuild.sh.  Seriously. :)

One thing to note is that ebuilds aren't just run via bash
/path/to/ebuild; a lot of functions are expected to exist for ebuilds
to work, inherit for example (a bash function).

I'd suggest grabbing
http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
bin/ebuild*.sh and bin/isolated*

With the exception of ebuild-daemon.sh, all of that code is required to
create the appropriate bash environment that ebuilds expect.  Even with
that default env, eclasses exist to extend it and add new functionality.

Portage *does* have issues that need correcting, calling
patterns/design/structure that need changing, etc.  I'm trying to
elaborate on the issues above; hope it provides some insight into why
things are the way they are (and potential avenues to check out for
improving performance).
~brian


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-30 14:22     ` Brian Harring
@ 2004-11-30 14:53       ` Jason Stubbs
  2004-12-01  4:13         ` Gustavo Barbieri
  2004-12-01  3:55       ` Gustavo Barbieri
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Stubbs @ 2004-11-30 14:53 UTC
  To: gentoo-portage-dev

On Tuesday 30 November 2004 23:22, Brian Harring wrote:
> On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > On Sat, 27 Nov 2004 17:48:54 -0600, Michael Tindal <urilith@gentoo.org> 
> > wrote: 
> > > Jstubbs is working on an api that will make its way into a later
> > > revision of portage.  As far as parsing
> > > ebuilds, they are sourced directly from bash.
> >
> > There is any explanation/roadmap/design I can look at? Jstubbs reads
> > this list? What's his goals, how he want to achieve it?

I read your first message, thought to myself "does this deserve an answer?" 
and then ignored the entire thread, apart from Michael's and Brian's posts.

> He'd have to state his goals-

Strict clear dependency resolution. It's already slower in CVS and, at least 
theoretically, can only become slower. On the other hand, it should save 
hours and hours in compile failures.

Regards,
Jason Stubbs


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-30 14:53       ` Jason Stubbs
@ 2004-12-01  4:13         ` Gustavo Barbieri
  2004-12-01  8:41           ` Brian Harring
  0 siblings, 1 reply; 25+ messages in thread
From: Gustavo Barbieri @ 2004-12-01  4:13 UTC
  To: gentoo-portage-dev

On Tue, 30 Nov 2004 23:53:19 +0900, Jason Stubbs <jstubbs@gentoo.org> wrote:
> On Tuesday 30 November 2004 23:22, Brian Harring wrote:
> > On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > > On Sat, 27 Nov 2004 17:48:54 -0600, Michael Tindal <urilith@gentoo.org>
> > > wrote:
> > > > Jstubbs is working on an api that will make its way into a later
> > > > revision of portage.  As far as parsing
> > > > ebuilds, they are sourced directly from bash.
> > >
> > > There is any explanation/roadmap/design I can look at? Jstubbs reads
> > > this list? What's his goals, how he want to achieve it?
> 
> I read your first message, thought to myself "does this deserve an answer?"
> and then ignored the entire thread, apart from Michael's and Brian's posts.

Sorry, I didn't mean to be rude or cause a tornado on this list; it
was a mix of thousands of ideas and doubts with a non-native language
that made me choose a bad mail subject.

 
> > He'd have to state his goals-
> 
> Strict clear dependency resolution. It's already slower in CVS and, at least
> theoretically, can only become slower. On the other hand, it should save
> hours and hours in compile failures.

This is good :)

Also, I read the material at http://dev.gentoo.org/~jstubbs/portage/ and
your ideas are really great.

Things that I can help with now:
   Creating and converting to a CPV class: I have Package and
PackageVersion classes working. PackageVersion instances know how to
compare to each other and how to fetch more info, with everything
delayed until it is used. The Package class has a list of
PackageVersion objects, and this list is only loaded from portage when
accessed.
     http://ltc08.ic.unicamp.br/~gustavo/packagemanagementsystem.py
If you have some time, look at the classes; don't mind the other
parts, since they are just a quick hack to access portage.
   Using iterators/generators is a great help: it saves memory and
even time... Python 2.4 has an equivalent of list comprehensions that
doesn't build lists, called "generator expressions"; you probably
already know. My code uses some generators.
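
To make the two points above concrete, here is a minimal sketch of a
lazily loaded version list consumed through a generator expression. It
is not the code from the URL above: the Package class shape, the
fake_loader function and the version strings are all assumptions made
up for illustration.

```python
class Package:
    """Hypothetical package whose version list is loaded on first access."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader      # callable yielding version strings
        self._versions = None      # nothing loaded yet

    @property
    def versions(self):
        # Load once, on first access, then reuse the stored list.
        if self._versions is None:
            self._versions = list(self._loader(self.name))
        return self._versions


calls = []

def fake_loader(name):
    """Stand-in for a real portage query; records each invocation."""
    calls.append(name)
    yield name + "-1.0"
    yield name + "-1.1"


pkg = Package("app-misc/gcal", fake_loader)
loaded_before = list(calls)            # loader has not run yet

# Generator expression: items are produced on demand, no extra list is built.
newest = max(v for v in pkg.versions)
again = pkg.versions                   # second access does not reload
loaded_after = list(calls)
```

The loader only runs when `versions` is first touched, which is the
"delayed until used" behaviour described above.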

From goals.txt I'm not sure about the modularisation section, but I
can help with process management. A friend and I are playing with
dependency solving to help improve boot speed (there is a bug report at
bugs.gentoo.org). Our goal is to have working C code that solves the
boot process dependencies, but we can try to make it general enough
and then write a Python wrapper over it. Right now we have a working
prototype in Python:
http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_6.py

Thank you for your time, 

-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
  ICQ#: 17249123
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net

--
gentoo-portage-dev@gentoo.org mailing list



* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01  4:13         ` Gustavo Barbieri
@ 2004-12-01  8:41           ` Brian Harring
  2004-12-01 13:41             ` Gustavo Barbieri
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Harring @ 2004-12-01  8:41 UTC (permalink / raw
  To: gentoo-portage-dev

On Tue, 2004-11-30 at 20:13, Gustavo Barbieri wrote:
> From goals.txt I'm not sure about the modularisation section, but I
> can help with process management. A friend and I are playing with
> depedency solving to help improve boot speed (there is a bugreport in
> bugs.gentoo.org) our goal is to have a working C code to solve the
> boot process dependencies, but we can try to make it general enough
> and then write a python wrapper over it. Right now we have a working
> prototype in python:
> http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_6.py
You'd probably want to talk to the baselayout peeps; how will this
differ from the existing rc_parallel_startup?
Aside from being C based rather than bash, that is....
~brian



* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01  8:41           ` Brian Harring
@ 2004-12-01 13:41             ` Gustavo Barbieri
  0 siblings, 0 replies; 25+ messages in thread
From: Gustavo Barbieri @ 2004-12-01 13:41 UTC (permalink / raw
  To: gentoo-portage-dev, ferringb

On Wed, 01 Dec 2004 00:41:24 -0800, Brian Harring <ferringb@gentoo.org> wrote:
> On Tue, 2004-11-30 at 20:13, Gustavo Barbieri wrote:
> > From goals.txt I'm not sure about the modularisation section, but I
> > can help with process management. A friend and I are playing with
> > depedency solving to help improve boot speed (there is a bugreport in
> > bugs.gentoo.org) our goal is to have a working C code to solve the
> > boot process dependencies, but we can try to make it general enough
> > and then write a python wrapper over it. Right now we have a working
> > prototype in python:
> > http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_6.py
> You'd probably want to talk to the baselayout peeps; how will this
> differ from the existing rc_parallel_startup?
> Aside from being c based rather then bash that is....

I already am: http://bugs.gentoo.org/show_bug.cgi?id=69579

There is more info there, including reference papers.

-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
  ICQ#: 17249123
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-30 14:22     ` Brian Harring
  2004-11-30 14:53       ` Jason Stubbs
@ 2004-12-01  3:55       ` Gustavo Barbieri
  2004-12-01  9:37         ` Gregorio Guidi
  1 sibling, 1 reply; 25+ messages in thread
From: Gustavo Barbieri @ 2004-12-01  3:55 UTC (permalink / raw
  To: gentoo-portage-dev, ferringb

On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org> wrote:
> On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > > >The portage library is too heavy, complicated and make things slow.
> > > >Heavy and complicated I noticed from (trying to) look at the source,
> > > >slow by usage.
> You *really* should explain how it's heavy and complicated.
> Generalizations don't help to improve it :)

You're right, but to explain these I need to understand it a bit more.
For now, take it as just a user's impression.


> > > >time emerge # without parameters
> > > >real    0m0.614s
> > > >user    0m0.487s
> > > >sys     0m0.046s
> > > >
> > > >time emerge -pv world # 16 packages to be upgraded
> > > >real    0m22.664s
> > > >user    0m12.423s
> > > >sys     0m1.130s
> There's quite a large difference between just importing portage, and
> actually parsing your profile, determining what your use flags are
> (since profiles can define defaults, and some use flags are based upon
> packages being installed dependant on the profile, perl fex).  That, and
> walking/building the depgraph, querying the cache, locking, etc.

These two were not related; I just put them together since I "measured"
them in sequence.

I just think that half a second for printing the "usage" message is too
much, and I have already seen it take seconds. But this doesn't really
matter, forget it.


> > > >It's too much, look at debian apt, it's fast. And I can't see why
> > > >portage is slow.
> > > >Forgive me if I'm wrong, but portage just need to parse
> > > >/var/lib/portage/world (237 entries in my case), them for each check
> > > >if there is any other version greater than and if so check for
> > > >dependencies. Why 22seconds? A hand made take less than 1.
>
> Checks first level depends of all packages in world also.  So that list
> just got larger :)

I know, but that much larger?
Anyway, I'll try to understand how you do things, what's read from
disk into memory, the data structures... are there any documents on
that? Just reading the source is painful :/


> Regarding debian apt, it's likely apples/oranges as urilith stated.  One
> thing to note is that afaik, debian dependencies lack versions- they're
> basically a flat namespace.
>
> fex, if a dpkg states it deps mysql, there is mysql.  Singular.
> W/ portage, well, need to determine what version is available based upon
> keywords, package.mask, and users /etc/portage/package.keywords (and
> other things).

As I see it, this doesn't make the algorithms worse (exponentially), it
just adds a constant... is that constant really that huge?


> Note I work on portage, not dpkg/apt.  So I could be talking out of my
> ass there...

Ok, and I work on neither so far... :) [but as a gentoo user I want to
improve my sys]

 
> > I'll look at CVS, but I don't see why portage need to be slow. As you
> > said, it's being fixed.
> Elaborate on how it's slow.  There are various algs/processes that you
> could be referencing.  Rough cvs improvements, 33% bash sourcing
> improvement- for those thinking parsing bash == slow portage, it's not
> the case.  Users *never* see portage sourcing ebuilds for their keys
> (DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
> the ebuilds in the overlay are _only_ sourced when they've changed.  The
> improvements in bash sourcing speed in cvs were A) intended to fix env
> handling for ebuilds (ancillary benefit), and B) speed up regen for devs
> and the server that generates the metacache for rsync users.
> 
> So no, bash isn't really what's slowing things down.  :)

Good to know. From previous messages I believed that every
"emerge -s" sourced the whole portage tree :)



> If you're referencing the nice long pause after sync'ing, that's
> transfer of the cache from ${PORTDIR}/metadata/cache to
> /var/cache/edb/dep/${PORTDIR}; that's been speed up also.  Potentially
> might have that pause eliminated also, although that would require
> ensuring the tree is readonly- that's another can of worms.

No, I don't care about the caching stuff, just that "emerge
something_without_deps" takes too long.


> Aside from that, there is searching speed, which is a bit slowed down by
> the current use of locks in the default portage_db_flat ebuild metadata
> cache.  Additionally, portage_db_flat uses seperate files for each cpv
> (category/package-version), so there is considerable overhead from
> opening/closing a crapload of files.
> 
> Using portage_db_anydbm improves this, although it has a few issues of
> it's own.

Hmm, here it comes. That's the part I think is slow, and my reasons
(guesses) are the ones you gave: locks and the "lots of small files"
instead of one, probably real/optimized/indexed, database.

Sorry, but I was not aware of the _anydbm stuff. Where can I read more about it?
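
To illustrate the "one database instead of lots of small files" idea,
here is a minimal sketch using Python's standard dbm module. It is not
how portage_db_anydbm is implemented (I have not read it); the keys
are cpv strings, the stored values are made-up placeholders, and the
file name is an assumption.

```python
import dbm
import os
import tempfile

# Placeholder metadata, keyed by category/package-version (cpv).
entries = {
    "app-misc/gcal-3.01": "DESCRIPTION=example one",
    "mail-mta/nullmailer-1.00_rc7-r4": "DESCRIPTION=example two",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata")

    # One database holds every entry, instead of one file per cpv.
    with dbm.open(path, "c") as db:
        for cpv, value in entries.items():
            db[cpv.encode()] = value.encode()

    # A lookup is a single keyed read; no per-package open()/close().
    with dbm.open(path, "r") as db:
        found = db[b"app-misc/gcal-3.01"].decode()
        count = len(db.keys())
```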


> If you're referencing doing a search based on description, well, the
> cache backend as mentioned above slows things down pretty majorly.  Even
> with anydbm, it still has to proceed cpv by cpv- basically walk the
> entire cache, *while* verifying the cache isn't stale- eg, check the
> stored mtime, and compare it to the ebuilds mtime.
> 
> Things could be speed up by treating the tree/cache on disk as readonly-
> this is something being bantered about, and may happen.  Treating the
> tree/cache as readonly means we don't have to do any locking in the
> cache, nor staleness checks (less IO).

Optimizing for the common case is a valid approach. It will save
us a lot of time and should cause few problems, since portage reads
far more often than it writes.



> > and I can't see that difference between portage and apt in the area
> > portage is slow, ok apt uses a db and don't need to check use flags,
> > but they're orders of magnitude different. Even lemons and apples are
> > that different ;)
> See above.
> 
> > > >   - portage to act as a daemon, queue requests  and fetch packages.
> > > >If portage could be a daemon with 3 threads: one that download
> > > >packages, one that compiles and one to manage the other and accept
> > > >requests; then it could schedule download to maximize download
> > > >throughput,
> parallel fetch is in cvs already.  Portage 2.0.51 already supports this
> in a way,
> (emerge -f targets &> /dev/null &); emerge targets
> 
> > > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> > > portage CVS, although it
> > > doesnt use threads, because there is no way to kill processes (wget,
> > > etc.) spawned from within
> > > a thread, so youd have stale processes after Ctrl+C'ing portage.
> Doesn't apply in this case, daemonized ebuild.sh just speeds up bash
> sourcing which most users won't see.  Devs, on the other hand see it
> since they use cvs- no rsyncing of a pregenerated cache.
>
> > Great!
> > BTW, with threads I meant the concept of more than one thing running
> > in parallel, don't need to be posix threads, can be process or even
> > one process using select()
> Currently implemented via fork.  Doing long running threads in python is
> a bit trickier then you might suspect (tried that route, stopping a
> thread w/out having it check up every 5 seconds is pretty fricking
> hard/annoying).
>
> > > Jstubbs is working on an api that will make its way into a later
> > > revision of portage.  As far as parsing
> > > ebuilds, they are sourced directly from bash.
> >
> > There is any explanation/roadmap/design I can look at? Jstubbs reads
> > this list? What's his goals, how he want to achieve it?
> He'd have to state his goals-
> offhand, afaik he threw in some of this goals in
> http://dev.gentoo.org/~jstubbs/portage/goals.txt
> as are a collection of mine, and some of genone (Marius Mauch).
> 
> >
> > About parsing of ebuilds, what do I need to source before the ebuild
> > itself? I mean, to get things like "inherit" working.
> All of ebuild.sh.  Seriously. :)
> 
> Thing to note is that ebuilds aren't just ran via bash /path/to/ebuild;
> lot of functions are expected to exist for ebuilds to work, inherit fex
> (bash function).

Ok.


> I'd suggest grabbing
> http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
> bin/ebuild*.sh and bin/isolated*
> 
> With the exemption of ebuild-daemon.sh, all of that code is required to
> create the appropriate bash environment that ebuilds expect.  Even with
> that default env, eclasses exist to extend it and add new functionality.
> 
> Portage *does* have issues that need correcting, calling
> patterns/design/structure changed, etc.  Trying to elaborate on the
> issues above, hope it provides some insight into why things are they way
> they are (and potential avenues to check out for improving performance).

I'll try the CVS.

Thank you for your time and patience in replying to my kinda rude
questions/doubts. I'll try to help as far as possible. If there are
minor tasks, I could use them to start learning portage in more depth.


-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
  ICQ#: 17249123
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01  3:55       ` Gustavo Barbieri
@ 2004-12-01  9:37         ` Gregorio Guidi
  2004-12-01 10:59           ` Brian Harring
  0 siblings, 1 reply; 25+ messages in thread
From: Gregorio Guidi @ 2004-12-01  9:37 UTC (permalink / raw
  To: gentoo-portage-dev

On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org>

> If you're referencing the nice long pause after sync'ing, that's
> transfer of the cache from ${PORTDIR}/metadata/cache to
> /var/cache/edb/dep/${PORTDIR}; that's been speed up also.  Potentially
> might have that pause eliminated also, although that would require
> ensuring the tree is readonly- that's another can of worms.

I was just thinking about that, and I could not fully understand this point.
Could you explain a bit more?

Thanks
Gregorio


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01  9:37         ` Gregorio Guidi
@ 2004-12-01 10:59           ` Brian Harring
  2004-12-01 11:25             ` Gregorio Guidi
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Harring @ 2004-12-01 10:59 UTC (permalink / raw
  To: gentoo-portage-dev

On Wed, 2004-12-01 at 01:37, Gregorio Guidi wrote:
> On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org>
> 
> > If you're referencing the nice long pause after sync'ing, that's
> > transfer of the cache from ${PORTDIR}/metadata/cache to
> > /var/cache/edb/dep/${PORTDIR}; that's been speed up also.  Potentially
> > might have that pause eliminated also, although that would require
> > ensuring the tree is readonly- that's another can of worms.
> 
> I was just thinkng about that, and I could not fully understand this point.
> Could you explain a bit more?
Expanding on the commentary earlier about a readonly tree/cache vs the
current- rsync'd trees are currently treated as modifiable by the user. 
In other words, the cache for the tree must be verified- this means
pulling the mtime from the ebuild, and comparing it to the stored mtime
in the cache.  Additionally, if the ebuild inherits eclasses, need to
check the eclass's mtime and the stored mtime.  All of this is done to
ensure that if the eclass/ebuild changes, the cache is accurate.
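
The check described above can be sketched roughly like this. This is
only an illustration, not portage's code: the is_stale name, the
stored_mtime argument and the eclass_mtimes mapping are assumptions
about what a cache entry might record.

```python
import os
import tempfile


def is_stale(ebuild_path, stored_mtime, eclass_mtimes=None):
    """Return True if the ebuild or any inherited eclass has changed.

    stored_mtime and eclass_mtimes (path -> mtime) stand in for values
    a cache entry would hold; that layout is an assumption.
    """
    if int(os.stat(ebuild_path).st_mtime) != stored_mtime:
        return True
    for path, mtime in (eclass_mtimes or {}).items():
        if int(os.stat(path).st_mtime) != mtime:
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    ebuild = os.path.join(tmp, "foo-1.0.ebuild")
    eclass = os.path.join(tmp, "eutils.eclass")
    for p in (ebuild, eclass):
        open(p, "w").close()
        os.utime(p, (1000, 1000))      # pin atime/mtime to a known value

    fresh = is_stale(ebuild, 1000, {eclass: 1000})

    os.utime(eclass, (2000, 2000))     # simulate an edited eclass
    eclass_changed = is_stale(ebuild, 1000, {eclass: 1000})
```

Every lookup costs at least one stat() per ebuild plus one per
inherited eclass, which is the IO a readonly tree would avoid.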

So... the rsync'd cache ($portdir/metadata/cache) is accurate to the
rsync'd tree, minus user modifications- since the tree is treated as
modifiable, the cache *can potentially* be updated.

That nice lil long pause is portage transferring the cache from
portdir/metadata/cache, into the users local cache
(/var/cache/edb/dep/$portdir typically).  This is done for a few
reasons,

1) don't modify $portdir/metadata/cache.  Modifying the rsync'd cache
directly results in more to rsync for the next sync.  Going that route
would result in a lot of dial up users out for blood.
2) the user may not be using the same cache backend as the distributed
cache- the rsync'd cache is portage_db_flat, while the user may be using
a custom sqlite backend.  I suspect this will become more commonplace
whenever cvs becomes stable- the cache backend is being refactored
currently.
3) cleanse stale entries from the cache, so you don't end up w/ a couple
thousand extra files sitting in your local cache.  Stable portage
accomplishes this by wiping *all* local cache entries, and transferring
the *entire* rsync'd cache over.  Cvs is smarter, transfers only what
has changed, and removes the stale entries (this is why it's faster).

So... that's why portage basically copies the cache to a different
location.  If the tree were treated as readonly, and it was enforced in
some manner, that transfer wouldn't be needed.  This is something that's
being batted around also- the code for making the cache readonly is
already done fex.
~brian


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01 10:59           ` Brian Harring
@ 2004-12-01 11:25             ` Gregorio Guidi
  2004-12-01 12:08               ` Brian Harring
  0 siblings, 1 reply; 25+ messages in thread
From: Gregorio Guidi @ 2004-12-01 11:25 UTC (permalink / raw
  To: gentoo-portage-dev

On Wednesday 01 December 2004 11:59, Brian Harring wrote:
> On Wed, 2004-12-01 at 01:37, Gregorio Guidi wrote:
> > On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org>
>
> 1) don't modify $portdir/metadata/cache.  Modifying the rsync'd cache
> directly results in more to rsync for the next sync.  Going that route
> would result in a lot of dial up users out for blood.
> 2) the user may not be using the same cache backend as the distributed
> cache- the rsync'd cache is portage_db_flat, while the user may be using
> a custom sqlite backend.  I suspect this will be come more common place
> whenever cvs becomes stabled- the cache backend is being refactored
> currently.
> 3) cleanse stale entries from the cache, so you don't end up w/ a couple
> thousand extra files sitting in your local cache.  Stable portage
> accomplishes this by wiping *all* local cache entries, and transferring
> the *entire* rsync'd cache over.  Cvs is smarter, transfers only what
> has changed, and removes the stale entries (this is why it's faster).

I was interested exactly in points 1 and 3 (and I'm glad to hear there's even
a sqlite backend).
With respect to point 3, the solution you implemented is surely the right 
thing to do.
With respect to point 1, maybe a good solution could be to just check cache 
validity in $portdir/metadata when the cache is needed, and write a new cache 
in /var/cache/edb if it is not valid.

But since either a solution to point 3 or to point 1 would resolve the basic 
speed problem, the cvs implementation should be nearly optimal.

> So... that's why portage basically copies the cache to a different
> location.  If the tree were treated as readonly, and it was enforced in
> some manner, that transfer wouldn't be needed.  This is something that's
> being batted around also- the code for making the cache readonly is
> already done fex.
> ~brian
>
>

Thanks a lot for your answer!
Gregorio


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01 11:25             ` Gregorio Guidi
@ 2004-12-01 12:08               ` Brian Harring
  2004-12-01 13:25                 ` Gregorio Guidi
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Harring @ 2004-12-01 12:08 UTC (permalink / raw
  To: gentoo-portage-dev

On Wed, 2004-12-01 at 03:25, Gregorio Guidi wrote:
> On Wednesday 01 December 2004 11:59, Brian Harring wrote:
> > On Wed, 2004-12-01 at 01:37, Gregorio Guidi wrote:
> > > On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org>
> >
> > 1) don't modify $portdir/metadata/cache.  Modifying the rsync'd cache
> > directly results in more to rsync for the next sync.  Going that route
> > would result in a lot of dial up users out for blood.
> > 2) the user may not be using the same cache backend as the distributed
> > cache- the rsync'd cache is portage_db_flat, while the user may be using
> > a custom sqlite backend.  I suspect this will be come more common place
> > whenever cvs becomes stabled- the cache backend is being refactored
> > currently.
> > 3) cleanse stale entries from the cache, so you don't end up w/ a couple
> > thousand extra files sitting in your local cache.  Stable portage
> > accomplishes this by wiping *all* local cache entries, and transferring
> > the *entire* rsync'd cache over.  Cvs is smarter, transfers only what
> > has changed, and removes the stale entries (this is why it's faster).
> 
> I was interested exactly in points 1 and 3 (and I'm glad to hear there's even
> a sqlite backend).
Nothing official for sqlite thus far, I just know a couple of people who
did their own DIY backend for it.  The cache refactoring should include
a sqlite backend, although it hasn't been coded yet.

> With respect to point 3, the solution you imlpemented is surely the right 
> thing to do.
> With respect to point 1, maybe a good solution could be to just check cache 
> validity in $portdir/metadata when the cache is needed, and write a new cache 
> in /var/cache/edb if it is not valid.
Err... elaborate
Assuming I'm following, you're proposing doing the updates to local
cache, and reading from the metacache?  If so, that's twice the level of
potential stats/reads required, which won't speed it up much :)
~brian




* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01 12:08               ` Brian Harring
@ 2004-12-01 13:25                 ` Gregorio Guidi
  2004-12-01 13:38                   ` Jason Stubbs
  0 siblings, 1 reply; 25+ messages in thread
From: Gregorio Guidi @ 2004-12-01 13:25 UTC (permalink / raw
  To: gentoo-portage-dev

On Wednesday 01 December 2004 13:08, Brian Harring wrote:
> > > 1) don't modify $portdir/metadata/cache.  Modifying the rsync'd cache
> > > directly results in more to rsync for the next sync.  Going that route
> > > would result in a lot of dial up users out for blood.
> > > 2) the user may not be using the same cache backend as the distributed
> > > cache- the rsync'd cache is portage_db_flat, while the user may be
> > > using a custom sqlite backend.  I suspect this will be come more common
> > > place whenever cvs becomes stabled- the cache backend is being
> > > refactored currently.
> > > 3) cleanse stale entries from the cache, so you don't end up w/ a
> > > couple thousand extra files sitting in your local cache.  Stable
> > > portage accomplishes this by wiping *all* local cache entries, and
> > > transferring the *entire* rsync'd cache over.  Cvs is smarter,
> > > transfers only what has changed, and removes the stale entries (this is
> > > why it's faster).

> > With respect to point 3, the solution you imlpemented is surely the right
> > thing to do.
> > With respect to point 1, maybe a good solution could be to just check
> > cache validity in $portdir/metadata when the cache is needed, and write a
> > new cache in /var/cache/edb if it is not valid.
>
> Err... elaborate
> Assuming I'm following, you're proposing doing the updates to local
> cache, and reading from the metacache?  If so, that's twice the level of
> potential stats/reads required, which won't speed it up much :)
> ~brian

Nah, it would be something like:
- see if cache exists in /var/cache/edb (one cheap access() call)
 - if it exists, validate (costly stat() calls, but 99% success)
 - if it does not exist or is not valid, read cache in $portdir/metadata and
   validate (stat() calls, 99% success).
 - if cache is not valid, create new cache in /var/cache/edb (ebuild
   sourcing: takes a lot of time).
- read cache.

- after emerge sync, go through cache files in /var/cache/edb (usually not 
  many) and remove entries corresponding to deleted ebuilds.
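
As a rough sketch of the order I have in mind (plain dicts stand in
for /var/cache/edb and $portdir/metadata, and is_valid and regenerate
are hypothetical callables, not portage functions):

```python
def lookup(cpv, local_cache, meta_cache, is_valid, regenerate):
    """Return metadata for cpv using the lookup order listed above."""
    entry = local_cache.get(cpv)
    if entry is not None and is_valid(cpv, entry):
        return entry                       # local cache hit
    entry = meta_cache.get(cpv)
    if entry is None or not is_valid(cpv, entry):
        entry = regenerate(cpv)            # expensive: sources the ebuild
        local_cache[cpv] = entry           # only now write a local entry
    return entry


def prune(local_cache, existing_cpvs):
    """After a sync, drop local entries whose ebuilds were deleted."""
    for cpv in list(local_cache):
        if cpv not in existing_cpvs:
            del local_cache[cpv]


# Made-up data: current maps each existing cpv to its ebuild mtime.
current = {"cat/foo-1": 5, "cat/bar-1": 2}
local = {"cat/old-1": {"mtime": 1}}
meta = {"cat/foo-1": {"mtime": 5}, "cat/bar-1": {"mtime": 1}}

def is_valid(cpv, entry):
    return current.get(cpv) == entry["mtime"]

def regenerate(cpv):
    return {"mtime": current[cpv]}

a = lookup("cat/foo-1", local, meta, is_valid, regenerate)  # valid in meta
b = lookup("cat/bar-1", local, meta, is_valid, regenerate)  # stale, rebuilt
prune(local, set(current))
```

With this order the local cache only ever holds entries that differ
from the rsync'd one, so it stays small.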

But I'm not familiar with portage internals. For instance, I'm thinking of a
system where the cache is read only once and then kept in memory.
Does portage do that, or does it read the cache every time a key is needed
from it?

Gregorio


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-12-01 13:25                 ` Gregorio Guidi
@ 2004-12-01 13:38                   ` Jason Stubbs
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Stubbs @ 2004-12-01 13:38 UTC (permalink / raw
  To: gentoo-portage-dev

On Wednesday 01 December 2004 22:25, Gregorio Guidi wrote:
> Nah, it would be something like:
> - see if cache exists in /var/cache/edb (one cheap access() call)
>  - if it exists, validate (costly stat() calls, but 99% success)
>  - if it does not exists or is not valid, read cache in $portdir/metadata
> and validate (stat() calls, 99% success).
>  - if cache is not valid, create new cache in /var/cache/edb (ebuild
>   sourcing: takes a lot of times).
> - read cache.
>
> - after emerge sync, go through cache files in /var/cache/edb (usually not
>   many) and remove entries corresponding to deleted ebuilds.

You've just described what portage does with almost perfect accuracy. Let me 
rearrange it for you:

> - see if cache exists in /var/cache/edb (one cheap access() call)
>  - if it exists, validate (costly stat() calls, but 99% success)
>  - if cache is not valid, create new cache in /var/cache/edb (ebuild
>   sourcing: takes a lot of times).
> - read cache.

> - after emerge sync, go through cache files in /var/cache/edb (usually not
>   many) and remove entries corresponding to deleted ebuilds.
>  - if it does not exists or is not valid, read cache in $portdir/metadata
> and validate (stat() calls, 99% success).

If /var/cache/edb is not valid during standard emerge operation, there is 
almost no chance that $portdir/metadata will be valid.

Please do some research before putting forth any more "new" ideas. The code is 
not easy to read, but is getting better. However, unless you're willing to 
put in the effort of deciphering it, making suggestions is by no means 
helpful.

Regards,
Jason Stubbs




* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri
  2004-11-27 23:48 ` Michael Tindal
@ 2004-11-28  3:41 ` Luke-Jr
  2004-11-28 17:19   ` Gustavo Barbieri
  2004-11-28  5:44 ` Ed Grimm
  2004-11-28 19:37 ` Paul de Vrieze
  3 siblings, 1 reply; 25+ messages in thread
From: Luke-Jr @ 2004-11-28  3:41 UTC (permalink / raw
  To: gentoo-portage-dev

On Saturday 27 November 2004 11:10 pm, Gustavo Barbieri wrote:
> Categories are mixed: there is a net-www/apache and net-www/mod_*
> (apache modules), but there is a more convenient category www-apache/
> for them. This is one example, there are more mistakes.  There is any
> plan to fix them in next portage releases?

IIRC, net-www is an old category that should become www-* sometime in the future.
I believe this change was in a notice on the main site a while ago.

>
> Some packages use numbering version padded with zero, that's good to
> list with shell functions, but it's bad because you can't change them
> to numbers and them back to string. For example:
> mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> becomes 1.0 and you can't map back to the ebuild.

Versions are *not* decimal numbers, but a set of three integers. Version 1.15 
is a higher version than 1.2. It might be seen as nitpicking, but "integer" 
generally refers to a whole number (1 or 2, not 1.3 or 2.4).
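
A quick illustration of the difference, as a sketch only: the
version_key helper is made up for this mail and handles nothing but
purely numeric dotted versions.

```python
def version_key(version):
    """Split a purely numeric dotted version into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Read as decimal numbers, 1.15 is smaller than 1.2 ...
as_floats = float("1.15") > float("1.2")

# ... but compared part by part, (1, 15) is greater than (1, 2).
as_tuples = version_key("1.15") > version_key("1.2")
```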

>
> Portage provides metadata.xml, cool. But it's hardly used :(
> metadata.xml seems to provide tags for maintainers, changelogs and
> long description, many (most?) packages don't use them.

They should. It's a semi-gradual process.
-- 
Luke-Jr
Developer, Utopios
http://utopios.org/


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28  3:41 ` Luke-Jr
@ 2004-11-28 17:19   ` Gustavo Barbieri
  0 siblings, 0 replies; 25+ messages in thread
From: Gustavo Barbieri @ 2004-11-28 17:19 UTC (permalink / raw
  To: gentoo-portage-dev

On Sun, 28 Nov 2004 03:41:53 +0000, Luke-Jr <luke-jr@utopios.org> wrote:
> On Saturday 27 November 2004 11:10 pm, Gustavo Barbieri wrote:
> > Categories are mixed: there is a net-www/apache and net-www/mod_*
> > (apache modules), but there is a more convenient category www-apache/
> > for them. This is one example, there are more mistakes.  There is any
> > plan to fix them in next portage releases?
> 
> IIRC, net-www is an old category that  should be www-* sometime in the future.
> I believe this change was in a notice on the main site a while ago.

Sorry, I haven't followed gentoo news very much; I'll start to.

> > Some packages use numbering version padded with zero, that's good to
> > list with shell functions, but it's bad because you can't change them
> > to numbers and them back to string. For example:
> > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> > becomes 1.0 and you can't map back to the ebuild.
> 
> Versions are *not* decimal numbers, but a set of three integers. Version 1.15
> is a higher version than 1.2. It might be seen as nitpicking, but "integers"
> generally always refers to a whole number (1 or 2, not 1.3 or 2.4)

Yes, I know that. In my fast portage module I convert each version to a
PackageVersion class that splits the version according to the naming
policy. I build a list of the version numbers as integers, whose
last element is a string: "" if there is no letter modifier, or the
letter itself if there is one, as in openssl. Then I have one suffix,
converted to a tuple of 2 integers: the first is the position in the
"alpha|pre|rc|..." order and the second is the modifier number. Then I
have the release version.

My problem was that if I convert app-misc/gcal/gcal-3.01.ebuild, I
get (3,1) as the version, which goes back to string as "3.1" instead
of "3.01".

To overcome this I keep the original version string and use the
numbers just to compare versions.
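
Roughly, the idea looks like the sketch below. It is a simplified
illustration written for this mail, not the class from my module: the
regular expression, the SUFFIX_ORDER list and the key layout are
assumptions, and the real portage rules have more cases than this.

```python
import re
from functools import total_ordering

# Assumed suffix ordering; "" means "no suffix".
SUFFIX_ORDER = ["alpha", "beta", "pre", "rc", "", "p"]

_VERSION_RE = re.compile(
    r"^(?P<nums>\d+(?:\.\d+)*)"                           # 1.00, 0.9.7
    r"(?P<letter>[a-z]?)"                                 # openssl-style letter
    r"(?:_(?P<suffix>alpha|beta|pre|rc|p)(?P<snum>\d*))?"  # _rc7
    r"(?:-r(?P<rev>\d+))?$"                               # -r4
)


@total_ordering
class PackageVersion:
    """Compare by parsed key, but always print the original string."""

    def __init__(self, text):
        m = _VERSION_RE.match(text)
        if m is None:
            raise ValueError("unrecognised version: %r" % text)
        self.text = text                   # kept for the round trip
        nums = tuple(int(n) for n in m.group("nums").split("."))
        suffix = (SUFFIX_ORDER.index(m.group("suffix") or ""),
                  int(m.group("snum") or 0))
        self.key = (nums, m.group("letter"), suffix, int(m.group("rev") or 0))

    def __str__(self):
        return self.text

    def __eq__(self, other):
        return self.key == other.key

    def __lt__(self, other):
        return self.key < other.key

    def __hash__(self):
        return hash(self.key)


gcal = PackageVersion("3.01")
rc = PackageVersion("1.00_rc7-r4")
final = PackageVersion("1.00")
```

Here `gcal.key[0]` is `(3, 1)` while `str(gcal)` still gives "3.01", so
mapping back to the ebuild file name keeps working.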

> > Portage provides metadata.xml, cool. But it's hardly used :(
> > metadata.xml seems to provide tags for maintainers, changelogs and
> > long description, many (most?) packages don't use them.
> 
> They should. It's a semi-gradual process.

I wrote a lot of ebuilds but never knew about them!
http://www.gentoo.org/doc/en/ebuild-submit.xml doesn't mention that,
however http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml
does. Maybe people don't care about it because it's not that evident,
nor required for the ebuild to be accepted.

-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
  ICQ#: 17249123
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net

--
gentoo-portage-dev@gentoo.org mailing list

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri
  2004-11-27 23:48 ` Michael Tindal
  2004-11-28  3:41 ` Luke-Jr
@ 2004-11-28  5:44 ` Ed Grimm
  2004-11-28  6:18   ` John Nilsson
  2004-11-28 17:22   ` Gustavo Barbieri
  2004-11-28 19:37 ` Paul de Vrieze
  3 siblings, 2 replies; 25+ messages in thread
From: Ed Grimm @ 2004-11-28  5:44 UTC (permalink / raw
  To: gentoo-portage-dev, Gustavo Barbieri

On Sat, 27 Nov 2004, Gustavo Barbieri wrote:

> Some packages use numbering version padded with zero, that's good to
> list with shell functions, but it's bad because you can't change them
> to numbers and them back to string. For example:
> mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> becomes 1.0 and you can't map back to the ebuild.

It's worse than that.  They're not always integers.  It's safest to
treat version numbers as strings as much as possible; when one needs to
break them into integer portions, do this for comparison only, and save
the original.  Finally, a number of packages would require that you
provide a mechanism for determining all version numbers that aren't
strictly numeric.  Openssl, with its \d+.\d+.\d+[a-z] versions is easy.
hddtemp, with its alpha/beta tags, is doable but tedious.

There may be others which are more problematic.  I haven't seen Gentoo
using them, but many kernels are distributed with -[a-z][a-z]\d+
versions, which indicate which alternate maintainer managed the
additional patches beyond the standard kernel version - which is newer,
-mm5 or -bk15?  The world may never know.  (It's only determinate for
specific kernel versions, and frequently it's an apples and lemonade
comparison, as they don't address the same issues.)

Ed

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28  5:44 ` Ed Grimm
@ 2004-11-28  6:18   ` John Nilsson
  2004-11-28 15:58     ` Allen Parker
  2004-11-28 17:22   ` Gustavo Barbieri
  1 sibling, 1 reply; 25+ messages in thread
From: John Nilsson @ 2004-11-28  6:18 UTC (permalink / raw
  To: gentoo-portage-dev

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

On sön, 2004-11-28 at 05:44 +0000, Ed Grimm wrote:
> There may be others which are more problematic.  I haven't seen Gentoo
> using them, but many kernels are distributed with -[a-z][a-z]\d+
> versions, which indicate which alternate maintainer managed the
> additional patches beyond the standard kernel version - which is newer,
> -mm5 or -bk15?  The world may never know.  (It's only determinate for
> specific kernel versions, and frequently it's an apples and lemonade
> comparison, as they don't address the same issues.)

Would it be too much overhead if the ebuilds just linked to previous
versions instead? Like the ineed stuff of the init scripts. This way
no version parsing at all would be needed.
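
A minimal sketch of what such linking could look like, assuming each
ebuild names the single version it supersedes. The function name, the
"previous" mapping and the version strings are hypothetical and only
illustrate the idea.

```python
def order_by_links(previous):
    """Return versions oldest-first.

    `previous` maps each version string to the version it supersedes,
    or to None for the oldest one. No version string is ever parsed.
    """
    # Invert the links: for each version, which one comes right after it.
    successors = {prev: ver for ver, prev in previous.items()}
    ordered = []
    current = successors.get(None)
    while current is not None:
        ordered.append(current)
        current = successors.get(current)
    if len(ordered) != len(previous):
        # Happens when a linked-to version is missing from the mapping.
        raise ValueError("broken chain of version links")
    return ordered


# Hypothetical versions of one package, linked newest to oldest.
links = {"3.2": "3.1a", "3.1a": "3.01", "3.01": None}
ordered = order_by_links(links)
```

The order comes entirely from the links, so the chain only works as
long as every version that is linked to is still present.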

-John

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28  6:18   ` John Nilsson
@ 2004-11-28 15:58     ` Allen Parker
  0 siblings, 0 replies; 25+ messages in thread
From: Allen Parker @ 2004-11-28 15:58 UTC (permalink / raw
  To: gentoo-portage-dev

On Sun, 28 Nov 2004 07:18:06 +0100, John Nilsson <john@milsson.nu> wrote:
> On sön, 2004-11-28 at 05:44 +0000, Ed Grimm wrote:
> > There may be others which are more problematic.  I haven't seen Gentoo
> > using them, but many kernels are distributed with -[a-z][a-z]\d+
> > versions, which indicate which alternate maintainer managed the
> > additional patches beyond the standard kernel version - which is newer,
> > -mm5 or -bk15?  The world may never know.  (It's only determinate for
> > specific kernel versions, and frequently it's an apples and lemonade
> > comparison, as they don't address the same issues.)
> 
> Would it be to much overhead if the ebuilds just linked to previous
> versions instead? Like the ineed stuff of the init scripts. This way no
> no version parsing at all would be needed.
> 
> -John

the big problem with that, john, is that "stale" ebuilds are removed
often... also from apache-1.3.27 -> apache-2.0.41 there's a pretty
HUGE difference in how the packages are actually
compiled/treated/options, etc... for another example, look at the php5
vs the php4 ebuilds... see a difference? it doesn't make sense to do
what you are thinking because as soon as a new version comes out that
obsoletes the old one with new features, etc you end up having to hack
ALL of your ebuilds to support the new features and you're in the same
place you were before. Another thing you should look at, is
eclasses... webapp.eclass especially since it's pretty widely used.
eclasses do what i think you'd want to accomplish with the linking to
previous versions setup. Oh, and by the way, version parsing isn't
something that can be easily avoided. see above on why "world + dog =
same version" is a bad idea.

off list, when i wake up tomorrow, i'll email you a snippet of
conversation i had with johnm about eclasses last night in
#gentoo-dev... it might be enlightening.

my .02 of a monetary unit. 

Allen Parker
-- 
________________________________________
To avoid being added to my spam filter:
1. Utilize list replies unless otherwise requested.
2. If you DO send me a personal email, use english.
3. HTML isn't cute. It belongs on the web, not in my inbox.

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28  5:44 ` Ed Grimm
  2004-11-28  6:18   ` John Nilsson
@ 2004-11-28 17:22   ` Gustavo Barbieri
  2004-11-29  0:39     ` Gustavo Barbieri
  1 sibling, 1 reply; 25+ messages in thread
From: Gustavo Barbieri @ 2004-11-28 17:22 UTC (permalink / raw
  To: gentoo-portage-dev

On Sun, 28 Nov 2004 05:44:50 +0000 (GMT), Ed Grimm
<paranoid@gentoo.evolution.tgape.org> wrote:
> On Sat, 27 Nov 2004, Gustavo Barbieri wrote:
> 
> > Some packages use numbering version padded with zero, that's good to
> > list with shell functions, but it's bad because you can't change them
> > to numbers and them back to string. For example:
> > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> > becomes 1.0 and you can't map back to the ebuild.
> 
> It's worse than that.  They're not always integers.  It's safest to
> treat version numbers as strings as much as possible; when one needs to
> break them into integer portions, do this for comparison only, and save
> the original.  Finally, a number of packages would require that you
> provide a mechanism for determining all version numbers that aren't
> strictly numeric.  Openssl, with its \d+.\d+.\d+[a-z] versions is easy.
> hddtemp, with its alpha/beta tags, is doable but tedious.

I wrote a PackageVersion class that parses these sections using a
regular expression following the package naming policy. I am not sending
the code since I'll need to recode it: my XFS lost a lot of data after a
power failure :( But I have the bytecode if you want to test it or
help me to get the .py back.


 
> There may be others which are more problematic.  I haven't seen Gentoo
> using them, but many kernels are distributed with -[a-z][a-z]\d+
> versions, which indicate which alternate maintainer managed the
> additional patches beyond the standard kernel version - which is newer,
> -mm5 or -bk15?  The world may never know.  (It's only determinate for
> specific kernel versions, and frequently it's an apples and lemonade
> comparison, as they don't address the same issues.)

Gentoo ships separate kernel source packages; I use the Con Kolivas
sources, which are ck-sources in the tree.

* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-28 17:22   ` Gustavo Barbieri
@ 2004-11-29  0:39     ` Gustavo Barbieri
  0 siblings, 0 replies; 25+ messages in thread
From: Gustavo Barbieri @ 2004-11-29  0:39 UTC (permalink / raw
  To: gentoo-portage-dev

On Sun, 28 Nov 2004 15:22:44 -0200, Gustavo Barbieri <barbieri@gmail.com> wrote:
> On Sun, 28 Nov 2004 05:44:50 +0000 (GMT), Ed Grimm
> <paranoid@gentoo.evolution.tgape.org> wrote:
> > On Sat, 27 Nov 2004, Gustavo Barbieri wrote:
> > > Some packages use numbering version padded with zero, that's good to
> > > list with shell functions, but it's bad because you can't change them
> > > to numbers and them back to string. For example:
> > > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> > > becomes 1.0 and you can't map back to the ebuild.
> >
> > It's worse than that.  They're not always integers.  It's safest to
> > treat version numbers as strings as much as possible; when one needs to
> > break them into integer portions, do this for comparison only, and save
> > the original.  Finally, a number of packages would require that you
> > provide a mechanism for determining all version numbers that aren't
> > strictly numeric.  Openssl, with its \d+.\d+.\d+[a-z] versions is easy.
> > hddtemp, with its alpha/beta tags, is doable but tedious.
> 
> I did a PackageVersion class that parses these sections using a
> regular expression following the package naming policy. I don't send
> the code since I'll need to recode it, my XFS lost many data after a
> power failure :( But I have the bytecode if you want to test it or
> help me to get the .py back.

Okay, I hacked a free version of decompyle from BSD to support python
2.3 bytecode and recovered my file.

If you want to check it out:
http://ltc08.ic.unicamp.br/~gustavo/packagemanager.tar.bz2, the file
of interest is lib/packagemanager/packagemanagementsystem.py. It has
several classes; the PackageVersion class is the one of interest, please look at it.
(if you don't want to get the package:
http://ltc08.ic.unicamp.br/~gustavo/packagemanagementsystem.py)


* Re: [gentoo-portage-dev] Current portage well designed, but badly used
  2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri
                   ` (2 preceding siblings ...)
  2004-11-28  5:44 ` Ed Grimm
@ 2004-11-28 19:37 ` Paul de Vrieze
  3 siblings, 0 replies; 25+ messages in thread
From: Paul de Vrieze @ 2004-11-28 19:37 UTC (permalink / raw
  To: gentoo-portage-dev

[-- Attachment #1: Type: text/plain, Size: 4720 bytes --]

On Sunday 28 November 2004 00:10, Gustavo Barbieri wrote:
> Hello,
>
> I'm playing with portage and noticed it's well designed, but there are
> some mistakes in its usage at the moment. For example:
>
> Categories are mixed: there is a net-www/apache and net-www/mod_*
> (apache modules), but there is a more convenient category www-apache/
> for them. This is one example, there are more mistakes.  There is any
> plan to fix them in next portage releases?
>
> Some packages use numbering version padded with zero, that's good to
> list with shell functions, but it's bad because you can't change them
> to numbers and them back to string. For example:
> mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> becomes 1.0 and you can't map back to the ebuild.
>
> Portage provides metadata.xml, cool. But it's hardly used :(
> metadata.xml seems to provide tags for maintainers, changelogs and
> long description, many (most?) packages don't use them.
>
> The portage library is too heavy, complicated and make things slow.
> Heavy and complicated I noticed from (trying to) look at the source,
> slow by usage. For example:
>
> time emerge # without parameters
> real    0m0.614s
> user    0m0.487s
> sys     0m0.046s
>
> time emerge -pv world # 16 packages to be upgraded
> real    0m22.664s
> user    0m12.423s
> sys     0m1.130s
>
> It's too much, look at debian apt, it's fast. And I can't see why
> portage is slow.
> Forgive me if I'm wrong, but portage just need to parse
> /var/lib/portage/world (237 entries in my case), them for each check
> if there is any other version greater than and if so check for
> dependencies. Why 22seconds? A hand made take less than 1.
>
>
> Also, a brief explanation on why I was playing with portage and some
> requests: I'm coding (for fun, no plan to get in a production state)
> yet another graphical package manager atop portage with the newbie in
> mind. But to achieve my goal I need:
>
>    - a fast portage. Now I'm doing a module to do this for me (see
> more above), at least the basics, like get package information,
> versions, ... and if possible resolve primary dependencies (just to
> show to user in a tab "Dependencies", hidden by default).
>
>    - more meta data, if possible a list of urls to screenshots (most
> packages have a screenshots section), if the url links to an html,
> provide a threshold of images size to get, so it connects and
> downloads every image bigger than it... cached of course.
>
>    - portage to act as a daemon, queue requests  and fetch packages.
> If portage could be a daemon with 3 threads: one that download
> packages, one that compiles and one to manage the other and accept
> requests; then it could schedule download to maximize download
> throughput, downloading smaller packages first while respecting
> dependencies, compile while download and wait until packages are there
> and the "emerge" command just send commands to it.  It would be handy
> since compiling times are huge.
>
>
> About the fast portage: I know portage is a complex monster and is the
> heart of gentoo, if it breaks, everything breaks. But how about a
> python module to be used by other packages that just want to view the
> portage and its packages. If eventually this module works as expected
> and have every current portage feature, it could replace the old one.
>    I started to code my own "fast portage", but some things are picky
> to do, and I want to know how you do that: how do you parse ebuilds to
> get USE, DESCRIPTION, SLOT, DEPEND, ... ?
>    If you want to know why my implementation is fast: I use lazy
> evaluation as far as possible. For example, I load every package, but
> the attributes to available versions, installed versions, the status,
> are just calculated on deman, I use python property() and
> setters/getters for that. Since hardly you'll use every attribute from
> everythin, it loads much faster.
>    I have preliminar code here:
> http://ltc08.ic.unicamp.br/~gustavo/packagemanager.tar.bz2, but some
> modifications I did were lost in a power outtage + xfs... I just have
> the .pyc, if someone knows how to get the .py back...

Well, as said, portage does not parse ebuilds by itself, but uses bash. This
is not really fast. The biggest issue however is the absence of lazy evaluation.
I've been looking at it too, and even have a C++ based parser that can be
accessed as a python module, but it's undocumented and has issues as it is
not a full bash replacement, and ebuilds expect bash.
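
The lazy evaluation discussed here can be sketched with Python's
property(), as the quoted message suggests. This is only an
illustration: the Package class, the versions attribute and fake_scan
are hypothetical names, and fake_scan stands in for whatever expensive
tree lookup a real implementation would do.

```python
class Package:
    """Package whose version list is computed only on first access."""

    def __init__(self, name, scan):
        self.name = name
        self._scan = scan        # hypothetical callable doing the slow work
        self._versions = None    # nothing computed yet

    @property
    def versions(self):
        # Run the expensive lookup once, then reuse the cached result.
        if self._versions is None:
            self._versions = self._scan(self.name)
        return self._versions


calls = []


def fake_scan(name):
    # Stand-in for an expensive tree scan; records every call.
    calls.append(name)
    return ["1.3.27", "2.0.41"]


pkg = Package("net-www/apache", fake_scan)  # no scan happens here
first = pkg.versions                         # the scan runs on first access
second = pkg.versions                        # the cached list is reused
```

Creating many Package objects stays cheap, because the scan only runs
for the packages whose versions are actually read.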

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

end of thread, other threads:[~2004-12-01 13:41 UTC | newest]

Thread overview: 25+ messages
2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri
2004-11-27 23:48 ` Michael Tindal
2004-11-28 17:08   ` Gustavo Barbieri
2004-11-28 17:31     ` Andrew Gaffney
2004-11-28 17:56       ` Gustavo Barbieri
2004-11-30 14:22     ` Brian Harring
2004-11-30 14:53       ` Jason Stubbs
2004-12-01  4:13         ` Gustavo Barbieri
2004-12-01  8:41           ` Brian Harring
2004-12-01 13:41             ` Gustavo Barbieri
2004-12-01  3:55       ` Gustavo Barbieri
2004-12-01  9:37         ` Gregorio Guidi
2004-12-01 10:59           ` Brian Harring
2004-12-01 11:25             ` Gregorio Guidi
2004-12-01 12:08               ` Brian Harring
2004-12-01 13:25                 ` Gregorio Guidi
2004-12-01 13:38                   ` Jason Stubbs
2004-11-28  3:41 ` Luke-Jr
2004-11-28 17:19   ` Gustavo Barbieri
2004-11-28  5:44 ` Ed Grimm
2004-11-28  6:18   ` John Nilsson
2004-11-28 15:58     ` Allen Parker
2004-11-28 17:22   ` Gustavo Barbieri
2004-11-29  0:39     ` Gustavo Barbieri
2004-11-28 19:37 ` Paul de Vrieze
