* [gentoo-portage-dev] Current portage well designed, but badly used
From: Gustavo Barbieri @ 2004-11-27 23:10 UTC
To: gentoo-portage-dev

Hello,

I'm playing with portage and noticed that it's well designed, but there are some mistakes in how it is used at the moment. For example:

Categories are mixed: there are net-www/apache and net-www/mod_* (apache modules), but there is a more convenient category, www-apache/, for the modules. This is one example; there are more. Is there any plan to fix them in the next portage releases?

Some packages use version numbers padded with zeros. That's good for listing with shell functions, but it's bad because you can't convert them to numbers and then back to strings. For example: mail-mta/nullmailer-1.00_rc7-r4. If you convert it to numbers, it becomes 1.0 and you can't map it back to the ebuild.

Portage provides metadata.xml, cool. But it's hardly used :( metadata.xml seems to provide tags for maintainers, changelogs and a long description, but many (most?) packages don't use them.

The portage library is too heavy and complicated, and it makes things slow. Heavy and complicated I noticed from (trying to) look at the source, slow from usage. For example:

time emerge # without parameters
real 0m0.614s
user 0m0.487s
sys 0m0.046s

time emerge -pv world # 16 packages to be upgraded
real 0m22.664s
user 0m12.423s
sys 0m1.130s

It's too much; look at debian apt, it's fast. And I can't see why portage is slow.
Forgive me if I'm wrong, but portage just needs to parse /var/lib/portage/world (237 entries in my case), then for each entry check whether there is a greater version and, if so, check its dependencies. Why 22 seconds? A hand-made version takes less than 1.
Also, a brief explanation of why I was playing with portage, and some requests: I'm coding (for fun, with no plan to reach a production state) yet another graphical package manager atop portage, with the newbie in mind. But to achieve my goal I need:

 - a fast portage. I'm now writing a module to do this for me (see more above), at least the basics, like getting package information, versions, ... and if possible resolving primary dependencies (just to show them to the user in a "Dependencies" tab, hidden by default).

 - more metadata, if possible a list of URLs to screenshots (most packages have a screenshots section). If the URL links to an HTML page, provide a threshold image size, so it connects and downloads every image bigger than that... cached, of course.

 - portage to act as a daemon, queue requests and fetch packages. If portage could be a daemon with 3 threads (one that downloads packages, one that compiles, and one that manages the others and accepts requests), then it could schedule downloads to maximize throughput, downloading smaller packages first while respecting dependencies, compile while downloading, and wait until packages are there; the "emerge" command would just send commands to it. It would be handy since compile times are huge.

About the fast portage: I know portage is a complex monster and is the heart of gentoo; if it breaks, everything breaks. But how about a python module to be used by other packages that just want to view portage and its packages? If this module eventually works as expected and has every current portage feature, it could replace the old one.

I started to code my own "fast portage", but some things are tricky to do, and I want to know how you do them: how do you parse ebuilds to get USE, DESCRIPTION, SLOT, DEPEND, ...?

If you want to know why my implementation is fast: I use lazy evaluation as far as possible.
For example, I load every package, but attributes such as available versions, installed versions and status are only calculated on demand; I use python property() and setters/getters for that. Since you'll hardly ever use every attribute of everything, it loads much faster.

I have preliminary code here: http://ltc08.ic.unicamp.br/~gustavo/packagemanager.tar.bz2, but some modifications I made were lost in a power outage + xfs... I just have the .pyc, if someone knows how to get the .py back...

--
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
ICQ#: 17249123
GPG: 0xB640E1A2 @ wwwkeys.pgp.net
--
gentoo-portage-dev@gentoo.org mailing list
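
A minimal sketch of the lazy-evaluation idea described in this message, not the code from the tarball: the version list is only computed the first time it is read. The Package class, the load_versions callback and fake_lookup are hypothetical names used for illustration; a real module would query the portage tree instead.

```python
class Package:
    """Hypothetical package record whose version list is loaded on first use."""

    def __init__(self, name, load_versions):
        self.name = name
        # load_versions is an assumed callback standing in for the slow lookup.
        self._load_versions = load_versions
        self._versions = None  # nothing computed yet

    @property
    def versions(self):
        # Compute once, on first access, then reuse the cached result.
        if self._versions is None:
            self._versions = self._load_versions(self.name)
        return self._versions


calls = []


def fake_lookup(name):
    # Records each call so the laziness can be observed.
    calls.append(name)
    return ["1.00_rc7-r4"]


pkg = Package("mail-mta/nullmailer", fake_lookup)
calls_before_access = list(calls)   # creating the object did no lookup
first_read = pkg.versions           # triggers the lookup
second_read = pkg.versions          # served from the cached value
```

Keeping the version as the original string ("1.00_rc7-r4") also sidesteps the zero-padding problem raised earlier, since nothing is converted to a number and back.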
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Michael Tindal @ 2004-11-27 23:48 UTC
To: gentoo-portage-dev

Gustavo Barbieri wrote:

>Hello,
>
>I'm playing with portage and noticed it's well designed, but there are
>some mistakes in its usage at the moment. For example:
>
>Categories are mixed: there is a net-www/apache and net-www/mod_*
>(apache modules), but there is a more convenient category www-apache/
>for them. This is one example, there are more mistakes. There is any
>plan to fix them in next portage releases?
>
>Some packages use numbering version padded with zero, that's good to
>list with shell functions, but it's bad because you can't change them
>to numbers and them back to string. For example:
>mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
>becomes 1.0 and you can't map back to the ebuild.
>
>Portage provides metadata.xml, cool. But it's hardly used :(
>metadata.xml seems to provide tags for maintainers, changelogs and
>long description, many (most?) packages don't use them.
>

All of this you mentioned is really irrelevant to sys-apps/portage, but more relevant to the tree. The categories you mentioned are the responsibility of individual maintainers, not the portage team. To see who is responsible for a category/package/ebuild, you use the metadata.xml. Packages don't need to use the metadata.xml directly because it is just that, metadata. It is used to provide the information you just stated, and it serves its purpose well.

>The portage library is too heavy, complicated and make things slow.
>Heavy and complicated I noticed from (trying to) look at the source,
>slow by usage.
>For example:
>
>time emerge # without parameters
>real 0m0.614s
>user 0m0.487s
>sys 0m0.046s
>
>time emerge -pv world # 16 packages to be upgraded
>real 0m22.664s
>user 0m12.423s
>sys 0m1.130s
>
>It's too much, look at debian apt, it's fast. And I can't see why
>portage is slow.
>Forgive me if I'm wrong, but portage just need to parse
>/var/lib/portage/world (237 entries in my case), them for each check
>if there is any other version greater than and if so check for
>dependencies. Why 22seconds? A hand made take less than 1.
>

You can't compare apt to portage; don't even try to go down this route. It's like comparing apples to lemons. Portage is slow, but it is being fixed. CVS portage, for one, is a whole lot faster.

>Also, a brief explanation on why I was playing with portage and some
>requests: I'm coding (for fun, no plan to get in a production state)
>yet another graphical package manager atop portage with the newbie in
>mind. But to achieve my goal I need:
>
> - a fast portage. Now I'm doing a module to do this for me (see
>more above), at least the basics, like get package information,
>versions, ... and if possible resolve primary dependencies (just to
>show to user in a tab "Dependencies", hidden by default).
>
> - more meta data, if possible a list of urls to screenshots (most
>packages have a screenshots section), if the url links to an html,
>provide a threshold of images size to get, so it connects and
>downloads every image bigger than it... cached of course.
>

This is unnecessary and would add extra bloat to the tree, and add more complexity for our dev team. If you want something like this, your best bet would be to provide a patch for its functionality on bugs.gentoo.org, but I wouldn't be surprised if it wasn't accepted.

> - portage to act as a daemon, queue requests and fetch packages.
>If portage could be a daemon with 3 threads: one that download
>packages, one that compiles and one to manage the other and accept
>requests; then it could schedule download to maximize download
>throughput, downloading smaller packages first while respecting
>dependencies, compile while download and wait until packages are there
>and the "emerge" command just send commands to it. It would be handy
>since compiling times are huge.
>

There is a daemonized ebuild.sh (correct me if I'm wrong, ferringb) in portage CVS, although it doesn't use threads, because there is no way to kill processes (wget, etc.) spawned from within a thread, so you'd have stale processes after Ctrl+C'ing portage.

>About the fast portage: I know portage is a complex monster and is the
>heart of gentoo, if it breaks, everything breaks. But how about a
>python module to be used by other packages that just want to view the
>portage and its packages. If eventually this module works as expected
>and have every current portage feature, it could replace the old one.
>

Jstubbs is working on an API that will make its way into a later revision of portage. As far as parsing ebuilds, they are sourced directly from bash.

--
Michael Tindal (urilith)
Gentoo Linux Developer
python | dotnet | apache
--
The best way to create is to destroy.
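
One way to read the concern above about stale fetchers: if the download runs as a child process that the parent tracks directly, the parent can terminate it when the user presses Ctrl+C. The following is only a sketch using Python's standard subprocess module, not how portage CVS does it; run_tracked is a made-up helper name and the command is a stand-in for a real fetcher such as wget.

```python
import subprocess
import sys


def run_tracked(cmd):
    """Run cmd as a child process; terminate it if the user interrupts."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        # Ask the child to exit so no stale process is left behind.
        proc.terminate()
        proc.wait()
        raise


# Stand-in for a fetch command; a real caller might pass a wget invocation.
exit_code = run_tracked([sys.executable, "-c", "pass"])
```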
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Gustavo Barbieri @ 2004-11-28 17:08 UTC
To: gentoo-portage-dev

On Sat, 27 Nov 2004 17:48:54 -0600, Michael Tindal <urilith@gentoo.org> wrote:
> Gustavo Barbieri wrote:
>
> >Hello,
> >
> >I'm playing with portage and noticed it's well designed, but there are
> >some mistakes in its usage at the moment. For example:
> >
> >Categories are mixed: there is a net-www/apache and net-www/mod_*
> >(apache modules), but there is a more convenient category www-apache/
> >for them. This is one example, there are more mistakes. There is any
> >plan to fix them in next portage releases?
> >
> >Some packages use numbering version padded with zero, that's good to
> >list with shell functions, but it's bad because you can't change them
> >to numbers and them back to string. For example:
> >mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> >becomes 1.0 and you can't map back to the ebuild.
> >
> >Portage provides metadata.xml, cool. But it's hardly used :(
> >metadata.xml seems to provide tags for maintainers, changelogs and
> >long description, many (most?) packages don't use them.
> >
> All of this you mentioned is really irrelevant to sys-apps/portage, but
> more relevant to the tree.
> The categories you mentioned are held resposible by individual
> maintainers, not the portage team.
> To see who is responsible for the category/package/ebuild, you use the
> metadata.xml. Packages dont
> need to use the metadata.xml directly because it is just that,
> metadata. It is used to provide the information
> you just stated, and it serves its purpose well.

I just think that if metadata.xml were filled in with long descriptions and maintainers' emails, it would help people out there.
OK, packages don't need to use it directly, but tools might want to show users more info about something. Talking about metadata, why are HOMEPAGE and DESCRIPTION in ebuilds and not in metadata.xml? IMHO they're not used to build the package in any way. Maybe if we moved that (always filled) information to metadata.xml, people would fill in the other fields there.

Also, you said that this is irrelevant to the portage application and relevant to the portage tree. Where can I talk to the portage tree maintainers? If I need to patch the entire portage with metadata.xml and stuff like that, it will be a huge amount of work, but if the portage maintainers ask the package maintainers to do it for their next releases, many people would each do a small job, which is easier than a small group doing many jobs.

> >The portage library is too heavy, complicated and make things slow.
> >Heavy and complicated I noticed from (trying to) look at the source,
> >slow by usage. For example:
> >
> >time emerge # without parameters
> >real 0m0.614s
> >user 0m0.487s
> >sys 0m0.046s
> >
> >time emerge -pv world # 16 packages to be upgraded
> >real 0m22.664s
> >user 0m12.423s
> >sys 0m1.130s
> >
> >It's too much, look at debian apt, it's fast. And I can't see why
> >portage is slow.
> >Forgive me if I'm wrong, but portage just need to parse
> >/var/lib/portage/world (237 entries in my case), them for each check
> >if there is any other version greater than and if so check for
> >dependencies. Why 22seconds? A hand made take less than 1.
> >
> You can't compare apt to portage, dont even try to go down this route.
> Its like comparing apples to lemons.
> Portage is slow, but it is being fixed. CVS portage for one is a whole
> lot faster.

I'll look at CVS, but I don't see why portage needs to be slow. As you said, it's being fixed.

And I can't see such a difference between portage and apt in the area where portage is slow. OK, apt uses a db and doesn't need to check use flags, but they're orders of magnitude different.
Even lemons and apples aren't that different ;)

> >Also, a brief explanation on why I was playing with portage and some
> >requests: I'm coding (for fun, no plan to get in a production state)
> >yet another graphical package manager atop portage with the newbie in
> >mind. But to achieve my goal I need:
> >
> > - a fast portage. Now I'm doing a module to do this for me (see
> >more above), at least the basics, like get package information,
> >versions, ... and if possible resolve primary dependencies (just to
> >show to user in a tab "Dependencies", hidden by default).
> >
> > - more meta data, if possible a list of urls to screenshots (most
> >packages have a screenshots section), if the url links to an html,
> >provide a threshold of images size to get, so it connects and
> >downloads every image bigger than it... cached of course.
> >
> This is uncessary and would add extra bloat to the tree, and adds more
> complexity to our dev team.
> If you want something like this your best bet would be to provide a
> patch for its functionality on
> bugs.gentoo.org, but I wouldnt be surprised if it wasnt accepted.

I mean I want this in metadata.xml, not in ebuilds or so... how can this add complexity for the dev team? If you mean writing the xml parser, it's easy and I can send the patches. Also, I can provide tools that check URLs, like the homepage and screenshots, to see if they still exist.

> > - portage to act as a daemon, queue requests and fetch packages.
> >If portage could be a daemon with 3 threads: one that download
> >packages, one that compiles and one to manage the other and accept
> >requests; then it could schedule download to maximize download
> >throughput, downloading smaller packages first while respecting
> >dependencies, compile while download and wait until packages are there
> >and the "emerge" command just send commands to it. It would be handy
> >since compiling times are huge.
> >
> There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> portage CVS, although it
> doesnt use threads, because there is no way to kill processes (wget,
> etc.) spawned from within
> a thread, so youd have stale processes after Ctrl+C'ing portage.

Great!
BTW, by threads I meant the concept of more than one thing running in parallel; they don't need to be posix threads, they can be processes or even one process using select()

> >About the fast portage: I know portage is a complex monster and is the
> >heart of gentoo, if it breaks, everything breaks. But how about a
> >python module to be used by other packages that just want to view the
> >portage and its packages. If eventually this module works as expected
> >and have every current portage feature, it could replace the old one.
> >
> Jstubbs is working on an api that will make its way into a later
> revision of portage. As far as parsing
> ebuilds, they are sourced directly from bash.

Is there any explanation/roadmap/design I can look at? Does Jstubbs read this list? What are his goals, and how does he want to achieve them?

About parsing ebuilds, what do I need to source before the ebuild itself? I mean, to get things like "inherit" working.

--
Gustavo Sverzut Barbieri
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Andrew Gaffney @ 2004-11-28 17:31 UTC
To: gentoo-portage-dev

Gustavo Barbieri wrote:
> Talking about metadata, why does HOMEPAGE and DESCRIPTION are in
> ebuilds and not in metadata.xml, IMHO they're not used to build the
> package in any way. Maybe if we move those (always filled)
> information to metadata.xml, people would fill other fields there.

HOMEPAGE and DESCRIPTION have been in ebuilds for a *long* time, whereas metadata.xml is a fairly recent addition.

> Also, you said that this is irrelevant to the portage application, but
> to the portage tree. Where can I talk to portage tree maintainers? If
> I need to patch the entire portage with metadata.xml and stuff like
> that, it will be an huge work, but if portage maintainers ask the
> package maintainers to do it for next releases, many people would do
> small jobs, easier than small group doing many jobs.

Almost every dev is a portage tree maintainer. There is no master tree authority that all ebuilds must pass through before hitting the tree.

--
Andrew Gaffney
Gentoo Linux Developer
Installer Project
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Gustavo Barbieri @ 2004-11-28 17:56 UTC
To: gentoo-portage-dev

On Sun, 28 Nov 2004 11:31:52 -0600, Andrew Gaffney <agaffney@gentoo.org> wrote:
> Gustavo Barbieri wrote:
> > Talking about metadata, why does HOMEPAGE and DESCRIPTION are in
> > ebuilds and not in metadata.xml, IMHO they're not used to build the
> > package in any way. Maybe if we move those (always filled)
> > information to metadata.xml, people would fill other fields there.
>
> HOMEPAGE and DESCRIPTION have been in ebuilds for a *long* time where
> metadata.xml is a fairly recent addition.
>
> > Also, you said that this is irrelevant to the portage application, but
> > to the portage tree. Where can I talk to portage tree maintainers? If
> > I need to patch the entire portage with metadata.xml and stuff like
> > that, it will be an huge work, but if portage maintainers ask the
> > package maintainers to do it for next releases, many people would do
> > small jobs, easier than small group doing many jobs.
>
> Almost every dev is a portage tree maintainer. There is no master tree authority
> that all ebuilds must pass through before hitting the tree.

Okay, but could there at least be a paragraph in some gentoo weekly news? Something quick, just to mention metadata.xml.

--
Gustavo Sverzut Barbieri
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Brian Harring @ 2004-11-30 14:22 UTC
To: gentoo-portage-dev

On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > >The portage library is too heavy, complicated and make things slow.
> > >Heavy and complicated I noticed from (trying to) look at the source,
> > >slow by usage.

You *really* should explain how it's heavy and complicated. Generalizations don't help to improve it :)

> > >time emerge # without parameters
> > >real 0m0.614s
> > >user 0m0.487s
> > >sys 0m0.046s
> > >
> > >time emerge -pv world # 16 packages to be upgraded
> > >real 0m22.664s
> > >user 0m12.423s
> > >sys 0m1.130s

There's quite a large difference between just importing portage and actually parsing your profile, determining what your use flags are (since profiles can define defaults, and some use flags are based upon packages being installed, dependent on the profile; perl, for example). That, and walking/building the depgraph, querying the cache, locking, etc.

> > >
> > >It's too much, look at debian apt, it's fast. And I can't see why
> > >portage is slow.
> > >Forgive me if I'm wrong, but portage just need to parse
> > >/var/lib/portage/world (237 entries in my case), them for each check
> > >if there is any other version greater than and if so check for
> > >dependencies. Why 22seconds? A hand made take less than 1.

It also checks the first-level depends of all packages in world. So that list just got larger :)

Regarding debian apt, it's likely apples/oranges, as urilith stated. One thing to note is that, afaik, debian dependencies lack versions; they're basically a flat namespace. For example, if a dpkg states it deps on mysql, there is mysql. Singular.
With portage, well, we need to determine what version is available based upon keywords, package.mask, and the user's /etc/portage/package.keywords (and other things).
Note I work on portage, not dpkg/apt. So I could be talking out of my ass there...

> I'll look at CVS, but I don't see why portage need to be slow. As you
> said, it's being fixed.

Elaborate on how it's slow. There are various algs/processes that you could be referencing.

Rough cvs improvements: a 33% bash sourcing improvement. For those thinking parsing bash == slow portage, it's not the case. Users *never* see portage sourcing ebuilds for their keys (DEPEND, DESCRIPTION, etc) unless they're doing overlay ebuilds, and the ebuilds in the overlay are _only_ sourced when they've changed. The improvements in bash sourcing speed in cvs were A) intended to fix env handling for ebuilds (the speedup being an ancillary benefit), and B) intended to speed up regen for devs and the server that generates the metacache for rsync users. So no, bash isn't really what's slowing things down. :)

If you're referencing the nice long pause after sync'ing, that's the transfer of the cache from ${PORTDIR}/metadata/cache to /var/cache/edb/dep/${PORTDIR}; that's been sped up also. That pause could potentially be eliminated as well, although that would require ensuring the tree is readonly, which is another can of worms.

Aside from that, there is searching speed, which is slowed down a bit by the current use of locks in the default portage_db_flat ebuild metadata cache. Additionally, portage_db_flat uses separate files for each cpv (category/package-version), so there is considerable overhead from opening/closing a crapload of files. Using portage_db_anydbm improves this, although it has a few issues of its own.

If you're referencing doing a search based on description, well, the cache backend as mentioned above slows things down pretty majorly.
Even with anydbm, it still has to proceed cpv by cpv: basically walk the entire cache, *while* verifying the cache isn't stale, i.e. check the stored mtime and compare it to the ebuild's mtime.

Things could be sped up by treating the tree/cache on disk as readonly; this is something being bandied about, and may happen. Treating the tree/cache as readonly means we don't have to do any locking in the cache, nor staleness checks (less IO).

>
> and I can't see that difference between portage and apt in the area
> portage is slow, ok apt uses a db and don't need to check use flags,
> but they're orders of magnitude different. Even lemons and apples are
> that different ;)

See above.

> > > - portage to act as a daemon, queue requests and fetch packages.
> > >If portage could be a daemon with 3 threads: one that download
> > >packages, one that compiles and one to manage the other and accept
> > >requests; then it could schedule download to maximize download
> > >throughput,

Parallel fetch is in cvs already. Portage 2.0.51 already supports this in a way:

(emerge -f targets &> /dev/null &); emerge targets

> > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> > portage CVS, although it
> > doesnt use threads, because there is no way to kill processes (wget,
> > etc.) spawned from within
> > a thread, so youd have stale processes after Ctrl+C'ing portage.

Doesn't apply in this case; daemonized ebuild.sh just speeds up bash sourcing, which most users won't see. Devs, on the other hand, see it since they use cvs (no rsyncing of a pregenerated cache).

>
> Great!
> BTW, with threads I meant the concept of more than one thing running
> in parallel, don't need to be posix threads, can be process or even
> one process using select()

Currently implemented via fork. Doing long running threads in python is a bit trickier than you might suspect (tried that route; stopping a thread w/out having it check up every 5 seconds is pretty fricking hard/annoying).
> > Jstubbs is working on an api that will make its way into a later
> > revision of portage. As far as parsing
> > ebuilds, they are sourced directly from bash.
>
> There is any explanation/roadmap/design I can look at? Jstubbs reads
> this list? What's his goals, how he want to achieve it?

He'd have to state his goals. Offhand, afaik he put some of his goals in http://dev.gentoo.org/~jstubbs/portage/goals.txt, along with a collection of mine and some of genone's (Marius Mauch).

>
> About parsing of ebuilds, what do I need to source before the ebuild
> itself? I mean, to get things like "inherit" working.

All of ebuild.sh. Seriously. :)
Thing to note is that ebuilds aren't just run via bash /path/to/ebuild; a lot of functions are expected to exist for ebuilds to work, inherit for example (a bash function).

I'd suggest grabbing http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through bin/ebuild*.sh and bin/isolated*
With the exception of ebuild-daemon.sh, all of that code is required to create the appropriate bash environment that ebuilds expect. Even with that default env, eclasses exist to extend it and add new functionality.

Portage *does* have issues that need correcting, calling patterns/design/structure that need changing, etc. I'm trying to elaborate on the issues above; I hope it provides some insight into why things are the way they are (and potential avenues to check out for improving performance).
~brian
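
The staleness check mentioned in this message (compare the stored mtime with the ebuild's mtime) could look roughly like the sketch below. It is an illustration only and not portage's cache code; is_stale and its arguments are assumed names, and a temporary file stands in for an ebuild.

```python
import os
import tempfile


def is_stale(ebuild_path, cached_mtime):
    """Return True if the cached entry no longer matches the ebuild on disk."""
    try:
        current_mtime = int(os.stat(ebuild_path).st_mtime)
    except OSError:
        # The ebuild is gone, so the cached entry cannot be valid.
        return True
    return current_mtime != int(cached_mtime)


# Demonstration with a temporary file standing in for an ebuild.
with tempfile.NamedTemporaryFile(suffix=".ebuild", delete=False) as handle:
    path = handle.name
stored = int(os.stat(path).st_mtime)
fresh = is_stale(path, stored)          # stored mtime matches the file
changed = is_stale(path, stored - 10)   # stored mtime differs
os.remove(path)
missing = is_stale(path, stored)        # file no longer exists
```

Each check costs one stat() call per cpv, which is the IO that a readonly tree/cache would avoid.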
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Jason Stubbs @ 2004-11-30 14:53 UTC
To: gentoo-portage-dev

On Tuesday 30 November 2004 23:22, Brian Harring wrote:
> On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > On Sat, 27 Nov 2004 17:48:54 -0600, Michael Tindal <urilith@gentoo.org>
> > wrote:
> > > Jstubbs is working on an api that will make its way into a later
> > > revision of portage. As far as parsing
> > > ebuilds, they are sourced directly from bash.
> >
> > There is any explanation/roadmap/design I can look at? Jstubbs reads
> > this list? What's his goals, how he want to achieve it?

I read your first message, thought to myself "does this deserve an answer?" and then ignored the entire thread, apart from Michael's and Brian's posts.

> He'd have to state his goals-

Strict clear dependency resolution. It's already slower in CVS and, at least theoretically, can only become slower. On the other hand, it should save hours and hours in compile failures.

Regards,
Jason Stubbs
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Gustavo Barbieri @ 2004-12-01 4:13 UTC
To: gentoo-portage-dev

On Tue, 30 Nov 2004 23:53:19 +0900, Jason Stubbs <jstubbs@gentoo.org> wrote:
> On Tuesday 30 November 2004 23:22, Brian Harring wrote:
> > On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > > On Sat, 27 Nov 2004 17:48:54 -0600, Michael Tindal <urilith@gentoo.org>
> > > wrote:
> > > > Jstubbs is working on an api that will make its way into a later
> > > > revision of portage. As far as parsing
> > > > ebuilds, they are sourced directly from bash.
> > >
> > > There is any explanation/roadmap/design I can look at? Jstubbs reads
> > > this list? What's his goals, how he want to achieve it?
>
> I read your first message, thought to myself "does this deserve an answer?"
> and then ignored the entire thread, apart from Michael's and Brian's posts.

Sorry, I didn't mean to be rude or cause a tornado on this list; it was a mix of thousands of ideas and doubts, plus a non-native language, that made me choose a bad mail subject.

> > He'd have to state his goals-
>
> Strict clear dependency resolution. It's already slower in CVS and, at least
> theoretically, can only become slower. On the other hand, it should save
> hours and hours in compile failures.

This is good :)

Also, I read things from http://dev.gentoo.org/~jstubbs/portage/ and your ideas are really great. Things that I can help with now:

Create and convert to using a CPV class: I have Package and PackageVersion classes working. PackageVersion instances know how to compare to each other and how to get more info, with everything delayed until it's used. The Package class has a list of PackageVersion; this list is only loaded from portage when accessed.
http://ltc08.ic.unicamp.br/~gustavo/packagemanagementsystem.py: if you have some time, look at the classes; don't mind the other parts, since they are just a quick hack to access portage.

Using iterators/generators is a great help; it saves memory and even time... Python 2.4 has an equivalent of list comprehensions that doesn't build lists, called "generator expressions"; you probably already know it. There are some generators in my code.

From goals.txt I'm not sure about the modularisation section, but I can help with process management. A friend and I are playing with dependency solving to help improve boot speed (there is a bug report on bugs.gentoo.org). Our goal is to have working C code to solve the boot process dependencies, but we can try to make it general enough and then write a python wrapper over it. Right now we have a working prototype in python: http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_6.py

Thank you for your time,

--
Gustavo Sverzut Barbieri
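
A small sketch of the generator point made in this message, under the assumption that world-file entries look like category/package strings. iter_world and the sample entries are made up for the example; this is not code from packagemanagementsystem.py.

```python
def iter_world(entries):
    """Yield (category, package) pairs one at a time instead of building a list."""
    for entry in entries:
        category, _, package = entry.partition("/")
        yield category, package


# Sample entries, made up for the example.
world = ["net-www/apache", "mail-mta/nullmailer", "net-www/mod_example"]

# Generator expression: items are produced lazily, with no intermediate list.
net_www = (pkg for cat, pkg in iter_world(world) if cat == "net-www")
first = next(net_www)   # only the first entry has been examined so far
rest = list(net_www)    # consumes the remaining entries
```

Stopping after the first match means the remaining entries are never split or filtered, which is where the time saving comes from.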
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 4:13 ` Gustavo Barbieri @ 2004-12-01 8:41 ` Brian Harring 2004-12-01 13:41 ` Gustavo Barbieri 0 siblings, 1 reply; 25+ messages in thread From: Brian Harring @ 2004-12-01 8:41 UTC To: gentoo-portage-dev On Tue, 2004-11-30 at 20:13, Gustavo Barbieri wrote: > From goals.txt I'm not sure about the modularisation section, but I > can help with process management. A friend and I are playing with > dependency solving to help improve boot speed (there is a bugreport in > bugs.gentoo.org) our goal is to have a working C code to solve the > boot process dependencies, but we can try to make it general enough > and then write a python wrapper over it. Right now we have a working > prototype in python: > http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_6.py You'd probably want to talk to the baselayout peeps; how will this differ from the existing rc_parallel_startup? Aside from being C based rather than bash, that is.... ~brian
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 8:41 ` Brian Harring @ 2004-12-01 13:41 ` Gustavo Barbieri 0 siblings, 0 replies; 25+ messages in thread From: Gustavo Barbieri @ 2004-12-01 13:41 UTC To: gentoo-portage-dev, ferringb On Wed, 01 Dec 2004 00:41:24 -0800, Brian Harring <ferringb@gentoo.org> wrote: > On Tue, 2004-11-30 at 20:13, Gustavo Barbieri wrote: > > From goals.txt I'm not sure about the modularisation section, but I > > can help with process management. A friend and I are playing with > > dependency solving to help improve boot speed (there is a bugreport in > > bugs.gentoo.org) our goal is to have a working C code to solve the > > boot process dependencies, but we can try to make it general enough > > and then write a python wrapper over it. Right now we have a working > > prototype in python: > > http://ltc08.ic.unicamp.br/~gustavo/pboot/parallel_6.py > You'd probably want to talk to the baselayout peeps; how will this > differ from the existing rc_parallel_startup? > Aside from being C based rather than bash, that is.... I already have: http://bugs.gentoo.org/show_bug.cgi?id=69579 has more info and reference papers. -- Gustavo Sverzut Barbieri --------------------------------------- Computer Engineer 2001 - UNICAMP GPSL - Grupo Pro Software Livre Cell..: +55 (19) 9165 8010 Jabber: gsbarbieri@jabber.org ICQ#: 17249123 GPG: 0xB640E1A2 @ wwwkeys.pgp.net
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-11-30 14:22 ` Brian Harring 2004-11-30 14:53 ` Jason Stubbs @ 2004-12-01 3:55 ` Gustavo Barbieri 2004-12-01 9:37 ` Gregorio Guidi 1 sibling, 1 reply; 25+ messages in thread From: Gustavo Barbieri @ 2004-12-01 3:55 UTC (permalink / raw To: gentoo-portage-dev, ferringb On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org> wrote: > On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote: > > > >The portage library is too heavy, complicated and make things slow. > > > >Heavy and complicated I noticed from (trying to) look at the source, > > > >slow by usage. > You *really* should explain how it's heavy and complicated. > Generalizations don't help to improve it :) You're right, but to explain these I need to understand it a bit more. By now you can see it just as user feeling. > > > >time emerge # without parameters > > > >real 0m0.614s > > > >user 0m0.487s > > > >sys 0m0.046s > > > > > > > >time emerge -pv world # 16 packages to be upgraded > > > >real 0m22.664s > > > >user 0m12.423s > > > >sys 0m1.130s > There's quite a large difference between just importing portage, and > actually parsing your profile, determining what your use flags are > (since profiles can define defaults, and some use flags are based upon > packages being installed dependant on the profile, perl fex). That, and > walking/building the depgraph, querying the cache, locking, etc. These two were not related, just put them together since i "measured" them in sequence. I just think that 1/2 second for just printing "usage" message is too much, I already experienced more than seconds. But this doesn't real matters, forget it. > > > >It's too much, look at debian apt, it's fast. And I can't see why > > > >portage is slow. 
> > > >Forgive me if I'm wrong, but portage just need to parse > > > >/var/lib/portage/world (237 entries in my case), them for each check > > > >if there is any other version greater than and if so check for > > > >dependencies. Why 22seconds? A hand made take less than 1. > > Checks first level depends of all packages in world also. So that list > just got larger :) I know, but that larger? Anyway, I'll try to understand how you do things, what's read from disk to memory, data structures... any documents on that? just reading the source is painful :/ > Regarding debian apt, it's likely apples/oranges as urilith stated. One > thing to note is that afaik, debian dependencies lack versions- they're > basically a flat namespace. > > fex, if a dpkg states it deps mysql, there is mysql. Singular. > W/ portage, well, need to determine what version is available based upon > keywords, package.mask, and users /etc/portage/package.keywords (and > other things). As I see, this doesn't make algorithms worst (exponentially), just add a constant... that constant is that huge? > Note I work on portage, not dpkg/apt. So I could be talking out of my > ass there... Ok, and I work with none so far... :) [but as a gentoo user I want to improve my sys] > > I'll look at CVS, but I don't see why portage need to be slow. As you > > said, it's being fixed. > Elaborate on how it's slow. There are various algs/processes that you > could be referencing. Rough cvs improvements, 33% bash sourcing > improvement- for those thinking parsing bash == slow portage, it's not > the case. Users *never* see portage sourcing ebuilds for their keys > (DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and > the ebuilds in the overlay are _only_ sourced when they've changed. The > improvements in bash sourcing speed in cvs were A) intended to fix env > handling for ebuilds (ancillary benefit), and B) speed up regen for devs > and the server that generates the metacache for rsync users. 
> > So no, bash isn't really what's slowing things down. :) Good to know, from previous messages I was believing that every "emerge -s" did source the whole portage tree :) > If you're referencing the nice long pause after sync'ing, that's > transfer of the cache from ${PORTDIR}/metadata/cache to > /var/cache/edb/dep/${PORTDIR}; that's been speed up also. Potentially > might have that pause eliminated also, although that would require > ensuring the tree is readonly- that's another can of worms. No, I don't care about the caching stuff, just that "emerge something_without_deps" takes too long. > Aside from that, there is searching speed, which is a bit slowed down by > the current use of locks in the default portage_db_flat ebuild metadata > cache. Additionally, portage_db_flat uses seperate files for each cpv > (category/package-version), so there is considerable overhead from > opening/closing a crapload of files. > > Using portage_db_anydbm improves this, although it has a few issues of > it's own. Hum, here it comes. That's the part I think is slow and my reasons (guess) are those you said: locks and the "lots of small files" instead of one, probably real/optimized/indexed, database. Sorry, but I was not aware of the _anydbm stuff, were I can read more about it? > If you're referencing doing a search based on description, well, the > cache backend as mentioned above slows things down pretty majorly. Even > with anydbm, it still has to proceed cpv by cpv- basically walk the > entire cache, *while* verifying the cache isn't stale- eg, check the > stored mtime, and compare it to the ebuilds mtime. > > Things could be speed up by treating the tree/cache on disk as readonly- > this is something being bantered about, and may happen. Treating the > tree/cache as readonly means we don't have to do any locking in the > cache, nor staleness checks (less IO). Optimizing for the common case, it's a valid assumption. 
It will save us a lot of time and may cause little problem, since portage is much less write than read. > > and I can't see that difference between portage and apt in the area > > portage is slow, ok apt uses a db and don't need to check use flags, > > but they're orders of magnitude different. Even lemons and apples are > > that different ;) > See above. > > > > > - portage to act as a daemon, queue requests and fetch packages. > > > >If portage could be a daemon with 3 threads: one that download > > > >packages, one that compiles and one to manage the other and accept > > > >requests; then it could schedule download to maximize download > > > >throughput, > parallel fetch is in cvs already. Portage 2.0.51 already supports this > in a way, > (emerge -f targets &> /dev/null &); emerge targets > > > > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in > > > portage CVS, although it > > > doesnt use threads, because there is no way to kill processes (wget, > > > etc.) spawned from within > > > a thread, so youd have stale processes after Ctrl+C'ing portage. > Doesn't apply in this case, daemonized ebuild.sh just speeds up bash > sourcing which most users won't see. Devs, on the other hand see it > since they use cvs- no rsyncing of a pregenerated cache. > > > Great! > > BTW, with threads I meant the concept of more than one thing running > > in parallel, don't need to be posix threads, can be process or even > > one process using select() > Currently implemented via fork. Doing long running threads in python is > a bit trickier then you might suspect (tried that route, stopping a > thread w/out having it check up every 5 seconds is pretty fricking > hard/annoying). > > > > Jstubbs is working on an api that will make its way into a later > > > revision of portage. As far as parsing > > > ebuilds, they are sourced directly from bash. > > > > There is any explanation/roadmap/design I can look at? Jstubbs reads > > this list? 
What's his goals, how he want to achieve it? > He'd have to state his goals- > offhand, afaik he threw in some of this goals in > http://dev.gentoo.org/~jstubbs/portage/goals.txt > as are a collection of mine, and some of genone (Marius Mauch). > > > > > About parsing of ebuilds, what do I need to source before the ebuild > > itself? I mean, to get things like "inherit" working. > All of ebuild.sh. Seriously. :) > > Thing to note is that ebuilds aren't just ran via bash /path/to/ebuild; > lot of functions are expected to exist for ebuilds to work, inherit fex > (bash function). Ok. > I'd suggest grabbing > http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through > bin/ebuild*.sh and bin/isolated* > > With the exemption of ebuild-daemon.sh, all of that code is required to > create the appropriate bash environment that ebuilds expect. Even with > that default env, eclasses exist to extend it and add new functionality. > > Portage *does* have issues that need correcting, calling > patterns/design/structure changed, etc. Trying to elaborate on the > issues above, hope it provides some insight into why things are they way > they are (and potential avenues to check out for improving performance). I'll try the CVS. Thank you for your time and patient for replying to my kinda rude question/doubts. I'll try to help as far as possible. If there are minor works, I could start learning portage in more depth. -- Gustavo Sverzut Barbieri --------------------------------------- Computer Engineer 2001 - UNICAMP GPSL - Grupo Pro Software Livre Cell..: +55 (19) 9165 8010 Jabber: gsbarbieri@jabber.org ICQ#: 17249123 GPG: 0xB640E1A2 @ wwwkeys.pgp.net -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 3:55 ` Gustavo Barbieri @ 2004-12-01 9:37 ` Gregorio Guidi 2004-12-01 10:59 ` Brian Harring 0 siblings, 1 reply; 25+ messages in thread From: Gregorio Guidi @ 2004-12-01 9:37 UTC To: gentoo-portage-dev On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org> wrote: > If you're referencing the nice long pause after sync'ing, that's > transfer of the cache from ${PORTDIR}/metadata/cache to > /var/cache/edb/dep/${PORTDIR}; that's been speed up also. Potentially > might have that pause eliminated also, although that would require > ensuring the tree is readonly- that's another can of worms. I was just thinking about that, and I could not fully understand this point. Could you explain a bit more? Thanks Gregorio
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 9:37 ` Gregorio Guidi @ 2004-12-01 10:59 ` Brian Harring 2004-12-01 11:25 ` Gregorio Guidi 0 siblings, 1 reply; 25+ messages in thread From: Brian Harring @ 2004-12-01 10:59 UTC (permalink / raw To: gentoo-portage-dev On Wed, 2004-12-01 at 01:37, Gregorio Guidi wrote: > On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org> > > > If you're referencing the nice long pause after sync'ing, that's > > transfer of the cache from ${PORTDIR}/metadata/cache to > > /var/cache/edb/dep/${PORTDIR}; that's been speed up also. Potentially > > might have that pause eliminated also, although that would require > > ensuring the tree is readonly- that's another can of worms. > > I was just thinkng about that, and I could not fully understand this point. > Could you explain a bit more? Expanding on the commentary earlier about a readonly tree/cache vs the current- rsync'd trees are currently treated as modifiable by the user. In other words, the cache for the tree must be verified- this means pulling the mtime from the ebuild, and comparing it to the stored mtime in the cache. Additionally, if the ebuild inherits eclasses, need to check the eclass's mtime and the stored mtime. All of this is done to ensure that if the eclass/ebuild changes, the cache is accurate. So... the rsync'd cache ($portdir/metadata/cache) is accurate to the rsync'd tree, minus user modifications- since the tree is treated as modifiable, the cache *can potentially* be updated. That nice lil long pause is portage transfering the cache from portdir/metadata/cache, into the users local cache (/var/cache/edb/dep/$portdir typically). This is done for a few reasons, 1) don't modify $portdir/metadata/cache. Modifying the rsync'd cache directly results in more to rsync for the next sync. Going that route would result in a lot of dial up users out for blood. 
2) the user may not be using the same cache backend as the distributed cache- the rsync'd cache is portage_db_flat, while the user may be using a custom sqlite backend. I suspect this will be come more common place whenever cvs becomes stabled- the cache backend is being refactored currently. 3) cleanse stale entries from the cache, so you don't end up w/ a couple thousand extra files sitting in your local cache. Stable portage accomplishes this by wiping *all* local cache entries, and transferring the *entire* rsync'd cache over. Cvs is smarter, transfers only what has changed, and removes the stale entries (this is why it's faster). So... that's why portage basically copies the cache to a different location. If the tree were treated as readonly, and it was enforced in some manner, that transfer wouldn't be needed. This is something that's being batted around also- the code for making the cache readonly is already done fex. ~brian -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
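The difference between the two transfer strategies in point 3 (wipe everything and copy the whole rsync'd cache, versus copying only what changed and dropping stale entries) can be sketched as follows. This is a toy model, not portage code: both caches are plain dicts keyed by cpv, and the function name and entry layout are assumptions made for the example.

```python
def sync_cache(remote, local):
    """Bring `local` in line with `remote`, touching only what differs.

    Both arguments map a cpv string to an (mtime, metadata) tuple.
    Returns (updated, removed) as sorted lists of cpv keys.
    """
    updated = []
    for cpv, entry in remote.items():
        if local.get(cpv) != entry:       # new or changed entry
            local[cpv] = entry
            updated.append(cpv)

    # Entries that no longer exist upstream are stale and get dropped.
    removed = [cpv for cpv in local if cpv not in remote]
    for cpv in removed:
        del local[cpv]

    return sorted(updated), sorted(removed)


remote = {
    "app-misc/foo-1.0": (100, {"SLOT": "0"}),
    "app-misc/foo-1.1": (200, {"SLOT": "0"}),
}
local = {
    "app-misc/foo-1.0": (100, {"SLOT": "0"}),   # unchanged, left alone
    "app-misc/old-0.1": (50, {"SLOT": "0"}),    # deleted upstream
}
result = sync_cache(remote, local)
```

With these inputs only app-misc/foo-1.1 is written and app-misc/old-0.1 is deleted; the unchanged entry is never rewritten, which is where the time saving would come from.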
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 10:59 ` Brian Harring @ 2004-12-01 11:25 ` Gregorio Guidi 2004-12-01 12:08 ` Brian Harring 0 siblings, 1 reply; 25+ messages in thread From: Gregorio Guidi @ 2004-12-01 11:25 UTC To: gentoo-portage-dev On Wednesday 01 December 2004 11:59, Brian Harring wrote: > On Wed, 2004-12-01 at 01:37, Gregorio Guidi wrote: > > On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org> > > 1) don't modify $portdir/metadata/cache. Modifying the rsync'd cache > directly results in more to rsync for the next sync. Going that route > would result in a lot of dial up users out for blood. > 2) the user may not be using the same cache backend as the distributed > cache- the rsync'd cache is portage_db_flat, while the user may be using > a custom sqlite backend. I suspect this will be come more common place > whenever cvs becomes stabled- the cache backend is being refactored > currently. > 3) cleanse stale entries from the cache, so you don't end up w/ a couple > thousand extra files sitting in your local cache. Stable portage > accomplishes this by wiping *all* local cache entries, and transferring > the *entire* rsync'd cache over. Cvs is smarter, transfers only what > has changed, and removes the stale entries (this is why it's faster). I was interested exactly in points 1 and 3 (and I'm glad to hear there's even a sqlite backend). With respect to point 3, the solution you implemented is surely the right thing to do. With respect to point 1, maybe a good solution could be to just check cache validity in $portdir/metadata when the cache is needed, and write a new cache in /var/cache/edb if it is not valid. But since either a solution to point 3 or to point 1 would resolve the basic speed problem, the cvs implementation should be nearly optimal. > So... that's why portage basically copies the cache to a different > location.
If the tree were treated as readonly, and it was enforced in > some manner, that transfer wouldn't be needed. This is something that's > being batted around also- the code for making the cache readonly is > already done fex. > ~brian > > > -- > gentoo-portage-dev@gentoo.org mailing list Thanks a lot for your answer! Gregorio -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 11:25 ` Gregorio Guidi @ 2004-12-01 12:08 ` Brian Harring 2004-12-01 13:25 ` Gregorio Guidi 0 siblings, 1 reply; 25+ messages in thread From: Brian Harring @ 2004-12-01 12:08 UTC (permalink / raw To: gentoo-portage-dev On Wed, 2004-12-01 at 03:25, Gregorio Guidi wrote: > On Wednesday 01 December 2004 11:59, Brian Harring wrote: > > On Wed, 2004-12-01 at 01:37, Gregorio Guidi wrote: > > > On Tue, 30 Nov 2004 06:22:07 -0800, Brian Harring <ferringb@gentoo.org> > > > > 1) don't modify $portdir/metadata/cache. Modifying the rsync'd cache > > directly results in more to rsync for the next sync. Going that route > > would result in a lot of dial up users out for blood. > > 2) the user may not be using the same cache backend as the distributed > > cache- the rsync'd cache is portage_db_flat, while the user may be using > > a custom sqlite backend. I suspect this will be come more common place > > whenever cvs becomes stabled- the cache backend is being refactored > > currently. > > 3) cleanse stale entries from the cache, so you don't end up w/ a couple > > thousand extra files sitting in your local cache. Stable portage > > accomplishes this by wiping *all* local cache entries, and transferring > > the *entire* rsync'd cache over. Cvs is smarter, transfers only what > > has changed, and removes the stale entries (this is why it's faster). > > I was interested exactly in points 1 and 3 (and I'm glad to hear there's even > a sqlite backend). Nothing official sqlite thus far, I just know a couple of people who are did their own DIY backend for it. The cache refactoring should include a sqlite backend, although it hasn't been coded yet. > With respect to point 3, the solution you imlpemented is surely the right > thing to do. 
> With respect to point 1, maybe a good solution could be to just check cache > validity in $portdir/metadata when the cache is needed, and write a new cache > in /var/cache/edb if it is not valid. Err... elaborate Assuming I'm following, you're proposing doing the updates to local cache, and reading from the metacache? If so, that's twice the level of potential stats/reads required, which won't speed it up much :) ~brian -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 12:08 ` Brian Harring @ 2004-12-01 13:25 ` Gregorio Guidi 2004-12-01 13:38 ` Jason Stubbs 0 siblings, 1 reply; 25+ messages in thread From: Gregorio Guidi @ 2004-12-01 13:25 UTC (permalink / raw To: gentoo-portage-dev On Wednesday 01 December 2004 13:08, Brian Harring wrote: > > > 1) don't modify $portdir/metadata/cache. Modifying the rsync'd cache > > > directly results in more to rsync for the next sync. Going that route > > > would result in a lot of dial up users out for blood. > > > 2) the user may not be using the same cache backend as the distributed > > > cache- the rsync'd cache is portage_db_flat, while the user may be > > > using a custom sqlite backend. I suspect this will be come more common > > > place whenever cvs becomes stabled- the cache backend is being > > > refactored currently. > > > 3) cleanse stale entries from the cache, so you don't end up w/ a > > > couple thousand extra files sitting in your local cache. Stable > > > portage accomplishes this by wiping *all* local cache entries, and > > > transferring the *entire* rsync'd cache over. Cvs is smarter, > > > transfers only what has changed, and removes the stale entries (this is > > > why it's faster). > > With respect to point 3, the solution you imlpemented is surely the right > > thing to do. > > With respect to point 1, maybe a good solution could be to just check > > cache validity in $portdir/metadata when the cache is needed, and write a > > new cache in /var/cache/edb if it is not valid. > > Err... elaborate > Assuming I'm following, you're proposing doing the updates to local > cache, and reading from the metacache? 
If so, that's twice the level of > potential stats/reads required, which won't speed it up much :) > ~brian Nah, it would be something like: - see if cache exists in /var/cache/edb (one cheap access() call) - if it exists, validate (costly stat() calls, but 99% success) - if it does not exist or is not valid, read cache in $portdir/metadata and validate (stat() calls, 99% success). - if cache is not valid, create new cache in /var/cache/edb (ebuild sourcing: takes a lot of time). - read cache. - after emerge sync, go through cache files in /var/cache/edb (usually not many) and remove entries corresponding to deleted ebuilds. But I'm not aware of portage internals; for instance, I'm thinking of a system where the cache is read only one time and then stored in memory. Does portage do so, or does it read the cache every time a key is needed from there? Gregorio
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-12-01 13:25 ` Gregorio Guidi @ 2004-12-01 13:38 ` Jason Stubbs 0 siblings, 0 replies; 25+ messages in thread From: Jason Stubbs @ 2004-12-01 13:38 UTC (permalink / raw To: gentoo-portage-dev On Wednesday 01 December 2004 22:25, Gregorio Guidi wrote: > Nah, it would be something like: > - see if cache exists in /var/cache/edb (one cheap access() call) > - if it exists, validate (costly stat() calls, but 99% success) > - if it does not exists or is not valid, read cache in $portdir/metadata > and validate (stat() calls, 99% success). > - if cache is not valid, create new cache in /var/cache/edb (ebuild > sourcing: takes a lot of times). > - read cache. > > - after emerge sync, go through cache files in /var/cache/edb (usually not > many) and remove entries corresponding to deleted ebuilds. You've just described what portage does with almost perfect accuracy. Let me rearrange it for you: > - see if cache exists in /var/cache/edb (one cheap access() call) > - if it exists, validate (costly stat() calls, but 99% success) > - if cache is not valid, create new cache in /var/cache/edb (ebuild > sourcing: takes a lot of times). > - read cache. > - after emerge sync, go through cache files in /var/cache/edb (usually not > many) and remove entries corresponding to deleted ebuilds. > - if it does not exists or is not valid, read cache in $portdir/metadata > and validate (stat() calls, 99% success). If /var/cache/edb is not valid during standard emerge operation, there is almost no chance that $portdir/metadata will be valid. Please do some research before putting forth any more "new" ideas. The code is not easy to read, but is getting better. However, unless you're willing to put in the effort of deciphering it, making suggestions is by no means helpful. Regards, Jason Stubbs -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
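The lookup order in this message (check the local cache, validate it against mtimes, and only source the ebuild when validation fails) can be sketched as below. This is a simplified model under stated assumptions, not portage's code: the cache is a dict, mtimes are passed in rather than read with stat(), and regenerate is a hypothetical callable standing in for sourcing the ebuild through bash.

```python
def is_valid(entry, ebuild_mtime, eclass_mtimes):
    """A cache entry is valid if the ebuild and every inherited eclass
    still have the mtimes recorded when the entry was written."""
    if entry is None:
        return False
    if entry["mtime"] != ebuild_mtime:
        return False
    return all(eclass_mtimes.get(name) == stored
               for name, stored in entry["eclasses"].items())


def lookup(cpv, local_cache, ebuild_mtime, eclass_mtimes, regenerate):
    """Return metadata for cpv, regenerating the entry only when stale."""
    entry = local_cache.get(cpv)
    if not is_valid(entry, ebuild_mtime, eclass_mtimes):
        entry = regenerate(cpv)          # the expensive step
        local_cache[cpv] = entry
    return entry["metadata"]


regen_calls = []


def fake_regenerate(cpv):
    # Hypothetical stand-in for ebuild sourcing; records each call.
    regen_calls.append(cpv)
    return {"mtime": 200, "eclasses": {"eutils": 10},
            "metadata": {"SLOT": "0"}}


cache = {"app-misc/foo-1.0": {"mtime": 100, "eclasses": {"eutils": 10},
                              "metadata": {"SLOT": "old"}}}
eclasses = {"eutils": 10}

# The ebuild's mtime is now 200, so the stored entry (100) is stale.
first = lookup("app-misc/foo-1.0", cache, 200, eclasses, fake_regenerate)
# The second lookup finds a valid entry and skips regeneration.
second = lookup("app-misc/foo-1.0", cache, 200, eclasses, fake_regenerate)
```

The validation step is the cheap path taken almost every time; regeneration runs only for the rare entry whose ebuild or eclass changed.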
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri 2004-11-27 23:48 ` Michael Tindal @ 2004-11-28 3:41 ` Luke-Jr 2004-11-28 17:19 ` Gustavo Barbieri 2004-11-28 5:44 ` Ed Grimm 2004-11-28 19:37 ` Paul de Vrieze 3 siblings, 1 reply; 25+ messages in thread From: Luke-Jr @ 2004-11-28 3:41 UTC (permalink / raw To: gentoo-portage-dev On Saturday 27 November 2004 11:10 pm, Gustavo Barbieri wrote: > Categories are mixed: there is a net-www/apache and net-www/mod_* > (apache modules), but there is a more convenient category www-apache/ > for them. This is one example, there are more mistakes. There is any > plan to fix them in next portage releases? IIRC, net-www is an old category that should be www-* sometime in the future. I believe this change was in a notice on the main site a while ago. > > Some packages use numbering version padded with zero, that's good to > list with shell functions, but it's bad because you can't change them > to numbers and them back to string. For example: > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it > becomes 1.0 and you can't map back to the ebuild. Versions are *not* decimal numbers, but a set of three integers. Version 1.15 is a higher version than 1.2. It might be seen as nitpicking, but "integers" generally always refers to a whole number (1 or 2, not 1.3 or 2.4) > > Portage provides metadata.xml, cool. But it's hardly used :( > metadata.xml seems to provide tags for maintainers, changelogs and > long description, many (most?) packages don't use them. They should. It's a semi-gradual process. -- Luke-Jr Developer, Utopios http://utopios.org/ -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
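The point that 1.15 is newer than 1.2 is easy to check: comparing the dotted components as integers gives the right order, while reading the same strings as decimal numbers does not. The helper below is a hypothetical illustration that only handles purely numeric dotted versions; it is not portage's comparison routine.

```python
def as_tuple(version):
    """Split a purely numeric dotted version into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Component-wise integer comparison: 15 > 2, so 1.15 is the newer version.
newer_as_tuple = as_tuple("1.15") > as_tuple("1.2")

# Treating the strings as decimal fractions gets it backwards.
newer_as_float = float("1.15") > float("1.2")
```

Here newer_as_tuple is True and newer_as_float is False.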
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-11-28 3:41 ` Luke-Jr @ 2004-11-28 17:19 ` Gustavo Barbieri 0 siblings, 0 replies; 25+ messages in thread From: Gustavo Barbieri @ 2004-11-28 17:19 UTC (permalink / raw To: gentoo-portage-dev On Sun, 28 Nov 2004 03:41:53 +0000, Luke-Jr <luke-jr@utopios.org> wrote: > On Saturday 27 November 2004 11:10 pm, Gustavo Barbieri wrote: > > Categories are mixed: there is a net-www/apache and net-www/mod_* > > (apache modules), but there is a more convenient category www-apache/ > > for them. This is one example, there are more mistakes. There is any > > plan to fix them in next portage releases? > > IIRC, net-www is an old category that should be www-* sometime in the future. > I believe this change was in a notice on the main site a while ago. Sorry, I didn't follow gentoo news very much, I'll start to. > > Some packages use numbering version padded with zero, that's good to > > list with shell functions, but it's bad because you can't change them > > to numbers and them back to string. For example: > > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it > > becomes 1.0 and you can't map back to the ebuild. > > Versions are *not* decimal numbers, but a set of three integers. Version 1.15 > is a higher version than 1.2. It might be seen as nitpicking, but "integers" > generally always refers to a whole number (1 or 2, not 1.3 or 2.4) Yes, I know that. In my fast portage module I convert version to a PackageVersion class, that split the version according to naming policy, then I build a list of version numbers as integers and the last element is a string, "" if no letter modifier or the letter itself if there is one, like openssl. Then i have one suffix, converted to a tuple of 2 integers, the first is the position of the "alpha|pre|rc|..." order and the second is the modifier number. Then I have the release version. 
My problem was that if I convert app-misc/gcal/gcal-3.01.ebuild, I get (3,1) as the version, which goes back to a string as "3.1" instead of "3.01". To overcome this I keep the original version string and use the numbers just to compare versions. > > Portage provides metadata.xml, cool. But it's hardly used :( > > metadata.xml seems to provide tags for maintainers, changelogs and > > long description, many (most?) packages don't use them. > > They should. It's a semi-gradual process. I wrote a lot of ebuilds but never noticed metadata.xml! http://www.gentoo.org/doc/en/ebuild-submit.xml doesn't mention it; however, http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml does. Maybe people don't care about it because it's not that evident, or not required for the ebuild to be accepted. -- Gustavo Sverzut Barbieri --------------------------------------- Computer Engineer 2001 - UNICAMP GPSL - Grupo Pro Software Livre Cell..: +55 (19) 9165 8010 Jabber: gsbarbieri@jabber.org ICQ#: 17249123 GPG: 0xB640E1A2 @ wwwkeys.pgp.net
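A rough sketch of the approach described in this message: keep the original version string for display and for mapping back to the ebuild, and derive a separate key (numbers, optional letter, suffix position, suffix number, revision) used only for comparison. The regular expression, the suffix ordering and the class layout are assumptions made for the illustration. It is a simplification and does not claim to reproduce portage's actual comparison rules.

```python
import re

# Assumed ordering; "" stands for "no suffix" and sorts after rc.
SUFFIX_ORDER = ["alpha", "beta", "pre", "rc", "", "p"]

_VERSION_RE = re.compile(
    r"^(?P<nums>\d+(?:\.\d+)*)(?P<letter>[a-z]?)"
    r"(?:_(?P<suffix>alpha|beta|pre|rc|p)(?P<sufnum>\d*))?"
    r"(?:-r(?P<rev>\d+))?$"
)


class PackageVersion:
    """Hypothetical version object: compares by key, prints the original."""

    def __init__(self, text):
        match = _VERSION_RE.match(text)
        if match is None:
            raise ValueError("unsupported version string: %r" % text)
        self.text = text                      # original, never rebuilt
        nums = tuple(int(n) for n in match.group("nums").split("."))
        suffix = match.group("suffix") or ""
        self.key = (
            nums,
            match.group("letter"),            # "" when there is no letter
            SUFFIX_ORDER.index(suffix),
            int(match.group("sufnum") or 0),
            int(match.group("rev") or 0),
        )

    def __lt__(self, other):
        return self.key < other.key

    def __eq__(self, other):
        return self.key == other.key

    def __str__(self):
        return self.text


gcal = PackageVersion("3.01")
rc = PackageVersion("1.00_rc7-r4")
final = PackageVersion("1.00")
```

str(gcal) still gives "3.01" even though the comparison key holds (3, 1), so the padded zero is never lost, and the release candidate sorts before the final 1.00.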
* Re: [gentoo-portage-dev] Current portage well designed, but badly used 2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri 2004-11-27 23:48 ` Michael Tindal 2004-11-28 3:41 ` Luke-Jr @ 2004-11-28 5:44 ` Ed Grimm 2004-11-28 6:18 ` John Nilsson 2004-11-28 17:22 ` Gustavo Barbieri 2004-11-28 19:37 ` Paul de Vrieze 3 siblings, 2 replies; 25+ messages in thread From: Ed Grimm @ 2004-11-28 5:44 UTC (permalink / raw To: gentoo-portage-dev, Gustavo Barbieri On Sat, 27 Nov 2004, Gustavo Barbieri wrote: > Some packages use numbering version padded with zero, that's good to > list with shell functions, but it's bad because you can't change them > to numbers and them back to string. For example: > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it > becomes 1.0 and you can't map back to the ebuild. It's worse than that. They're not always integers. It's safest to treat version numbers as strings as much as possible; when one needs to break them into integer portions, do this for comparison only, and save the original. Finally, a number of packages would require that you provide a mechanism for determining all version numbers that aren't strictly numeric. Openssl, with its \d+.\d+.\d+[a-z] versions is easy. hddtemp, with its alpha/beta tags, is doable but tedious. There may be others which are more problematic. I haven't seen Gentoo using them, but many kernels are distributed with -[a-z][a-z]\d+ versions, which indicate which alternate maintainer managed the additional patches beyond the standard kernel version - which is newer, -mm5 or -bk15? The world may never know. (It's only determinate for specific kernel versions, and frequently it's an apples and lemonade comparison, as they don't address the same issues.) Ed -- gentoo-portage-dev@gentoo.org mailing list ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: John Nilsson @ 2004-11-28 6:18 UTC
To: gentoo-portage-dev

On Sun, 2004-11-28 at 05:44 +0000, Ed Grimm wrote:
> There may be others which are more problematic. I haven't seen Gentoo
> using them, but many kernels are distributed with -[a-z][a-z]\d+
> versions, which indicate which alternate maintainer managed the
> additional patches beyond the standard kernel version - which is newer,
> -mm5 or -bk15? The world may never know. (It's only determinate for
> specific kernel versions, and frequently it's an apples and lemonade
> comparison, as they don't address the same issues.)

Would it be too much overhead if the ebuilds just linked to previous
versions instead? Like the ineed stuff of the init scripts. This way
no version parsing at all would be needed.

-John
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Allen Parker @ 2004-11-28 15:58 UTC
To: gentoo-portage-dev

On Sun, 28 Nov 2004 07:18:06 +0100, John Nilsson <john@milsson.nu> wrote:
> On Sun, 2004-11-28 at 05:44 +0000, Ed Grimm wrote:
> > There may be others which are more problematic. I haven't seen Gentoo
> > using them, but many kernels are distributed with -[a-z][a-z]\d+
> > versions, which indicate which alternate maintainer managed the
> > additional patches beyond the standard kernel version - which is newer,
> > -mm5 or -bk15? The world may never know. (It's only determinate for
> > specific kernel versions, and frequently it's an apples and lemonade
> > comparison, as they don't address the same issues.)
>
> Would it be to much overhead if the ebuilds just linked to previous
> versions instead? Like the ineed stuff of the init scripts. This way no
> no version parsing at all would be needed.
>
> -John

The big problem with that, John, is that "stale" ebuilds are removed
often. Also, from apache-1.3.27 to apache-2.0.41 there's a pretty HUGE
difference in how the packages are compiled, how they are treated,
which options they take, etc. For another example, look at the php5 vs.
the php4 ebuilds: see the difference? It doesn't make sense to do what
you are thinking of, because as soon as a new version comes out that
obsoletes the old one with new features, you end up having to hack ALL
of your ebuilds to support the new features, and you're in the same
place you were before.

Another thing you should look at is eclasses, webapp.eclass especially,
since it's pretty widely used. Eclasses do what I think you'd want to
accomplish with the linking-to-previous-versions setup.

Oh, and by the way, version parsing isn't something that can be easily
avoided. See above on why "world + dog = same version" is a bad idea.

Off list, when I wake up tomorrow, I'll email you a snippet of a
conversation I had with johnm about eclasses last night in #gentoo-dev.
It might be enlightening.

My .02 of a monetary unit.

Allen Parker

--
________________________________________
To avoid being added to my spam filter:
1. Utilize list replies unless otherwise requested.
2. If you DO send me a personal email, use English.
3. HTML isn't cute. It belongs on the web, not in my inbox.
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Gustavo Barbieri @ 2004-11-28 17:22 UTC
To: gentoo-portage-dev

On Sun, 28 Nov 2004 05:44:50 +0000 (GMT), Ed Grimm <paranoid@gentoo.evolution.tgape.org> wrote:
> On Sat, 27 Nov 2004, Gustavo Barbieri wrote:
>
> > Some packages use numbering version padded with zero, that's good to
> > list with shell functions, but it's bad because you can't change them
> > to numbers and them back to string. For example:
> > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> > becomes 1.0 and you can't map back to the ebuild.
>
> It's worse than that. They're not always integers. It's safest to
> treat version numbers as strings as much as possible; when one needs to
> break them into integer portions, do this for comparison only, and save
> the original. Finally, a number of packages would require that you
> provide a mechanism for determining all version numbers that aren't
> strictly numeric. Openssl, with its \d+.\d+.\d+[a-z] versions is easy.
> hddtemp, with its alpha/beta tags, is doable but tedious.

I wrote a PackageVersion class that parses these sections using a
regular expression that follows the package naming policy. I'm not
sending the code yet because I'll need to rewrite it: my XFS filesystem
lost a lot of data after a power failure :( But I still have the
bytecode if you want to test it or help me get the .py back.

> There may be others which are more problematic. I haven't seen Gentoo
> using them, but many kernels are distributed with -[a-z][a-z]\d+
> versions, which indicate which alternate maintainer managed the
> additional patches beyond the standard kernel version - which is newer,
> -mm5 or -bk15? The world may never know. (It's only determinate for
> specific kernel versions, and frequently it's an apples and lemonade
> comparison, as they don't address the same issues.)

Gentoo packages the kernel source variants separately. I use Con
Kolivas' sources, which are ck-sources in the tree.

--
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
ICQ#: 17249123
GPG: 0xB640E1A2 @ wwwkeys.pgp.net
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Gustavo Barbieri @ 2004-11-29 0:39 UTC
To: gentoo-portage-dev

On Sun, 28 Nov 2004 15:22:44 -0200, Gustavo Barbieri <barbieri@gmail.com> wrote:
> On Sun, 28 Nov 2004 05:44:50 +0000 (GMT), Ed Grimm
> <paranoid@gentoo.evolution.tgape.org> wrote:
> > On Sat, 27 Nov 2004, Gustavo Barbieri wrote:
> > > Some packages use numbering version padded with zero, that's good to
> > > list with shell functions, but it's bad because you can't change them
> > > to numbers and them back to string. For example:
> > > mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> > > becomes 1.0 and you can't map back to the ebuild.
> >
> > It's worse than that. They're not always integers. It's safest to
> > treat version numbers as strings as much as possible; when one needs to
> > break them into integer portions, do this for comparison only, and save
> > the original. Finally, a number of packages would require that you
> > provide a mechanism for determining all version numbers that aren't
> > strictly numeric. Openssl, with its \d+.\d+.\d+[a-z] versions is easy.
> > hddtemp, with its alpha/beta tags, is doable but tedious.
>
> I did a PackageVersion class that parses these sections using a
> regular expression following the package naming policy. I don't send
> the code since I'll need to recode it, my XFS lost many data after a
> power failure :( But I have the bytecode if you want to test it or
> help me to get the .py back.

Okay, I hacked a free version of decompyle from BSD to support Python
2.3 bytecode and recovered my file. If you want to check it out, the
tarball is here:

http://ltc08.ic.unicamp.br/~gustavo/packagemanager.tar.bz2

The file of interest is lib/packagemanager/packagemanagementsystem.py.
It contains several classes; PackageVersion is the one to look at. If
you don't want to download the whole package, the file alone is here:

http://ltc08.ic.unicamp.br/~gustavo/packagemanagementsystem.py

--
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
ICQ#: 17249123
GPG: 0xB640E1A2 @ wwwkeys.pgp.net
* Re: [gentoo-portage-dev] Current portage well designed, but badly used
From: Paul de Vrieze @ 2004-11-28 19:37 UTC
To: gentoo-portage-dev

On Sunday 28 November 2004 00:10, Gustavo Barbieri wrote:
> Hello,
>
> I'm playing with portage and noticed it's well designed, but there are
> some mistakes in its usage at the moment. For example:
>
> Categories are mixed: there is a net-www/apache and net-www/mod_*
> (apache modules), but there is a more convenient category www-apache/
> for them. This is one example, there are more mistakes. There is any
> plan to fix them in next portage releases?
>
> Some packages use numbering version padded with zero, that's good to
> list with shell functions, but it's bad because you can't change them
> to numbers and them back to string. For example:
> mail-mta/nullmailer-1.00_rc7-r4. If you Convert it to integers, it
> becomes 1.0 and you can't map back to the ebuild.
>
> Portage provides metadata.xml, cool. But it's hardly used :(
> metadata.xml seems to provide tags for maintainers, changelogs and
> long description, many (most?) packages don't use them.
>
> The portage library is too heavy, complicated and make things slow.
> Heavy and complicated I noticed from (trying to) look at the source,
> slow by usage. For example:
>
> time emerge # without parameters
> real 0m0.614s
> user 0m0.487s
> sys 0m0.046s
>
> time emerge -pv world # 16 packages to be upgraded
> real 0m22.664s
> user 0m12.423s
> sys 0m1.130s
>
> It's too much, look at debian apt, it's fast. And I can't see why
> portage is slow.
> Forgive me if I'm wrong, but portage just need to parse
> /var/lib/portage/world (237 entries in my case), them for each check
> if there is any other version greater than and if so check for
> dependencies. Why 22seconds? A hand made take less than 1.
>
>
> Also, a brief explanation on why I was playing with portage and some
> requests: I'm coding (for fun, no plan to get in a production state)
> yet another graphical package manager atop portage with the newbie in
> mind. But to achieve my goal I need:
>
> - a fast portage. Now I'm doing a module to do this for me (see
> more above), at least the basics, like get package information,
> versions, ... and if possible resolve primary dependencies (just to
> show to user in a tab "Dependencies", hidden by default).
>
> - more meta data, if possible a list of urls to screenshots (most
> packages have a screenshots section), if the url links to an html,
> provide a threshold of images size to get, so it connects and
> downloads every image bigger than it... cached of course.
>
> - portage to act as a daemon, queue requests and fetch packages.
> If portage could be a daemon with 3 threads: one that download
> packages, one that compiles and one to manage the other and accept
> requests; then it could schedule download to maximize download
> throughput, downloading smaller packages first while respecting
> dependencies, compile while download and wait until packages are there
> and the "emerge" command just send commands to it. It would be handy
> since compiling times are huge.
>
>
> About the fast portage: I know portage is a complex monster and is the
> heart of gentoo, if it breaks, everything breaks. But how about a
> python module to be used by other packages that just want to view the
> portage and its packages. If eventually this module works as expected
> and have every current portage feature, it could replace the old one.
> I started to code my own "fast portage", but some things are picky
> to do, and I want to know how you do that: how do you parse ebuilds to
> get USE, DESCRIPTION, SLOT, DEPEND, ... ?
> If you want to know why my implementation is fast: I use lazy
> evaluation as far as possible. For example, I load every package, but
> the attributes to available versions, installed versions, the status,
> are just calculated on deman, I use python property() and
> setters/getters for that. Since hardly you'll use every attribute from
> everythin, it loads much faster.
> I have preliminar code here:
> http://ltc08.ic.unicamp.br/~gustavo/packagemanager.tar.bz2, but some
> modifications I did were lost in a power outtage + xfs... I just have
> the .pyc, if someone knows how to get the .py back...

Well, as said, portage does not parse ebuilds by itself but uses bash,
which is not really fast. The biggest issue, however, is the absence of
lazy evaluation. I've been looking at it too, and even have a C++-based
parser that can be accessed as a Python module, but it's undocumented
and has issues, as it is not a full bash replacement and ebuilds expect
bash.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
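
The lazy evaluation discussed in this message (attributes computed on
demand through property() and a getter) can be sketched in Python as
follows. Package, load_versions and fake_loader are hypothetical names
chosen for illustration; the loader stands in for whatever expensive
scan of the tree a real implementation would do, and none of this is
taken from the packagemanager code or from Portage.

```python
class Package:
    """Package whose expensive attributes are computed on first access."""

    def __init__(self, name, load_versions):
        self.name = name
        self._load_versions = load_versions  # hypothetical loader callable
        self._versions = None                # nothing computed yet

    def _get_versions(self):
        # Run the loader only once, on first access, then reuse the result.
        if self._versions is None:
            self._versions = self._load_versions(self.name)
        return self._versions

    versions = property(_get_versions)


calls = []


def fake_loader(name):
    # Stand-in for an expensive lookup; records each invocation.
    calls.append(name)
    return ["3.01", "3.00"]


pkg = Package("app-misc/gcal", fake_loader)
calls_before_access = len(calls)  # constructing the object loads nothing
first = pkg.versions              # triggers the loader
second = pkg.versions             # served from the cached value
print(calls_before_access, len(calls))
```

Creating many Package objects is then cheap, and the cost of computing
versions is paid only for the packages whose attribute is actually
read.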
Thread overview: 25+ messages
2004-11-27 23:10 [gentoo-portage-dev] Current portage well designed, but badly used Gustavo Barbieri
2004-11-27 23:48 ` Michael Tindal
2004-11-28 17:08 ` Gustavo Barbieri
2004-11-28 17:31 ` Andrew Gaffney
2004-11-28 17:56 ` Gustavo Barbieri
2004-11-30 14:22 ` Brian Harring
2004-11-30 14:53 ` Jason Stubbs
2004-12-01 4:13 ` Gustavo Barbieri
2004-12-01 8:41 ` Brian Harring
2004-12-01 13:41 ` Gustavo Barbieri
2004-12-01 3:55 ` Gustavo Barbieri
2004-12-01 9:37 ` Gregorio Guidi
2004-12-01 10:59 ` Brian Harring
2004-12-01 11:25 ` Gregorio Guidi
2004-12-01 12:08 ` Brian Harring
2004-12-01 13:25 ` Gregorio Guidi
2004-12-01 13:38 ` Jason Stubbs
2004-11-28 3:41 ` Luke-Jr
2004-11-28 17:19 ` Gustavo Barbieri
2004-11-28 5:44 ` Ed Grimm
2004-11-28 6:18 ` John Nilsson
2004-11-28 15:58 ` Allen Parker
2004-11-28 17:22 ` Gustavo Barbieri
2004-11-29 0:39 ` Gustavo Barbieri
2004-11-28 19:37 ` Paul de Vrieze