From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gentoo-portage-dev-help@gentoo.org; run by ezmlm
Message-ID: <9ef20ef3041130195572195bea@mail.gmail.com>
Date: Wed, 1 Dec 2004 01:55:59 -0200
From: Gustavo Barbieri
Reply-To: Gustavo Barbieri
To: gentoo-portage-dev@lists.gentoo.org, ferringb@gentoo.org
In-Reply-To: <1101824527.32056.102.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
References: <9ef20ef3041127151046107fb5@mail.gmail.com> <41A91266.5080804@gentoo.org> <9ef20ef3041128090844573b74@mail.gmail.com> <1101824527.32056.102.camel@localhost.localdomain>
Subject: Re: [gentoo-portage-dev] Current portage well designed, but badly used

On Tue, 30 Nov 2004 06:22:07 -0800,
Brian Harring wrote:
> On Sun, 2004-11-28 at 09:08, Gustavo Barbieri wrote:
> > > >The portage library is too heavy, complicated and make things slow.
> > > >Heavy and complicated I noticed from (trying to) look at the source,
> > > >slow by usage.
> You *really* should explain how it's heavy and complicated.
> Generalizations don't help to improve it :)

You're right, but to explain these points I need to understand it a bit
more. For now, take it as just a user's impression.

> > > >time emerge # without parameters
> > > >real 0m0.614s
> > > >user 0m0.487s
> > > >sys 0m0.046s
> > > >
> > > >time emerge -pv world # 16 packages to be upgraded
> > > >real 0m22.664s
> > > >user 0m12.423s
> > > >sys 0m1.130s
> There's quite a large difference between just importing portage, and
> actually parsing your profile, determining what your use flags are
> (since profiles can define defaults, and some use flags are based upon
> packages being installed dependent on the profile, perl fex). That, and
> walking/building the depgraph, querying the cache, locking, etc.

These two were not related; I just put them together since I "measured"
them in sequence. I just think that half a second just to print the
"usage" message is too much, and I have already seen it take seconds or
more. But this doesn't really matter, forget it.

> > > >It's too much, look at debian apt, it's fast. And I can't see why
> > > >portage is slow.
> > > >Forgive me if I'm wrong, but portage just need to parse
> > > >/var/lib/portage/world (237 entries in my case), then for each check
> > > >if there is any other version greater than and if so check for
> > > >dependencies. Why 22seconds? A hand made take less than 1.
>
> Checks first level depends of all packages in world also. So that list
> just got larger :)

I know, but that much larger? Anyway, I'll try to understand how you do
things: what's read from disk into memory, the data structures... Are
there any documents on that?
Just reading the source is painful :/

> Regarding debian apt, it's likely apples/oranges as urilith stated. One
> thing to note is that afaik, debian dependencies lack versions- they're
> basically a flat namespace.
>
> fex, if a dpkg states it deps mysql, there is mysql. Singular.
> W/ portage, well, need to determine what version is available based upon
> keywords, package.mask, and the user's /etc/portage/package.keywords (and
> other things).

As I see it, this doesn't make the algorithms worse (exponentially), it
just adds a constant... is that constant really that huge?

> Note I work on portage, not dpkg/apt. So I could be talking out of my
> ass there...

OK, and I work on neither so far... :)
[but as a Gentoo user I want to improve my system]

> > I'll look at CVS, but I don't see why portage need to be slow. As you
> > said, it's being fixed.
> Elaborate on how it's slow. There are various algs/processes that you
> could be referencing. Rough cvs improvements, 33% bash sourcing
> improvement- for those thinking parsing bash == slow portage, it's not
> the case. Users *never* see portage sourcing ebuilds for their keys
> (DEPENDS, DESCRIPTION, etc) unless they're doing overlay ebuilds, and
> the ebuilds in the overlay are _only_ sourced when they've changed. The
> improvements in bash sourcing speed in cvs were A) intended to fix env
> handling for ebuilds (ancillary benefit), and B) speed up regen for devs
> and the server that generates the metacache for rsync users.
>
> So no, bash isn't really what's slowing things down. :)

Good to know. From previous messages I believed that every "emerge -s"
sourced the whole portage tree :)

> If you're referencing the nice long pause after sync'ing, that's
> transfer of the cache from ${PORTDIR}/metadata/cache to
> /var/cache/edb/dep/${PORTDIR}; that's been sped up also. Potentially
> might have that pause eliminated also, although that would require
> ensuring the tree is readonly- that's another can of worms.

No, I don't care about the caching stuff, just that
"emerge something_without_deps" takes too long.

> Aside from that, there is searching speed, which is a bit slowed down by
> the current use of locks in the default portage_db_flat ebuild metadata
> cache. Additionally, portage_db_flat uses separate files for each cpv
> (category/package-version), so there is considerable overhead from
> opening/closing a crapload of files.
>
> Using portage_db_anydbm improves this, although it has a few issues of
> its own.

Hmm, here it comes. That's the part I think is slow, and my reasons
(guesses) are the ones you gave: locks, and "lots of small files" instead
of a single, probably real/optimized/indexed, database. Sorry, but I was
not aware of the _anydbm stuff; where can I read more about it?

> If you're referencing doing a search based on description, well, the
> cache backend as mentioned above slows things down pretty majorly. Even
> with anydbm, it still has to proceed cpv by cpv- basically walk the
> entire cache, *while* verifying the cache isn't stale- eg, check the
> stored mtime, and compare it to the ebuild's mtime.
>
> Things could be sped up by treating the tree/cache on disk as readonly-
> this is something being bantered about, and may happen. Treating the
> tree/cache as readonly means we don't have to do any locking in the
> cache, nor staleness checks (less IO).

Optimizing for the common case is a valid assumption. It would save us a
lot of time and should cause few problems, since portage reads far more
often than it writes.

> > and I can't see that difference between portage and apt in the area
> > portage is slow, ok apt uses a db and don't need to check use flags,
> > but they're orders of magnitude different. Even lemons and apples are
> > that different ;)
> See above.
>
> > > > - portage to act as a daemon, queue requests and fetch packages.
> > > >If portage could be a daemon with 3 threads: one that download
> > > >packages, one that compiles and one to manage the other and accept
> > > >requests; then it could schedule download to maximize download
> > > >throughput,
> parallel fetch is in cvs already. Portage 2.0.51 already supports this
> in a way,
> (emerge -f targets &> /dev/null &); emerge targets
>
> > > There is a daemonized ebuild.sh (correct me if I'm wrong ferringb) in
> > > portage CVS, although it
> > > doesn't use threads, because there is no way to kill processes (wget,
> > > etc.) spawned from within
> > > a thread, so you'd have stale processes after Ctrl+C'ing portage.
> Doesn't apply in this case, daemonized ebuild.sh just speeds up bash
> sourcing which most users won't see. Devs, on the other hand see it
> since they use cvs- no rsyncing of a pregenerated cache.
>
> > Great!
> > BTW, with threads I meant the concept of more than one thing running
> > in parallel, don't need to be posix threads, can be process or even
> > one process using select()
> Currently implemented via fork. Doing long running threads in python is
> a bit trickier than you might suspect (tried that route, stopping a
> thread w/out having it check up every 5 seconds is pretty fricking
> hard/annoying).
>
> > > Jstubbs is working on an api that will make its way into a later
> > > revision of portage. As far as parsing
> > > ebuilds, they are sourced directly from bash.
> >
> > There is any explanation/roadmap/design I can look at? Jstubbs reads
> > this list? What's his goals, how he want to achieve it?
> He'd have to state his goals-
> offhand, afaik he threw in some of his goals in
> http://dev.gentoo.org/~jstubbs/portage/goals.txt
> as are a collection of mine, and some of genone (Marius Mauch).
>
> >
> > About parsing of ebuilds, what do I need to source before the ebuild
> > itself? I mean, to get things like "inherit" working.
> All of ebuild.sh. Seriously.
> :)
>
> Thing to note is that ebuilds aren't just run via bash /path/to/ebuild;
> lot of functions are expected to exist for ebuilds to work, inherit fex
> (bash function).

Ok.

> I'd suggest grabbing
> http://dev.gentoo.org/~ferringb/portage-cvs.tar.bz2, and looking through
> bin/ebuild*.sh and bin/isolated*
>
> With the exception of ebuild-daemon.sh, all of that code is required to
> create the appropriate bash environment that ebuilds expect. Even with
> that default env, eclasses exist to extend it and add new functionality.
>
> Portage *does* have issues that need correcting, calling
> patterns/design/structure changed, etc. Trying to elaborate on the
> issues above, hope it provides some insight into why things are the way
> they are (and potential avenues to check out for improving performance).

I'll try the CVS version. Thank you for your time and patience in
replying to my kinda rude questions/doubts. I'll try to help as much as I
can. If there are minor tasks I could take on, I could use them to start
learning portage in more depth.

-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri@jabber.org
  ICQ#: 17249123
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net

--
gentoo-portage-dev@gentoo.org mailing list
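
The description search discussed in the thread (walk the cache cpv by cpv, and skip any entry whose stored mtime no longer matches the ebuild's mtime) can be illustrated with a minimal Python sketch. This is not portage's code: the function names, the dict-based cache, the "mtime" key and the simplified <tree_root>/<cpv>.ebuild path layout are all assumptions made for illustration only.

```python
import os


def is_stale(stored_mtime, ebuild_path):
    """Return True when the cached entry no longer matches the ebuild on disk."""
    try:
        current = int(os.stat(ebuild_path).st_mtime)
    except OSError:
        # Ebuild is missing or unreadable: the cached entry cannot be trusted.
        return True
    return current != int(stored_mtime)


def search_descriptions(cache, tree_root, needle):
    """Walk the whole cache entry by entry and match on DESCRIPTION.

    `cache` is a hypothetical in-memory stand-in for the metadata cache:
    a dict mapping cpv -> {"mtime": int, "DESCRIPTION": str}.
    The path layout <tree_root>/<cpv>.ebuild is simplified for the example.
    """
    hits = []
    for cpv, entry in cache.items():
        ebuild_path = os.path.join(tree_root, cpv + ".ebuild")
        # One stat() per entry: this is the extra I/O that a read-only
        # tree/cache would make unnecessary.
        if is_stale(entry["mtime"], ebuild_path):
            continue  # a real tool would regenerate the entry here
        if needle.lower() in entry["DESCRIPTION"].lower():
            hits.append(cpv)
    return sorted(hits)
```

The sketch makes the cost visible: every search pays one stat() call per cache entry on top of reading the entry itself, which is why treating the tree and cache as read-only would remove I/O rather than just locking.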