From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 24809 invoked by uid 1002); 6 Dec 2003 16:58:50 -0600 Mailing-List: contact gentoo-portage-dev-help@gentoo.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail Reply-To: gentoo-portage-dev@gentoo.org X-BeenThere: gentoo-portage-dev@gentoo.org Received: (qmail 4627 invoked from network); 6 Dec 2003 16:58:45 -0600 From: George Shapovalov Organization: Gentoo Linux To: gentoo-portage-dev@gentoo.org Date: Sat, 6 Dec 2003 15:00:03 -0800 User-Agent: KMail/1.5.4 References: <200312050158.17479.george@gentoo.org> <200312051333.04801.george@gentoo.org> <200312061526.52187.pauldv@gentoo.org> In-Reply-To: <200312061526.52187.pauldv@gentoo.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200312061500.03882.george@gentoo.org> X-Spam-Status: No, hits=0.0 tagged_above=-100000.0 required=5.0 X-Spam-Level: Subject: Re: [gentoo-portage-dev] portage-ng concurse entry Was: Updated Portage project page X-Archives-Salt: 6743365a-7f26-4ade-9d5b-7c5c290eea82 X-Archives-Hash: cfe93652e8007e579a8a19e433a2b962 I have been thinking a bit more about dfs vs bfs traversals, so here goes :). On Saturday 06 December 2003 06:26, Paul de Vrieze wrote: > > Prolog interpreters work as depth-first searchers. Breath-first is in most Yea, and unfortunately there isn't much choice about it as recursion is pretty much the only composition method the standard talks. > cases a waste of both memory and clock cycles. In the case of portage this Its really only O(n) where small n stands for associated dependencies, so that's not that much. Besides it does nopt require recursion to be implemented, so in reality that hit quite often is not even there. And then it gives: > would mean something like first getting deep dependencies like glibc, and > then less important ones. And isn't this the "right" way to do it? Surely operating on a *complete* portage tree, that is the one where all the dependencies are explicitly stated, that wouldn't make any difference. But in reality glibc and similar dependencies, being a part of the system, are omitted in pretty much all ebuilds. Thus, with DFS traversal (which is what portage does now) one is getting glibc somewhere in the middle of world update quite often, which is not optimal if you want complete coherency (and it can be plain bad in the case of the major update. This is somewhat alleviated in our case for system packages by a mandatory bootstrap, but some stuff on perifery can be hit, if multiple things use the same library and it is affected for example (and not explicitly included, say because it is consdered "basic")). > > I somewhat agree with that. However my observation is that these packages > > most often are libraries or other helpers that user is usualy not > > interested in directly. And the long operations are the ones where you > > have to do searches or operations involving world. So I have been > > thinking along the lines of making the longest operations fast and > > bounded, getting the O(1) > > responsivity. > > But in reality I do not think this shpould be "the ultimate solution" or > > even that there should be one. See the next one. > > I don't believe that the average size of world is going to grow > significantly. No, but quite often it includes "major" packages and that means traversal of a significant portion of the portage.., but the leafs representing uncommon packages are not touched, that I agree on. Thus the more efficient approach (and I had a few thoughts on this, although without much detail, so didn't document it :)) would be to operate on a memory-mapped cache and on startup suck into memory only the entries having weight above certain threshold. Weights should probably be proportional to the numbers of dependencies *coming into* the correpsponding package. Thus commonly used stuff gets read and the traversals are instantaneous, but 99% of the tree just sits on disk waitning to be called for... > > And finally: > > > My biggest concern when reading your small paper is that you chose a > > > deterministic approach to this problem, while in fact the problem is > > > non-determinisitic. > > > > Could you please elaborate? I am afraid we are thinking about slightly > > different things here. > > I didn't write that, guess you'll need to ask the person who did. Um, sorry, I somehow got confused and thought some postings on -dev were from you as well. I just sent relevant parts to proper place :). George -- gentoo-portage-dev@gentoo.org mailing list