From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org) by finch.gentoo.org with esmtp (Exim 4.60) (envelope-from ) id 1M9Zxt-00082u-G3 for garchives@archives.gentoo.org; Thu, 28 May 2009 07:23:50 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id ADAA6E0408; Thu, 28 May 2009 07:23:47 +0000 (UTC)
Received: from dev.gentooexperimental.org (dev.gentooexperimental.org [81.93.240.53]) by pigeon.gentoo.org (Postfix) with ESMTP id 6663EE0408 for ; Thu, 28 May 2009 07:23:47 +0000 (UTC)
Received: from lolcathost.localnet (xdsl-84-44-142-186.netcologne.de [84.44.142.186]) by dev.gentooexperimental.org (Postfix) with ESMTP id E535962D5B4 for ; Thu, 28 May 2009 09:23:46 +0200 (CEST)
From: Patrick Lauer
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] Gentoo Council Reminder for May 28
Date: Thu, 28 May 2009 09:23:46 +0200
User-Agent: KMail/1.11.90 (Linux/2.6.30-rc6-git7; KDE/4.2.87; x86_64; ; )
References: <1243460607.3480.3@NeddySeagoon> <1243489596.10450.24.camel@localhost>
In-Reply-To: <1243489596.10450.24.camel@localhost>
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-15"
Content-Transfer-Encoding: quoted-printable
Message-Id: <200905280923.46297.patrick@gentoo.org>
X-Archives-Salt: 9231b26d-c0d5-426a-938f-9cd075f2dc46
X-Archives-Hash: 9d7d746aff4edfeaee7bd541f70e487e

On Thursday 28 May 2009 07:46:36 Tiziano Müller wrote:
> And here is why (I'm only looking at the non-degenerated case with valid
> metadata, ignoring overlays which some consider a corner case (I don't
> understand that argument, but that's another thing)):

Overlays tend to come without metadata.
Just enabling the KDE overlay changed the time for "emerge -upNDv world"
from ~30 seconds cold cache to ~120 seconds. Running emerge --metadata gets
the performance back to pretty much the old levels.

> When the package manager looks at a package, it first reads the
> package's ebuild directory and gets the mtimes. It does the same for the
> cache entries and validates the caches (there is more stuff in here,
> like checking eclasses and so on).

Eclasses are negligible because you only have to look at them once for the
whole calculation. You can cache the mtime for the duration of your
operation.

> Then the following happens based on the "solution" we choose:
> eapi-in-filename: the package manager starts from the highest version
> with a supported eapi (the others are nonexistent with the used glob).
> For that ebuild it reads the cache entry and decides whether or not it
> can be used.

In this case you amusingly do NOT want to cache the eapi in the cache, so
you can even defer sourcing the ebuild until you actually need the metadata.
(You don't want to cache it because you need to check the file mtime anyway,
and then you read the filename anyway. No need to look for it in another
place then :) )

> If not, it proceeds to the next version, if yes, it's done.
> eapi-in-ebuild: the package manager reads all cache entries and sorts
> out those with an EAPI it doesn't support. The rest gets ordered and the
> same procedure as above applies.
>
> So, one of the main differences is: "reading one cache file" (if running
> unstable you can assume you support the highest version, thus reading
> only one cache file) vs. "reading all cache files".

That assumes a dumb cache format.
Why don't we make the cache more efficient so you read one file per
package / category / ... ?

>
> I did some performance measurements based on that. I have 1507 installed
> packages with 5541 different versions/revisions.
>
> Reading from hot cache:
> 1507 files: ~50ms
> 5541 files: ~170ms
>
> Reading from cold cache:
> 1507 files: ~2.8s
> 5541 files: ~6s

And now you need to pull metadata for dependency calculation. How big is the
impact of that?

>
> I made a lot of assumptions here (neglecting seek between ebuild-dir and
> metadata-dir, other processes using the drive, 80 ebuilds from overlays
> where the ebuild would have to be read, etc.). But estimating from the
> numbers above I'd say that a "emerge -uD world"/"paludis -i world" will
> be at least twice as slow, which I think is not acceptable.

I find that quite acceptable. As long as we're using such a bad layout the
performance is secondary.
To fix the performance you'd "only" have to guarantee that the repo is
unchanged (readonly), so you can add lots of simple caches/indexes - no need
to source any ebuild for metadata again, one cachefile for eapi if you want
...
I bet you'd find lots of small improvements that this would yield. Much more
impressive than managing to avoid a few open() here and there ...

> And I also don't understand your point of stating it's "bad design".

Bad design is like smelly feet. It's hard not to notice ...

> I mean: when coding you should "not optimize prematurely", but with
> eapi-in-ebuild it is against the other principle of "not pessimize
> prematurely" (Sutter/Alexandrescu: C++ Coding Standards).

If you quote that, try the full quote:
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."

In other words, we should not try to make that path faster when we can avoid
hitting it at all with a small design revision.
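
To make the "one cachefile for eapi" idea concrete, here is a rough sketch,
not anything that exists in portage or paludis. It assumes a made-up cache
layout of <cache_dir>/<category>/<package>/<version> with one KEY=value pair
per line, and a JSON index file; the function names (read_eapi,
build_eapi_index, load_index, usable_versions) are invented for
illustration. It also only makes sense under the assumption stated above,
that the repo is unchanged while the index is in use.

```python
import json
import os


def read_eapi(entry_path):
    """Return the EAPI from one cache entry (hypothetical KEY=value format)."""
    with open(entry_path, encoding="utf-8") as entry:
        for line in entry:
            if line.startswith("EAPI="):
                return line.rstrip("\n").split("=", 1)[1]
    return "0"  # assumption: an entry without an EAPI line counts as EAPI 0


def build_eapi_index(cache_dir, index_path):
    """Read every cache entry once and write a single index file.

    index_path should live outside cache_dir so it is not mistaken
    for a category directory on the next rebuild.
    """
    index = {}
    for category in sorted(os.listdir(cache_dir)):
        cat_dir = os.path.join(cache_dir, category)
        if not os.path.isdir(cat_dir):
            continue
        for package in sorted(os.listdir(cat_dir)):
            pkg_dir = os.path.join(cat_dir, package)
            if not os.path.isdir(pkg_dir):
                continue
            versions = {}
            for version in sorted(os.listdir(pkg_dir)):
                versions[version] = read_eapi(os.path.join(pkg_dir, version))
            index[category + "/" + package] = versions
    with open(index_path, "w", encoding="utf-8") as out:
        json.dump(index, out, sort_keys=True)
    return index


def load_index(index_path):
    """Load the whole index with a single open()."""
    with open(index_path, encoding="utf-8") as handle:
        return json.load(handle)


def usable_versions(index, package, supported_eapis):
    """List the versions of `package` whose EAPI is supported.

    The result is in plain string order, not a real version comparison.
    """
    versions = index.get(package, {})
    return sorted(v for v, eapi in versions.items() if eapi in supported_eapis)
```

With something like this, filtering out unsupported EAPIs costs one file
read for the whole repository instead of one per ebuild, which is the kind
of design revision meant above. A real version would still need proper
version ordering and a way to notice that the repo has changed.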