Date: Tue, 25 Jun 2013 01:18:07 +0200
From: Tom Wijsman <TomWij@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] Re: Packages up for grabs

On Mon, 24 Jun 2013 15:27:19 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:

> > I have one; it's great to help make my boot short, but it isn't
> > really a great improvement for the Portage tree. Better I/O isn't a
> > solution to computational complexity; it doesn't deal with the CPU
> > bottleneck.
>
> But here, agreed with ciaranm, the cpu's not the bottleneck, at least
> not from cold-cache. It doesn't even up the cpu clocking from
> minimum as it's mostly filesystem access. Once the cache is warm,
> then yes, it ups the CPU speed and I see the single-core behavior you
> mention, but cold-cache, no way; it's I/O bound.
>
> And with an ssd, the portage tree update (the syncs both of gentoo
> and the overlays) went from a /crawling/ console scroll, to scrolling
> so fast I can't read it.

We're not talking about the Portage tree update here, but about
dependency tree generation, which relies much more on the CPU than on
I/O. There are loops inside loops inside loops, comparisons and plenty
of data structure work going on; if that were brought down to a lower
complexity, or spread across multiple cores, it would speed things up
a lot. Take a look at the profiler image and try to get a quick
understanding of the code; after following a few function calls it
becomes clear.

Granted, I/O is still part of the problem, which is why I think caches
would help too; but from what I see the time and space complexity is
simply too high, so it hardly matters whether you label this CPU or
I/O bound.
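To make that concrete, here is a toy sketch in plain Python. It has
nothing to do with Portage's actual data structures or function names;
it only illustrates why replacing an inner scan with an indexed lookup
changes the complexity class, which matters more than any hardware
upgrade can:

import random
import time

# Toy model: 2000 "packages", each depending on 5 others by name.
# Purely illustrative; Portage's real resolver is far more involved.
packages = ["pkg-%d" % i for i in range(2000)]
deps = {p: random.sample(packages, 5) for p in packages}

def resolve_nested(deps):
    # O(n * d * n): every dependency rescans the whole package list.
    found = 0
    for wanted in deps.values():
        for dep in wanted:
            for candidate in packages:   # linear scan
                if candidate == dep:
                    found += 1
                    break
    return found

def resolve_indexed(deps):
    # O(n * d): a set lookup replaces the inner scan.
    index = set(packages)
    return sum(1 for wanted in deps.values()
               for dep in wanted if dep in index)

for fn in (resolve_indexed, resolve_nested):
    start = time.time()
    fn(deps)
    print("%s: %.2fs" % (fn.__name__, time.time() - start))

The indexed version does orders of magnitude fewer comparisons; a CPU
twice as fast could never close that kind of gap.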
> >> Quite apart from the theory and question of making the existing
> >> code faster vs. a new from-scratch implementation, there's the
> >> practical question of what options one can actually use to deal
> >> with the problem /now/.
> >
> > Don't rush it: Do you know the problem well? Does the solution
> > properly deal with it? Is it still usable some months / years from
> > now?
>
> Not necessarily. But first we must /get/ to some months / years from
> now, and that's a lot easier if the best is made of the current
> situation, while a long term fix is being developed.

True, we have to make the most out of Portage and use it for as long
as possible.

> >> FWIW, one solution (particularly for folks who don't claim to have
> >> reasonable coding skills and thus have limited options in that
> >> regard) is to throw hardware at the problem.
> >
> > Improvements in algorithmic complexity (exponential) are much bigger
> > than improvements you can achieve by buying new hardware (linear).
>
> Same song different verse. Fixing the algorithmic complexity is fine
> and certainly a good idea longer term, but it's not something I can
> use at my next update. Throwing hardware at the problem is usable
> now.

If you have the money, yes, that's an option. But a lot of people see
Linux as something you shouldn't need to throw much money at; it
should run on low-end systems, and those are exactly the users we
shouldn't neglect going forward.

> >> [2] ... SNIP ... runs ~1 hour ... SNIP ...
> >
> > Sounds great, but the same thing could run in much less time. I have
> > worse hardware, and it doesn't take much longer than yours does; so
> > I don't really see the benefit new hardware brings to the table. And
> > that HDD to SSD change, that's really a once in a lifetime flood.
>
> I expect I'm more particular than most about checking changelogs. I
> certainly don't read them all, but if there's a revision-bump for
> instance, I like to see what the gentoo devs considered important
> enough to do a revision bump. And I religiously check portage logs,
> selecting mentioned bug numbers probably about half the time, which
> pops up a menu with a gentoo bug search on the number, from which I
> check the bug details and sometimes the actual git commit code. For
> all my overlays I check the git whatchanged logs, and I have a helper
> script that lets me fetch and then check git whatchanged for a number
> of my live packages, including openrc (where I switched to live-git
> precisely /because/ I was following it closely enough to find the git
> whatchanged logs useful, both for general information and for
> troubleshooting when something went wrong -- release versions simply
> didn't have enough resolution, too many things changing in each
> openrc release to easily track down problems and file bugs as
> appropriate), as well.

I stick more to releases, and I only check the changes for the things
I actually want to follow; for the rest, they either don't matter or
they shouldn't hurt as a surprise. If something really surprising were
coming up, I'd expect a news item about it.

> And you're probably not rebuilding well over a hundred live-packages
> (thank $DEITY and the devs in question for ccache!) at every update,
> in addition to the usual (deep) @world version-bump and newuse
> updates, are you?

Developers rebuild those to catch upcoming breakage early. Apart from
that, I don't use many -9999 packages, so as not to go too unstable.

> >> [3] Also relevant, 16 gigs RAM, PORTAGE_TMPDIR on tmpfs.
> >
> > Sounds all cool, but think about your CPU again; keep it saturated.
> >
> > Building the Linux kernel with `make -j32 -l8` versus `make -j8` is
> > a huge difference; most people follow the latter advice without
> > really thinking through what actually happens with the underlying
> > data. The former queues up jobs for your processor, so the moment a
> > job is done a new job is ready and you don't need to wait on the
> > disk.
>
> Truth is, I used to run a plain make -j (no number and no -l at all)
> on my kernel builds, just to watch the system stress and then so
> elegantly recover. It's an amazing thing to watch, this Linux kernel
> thing and how it deals with cpu oversaturation. =:^)

If you have the memory to pull it off, which involves money again.

> But I suppose I've gotten more conservative in my old age. =:^P
> Needlessly oversaturating the CPU (and RAM) only slows things down
> and forces cache dump and swappage.

The trick is to set it just below the point of oversaturation: low
enough that most packages don't oversaturate. It could be tuned more
precisely per package, but that time is better spent elsewhere.
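Conceptually that is all the -l flag does; the snippet below is a toy
dispatcher in Python, not make's actual implementation, and the
"sleep 1" commands are placeholders for real compile jobs. It keeps a
deep queue of work (the -j part) but only starts a new job while the
one-minute load average stays under a cap (the -l part):

import os
import subprocess
import time
from collections import deque

def run_jobs(commands, max_jobs, load_cap):
    # Keep a deep queue of jobs, but only dispatch a new one while the
    # 1-minute load average is under the cap (os.getloadavg is Unix-only).
    queue = deque(commands)
    running = []
    while queue or running:
        running = [p for p in running if p.poll() is None]  # reap finished jobs
        while (queue and len(running) < max_jobs
               and os.getloadavg()[0] < load_cap):
            running.append(subprocess.Popen(queue.popleft(), shell=True))
        time.sleep(0.1)

run_jobs(["sleep 1"] * 16, max_jobs=8, load_cap=4.0)

The exact numbers depend on your cores and RAM; the point is simply
that a deep queue keeps the CPU fed the moment a job finishes, while
the load cap keeps it from tipping into swap.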
> > Something completely different; look at the history of data mining,
> > today's algorithms are much much faster than those of years ago.
> >
> > Just to point out that different implementations and configurations
> > have much more power in cutting time than the typical hardware
> > change does.
>
> I agree and am not arguing that. All I'm saying is that there are
> measures that a sysadmin can take today to at least help work around
> the problem, today, while all those faster algorithms are being
> developed, implemented, tested and deployed. =:^)

Not everyone is a sysadmin with a server; I'm just a student running a
laptop bought some years ago, and I'm the kind of person who doesn't
replace it while it still works fine otherwise. Maybe when I
graduate...

I think we can both agree that a faster system does a better job; but
it won't deal with the crux of the problem, the algorithmic
complexity. Dealing with both, as you mention, is the real deal.

-- 
With kind regards,

Tom Wijsman (TomWij)
Gentoo Developer

E-mail address : TomWij@gentoo.org
GPG Public Key : 6D34E57D
GPG Fingerprint : C165 AF18 AB4C 400B C3D2 ABF0 95B2 1FCD 6D34 E57D