On Mon, 24 Jun 2013 15:27:19 +0000 (UTC) Duncan <1i5t5.duncan@cox.net> wrote:

> > I have one; it's great to help make my boot short, but it isn't
> > really a great improvement for the Portage tree. Better I/O isn't a
> > solution to computational complexity; it doesn't deal with the CPU
> > bottleneck.
>
> But here, agreed with ciaranm, the cpu's not the bottleneck, at least
> not from cold-cache. It doesn't even up the cpu clocking from minimum
> as it's mostly filesystem access. Once the cache is warm, then yes,
> it ups the CPU speed and I see the single-core behavior you mention,
> but cold-cache, no way; it's I/O bound.
>
> And with an ssd, the portage tree update (the syncs both of gentoo
> and the overlays) went from a /crawling/ console scroll, to scrolling
> so fast I can't read it.

We're not talking about the Portage tree update here, but about
dependency tree generation, which relies much more on the CPU than on
I/O. A lot of loops inside loops inside loops, comparisons and other
data structure magic is going on; if this were optimized to a lower
complexity, or processed by multiple cores, it would speed up a lot.
Take a look at the profiler image and try to get a quick understanding
of the code; after following a few function calls it becomes clear.
Granted, I/O is still part of the problem, which is why I think caches
would help too; but from what I see the time / space complexity is
simply too high, so there is little point in arguing over whether it
is CPU or I/O bound...
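To make the complexity point concrete, here is a minimal sketch; this
is not Portage's actual code, and the package names and dependency map
are made up. A naive recursive expansion re-walks shared subtrees
every time it reaches them, while a memoized one touches each package
only once:

    from functools import lru_cache

    # Made-up dependency map; in reality this comes from ebuild
    # metadata. Shape: package -> direct dependencies.
    DEPS = {
        "world": ("gcc", "python"),
        "gcc": ("glibc", "mpfr"),
        "python": ("glibc", "openssl"),
        "mpfr": ("glibc",),
        "openssl": ("glibc",),
        "glibc": (),
    }

    def expand_naive(pkg):
        """Exponential in the worst case: shared subtrees are
        re-expanded every time they are reached."""
        result = {pkg}
        for dep in DEPS[pkg]:
            result |= expand_naive(dep)
        return result

    @lru_cache(maxsize=None)
    def expand_memoized(pkg):
        """Linear in the size of the graph: each package is expanded
        once and the frozen result is reused afterwards."""
        result = {pkg}
        for dep in DEPS[pkg]:
            result |= expand_memoized(dep)
        return frozenset(result)

On this toy graph the difference is invisible, but on a tree where
thousands of packages share common dependencies it can be the
difference between hours and seconds; that is the kind of gain no SSD
can buy.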
> >> Quite apart from the theory and question of making the existing
> >> code faster vs. a new from-scratch implementation, there's the
> >> practical question of what options one can actually use to deal
> >> with the problem /now/.
> >
> > Don't rush it: Do you know the problem well? Does the solution
> > properly deal with it? Is it still usable some months / years from
> > now?
>
> Not necessarily. But first we must /get/ to some months / years from
> now, and that's a lot easier if the best is made of the current
> situation, while a long term fix is being developed.

True, we have to make the most of the current Portage and keep it
usable for as long as possible.

> >> FWIW, one solution (particularly for folks who don't claim to have
> >> reasonable coding skills and thus have limited options in that
> >> regard) is to throw hardware at the problem.
> >
> > Improvements in algorithmic complexity (exponential) are much
> > bigger than improvements you can achieve by buying new hardware
> > (linear).
>
> Same song different verse. Fixing the algorithmic complexity is fine
> and certainly a good idea longer term, but it's not something I can
> use at my next update. Throwing hardware at the problem is usable
> now.

If you have the money, yes, that's an option. But a lot of people see
Linux as something you don't need to throw a lot of money at; it
should run on low-end systems too, and those are users we shouldn't
neglect going forward.

> >> [2] ... SNIP ... runs ~1 hour ... SNIP ...
> >
> > Sounds great, but the same thing could run in much less time. I
> > have worse hardware, and it doesn't take much longer than yours
> > does; so I don't really see the benefit new hardware brings to the
> > table. And that HDD to SSD change, that's really a once in a
> > lifetime flood.
>
> I expect I'm more particular than most about checking changelogs. I
> certainly don't read them all, but if there's a revision-bump for
> instance, I like to see what the gentoo devs considered important
> enough to do a revision bump. And I religiously check portage logs,
> selecting mentioned bug numbers probably about half the time, which
> pops up a menu with a gentoo bug search on the number, from which I
> check the bug details and sometimes the actual git commit code. For
> all my overlays I check the git whatchanged logs, and I have a helper
> script that lets me fetch and then check git whatchanged for a number
> of my live packages, including openrc (where I switched to live-git
> precisely /because/ I was following it closely enough to find the git
> whatchanged logs useful, both for general information and for
> troubleshooting when something went wrong -- release versions simply
> didn't have enough resolution, too many things changing in each
> openrc release to easily track down problems and file bugs as
> appropriate), as well.

I stick more to releases, and I check the changes only for the things
I actually want to follow; the others either don't matter or shouldn't
hurt as a surprise. If something would really surprise me, I'd expect
a news item about it.

> And you're probably not rebuilding well over a hundred live-packages
> (thank $DEITY and the devs in question for ccache!) at every update,
> in addition to the usual (deep) @world version-bump and newuse
> updates, are you?

Developers rebuild those to see upcoming breakage. Apart from that, I
don't use many -9999 packages, so as not to go too unstable.

> >> [3] Also relevant, 16 gigs RAM, PORTAGE_TMPDIR on tmpfs.
> >
> > Sounds all cool, but think about your CPU again; saturate it...
> >
> > Building the Linux kernel with `make -j32 -l8` versus `make -j8` is
> > a huge difference; most people follow the latter instructions,
> > without really thinking through what actually happens with the
> > underlying data. The former queues up jobs for your processor; so
> > the moment a job is done a new job will be ready, and you don't
> > need to wait on the disk.
>
> Truth is, I used to run a plain make -j (no number and no -l at all)
> on my kernel builds, just to watch the system stress and then so
> elegantly recover. It's an amazing thing to watch, this Linux kernel
> thing and how it deals with cpu oversaturation. =:^)

If you have the memory to pull that off, which involves money again.

> But I suppose I've gotten more conservative in my old age. =:^P
> Needlessly oversaturating the CPU (and RAM) only slows things down
> and forces cache dump and swappage.

The trick is to set it a bit below the point of oversaturation: low
enough that most packages don't oversaturate the machine. It could be
tuned more precisely per package, but that time is better spent
elsewhere.
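To illustrate that rule of thumb (the helper and its factors are my
own rough heuristic, not an official recommendation; they merely
mirror the `make -j32 -l8` example above on an 8-thread CPU):

    import os

    def suggest_makeopts():
        """Queue plenty of jobs so one is always ready when another
        blocks on the disk, but cap the load average at the core
        count so the CPU is never oversaturated. Illustrative only."""
        cores = os.cpu_count() or 1
        jobs = cores * 4  # like -j32 on an 8-thread machine
        load = cores      # like -l8: hold back new jobs above this
        return f"-j{jobs} -l{load}"

    print(suggest_makeopts())  # on an 8-thread CPU: "-j32 -l8"

With -l in place the generous -j costs little: make simply refrains
from starting new jobs while the load average sits at the limit, so
the queue drains exactly as fast as the machine can take it.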
> > Something completely different; look at the history of data mining,
> > today's algorithms are much much faster than those of years ago.
> >
> > Just to point out that different implementations and configurations
> > have much more power in cutting time than the typical hardware
> > change does.
>
> I agree and am not arguing that. All I'm saying is that there are
> measures that a sysadmin can take today to at least help work around
> the problem, today, while all those faster algorithms are being
> developed, implemented, tested and deployed. =:^)

Not everyone is a sysadmin with a server; I'm just a student running a
laptop bought some years ago, and I'm the kind of person who doesn't
replace it while it still works fine otherwise. Maybe when I
graduate...

I think we can both agree that a faster system does a better job; but
it won't deal with the crux of the problem, the algorithmic
complexity. Dealing with both, as you mention, is the real deal.

-- 
With kind regards,

Tom Wijsman (TomWij)
Gentoo Developer

E-mail address  : TomWij@gentoo.org
GPG Public Key  : 6D34E57D
GPG Fingerprint : C165 AF18 AB4C 400B C3D2 ABF0 95B2 1FCD 6D34 E57D