* [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
@ 2013-04-25 18:58 Александр Берсенев
  2013-04-26  1:17 ` Zac Medico
  0 siblings, 1 reply; 7+ messages in thread
From: Александр Берсенев @ 2013-04-25 18:58 UTC
To: gentoo-soc

Hello,

my name is Alexander Bersenev, I am a postgraduate at the Institute of Mathematics and Mechanics (Russia).

I want to propose a project for GSoC 2013 and ask what you think about it.

In short: I want to reduce the time of the "Calculating dependencies" phase of emerge.

On my notebook the "emerge -pv bash" command takes 40 seconds to calculate the deps. If I launch it again, it takes about 40 seconds again (I have a lot of RAM, so there was no HDD usage).

Of course, quick cProfile profiling showed no places to optimize, because such optimizations have already been made.

The main idea is to add some caching layers (higher-level than the one in /usr/portage/metadata/md5-cache/). The main goal is to find and eliminate repeated computations between "emerge" runs.

As part of the work I plan to examine the approaches of other package managers (yum, aptitude).

I heard from Donnie Berkholz on IRC about the pkgcore project. He said it works faster in practice, but it has some problems with EAPI 5 support.

Which is better: bringing the pkgcore code up to date, or digging into portage? Or are both bad ideas?

----
Some info about me:
- github: https://github.com/alexbers/
- twitter: https://twitter.com/alex_bers
- I participated in GSoC 2011 with the Autodep (automatic dependency checker) project.
- I administer a ~250-node cluster at the Institute of Mathematics and Mechanics.
- I have used Gentoo as my primary OS since 2007.
- I am interested in computer security. I participated in Defcon CTF (Las Vegas) and in Nuit du Hack CTF (Paris, won 4000 euro) as a member of the Hackerdom team. We also organize the annual RuCTF and RuCTFE competitions, which are likely the biggest in Russia (http://ructf.org/index.en.html).

----
Best,
Alexander Bersenev
* Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
  2013-04-25 18:58 [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project Александр Берсенев
@ 2013-04-26  1:17 ` Zac Medico
  2013-04-26 11:43   ` Александр Берсенев
  0 siblings, 1 reply; 7+ messages in thread
From: Zac Medico @ 2013-04-26 1:17 UTC
To: gentoo-soc

On Thu, Apr 25, 2013 at 11:58 AM, Александр Берсенев <bay@hackerdom.ru> wrote:
> Hello,
>
> my name is Alexander Bersenev, I am a postgraduate at the Institute of
> Mathematics and Mechanics (Russia).

Hello, it's nice to meet you.

> I want to propose a project for GSoC 2013 and ask what you think about it.
>
> In short: I want to reduce the time of the "Calculating dependencies"
> phase of emerge.
>
> On my notebook the "emerge -pv bash" command takes 40 seconds to calculate
> the deps. If I launch it again, it takes about 40 seconds again (I have a
> lot of RAM, so there was no HDD usage).

A few things to note:

1) It will make a big difference if there is a bash version upgrade, or if the bash USE flags have changed. This is due to the --complete-graph-if-new-use and --complete-graph-if-new-ver options, which are enabled by default. This behavior serves to protect reverse-dependencies from being broken.

2) Portage assumes that the portage tree can be modified between each emerge invocation. This assumption is necessary for development situations, but it has the disadvantage of introducing some extra overhead (comparing checksums of ebuilds and eclasses to the checksums found in the corresponding md5-cache entries). It would be possible to have an alternative "frozen tree" mode of operation which assumes that the portage tree can _not_ be modified between emerge invocations, and this mode would be more efficient for non-development situations.

3) Putting the portage tree on squashfs can help in some situations, since it allows the whole tree to easily fit into RAM and be accessed quickly.

> Of course, quick cProfile profiling showed no places to optimize, because
> such optimizations have already been made.
>
> The main idea is to add some caching layers (higher-level than the one in
> /usr/portage/metadata/md5-cache/). The main goal is to find and eliminate
> repeated computations between "emerge" runs.
>
> As part of the work I plan to examine the approaches of other package
> managers (yum, aptitude).
>
> I heard from Donnie Berkholz on IRC about the pkgcore project. He said it
> works faster in practice, but it has some problems with EAPI 5 support.
>
> Which is better: bringing the pkgcore code up to date, or digging into
> portage? Or are both bad ideas?

I suspect that pkgcore may already have a "frozen tree" mode, among other optimizations. However, it's not very useful until EAPI 5 support is completed.

Adding "frozen tree" support to portage might be a nice enhancement, but I'm not sure how much of a performance increase it would yield. The --complete-graph-* options that I've mentioned introduce a large amount of overhead that could easily overshadow any performance increase that a "frozen tree" optimization would give you.
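The trade-off in point 2 can be illustrated with a minimal sketch. It is not Portage code: the function names and the `frozen_tree` switch are hypothetical, and the real md5-cache validation also covers eclasses. It only shows why re-hashing files on every run costs time, and what a "frozen tree" mode would give up in exchange for skipping it.

```python
import hashlib
import os
import tempfile


def file_md5(path):
    """Return the hex MD5 digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_entry_is_valid(ebuild_path, cached_md5, frozen_tree=False):
    """Decide whether a cached metadata entry may be reused.

    frozen_tree is a hypothetical switch, not an existing Portage option:
    when set, the tree is assumed unchanged and the file is never re-read.
    """
    if frozen_tree:
        return True
    return file_md5(ebuild_path) == cached_md5


def demo():
    """Edit a toy ebuild after caching its checksum and compare both modes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bash-4.2.ebuild")
        with open(path, "w") as handle:
            handle.write("EAPI=5\n")
        cached = file_md5(path)
        fresh = cache_entry_is_valid(path, cached)

        # Simulate a developer editing the ebuild between two emerge runs.
        with open(path, "a") as handle:
            handle.write('DEPEND="sys-libs/ncurses"\n')
        stale = cache_entry_is_valid(path, cached)
        frozen = cache_entry_is_valid(path, cached, frozen_tree=True)
    return fresh, stale, frozen
```

In the default mode the edit is detected and the entry is rejected. In the frozen mode the stale entry is still accepted, which is why such a mode would only suit trees that are not modified by hand.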
* Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
  2013-04-26  1:17 ` Zac Medico
@ 2013-04-26 11:43   ` Александр Берсенев
  2013-04-26 15:59     ` Zac Medico
  0 siblings, 1 reply; 7+ messages in thread
From: Александр Берсенев @ 2013-04-26 11:43 UTC
To: gentoo-soc

Thanks for the answer,

I like the "frozen tree" approach, because I think that most users don't change the package tree by hand. In this case it would be nice to have a command to invalidate the caches (like "yum clean" in yum).

The "Calculating dependencies" stage is short on servers with few packages installed. But the more packages one has installed, the more time is spent on "Calculating dependencies" (and also on the "installing" phase). I have seen this on three of my notebooks (between 2007 and 2013) and on my dedicated tinderbox server, which tried to install every package in portage (and check for missing dependencies). I have 8 GB of RAM, so the HDD is almost unused after the first run.

Is it possible to cache the complete dependency graph (or parts of this graph) between launches?

When I was doing my last GSoC project (also about dependencies), I didn't manage to find a database of reverse deps. If it does not exist, would it be useful to create one to determine whether a full graph check is needed?

Best,
Alexander Bersenev
* Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
  2013-04-26 11:43   ` Александр Берсенев
@ 2013-04-26 15:59     ` Zac Medico
  2013-04-27  0:19       ` James Cloos
  0 siblings, 1 reply; 7+ messages in thread
From: Zac Medico @ 2013-04-26 15:59 UTC
To: gentoo-soc

On Fri, Apr 26, 2013 at 4:43 AM, Александр Берсенев <bay@hackerdom.ru> wrote:
> Is it possible to cache the complete dependency graph (or parts of this
> graph) between launches?

Yes, but it's very much dependent on using a "frozen tree" mode as we've discussed, because the emerge --dynamic-deps option is enabled by default. The --dynamic-deps behavior causes the dependency graph to mutate when the portage trees or overlays mutate.

> When I was doing my last GSoC project (also about dependencies), I didn't
> manage to find a database of reverse deps. If it does not exist, would it
> be useful to create one to determine whether a full graph check is needed?

It doesn't exist because of the default --dynamic-deps behavior and the lack of a "frozen tree" mode.
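As a rough illustration of the reverse-dependency database discussed above, the following sketch inverts a forward dependency mapping and walks it to find everything that could be affected by a change to one package. The package data and function names are made up for the example and this is not how Portage's resolver represents its graph. As noted in the reply, such an index would only stay valid for a snapshot of the tree that does not change underneath it.

```python
from collections import defaultdict


def build_reverse_deps(forward_deps):
    """Invert {package: [dependencies]} into {dependency: {dependents}}."""
    reverse = defaultdict(set)
    for package, deps in forward_deps.items():
        for dep in deps:
            reverse[dep].add(package)
    return dict(reverse)


def affected_packages(reverse_deps, changed):
    """Return every package that directly or transitively depends on `changed`."""
    seen = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for dependent in reverse_deps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


# Toy forward dependencies, invented for illustration only.
FORWARD = {
    "app-shells/bash": ["sys-libs/ncurses", "sys-libs/readline"],
    "sys-libs/readline": ["sys-libs/ncurses"],
    "app-editors/nano": ["sys-libs/ncurses"],
}
REVERSE = build_reverse_deps(FORWARD)
```

With this toy data, `affected_packages(REVERSE, "app-shells/bash")` is empty, so nothing else would need rechecking, while a change to `sys-libs/ncurses` touches all three other packages.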
* Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
  2013-04-26 15:59     ` Zac Medico
@ 2013-04-27  0:19       ` James Cloos
  2013-04-27  5:29         ` Александр Берсенев
  0 siblings, 1 reply; 7+ messages in thread
From: James Cloos @ 2013-04-27 0:19 UTC
To: gentoo-soc

As someone who often does edit ebuilds in overlays (very occasionally in /usr/portage, too), having to run something to update the cache for said overlay is OK.

But it *must* update just the cli-specified overlay(s), w/o having to go through and update everything every time it is run.

For comparison, my primary workstation, with several overlays, takes several minutes to do a dep calculation. Even with a hot cache. Improving that to something reasonable is the single most important change portage can get.

Also, if /var/db/pkg is to be cached, the existing /var/db/pkg layout should remain as a backup, so that the cache of what is installed can be restored easily should it ever get corrupted. Portage can update that cache after updating the /var/db/pkg/ tree.

-JimC
--
James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
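The last requirement above, keeping the /var/db/pkg tree authoritative and treating any cache as disposable, might look roughly like the sketch below. It is an illustration under assumptions: the JSON cache file and both function names are hypothetical, only the <category>/<package-version> directory layout is taken from the real tree, and the sketch handles a missing or corrupted cache but not a stale one.

```python
import json
import os
import tempfile


def scan_installed(vdb_root):
    """Walk a <category>/<package-version> tree and list installed packages."""
    installed = []
    for category in sorted(os.listdir(vdb_root)):
        cat_dir = os.path.join(vdb_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for pkg in sorted(os.listdir(cat_dir)):
            if os.path.isdir(os.path.join(cat_dir, pkg)):
                installed.append(f"{category}/{pkg}")
    return installed


def load_installed(vdb_root, cache_path):
    """Return the installed list from the cache, rebuilding it if unusable."""
    try:
        with open(cache_path) as handle:
            data = json.load(handle)
        if isinstance(data, list):
            return data
    except (OSError, ValueError):
        pass  # missing or corrupted cache: fall through to a rescan
    installed = scan_installed(vdb_root)
    with open(cache_path, "w") as handle:
        json.dump(installed, handle)
    return installed


def demo():
    """Corrupt the cache on purpose and check that it is rebuilt from the tree."""
    with tempfile.TemporaryDirectory() as tmp:
        vdb = os.path.join(tmp, "vdb")
        os.makedirs(os.path.join(vdb, "app-shells", "bash-4.2"))
        cache_path = os.path.join(tmp, "installed.json")
        with open(cache_path, "w") as handle:
            handle.write("{not valid json")
        rebuilt = load_installed(vdb, cache_path)
        with open(cache_path) as handle:
            repaired = json.load(handle)
    return rebuilt, repaired
```

Because the directory tree is never modified by the cache code, deleting the cache file is always a safe way to recover.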
* Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
  2013-04-27  0:19       ` James Cloos
@ 2013-04-27  5:29         ` Александр Берсенев
  2013-04-27 19:23           ` Александр Берсенев
  0 siblings, 1 reply; 7+ messages in thread
From: Александр Берсенев @ 2013-04-27 5:29 UTC
To: gentoo-soc

The modification date of /usr/portage and of the overlay dirs can be used as a signal for emerge to drop some caches. In this case the drop-cache command could just do "touch <dir>", and utils that modify the tree (e.g. "ebuild <ebuild> manifest") could do this operation as well.

Best,
Alexander Bersenev
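A minimal sketch of this mtime-based invalidation, kept per repository so that touching one overlay does not invalidate the others, could look as follows. The `RepoCache` class and `drop_cache` helper are hypothetical names, not part of Portage. Note that a directory's mtime changes only when entries directly inside it are added or removed, not when a file deeper in the tree is edited, which is why an explicit "touch <dir>" step would be needed after editing an ebuild.

```python
import os
import tempfile


def repo_stamp(repo_dir):
    """Return the directory's modification time in nanoseconds."""
    return os.stat(repo_dir).st_mtime_ns


class RepoCache:
    """Hypothetical per-repository cache invalidated by directory mtime."""

    def __init__(self):
        self._entries = {}  # repo_dir -> (stamp, value)

    def get(self, repo_dir, compute):
        """Return the cached value, recomputing it if the mtime has changed."""
        stamp = repo_stamp(repo_dir)
        entry = self._entries.get(repo_dir)
        if entry is not None and entry[0] == stamp:
            return entry[1]
        value = compute(repo_dir)
        self._entries[repo_dir] = (stamp, value)
        return value


def drop_cache(repo_dir):
    """Stand-in for `touch <dir>`.

    The mtime is moved two seconds forward instead of being set to "now",
    so the change is visible even with coarse timestamp resolution.
    """
    stamp = repo_stamp(repo_dir) + 2_000_000_000
    os.utime(repo_dir, ns=(stamp, stamp))


def demo():
    """Show that dropping the overlay's cache leaves the main tree cached."""
    calls = []

    def compute(repo_dir):
        calls.append(repo_dir)
        return len(calls)

    cache = RepoCache()
    with tempfile.TemporaryDirectory() as main, \
            tempfile.TemporaryDirectory() as overlay:
        cache.get(main, compute)     # computed
        cache.get(overlay, compute)  # computed
        cache.get(main, compute)     # served from cache
        drop_cache(overlay)
        cache.get(overlay, compute)  # recomputed, overlay only
        cache.get(main, compute)     # still served from cache
        return len(calls), calls.count(main), calls.count(overlay)
```

In the demo the expensive computation runs once for the main tree and twice for the overlay, matching the requirement that only the touched overlay is refreshed.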
* Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project.
  2013-04-27  5:29         ` Александр Берсенев
@ 2013-04-27 19:23           ` Александр Берсенев
  0 siblings, 0 replies; 7+ messages in thread
From: Александр Берсенев @ 2013-04-27 19:23 UTC
To: gentoo-soc

Posted a proposal at https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/bay/28002 .

Best,
Alexander Bersenev
end of thread

Thread overview: 7+ messages
2013-04-25 18:58 [gentoo-soc] rfc: reducing the time of "Calculating dependencies" phase project Александр Берсенев
2013-04-26  1:17 ` Zac Medico
2013-04-26 11:43   ` Александр Берсенев
2013-04-26 15:59     ` Zac Medico
2013-04-27  0:19       ` James Cloos
2013-04-27  5:29         ` Александр Берсенев
2013-04-27 19:23           ` Александр Берсенев