Thanks for the answer,

I like the "frozen tree" approach, because I think that most users don't
change the package tree by hand.
In that case it would be nice to have a command to invalidate the
caches (like "yum clean" in yum).
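
To make the idea concrete, here is a minimal sketch of a cache that is
trusted until it is explicitly cleaned. All names in it (FrozenTreeCache,
the JSON file, the clean() method) are hypothetical and not part of
Portage; it only illustrates the "trust the cache, invalidate on demand"
behaviour, assuming the cached data can be serialized as JSON.

```python
import json
import os
import tempfile


class FrozenTreeCache:
    """Hypothetical on-disk cache that is trusted until explicitly cleaned."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # Return the cached data, or None when no cache exists yet.
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def store(self, data):
        # Write to a temporary file first, then rename it into place, so an
        # interrupted run never leaves a partially written cache behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)

    def clean(self):
        # Explicit invalidation, comparable in spirit to "yum clean".
        # Returns True if a cache file was removed, False if none existed.
        try:
            os.remove(self.path)
            return True
        except FileNotFoundError:
            return False
```

In a "frozen tree" mode the tool would call load() and skip the checksum
comparison entirely; the user would run the clean step after syncing or
editing the tree by hand.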

The "Calculating dependencies" stage is short on servers with few
packages installed. But the more packages one has installed, the more time
is spent on "Calculating dependencies" (and also on the "installing"
phase). I have seen this on three of my notebooks (between 2007 and 2013)
and on my dedicated tinderbox server, which tried to install every package
in portage (and check for missing dependencies).

I have 8 GB of RAM, so the HDD is almost unused after the first run.

Is it possible to cache the complete dependency graph (or parts of it)
between launches?
When I was working on my last GSoC project (also about dependencies), I
did not manage to find a database of reverse deps. If one does not exist,
might it be useful to create it, to determine whether a full graph check
is needed?
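
As a rough illustration of what such a reverse-deps database could look
like, here is a minimal sketch. It is not Portage's API: the function names
are hypothetical, and it assumes dependencies are plain package names,
whereas real dependency atoms also carry versions, slots and USE
conditionals.

```python
from collections import defaultdict


def build_reverse_deps(forward_deps):
    """Invert {package: [dependencies]} into {dependency: {dependents}}."""
    reverse = defaultdict(set)
    for pkg, deps in forward_deps.items():
        for dep in deps:
            reverse[dep].add(pkg)
    return dict(reverse)


def affected_by(changed, reverse):
    """Return every package that depends, directly or transitively, on `changed`."""
    seen = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for parent in reverse.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Simplified example data (assumed, not taken from a real tree).
forward = {
    "app-editors/vim": ["sys-libs/ncurses"],
    "app-shells/bash": ["sys-libs/ncurses", "sys-libs/readline"],
    "sys-libs/readline": ["sys-libs/ncurses"],
    "sys-libs/ncurses": [],
}
reverse = build_reverse_deps(forward)
print(sorted(affected_by("sys-libs/readline", reverse)))  # ['app-shells/bash']
```

If the set returned for a changed package is empty, nothing installed
depends on it, and a full graph check could in principle be skipped for
that change.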

Best,
Alexander Bersenev


2013/4/26 Zac Medico <zmedico@gentoo.org>

> On Thu, Apr 25, 2013 at 11:58 AM, Александр Берсенев <bay@hackerdom.ru>
> wrote:
> > Hello,
> >
> > my name is Alexander Bersenev, I am a postgraduate at the Institute of
> > Mathematics and Mechanics (Russia).
>
> Hello, it's nice to meet you.
>
> > I want to propose a project for GSoC 2013 and ask what you think about
> > it.
> >
> > In short: I want to reduce the "Calculating dependencies" phase of
> > emerge.
> >
> > On my notebook the "emerge -pv bash" command takes 40 secs to calculate
> > the deps. If I launch it again, it takes about 40 secs again (I have a
> > lot of RAM, so there was no HDD usage).
>
> A few things to note:
>
> 1) It will make a big difference if there is a bash version upgrade,
> or if the bash USE flags have changed. This is due to the
> --complete-graph-if-new-use and --complete-graph-if-new-ver options
> which are enabled by default. This behavior serves to protect
> reverse-dependencies from being broken.
>
> 2) Portage assumes that the portage tree can be modified between each
> emerge invocation. This assumption is necessary for development
> situations, but it has the disadvantage of introducing some extra
> overhead (comparing checksums of ebuilds and eclasses to the checksums
> found in the corresponding md5-cache entries). It would be possible to
> have an alternative "frozen tree" mode of operation which assumes that
> the portage tree can _not_ be modified between emerge invocations, and
> this mode would be more optimal for non-development situations.
>
> 3) Putting the portage tree on squashfs can help in some situations,
> since it allows the whole tree to easily fit into RAM and be accessed
> quickly.
>
> > Of course, a quick cProfile run showed no obvious places to optimize,
> > because such optimizations have already been made.
> >
> > The main idea is to add some caching layers (higher-level than those in
> > /usr/portage/metadata/md5-cache/). The main goal is to find and eliminate
> > repeated computations between "emerge" runs.
> >
> > As part of the work I plan to examine the approaches of other package
> > managers (yum, aptitude).
> >
> > I heard from Donnie Berkholz on IRC about the pkgcore project. He said it
> > works faster in practice, but it has some problems with EAPI 5 support.
> >
> > Which is better: bring the pkgcore code up to date, or try to dig into
> > portage? Or are these bad ideas altogether?
>
> I suspect that pkgcore may already have a "frozen tree" mode, among
> other optimizations. However, it's not very useful until EAPI 5
> support is completed.
>
> Adding "frozen tree" support to portage might be a nice enhancement,
> but I'm not sure how much of a performance increase it would yield.
> The --complete-graph-* options that I've mentioned introduce a large
> amount of overhead that could easily overshadow any performance
> increase that a "frozen tree" optimization would give you.
>
>