From mboxrd@z Thu Jan  1 00:00:00 1970
Precedence: bulk
List-Post: <mailto:gentoo-soc@lists.gentoo.org>
List-Help: <mailto:gentoo-soc+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-soc+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-soc+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-soc.gentoo.org>
X-BeenThere: gentoo-soc@lists.gentoo.org
Reply-to: gentoo-soc@lists.gentoo.org
MIME-Version: 1.0
X-Received: by 10.204.174.143 with SMTP id t15mr18026605bkz.37.1366976612890;
 Fri, 26 Apr 2013 04:43:32 -0700 (PDT)
Received: by 10.205.0.10 with HTTP; Fri, 26 Apr 2013 04:43:32 -0700 (PDT)
In-Reply-To: <CAMiTYSpdo05pjpO4yoQ1pkoEsuh0zX6B=N9=y_juwQ11YTez9A@mail.gmail.com>
References: <CAPomEdyQSFjvvBrbEfO1g_9uOc0cAZy1S8kFc=51bYVnfQmhqw@mail.gmail.com>
	<CAMiTYSpdo05pjpO4yoQ1pkoEsuh0zX6B=N9=y_juwQ11YTez9A@mail.gmail.com>
Date: Fri, 26 Apr 2013 17:43:32 +0600
Message-ID: <CAPomEdzcwvv7DxTx2WVNqzScCy3O0e2aD94JEJ9LRb-V9TwnfQ@mail.gmail.com>
Subject: Re: [gentoo-soc] rfc: reducing the time of "Calculating dependencies"
 phase project.
From: =?UTF-8?B?0JDQu9C10LrRgdCw0L3QtNGAINCR0LXRgNGB0LXQvdC10LI=?= <bay@hackerdom.ru>
To: gentoo-soc@lists.gentoo.org
Content-Type: multipart/alternative; boundary=bcaec52d569d6db4b504db420b3e
X-Gm-Message-State: ALoCoQl9vzYPADJG9UjjfeROEy3TquceoHmwxhS/aaPJzlRMlRGQ5u8sy0o/sCtJNTq86kGenZLx
X-Archives-Salt: 00bc44cd-6dbd-40b2-95ec-262f1377c6f6
X-Archives-Hash: ab1783c198636d1e0a50b86720a8e690

--bcaec52d569d6db4b504db420b3e
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Thanks for the answer,

I like the "frozen tree" approach, because I think most users don't change
the package tree by hand.
In that case it would be nice to have a command to invalidate the
caches (like "yum clean" in yum).
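
A minimal sketch of that idea in Python, under stated assumptions: the
class name FrozenTreeCache, its methods and the JSON file layout are all
hypothetical and are not part of Portage. Cached entries are trusted
without any re-validation until clean() is called explicitly.

```python
import json
import os


class FrozenTreeCache:
    """Hypothetical cache that trusts its entries until clean() is called."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        # Reload whatever a previous run left on disk.
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def get(self, key, compute):
        # "Frozen tree" assumption: a hit is returned as is, with no
        # checksum comparison. Only a miss runs the expensive computation.
        if key not in self.entries:
            self.entries[key] = compute(key)
            with open(self.path, "w") as f:
                json.dump(self.entries, f)
        return self.entries[key]

    def clean(self):
        # Explicit invalidation, similar in spirit to "yum clean".
        self.entries = {}
        if os.path.exists(self.path):
            os.remove(self.path)
```

With a cache like this, the user would have to call clean() after every
tree sync or manual edit, which is exactly the trade-off of a frozen mode.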

The "Calculating dependencies" stage is short on servers with few
packages installed. But the more packages one has installed, the more
time is spent on "Calculating dependencies" (and also on the "installing"
phase). I have seen this on three of my notebooks (between 2007 and 2013)
and on my dedicated tinderbox server, which tried to install every package
in portage (and check for missing dependencies).

I have 8 GB of RAM, so the HDD is almost unused after the first run.

Is it possible to cache the complete dependency graph (or parts of it)
between launches?
When I was doing my last GSoC project (also about dependencies), I
didn't manage to find a database of reverse deps. If one does not exist,
might it be useful to create one to determine whether a full graph check
is needed?
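
To illustrate what such a reverse-deps database could look like, here is
a small Python sketch. It assumes the forward dependencies are already
available as a plain dict; the function names and the sample package data
are made up for illustration and do not come from Portage.

```python
from collections import defaultdict, deque


def build_reverse_deps(deps):
    """Invert a {package: [dependencies]} mapping."""
    reverse = defaultdict(set)
    for pkg, pkg_deps in deps.items():
        for dep in pkg_deps:
            reverse[dep].add(pkg)
    return reverse


def affected_by(changed, reverse):
    """Return every package that depends on `changed`, directly or not."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # .get() avoids creating empty entries in the defaultdict.
        for parent in reverse.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Sample data, invented for this example.
deps = {
    "bash": ["readline", "ncurses"],
    "readline": ["ncurses"],
    "vim": ["ncurses"],
    "zlib": [],
}
reverse = build_reverse_deps(deps)
print(sorted(affected_by("ncurses", reverse)))
```

If the affected set for a changed package is empty, a full graph check
might be skippable for that change; whether that is safe in real Portage
is an open question here, not a claim.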

Best,
Alexander Bersenev


2013/4/26 Zac Medico <zmedico@gentoo.org>

> On Thu, Apr 25, 2013 at 11:58 AM, =D0=90=D0=BB=D0=B5=D0=BA=D1=81=D0=B0=D0=
=BD=D0=B4=D1=80 =D0=91=D0=B5=D1=80=D1=81=D0=B5=D0=BD=D0=B5=D0=B2 <bay@hacke=
rdom.ru>
> wrote:
> > Hello,
> >
> > my name is Alexander Bersenev, I am a postgraduate at the Institute of
> > Mathematics and Mechanics (Russia).
>
> Hello, it's nice to meet you.
>
> > I want to propose a project for GSoC 2013 and ask what you think
> > about it.
> >
> > In short: I want to reduce the "Calculating dependencies" phase of
> > emerge.
> >
> > On my notebook the "emerge -pv bash" command takes 40 secs to calculate
> > the deps. If I launch it again, it takes about 40 secs again (I have a
> > lot of RAM, so there was no HDD usage).
>
> A few things to note:
>
> 1) It will make a big difference if there is a bash version upgrade,
> or if the bash USE flags have changed. This is due to the
> --complete-graph-if-new-use and --complete-graph-if-new-ver options
> which are enabled by default. This behavior serves to protect
> reverse-dependencies from being broken.
>
> 2) Portage assumes that the portage tree can be modified between each
> emerge invocation. This assumption is necessary for development
> situations, but it has the disadvantage of introducing some extra
> overhead (comparing checksums of ebuilds and eclasses to the checksums
> found in the corresponding md5-cache entries). It would be possible to
> have an alternative "frozen tree" mode of operation which assumes that
> the portage tree can _not_ be modified between emerge invocations, and
> this mode would be more optimal for non-development situations.
>
> 3) Putting the portage tree on squashfs can help in some situations,
> since it allows the whole tree to easily fit into RAM and be accessed
> quickly.
>
> > Of course, quick cProfile profiling showed no places to optimize
> > because such optimizations have already been made.
> >
> > The main idea is to add some caching layers (higher-level than the one
> > in /usr/portage/metadata/md5-cache/). The main goal is to find and
> > eliminate repeated computations between "emerge" runs.
> >
> > As part of the work I plan to examine the approaches of other package
> > managers (yum, aptitude).
> >
> > I heard from Donnie Berkholz on IRC about the pkgcore project. He said
> > it works faster in practice. But it has some problems with EAPI 5
> > support.
> >
> > What is better: to bring the pkgcore code up to date or to try to dig
> > into portage? Or are these bad ideas altogether?
>
> I suspect that pkgcore may already have a "frozen tree" mode, among
> other optimizations. However, it's not very useful until EAPI 5
> support is completed.
>
> Adding "frozen tree" support to portage might be a nice enhancement,
> but I'm not sure how much of a performance increase it would yield.
> The --complete-graph-* options that I've mentioned introduce a large
> amount of overhead that could easily overshadow any performance
> increase that a "frozen tree" optimization would give you.
>
>

--bcaec52d569d6db4b504db420b3e--