public inbox for gentoo-portage-dev@lists.gentoo.org
From: "Michał Górny" <mgorny@gentoo.org>
To: gentoo-portage-dev@lists.gentoo.org,Zac Medico
	<zmedico@gentoo.org>,Chun-Yu Shei <cshei@google.com>
Subject: Re: [gentoo-portage-dev] Add caching to a few commonly used functions
Date: Sun, 28 Jun 2020 03:12:21 +0000
Message-ID: <81F397B3-8261-47BF-B089-7947C4BF7827@gentoo.org>
In-Reply-To: <47f241aa-977f-e044-6770-f9f314747f85@gentoo.org>

On June 28, 2020 3:00:00 AM UTC, Zac Medico <zmedico@gentoo.org> wrote:
>On 6/26/20 11:34 PM, Chun-Yu Shei wrote:
>> Hi,
>> 
>> I was recently interested in whether portage could be sped up, since
>> dependency resolution can sometimes take a while on slower machines.
>> After generating some flame graphs with cProfile and vmprof, I found 3
>> functions which seem to be called extremely frequently with the same
>> arguments: catpkgsplit, use_reduce, and match_from_list. In the first
>> two cases, it was simple to cache the results in dicts, while
>> match_from_list was a bit trickier, since it seems to be a requirement
>> that it return actual entries from the input "candidate_list". I also
>> ran into some test failures if I did the caching after the
>> mydep.unevaluated_atom.use and mydep.repo checks towards the end of the
>> function, so the caching is only done up to just before that point.
>> 
>> The catpkgsplit change seems to definitely be safe, and I'm pretty sure
>> the use_reduce one is too, since anything that could possibly change the
>> result is hashed. I'm a bit less certain about the match_from_list one,
>> although all tests are passing.
>> 
>> With all 3 patches together, "emerge -uDvpU --with-bdeps=y @world"
>> speeds up from 43.53 seconds to 30.96 sec -- a 40.6% speedup (about
>> 29% less wall-clock time). "emerge -ep @world" is just a tiny bit
>> faster, going from 18.69 to 18.22 sec (2.5% improvement). Since the
>> upgrade case is far more common, this would really help in daily use,
>> and it shaves about 30 seconds off the time you have to wait to get to
>> the [Yes/No] prompt (from ~90s to 60s) on my old Sandy Bridge laptop
>> when performing normal upgrades.
>> 
>> Hopefully, at least some of these patches can be incorporated, and
>> please let me know if any changes are necessary.
>> 
>> Thanks,
>> Chun-Yu
>
>Using global variables for caches like these causes a form of memory
>leak for use cases involving long-running processes that need to work
>with many different repositories (and perhaps multiple versions of
>those repositories).
>
>There are at least a couple of different strategies that we can use to
>avoid this form of memory leak:
>
>1) Limit the scope of the caches so that they have some sort of garbage
>collection life cycle. For example, it would be natural for the depgraph
>class to have a local cache of use_reduce results, so that the cache can
>be garbage collected along with the depgraph.
>
>2) Eliminate redundant calls. For example, redundant calls to catpkgsplit
>can be avoided by constructing more _pkg_str instances, since
>catpkgsplit is able to return early when its argument happens to be a
>_pkg_str instance.
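
Zac's two strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Portage code: `split_cpv` is a hypothetical toy stand-in for catpkgsplit (it ignores revisions and does no validation), `PkgStr` only mimics the idea behind `_pkg_str` of carrying an already-computed split, and `DepGraph` is a hypothetical owner class, not Portage's depgraph.

```python
# Hypothetical sketch: split_cpv, PkgStr and DepGraph are illustrative
# stand-ins, not Portage's catpkgsplit, _pkg_str or depgraph.


def _parse_cpv(cpv):
    """Toy parser: "cat/pkg-1.0" -> ("cat", "pkg", "1.0").

    Does not implement Portage's real rules (no revisions, no validation).
    """
    cat, _, rest = cpv.partition("/")
    pkg, _, ver = rest.rpartition("-")
    return (cat, pkg, ver)


class PkgStr(str):
    """str subclass that stores its split once, at construction time."""

    def __new__(cls, cpv):
        obj = super().__new__(cls, cpv)
        obj.cpv_split = _parse_cpv(cpv)
        return obj


def split_cpv(cpv):
    # Strategy 2: if the caller already holds a PkgStr, reuse the stored
    # split instead of parsing again, so no cache is needed at all.
    if isinstance(cpv, PkgStr):
        return cpv.cpv_split
    return _parse_cpv(cpv)


class DepGraph:
    """Strategy 1: the cache lives on the instance, not in a module global."""

    def __init__(self):
        self._split_cache = {}

    def cached_split(self, cpv):
        try:
            return self._split_cache[cpv]
        except KeyError:
            result = self._split_cache[cpv] = split_cpv(cpv)
            return result


graph = DepGraph()
graph.cached_split("dev-lang/python-3.8.3")
# Dropping the last reference to graph frees its cache as well, whereas a
# module-level dict would keep every entry for the life of the process.
del graph
```

The design point is lifetime: the instance cache lives exactly as long as its owner, while the early return in `split_cpv` skips both parsing and caching for callers that already hold a `PkgStr`.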

I think the weak reference support (the weakref module) from the standard library might also be helpful.

--
Best regards, 
Michał Górny


Thread overview: 30+ messages
2020-06-27  6:34 [gentoo-portage-dev] Add caching to a few commonly used functions Chun-Yu Shei
2020-06-27  6:34 ` [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function Chun-Yu Shei
2020-06-27 11:33   ` Michał Górny
2020-06-29  1:58   ` Sid Spry
2020-07-06 15:26     ` Francesco Riosa
     [not found]       ` <cdb0d821-67c1-edb6-2cbc-f26eaa0d3d70@veremit.xyz>
2020-07-06 16:10         ` Francesco Riosa
2020-07-06 17:30           ` Chun-Yu Shei
2020-07-06 18:03             ` Zac Medico
2020-07-07  3:41               ` Zac Medico
2020-07-09  7:03                 ` Chun-Yu Shei
2020-07-09  7:03                   ` [gentoo-portage-dev] [PATCH] Add caching to use_reduce, vercmp, and catpkgsplit Chun-Yu Shei
2020-07-12 21:46                     ` Zac Medico
2020-07-13  6:30                       ` [gentoo-portage-dev] [PATCH] Add caching to use_reduce, Chun-Yu Shei
2020-07-13  6:30                         ` [gentoo-portage-dev] [PATCH] Add caching to use_reduce, vercmp, and catpkgsplit Chun-Yu Shei
2020-07-13 17:28                           ` Zac Medico
2020-07-13 18:54                             ` Ulrich Mueller
2020-07-13 19:04                               ` Chun-Yu Shei
2020-07-13 19:24                                 ` Ulrich Mueller
2020-07-09 21:04                   ` [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function Alec Warner
2020-07-09 21:06                     ` Chun-Yu Shei
2020-07-09 21:13                       ` Alec Warner
2020-06-27  6:34 ` [gentoo-portage-dev] [PATCH 2/3] Add caching to use_reduce function Chun-Yu Shei
2020-06-27  6:34 ` [gentoo-portage-dev] [PATCH 3/3] Add partial caching to match_from_list Chun-Yu Shei
2020-06-27  7:35 ` [gentoo-portage-dev] Add caching to a few commonly used functions Fabian Groffen
2020-06-27  7:43   ` Chun-Yu Shei
2020-06-27  8:31   ` Kent Fredric
2020-06-28  3:00 ` Zac Medico
2020-06-28  3:12   ` Michał Górny [this message]
2020-06-28  3:42     ` Zac Medico
2020-06-28  5:30       ` Michał Górny
