From: Michael Orlitzky
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] ChangeLog - Infra Response; update 2015/11/11, potential impact to 30min rsync cycle
Date: Wed, 18 Nov 2015 12:55:05 -0500
Message-ID: <564CBB79.2030006@gentoo.org>
In-Reply-To: <20151118144849.32759.qmail@stuge.se>

On 11/18/2015 09:48 AM, Peter Stuge wrote:
> Peter Stuge wrote:
>> Robin H. Johnson wrote:
>>> However, the largest sticking point, even with parallel threads, is that
>>> it seems the base ChangeLog generation is incredibly slow. It averages
>>> above 350ms per package right now (at 19k packages in a full cycle, it's
>>> a long time), but some packages can take up to 5 seconds so far.
>>
>> Which code is doing this generation? Sorry - ENOOVERVIEW. :\
>
> Bump. Does anyone know where I can take a look at this code?
>

I don't know, but since no one else is answering, I'll try to find out.

There are a few bugs on b.g.o. (search "changelog") that suggest
`egencache --update-changelog` is being used. The egencache command is
part of portage, so:

  $ git clone http://anongit.gentoo.org/git/proj/portage.git

Looking at bin/egencache, you'll find a bunch of indirection, but
ultimately, the generate_changelog() method of the GenChangeLogs class
is doing the work.

The implementation is straightforward. I suspect the slow part is

    # now grab all the commits
    revlist_cmd = ['git', self._work_tree, 'rev-list']
    if self._changelog_reversed:
        revlist_cmd.append('--reverse')
    revlist_cmd.extend(['HEAD', '--', '.'])
    commits = self.grab(revlist_cmd).split()

where

    @staticmethod
    def grab(cmd):
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        return _unicode_decode(p.communicate()[0],
                               encoding=_encodings['stdio'],
                               errors='strict')

That rev-list call alone takes about half a second when I run it from
the command line.
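If anyone wants to reproduce that measurement per package, here's a
minimal sketch, assuming Python 3.6+ and a local checkout of the gentoo
repository. It rebuilds the same path-limited rev-list command and times
one call per package directory. The helper names (build_revlist_cmd,
time_cmd, main) are made up for illustration and are not portage's. It
also uses `git -C <dir>` to run inside the package directory, whereas
portage passes self._work_tree, whose exact value isn't shown above.

```python
import subprocess
import sys
import time


def build_revlist_cmd(pkgdir, reversed_order=False):
    """Mirror the quoted revlist_cmd, but point git at a (hypothetical)
    package directory with -C instead of portage's self._work_tree."""
    cmd = ['git', '-C', pkgdir, 'rev-list']
    if reversed_order:
        cmd.append('--reverse')
    # "-- ." limits history to commits touching the current directory,
    # i.e. pkgdir, because -C makes git run as if started there.
    cmd.extend(['HEAD', '--', '.'])
    return cmd


def grab(cmd):
    """Run cmd and return its stdout decoded strictly as UTF-8.
    Unlike the quoted Popen version, check=True raises if cmd fails."""
    out = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    return out.decode('utf-8', errors='strict')


def time_cmd(cmd):
    """Return (elapsed_seconds, stdout) for one run of cmd,
    including the cost of spawning the process."""
    start = time.perf_counter()
    out = grab(cmd)
    return time.perf_counter() - start, out


def main(argv=None):
    """Hypothetical CLI: time the rev-list for each package dir given."""
    argv = sys.argv[1:] if argv is None else argv
    for pkgdir in argv:
        elapsed, out = time_cmd(build_revlist_cmd(pkgdir))
        print(f'{pkgdir}: {len(out.split())} commits in {elapsed:.3f}s')
```

Calling main(['/path/to/gentoo/app-misc/foo']) would print the commit
count and wall-clock time for that one directory. For scale: at Robin's
quoted average of 350ms per package, 19,000 packages come to about
6,650 seconds, close to two hours of serial work, so shaving the
per-package git calls matters even with parallel threads.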