Subject: Re: [gentoo-portage-dev] [PATCH] sync: call git prune before shallow fetch (bug 599008)
From: Zac Medico
To: Michał Górny
Cc: gentoo-portage-dev@lists.gentoo.org
Date: Sun, 6 Nov 2016 10:46:34 -0800
In-Reply-To: <20161106105907.5440858d.mgorny@gentoo.org>

On 11/06/2016 01:59 AM, Michał Górny wrote:
> On Sat, 5 Nov 2016 15:56:20 -0700
> Zac Medico wrote:
>
>> On 11/05/2016 03:22 PM, Michał Górny wrote:
>>> On Sat, 5 Nov 2016 15:11:10 -0700
>>> Zac Medico wrote:
>>>
>>>> On 11/05/2016 02:50 PM, Michał Górny wrote:
>>>>> On Sat, 5 Nov 2016 13:43:15 -0700
>>>>> Zac Medico wrote:
>>>>>
>>>>>> This is necessary in order to avoid "There are too many unreachable
>>>>>> loose objects" warnings from automatic git gc calls.
>>>>>>
>>>>>> X-Gentoo-Bug: 599008
>>>>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
>>>>>> ---
>>>>>>  pym/portage/sync/modules/git/git.py | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/pym/portage/sync/modules/git/git.py b/pym/portage/sync/modules/git/git.py
>>>>>> index f288733..c90cf88 100644
>>>>>> --- a/pym/portage/sync/modules/git/git.py
>>>>>> +++ b/pym/portage/sync/modules/git/git.py
>>>>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
>>>>>>  				writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
>>>>>>  				return (e.returncode, False)
>>>>>>
>>>>>> +			# For shallow fetch, unreachable objects must be pruned
>>>>>> +			# manually, since otherwise automatic git gc calls will
>>>>>> +			# eventually warn about them (see bug 599008).
>>>>>> +			subprocess.call(['git', 'prune'],
>>>>>> +				cwd=portage._unicode_encode(self.repo.location))
>>>>>> +
>>>>>>  			git_cmd_opts += " --depth %d" % self.repo.sync_depth
>>>>>>  			git_cmd = "%s fetch %s%s" % (self.bin_command,
>>>>>>  				remote_branch.partition('/')[0], git_cmd_opts)
>>>>>
>>>>> Does it have a performance impact?
>>>>
>>>> Yes, it takes about 20 seconds on my laptop. I suppose we could make
>>>> this an optional thing, so that those people can do it manually if they
>>>> want.
>>>
>>> So we have improvement from at most few seconds for normal 'git pull'
>>> to around a minute for shallow pull?
>>
>> Well we've got at least 3 resources to consider:
>>
>> 1) network bandwidth
>> 2) disk usage
>> 3) sync time
>>
>> For me, sync time doesn't really matter that much, but I suppose it
>> might for some people.
>
> For a common user, network bandwidth is not a problem with git (except
> maybe for the huge initial clone). Especially when syncing frequently,
> the gain from subsequent --depth=1 is negligible. When syncing rarely,
> you probably prefer snapshots anyway.
>
> I doubt this could be of benefit even to dial-up users; that is,
> that more time would be saved on fetching than lost on all the ops
> needed to make things continue to work. The additional data won't
> affect the data plan users much probably either.
>
> Especially that Gentoo is all about fetching distfiles that are huge
> compared to the git updates for the repository.
>
> As for the disk usage, again, the difference should be negligible.
> The major difference is done on initial fetch. Of course, regularly
> pruning the repository will reduce its size. But then, pruning it with
> non-shallow fetches would probably achieve a similar effect thanks to
> delta compression.
>
> That leaves the sync time. Which is becoming worse than rsync.

Maybe this will be a reasonable default:

* add a separate clone-depth setting which defaults to 1
* set the default sync-depth setting to 0 (unlimited)

If the user enables shallow fetch by setting sync-depth to something
other than 0, then I think we should call whatever commands are necessary
to keep the repository healthy (including `git prune` if necessary).
-- 
Thanks,
Zac
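
For illustration, the proposed defaults could be sketched roughly as
follows. This is only a sketch under stated assumptions, not the actual
portage implementation: GitRepoConfig, depth_opts, clone_command,
sync_commands and run_sync are hypothetical names, the "origin" remote is
assumed, and only the fetch step of a sync is covered. Stopping when a
command fails is also a choice made here; the patch quoted above ignores
the return code of `git prune`.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class GitRepoConfig:
    """Hypothetical stand-in for the per-repository settings discussed above."""
    location: str
    clone_depth: int = 1  # proposed default: shallow initial clone
    sync_depth: int = 0   # proposed default: 0 means unlimited history


def depth_opts(depth):
    """Return the extra git arguments for a depth (0 means no limit)."""
    return ["--depth", str(depth)] if depth > 0 else []


def clone_command(cfg, uri, git="git"):
    """Build the command for the initial clone, honouring clone_depth."""
    return [git, "clone"] + depth_opts(cfg.clone_depth) + [uri, cfg.location]


def sync_commands(cfg, remote="origin", git="git"):
    """Build the commands to run, in order, when syncing an existing checkout."""
    cmds = []
    if cfg.sync_depth > 0:
        # Shallow fetch requested: prune unreachable objects first so that
        # automatic git gc calls do not warn about them (bug 599008).
        cmds.append([git, "prune"])
    cmds.append([git, "fetch", remote] + depth_opts(cfg.sync_depth))
    return cmds


def run_sync(cfg):
    """Run the sync commands inside the repository; stop at the first failure."""
    for cmd in sync_commands(cfg):
        returncode = subprocess.call(cmd, cwd=cfg.location)
        if returncode != 0:
            return returncode
    return 0
```

With these defaults a plain sync never pays the `git prune` cost; only
users who opt in to a non-zero sync-depth get the extra step.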