To: gentoo-dev@lists.gentoo.org
From: Martin Vaeth
Subject: [gentoo-dev] Re: Bug #565566: Why is it still not fixed?
Date: Fri, 26 Feb 2016 11:00:56 +0000 (UTC)
References: <56CC937C.3030805@gentoo.org> <56CCD4DC.3040509@gentoo.org> <56CCFE65.5050201@gentoo.org> <20160224211613.51a613ff@gentp.lnet>

Gordon Pettey wrote:
>>
>> Already now this means that you need 2 (or already 3?) times the
>> disk space as for an rsync mirror; multiply all numbers by 4
>> if you used squashfs to store the tree. [...]
>
> Or, in 2-3 years, maybe people will stop with the hyperbole

Hyperbole? Really?
Let's first look at the current data. Instead of guessing I now
fetched the git tree to get the exact number:

git on ext2 (8K blocks): 704 M
squashfs with lz4:       120 M

lz4 is the fastest algorithm, but not the best concerning space.
More seriously: The git data is still missing metadata information
which will add some more.
It seems my estimate of the factor 2*4 = 8 for the current state
was rather realistic.

Not to forget that this was a fresh checkout where the .git data
itself is fully compressed in one file (which is by default not the
case when you update frequently - it depends on your git configuration
and perhaps whether you use a cron job for recompression).
So perhaps for some git users the bracketed figure in my estimate
(3*4=12) is already correct.

Whether 1 GB of permanent disk space only for the overhead of package
management is appropriate, everybody must decide for themselves.
Compared to other distributions, this is an awful lot.
Just for getting ChangeLogs it is IMHO way too much.
And currently the git history is still almost empty...

Before I turn to the future, some remarks:

> The tree is a bunch of text files, of which a whole lot of text is
> repeated

That's why squashfs is so effective already compared to plain rsync.
Of course, a lot of the *current* factor comes from this.

> which is great for compression, which git does.

You seem to pretend that I ignored this, but I did not:

>> (there is possibility of some
>> compression of history, but OTOH, many packages are added and
>> removed, eclasses keep changing, etc.)

Of course, concerning the future, one must make some assumptions.
Perhaps it is reasonable to assume that roughly a constant amount
of new data is added every year, i.e., the quotient
(git data/squashfs) increases every year by a constant summand.
Compression will not change this "constantness", but at most
influence the summand itself. Quite the opposite: once the history
exceeds a certain size (depending on the memory window size used by
git's gzip-style (zlib) compression), compression will eventually
become much less effective. You can see the difference essentially
in the gzip vs. xz compression size, because the main difference
here is the size of the mentioned memory window.
And as mentioned above, unless you are regularly recompressing
(by a cron job or by git configuration after updating) you hardly
profit from the git compression at all.

How large the yearly summand is can only be guessed currently.
I think my assumption that after 1 year the number of new/modified
files is roughly the total number of files in the tree is realistic,
perhaps even too low. (Not to forget that every commit also adds
data by itself.)

So in 2-3 years, the factor (compared to squashfs) might be roughly:

8*2.5 = 20       without recompressing .git
8 + 2.5 = 10.5   with fully compressed .git

(The latter factor is unrealistically low, because git's gzip
compression is less effective than lz4 and *much* less effective
than xz.)

And even if I should have overestimated the yearly summand by a
factor of 2, you only need to double the number of years which you
have to wait...
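
For anyone who wants to play with the numbers, the arithmetic above can be written down as a small Python sketch. The function names and the default yearly summand of 1 (one squashfs-sized unit per year) are only my reading of the estimate, not anything measured; the only measured values are the 704 M and 120 M figures quoted earlier.

```python
# Rough sketch of the estimate in this mail. All names are hypothetical;
# the growth model (a constant yearly summand) is an assumption, not data.

GIT_SIZE_MB = 704        # fresh git checkout on ext2, as measured above
SQUASHFS_SIZE_MB = 120   # squashfs with lz4, as measured above


def measured_factor(git_mb=GIT_SIZE_MB, squashfs_mb=SQUASHFS_SIZE_MB):
    """Ratio git/squashfs for the fresh checkout (metadata not included)."""
    return git_mb / squashfs_mb


def factor_without_recompression(current_factor, years):
    """The '8*2.5' case: history is never repacked."""
    return current_factor * years


def factor_with_full_compression(current_factor, years, yearly_summand=1.0):
    """The '8 + 2.5' case: quotient grows by a constant summand per year."""
    return current_factor + yearly_summand * years


if __name__ == "__main__":
    print(round(measured_factor(), 1))
    print(factor_without_recompression(8, 2.5))
    print(factor_with_full_compression(8, 2.5))
    # Halving the yearly summand just doubles the waiting time:
    print(factor_with_full_compression(8, 5, yearly_summand=0.5))
```

The last call illustrates the closing remark: with half the yearly summand, the same factor is reached after twice as many years.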