From: Florian Schmaus <flow@gentoo.org>
To: gentoo-dev@lists.gentoo.org, "Michał Górny" <mgorny@gentoo.org>
Cc: William Hubbs <williamh@gentoo.org>
Subject: Re: [gentoo-dev] Re: EGO_SUM
Date: Mon, 22 May 2023 09:14:11 +0200
Message-ID: <6ed0f286-f9eb-9e93-4fec-296646f79871@gentoo.org>
In-Reply-To: <65bac7eb93f9b9ecd95f1fb38892e914edb879f5.camel@gentoo.org>
On 08/05/2023 14.03, Michał Górny wrote:
> On Mon, 2023-05-08 at 09:53 +0200, Florian Schmaus wrote:
>> Furthermore, both numbers, 256 MiB and 410 MiB, are based on the
>> over-approximation that every EGO_SUM package uses 1.6 MiB, which is
>> almost certainly not the case. The mean package-directory size of an
>> EGO_SUM-using package at 2022-02-16 was 280 KiB.
>
> Please extend this analysis to Manifest changes over time, and how they
> are going to impact total gentoo.git size.
Gladly.
The average daily change caused by Manifests of EGO_SUM packages from
2020-02-16 to 2022-02-16 was at most 80 KiB. (See below for the
methodology used to obtain this number.)
In other words, a user syncing daily incurred, on average, at most 80 KiB
of traffic per day to sync the Manifests of all EGO_SUM packages that
existed on 2022-02-16.
Even in less developed regions of the world, 80 KiB a day is
manageable. And this would still be the case if we doubled, quadrupled,
or octupled this number.
I note that this number does not include ebuilds and metadata. However,
one can easily bound the additional ebuild and metadata delta that
accompanies the observed Manifest changes: it is smaller than the
Manifest changes themselves. Therefore, a pessimistic approximation is
twice 80 KiB, that is, 160 KiB per day.
But then again, the 80 KiB do not take transport compression into
account. And, as we have learned, Manifests compress to roughly 50% of
their original size. So the average EGO_SUM-generated network traffic,
assuming that it is compressed, remains in the region of a hundred
kilobytes per day.
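The traffic estimate above boils down to a short back-of-the-envelope calculation. The sketch below reproduces it with the figures from this message; note that applying the 50% compression ratio to ebuilds and metadata, and not only to Manifests, is an assumption made for illustration.

```python
# Back-of-the-envelope daily sync traffic caused by EGO_SUM packages,
# using the figures quoted in this message.
MANIFEST_DELTA_KIB = 80   # upper bound on the average daily Manifest change
EBUILD_FACTOR = 2         # pessimistic: ebuild+metadata delta <= Manifest delta
COMPRESSION_RATIO = 0.5   # Manifests compress to roughly 50%
                          # (assumption: same ratio for ebuilds and metadata)


def daily_traffic_kib(manifest_kib: float = MANIFEST_DELTA_KIB,
                      factor: float = EBUILD_FACTOR,
                      ratio: float = COMPRESSION_RATIO) -> tuple[float, float]:
    """Return (uncompressed, compressed) daily traffic in KiB."""
    uncompressed = manifest_kib * factor
    return uncompressed, uncompressed * ratio


if __name__ == "__main__":
    raw, compressed = daily_traffic_kib()
    print(f"uncompressed: {raw:.0f} KiB/day, compressed: {compressed:.0f} KiB/day")
```

With the default values this gives 160 KiB per day uncompressed and about 80 KiB per day compressed, well within "the region of a hundred kilobytes".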
We can also use this number to over-approximate the growth rate of
gentoo.git due to EGO_SUM.
Assume that 120 EGO_SUM packages cause a daily growth rate of 160 KiB,
that is, twice 80 KiB, the pessimistic figure used above. Doubling this
rate yields an estimate for the current number of Go packages in
::gentoo: 320 KiB per day, which increases gentoo.git by about 114 MiB
per year. Please double this number again, to roughly 228 MiB per year,
for a bit of future safety.
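The growth-rate arithmetic can be checked with a few lines of Python. This sketch assumes 365 days per year and 1 MiB = 1024 KiB; the variable names are illustrative only.

```python
# Yearly growth of gentoo.git implied by a constant daily delta.
KIB_PER_MIB = 1024
DAYS_PER_YEAR = 365


def yearly_growth_mib(daily_kib: float) -> float:
    """Convert a daily growth rate in KiB into MiB per year."""
    return daily_kib * DAYS_PER_YEAR / KIB_PER_MIB


if __name__ == "__main__":
    rate_120_pkgs = 2 * 80            # KiB/day for 120 EGO_SUM packages (pessimistic)
    rate_current = 2 * rate_120_pkgs  # doubled for the current number of Go packages
    yearly = yearly_growth_mib(rate_current)
    print(f"{rate_current} KiB/day -> {yearly:.0f} MiB/year, "
          f"{2 * yearly:.0f} MiB/year with safety margin")
```

320 KiB per day over 365 days is 116800 KiB, i.e., about 114 MiB per year, or about 228 MiB per year with the extra safety factor.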
In summary, neither this nor the previous analysis finds any
data-size-based arguments against using EGO_SUM.
Using EGO_SUM is fine for users and developers. The increase in the size
of ::gentoo, even if it were to quadruple the current size, does not
entail any issues. The expected average daily delta that EGO_SUM would
cause today is also no threat, even for users with low-bandwidth
connections. The growth that EGO_SUM causes in gentoo.git is also within
manageable bounds. If an ebuild developer has 1-2 GB free on their disk,
they will not need to buy a larger disk in the coming years if we start
using EGO_SUM again in ::gentoo.
- Flow
# Appendix: Methodology
We took gentoo.git as of 2022-02-16, at commit 60dc7a03ff2f. From
there, we created the numstat log (git log --numstat) of the Manifest of
every EGO_SUM package. We limited the numstat log to at most two years
back in time, that is, until 2020-02-16. The numstat log contains the
number of lines added to and removed from the Manifest in each commit of
the target period. An awk script calculated the total sum of added and
removed lines. Note that this treats removed lines the same as added
lines, even though removed lines should cause significantly less network
traffic. We also extracted the date of the oldest commit in the observed
period. This date was used to calculate the number of days in the
period, which accounts for packages that came into existence after
2020-02-16 and would otherwise skew the analysis towards smaller
results.
Dividing the total number of changed lines by the number of days yields
the average number of lines changed per day per package.
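For readers who want to follow the per-package step, here is a hypothetical Python re-expression of it; it is not the awk script from the repository linked below. It assumes the log was produced with something like `git log --since=2020-02-16 --numstat --format=%cd --date=short -- <category>/<package>/Manifest`, so that each commit contributes one YYYY-MM-DD line followed by its numstat lines. It also assumes the period runs from the oldest commit to the 2022-02-16 snapshot; the sample package name is made up.

```python
import re
from datetime import date

# Assumption: the observed period ends at the snapshot date of gentoo.git.
END = date(2022, 2, 16)
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def lines_per_day(numstat_log: str, end: date = END) -> float:
    """Average added+removed Manifest lines per day for one package."""
    total = 0
    oldest = None
    for line in numstat_log.splitlines():
        line = line.strip()
        if not line:
            continue  # git separates the format line and numstat lines by blanks
        if DATE_RE.match(line):
            d = date.fromisoformat(line)
            if oldest is None or d < oldest:
                oldest = d  # remember the oldest commit in the window
        else:
            added, removed, _path = line.split("\t", 2)
            if added != "-":  # "-" marks binary files; Manifests are text
                # Removed lines count the same as added lines.
                total += int(added) + int(removed)
    if oldest is None:
        return 0.0  # no commits in the observed period
    days = max((end - oldest).days, 1)  # avoid division by zero
    return total / days
```

For example, a package with commits on 2022-01-27 (6 lines added) and 2022-02-06 (4 added, 2 removed) has 12 changed lines over 20 days, i.e., 0.6 lines per day.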
We further determined the longest line observed in the Manifests of
EGO_SUM packages, which was 404 bytes.
Summing the per-package averages over all packages yielded
195.58093724672614 lines per day. Multiplying this number by the
longest observed line length of 404 bytes gives 79014.69 bytes per day,
that is, about 77 KiB, or roughly 80 KiB per day.
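The final step is a single multiplication; the sketch below recomputes it from the reported sum. The function name `daily_bytes` is illustrative only, and every changed line is pessimistically assumed to be as long as the longest observed Manifest line.

```python
# Upper bound on daily Manifest traffic: every changed line is assumed to be
# as long as the longest Manifest line observed among EGO_SUM packages.
MAX_LINE_BYTES = 404


def daily_bytes(lines_per_day_per_package: list[float],
                line_bytes: int = MAX_LINE_BYTES) -> float:
    """Sum the per-package daily line averages and scale by the line length."""
    return sum(lines_per_day_per_package) * line_bytes


if __name__ == "__main__":
    # The message reports only the summed value, so pass it as a one-element list.
    total = daily_bytes([195.58093724672614])
    print(f"{total:.1f} bytes/day, {total / 1024:.1f} KiB/day")
```

This yields roughly 79015 bytes, about 77 KiB per day, which the text rounds up to 80 KiB.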
The raw and post-processed results of this analysis are available at
https://dev.gentoo.org/~flow/gentoo-tree-analysis-results/2023-05-17T100838-gentoo-at-2022-02-16-60dc7a03ff2f/
The code used to carry out this analysis is available at
https://gitlab.gentoo.org/flow/gentoo-tree-analysis
for everyone to study the code, reproduce the results, and check for
issues and bugs.
As always, I appreciate any feedback.