public inbox for gentoo-dev@lists.gentoo.org
From: Gerion Entrup <gerion.entrup@flump.de>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Date: Wed, 05 Jul 2023 20:40:34 +0200
Message-ID: <2243341.Icojqenx9y@falbala>
In-Reply-To: <ZKSmqiS6gVIRfTfR@dj3ntoo>


On Wednesday, 5 July 2023, 01:09:30 CEST, Oskari Pirhonen wrote:
> On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > Just out of curiosity about the whole discussion: I did not follow it
> > > in the deepest detail, but what I got is:
> > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > >   to be listed. Many such Manifest files lead to an extremely
> > >   increased Portage tree size. EGO_SUM is just one example (though the
> > >   biggest one). Statically linked languages like Rust etc. have the same
> > >   problem.
> > > - The current solution is to prepackage all modules, put them somewhere
> > >   on a webserver and just manifest that file. This makes the Portage tree
> > >   small in size again, but requires a webserver/mirror and is thus
> > >   unfriendly for overlay devs.
> > > 
> > > I'm not sure if it was mentioned before but has anyone considered hash
> > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > the standard manifest file a second time if it gets too big and write
> > > down that hash as new manifest file and leave EGO_SUM as is.
> > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > a year ago:
> > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > 
> > Developing it requires PMS work in addition to package manager
> > development, because it introduces phases.
> > 
> > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > - primary validation of distfiles
> > - secondary fetch of $SRC_URI per indirect Manifest
> > - secondary validation of additional distfiles
> > 
> > A significantly impacted use case is "emerge -f", it now needs to run
> > downloads twice.
> > 
> 
> I'm not sure double downloading is required. Consider a flow similar to
> this:
> 
> 1. distfiles are fetched as per the ebuild
> 2. distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly what I meant. Web storage is not needed. A second
download pass is not needed either. Only an additional Manifest format
is needed for ebuilds with more than n distfiles.
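
A minimal sketch of that flow in Python, to make it concrete. Everything
in it is an assumption for illustration: the function names, the choice
of BLAKE2B and SHA512, the sort-by-filename ordering, and the line layout
(loosely modelled on DIST lines). It is not Portage code.

```python
import hashlib

# Assumed hash set; these are hashlib algorithm names.
ALGOS = ("blake2b", "sha512")


def direct_manifest(distfiles):
    """Build the temporary direct Manifest text from {filename: bytes}.

    Entries are sorted by filename so that the text, and therefore its
    hash, does not depend on the order in which files were fetched.
    """
    lines = []
    for name in sorted(distfiles):
        data = distfiles[name]
        fields = ["DIST", name, str(len(data))]
        for algo in ALGOS:
            fields += [algo.upper(), hashlib.new(algo, data).hexdigest()]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def indirect_hashes(manifest_text):
    """Hash the direct Manifest text itself (the second hashing step)."""
    raw = manifest_text.encode("utf-8")
    return {algo.upper(): hashlib.new(algo, raw).hexdigest() for algo in ALGOS}


def verify(distfiles, expected):
    """Compare the recomputed hashes with those stored in the tree."""
    return indirect_hashes(direct_manifest(distfiles)) == expected
```

Here `expected` stands for the hashes that the in-tree Manifest would
store. All distfiles are fetched once, hashed locally, and only the
final comparison touches the tree.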


> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
> 
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
> 
> I'm imagining something along these lines:
>     
>     INDIRECT true
>     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
>     PACKAGE ...

Maybe it is reasonable to skip the distfile names altogether (or just
provide a hash value of the concatenated file names). Then the Manifest
would contain just two or three hashes, no matter how many distfiles the
ebuild needs. Since this kind of indirect Manifest should be rarer than
the normal ones, a slightly longer processing time should not have much
impact, I would say.



> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
> 
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.

Without storing the filenames, the Manifest file would have the same
small size for any number of distfiles needed.
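
A rough sketch of such a fixed-size entry, again in Python. The
`INDIRECT` line layout, the function name, the use of SHA512, and the NUL
separator between file names are all assumptions made up for this
example, not an existing format.

```python
import hashlib


def compact_entry(filenames, direct_manifest_text, algo="sha512"):
    """Return one Manifest line whose length is independent of the
    number of distfiles (hypothetical layout).

    The names are sorted and joined with NUL, which cannot occur in a
    file name, so the concatenation is unambiguous.
    """
    joined = "\0".join(sorted(filenames)).encode("utf-8")
    names_hash = hashlib.new(algo, joined).hexdigest()
    manifest_hash = hashlib.new(
        algo, direct_manifest_text.encode("utf-8")
    ).hexdigest()
    return f"INDIRECT {algo.upper()} {names_hash} {manifest_hash}"


# Two very different packages produce entries of identical length.
small = compact_entry(["a.mod"], "DIST a.mod ...\n")
large = compact_entry([f"m{i}.zip" for i in range(2000)], "DIST ...\n" * 2000)
```

The price is that the package manager has to rebuild the direct Manifest
from the fetched files before it can check anything, which is the longer
processing time mentioned above.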

Gerion


> Please correct me if there's some detail I've overlooked.
> 
> - Oskari
> 
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> > 
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> > 
> > The indirect Manifests still provide a hosting challenge for overlays.
> > 
> 
> 
> 

