On Wednesday, 5 July 2023, 01:09:30 CEST, Oskari Pirhonen wrote:
> On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > just to be curious about the whole discussion. I did not follow in the
> > > deepest detail but what I got is:
> > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > >   to be respected. A lot of these Manifest files lead to an extremely
> > >   increased Portage tree size. EGO_SUM is just one example (though the
> > >   biggest one). Statically linked languages like Rust etc. have the same
> > >   problem.
> > > - The current solution is to prepackage all modules, put it somewhere on
> > >   a webserver and just manifest that file. This makes the Portage tree
> > >   small in size again, but requires a webserver/mirror and is thus
> > >   unfriendly for overlay devs.
> > >
> > > I'm not sure if it was mentioned before but has anyone considered hash
> > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > the standard manifest file a second time if it gets too big and write
> > > down that hash as new manifest file and leave EGO_SUM as is.
> > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > a year ago:
> > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> >
> > Developing it requires PMS work in addition to package manager
> > development, because it introduces phases.
> >
> > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > - primary validation of distfiles
> > - secondary fetch of $SRC_URI per indirect Manifest
> > - secondary validation of additional distfiles
> >
> > A significantly impacted use case is "emerge -f", it now needs to run
> > downloads twice.
> >
>
> I'm not sure double downloading is required. Consider a flow similar to
> this:
>
> 1. distfiles are fetched as per the ebuild
> 2. distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly what I meant. Web storage is not needed. A second download
process is not needed either. Only an additional Manifest format is needed,
for ebuilds with more than n distfiles.

> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
>
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
>
> I'm imagining something along these lines:
>
>     INDIRECT true
>     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
>     PACKAGE ...

Maybe it is reasonable to skip the distfile names altogether (or just provide
a hash value of the concatenated file names). Then the Manifest would contain
just two or three hashes, no matter how many distfiles the ebuild needs.
Since this kind of indirect Manifest should be rarer than the normal ones, a
slightly longer processing time should not have much impact, I would say.

> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
>
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.

Without storing the file names, the Manifest file would have the same small
size for any number of distfiles.

Gerion

> Please correct me if there's some detail I've overlooked.
>
> - Oskari
>
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> >
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> >
> > The indirect Manifests still provide a hosting challenge for overlays.
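
To make the single-download flow discussed in this thread concrete (fetch the distfiles, hash them into a temporary Manifest, then hash that Manifest and compare it with the value stored in-tree), here is a rough Python sketch. It is only an illustration, not Portage code: the function names, the choice of BLAKE2B and SHA512, the sort-by-name ordering and the use of in-memory bytes instead of files in DISTDIR are all assumptions made for the example. The temporary Manifest lines are modelled on ordinary DIST entries.

```python
import hashlib

# Assumed hash set, for illustration only.
ALGOS = ("blake2b", "sha512")


def direct_manifest(distfiles: dict[str, bytes]) -> str:
    """Build the temporary (direct) Manifest text from fetched distfiles.

    Entries are sorted by file name so the result does not depend on the
    fetch order (one possible answer to the stable-ordering requirement).
    """
    lines = []
    for name in sorted(distfiles):
        data = distfiles[name]
        hashes = " ".join(
            f"{algo.upper()} {hashlib.new(algo, data).hexdigest()}"
            for algo in ALGOS
        )
        lines.append(f"DIST {name} {len(data)} {hashes}")
    return "\n".join(lines) + "\n"


def indirect_hashes(manifest_text: str) -> dict[str, str]:
    """Hash the direct Manifest itself; only this would be stored in-tree."""
    raw = manifest_text.encode("utf-8")
    return {algo.upper(): hashlib.new(algo, raw).hexdigest() for algo in ALGOS}


def verify(distfiles: dict[str, bytes], expected: dict[str, str]) -> bool:
    """Rebuild the temporary Manifest and compare its hashes with in-tree ones."""
    return indirect_hashes(direct_manifest(distfiles)) == expected


if __name__ == "__main__":
    # Hypothetical distfiles; a real implementation would read them from disk.
    fetched = {"mod-b.zip": b"bbb", "mod-a.zip": b"aaa"}
    in_tree = indirect_hashes(direct_manifest(fetched))  # generated once by the maintainer
    print("untouched distfiles verify:", verify(fetched, in_tree))
    tampered = dict(fetched, **{"mod-a.zip": b"evil"})
    print("tampered distfiles verify:", verify(tampered, in_tree))
```

In this sketch the in-tree data is a fixed number of hashes regardless of how many distfiles the ebuild lists, which corresponds to the variant without stored file names; the file names themselves would still come from SRC_URI. The cost is that a mismatch cannot be attributed to one particular distfile without further information.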