From: Gerion Entrup
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Date: Wed, 05 Jul 2023 20:40:34 +0200
Message-ID: <2243341.Icojqenx9y@falbala>
References: <2ZKWN4KF.MKEFFMWE.LGPKYP47@RTL7EJXF.RN4PF6UF.MDFBGF3C>

On Wednesday, 5 July 2023, 01:09:30 CEST, Oskari Pirhonen wrote:
> On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > just to be curious about the whole discussion. I did not follow in the
> > > deepest detail but what I got is:
> > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > >   to be respected. A lot of these Manifest files lead to an extremely
> > >   increased Portage tree size. EGO_SUM is just one example (though the
> > >   biggest one). Statically linked languages like Rust etc. have the same
> > >   problem.
> > > - The current solution is to prepackage all modules, put it somewhere on
> > >   a webserver and just manifest that file. This makes the Portage tree
> > >   small in size again, but requires a webserver/mirror and is thus
> > >   unfriendly for overlay devs.
> > >
> > > I'm not sure if it was mentioned before but has anyone considered hash
> > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > the standard manifest file a second time if it gets too big and write
> > > down that hash as new manifest file and leave EGO_SUM as is.
> > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > a year ago:
> > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> >
> > Developing it requires PMS work in addition to package manager
> > development, because it introduces phases.
> >
> > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > - primary validation of distfiles
> > - secondary fetch of $SRC_URI per indirect Manifest
> > - secondary validation of additional distfiles
> >
> > A significantly impacted use case is "emerge -f", it now needs to run
> > downloads twice.
> >
>
> I'm not sure double downloading is required. Consider a flow similar to
> this:
>
> 1. distfiles are fetched as per the ebuild
> 2. distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly what I meant. Web storage is not needed, and neither is
a second download pass. Only an additional Manifest format is needed for
ebuilds with more than n distfiles.

> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
>
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
>
> I'm imagining something along these lines:
>
>     INDIRECT true
>     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
>     PACKAGE ...

Maybe it would be reasonable to omit the distfile names altogether (or
to store only a hash of the concatenated file names). The Manifest would
then contain just two or three hashes, no matter how many distfiles the
ebuild needs. Since this kind of indirect Manifest should be rarer than
the normal kind, a slightly longer processing time should not have much
impact, I would say.

> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
>
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.

Without the file names stored, the Manifest file would have the same
small size for any number of distfiles.

Gerion

> Please correct me if there's some detail I've overlooked.
>
> - Oskari
>
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> >
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> >
> > The indirect Manifests still provide a hosting challenge for overlays.
> >
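
To make the three-step flow discussed above more concrete (fetch the
distfiles, hash them into a temporary Manifest, then hash that Manifest
and compare), here is a rough Python sketch. It is only an illustration
and not Portage code: the function names are made up, the DIST line
layout is merely modelled on the current Manifest format, and sorting by
file name is an assumed way to get the stable ordering mentioned in the
thread. BLAKE2B and SHA512 are chosen as example algorithms.

```python
import hashlib
from pathlib import Path

# Example algorithms (an assumption for this sketch).
ALGOS = {"BLAKE2B": hashlib.blake2b, "SHA512": hashlib.sha512}


def digests(data: bytes) -> dict:
    """Hash data with every algorithm in ALGOS, returning hex digests."""
    return {name: ctor(data).hexdigest() for name, ctor in ALGOS.items()}


def direct_manifest(distfiles: list) -> str:
    """Step 2: build the temporary (direct) Manifest text.

    Entries are sorted by file name so that the text, and therefore its
    hash, does not depend on the order in which files were fetched
    (hypothetical convention, not taken from any spec).
    """
    lines = []
    for path in sorted(distfiles, key=lambda p: p.name):
        data = path.read_bytes()
        sums = " ".join(f"{algo} {h}" for algo, h in digests(data).items())
        lines.append(f"DIST {path.name} {len(data)} {sums}")
    return "\n".join(lines) + "\n"


def indirect_digests(distfiles: list) -> dict:
    """Hash the direct Manifest itself; this is all the tree would store."""
    return digests(direct_manifest(distfiles).encode("utf-8"))


def verify(distfiles: list, expected: dict) -> bool:
    """Step 3: compare against the digests from the in-tree Manifest."""
    return indirect_digests(distfiles) == expected
```

In this sketch the in-tree entry would only need to carry the result of
`indirect_digests()`, so its size stays constant however many distfiles
an ebuild has. A mismatch only says that something differs; it does not
say which distfile is the bad one, which is the price of dropping the
per-file entries.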