Date: Wed, 5 Jul 2023 21:48:24 -0500
From: Oskari Pirhonen
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)

On Wed, Jul 05, 2023 at 20:40:34 +0200, Gerion Entrup wrote:
> On Wednesday, 5 July 2023, 01:09:30 CEST, Oskari Pirhonen wrote:
> > On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > > just to be curious about the whole discussion. I did not follow in the
> > > > deepest detail but what I got is:
> > > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > > >   to be respected. A lot of these Manifest files lead to an extremely
> > > >   increased Portage tree size. EGO_SUM is just one example (though the
> > > >   biggest one). Statically linked languages like Rust etc. have the same
> > > >   problem.
> > > > - The current solution is to prepackage all modules, put it somewhere on
> > > >   a webserver and just manifest that file. This makes the Portage tree
> > > >   small in size again, but requires a webserver/mirror and is thus
> > > >   unfriendly for overlay devs.
> > > >
> > > > I'm not sure if it was mentioned before but has anyone considered hash
> > > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > > the standard manifest file a second time if it gets too big and write
> > > > down that hash as new manifest file and leave EGO_SUM as is.
> > > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > > a year ago:
> > > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > >
> > > Developing it requires PMS work in addition to package manager
> > > development, because it introduces phases.
> > >
> > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > > - primary validation of distfiles
> > > - secondary fetch of $SRC_URI per indirect Manifest
> > > - secondary validation of additional distfiles
> > >
> > > A significantly impacted use case is "emerge -f"; it now needs to run
> > > downloads twice.
> > >
> >
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> >
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> >    the in-tree Manifest for the direct Manifest
>
> This is exactly what I meant. Web storage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
>
>
> > A new Manifest format would be required in order to differentiate the
> > current ones from an indirect one. This may require PMS changes,
> > although I suspect amending GLEP 74 may be enough since the PMS seems
> > to just refer to the GLEP for a description of Manifests.
> >
> > This would also either rely on a stable ordering of Manifest contents
> > when generating it or having a separate file listing in the indirect
> > Manifest which corresponds to the order in the direct Manifest.
> > For the latter, it should also have separate entries for different
> > package versions so that every single distfile for every single version
> > of said package does not need to be fetched in order to build the direct
> > Manifest.
> >
> > I'm imagining something along these lines:
> >
> >   INDIRECT true
> >   PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
> >   PACKAGE ...
>
> Maybe it is reasonable to skip the distfile names altogether (or just
> provide a hash value of the concatenated file names). Then the manifest
> would just contain two/three hashes (for as many distfiles as the ebuild
> needs). Since this kind of indirect Manifest should be rarer than the
> normal ones, a slightly longer processing time does not have much
> impact I would say.
>

My reason for having the list of files is so that the intermediate/direct
Manifest can be recreated accurately. Consider the following
(not-so-)hypothetical Manifest:

  DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...
  DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...
  EBUILD package-r1.ebuild 1535 BLAKE2B ... SHA512 ...
  EBUILD package.ebuild 1536 BLAKE2B ... SHA512 ...
  MISC metadata.xml 959 BLAKE2B ... SHA512 ...

It is "well behaved" because pkgdev created it. My main concern is
$OTHER_TOOLING generating the Manifest in a different order: the Manifest
itself could be correct, but you would get a false negative because its hash
doesn't match what is in the in-tree indirect Manifest. Having the order
specified in the indirect Manifest renders this moot because $OTHER_TOOLING
would have to respect it in order to correctly handle indirect Manifests.

Additionally, in repos without thin-manifests, the SRC_URI is not enough to
build up the Manifest. This may or may not be an issue depending on whether a
repo's metadata/layout.conf is parsed as part of the Manifest verification
process.
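The three-step check quoted above, combined with the ordered file list in the proposed `PACKAGE` line, can be sketched in Python. This is a hypothetical illustration, not Portage code and not a GLEP 74 format: the helper names (`dist_entry`, `direct_manifest`, `parse_package_line`, `verify_indirect`) are made up, only DIST entries are generated (a thin-manifests repo is assumed), and the trailing `ALGO hash` pairs are assumed to use the BLAKE2B and SHA512 names seen in the example Manifest.

```python
from __future__ import annotations

import hashlib

# Hypothetical helpers, not Portage API. Algorithm names follow the
# BLAKE2B/SHA512 pair shown in the example Manifest.
ALGOS = {
    "BLAKE2B": lambda data: hashlib.blake2b(data).hexdigest(),
    "SHA512": lambda data: hashlib.sha512(data).hexdigest(),
}


def dist_entry(name: str, data: bytes) -> str:
    """Build one 'DIST <name> <size> ALGO hash ...' line for a distfile."""
    hashes = " ".join(f"{algo} {fn(data)}" for algo, fn in ALGOS.items())
    return f"DIST {name} {len(data)} {hashes}"


def direct_manifest(order: list[str], distfiles: dict[str, bytes]) -> bytes:
    """Step 2: hash the fetched distfiles into a temporary direct Manifest,
    emitting entries in the order listed by the indirect Manifest."""
    lines = [dist_entry(name, distfiles[name]) for name in order]
    return ("\n".join(lines) + "\n").encode()


def parse_package_line(line: str) -> tuple[str, list[str], dict[str, str]]:
    """Split 'PACKAGE cat/pkg-ver file1 file2 ... ALGO1 hash1 ALGO2 hash2'
    into (cpv, ordered file list, {algo: hash})."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "PACKAGE":
        raise ValueError(f"not a PACKAGE line: {line!r}")
    cpv, rest = tokens[1], tokens[2:]
    expected: dict[str, str] = {}
    # Peel ALGO/hash pairs off the end; whatever remains is the file list.
    while len(rest) >= 2 and rest[-2] in ALGOS:
        expected[rest[-2]] = rest[-1]
        rest = rest[:-2]
    return cpv, rest, expected


def verify_indirect(line: str, distfiles: dict[str, bytes]) -> bool:
    """Step 3: hash the temporary Manifest and compare with the stored hashes."""
    _cpv, order, expected = parse_package_line(line)
    manifest = direct_manifest(order, distfiles)
    return bool(expected) and all(
        ALGOS[algo](manifest) == digest for algo, digest in expected.items()
    )


if __name__ == "__main__":
    # Step 1 is simulated: these bytes stand in for fetched distfiles.
    files = {"dist.tar.gz": b"tarball bytes", "dist.tar.gz.asc": b"signature bytes"}
    manifest = direct_manifest(["dist.tar.gz", "dist.tar.gz.asc"], files)
    pairs = " ".join(f"{algo} {fn(manifest)}" for algo, fn in ALGOS.items())
    good = f"PACKAGE cat/pkg-1.0 dist.tar.gz dist.tar.gz.asc {pairs}"
    swapped = f"PACKAGE cat/pkg-1.0 dist.tar.gz.asc dist.tar.gz {pairs}"
    print(verify_indirect(good, files))     # True
    print(verify_indirect(swapped, files))  # False: same files, other order
```

Swapping the two file names in the `PACKAGE` line makes `verify_indirect` return False even though every distfile is intact, which is the false negative described above; storing the order in the indirect Manifest is what makes the recreated bytes reproducible.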
>
>
> > Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> > containing the distfiles (and potentially other files if a repo does not
> > have thin-manifests enabled) and their hashes in the order specified
> > previously.
> >
> > The indirect Manifest as described above would be large-ish for a
> > package that has lots of distfiles, but likely much smaller than if each
> > distfile had its set of hashes stored directly.
>
> Without storing the filenames, the Manifest file would have the same
> small size for any number of distfiles needed.
>

Assuming layout.conf is parsed when the Manifest is verified (thus handling the
thick Manifest case), the file list can be omitted if GLEP 74 is amended to
specify an ordering for the entries.

Side note: Portage itself does not seem to care about the ordering. I tested
this by copying a package tree, moving some entries around, and running
`ebuild /path/to/ebuild clean unpack`.

- Oskari
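If GLEP 74 fixed an entry order, as suggested at the end of the message, the file list could be dropped and any tool would reproduce the same bytes before hashing. A minimal Python sketch, assuming a hypothetical rule of sorting by entry type and then file name (plain code-point comparison); this rule is not claimed to be defined by GLEP 74 or implemented by Portage. The entries are the example Manifest lines from the message, with the hashes still elided. Under this rule `package-r1.ebuild` sorts before `package.ebuild` because `-` precedes `.`, which happens to match the pkgdev output shown above.

```python
from __future__ import annotations

import hashlib


def entry_key(entry: str) -> tuple[str, str]:
    """Hypothetical canonical order: entry type first, then file name."""
    kind, name = entry.split()[:2]
    return kind, name


def canonical_manifest(entries: list[str]) -> bytes:
    """Serialize Manifest entries in canonical order, one per line."""
    return ("\n".join(sorted(entries, key=entry_key)) + "\n").encode()


def manifest_digest(entries: list[str]) -> str:
    """BLAKE2B of the canonical form: the single value an indirect Manifest
    would need to store when no file list is kept."""
    return hashlib.blake2b(canonical_manifest(entries)).hexdigest()


if __name__ == "__main__":
    # The example Manifest from the message, with its hashes still elided.
    pkgdev_order = [
        "DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...",
        "DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...",
        "EBUILD package-r1.ebuild 1535 BLAKE2B ... SHA512 ...",
        "EBUILD package.ebuild 1536 BLAKE2B ... SHA512 ...",
        "MISC metadata.xml 959 BLAKE2B ... SHA512 ...",
    ]
    other_order = list(reversed(pkgdev_order))  # stand-in for $OTHER_TOOLING
    naive = [hashlib.blake2b("\n".join(o).encode()).hexdigest()
             for o in (pkgdev_order, other_order)]
    print(naive[0] == naive[1])  # False: raw order leaks into the hash
    print(manifest_digest(pkgdev_order) == manifest_digest(other_order))  # True
```

The trade-off mirrors the thread: a fixed ordering keeps the indirect Manifest at a constant size, while an explicit file list avoids having to standardize an ordering at all.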