public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] EGO_SUM
@ 2023-04-17  7:37 Florian Schmaus
  2023-04-17  9:28 ` [gentoo-dev] EGO_SUM Anna (cybertailor) Vyalkova
                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-04-17  7:37 UTC
  To: gentoo-dev



I want to continue the discussion to re-instate EGO_SUM, potentially 
leading to a democratic vote on whether EGO_SUM should be re-instated or 
deprecated.

For the past few months, I have tried to find *technical reasons*, e.g.,
reasons that affect end-users, that justify the deprecation of EGO_SUM. However,
I was unable to find any. The closest thing I could find was portage 
being unable to process an ebuild due to its large environment (bug 
830187). However, as this happens while developing an ebuild, it should 
never affect users. Obviously, this is a situation where EGO_SUM cannot
be used. Fortunately, it does not affect most Go packages, as seen in my 
previous analysis of Go packages in ::gentoo and their EGO_SUM size. 
Furthermore, newer portage versions, with USE=gentoo-dev, will 
proactively warn you if the environment caused by the ebuild becomes large.
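
As a rough illustration of the environment-size concern, the following
Python sketch measures how large a process environment is and flags a
single oversized variable. It is not Portage's actual check. The 128 KiB
per-string limit is an assumption based on the Linux kernel's limit for
one argument or environment string with 4 KiB pages, and `env_report` is
a hypothetical helper name.

```python
import os

# Assumption: on Linux a single environment string ("NAME=value") is limited
# to 32 pages, i.e. 128 KiB with 4 KiB pages. Adjust for other platforms.
ASSUMED_MAX_STRING = 32 * 4096


def env_report(env, max_string=ASSUMED_MAX_STRING):
    """Return (total_bytes, largest_name, largest_bytes, too_large)."""
    sizes = {
        # "NAME=value" plus the terminating NUL byte.
        name: len(name.encode()) + 1 + len(value.encode()) + 1
        for name, value in env.items()
    }
    total = sum(sizes.values())
    largest_name = max(sizes, key=sizes.get) if sizes else None
    largest = sizes.get(largest_name, 0)
    return total, largest_name, largest, largest > max_string


if __name__ == "__main__":
    total, name, largest, too_large = env_report(dict(os.environ))
    print(f"total={total} B, largest={name} ({largest} B), too large: {too_large}")
```

A very long EGO_SUM ends up in variables derived from it, so a check of
this kind would trip on the single largest variable well before the total
environment size becomes a problem.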

All further arguments for the deprecation of EGO_SUM were of a cosmetic
nature.

However, the deprecation of EGO_SUM is harmful to Gentoo and its users. 
To briefly re-iterate the reasons:

The EGO_SUM alternatives
- do not have the same level of trust and therefore have a negative 
impact on security (a dubious tarball someone put somewhere, especially 
when proxy-maint)
- are not easily verifiable
- require additional effort when developing ebuilds
- hinder the packaging and Gentoo's adoption of Go-based projects, which 
is worrisome as Go is very popular
- prevent Go modules from being shared as DISTFILES on the mirrors 
across various packages

Last but not least, we have the same situation in the Rust ecosystem, 
but we allow the EGO_SUM "equivalent" there.

So with portage checking the environment of ebuilds and warning if it 
becomes too large, and with the arguments above, I do not see any reason 
we should outlaw EGO_SUM.

- Flow

Previous discussions:
https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
https://archives.gentoo.org/gentoo-dev/message/d78af7f168cef24bfa302f7f75c3ef11


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [gentoo-dev] Re: EGO_SUM
  2023-04-17  7:37 [gentoo-dev] EGO_SUM Florian Schmaus
@ 2023-04-17  9:28 ` Anna (cybertailor) Vyalkova
  2023-04-27 18:00   ` William Hubbs
  2023-04-24 16:11 ` Florian Schmaus
  2023-06-01 19:55 ` [gentoo-dev] EGO_SUM William Hubbs
  2 siblings, 1 reply; 52+ messages in thread
From: Anna (cybertailor) Vyalkova @ 2023-04-17  9:28 UTC
  To: gentoo-dev

On 2023-04-17 09:37, Florian Schmaus wrote:
> The EGO_SUM alternatives
> - do not have the same level of trust and therefore have a negative 
> impact on security (a dubious tarball someone put somewhere, especially 
> when proxy-maint)

Solution: generate release tarballs in upstream CI/CD.

> - are not easily verifiable

`go mod verify` (called by eclass) does part of the job.

> - require additional effort when developing ebuilds

Generating EGO_SUM needs effort on every bump too.

> - hinder the packaging and Gentoo's adoption of Go-based projects, which 
> is worrisome as Go is very popular

Go's approach to package management is the prime cause after all.
Downstream can only choose what workaround to apply.

> - prevent Go modules from being shared as DISTFILES on the mirrors 
> across various packages

Go modules often use pinned commits, so only a small share is reused.
 
> Last but not least, we have the same situation in the Rust ecosystem, 
> but we allow the EGO_SUM "equivalent" there.

Rust crates are not such a disaster as Go modules.



* [gentoo-dev] Re: EGO_SUM
  2023-04-17  7:37 [gentoo-dev] EGO_SUM Florian Schmaus
  2023-04-17  9:28 ` [gentoo-dev] EGO_SUM Anna (cybertailor) Vyalkova
@ 2023-04-24 16:11 ` Florian Schmaus
  2023-04-24 20:28   ` Sam James
  2023-05-30 15:52   ` Florian Schmaus
  2023-06-01 19:55 ` [gentoo-dev] EGO_SUM William Hubbs
  2 siblings, 2 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-04-24 16:11 UTC
  To: council; +Cc: gentoo-dev

I would like to ask the Gentoo council to vote on whether EGO_SUM should be
reinstated ("un-deprecated") or not.

EGO_SUM is a project-comprehensive matter, as it affects not only 
Go-lang packaging but also the proxy-maint and GURU projects. 
Furthermore, as I have mentioned in my previous emails, the deprecation 
of EGO_SUM has a significant negative impact on our users and is, 
therefore, a global Gentoo issue.

Asking for council involvement should be a last resort and only be done 
in essential conflicts. But, unfortunately, I was unable to convince the 
relevant maintainer with arguments that the deprecation of EGO_SUM is 
harmful. And this matter is significant enough to proceed with this.

Most voices on the related mailing-list threads expressed support for
reinstating EGO_SUM. At least, that is my impression. The arguments
used to deprecate EGO_SUM, by contrast, were mostly of an aesthetic nature.

I want to state what should be common sense. Namely, asking for a 
democratic vote is not a personal attack against any involved person. I 
contacted the council because of a design dispute about what is best for 
Gentoo and its users. Furthermore, I tried to create the best
preconditions for reinstating EGO_SUM by working on Portage, for example
by making Portage emit a warning if an ebuild's process environment grows
unreasonably large. After advocating for reinstating EGO_SUM for a
long time, I now seek closure with this request to the council.

- Flow


On 17/04/2023 09.37, Florian Schmaus wrote:
> I want to continue the discussion to re-instate EGO_SUM, potentially 
> leading to a democratic vote on whether EGO_SUM should be re-instated or 
> deprecated.
> 
> For the past months, I tried to find *technical reasons*, e.g., reasons 
> that affect end-users, that justify the deprecation of EGO_SUM. However, 
> I was unable to find any. The closest thing I could find was portage 
> being unable to process an ebuild due to its large environment (bug 
> 830187). However, as this happens while developing an ebuild, it should 
> never affect users. Obviously this is a situation where EGO_SUM can not 
> be used. Fortunately, it does not affect most Go packages, as seen in my 
> previous analysis of Go packages in ::gentoo and their EGO_SUM size. 
> Furthermore, newer portage versions, with USE=gentoo-dev, will 
> proactively warn you if the environment caused by the ebuild becomes large.
> 
> All further arguments for the deprecation of EGO_SUM were of a cosmetic
> nature.
> 
> However, the deprecation of EGO_SUM is harmful to Gentoo and its users. 
> To briefly re-iterate the reasons:
> 
> The EGO_SUM alternatives
> - do not have the same level of trust and therefore have a negative 
> impact on security (a dubious tarball someone put somewhere, especially 
> when proxy-maint)
> - are not easily verifiable
> - require additional effort when developing ebuilds
> - hinder the packaging and Gentoo's adoption of Go-based projects, which 
> is worrisome as Go is very popular
> - prevent Go modules from being shared as DISTFILES on the mirrors 
> across various packages
> 
> Last but not least, we have the same situation in the Rust ecosystem, 
> but we allow the EGO_SUM "equivalent" there.
> 
> So with portage checking the environment of ebuilds and warning if it 
> becomes too large, and with the arguments above, I do not see any reason 
> we should outlaw EGO_SUM.
> 
> - Flow
> 
> Previous discussions:
> https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
> https://archives.gentoo.org/gentoo-dev/message/d78af7f168cef24bfa302f7f75c3ef11




* [gentoo-dev] Re: EGO_SUM
  2023-04-24 16:11 ` Florian Schmaus
@ 2023-04-24 20:28   ` Sam James
  2023-04-24 22:52     ` Alexey Zapparov
  2023-04-26 15:31     ` Florian Schmaus
  2023-05-30 15:52   ` Florian Schmaus
  1 sibling, 2 replies; 52+ messages in thread
From: Sam James @ 2023-04-24 20:28 UTC
  To: Florian Schmaus; +Cc: council, gentoo-dev, William Hubbs



Florian Schmaus <flow@gentoo.org> writes:

[CCing williamh@ as go-module.eclass & dev-lang/go maintainer.]

> I like to ask the Gentoo council to vote on whether EGO_SUM should be
> reinstated ("un-deprecated") or not.
>
> EGO_SUM is a project-comprehensive matter, as it affects not only
> Go-lang packaging but also the proxy-maint and GURU
> projects. Furthermore, as I have mentioned in my previous emails, the
> deprecation of EGO_SUM has a significant negative impact on our users
> and is, therefore, a global Gentoo issue.
>
> Asking for council involvement should be a last resort and only be
> done in essential conflicts. But, unfortunately, I was unable to
> convince the relevant maintainer with arguments that the deprecation
> of EGO_SUM is harmful. And this matter is significant enough to
> proceed with this.

My feeling on this is that this proposal isn't yet complete enough
for the council to assess. In the various previous discussions, the need
for _some_ limit to be implemented (derived from EGO_SUM) was clear from
the QA team and others.

Voting on the matter now would be reopening the issue which led EGO_SUM
to be deprecated in the first place, with only a partial mitigation
(the Portage warning).

Any such limit should be supported by pkgcheck, allow using EGO_SUM
for most packages, but exclude the pathological cases which we're
unlikely to want in ::gentoo.

(Limit-per-ebuild rather than per-package is one option of many,
too.)

>
> Most voices on the related mailing-list threads expressed support for
> reinstating EGO_SUM. At least, that is my impression. While the
> arguments used to deprecate EGO_SUM were mostly of esthetic nature.
>
> I want to state what should be common sense. Namely, asking for a
> democratic vote is not a personal attack against any involved
> person.
> [...]

I agree this is an important issue that affects the practicality
of using Gentoo for some, and for contributing to Gentoo to others.

>
> On 17/04/2023 09.37, Florian Schmaus wrote:
>> [original msg snipped]


* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-24 20:28   ` Sam James
@ 2023-04-24 22:52     ` Alexey Zapparov
  2023-04-26 15:31     ` Florian Schmaus
  1 sibling, 0 replies; 52+ messages in thread
From: Alexey Zapparov @ 2023-04-24 22:52 UTC
  To: gentoo-dev; +Cc: Florian Schmaus, council, William Hubbs

My 2 cents. As somebody who contributes to ::guru, I would like to
second that having to host dependency tarballs feels like an
obstacle. Persuading upstream projects to bundle their dependencies
is often difficult (it's hard to convince developers to change their
workflows to make the life of ebuild packagers easier). That in turn
leads to forking the project on GitHub/GitLab with the sole goal of
cutting a release of a dependency tarball.

On Mon, Apr 24, 2023 at 10:33 PM Sam James <sam@gentoo.org> wrote:
>
>
> Florian Schmaus <flow@gentoo.org> writes:
>
> [CCing williamh@ as go-module.eclass & dev-lang/go maintainer.]
>
> > I like to ask the Gentoo council to vote on whether EGO_SUM should be
> > reinstated ("un-deprecated") or not.
> >
> > EGO_SUM is a project-comprehensive matter, as it affects not only
> > Go-lang packaging but also the proxy-maint and GURU
> > projects. Furthermore, as I have mentioned in my previous emails, the
> > deprecation of EGO_SUM has a significant negative impact on our users
> > and is, therefore, a global Gentoo issue.
> >
> > Asking for council involvement should be a last resort and only be
> > done in essential conflicts. But, unfortunately, I was unable to
> > convince the relevant maintainer with arguments that the deprecation
> > of EGO_SUM is harmful. And this matter is significant enough to
> > proceed with this.
>
> My feeling on this is that this proposal isn't yet complete enough
> for the council to assess. In the various previous discussions, the need
> for _some_ limit to be implemented (derived from EGO_SUM) was clear from
> the QA team and others.
>
> Voting on the matter now would be reopening the issue which led EGO_SUM
> to be deprecated in the first place, with only a partial mitigation
> (the Portage warning).
>
> Any such limit should be supported by pkgcheck, allow using EGO_SUM
> for most packages, but exclude the pathological cases which we're
> unlikely to want in ::gentoo.
>
> (Limit-per-ebuild rather than per-package is one option of many,
> too.)
>
> >
> > Most voices on the related mailing-list threads expressed support for
> > reinstating EGO_SUM. At least, that is my impression. While the
> > arguments used to deprecate EGO_SUM were mostly of esthetic nature.
> >
> > I want to state what should be common sense. Namely, asking for a
> > democratic vote is not a personal attack against any involved
> > person.
> > [...]
>
> I agree this is an important issue that affects the practicality
> of using Gentoo for some, and for contributing to Gentoo to others.
>
> >
> > On 17/04/2023 09.37, Florian Schmaus wrote:
> >> [original msg snipped]



* [gentoo-dev] Re: EGO_SUM
  2023-04-24 20:28   ` Sam James
  2023-04-24 22:52     ` Alexey Zapparov
@ 2023-04-26 15:31     ` Florian Schmaus
  2023-04-26 16:12       ` Matt Turner
  2023-04-26 20:51       ` Sam James
  1 sibling, 2 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-04-26 15:31 UTC
  To: Sam James, council; +Cc: gentoo-dev, William Hubbs

Hi Sam,

thanks for your feedback. I am glad for everyone who engages in this 
discussion and shares their views and new information.

On 24/04/2023 22.28, Sam James wrote:
> Florian Schmaus <flow@gentoo.org> writes:
> 
> [CCing williamh@ as go-module.eclass & dev-lang/go maintainer.]
> 
>> I like to ask the Gentoo council to vote on whether EGO_SUM should be
>> reinstated ("un-deprecated") or not
>
> In the various previous discussions, the need
> for _some_ limit to be implemented (derived from EGO_SUM) was clear from
> the QA team and others.

Asking to impose an artificial limit is based on the same unfounded 
belief under which EGO_SUM was deprecated in the first place. I am 
worried that if we follow this, then a potential next step is to argue 
about adding packages to ::gentoo.


> Voting on the matter now would be reopening the issue which led EGO_SUM
> to be deprecated in the first place, with only a partial mitigation
> (the Portage warning).

I am sorry, but I do not follow. I think this is partly because it is 
not clear "what" (else) to mitigate.

The discussion would be more productive if someone who is supporting the 
EGO_SUM deprecation could rationally summarize the main arguments why we 
deprecated EGO_SUM.


> Any such limit should be supported by pkgcheck, allow using EGO_SUM
> for most packages, but exclude the pathological cases which we're
> unlikely to want in ::gentoo.
> 
> (Limit-per-ebuild rather than per-package is one option of many,
> too.)

As you have probably noticed, I do not see why we should impose such a
limit. A per-package limit in particular restricts the ability to provide
the user with multiple versions of a package, which sometimes comes in
handy [1].


>> Most voices on the related mailing-list threads expressed support for
>> reinstating EGO_SUM. At least, that is my impression. While the
>> arguments used to deprecate EGO_SUM were mostly of esthetic nature.
>>
>> I want to state what should be common sense. Namely, asking for a
>> democratic vote is not a personal attack against any involved
>> person.
>> [...]
> 
> I agree this is an important issue that affects the practicality
> of using Gentoo for some, and for contributing to Gentoo to others.

A data point: just in the last few days, multiple users reported issues
in #-guru that they would not have had if EGO_SUM had not been deprecated.

- Flow

1: From my experience, this is also something Gentoo is praised for.



* [gentoo-dev] Re: EGO_SUM
  2023-04-26 15:31     ` Florian Schmaus
@ 2023-04-26 16:12       ` Matt Turner
  2023-04-26 19:31         ` Andrew Ammerlaan
  2023-04-27  7:58         ` Florian Schmaus
  2023-04-26 20:51       ` Sam James
  1 sibling, 2 replies; 52+ messages in thread
From: Matt Turner @ 2023-04-26 16:12 UTC
  To: Florian Schmaus; +Cc: Sam James, council, gentoo-dev, William Hubbs

On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
> The discussion would be more productive if someone who is supporting the
> EGO_SUM deprecation could rationally summarize the main arguments why we
> deprecated EGO_SUM.

You're requesting the changes. It's on you to read the previous
threads and try to understand. It's not others' responsibilities to
justify the status quo to you, but tl;dr is Manifest files grew to
insane sizes for golang packages with many dependencies, and the
Manifest size is a cost all Gentoo users pay regardless of whether
they use the package.



* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-26 16:12       ` Matt Turner
@ 2023-04-26 19:31         ` Andrew Ammerlaan
  2023-04-26 19:38           ` Chris Pritchard
  2023-04-26 20:47           ` Matt Turner
  2023-04-27  7:58         ` Florian Schmaus
  1 sibling, 2 replies; 52+ messages in thread
From: Andrew Ammerlaan @ 2023-04-26 19:31 UTC
  To: gentoo-dev

On 26/04/2023 18:12, Matt Turner wrote:
> On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
>> The discussion would be more productive if someone who is supporting the
>> EGO_SUM deprecation could rationally summarize the main arguments why we
>> deprecated EGO_SUM.
> 
> You're requesting the changes. It's on you to read the previous
> threads and try to understand. It's not others' responsibilities to
> justify the status quo to you, but tl;dr is Manifest files grew to
> insane sizes for golang packages with many dependencies, and the
> Manifest size is a cost all Gentoo users pay regardless of whether
> they use the package.
> 

This is a valid point and I think it is clear. What is not clear however 
is why the EGO_SUM method should be dropped entirely instead of keeping 
it as an option for overlays (with an appropriate warning). As I 
remember this is where the discussion got 'stuck' last time.

There are other cases where things are possible but prohibited in 
::gentoo by policy. E.g. the acct-user eclass allows setting 
ACCT_USER_ID to -1 for dynamic assignment, but we do not allow this in 
::gentoo. I don't see why we could not do the same for EGO_SUM, keep it 
as an option, while disallowing it in ::gentoo.

This way ridiculously large manifests are gone out of ::gentoo. But 
overlays can still use the EGO_SUM method for their go packages if a 
tarball is too much of a hassle. And everyone is happy. It is then the 
responsibility of the overlay maintainers to ensure that their manifests 
don't grow out of hand. A warning from the eclass and/or pkgcheck should 
ensure that they are aware of the potential problem.
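
The policy proposed here (an error in ::gentoo, a warning elsewhere) could
be sketched roughly as follows. This is only an illustration in Python:
`ego_sum_verdict` is a hypothetical function, not an existing pkgcheck
check, and matching on the literal text `EGO_SUM=` is a simplifying
assumption.

```python
import re

# Matches an assignment or append to EGO_SUM at the start of a line.
EGO_SUM_RE = re.compile(r"^\s*EGO_SUM\+?=", re.MULTILINE)


def ego_sum_verdict(repo_name, ebuild_text):
    """Return 'ok', 'warning' or 'error' for an ebuild in the given repository."""
    if not EGO_SUM_RE.search(ebuild_text):
        return "ok"
    # Hypothetical policy: forbidden in ::gentoo, allowed with a warning elsewhere.
    return "error" if repo_name == "gentoo" else "warning"


if __name__ == "__main__":
    ebuild = 'EGO_SUM=(\n\t"example.org/mod v1.0.0"\n)\n'
    for repo in ("gentoo", "guru"):
        print(repo, ego_sum_verdict(repo, ebuild))
```

Keying the result on the repository name mirrors the ACCT_USER_ID case:
the eclass keeps the feature, and the restriction lives in the policy
check.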

What am I missing? I truly do not understand why this matter is not 
resolved already and why we continue to have this discussion again and 
again. The solution just seems so simple.

Best regards,
Andrew



* RE: [gentoo-dev] Re: EGO_SUM
  2023-04-26 19:31         ` Andrew Ammerlaan
@ 2023-04-26 19:38           ` Chris Pritchard
  2023-04-26 20:47           ` Matt Turner
  1 sibling, 0 replies; 52+ messages in thread
From: Chris Pritchard @ 2023-04-26 19:38 UTC
  To: gentoo-dev@lists.gentoo.org

> This way ridiculously large manifests are gone out of ::gentoo. But overlays can
> still use the EGO_SUM method for their go packages if a tarball is too much of
> a hassle. And everyone is happy. It is then the responsibility of the overlay
> maintainers to ensure that their manifests don't grow out of hand. A warning
> from the eclass and/or pkgcheck should ensure that they are aware of the
> potential problem.
> 
> What am I missing? I truly do not understand why this matter is not resolved
> already and why we continue to have this discussion again and again. The
> solution just seems so simple.

I agree that this is a viable solution. Hosting vendor tarballs on Gentoo
infrastructure is possible, though there would need to be a way for proxy
maintainers to upload and host them. But deprecating EGO_SUM and then
removing it as an option for overlays is, in my view, a poor move. It adds
a significant burden for overlay maintainers, who may have to pay for
hosting the vendor tarballs, fork repositories, or even stop contributing
altogether.

Chris


* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-26 19:31         ` Andrew Ammerlaan
  2023-04-26 19:38           ` Chris Pritchard
@ 2023-04-26 20:47           ` Matt Turner
  1 sibling, 0 replies; 52+ messages in thread
From: Matt Turner @ 2023-04-26 20:47 UTC
  To: gentoo-dev

On Wed, Apr 26, 2023 at 3:31 PM Andrew Ammerlaan
<andrewammerlaan@gentoo.org> wrote:
>
> On 26/04/2023 18:12, Matt Turner wrote:
> > On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
> >> The discussion would be more productive if someone who is supporting the
> >> EGO_SUM deprecation could rationally summarize the main arguments why we
> >> deprecated EGO_SUM.
> >
> > You're requesting the changes. It's on you to read the previous
> > threads and try to understand. It's not others' responsibilities to
> > justify the status quo to you, but tl;dr is Manifest files grew to
> > insane sizes for golang packages with many dependencies, and the
> > Manifest size is a cost all Gentoo users pay regardless of whether
> > they use the package.
> >
>
> This is a valid point and I think it is clear. What is not clear however
> is why the EGO_SUM method should be dropped entirely instead of keeping
> it as an option for overlays (with an appropriate warning). As I
> remember this is where the discussion got 'stuck' last time.
>
> There are other cases where things are possible but prohibited in
> ::gentoo by policy. E.g. the acct-user eclass allows setting
> ACCT_USER_ID to -1 for dynamic assignment, but we do not allow this in
> ::gentoo. I don't see why we could not do the same for EGO_SUM, keep it
> as an option, while disallowing it in ::gentoo.

I suspect allowing it unrestricted in overlays is fine—which seems to
be the major practical issue that spurred this thread.

Sam suggested a requirement for a maximum Manifest size (presumably
thinking about ::gentoo), and Florian replied:

> Asking to impose an artificial limit is based on the same unfounded
> belief under which EGO_SUM was deprecated in the first place. I am
> worried that if we follow this, then a potential next step is to argue
> about adding packages to ::gentoo.

So I think that's where the disagreement is.



* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-26 15:31     ` Florian Schmaus
  2023-04-26 16:12       ` Matt Turner
@ 2023-04-26 20:51       ` Sam James
  1 sibling, 0 replies; 52+ messages in thread
From: Sam James @ 2023-04-26 20:51 UTC
  To: gentoo-dev; +Cc: council, William Hubbs



Florian Schmaus <flow@gentoo.org> writes:

> Hi Sam,
>
> thanks for your feedback. I am glad for everyone who engages in this
> discussion and shares their views and new information.
>
> On 24/04/2023 22.28, Sam James wrote:
>> Florian Schmaus <flow@gentoo.org> writes:
>> [CCing williamh@ as go-module.eclass & dev-lang/go maintainer.]
>> 
>>> I like to ask the Gentoo council to vote on whether EGO_SUM should be
>>> reinstated ("un-deprecated") or not
>>
>> In the various previous discussions, the need
>> for _some_ limit to be implemented (derived from EGO_SUM) was clear from
>> the QA team and others.
>
> Asking to impose an artificial limit is based on the same unfounded
> belief under which EGO_SUM was deprecated in the first place. I am
> worried that if we follow this, then a potential next step is to argue
> about adding packages to ::gentoo.
>
>
>> Voting on the matter now would be reopening the issue which led EGO_SUM
>> to be deprecated in the first place, with only a partial mitigation
>> (the Portage warning).
>
> I am sorry, but I do not follow. I think this is partly because it is
> not clear "what" (else) to mitigate.
>
> The discussion would be more productive if someone who is supporting
> the EGO_SUM deprecation could rationally summarize the main arguments
> why we deprecated EGO_SUM.

I think Matt handled this in his reply.

>
>
>> Any such limit should be supported by pkgcheck, allow using EGO_SUM
>> for most packages, but exclude the pathological cases which we're
>> unlikely to want in ::gentoo.
>> (Limit-per-ebuild rather than per-package is one option of many,
>> too.)
>
> As you probably noticed, I am not aware why we should impose such a
> limit. Especially a per-package limit confines the ability to provide
> the user with multiple versions of a package, which sometimes comes in
> handy [1].

You added a check to Portage (thank you!) to warn when the environment
size is too big. This is a runtime/dynamic check which we can't
determine purely from the repository, so pkgcheck can't notice it.

I would like pkgcheck to have an approximation of a too-large A
in an ebuild (can use Manifest as a proxy if required) derived from
the maximum environment size.
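
A static approximation of this kind could look roughly like the Python
sketch below, which uses the DIST entries of a Manifest as a proxy for A.
It is not pkgcheck code: the function names and the 128 KiB threshold are
assumptions for illustration, and a per-package Manifest overestimates A
for any single ebuild because it lists the distfiles of all versions.

```python
# Hypothetical threshold; the thread does not settle on a number.
ASSUMED_A_LIMIT = 128 * 1024


def dist_names(manifest_text):
    """Yield distfile names from the DIST lines of a Manifest."""
    for line in manifest_text.splitlines():
        fields = line.split()
        # Expected shape: DIST <name> <size> <hash type> <hash> ...
        if len(fields) >= 3 and fields[0] == "DIST":
            yield fields[1]


def estimate_a_size(manifest_text):
    """Approximate the length of A: file names joined by single spaces."""
    names = list(dist_names(manifest_text))
    return sum(len(name) for name in names) + max(len(names) - 1, 0)


def a_too_large(manifest_text, limit=ASSUMED_A_LIMIT):
    """Return True if the estimated A exceeds the (assumed) limit."""
    return estimate_a_size(manifest_text) > limit
```

Because this only reads files in the repository, it can run without
sourcing the ebuild, which is the property the runtime check in Portage
lacks.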

I thought I'd communicated that need for the counterpart before.

thanks,
sam


* [gentoo-dev] Re: EGO_SUM
  2023-04-26 16:12       ` Matt Turner
  2023-04-26 19:31         ` Andrew Ammerlaan
@ 2023-04-27  7:58         ` Florian Schmaus
  2023-04-27  9:24           ` Ulrich Mueller
                             ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-04-27  7:58 UTC
  To: Matt Turner, gentoo-dev; +Cc: Sam James, council, William Hubbs



On 26/04/2023 18.12, Matt Turner wrote:
> On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
>> The discussion would be more productive if someone who is supporting the
>> EGO_SUM deprecation could rationally summarize the main arguments why we
>> deprecated EGO_SUM.
> 
> You're requesting the changes. It's on you to read the previous
> threads and try to understand. It's not others' responsibilities to
> justify the status quo to you, but tl;dr is Manifest files grew to
> insane sizes for golang packages with many dependencies, and the
> Manifest size is a cost all Gentoo users pay regardless of whether
> they use the package.

I am sorry. I did try to understand the reasoning in the previous
threads. However, I do not conclude that the "cost" users must pay for
EGO_SUM justifies EGO_SUM's deprecation. It is the other way around:
given EGO_SUM's advantages, its deprecation is not justified, even if
users have to pay a cost.

You write that "Manifest files grew to insane sizes".

At which boundary does a package size, the total size of the package's 
directory, become insane?

Disk space is cheap. Currently, ::gentoo, without metadata, is around 
470 MiB. If you add 10 Go packages with a whopping 200 KiB each, then 
this adds 2 MiB to that. I need someone to explain how this constitutes 
an issue with disk space. Even if we add 100 Go packages, probably 
roughly the number of Go packages we have in ::gentoo, then those 20 MiB 
are not significant. Needless to say, the average size of a Go
package is less than the 200 KiB used in this calculation.

Network traffic, while also being cheap, may be more of an issue. 
Currently, gentoo-latest.tar.xz is ~41 MiB. So on a conservative 
approximation ::gentoo compresses to 1/10. So, the 10 Go-packages cause 
200 KiB of additional traffic. Even when using a low-bandwidth 
connection, say 12 KiB/s, this only adds 17 extra seconds to the 
transfer duration.

Moreover, the rate of change would be a better metric if we want to 
quantify the cost of a package. For example, a user who syncs daily is 
more affected by a package with 1 KiB of daily changes than a 200 KiB 
package that changes once per year.
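
The estimates in the preceding paragraphs can be reproduced with a few
lines of Python. The figures are the ones given in the text (470 MiB
repository, 200 KiB per package, compression to 1/10, 12 KiB/s); the
function names are hypothetical and the model deliberately ignores
protocol overhead.

```python
KIB = 1024
MIB = 1024 * KIB

REPO_SIZE = 470 * MIB      # ::gentoo without metadata, figure from the text
PACKAGE_SIZE = 200 * KIB   # assumed size of one large Go package
BANDWIDTH = 12 * KIB       # bytes per second, low-bandwidth example


def added_disk(packages, package_size=PACKAGE_SIZE):
    """Uncompressed bytes added by the given number of packages."""
    return packages * package_size


def transfer_seconds(raw_bytes, compression_divisor=10, bandwidth=BANDWIDTH):
    """Seconds needed to transfer raw_bytes after compression."""
    return raw_bytes / compression_divisor / bandwidth


def yearly_sync_bytes(bytes_per_change, changes_per_year):
    """Bytes a regularly syncing user downloads per year for one package."""
    return bytes_per_change * changes_per_year


if __name__ == "__main__":
    disk = added_disk(10)
    print(f"disk: {disk / MIB:.2f} MiB ({disk / REPO_SIZE:.2%} of the repository)")
    print(f"transfer: {transfer_seconds(disk):.0f} s")
    print(f"1 KiB daily: {yearly_sync_bytes(1 * KIB, 365) // KIB} KiB/year")
    print(f"200 KiB yearly: {yearly_sync_bytes(200 * KIB, 1) // KIB} KiB/year")
```

The last two lines illustrate the rate-of-change argument: under these
assumptions the small package that changes daily costs more traffic per
year than the large one that changes once.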

The "cost" for EGO_SUM is negligible.

- Flow


* [gentoo-dev] Re: EGO_SUM
  2023-04-27  7:58         ` Florian Schmaus
@ 2023-04-27  9:24           ` Ulrich Mueller
  2023-04-28  6:59             ` Florian Schmaus
  2023-04-27 12:54           ` Michał Górny
  2023-04-27 21:16           ` Sam James
  2 siblings, 1 reply; 52+ messages in thread
From: Ulrich Mueller @ 2023-04-27  9:24 UTC
  To: Florian Schmaus
  Cc: Matt Turner, gentoo-dev, Sam James, council, William Hubbs


>>>>> On Thu, 27 Apr 2023, Florian Schmaus wrote:

> Network traffic, while also being cheap, may be more of an issue.
> Currently, gentoo-latest.tar.xz is ~41 MiB. So on a conservative
> approximation ::gentoo compresses to 1/10. So, the 10 Go-packages
> cause 200 KiB of additional traffic. Even when using a low-bandwidth
> connection, say 12 KiB/s, this only adds 17 extra seconds to the
> transfer duration.

Manifest files contain binary hashes which don't compress to 1/10.
A factor of 1/2 may be more realistic (since a byte is represented by
two hex digits).
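
The 1/2 estimate can be checked empirically with a small Python sketch.
It compresses hex-encoded pseudo-random bytes, which stand in for hash
values, with the standard `lzma` module (the snapshot is an .xz file).
Using random data as a stand-in for real Manifest hashes is an
assumption; real Manifest lines also contain file names and sizes, which
compress better.

```python
import binascii
import lzma
import random


def hex_compression_ratio(n_bytes=64 * 1024, seed=0):
    """Compressed size divided by original size for hex-encoded random bytes."""
    rng = random.Random(seed)
    # Pseudo-random bytes stand in for cryptographic hash values.
    raw = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    hex_text = binascii.hexlify(raw)  # two hex digits per byte
    return len(lzma.compress(hex_text)) / len(hex_text)


if __name__ == "__main__":
    print(f"ratio: {hex_compression_ratio():.2f}")
```

Each hex digit carries four bits of information in an eight-bit
character, so no compressor can do meaningfully better than 1/2 on this
kind of data. With that ratio, the 2 MiB from the quoted estimate would
cost about 1 MiB of traffic, or roughly a minute and a half at 12 KiB/s.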

Ulrich


* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27  7:58         ` Florian Schmaus
  2023-04-27  9:24           ` Ulrich Mueller
@ 2023-04-27 12:54           ` Michał Górny
  2023-04-27 23:12             ` Pascal Jäger
  2023-04-28  6:59             ` Florian Schmaus
  2023-04-27 21:16           ` Sam James
  2 siblings, 2 replies; 52+ messages in thread
From: Michał Górny @ 2023-04-27 12:54 UTC
  To: gentoo-dev, Matt Turner; +Cc: Sam James, council, William Hubbs

On Thu, 2023-04-27 at 09:58 +0200, Florian Schmaus wrote:
> Disk space is cheap.

No, it's not.  Gentoo supports more hardware than your average PC with
beefy hard drive and/or possibility of installing one.  Let's not forget
that you need a ::gentoo checkout even on a system running purely
on binary packages.

Let's not forget that git keeps all history, so every bump of a Go
package with large Manifest has a permanent negative impact on clone
size.  A few version bumps of Go packages can easily outweigh complete
history of hundreds of other packages.

> Network traffic, while also being cheap, may be more of an issue. 

Again, you're making assumptions based on living in a well-developed area
and discriminating against users who have shoddy Internet connectivity.

That said, this all was discussed in the past.  I really wish you would
humble down and try to find a solution that would work for everyone
instead of showing arrogance and lack of concern for users outside your
"majority" view of Gentoo.

-- 
Best regards,
Michał Górny



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-17  9:28 ` [gentoo-dev] EGO_SUM Anna (cybertailor) Vyalkova
@ 2023-04-27 18:00   ` William Hubbs
  2023-04-27 18:18     ` David Seifert
  0 siblings, 1 reply; 52+ messages in thread
From: William Hubbs @ 2023-04-27 18:00 UTC (permalink / raw
  To: gentoo-dev; +Cc: sam

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

On Mon, Apr 17, 2023 at 02:28:22PM +0500, Anna (cybertailor) Vyalkova wrote:
> On 2023-04-17 09:37, Florian Schmaus wrote:
> > The EGO_SUM alternatives
> > - do not have the same level of trust and therefore have a negative 
> > impact on security (a dubious tarball someone put somewhere, especially 
> > when proxy-maint)

I haven't read all of this thread yet, but I did speak with Sam last
night, and I have another idea about this.

- I still want to deprecate EGO_SUM, but I'm working in the background
  on reworking get-ego-vendor to generate the data that goes into
  src_uri directly. This would eliminate most of the processing in the
  eclass.
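
For readers unfamiliar with the mapping in question, here is a minimal sketch of how one EGO_SUM entry could be turned into a SRC_URI fragment. It is not the eclass's or get-ego-vendor's actual code: the function name is hypothetical, the `mirror://goproxy/` prefix is assumed, and the path layout follows the Go module proxy convention (`<module>/@v/<version>.mod|.zip`, uppercase letters escaped as `!` plus lowercase).

```python
import re


def ego_sum_to_src_uri(entry: str) -> str:
    """Map one EGO_SUM-style entry to a 'URI -> distfile' SRC_URI fragment."""
    module, version = entry.split()
    # A "/go.mod" suffix selects the .mod file; otherwise the module .zip.
    if version.endswith("/go.mod"):
        version, ext = version[: -len("/go.mod")], "mod"
    else:
        ext = "zip"
    # Go proxy case-escaping of the module path: "A" becomes "!a".
    # (Version strings are left unescaped in this simplified sketch.)
    escaped = re.sub(r"[A-Z]", lambda m: "!" + m.group().lower(), module)
    path = f"{escaped}/@v/{version}.{ext}"
    # Distfile names are flat, so slashes are percent-encoded.
    distfile = path.replace("/", "%2F")
    return f"mirror://goproxy/{path} -> {distfile}"


if __name__ == "__main__":
    print(ego_sum_to_src_uri("github.com/BurntSushi/toml v0.3.1/go.mod"))
```

Precomputing such lines outside the ebuild moves the string processing out of the eclass, but each module still contributes its own DIST entries to the Manifest.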

 
 That, however, doesn't remove the concern about big ebuilds and
 manifests. I will look at the remainder of the thread to figure out
 what is going on with that.

 William

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27 18:00   ` William Hubbs
@ 2023-04-27 18:18     ` David Seifert
  0 siblings, 0 replies; 52+ messages in thread
From: David Seifert @ 2023-04-27 18:18 UTC (permalink / raw
  To: gentoo-dev; +Cc: sam

On Thu, 2023-04-27 at 13:00 -0500, William Hubbs wrote:
>  That, however, doesn't remove the concern about big ebuilds and
>  manifests. I will look at the remainder of the thread to figure out
>  what is going on with that.

You do know that the main reason it was deprecated in ::gentoo was the
ballooning of manifests, not some SRC_URI-generating implementation
details of the eclass itself?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27  7:58         ` Florian Schmaus
  2023-04-27  9:24           ` Ulrich Mueller
  2023-04-27 12:54           ` Michał Górny
@ 2023-04-27 21:16           ` Sam James
  2023-05-02 19:32             ` Florian Schmaus
  2 siblings, 1 reply; 52+ messages in thread
From: Sam James @ 2023-04-27 21:16 UTC (permalink / raw
  To: gentoo-dev; +Cc: Matt Turner, council, William Hubbs

[-- Attachment #1: Type: text/plain, Size: 1960 bytes --]


Florian Schmaus <flow@gentoo.org> writes:

> On 26/04/2023 18.12, Matt Turner wrote:
>> On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
>>> The discussion would be more productive if someone who is supporting the
>>> EGO_SUM deprecation could rationally summarize the main arguments why we
>>> deprecated EGO_SUM.
>> You're requesting the changes. It's on you to read the previous
>> threads and try to understand. It's not others' responsibilities to
>> justify the status quo to you, but tl;dr is Manifest files grew to
>> insane sizes for golang packages with many dependencies, and the
>> Manifest size is a cost all Gentoo users pay regardless of whether
>> they use the package.
>
> I am sorry. I did try to understand the reasoning in the previous
> threads. However, I do not conclude that the "cost" users must pay for
> EGO_SUM justifies EGO_SUM's deprecation. It is the other way around:
> EGO_SUM's advantages do not explain its deprecation, even if users
> have to pay a cost.
>
> You write that the "Manifest sizes grew to insane sizes"?
>
> At which boundary does a package size, the total size of the package's
> directory, become insane?
>
> Disk space is cheap. Currently, ::gentoo, without metadata, is around
> 470 MiB. If you add 10 Go packages with a whopping 200 KiB each, then
> this adds 2 MiB to that. I need someone to explain how this
> constitutes an issue with disk space. Even if we add 100 Go packages,
> probably roughly the number of Go packages we have in ::gentoo, then
> those 20 MiB are not significant. Needless to say that the average
> size of a Go package is less than the 200 KiB used in this
> calculation.

The numbers you've used here suggest you've missed some of the
big problematic cases from the past:
- https://bugs.gentoo.org/833478 (1.1MB manifest)
- https://bugs.gentoo.org/833477 (1.6MB manifest)

sam

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 377 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27 12:54           ` Michał Górny
@ 2023-04-27 23:12             ` Pascal Jäger
  2023-04-28  0:38               ` Sam James
  2023-04-28  6:59             ` Florian Schmaus
  1 sibling, 1 reply; 52+ messages in thread
From: Pascal Jäger @ 2023-04-27 23:12 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1526 bytes --]

Maybe I’m getting this wrong, but didn’t we switch to shallow checkouts for the systems repository? I remember it was a major outcry on the mailing list. So at least for end users git keeps no history and our repository history should not impact clone size of a shallow copy, should it?

> On Thursday, Apr. 27, 2023 at 14:54, Michał Górny <mgorny@gentoo.org> wrote:
> On Thu, 2023-04-27 at 09:58 +0200, Florian Schmaus wrote:
> > Disk space is cheap.
>
> No, it's not. Gentoo supports more hardware than your average PC with
> beefy hard drive and/or possibility of installing one. Let's not forget
> that you need a ::gentoo checkout even on a system running purely
> on binary packages.
>
> Let's not forget that git keeps all history, so every bump of a Go
> package with large Manifest has a permanent negative impact on clone
> size. A few version bumps of Go packages can easily outweigh complete
> history of hundreds of other packages.
>
> > Network traffic, while also being cheap, may be more of an issue.
>
> Again, you're making assumption based on living in a well-developed area
> and discriminating against users who have shoddy Internet connectivity.
>
> That said, this all was discussed in the past. I really wish you would
> humble down and try to find a solution that would work for everyone
> instead of showing arrogance and lack of concern for users outside your
> "majority" view of Gentoo.
>
> --
> Best regards,
> Michał Górny
>
>

[-- Attachment #2: Type: text/html, Size: 2132 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27 23:12             ` Pascal Jäger
@ 2023-04-28  0:38               ` Sam James
  2023-04-28  4:27                 ` Michał Górny
  0 siblings, 1 reply; 52+ messages in thread
From: Sam James @ 2023-04-28  0:38 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]


Pascal Jäger <Pascal.jaeger@leimstift.de> writes:

> Maybe I’m getting this wrong, but didn’t  we switch to shallow
> checkouts for the systems repository? I remember it was a major
> outcry on the mailing list. So at least for end users git keeps no
> history and our repository history should not impact clone size of a
> shallow copy, should it? 
>

(Try to avoid top-posting if you can, reply after the message you're
replying to.)

rsync copies of the tree aren't affected by this, nor are full
git clones for development.

>
>
>     On Thursday, Apr. 27, 2023 at 14:54, Michał Górny <
>     mgorny@gentoo.org> wrote:
>     On Thu, 2023-04-27 at 09:58 +0200, Florian Schmaus wrote:
>    
>         Disk space is cheap.
>    
>    
>     No, it's not. Gentoo supports more hardware than your average PC
>     with
>     beefy hard drive and/or possibility of installing one. Let's not
>     forget
>     that you need a ::gentoo checkout even on a system running purely
>     on binary packages.
>    
>     Let's not forget that git keeps all history, so every bump of a
>     Go
>     package with large Manifest has a permanent negative impact on
>     clone
>     size. A few version bumps of Go packages can easily outweigh
>     complete
>     history of hundreds of other packages. 
>    
>    
>         Network traffic, while also being cheap, may be more of an
>         issue.
>    
>    
>     Again, you're making assumption based on living in a
>     well-developed area
>     and discriminating against users who have shoddy Internet
>     connectivity.
>    
>     That said, this all was discussed in the past. I really wish you
>     would
>     humble down and try to find a solution that would work for
>     everyone
>     instead of showing arrogance and lack of concern for users
>     outside your
>     "majority" view of Gentoo.
>    
>     --
>     Best regards,
>     Michał Górny
>    
>    


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 377 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-28  0:38               ` Sam James
@ 2023-04-28  4:27                 ` Michał Górny
  2023-04-28  5:31                   ` Sam James
  0 siblings, 1 reply; 52+ messages in thread
From: Michał Górny @ 2023-04-28  4:27 UTC (permalink / raw
  To: gentoo-dev

On Fri, 2023-04-28 at 01:38 +0100, Sam James wrote:
> Pascal Jäger <Pascal.jaeger@leimstift.de> writes:
> 
> > Maybe I’m getting this wrong, but didn’t  we switch to shallow
> > checkouts for the systems repository? I remember it was a major
> > outcry on the mailing list. So at least for end users git keeps no
> > history and our repository history should not impact clone size of a
> > shallow copy, should it? 
> > 
> 
> (Try to avoid top-posting if you can, reply after the message you're
> replying to.)
> 
> rsync copies of the tree aren't affected by this, nor are full
> git clones for development.
> 

Err, but full gentoo.git clones are definitely affected!  After all,
that's where huge ebuilds and their Manifests land first.

-- 
Best regards,
Michał Górny



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-28  4:27                 ` Michał Górny
@ 2023-04-28  5:31                   ` Sam James
  0 siblings, 0 replies; 52+ messages in thread
From: Sam James @ 2023-04-28  5:31 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 973 bytes --]


Michał Górny <mgorny@gentoo.org> writes:

> On Fri, 2023-04-28 at 01:38 +0100, Sam James wrote:
>> Pascal Jäger <Pascal.jaeger@leimstift.de> writes:
>> 
>> > Maybe I’m getting this wrong, but didn’t  we switch to shallow
>> > checkouts for the systems repository? I remember it was a major
>> > outcry on the mailing list. So at least for end users git keeps no
>> > history and our repository history should not impact clone size of a
>> > shallow copy, should it? 
>> > 
>> 
>> (Try to avoid top-posting if you can, reply after the message you're
>> replying to.)
>> 
>> rsync copies of the tree aren't affected by this, nor are full
>> git clones for development.
>> 
>
> Err, but full gentoo.git clones are definitely affected!  After all,
> that's where huge ebuilds and their Manifests land first.

I meant they're not affected by any changes to Portage's new default
of shallow clones, i.e. it doesn't help the problem for them.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 377 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [gentoo-dev] Re: EGO_SUM
  2023-04-27  9:24           ` Ulrich Mueller
@ 2023-04-28  6:59             ` Florian Schmaus
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-04-28  6:59 UTC (permalink / raw
  To: Ulrich Mueller, gentoo-dev; +Cc: Matt Turner, Sam James, council, William Hubbs


[-- Attachment #1.1.1: Type: text/plain, Size: 848 bytes --]

On 27/04/2023 11.24, Ulrich Mueller wrote:
>>>>>> On Thu, 27 Apr 2023, Florian Schmaus wrote:
> 
>> Network traffic, while also being cheap, may be more of an issue.
>> Currently, gentoo-latest.tar.xz is ~41 MiB. So on a conservative
>> approximation ::gentoo compresses to 1/10. So, the 10 Go-packages
>> cause 200 KiB of additional traffic. Even when using a low-bandwidth
>> connection, say 12 KiB/s, this only adds 17 extra seconds to the
>> transfer duration.
> 
> Manifest files contain binary hashes which don't compress to 1/10.
> A factor of 1/2 may be more realistic (since a byte is represented by
> two hex digits).

You are right, Manifests compress roughly to half their size. I did not 
consider that. Thanks for pointing it out.
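
Redoing the earlier transfer-time estimate with both ratios, as a quick sketch. The figures (ten packages at 200 KiB each, a 12 KiB/s link) are the ones used earlier in this thread; the function name is only illustrative:

```python
def transfer_seconds(extra_kib: float, ratio: float, bandwidth_kib_s: float) -> float:
    """Seconds needed to transfer `extra_kib` of data compressed by `ratio`."""
    return extra_kib * ratio / bandwidth_kib_s


extra = 10 * 200  # KiB: ten Go packages at 200 KiB each

print(round(transfer_seconds(extra, 1 / 10, 12)))  # 17
print(round(transfer_seconds(extra, 1 / 2, 12)))   # 83
```

With a ratio of 1/2, the additional transfer time on that link is on the order of a minute and a half rather than 17 seconds.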

This does not affect the argument regarding the rate of change, though.

- Flow


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 17273 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 618 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27 12:54           ` Michał Górny
  2023-04-27 23:12             ` Pascal Jäger
@ 2023-04-28  6:59             ` Florian Schmaus
  2023-04-28 14:34               ` Michał Górny
  2023-04-29 22:34               ` Robin H. Johnson
  1 sibling, 2 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-04-28  6:59 UTC (permalink / raw
  To: gentoo-dev, Michał Górny
  Cc: Matt Turner, Sam James, council, William Hubbs


[-- Attachment #1.1.1: Type: text/plain, Size: 2083 bytes --]

On 27/04/2023 14.54, Michał Górny wrote:
> On Thu, 2023-04-27 at 09:58 +0200, Florian Schmaus wrote:
>> Disk space is cheap.
> 
> No, it's not.  Gentoo supports more hardware than your average PC with
> beefy hard drive and/or possibility of installing one.  Let's not forget
> that you need a ::gentoo checkout even on a system running purely
> on binary packages.

You are right. Gentoo supports a broad range of hardware in many 
dimensions, e.g., architecture, release date, and composition.

You seem to suggest that there are Gentoo systems that cannot handle the 
additional disk space consumption of EGO_SUM Go-packages?

I can not imagine systems that are able to deal with the ~500 MiB 
::gentoo repository, but would break if the same repository would 
contain 100 additional Go-packages with 200 KiB each.

Even under a "worst-case" assumption, where we would have 256 
Go-packages with each having a 1 MiB package-directory size, any system 
that can handle the current state of ::gentoo should be able to take the 
additional 256 MiB (+ metadata).
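
The two scenarios above, spelled out as a small calculation. The ~500 MiB baseline and the package counts and sizes are the figures from this message; the helper name is hypothetical:

```python
BASE_MIB = 500  # approximate size of ::gentoo used in this message


def growth(packages: int, size_kib: float) -> tuple[float, float]:
    """Return (extra MiB, extra as a fraction of the baseline)."""
    extra_mib = packages * size_kib / 1024
    return extra_mib, extra_mib / BASE_MIB


for label, pkgs, kib in [("100 x 200 KiB", 100, 200), ("256 x 1 MiB", 256, 1024)]:
    extra, frac = growth(pkgs, kib)
    print(f"{label}: +{extra:.1f} MiB ({frac:.1%} of baseline)")
```

The first scenario adds about 4% to the baseline; the "worst-case" one adds roughly half of the current repository size.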


>> Network traffic, while also being cheap, may be more of an issue.
> 
> Again, you're making assumption based on living in a well-developed area
> and discriminating against users who have shoddy Internet connectivity.
> 
> That said, this all was discussed in the past.  I really wish you would
> humble down and try to find a solution that would work for everyone
> instead of showing arrogance and lack of concern for users outside your
> "majority" view of Gentoo.

I am sorry. I will work on my humbleness.

I am only pursuing the modest request to legitimize any decision 
regarding EGO_SUM by a democratic vote.

As far as I can tell, there was never a democratic vote regarding 
EGO_SUM. But please correct me if I am wrong.

And I never said that I believe in representing the majority's opinion. 
That said, I prefer to have this voted on by an all-developer vote than 
a council vote. Then we would know what the majority voted for. Is that 
possible?

- Flow

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 17273 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 618 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-28  6:59             ` Florian Schmaus
@ 2023-04-28 14:34               ` Michał Górny
  2023-05-02 19:32                 ` Florian Schmaus
  2023-04-29 22:34               ` Robin H. Johnson
  1 sibling, 1 reply; 52+ messages in thread
From: Michał Górny @ 2023-04-28 14:34 UTC (permalink / raw
  To: gentoo-dev; +Cc: Matt Turner, Sam James, council, William Hubbs

On Fri, 2023-04-28 at 08:59 +0200, Florian Schmaus wrote:
> On 27/04/2023 14.54, Michał Górny wrote:
> > On Thu, 2023-04-27 at 09:58 +0200, Florian Schmaus wrote:
> > > Disk space is cheap.
> > 
> > No, it's not.  Gentoo supports more hardware than your average PC with
> > beefy hard drive and/or possibility of installing one.  Let's not forget
> > that you need a ::gentoo checkout even on a system running purely
> > on binary packages.
> 
> You are right. Gentoo supports a broad range of hardware in many 
> dimensions, e.g., architecture, release date, and composition.
> 
> You seem to suggest that there are Gentoo systems that cannot handle the 
> additional disk space consumption of EGO_SUM Go-packages?
> 
> I can not imagine systems that are able to deal with the ~500 MiB 
> ::gentoo repository, but would break if the same repository would 
> contain 100 additional Go-packages with 200 KiB each.
> 
> Even under a "worst-case" assumption, where we would have 256 
> Go-packages with each having a 1 MiB package-directory size, any system 
> that can handle the current state of ::gentoo should be able to take the 
> additional 256 MiB (+ metadata).

That's the slippery slope of exponential growth.  If every developer
thought "oh, worst case it'll grow only 10%"...

There's roughly 19k packages in Gentoo.  Go packages constitute only
a small number of them, yet maintainers of these packages seem to assume
it's fine if they take up a significant portion of disk space.  That's
not fair at all.

In fact, I'm pretty sure I ground some numbers in the previous thread.

> > 
> I am only pursuing the modest request to legitimize any decision 
> regarding EGO_SUM by a democratic vote.
> 
> As far as I can tell, there was never a democratic vote regarding 
> EGO_SUM. But please correct me if I am wrong.

Since when are eclass design issues "legitimized" by "a democratic
vote"?  In the best case, they are handled via rough consensus.
In the worst, a single person can't stand a decision and bothers
everyone until they let them have their way.

Open source is not a democracy, it's volunteer effort.  People dedicate
their free time and do their best.  If you want something done, you have
to either do it yourself (and do it right!) or convince someone to do
it.  You don't overturn maintainers by "democratic votes", that's
actually how you shatter open source community and make volunteers stop
contributing.

Believe me, I've made enough bad decisions to know that now.

> And I never said that I believe in representing the majority's opinion. 
> That said, I prefer to have this voted on by an all-developer vote than 
> a council vote. Then we would know what the majority voted for. Is that 
> possible?

There's the General Resolution but it's supposed to be used only to
override Council decisions, so you should go with a Council vote first.

I don't believe this is a hill worth dying on but if you insist...
*shrug*.  I just wish you'd actually listen to people and put some real
effort to reach a compromise/consensus rather than pushing your narrow
solution through with no regard for consequences.

-- 
Best regards,
Michał Górny



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-28  6:59             ` Florian Schmaus
  2023-04-28 14:34               ` Michał Górny
@ 2023-04-29 22:34               ` Robin H. Johnson
  1 sibling, 0 replies; 52+ messages in thread
From: Robin H. Johnson @ 2023-04-29 22:34 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 8269 bytes --]

On Fri, Apr 28, 2023 at 08:59:29AM +0200, Florian Schmaus wrote:
> On 27/04/2023 14.54, Michał Górny wrote:
> > On Thu, 2023-04-27 at 09:58 +0200, Florian Schmaus wrote:
> >> Disk space is cheap.
> > 
> > No, it's not.  Gentoo supports more hardware than your average PC with
> > beefy hard drive and/or possibility of installing one.  Let's not forget
> > that you need a ::gentoo checkout even on a system running purely
> > on binary packages.
> 
> You are right. Gentoo supports a broad range of hardware in many 
> dimensions, e.g., architecture, release date, and composition.
> 
> You seem to suggest that there are Gentoo systems that cannot handle the 
> additional disk space consumption of EGO_SUM Go-packages?
> 
> I can not imagine systems that are able to deal with the ~500 MiB 
> ::gentoo repository, but would break if the same repository would 
> contain 100 additional Go-packages with 200 KiB each.
> 
> Even under a "worst-case" assumption, where we would have 256 
> Go-packages with each having a 1 MiB package-directory size, any system 
> that can handle the current state of ::gentoo should be able to take the 
> additional 256 MiB (+ metadata).
This email ended up more rambling than I intended, but I wanted to get the data
out there, and enable us to look deeper at the problems and potential impacts
of the solutions.

Before the ideas and data I wanted to note the semi-conceptual ways to package
new things that have many dependency artifacts (package or distfile).

Distfile-heavy packages:
------------------------
A package declares many distfile dependencies, but very few package
dependencies. The Manifest files in this case suffer a lot of
duplication - but the growth is mostly limited to ::gentoo (or
overlays).

Any change to a package leads to a slightly different Manifest file,
and while delta compression will reduce the growth factor, it's still
large (dropping a version, adding a version, adding a remotely-fetched patch).

Dependency-heavy packages:
--------------------------
A package declares many package dependencies, with the distfile growth
distributed over MANY packages. Major downside here is that
build-depends consume a lot more space & inodes to install all the
depends that are used for the ebuild, esp. when a given distfile might
be used for only one package. Want to build a complex Go-based package?
Debian/Ubuntu use this approach, and it shows: you might have to explicitly
package 70+ dependencies to get something you want packaged.
https://salsa.debian.org/go-team/packages/consul/-/blob/debian/sid/debian/control#L10-89
A quick back-of-napkin calculation shows that, of the Debian golang dep
packages as of 22.04 LTS, ~30% are a dep for only one package and a further
30% are a dep for only 2 packages.

----
With the above in mind, we see that it's not just the size of the Manifest, but
the combinatorial problem of Manifest revisions, with the saving role of Git's
delta compression.

I pulled a Git listing of every Manifest blob larger than 64KiB in Git
history (excluding the historical conversion) and worked from those:
2718 blobs in total, taking up ~516MiB, 1600056 DIST entries,
for 166726 distinct distfiles.
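
A listing of that kind could be produced along these lines. This is a sketch and not the script actually used: the function names are made up, it does not exclude the historical conversion, and it reads the whole object list into memory.

```python
import subprocess

THRESHOLD = 64 * 1024  # bytes; the 64KiB cut-off mentioned above


def large_manifest_blobs(batch_check_lines, threshold=THRESHOLD):
    """Yield (sha, size, path) for Manifest blobs larger than `threshold`.

    Expects lines of the form "<type> <sha> <size> <path>".
    """
    for line in batch_check_lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) != 4:
            continue
        objtype, sha, size, path = parts
        if objtype == "blob" and path.endswith("/Manifest") and int(size) > threshold:
            yield sha, int(size), path


def list_objects(repo: str) -> list[str]:
    """Return type/sha/size/path lines for every object reachable in `repo`."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    fmt = "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"
    out = subprocess.run(
        ["git", "-C", repo, "cat-file", fmt],
        input=revs, capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()
```

Calling `large_manifest_blobs(list_objects("/path/to/gentoo.git"))` on a full clone would then yield one tuple per oversized Manifest revision; the path is a placeholder.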

I tried to break those distfiles down, based on filename patterns, or where
they occurred (sorted by number of distfiles here):
  76075 dist-tex (all in the tex category)
  33949 dist-mozilla (firefox*, thunderbird*)
  19314 dist-office 
  17802 dist-golang (*%2F@v%2F* files; 10160 .mod, 7642 .zip)
  10478 dist-rust (*.crate files)
   3630 dist-other
   1325 dist-jar-pom (*.jar, *.pom)
   1020 dist-tablebase-syzygy (distfiles for a specific package)
    981 dist-kde (kde manifests that met the threshold)
    980 dist-kernel-and-genpatches
    749 dist-tessdata (again specific packages)
    424 dist-bash (specific packages)
 166727 == total

The Rust & Golang counts *are* lower bounds, because it's not trivial to
take into account changes in packaging. However, the upper bound 
E.g. this distfile isn't immediately classifiable as Rust:
d3d12-rs-a990c93ec64eeab78f2292763d0715da9dba1d59.gh.tar.gz
To assume a worst case, assign the dist-other to the category of your choice.
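
The filename-based part of that classification can be sketched as follows. Only the patterns named in the list above are covered; the TeX, Mozilla and per-package categories depend on where the Manifest lives, which this hypothetical helper does not see.

```python
from fnmatch import fnmatchcase

# Category -> filename globs, taken from the breakdown above.
PATTERNS = [
    ("dist-golang", ["*%2F@v%2F*"]),
    ("dist-rust", ["*.crate"]),
    ("dist-jar-pom", ["*.jar", "*.pom"]),
]


def classify(distfile: str) -> str:
    """Return the first matching category, or 'dist-other'."""
    for category, globs in PATTERNS:
        if any(fnmatchcase(distfile, glob) for glob in globs):
            return category
    return "dist-other"


if __name__ == "__main__":
    for name in (
        "github.com%2Ffoo%2Fbar%2F@v%2Fv1.0.0.mod",
        "d3d12-rs-a990c93ec64eeab78f2292763d0715da9dba1d59.gh.tar.gz",
    ):
        print(name, "->", classify(name))
```

The second name is the example from above: it is a Rust dependency, yet a pattern-based pass can only file it under dist-other, which is why the Rust and Golang counts are lower bounds.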

Ecosystems that are distfile-heavy, in order of Manifest sizes: TeX, Golang, Rust
Packages that are distfile-heavy: LibreOffice/OpenOffice, Firefox, Thunderbird

TeX has only a few packages, but the MOST distfiles.
dev-texlive/texlive-latexextra/Manifest peaked over 6MB with 15480 entries. For
all of Gentoo git history however, there have only been 19 revisions of that
Manifest. For all TeX packages, 286 revisions of Manifests over 37 packages.
Those 286 Manifest revisions clock in at ~94MB together before compression.

The Mozilla packages have the next most distfiles:
4 packages, 768 manifest revisions, but the largest single Manifest was only 285519 bytes.
~88MB for all the manifest revision bytes together.

The office packages (app-office/libreoffice-l10n & app-office/openoffice-bin)
are similar to Mozilla stats overall, and not much to discuss.
~35MB for all Manifest revisions together.

With those big 3 out the way, we're into Golang & Rust.
Golang:
83 packages, 787 Manifest revisions. Largest manifest was
sys-cluster/k3s/Manifest in blob f0e4d1761c0fe80a48b45007ad02024676490841,
coming in just under 1MiB. However, the duplication of distfiles between Manifests *really* shows up:
~247MB for all Manifest revisions together.

Rust:
48 packages, 543 Manifest revisions, largest Manifest was blob
af989423f436338fb3e1d4193448dada5b9154da of app-shells/nushell/Manifest at
336646 bytes. ~64MB for all Manifest revisions together.
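
To make the ecosystems easier to compare, the figures above can be normalised per Manifest revision and per package. A small sketch, assuming 1 MB = 10^6 bytes; the office packages are left out because no revision count is given for them:

```python
# (packages, Manifest revisions, total MB of all revisions), as quoted above.
STATS = {
    "tex": (37, 286, 94),
    "mozilla": (4, 768, 88),
    "golang": (83, 787, 247),
    "rust": (48, 543, 64),
}


def avg_kb_per_revision(ecosystem: str) -> float:
    """Average size of one Manifest revision, in KB."""
    _, revisions, total_mb = STATS[ecosystem]
    return total_mb * 1000 / revisions


def revisions_per_package(ecosystem: str) -> float:
    """Average number of Manifest revisions per package."""
    packages, revisions, _ = STATS[ecosystem]
    return revisions / packages


if __name__ == "__main__":
    for name in STATS:
        print(f"{name:8s} {avg_kb_per_revision(name):6.0f} KB/rev "
              f"{revisions_per_package(name):6.1f} rev/pkg")
```

Under these assumptions Golang comes out at roughly 314 KB per Manifest revision against about 118 KB for Rust, while the TeX Manifests are similarly large per revision (about 329 KB) but change far less often.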

--- End of data-analysis.

The estimates of Manifest compression were fine as a baseline, but Git uses
delta compression, and what tends to matter is the total number of unique
lines in a repo. The expansion *does* matter when the Manifests are checked out
at the same time.
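
The duplication can be measured by comparing the total number of DIST lines with the number of distinct distfile names. A minimal sketch, assuming the usual `DIST <name> <size> <hash-type> <hash> ...` line layout; the function name is illustrative:

```python
def dist_duplication(manifests):
    """Return (total DIST entries, distinct distfile names) over Manifest texts."""
    total = 0
    distinct = set()
    for text in manifests:
        for line in text.splitlines():
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "DIST":
                total += 1
                distinct.add(fields[1])
    return total, len(distinct)


if __name__ == "__main__":
    # Ratio for the figures quoted earlier in this message.
    print(f"{1600056 / 166726:.1f} DIST entries per distinct distfile")
```

For the numbers above that ratio is about 9.6: on average each distinct distfile appears in nearly ten of the Manifest blobs that were examined.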

If we took the Debian approach, we'd minimize the number of times a given
distfile has data repeated in Manifests, because it'd be abstracted into a single
dependency entry. The apparent downside is the significant increase in
build-only dependencies that are rarely used.

Previously I'd sketched an idea for out-of-tree Manifests, that hoisted many
SRC_URI entries into a *versioned* Manifest artifact that wasn't present inside
the tree, but had to be fetched & verified first, and then used to fetch &
verify the actual distfiles.

That Manifest, while relatively small, would be subject to some of the same
hosting problems as the distfile dep tarballs presently used. However it *would* mean
that the deps are much harder to tamper with (because they'd still come from
the original upstreams).
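
The two-stage verification could look roughly like this. It is only a sketch of the idea: nothing like it exists in Portage, all names are hypothetical, and it assumes that the tree pins a BLAKE2B digest of the out-of-tree Manifest.

```python
import hashlib


def blake2b_hex(data: bytes) -> str:
    return hashlib.blake2b(data).hexdigest()


def verify_out_of_tree(pinned_digest: str, manifest_bytes: bytes, distfiles: dict) -> bool:
    """Verify a fetched Manifest against the in-tree digest, then the distfiles.

    `distfiles` maps distfile name -> content bytes.
    """
    # Stage 1: the out-of-tree Manifest must match the digest pinned in the tree.
    if blake2b_hex(manifest_bytes) != pinned_digest:
        raise ValueError("out-of-tree Manifest does not match in-tree digest")

    # Stage 2: each distfile must match its DIST entry in that Manifest.
    expected = {}
    for line in manifest_bytes.decode().splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "DIST" and fields[3] == "BLAKE2B":
            expected[fields[1]] = fields[4]
    for name, data in distfiles.items():
        if expected.get(name) != blake2b_hex(data):
            raise ValueError(f"distfile {name} failed verification")
    return True
```

The tree then carries one digest per versioned Manifest artifact instead of one DIST line per dependency, while the dependencies themselves are still fetched from upstream and checked individually.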

I do understand that overlays/non-main-trees however find the dep tarball
concept to significantly impede packaging speed, and the out-of-tree Manifest
will also cause friction (even if it were inside their overlay repo, it's still
more work).

To that end, and I know this will likely require significant PMS work, I think
we need to look deeper at how to solve the underlying issues.

Putting the "what" of entries into the *ebuild*, e.g. with EGO_SUM is still
ideal from an ease-of-development and validation perspective. It's an accurate
representation of the *artifacts* that a package depends on. Those artifacts
might be packages, or external distfiles.

Where it breaks down is the mapping of those artifacts into in-tree data:
Duplication in Manifests, md5-metadata.

How do we avoid that duplication? The most obvious version is moving the
artifacts back to *some* form of package as much as it pains me.

The crappy part there is we're going to end up packaging 2823 different Golang
things, representing the 17802 distfiles.

A smaller-on-checkout, larger-in-history solution would be moving common DIST
entries to another Manifest, changing the way validation rules work.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1113 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-27 21:16           ` Sam James
@ 2023-05-02 19:32             ` Florian Schmaus
  2023-05-02 19:45               ` Sam James
  2023-05-02 20:04               ` Matt Turner
  0 siblings, 2 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-05-02 19:32 UTC (permalink / raw
  To: gentoo-dev, Sam James; +Cc: Matt Turner, council, William Hubbs


[-- Attachment #1.1.1: Type: text/plain, Size: 3603 bytes --]

On 27/04/2023 23.16, Sam James wrote:
> Florian Schmaus <flow@gentoo.org> writes:
> 
>> On 26/04/2023 18.12, Matt Turner wrote:
>>> On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
>>>> The discussion would be more productive if someone who is supporting the
>>>> EGO_SUM deprecation could rationally summarize the main arguments why we
>>>> deprecated EGO_SUM.
>>> You're requesting the changes. It's on you to read the previous
>>> threads and try to understand. It's not others' responsibilities to
>>> justify the status quo to you, but tl;dr is Manifest files grew to
>>> insane sizes for golang packages with many dependencies, and the
>>> Manifest size is a cost all Gentoo users pay regardless of whether
>>> they use the package.
>>
>> I am sorry. I did try to understand the reasoning in the previous
>> threads. However, I do not conclude that the "cost" users must pay for
>> EGO_SUM justifies EGO_SUM's deprecation. It is the other way around:
>> EGO_SUM's advantages do not explain its deprecation, even if users
>> have to pay a cost.
>>
>> You write that the "Manifest sizes grew to insane sizes"?
>>
>> At which boundary does a package size, the total size of the package's
>> directory, become insane?
>>
>> Disk space is cheap. Currently, ::gentoo, without metadata, is around
>> 470 MiB. If you add 10 Go packages with a whopping 200 KiB each, then
>> this adds 2 MiB to that. I need someone to explain how this
>> constitutes an issue with disk space. Even if we add 100 Go packages,
>> probably roughly the number of Go packages we have in ::gentoo, then
>> those 20 MiB are not significant. Needless to say that the average
>> size of a Go package is less than the 200 KiB used in this
>> calculation.
> 
> The numbers you've used here suggest you've missed some of the
> big problematic cases from the past:
> - https://bugs.gentoo.org/833478 (1.1MB manifest)
> - https://bugs.gentoo.org/833477 (1.6MB manifest)

Thanks for pointing those bugs out.

But please allow me to clarify that I did not miss those "problematic" 
cases from the past.

I performed a tree-wide analysis regarding EGO_SUM and IIRC published 
the results in my previous post about EGO_SUM last year.
https://dev.gentoo.org/~flow/ego_sum-2022-01-01.txt shows the analysis 
results for ::gentoo as of 2022-01-01 (I've recently updated the file to 
contain the Manifest-size too).

Minikube (#833478) and k3s (#833477) appear there, too, with 
package-directory sizes over one MiB. However, those packages are among 
the top five packages using EGO_SUM by package-directory size.

They do not represent the average Go package.

The mean size of a Manifest of a package using EGO_SUM was 186 KiB, and 
the median was even lower at 84 KiB. Only a tiny percentage of packages, 
below 5%, had a Manifest-size above one MiB.

It appears that some feel like the EGO_SUM size consumption is wasteful.

I am always sympathetic toward optimization efforts that save resources. 
Be it bytes-at-rest, transferred bytes, or CPU cycles. Often those can 
make a difference, or at least, they are evidence of engineering skills.

But even if all Go-packages using EGO_SUM had one-MiB-sized Manifests, 
it is unclear what the actual issue is.

Both bugs ask for action without describing the negative impact of 
Manifests larger than 1 MiB. For example, neither mentions anyone 
being negatively affected by those Manifests, nor any observed reduction 
in functionality.

- Flow

* Re: [gentoo-dev] Re: EGO_SUM
  2023-04-28 14:34               ` Michał Górny
@ 2023-05-02 19:32                 ` Florian Schmaus
  2023-05-02 19:38                   ` Sam James
  0 siblings, 1 reply; 52+ messages in thread
From: Florian Schmaus @ 2023-05-02 19:32 UTC (permalink / raw
  To: gentoo-dev, elections; +Cc: council


On 28/04/2023 16.34, Michał Górny wrote:
> On Fri, 2023-04-28 at 08:59 +0200, Florian Schmaus wrote:
>> And I never said that I believe in representing the majority's opinion.
>> That said, I prefer to have this voted on by an all-developer vote than
>> a council vote. Then we would know what the majority voted for. Is that
>> possible?
> 
> There's the General Resolution but it's supposed to be used only to
> override Council decisions, so you should go with a Council vote first.

Could we temporarily re-purpose Gentoo's election infrastructure to hold 
an all-developer opinion poll?

I imagine a poll asking for opinions, nothing binding. Furthermore, 
since Gentoo's voting infrastructure uses the Condorcet method, we could 
have multiple options.

A poll-preceding phase where voters can submit options for the poll 
would help to take everyone's position into account.

And then, performing a poll where everyone can rank the available 
options should allow us to get a pretty good idea about what the 
community of Gentoo developers thinks about this topic.

@gentoo-elections: would you be willing to assist in such a venture?

- Flow

* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-02 19:32                 ` Florian Schmaus
@ 2023-05-02 19:38                   ` Sam James
  0 siblings, 0 replies; 52+ messages in thread
From: Sam James @ 2023-05-02 19:38 UTC (permalink / raw
  To: Florian Schmaus; +Cc: gentoo-dev, elections, council


Florian Schmaus <flow@gentoo.org> writes:

> On 28/04/2023 16.34, Michał Górny wrote:
>> On Fri, 2023-04-28 at 08:59 +0200, Florian Schmaus wrote:
>>> And I never said that I believe in representing the majority's opinion.
>>> That said, I prefer to have this voted on by an all-developer vote than
>>> a council vote. Then we would know what the majority voted for. Is that
>>> possible?
>> There's the General Resolution but it's supposed to be used only to
>> override Council decisions, so you should go with a Council vote first.
>
> Could we temporarily re-purpose Gentoo's election infrastructure to
> hold an all-developer opinion poll?
>
> I imagine a poll asking for opinions, nothing binding. Furthermore,
> since Gentoo's voting infrastructure uses the Condorcet method, we
> could have multiple options.
>

You still haven't addressed all concerns on this ML, including my
last email, so I'd say this is a bit premature.


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-02 19:32             ` Florian Schmaus
@ 2023-05-02 19:45               ` Sam James
  2023-05-08  7:53                 ` Florian Schmaus
  2023-05-02 20:04               ` Matt Turner
  1 sibling, 1 reply; 52+ messages in thread
From: Sam James @ 2023-05-02 19:45 UTC (permalink / raw
  To: Florian Schmaus; +Cc: gentoo-dev, Matt Turner, council, William Hubbs


Florian Schmaus <flow@gentoo.org> writes:

> On 27/04/2023 23.16, Sam James wrote:
>> Florian Schmaus <flow@gentoo.org> writes:
>> 
>>> On 26/04/2023 18.12, Matt Turner wrote:
>>>> On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
>>>>> The discussion would be more productive if someone who is supporting the
>>>>> EGO_SUM deprecation could rationally summarize the main arguments why we
>>>>> deprecated EGO_SUM.
>>>> You're requesting the changes. It's on you to read the previous
>>>> threads and try to understand. It's not others' responsibilities to
>>>> justify the status quo to you, but tl;dr is Manifest files grew to
>>>> insane sizes for golang packages with many dependencies, and the
>>>> Manifest size is a cost all Gentoo users pay regardless of whether
>>>> they use the package.
>>>
>>> I am sorry. I did try to understand the reasoning in the previous
>>> threads. However, I do not conclude that the "cost" users must pay for
>>> EGO_SUM justifies EGO_SUM's deprecation. It is the other way around:
>>> EGO_SUM's advantages do not explain its deprecation, even if users
>>> have to pay a cost.
>>>
>>> You write that the "Manifest sizes grew to insane sizes"?
>>>
>>> At which boundary does a package size, the total size of the package's
>>> directory, become insane?
>>>
>>> Disk space is cheap. Currently, ::gentoo, without metadata, is around
>>> 470 MiB. If you add 10 Go packages with a whopping 200 KiB each, then
>>> this adds 2 MiB to that. I need someone to explain how this
>>> constitutes an issue with disk space. Even if we add 100 Go packages,
>>> probably roughly the number of Go packages we have in ::gentoo, then
>>> those 20 MiB are not significant. Needless to say that the average
>>> size of a Go package is less than the 200 KiB used in this
>>> calculation.
>> The numbers you've used here suggest you've missed some of the
>> big problematic cases from the past:
>> - https://bugs.gentoo.org/833478 (1.1MB manifest)
>> - https://bugs.gentoo.org/833477 (1.6MB manifest)
>
> Thanks for pointing those bugs out.
>
> But please allow me to clarify that I did not miss those "problematic"
> cases from the past.

This kind of phrasing is the sort of thing which makes it seem like you
don't appreciate/acknowledge others' concerns.

I said problematic because it was clearly beyond what your worst-case
estimates were, i.e. far more than what you were saying would be a
large amount for the purposes of calculations.


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-02 19:32             ` Florian Schmaus
  2023-05-02 19:45               ` Sam James
@ 2023-05-02 20:04               ` Matt Turner
  2023-05-08  7:53                 ` Florian Schmaus
  1 sibling, 1 reply; 52+ messages in thread
From: Matt Turner @ 2023-05-02 20:04 UTC (permalink / raw
  To: Florian Schmaus; +Cc: gentoo-dev, Sam James, council, William Hubbs

On Tue, May 2, 2023 at 3:33 PM Florian Schmaus <flow@gentoo.org> wrote:
> I performed a tree-wide analysis regarding EGO_SUM and IIRC published
> the results in my previous post about EGO_SUM last year.
> https://dev.gentoo.org/~flow/ego_sum-2022-01-01.txt shows the analysis
> results for ::gentoo as of 2022-01-01 (I've recently updated the file to
> contain the Manifest-size too).
>
> Minikube (#833478) and k3s (#833477) appear there, too, with
> package-directory sizes over one MiB. However, those packages are among
> the top five packages using EGO_SUM by package-directory size.
>
> They do not represent the average Go package.
>
> The mean size of a Manifest of a package using EGO_SUM was 186 KiB, and
> the median was even lower at 84 KiB. Only a tiny percentage of packages,
> below 5%, had a Manifest-size above one MiB.

It sounds like you've identified a compelling rationale for a Manifest
size limit.


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-02 20:04               ` Matt Turner
@ 2023-05-08  7:53                 ` Florian Schmaus
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-05-08  7:53 UTC (permalink / raw
  To: Matt Turner, gentoo-dev

On 02.05.23 22:04, Matt Turner wrote:
> On Tue, May 2, 2023 at 3:33 PM Florian Schmaus <flow@gentoo.org> wrote:
>> I performed a tree-wide analysis regarding EGO_SUM and IIRC published
>> the results in my previous post about EGO_SUM last year.
>> https://dev.gentoo.org/~flow/ego_sum-2022-01-01.txt shows the analysis
>> results for ::gentoo as of 2022-01-01 (I've recently updated the file to
>> contain the Manifest-size too).
>>
>> Minikube (#833478) and k3s (#833477) appear there, too, with
>> package-directory sizes over one MiB. However, those packages are among
>> the top five packages using EGO_SUM by package-directory size.
>>
>> They do not represent the average Go package.
>>
>> The mean size of a Manifest of a package using EGO_SUM was 186 KiB, and
>> the median was even lower at 84 KiB. Only a tiny percentage of packages,
>> below 5%, had a Manifest-size above one MiB.
> 
> It sounds like you've identified a compelling rationale for a Manifest
> size limit.

Please feel free and encouraged to elaborate on your thoughts about 
Manifest size limitation.

- Flow


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-02 19:45               ` Sam James
@ 2023-05-08  7:53                 ` Florian Schmaus
  2023-05-08 12:03                   ` Michał Górny
  0 siblings, 1 reply; 52+ messages in thread
From: Florian Schmaus @ 2023-05-08  7:53 UTC (permalink / raw
  To: Sam James, gentoo-dev; +Cc: Matt Turner, council, William Hubbs

On 02.05.23 21:45, Sam James wrote:
> Florian Schmaus <flow@gentoo.org> writes:
>> On 27/04/2023 23.16, Sam James wrote:
>>> Florian Schmaus <flow@gentoo.org> writes:
>>>> On 26/04/2023 18.12, Matt Turner wrote:
>>>>> On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
>>>>>> The discussion would be more productive if someone who is supporting the
>>>>>> EGO_SUM deprecation could rationally summarize the main arguments why we
>>>>>> deprecated EGO_SUM.
>>>>> You're requesting the changes. It's on you to read the previous
>>>>> threads and try to understand. It's not others' responsibilities to
>>>>> justify the status quo to you, but tl;dr is Manifest files grew to
>>>>> insane sizes for golang packages with many dependencies, and the
>>>>> Manifest size is a cost all Gentoo users pay regardless of whether
>>>>> they use the package.
>>>>
>>>> I am sorry. I did try to understand the reasoning in the previous
>>>> threads. However, I do not conclude that the "cost" users must pay for
>>>> EGO_SUM justifies EGO_SUM's deprecation. It is the other way around:
>>>> EGO_SUM's advantages do not explain its deprecation, even if users
>>>> have to pay a cost.
>>>>
>>>> You write that the "Manifest sizes grew to insane sizes"?
>>>>
>>>> At which boundary does a package size, the total size of the package's
>>>> directory, become insane?
>>>>
>>>> Disk space is cheap. Currently, ::gentoo, without metadata, is around
>>>> 470 MiB. If you add 10 Go packages with a whopping 200 KiB each, then
>>>> this adds 2 MiB to that. I need someone to explain how this
>>>> constitutes an issue with disk space. Even if we add 100 Go packages,
>>>> probably roughly the number of Go packages we have in ::gentoo, then
>>>> those 20 MiB are not significant. Needless to say that the average
>>>> size of a Go package is less than the 200 KiB used in this
>>>> calculation.
>>> The numbers you've used here suggest you've missed some of the
>>> big problematic cases from the past:
>>> - https://bugs.gentoo.org/833478 (1.1MB manifest)
>>> - https://bugs.gentoo.org/833477 (1.6MB manifest)
>>
>> Thanks for pointing those bugs out.
>>
>> But please allow me to clarify that I did not miss those "problematic"
>> cases from the past.
> 
> This kind of phrasing is the sort of thing which makes it seem like you
> don't appreciate/acknowledge others' concerns.

I am genuinely sorry if my usage of "problematic" made it appear that I 
do not appreciate others' concerns. Like most people on this mailing 
list, I appreciate everyone who cares about Gentoo and raises concerns.

I do, however, not share the concerns regarding EGO_SUM.

It is hard to share concerns based on rather abstract reasons—for 
example, the portrayal of EGO_SUM as unfair.

It would be easier to share concerns if somebody gave concrete reasons 
against EGO_SUM. For example, use cases that are no longer possible. Or 
developers or users who are restricted in their work by EGO_SUM in a 
relevant way.

But actual problems that currently speak against the use of EGO_SUM have 
not surfaced.


> I said problematic because it was clearly beyond what your worst-case
> estimates were, i.e. far more than what you were saying would be a
> large amount for the purposes of calculations.

Using the term "worst-case", even if I put it in quotes, probably got 
people on the wrong track. I am sorry for that; my bad. It is, in 
general, impossible even to approximate the worst-case size-increase of 
::gentoo.

Our best chance is to use historical data to extrapolate the future.

My back-of-the-envelope calculation was 256 Go-packages, with each 
having 1 MiB. An analysis of the tree on 2022-02-16, at the commit 
right before Minikube and k3s were cleaned, showed that only five 
packages out of 120 had larger package-directory sizes than one MiB.

256 Go-packages is roughly the number of Go-packages we have right now. 
Assuming they all have a package-directory size of 1.6 MiB, the most 
extensive EGO_SUM package the analysis yielded so far, we end up with 
410 MiB.

The point you criticize was that a system able to handle the current 
size of ::gentoo would also be able to manage an additional 256 MiB. The 
point still stands if we exchange the 256 MiB with 410 MiB.

Furthermore, both numbers, 256 MiB and 410 MiB, are based on the 
over-approximation that every EGO_SUM package uses 1 MiB or 1.6 MiB, 
respectively, which is almost certainly not the case. The mean 
package-directory size of an EGO_SUM-using package on 2022-02-16 was 
280 KiB.
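
The estimates above can be reproduced with a few lines of Python. This is 
only a sketch of the arithmetic: the package count and per-package sizes 
are the figures quoted in this mail, while the third scenario (applying 
the observed 280 KiB mean to all 256 packages) is an extra assumption 
added for comparison.

```python
KIB_PER_MIB = 1024


def tree_growth_mib(packages: int, per_package_kib: float) -> float:
    """Added tree size in MiB if every package had the given directory size."""
    return packages * per_package_kib / KIB_PER_MIB


# Per-package directory sizes in KiB for each scenario.
scenarios = {
    "1 MiB each": 1.0 * KIB_PER_MIB,
    "1.6 MiB each (largest observed)": 1.6 * KIB_PER_MIB,
    "280 KiB each (observed mean, assumption)": 280,
}

estimates = {label: tree_growth_mib(256, kib) for label, kib in scenarios.items()}

for label, mib in estimates.items():
    print(f"{label}: {mib:.1f} MiB")
```

The first two scenarios give the 256 MiB and roughly 410 MiB figures used 
in the text.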

- Flow



* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-08  7:53                 ` Florian Schmaus
@ 2023-05-08 12:03                   ` Michał Górny
  2023-05-22  7:14                     ` Florian Schmaus
  0 siblings, 1 reply; 52+ messages in thread
From: Michał Górny @ 2023-05-08 12:03 UTC (permalink / raw
  To: gentoo-dev, Sam James; +Cc: Matt Turner, council, William Hubbs

On Mon, 2023-05-08 at 09:53 +0200, Florian Schmaus wrote:
> On 02.05.23 21:45, Sam James wrote:
> > Florian Schmaus <flow@gentoo.org> writes:
> > > On 27/04/2023 23.16, Sam James wrote:
> > > > Florian Schmaus <flow@gentoo.org> writes:
> > > > > On 26/04/2023 18.12, Matt Turner wrote:
> > > > > > On Wed, Apr 26, 2023 at 11:31 AM Florian Schmaus <flow@gentoo.org> wrote:
> > > > > > > The discussion would be more productive if someone who is supporting the
> > > > > > > EGO_SUM deprecation could rationally summarize the main arguments why we
> > > > > > > deprecated EGO_SUM.
> > > > > > You're requesting the changes. It's on you to read the previous
> > > > > > threads and try to understand. It's not others' responsibilities to
> > > > > > justify the status quo to you, but tl;dr is Manifest files grew to
> > > > > > insane sizes for golang packages with many dependencies, and the
> > > > > > Manifest size is a cost all Gentoo users pay regardless of whether
> > > > > > they use the package.
> > > > > 
> > > > > I am sorry. I did try to understand the reasoning in the previous
> > > > > threads. However, I do not conclude that the "cost" users must pay for
> > > > > EGO_SUM justifies EGO_SUM's deprecation. It is the other way around:
> > > > > EGO_SUM's advantages do not explain its deprecation, even if users
> > > > > have to pay a cost.
> > > > > 
> > > > > You write that the "Manifest sizes grew to insane sizes"?
> > > > > 
> > > > > At which boundary does a package size, the total size of the package's
> > > > > directory, become insane?
> > > > > 
> > > > > Disk space is cheap. Currently, ::gentoo, without metadata, is around
> > > > > 470 MiB. If you add 10 Go packages with a whopping 200 KiB each, then
> > > > > this adds 2 MiB to that. I need someone to explain how this
> > > > > constitutes an issue with disk space. Even if we add 100 Go packages,
> > > > > probably roughly the number of Go packages we have in ::gentoo, then
> > > > > those 20 MiB are not significant. Needless to say that the average
> > > > > size of a Go package is less than the 200 KiB used in this
> > > > > calculation.
> > > > The numbers you've used here suggest you've missed some of the
> > > > big problematic cases from the past:
> > > > - https://bugs.gentoo.org/833478 (1.1MB manifest)
> > > > - https://bugs.gentoo.org/833477 (1.6MB manifest)
> > > 
> > > Thanks for pointing those bugs out.
> > > 
> > > But please allow me to clarify that I did not miss those "problematic"
> > > cases from the past.
> > 
> > This kind of phrasing is the sort of thing which makes it seem like you
> > don't appreciate/acknowledge others' concerns.
> 
> I am genuinely sorry if my usage of "problematic" made it appear that I 
> do not appreciate others' concerns. Like most people on this mailing 
> list, I appreciate everyone who cares about Gentoo and raises concerns.
> 
> I do, however, not share the concerns regarding EGO_SUM.
> 
> It is hard to share concerns based on rather abstract reasons—for 
> example, the portrayal of EGO_SUM as unfair.
> 
> It would be easier to share concerns if somebody gave concrete reasons 
> against EGO_SUM. For example, use cases that are no longer possible. Or 
> developers or users who are restricted in their work by EGO_SUM in a 
> relevant way.
> 
> But actual problems that currently speak against the use of EGO_SUM have 
> not surfaced.
> 
> 
> > I said problematic because it was clearly beyond what your worst-case
> > estimates were, i.e. far more than what you were saying would be a
> > large amount for the purposes of calculations.
> 
> Using the term "worst-case", even if I put it in quotes, probably got 
> people on the wrong track. I am sorry for that; my bad. It is, in 
> general, impossible even to approximate the worst-case size-increase of 
> ::gentoo.
> 
> Our best chance is to use historical data to extrapolate the future.
> 
> My back-of-the-envelope calculation was 256 Go-packages, with each 
> having 1 MiB. An analysis of the tree on 2022-02-16, at the commit 
> right before Minikube and k3s were cleaned, showed that only five 
> packages out of 120 had larger package-directory sizes than one MiB.
> 
> 256 Go-packages is roughly the number of Go-packages we have right now. 
> Assuming they all have a package-directory size of 1.6 MiB, the most 
> extensive EGO_SUM package the analysis yielded so far, we end up with 
> 410 MiB.
> 
> The point you criticize was that a system able to handle the current 
> size of ::gentoo would also be able to manage an additional 256 MiB. The 
> point still stands if we exchange the 256 MiB with 410 MiB.
> 
> Furthermore, both numbers, 256 MiB and 410 MiB, are based on the 
> over-approximation that every EGO_SUM package uses 1 MiB or 1.6 MiB, 
> respectively, which is almost certainly not the case. The mean 
> package-directory size of an EGO_SUM-using package on 2022-02-16 was 
> 280 KiB.
> 

Please extend this analysis to Manifest changes over time, and how they
are going to impact total gentoo.git size.

-- 
Best regards,
Michał Górny



* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-08 12:03                   ` Michał Górny
@ 2023-05-22  7:14                     ` Florian Schmaus
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-05-22  7:14 UTC (permalink / raw
  To: gentoo-dev, Michał Górny; +Cc: William Hubbs


On 08/05/2023 14.03, Michał Górny wrote:
> On Mon, 2023-05-08 at 09:53 +0200, Florian Schmaus wrote:
>> Furthermore, both numbers, 256 MiB and 410 MiB, are based on the
>> over-approximation that every EGO_SUM package uses 1 MiB or 1.6 MiB,
>> respectively, which is almost certainly not the case. The mean
>> package-directory size of an EGO_SUM-using package on 2022-02-16 was
>> 280 KiB.
> 
> Please extend this analysis to Manifest changes over time, and how they
> are going to impact total gentoo.git size.

Gladly.

The average daily change caused by Manifests of EGO_SUM packages from 
2020-02-16 to 2022-02-16 was at most 80 KiB. (See below for the 
methodology used to obtain this number.)

In other words, a daily syncing user had at most 80 KiB traffic on 
average per day to sync the Manifests of all EGO_SUM that existed on 
2022-02-16.

Even in less developed regions of the world, 80 KiB a day is 
manageable. And this would still be the case if we doubled, quadrupled, 
or octupled this number.

I note that this number does not include ebuilds and metadata. However, 
one can easily over-approximate that the additional ebuilds and metadata 
delta, that comes with the observed Manifest changes, is smaller than 
the Manifest changes themselves. Therefore, a pessimistic approximation 
is twice 80 KiB.

But then again, the 80 KiB do not consider transport compression. 
And, as we have learned, Manifests roughly compress to 50% of their 
original size. So the average EGO_SUM-generated network traffic, 
assuming that it is compressed, remains in the region of a hundred 
kilobytes per day.

We can also use this number to over-approximate the growth rate of 
gentoo.git due to EGO_SUM.

Assume that 120 EGO_SUM packages cause a daily growth rate of 160 KiB, 
that is 2x 80 KiB and the number we have used above. Doubling this 
number would yield the estimated rate of the current number of Go 
packages in ::gentoo. This rate amounts to 320 KiB daily, increasing 
gentoo.git by 114 MiB per year. Please double this number for a bit of 
future safety.
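
As a rough sketch, the growth estimate above boils down to the following 
arithmetic. All inputs are the figures from this mail; the variable names 
are made up for illustration.

```python
manifest_delta_kib = 80  # upper bound on daily Manifest churn, 120 packages

# Pessimistic allowance for ebuild and metadata changes: twice the Manifest delta.
delta_120_packages_kib = 2 * manifest_delta_kib

# Roughly twice as many Go packages exist in ::gentoo today.
delta_today_kib = 2 * delta_120_packages_kib

# Yearly growth of gentoo.git in MiB, plus a doubled figure as a safety margin.
yearly_mib = delta_today_kib * 365 / 1024
yearly_with_margin_mib = 2 * yearly_mib

print(f"daily: {delta_today_kib} KiB")
print(f"yearly: {yearly_mib:.0f} MiB, with margin: {yearly_with_margin_mib:.0f} MiB")
```

Note that this treats every changed byte as permanent, uncompressed growth 
of the repository, so it is an over-approximation.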

In summary, this and the previous analysis find no data-size-based 
arguments against EGO_SUM's usage.

Using EGO_SUM is fine for users and developers. The ::gentoo increase, 
even if it would quadruple the current size, does not entail any issues. 
The expected average daily delta that EGO_SUM would cause today is also 
no threat, even for users with low-bandwidth connections. The size 
increase which EGO_SUM causes to gentoo.git is also within manageable 
bounds. If an ebuild developer has 1-2 gigabytes free on their disk, 
they will not need to buy a larger disk in the coming years if we start 
using EGO_SUM again in ::gentoo.

- Flow


# Appendix: Methodology

We took gentoo.git at 2022-02-16 at the commit 60dc7a03ff2f. From there, 
we created the numstat log (git log --numstat) of each Manifest of every 
EGO_SUM package. We configured the numstat log to go back at most two 
years in time, that is, till 2020-02-16. The numstat log contains the 
changed lines (added/removed) of the Manifest in the target period. An 
awk script calculated the total sum of added and removed lines. Note 
that this treats removed lines equal to added lines, even though the 
removed lines should cause significantly less network traffic. We also 
extracted the date of the oldest commit in the observed period. This 
date was used to calculate the total number of days in the period, which 
accounts for packages that came to life after 2020-02-16 and would 
otherwise skew the analysis towards smaller results.

Dividing the total number of changed lines by the number of days yields 
the average number of lines changed per day per package.

We further determined the worst-observed line length of EGO_SUM packages' 
Manifests, which was 404 bytes.

Summing the average number of lines changed per day over all packages 
yielded 195.58093724672614. Multiplying this number by the maximal 
observed line length of 404 bytes gives 79014.69 bytes per day or, in 
other words, roughly 80 KiB per day.
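
For illustration, the per-package step of this methodology could look 
roughly like the sketch below. It is not the code from the repository 
linked further down; the function name, the sample log, and the exact git 
invocation in the comment are assumptions made for this example.

```python
from datetime import date

MAX_LINE_BYTES = 404  # worst observed Manifest line length, from the text


def lines_changed_per_day(numstat_log: str, period_end: date) -> float:
    """Average number of added plus removed lines per day for one Manifest."""
    total_lines = 0
    oldest = period_end
    for line in numstat_log.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            # numstat line: <added> TAB <removed> TAB <path>
            total_lines += int(fields[0]) + int(fields[1])
        elif len(line) == 10 and line[4] == "-" and line[7] == "-":
            # commit date line in YYYY-MM-DD form; remember the oldest one
            oldest = min(oldest, date.fromisoformat(line))
    days = max((period_end - oldest).days, 1)
    return total_lines / days


# Made-up sample in the shape printed by something like:
#   git log --numstat --date=short --format=%ad --since=2020-02-16 -- <pkg>/Manifest
sample_log = (
    "2022-01-10\n\n12\t3\tdev-go/foo/Manifest\n"
    "2021-02-16\n\n100\t0\tdev-go/foo/Manifest\n"
)

per_day = lines_changed_per_day(sample_log, date(2022, 2, 16))
print(f"{per_day:.3f} lines/day, at most {per_day * MAX_LINE_BYTES:.1f} bytes/day")

# The sum over all packages reported above, converted to bytes per day.
reported_sum = 195.58093724672614
bytes_per_day = reported_sum * MAX_LINE_BYTES
print(f"{bytes_per_day:.2f} bytes/day")
```

Counting removed lines like added ones and multiplying by the longest 
observed line both push the result upwards, which is why the text calls 
it an upper bound.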

The raw and post-processed results of this analysis are available at

https://dev.gentoo.org/~flow/gentoo-tree-analysis-results/2023-05-17T100838-gentoo-at-2022-02-16-60dc7a03ff2f/

The code used to carry out this analysis is available at

https://gitlab.gentoo.org/flow/gentoo-tree-analysis

for everyone to study the code, reproduce the results, and check for 
issues and bugs.

As always, I appreciate any feedback.

* [gentoo-dev] Re: EGO_SUM
  2023-04-24 16:11 ` Florian Schmaus
  2023-04-24 20:28   ` Sam James
@ 2023-05-30 15:52   ` Florian Schmaus
  2023-05-30 16:30     ` Anna (cybertailor) Vyalkova
  2023-05-30 16:35     ` Arthur Zamarin
  1 sibling, 2 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-05-30 15:52 UTC (permalink / raw
  To: gentoo-dev


On 24/04/2023 18.11, Florian Schmaus wrote:
> I like to ask the Gentoo council to vote on whether EGO_SUM should be 
> reinstated ("un-deprecated") or not.

I am thankful that the council considered my request to vote on the 
topic. However, the council decided not to vote on this in its last 
session and to return the issue to the mailing lists.

Some see limitations as a necessity when it comes to reinstating 
EGO_SUM. Unfortunately, I could not find any specific numbers 
mentioned since June 2022 in the three EGO_SUM threads [1, 2, 3] I am 
aware of.

To prevent harm to Gentoo, we should reach an agreement that everyone 
can live with. To achieve a consensus, and since I can not rule out that 
I missed a post that includes specific numbers, please share your ideas 
on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.

Having EGO_SUM would significantly increase the security of Gentoo's 
users (amongst other benefits).

Personally, I do not see that we currently need any form of limitation 
to reinstate EGO_SUM. I substantiated this with data based on a two-year 
history analysis of gentoo.git. The summary is that the
- size increase of ::gentoo is unproblematic for users
- additional sync delta of ::gentoo is unproblematic for users
- higher rate of gentoo.git's increase is unproblematic for developers
when we reinstate EGO_SUM in ::gentoo.

Therefore, we could (and IMHO should) simply un-deprecate EGO_SUM. 
However, I would review this decision once the number of Go packages has 
doubled or in two years (whatever comes first).

Many share the concerns about an EGO_SUM-less world. I know that some seek 
a compromise by reinstating EGO_SUM with some limitations. The ::gentoo 
repository is able to handle packages (at least) up to the range of 1.5 
to 2 MiB total package-directory size. Therefore I propose a limit in 
that range.

- Flow


1: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg95175.html
2: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg95279.html
3: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg97310.html

* [gentoo-dev] Re: EGO_SUM
  2023-05-30 15:52   ` Florian Schmaus
@ 2023-05-30 16:30     ` Anna (cybertailor) Vyalkova
  2023-05-31  5:02       ` Oskari Pirhonen
  2023-05-30 16:35     ` Arthur Zamarin
  1 sibling, 1 reply; 52+ messages in thread
From: Anna (cybertailor) Vyalkova @ 2023-05-30 16:30 UTC (permalink / raw
  To: gentoo-dev

On 2023-05-30 17:52, Florian Schmaus wrote:
> To prevent harm to Gentoo, we should reach an agreement that everyone 
> can live with. To achieve a consensus, and since I can not rule out that 
> I missed a post that includes specific numbers, please share your ideas 
> on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.

Instate a policy to allow EGO_SUM in the gentoo tree:

1) from proxied maintainers
2) if there are no more than N entries

and disallow use otherwise.


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-30 15:52   ` Florian Schmaus
  2023-05-30 16:30     ` Anna (cybertailor) Vyalkova
@ 2023-05-30 16:35     ` Arthur Zamarin
  2023-05-31  6:20       ` Andrew Ammerlaan
                         ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Arthur Zamarin @ 2023-05-30 16:35 UTC (permalink / raw
  To: gentoo-dev, Florian Schmaus


On 30/05/2023 18.52, Florian Schmaus wrote:
> 
> I am thankful that the council considered my request to vote on the
> topic. However, the council decided not to vote on this in its last
> session and to return the issue to the mailing lists.
> 
> Some see limitations as a necessity when it comes to reinstating
> EGO_SUM. Unfortunately, I could not find any specific numbers
> mentioned since June 2022 in the three EGO_SUM threads [1, 2, 3] I am
> aware of.
> 
> To prevent harm to Gentoo, we should reach an agreement that everyone
> can live with. To achieve a consensus, and since I can not rule out that
> I missed a post that includes specific numbers, please share your ideas
> on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.

I still want to ask why it should be enabled in ::gentoo. If you speak
about overlays, then I agree that it should be allowed there, but I
don't see any benefit to its existence in ::gentoo. My reason for that
difference: the existence of gentoo-devs with access to ~devspace.

Currently the best solution *per package* is to speak with upstream, to
add a CI workflow which creates a source tarball which includes the
`vendor` dir. This is the best way, and I'm doing that for multiple
upstreams of some random Go packages in ::gentoo. But I know the
disadvantage: the requirement to speak with upstream, explain why, and
add it to the system. This is the best long-run solution, but it comes
with more hardships.

> Having EGO_SUM would significantly increase the security of Gentoo's
> users (amongst other benefits).

While technically correct, we return to the same "confidence" issue in
the dev (a dev can add malicious code into an ebuild). Yes, adding
malicious code inside a vendor tarball to hide it is easier, and robbat2
demonstrated it as working.

How can we solve it? One weird idea I have is to use a vendor tarball
consisting of multiple tarballs, one per package, and to include a hash
for each of them inside the vendor tarball. I think you can compare the
manifest stored in the `go.sum` file in the source code with the one from
the tarball (verification of that claim needed). As a result, I think we
can verify it offline.
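
A rough sketch of what such an offline check could look like. It assumes
that the `h1:` entries in `go.sum` are Go's directory hash (SHA-256 over a
sorted list of per-file SHA-256 sums, stored base64-encoded) and that the
unpacked module tree matches the module zip. The function names and the
example module path are made up for illustration, and file names
containing newlines are not handled.

```bash
#!/bin/bash

# Hypothetical helper: print, as hex, the digest behind a go.sum "h1:" value.
# $1 = value from go.sum, e.g. "h1:<base64>"
go_sum_h1_to_hex() {
    # -v keeps od from collapsing repeated lines into "*"
    printf '%s' "${1#h1:}" | base64 -d | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical helper: print, as hex, the directory hash of an unpacked module.
# $1 = directory holding the module's files
# $2 = module@version prefix, e.g. "example.com/mod@v1.0.0" (made-up name)
go_h1_hex() {
    local dir=$1 prefix=$2
    (
        cd "${dir}" || exit 1
        # Summary format: "<sha256 of file>  <prefix>/<relative path>",
        # one line per file, sorted by name in byte order.
        find . -type f | sed 's|^\./||' | LC_ALL=C sort |
        while IFS= read -r f; do
            printf '%s  %s/%s\n' \
                "$(sha256sum < "${f}" | cut -d' ' -f1)" "${prefix}" "${f}"
        done | sha256sum | cut -d' ' -f1
    )
}
```

If the output of go_h1_hex for an unpacked dependency equals the output of
go_sum_h1_to_hex for the matching line in upstream's `go.sum`, the
dependency is what upstream pinned, and no network access is needed.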

> Personally, I do not see that we currently need any form of limitation
> to reinstate EGO_SUM. I substantiated this with data based on a two-year
> history analysis of gentoo.git. The summary is that the
> - size increase of ::gentoo is unproblematic for users
> - additional sync delta of ::gentoo is unproblematic for users
> - higher rate of gentoo.git's increase is unproblematic for developers
> when we reinstate EGO_SUM in ::gentoo.

Why "unproblematic"? Where I live I have quite a high RTT, meaning each
download takes a long time to start before it fetches at a good speed.
Fetching a lot of small files is really bad for me (even from a mirror in
the same country, sigh). Big deltas also hit the git packs hard and put
higher load on a lot of places.

Thinking about the infra side, I remember stories of the issues when
go.pkg was doing a full `git clone` (not a shallow copy) of the whole
gentoo.git repository. Now imagine we allow the huge and frequent deltas
of Go modules; imagine how fast we get to a huge full repository. Yes, we
now blacklist this stupid failure of go.pkg, but it might happen with
another service. Full git clones aren't that rare.

Also note that Go packages tend to update frequently (because of all the
bundling and security issues). The fact that you don't see a lot of
updates in ::gentoo is because many of them are maintained by less active
developers (not to offend anyone, it is fine to skip bumps where that
makes sense, not my place to criticize!).

Also please remember the issue of scale. Look at the amount of packages
under dev-python. There are a lot of tools written in Go.

> Therefore, we could (and IMHO should) simply un-deprecate EGO_SUM.
> However, I would review this decision once the number of Go packages has
> doubled or in two years (whatever comes first).
> 
> Many share the concerns of an EGO_SUM-less world. I know that some seek
> a compromise by reinstating EGO_SUM with some limitations. The ::gentoo
> repository is able to handle packages (at least) up to the range of 2 to
> 1.5 MiB total package-directory size. Therefore I propose a limit in
> that range.

My solution is as such:

1. Undeprecate EGO_SUM in eclass
2. Forbid its usage in ::gentoo (done by pkgcheck, error level, will
fail CI and as such we can see the misuse). Overlays are allowed.
3. Maintainer starts talks with upstreams to add release workflow to
create vendored source tarball, in hopes of it succeeding. "Start early,
to future profit". I see this flow similar to the "always try to
upstream patches".
4. Until upstream adds it, in ::gentoo use vendor tarballs.

I also think many devs agree with this solution, but I can't speak for
them, so I'd be happy if agreeing (or disagreeing) devs could at least
reply briefly to say so.

> - Flow
> 
> 
> 1: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg95175.html
> 2: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg95279.html
> 3: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg97310.html

I must say this conversation around EGO_SUM makes me a little sad because
of how long it takes, and too often it feels like it derails in bad
directions (I mean less helpful ones). I think we should go the way Flow
suggests: concrete action items (something easier for the Council / all
devs to vote on).

Also, sorry that this mail jumps around a bit; it is quite hard for me to
write long mails in English, so if some paragraphs are less coherent,
I'll be happy to explain them more :)

-- 
Arthur Zamarin
arthurzam@gentoo.org
Gentoo Linux developer (Python, pkgcore stack, Arch Teams, GURU)



* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-30 16:30     ` Anna (cybertailor) Vyalkova
@ 2023-05-31  5:02       ` Oskari Pirhonen
  0 siblings, 0 replies; 52+ messages in thread
From: Oskari Pirhonen @ 2023-05-31  5:02 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]

On Tue, May 30, 2023 at 21:30:49 +0500, Anna (cybertailor) Vyalkova wrote:
> On 2023-05-30 17:52, Florian Schmaus wrote:
> > To prevent harm to Gentoo, we should reach an agreement that everyone
> > can live with. To achieve a consensus, and since I can not rule out that 
> > I missed a post that includes specific numbers, please share your ideas 
> > on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.
> 
> Instate a policy to allow EGO_SUM in the gentoo tree:
> 
> 1) from proxied maintainers

I agree that allowing EGO_SUM in ::gentoo at least for proxy maintained
packages would be a good idea. I don't have any Go packages, but I can
see how it could be cumbersome to get a tarball hosted somewhere.

- Oskari


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-30 16:35     ` Arthur Zamarin
@ 2023-05-31  6:20       ` Andrew Ammerlaan
  2023-05-31  8:40         ` Ryan Qian
  2023-05-31  9:06         ` Arsen Arsenović
  2023-05-31  6:30       ` pascal.jaeger leimstift.de
  2023-06-02  8:17       ` Florian Schmaus
  2 siblings, 2 replies; 52+ messages in thread
From: Andrew Ammerlaan @ 2023-05-31  6:20 UTC (permalink / raw
  To: gentoo-dev

On 30/05/2023 18:35, Arthur Zamarin wrote:
> My solution is as such:
> 
> 1. Undeprecate EGO_SUM in eclass
> 2. Forbid its usage in ::gentoo (done by pkgcheck, error level, will
> fail CI and as such we can see the misuse). Overlays are allowed.
> 3. Maintainer starts talks with upstreams to add release workflow to
> create vendored source tarball, in hopes of it succeeding. "Start early,
> to future profit". I see this flow similar to the "always try to
> upstream patches".
> 4. Until upstream adds it, in ::gentoo use vendor tarballs.
> 
> I also think many devs agree with this solution, but I can't speak for
> them, so I'd be happy if agreeing (or disagreeing) devs could at least
> reply briefly to say so.

I fully agree with Arthur

With regards to proxy-maintained packages: The proxy can generate and 
upload the vendor tarball for the proxied, this is not that much extra 
work.

Best regards,
Andrew




* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-30 16:35     ` Arthur Zamarin
  2023-05-31  6:20       ` Andrew Ammerlaan
@ 2023-05-31  6:30       ` pascal.jaeger leimstift.de
  2023-06-01  4:00         ` William Hubbs
  2023-06-02  8:17       ` Florian Schmaus
  2 siblings, 1 reply; 52+ messages in thread
From: pascal.jaeger leimstift.de @ 2023-05-31  6:30 UTC (permalink / raw
  To: gentoo-dev


> Arthur Zamarin <arthurzam@gentoo.org> wrote on 30.05.2023 18:35 CEST:
> 
> 
> Currently the best solution *per package* is to talk to upstream and get
> a CI workflow added which creates a source tarball that includes the
> `vendor` dir. This is the best way, and I'm doing that with multiple
> upstreams of some random Go packages in ::gentoo. But I know the
> disadvantage: you have to talk to upstream, explain why, and get it added
> to their system. This is the best long-run solution, but it takes more
> effort.
> 

I would like to add that, even if upstream is not willing to do this, devs could automate the creation of vendor tarballs using GitHub Actions. So far I have only done this for upstream repositories that are also on GitHub and for projects written in Rust. Initially I did this for complicated Rust projects with several git submodules and submodules of submodules. But with a little tweaking of the GitHub Actions workflow I think it would be possible to use it for Go as well.
https://wiki.gentoo.org/wiki/User:Schievel/autocreate_rust_sources

This is additional initial work, but once you set it up, you don't even have the extra work of creating a new EGO_SUM for every package release. Ideally you just have to change the version in the file name of the ebuild to bump a package.

Security wise I do not see a difference between this and creating the vendor tarball manually and uploading it to GitHub, as many proxy maintainers without devspace do it. 

Regards
Pascal



* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-31  6:20       ` Andrew Ammerlaan
@ 2023-05-31  8:40         ` Ryan Qian
  2023-05-31  9:06         ` Arsen Arsenović
  1 sibling, 0 replies; 52+ messages in thread
From: Ryan Qian @ 2023-05-31  8:40 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

Just FYI, here is a working GitHub action for generating vendor tarballs in the same repo but with different branches https://github.com/bekcpear/gopkg-vendors/blob/main/.github/workflows/make-vendor.yaml
It has been working for a long time already.

Sincerely.
Ryan

> On 31 May 2023, at 14:20, Andrew Ammerlaan <andrewammerlaan@gentoo.org> wrote:
> 
> On 30/05/2023 18:35, Arthur Zamarin wrote:
>> My solution is as such:
>> 1. Undeprecate EGO_SUM in eclass
>> 2. Forbid its usage in ::gentoo (done by pkgcheck, error level, will
>> fail CI and as such we can see the misuse). Overlays are allowed.
>> 3. Maintainer starts talks with upstreams to add release workflow to
>> create vendored source tarball, in hopes of it succeeding. "Start early,
>> to future profit". I see this flow similar to the "always try to
>> upstream patches".
>> 4. Until upstream adds it, in ::gentoo use vendor tarballs.
>> I also think many devs agree with this solution, but I can't speak for
>> them, so I'd be happy if agreeing (or disagreeing) devs could at least
>> reply briefly to say so.
> 
> I fully agree with Arthur
> 
> With regards to proxy-maintained packages: The proxy can generate and upload the vendor tarball for the proxied, this is not that much extra work.
> 
> Best regards,
> Andrew
> 
> 
> 


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-31  6:20       ` Andrew Ammerlaan
  2023-05-31  8:40         ` Ryan Qian
@ 2023-05-31  9:06         ` Arsen Arsenović
  1 sibling, 0 replies; 52+ messages in thread
From: Arsen Arsenović @ 2023-05-31  9:06 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]


Andrew Ammerlaan <andrewammerlaan@gentoo.org> writes:

> On 30/05/2023 18:35, Arthur Zamarin wrote:
>> My solution is as such:
>> 1. Undeprecate EGO_SUM in eclass
>> 2. Forbid its usage in ::gentoo (done by pkgcheck, error level, will
>> fail CI and as such we can see the misuse). Overlays are allowed.
>> 3. Maintainer starts talks with upstreams to add release workflow to
>> create vendored source tarball, in hopes of it succeeding. "Start early,
>> to future profit". I see this flow similar to the "always try to
>> upstream patches".
>> 4. Until upstream adds it, in ::gentoo use vendor tarballs.
>> I also think many devs agree with this solution, but I can't speak for
>> them, so I'd be happy if agreeing (or disagreeing) devs could at least
>> reply briefly to say so.
>
> I fully agree with Arthur

+1

> With regards to proxy-maintained packages: The proxy can generate and upload
> the vendor tarball for the proxied, this is not that much extra work.

This expands the required trust in proxy maintainers, in a way which is
unusually easy to double check.

We can automate generating vendor tarballs (or more).  If implemented
such that tarballs are reproducible, it should be easy to verify by
running the same procedure from a different host and verifying.
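
As a minimal sketch of the "same procedure, different host" check,
assuming GNU tar and coreutils: hash a normalized tar stream of the
dependency directory so that both parties only need to compare one
digest. The function name is hypothetical, and file modes are still
recorded, so both hosts would need the same umask.

```bash
#!/bin/bash

# Hypothetical helper: print the SHA-256 of a normalized tar stream of
# directory $1. Member order, timestamps, and ownership are fixed so that
# the digest depends only on names, modes, and contents.
reproducible_tar_digest() {
    local dir=$1
    LC_ALL=C tar \
        --sort=name \
        --mtime='@0' \
        --owner=0 --group=0 --numeric-owner \
        --format=posix \
        --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
        -C "$(dirname "${dir}")" -cf - "$(basename "${dir}")" |
        sha256sum | cut -d' ' -f1
}
```

The digest is taken before compression on purpose, since compressor
versions and thread settings can change the compressed bytes without
changing the archive contents.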

There would still be a slight cost to an initial 'whitelist package'
step or such, but IMO, that's not a very large cost.  (and, also,
possibly some other mechanism could be implemented)

> Best regards,
> Andrew


-- 
Arsen Arsenović


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-31  6:30       ` pascal.jaeger leimstift.de
@ 2023-06-01  4:00         ` William Hubbs
  0 siblings, 0 replies; 52+ messages in thread
From: William Hubbs @ 2023-06-01  4:00 UTC (permalink / raw
  To: gentoo-dev


[-- Attachment #1.1: Type: text/plain, Size: 2043 bytes --]

On Wed, May 31, 2023 at 08:30:58AM +0200, pascal.jaeger leimstift.de wrote:
> 
> > Arthur Zamarin <arthurzam@gentoo.org> wrote on 30.05.2023 18:35 CEST:
> > 
> > 
> > Currently the best solution *per package* is to talk to upstream and get
> > a CI workflow added which creates a source tarball that includes the
> > `vendor` dir. This is the best way, and I'm doing that with multiple
> > upstreams of some random Go packages in ::gentoo. But I know the
> > disadvantage: you have to talk to upstream, explain why, and get it added
> > to their system. This is the best long-run solution, but it takes more
> > effort.
> > 
> 
> I would like to add that, even if upstream is not willing to do this, devs could automate the creation of vendor tarballs using GitHub Actions. So far I have only done this for upstream repositories that are also on GitHub and for projects written in Rust. Initially I did this for complicated Rust projects with several git submodules and submodules of submodules. But with a little tweaking of the GitHub Actions workflow I think it would be possible to use it for Go as well.
> https://wiki.gentoo.org/wiki/User:Schievel/autocreate_rust_sources
> 
> This is additional initial work, but once you set it up, you don't even have the extra work of creating a new EGO_SUM for every package release. Ideally you just have to change the version in the file name of the ebuild to bump a package.
> 
> Security wise I do not see a difference between this and creating the vendor tarball manually and uploading it to GitHub, as many proxy maintainers without devspace do it. 

Can we please avoid vendor tarballs? There are situations, say when a
dependency includes non-Go code, when vendor tarballs do not work.
That is why I went with the dependency tarballs.

I haven't written github actions, but here is the script I use to create
them, partly thanks to Sam for this.

This is stored in my ~/bin directory and I run it from the top level of
a go project which does not have a "vendor" directory.

William

[-- Attachment #1.2: dep-tarball --]
[-- Type: text/plain, Size: 234 bytes --]

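#!/bin/bash

# Create <name>-deps.tar.xz from the Go modules needed by the project in
# the current directory.
if [[ -z $1 ]]; then
	printf "no tarball name specified\n" >&2
	# exit, not return: this file is executed as a script, not sourced
	exit 1
fi

# Download all modules into a private, writable module cache.
GOMODCACHE="${PWD}/go-mod" go mod download -modcacherw || exit 1
XZ_OPT='-T0 -9' \
	tar --owner 0 --group 0 --posix -acf "${1}-deps.tar.xz" go-mod
rm -fr go-mod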


* Re: [gentoo-dev] EGO_SUM
  2023-04-17  7:37 [gentoo-dev] EGO_SUM Florian Schmaus
  2023-04-17  9:28 ` [gentoo-dev] EGO_SUM Anna (cybertailor) Vyalkova
  2023-04-24 16:11 ` Florian Schmaus
@ 2023-06-01 19:55 ` William Hubbs
  2023-06-02  7:13   ` Joonas Niilola
  2023-06-09 10:07   ` Florian Schmaus
  2 siblings, 2 replies; 52+ messages in thread
From: William Hubbs @ 2023-06-01 19:55 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 3947 bytes --]

I know I'm pretty late to this thread, but I'm going to respond to some
of the concerns and suggest another alternative.

On Mon, Apr 17, 2023 at 09:37:32AM +0200, Florian Schmaus wrote:
> I want to continue the discussion to re-instate EGO_SUM, potentially 
> leading to a democratic vote on whether EGO_SUM should be re-instated or 
> deprecated.
> 
> For the past months, I tried to find *technical reasons*, e.g., reasons 
> that affect end-users, that justify the deprecation of EGO_SUM. However, 
> I was unable to find any. The closest thing I could find was portage 
> being unable to process an ebuild due to its large environment (bug 
> 830187). However, as this happens while developing an ebuild, it should 
> never affect users. Obviously this is a situation where EGO_SUM can not 
> be used. Fortunately, it does not affect most Go packages, as seen in my 
> previous analysis of Go packages in ::gentoo and their EGO_SUM size. 
> Furthermore, newer portage versions, with USE=gentoo-dev, will 
> proactively warn you if the environment caused by the ebuild becomes large.
> 
> All further arguments for the deprecation of EGO_SUM were of cosmetic
> nature.
> 
> However, the deprecation of EGO_SUM is harmful to Gentoo and its users. 
> To briefly re-iterate the reasons:
> 
> The EGO_SUM alternatives
> - do not have the same level of trust and therefore have a negative 
> impact on security (a dubious tarball someone put somewhere, especially 
> when proxy-maint)

For this, I would argue that vetting the tarball falls to the developer
who is proxying. If you don't trust the proxy maintainer you
are pushing for, it is easy to make a dependency tarball yourself and
add it to your dev space.

> - are not easily verifiable

I don't have a response to this other than to say that go does its
own verification of modules with the dependency tarballs that it can't
do with vendor tarballs.

> - require additional effort when developing ebuilds

This "additional effort" is pretty subjective. Making a dependency tarball
isn't a lot of work, especially with the script that I posted in this thread.

> - hinder the packaging and Gentoo's adoption of Go-based projects, which 
> is worrisome as Go is very popular

I don't have a response here. I don't see it as much of a hindrance
(this is obviously subjective).

> - prevent Go modules from being shared as DISTFILES on the mirrors 
> across various packages
 
 The issue here is really the duplicate data in the dependency or vendor
 tarballs, and yes, there is a lot of it.

> Last but not least, we have the same situation in the Rust ecosystem, 
> but we allow the EGO_SUM "equivalent" there.

I'm not sure it is quite the same because Rust projects tend to have
much smaller numbers of dependencies.


Another thing to consider is that using EGO_SUM adds a significant
amount of processing to the go-module eclass.
I was advised recently that this isn't a good idea since bash is
slow, so I am considering moving most of that processing into
get-ego-vendor by having it generate the contents of SRC_URI directly
instead of using the eclass code to do that.

My thought is to have get-ego-vendor output the value for a variable,
GO_SRC_URI and add that to SRC_URI in the ebuild like so:

# The output from get-ego-vendor:
GO_SRC_URI="
	# dependency 1
	# dependency 2
	"

SRC_URI="https://main-project-here
	${GO_SRC_URI}"

This should speed things up some since most of the processing we are
doing in the eclass would be removed, so I would rather not see the council
force the use of EGO_SUM. This, however, is still going to hit the
limitation of bug 830187.

I am, however, open to another solution, so I will keep following this
thread.

I think the better question should be around what we can do to get bug 721088 or
bug 833567 to move forward.

Thanks,

William



* Re: [gentoo-dev] EGO_SUM
  2023-06-01 19:55 ` [gentoo-dev] EGO_SUM William Hubbs
@ 2023-06-02  7:13   ` Joonas Niilola
  2023-06-02 18:06     ` William Hubbs
  2023-06-09 10:07   ` Florian Schmaus
  1 sibling, 1 reply; 52+ messages in thread
From: Joonas Niilola @ 2023-06-02  7:13 UTC (permalink / raw
  To: gentoo-dev, williamh


[-- Attachment #1.1: Type: text/plain, Size: 1325 bytes --]

On 1.6.2023 22.55, William Hubbs wrote:
>>
>> The EGO_SUM alternatives
>> - do not have the same level of trust and therefore have a negative 
>> impact on security (a dubious tarball someone put somewhere, especially 
>> when proxy-maint)
> 
> For this, I would argue that vetting the tarball falls to the developer
> who is proxying. If you don't trust the proxy maintainer you
> are pushing for, it is easy to make a dependency tarball yourself and
> add it to your dev space.
> 
> 
>> - require additional effort when developing ebuilds
> 
> This "additional effort" is pretty subjective. Making a dependency tarball
> isn't a lot of work, especially with the script that I posted in this thread.
> 

In theory it's "easy", but in practice how'd you work? This would be
fine when a single developer is proxying a single maintainer, but when a
stack of devs (project) are proxying hundreds of different people, it
becomes messy and unsustainable rather fast.

I do want to point out that any proxied maintainer can and should upload
the vendor tarballs to their own Github / Gitlab distfile-repos for the
time being, but allowing EGO_SUM to be used again would be the easiest
solution here in my opinion for everyone involved. I'm aware it's pushed
back due to technicalities.

-- juippis


* Re: [gentoo-dev] Re: EGO_SUM
  2023-05-30 16:35     ` Arthur Zamarin
  2023-05-31  6:20       ` Andrew Ammerlaan
  2023-05-31  6:30       ` pascal.jaeger leimstift.de
@ 2023-06-02  8:17       ` Florian Schmaus
  2023-06-02  8:31         ` Michał Górny
  2 siblings, 1 reply; 52+ messages in thread
From: Florian Schmaus @ 2023-06-02  8:17 UTC (permalink / raw
  To: Arthur Zamarin, gentoo-dev


[-- Attachment #1.1.1: Type: text/plain, Size: 4049 bytes --]

Hi Arthur,

thanks for your mail.

On 30/05/2023 18.35, Arthur Zamarin wrote:
> On 30/05/2023 18.52, Florian Schmaus wrote:
>> To prevent harm to Gentoo, we should reach an agreement that everyone
>> can live with. To achieve a consensus, and since I can not rule out that
>> I missed a post that includes specific numbers, please share your ideas
>> on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.
> 
> I still want to ask why in ::gentoo should it be enabled? I'm trying to
> understand why? 

In short: Auditability

Let me try to explain with a simplified example.

Gentoo's ebuilds contain the instructions to transform source code 
(input) via a compilation process (transformation) into a binary image 
(output).

A pseudo-example ebuild may contain the following

foo-1.0.ebuild:
```
# Input
SRC_URI="https://foo-soft.org/foo/1.0/foo-1.0.tar.gz"

# Transformation
src_compile() {
     emake foo
}

# Output into imagedir $D
src_install() {
     emake DESTDIR="${D}" foo-install
}
```

A Gentoo developer, Gentoo user, or, anyone can look at the ebuild and 
immediately tell that it will likely not inject malicious code into the 
resulting binary image. Furthermore, the only input is from upstream, 
and while you may not look at every line of source code, you assign a 
certain trust level to upstream and probably assume that the input is 
also likely non-malicious.

That changes fundamentally with dependency tarballs. Now you have

foo-1.0.ebuild:
```
# Input
SRC_URI="
     https://foo-soft.org/foo/1.0/foo-1.0.tar.gz
     https://some-random.dude/on/the/internet/foo-1.0-deps.tar.gz
"

# Transformation
src_compile() {
     emake foo
}

# Output into imagedir $D
src_install() {
     emake DESTDIR="${D}" foo-install
}
```

Now you need to look into foo-1.0-deps.tar.gz if you want to keep the
level of trust as before. And here, "look into foo-1.0-deps.tar.gz" 
means to ideally apply the same steps the creator of the tarball 
supposedly did and compare your foo-1.0-deps.tar.gz tarball with the one 
from the ebuild. To make matters worse, you can not simply compare the 
two tarballs bytewise, but you have to compare the archives for 
structural identity.
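
To illustrate what such a structural comparison involves, here is a
minimal sketch, assuming GNU tar and diff. The function name is
hypothetical; it compares only file names and contents and ignores
timestamps, ownership, and modes.

```bash
#!/bin/bash

# Hypothetical helper: succeed if two tarballs unpack to the same file
# names with the same contents.
# $1, $2 = paths to the two tarballs to compare
tarballs_match() {
    local a=$1 b=$2 tmp rc
    tmp=$(mktemp -d) || return 2
    mkdir "${tmp}/a" "${tmp}/b"
    tar -xf "${a}" -C "${tmp}/a" &&
        tar -xf "${b}" -C "${tmp}/b" &&
        diff -r "${tmp}/a" "${tmp}/b" > /dev/null
    rc=$?
    # Module caches may contain read-only directories; make them removable.
    chmod -R u+w "${tmp}"
    rm -rf "${tmp}"
    return "${rc}"
}
```

Even with such a helper, the auditor still has to regenerate the
reference tarball from upstream sources first, which is the additional
effort described above.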

In the case of ::gentoo, this is especially problematic for 
proxy-maintained packages. See 
https://github.com/gentoo/gentoo/pull/27050 for an actual example.

Assuming that every developer will accurately audit the non-upstream
inputs that a proxied maintainer provides creates considerable wiggle room
for a highly security-sensitive matter. And even if we would establish a
firm policy, we still would need the tools to verify the non-upstream 
inputs (which we do not have currently). Furthermore, Gentoo lacks 
manpower, not only in the proxy-maint project, and verifying 
non-upstream inputs introduces additional effort maintaining ::gentoo.

Last but not least, this also affects non-proxied packages in ::gentoo.

Even if every one of my fellow Gentoo developers is trustworthy, the 
fact that most ebuilds are easily auditable by simply looking at them is 
a huge advantage. Of course, some ebuilds pull in a lot of third-party 
patches (Xen, for example), which makes it hard to verify those. But not 
having EGO_SUM means that *all Go-packages* are immediately more 
challenging to verify because of the non-upstream input that the 
dependency tarball presents. Regardless if a Gentoo developer created 
the tarball or not.


> Also please remember the issue of scale. Look at the amount of packages
> under dev-python. There are a lot of tools written in Go.

We currently have around 250 Go-packages in ::gentoo and dev-python/* 
alone contains 1600 packages. So the package-count numbers of the two 
programming languages are not yet comparable. But note that I suggested 
to review the EGO_SUM policy once the number of Go packages has doubled 
or in two years (whatever comes first) in my previous mail.

- Flow



* Re: [gentoo-dev] Re: EGO_SUM
  2023-06-02  8:17       ` Florian Schmaus
@ 2023-06-02  8:31         ` Michał Górny
  2023-06-09 10:07           ` Florian Schmaus
  0 siblings, 1 reply; 52+ messages in thread
From: Michał Górny @ 2023-06-02  8:31 UTC (permalink / raw
  To: gentoo-dev, Arthur Zamarin

On Fri, 2023-06-02 at 10:17 +0200, Florian Schmaus wrote:
> On 30/05/2023 18.35, Arthur Zamarin wrote:
> > On 30/05/2023 18.52, Florian Schmaus wrote:
> > > To prevent harm to Gentoo, we should reach an agreement that everyone
> > > can live with. To achieve a consensus, and since I can not rule out that
> > > I missed a post that includes specific numbers, please share your ideas
> > > on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.
> > 
> > I still want to ask why in ::gentoo should it be enabled? I'm trying to
> > understand why? 
> 
> In short: Auditability
> 
> Let me try to explain with a simplified example.
> 
> Gentoo's ebuilds contain the instructions to transform source code 
> (input) via a compilation process (transformation) into a binary image 
> (output).
> 
> A pseudo-example ebuild may contain the following
> 
> foo-1.0.ebuild:
> ```
> # Input
> SRC_URI="https://foo-soft.org/foo/1.0/foo-1.0.tar.gz"
> 
> # Transformation
> src_compile() {
>      emake foo
> }
> 
> # Output into imagedir $D
> src_install() {
>      emake DESTDIR="${D}" foo-install
> }
> ```
> 
> A Gentoo developer, Gentoo user, or, anyone can look at the ebuild and 
> immediately tell that it will likely not inject malicious code into the 
> resulting binary image. Furthermore, the only input is from upstream, 
> and while you may not look at every line of source code, you assign a 
> certain trust level to upstream and probably assume that the input is 
> also likely non-malicious.
> 

This reasoning is seriously flawed.  A "typical" EGO_SUM ebuild
contains dozens to hundreds of different packages from dozens of
different authors.  You can't seriously expect anyone to be able to
reasonably establish trust to all of them.

In the end, gentoo.git security model is entirely reliant
on the developer verifying the final product and signing on it. 
Everything else is untrustworthy noise.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] EGO_SUM
  2023-06-02  7:13   ` Joonas Niilola
@ 2023-06-02 18:06     ` William Hubbs
  2023-06-02 18:42       ` Joonas Niilola
  0 siblings, 1 reply; 52+ messages in thread
From: William Hubbs @ 2023-06-02 18:06 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2087 bytes --]

On Fri, Jun 02, 2023 at 10:13:55AM +0300, Joonas Niilola wrote:
> On 1.6.2023 22.55, William Hubbs wrote:
> >>
> >> The EGO_SUM alternatives
> >> - do not have the same level of trust and therefore have a negative 
> >> impact on security (a dubious tarball someone put somewhere, especially 
> >> when proxy-maint)
> > 
> > For this, I would argue that vetting the tarball falls to the developer
> > who is proxying. If you don't trust the proxy maintainer you
> > are pushing for, it is easy to make a dependency tarball yourself and
> > add it to your dev space.
> > 
> > 
> >> - require additional effort when developing ebuilds
> > 
> > This "additional effort" is pretty subjective. Making a dependency tarball
> > isn't a lot of work, especially with the script that I posted in this thread.
> > 
> 
> In theory it's "easy", but in practice how'd you work? This would be
> fine when a single developer is proxying a single maintainer, but when a
> stack of devs (project) are proxying hundreds of different people, it
> becomes messy and unsustainable rather fast.
 
 This comment is completely off topic for this thread, so start another
 thread for it if you want, but if hundreds of people are being proxied
 by proxy-maint, that seems to be a concern unrelated to this. It seems
 the fix for that is to advocate for some of these hundreds of people to
 become developers so they don't have to be proxied any more.

> I do want to point out that any proxied maintainer can and should upload
> the vendor tarballs to their own Github / Gitlab distfile-repos for the
> time being, but allowing EGO_SUM to be used again would be the easiest
> solution here in my opinion for everyone involved. I'm aware it's pushed
> back due to technicalities.

Like I said at another point in the thread, I want to get rid of EGO_SUM
by moving most of the processing for it out of the eclass. I'm looking
into that now. This will still run into the same problem as EGO_SUM if
$A is still exported, but it should speed up ebuild processing.

William


* Re: [gentoo-dev] EGO_SUM
  2023-06-02 18:06     ` William Hubbs
@ 2023-06-02 18:42       ` Joonas Niilola
  0 siblings, 0 replies; 52+ messages in thread
From: Joonas Niilola @ 2023-06-02 18:42 UTC (permalink / raw
  To: gentoo-dev, williamh


[-- Attachment #1.1: Type: text/plain, Size: 1239 bytes --]

On 2.6.2023 21.06, William Hubbs wrote:
>>
>> In theory it's "easy", but in practice how'd you work? This would be
>> fine when a single developer is proxying a single maintainer, but when a
>> stack of devs (project) are proxying hundreds of different people, it
>> becomes messy and unsustainable rather fast.
>  
>  This comment is completely off topic for this thread, so start another
>  thread for it if you want, but if hundreds of people are being proxied
>  by proxy-maint, that seems to be a concern unrelated to this. It seems
>  the fix for that is to advocate for some of these hundreds of people to
>  become developers so they don't have to be proxied any more.
> 

How is it offtopic when I'm answering concerns you raised?

Imagine there are tens of people who do 4 commits a year, roughly, to
bump random go packages. What do you believe is the time investment for
reviewing, testing and committing their contributions, vs. mentoring
them to become devs if they don't involve themselves much outside
bumping these packages? Also, will _you_ volunteer to mentor them?

It's so easy to push more work onto others. Sorry if I come across as
harsh, but this is reality, not just theory.

-- juippis



* Re: [gentoo-dev] Re: EGO_SUM
  2023-06-02  8:31         ` Michał Górny
@ 2023-06-09 10:07           ` Florian Schmaus
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-06-09 10:07 UTC
  To: gentoo-dev, Arthur Zamarin, Michał Górny



On 02/06/2023 10.31, Michał Górny wrote:
> On Fri, 2023-06-02 at 10:17 +0200, Florian Schmaus wrote:
>> On 30/05/2023 18.35, Arthur Zamarin wrote:
>>> On 30/05/2023 18.52, Florian Schmaus wrote:
>>>> To prevent harm to Gentoo, we should reach an agreement that everyone
>>>> can live with. To achieve a consensus, and since I can not rule out that
>>>> I missed a post that includes specific numbers, please share your ideas
>>>> on how EGO_SUM could be reinstated in ::gentoo by replying to this mail.
>>>
>>> I still want to ask: why should it be enabled in ::gentoo? I'm trying to
>>> understand why.
>>
>> In short: Auditability
>> […]
>> A Gentoo developer, Gentoo user, or, anyone can look at the ebuild and
>> immediately tell that it will likely not inject malicious code into the
>> resulting binary image. Furthermore, the only input is from upstream,
>> and while you may not look at every line of source code, you assign a
>> certain trust level to upstream and probably assume that the input is
>> also likely non-malicious.
>>
> 
> This reasoning is seriously flawed.  A "typical" EGO_SUM ebuild
> contains dozens to hundreds of different packages from dozens of
> different authors.  You can't seriously expect anyone to be able to
> reasonably establish trust to all of them.

I am sorry. I was unable to get my point across.

The security impact is unrelated to what you describe. You always have a
certain degree of trust in upstream, regardless of whether upstream is
consumed by 100 Gentoo packages or there are 100 entries in EGO_SUM.

The point was and is about *non-upstream input* in the ebuild. While 
EGO_SUM fetches its artifacts from upstream, a dependency tarball does 
typically not originate from upstream.

Even if we did not trust EGO_SUM upstream, consuming inputs via
EGO_SUM would still be better from a security perspective because
EGO_SUM upstream is consumed by Gentoo and all of Go's ecosystem. Hence,
if something gets compromised, it will likely be detected quickly.
Dependency tarballs, by contrast, are usually only consumed by Gentoo.


> In the end, gentoo.git security model is entirely reliant
> on the developer verifying the final product and signing on it.
> Everything else is untrustworthy noise.

How do you verify the output, that is, the final product? This is hard;
for example, reproducible builds are far from trivial to achieve.

On the other hand, ensuring that the input matches what upstream 
provides and expects is far more manageable.

- Flow



* Re: [gentoo-dev] EGO_SUM
  2023-06-01 19:55 ` [gentoo-dev] EGO_SUM William Hubbs
  2023-06-02  7:13   ` Joonas Niilola
@ 2023-06-09 10:07   ` Florian Schmaus
  1 sibling, 0 replies; 52+ messages in thread
From: Florian Schmaus @ 2023-06-09 10:07 UTC
  To: gentoo-dev

On 01/06/2023 21.55, William Hubbs wrote:
>> The EGO_SUM alternatives
>> - do not have the same level of trust and therefore have a negative
>> impact on security (a dubious tarball someone put somewhere, especially
>> when proxy-maint)
> 
> For this, I would argue that vetting the tarball falls to the developer
> who is proxying. If you don't trust the proxy maintainer you
> are pushing for, it is easy to make a dependency tarball yourself and
> add it to your dev space.
> 
>> - are not easily verifiable
> 
> I don't have a response to this other than to say that go does its
> own verification of modules with the dependency tarballs that it can't
> do with vendor tarballs.

Yes, go has "go mod verify", which was added to the go-module eclass after
I asked on 2022-10-21 in #gentoo-dev if the eclass verifies the 
dependency tarball. robbat2 was so kind to provide a proof of concept of 
the security issue I was pointing out, which is available under 
https://gist.github.com/robbat2/82f4c208b6674e707081eda689096d55. This 
demonstration of the issue triggered 
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=733b4944c1a061269f96219cc96530f89d8f439e, 
which made the go-module.eclass run "go mod verify".

Unfortunately, a malicious contributor can trivially sidestep this
verification step, rendering it ineffective. First, neither portage [1]
nor PMS prevents a later (source) archive from overriding an existing
file. This looseness allows, for example, the (non-upstream) dependency
tarball to override (upstream's) go.sum. Second, a dependency tarball
could create the vendor/ directory, so that the condition under which
the go-module.eclass runs "go mod verify" is never met. Both approaches
allow the dependency tarball to inject malicious code. With the first
approach, "go mod verify" completes successfully; with the second,
"go mod verify" is simply not invoked.

The verification, as is, is ineffective.
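
The two bypasses described above can be illustrated with a small,
self-contained sketch. It does not use portage or the eclass; it only
models "unpack the archives in order into one directory", which is the
behaviour the argument relies on. The archive contents, the pkg-1.0
directory name and the hash strings are made up for illustration.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_tarball(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tar archive from a {path: content} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_in_order(archives: list[bytes], workdir: Path) -> None:
    """Unpack archives one after another into workdir; later files win."""
    for blob in archives:
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                target = workdir / member.name
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(tar.extractfile(member).read())


def demo() -> tuple[str, bool]:
    # Hypothetical upstream source archive shipping the real go.sum.
    upstream = make_tarball(
        {"pkg-1.0/go.sum": b"example.com/dep v1.0.0 h1:UPSTREAM=\n"}
    )
    # Hypothetical non-upstream dependency tarball that (1) replaces
    # go.sum and (2) creates a vendor/ directory.
    deps = make_tarball(
        {
            "pkg-1.0/go.sum": b"example.com/dep v1.0.0 h1:REPLACED=\n",
            "pkg-1.0/vendor/modules.txt": b"# example.com/dep v1.0.0\n",
        }
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        unpack_in_order([upstream, deps], root)
        go_sum = (root / "pkg-1.0/go.sum").read_text()
        has_vendor = (root / "pkg-1.0/vendor").is_dir()
    return go_sum, has_vendor
```

After demo() runs, the go.sum on disk is the one from the dependency
tarball, and a vendor/ directory exists. Any check that trusts the
unpacked go.sum, or that is skipped when vendor/ is present, sees only
what the later archive wanted it to see.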


>> Last but not least, we have the same situation in the Rust ecosystem,
>> but we allow the EGO_SUM "equivalent" there.
> 
> I'm not sure it is quite the same because Rust projects tend to have
> much smaller numbers of dependencies.

I am curious to know of any specific reason why Rust projects generally 
get by with fewer dependencies. This impression may be deceiving, caused 
by the fact that the Go-lang ecosystem hosts several projects with a 
more significant number of dependencies. If you look at the analysis 
[2], you find that among the top 10 Go packages by EGO_SUM entry count
are cri-o, prometheus, k3s, and k3d, among others. If someone rewrites
any of those in Rust, they would probably end up with the same number of 
dependencies.


> Another thing to consider is that using EGO_SUM adds a significant
> amount of processing to the go-module eclass.
> I was advised recently that this isn't a good idea since bash is
> slow, so I am considering moving most of that processing into
> get-ego-vendor by having it generate the contents of SRC_URI directly
> instead of using the eclass code to do that.

Was this analyzed and quantified? Is this hurting us? The cache 
regeneration of an ebuild tree is an embarrassingly parallel operation, 
so this would need to be exponentially complex [3] to be of any 
significance.

It may be possible to tune the existing EGO_SUM handling. We should keep 
EGO_SUM if viable, as it directly maps Go's go.sum and makes developing 
Go-lang ebuilds as frictionless as possible.
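
To make the "directly maps" point concrete, here is a minimal sketch of
the mapping, assuming the usual three-field go.sum line layout (module,
version, hash) and the EGO_SUM entry form of "module version" with the
hash dropped. The function names and the sample module are hypothetical
and not part of any existing tool.

```python
def go_sum_to_ego_sum(go_sum_text: str) -> list[str]:
    """Turn go.sum lines into EGO_SUM-style "module version" entries."""
    entries = []
    for line in go_sum_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        module, version, _hash = fields
        # The hash is dropped; the "/go.mod" suffix, if any, stays
        # attached to the version field.
        entries.append(f"{module} {version}")
    return entries


def render_ego_sum(entries: list[str]) -> str:
    """Render the entries as a bash array assignment for an ebuild."""
    lines = ["EGO_SUM=("]
    lines += [f'\t"{entry}"' for entry in entries]
    lines.append(")")
    return "\n".join(lines)


SAMPLE_GO_SUM = """\
example.com/dep v1.2.3 h1:AAAA=
example.com/dep v1.2.3/go.mod h1:BBBB=
"""

if __name__ == "__main__":
    print(render_ego_sum(go_sum_to_ego_sum(SAMPLE_GO_SUM)))
```

Each go.sum line yields exactly one entry, which is why the number of
EGO_SUM entries grows in step with the size of upstream's go.sum.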

- Flow



1: https://github.com/gentoo/portage/pull/1030
2: https://dev.gentoo.org/~flow/gentoo-tree-analysis-results/2023-05-17T100838-gentoo-at-2022-02-16-60dc7a03ff2f/post-processed-ego-sum.txt
3: something similar to what was recently found in the latex ebuilds, see
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6ee282f0645dcfccf1836b9cc7ae55556629eb8b



* Re: [gentoo-dev] EGO_SUM
  2023-07-03 10:17           ` Florian Schmaus
@ 2023-07-03 11:12             ` Ulrich Mueller
  0 siblings, 0 replies; 52+ messages in thread
From: Ulrich Mueller @ 2023-07-03 11:12 UTC
  To: Florian Schmaus; +Cc: gentoo-dev, Sam James


>>>>> On Mon, 03 Jul 2023, Florian Schmaus wrote:

> So pkgcheck counting EGO_SUM entries would be sufficient for the
> purpose of having a static check that notices if the ebuild would
> likely run into the environment limit?

> To find a common compromise, I would possibly invest my time in
> developing such a test. Even though I do not deem such a check a
> strict prerequisite to reintroduce EGO_SUM.

The so-called "environment limit" is 32 pages, i.e. normally 128 KiB.
With the A variable anywhere near this, the size of the Manifest file
would be close to 1 MiB.
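
As a rough check of these numbers, the arithmetic can be sketched as
follows. The 4 KiB page size is the "normal" case mentioned above. The
average distfile name length (40 bytes), the 6-digit file size and the
Manifest line layout (DIST name size BLAKE2B hash SHA512 hash, with 128
hex digits per hash) are assumptions made for this estimate, not
measured values.

```python
PAGE_SIZE = 4096        # bytes; the common case
ENV_LIMIT_PAGES = 32    # the limit quoted above


def env_limit_bytes(page_size: int = PAGE_SIZE) -> int:
    """Size limit for a single environment string, in bytes."""
    return ENV_LIMIT_PAGES * page_size


def manifest_line_bytes(name_len: int, size_digits: int = 6) -> int:
    """Assumed length of one 'DIST <name> <size> BLAKE2B .. SHA512 ..' line."""
    return (
        len("DIST ") + name_len + 1 + size_digits
        + len(" BLAKE2B ") + 128
        + len(" SHA512 ") + 128
        + 1  # trailing newline
    )


def estimate_manifest_bytes(a_bytes: int, avg_name_len: int) -> int:
    """Estimate the Manifest size for an A variable of a_bytes bytes."""
    # Each name in A is followed by one separator character.
    entries = a_bytes // (avg_name_len + 1)
    return entries * manifest_line_bytes(avg_name_len)


if __name__ == "__main__":
    limit = env_limit_bytes()
    print(limit)  # 131072 bytes, i.e. 128 KiB
    print(estimate_manifest_bytes(limit, avg_name_len=40) / 2**20)
```

Under these assumptions the estimate lands just under 1 MiB. Longer
distfile names would lower the entry count and the estimate, so the
result is only an order-of-magnitude check.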

IMHO this is way too large to be used on a regular basis. I am aware
that we have some packages with large Manifests (71 packages above
50 KiB, 6 packages above 200 KiB, out of 18812 packages in total),
but these should really remain the exception.

Ulrich



end of thread, other threads:[~2023-07-03 11:13 UTC | newest]

Thread overview: 52+ messages
2023-04-17  7:37 [gentoo-dev] EGO_SUM Florian Schmaus
2023-04-17  9:28 ` [gentoo-dev] EGO_SUM Anna (cybertailor) Vyalkova
2023-04-27 18:00   ` William Hubbs
2023-04-27 18:18     ` David Seifert
2023-04-24 16:11 ` Florian Schmaus
2023-04-24 20:28   ` Sam James
2023-04-24 22:52     ` Alexey Zapparov
2023-04-26 15:31     ` Florian Schmaus
2023-04-26 16:12       ` Matt Turner
2023-04-26 19:31         ` Andrew Ammerlaan
2023-04-26 19:38           ` Chris Pritchard
2023-04-26 20:47           ` Matt Turner
2023-04-27  7:58         ` Florian Schmaus
2023-04-27  9:24           ` Ulrich Mueller
2023-04-28  6:59             ` Florian Schmaus
2023-04-27 12:54           ` Michał Górny
2023-04-27 23:12             ` Pascal Jäger
2023-04-28  0:38               ` Sam James
2023-04-28  4:27                 ` Michał Górny
2023-04-28  5:31                   ` Sam James
2023-04-28  6:59             ` Florian Schmaus
2023-04-28 14:34               ` Michał Górny
2023-05-02 19:32                 ` Florian Schmaus
2023-05-02 19:38                   ` Sam James
2023-04-29 22:34               ` Robin H. Johnson
2023-04-27 21:16           ` Sam James
2023-05-02 19:32             ` Florian Schmaus
2023-05-02 19:45               ` Sam James
2023-05-08  7:53                 ` Florian Schmaus
2023-05-08 12:03                   ` Michał Górny
2023-05-22  7:14                     ` Florian Schmaus
2023-05-02 20:04               ` Matt Turner
2023-05-08  7:53                 ` Florian Schmaus
2023-04-26 20:51       ` Sam James
2023-05-30 15:52   ` Florian Schmaus
2023-05-30 16:30     ` Anna (cybertailor) Vyalkova
2023-05-31  5:02       ` Oskari Pirhonen
2023-05-30 16:35     ` Arthur Zamarin
2023-05-31  6:20       ` Andrew Ammerlaan
2023-05-31  8:40         ` Ryan Qian
2023-05-31  9:06         ` Arsen Arsenović
2023-05-31  6:30       ` pascal.jaeger leimstift.de
2023-06-01  4:00         ` William Hubbs
2023-06-02  8:17       ` Florian Schmaus
2023-06-02  8:31         ` Michał Górny
2023-06-09 10:07           ` Florian Schmaus
2023-06-01 19:55 ` [gentoo-dev] EGO_SUM William Hubbs
2023-06-02  7:13   ` Joonas Niilola
2023-06-02 18:06     ` William Hubbs
2023-06-02 18:42       ` Joonas Niilola
2023-06-09 10:07   ` Florian Schmaus
     [not found] <2ZKWN4KF.MKEFFMWE.LGPKYP47@RTL7EJXF.RN4PF6UF.MDFBGF3C>
     [not found] ` <be450641-94ff-a0d9-51da-3a7a3abcc6c7@gentoo.org>
     [not found]   ` <b7309a3f-2980-b390-a16a-0518cce1da75@gentoo.org>
     [not found]     ` <87y1k33aoy.fsf@gentoo.org>
2023-06-30  8:15       ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Florian Schmaus
2023-06-30  8:22         ` Sam James
2023-07-03 10:17           ` Florian Schmaus
2023-07-03 11:12             ` [gentoo-dev] EGO_SUM Ulrich Mueller
