public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
       [not found]     ` <87y1k33aoy.fsf@gentoo.org>
@ 2023-06-30  8:15       ` Florian Schmaus
  2023-06-30  8:22         ` Sam James
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Schmaus @ 2023-06-30  8:15 UTC
  To: gentoo-dev


[in reply to a gentoo-project@ post, but it was asked to continue this 
on gentoo-dev@]

On 28/06/2023 16.46, Sam James wrote:
> Florian Schmaus <flow@gentoo.org> writes:
>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>> I also want to nominate people who I feel contribute a lot to Gentoo and
>>> I have a lot of interaction with (ordered by name, not priority):
>>> […]
>>> flow
>>
>> I apologize for the late reply, and thank you for the nomination. I am
>> honored and accept.
>>
>> As many of you know, I am spending a lot of time on the EGO_SUM
>> situation, as it is one of the most critical issues to solve.
>>
>> I have used the last few days to carefully consider whether a seat on
>> the council is more harmful or beneficial to my efforts regarding
>> EGO_SUM. On the one hand, council work means I have less time to
>> improve the EGO_SUM situation. On the other hand, a seat in the
>> council increases the probability of positively influencing Gentoo's
>> future, also regarding EGO_SUM.
>>
> 
> That's fine and it's great to see more people running!

Excellent that we share this view. :)


> But with regard to EGO_SUM: you didn't appear at the meeting where we discussed
> your previous EGO_SUM proposal,

Naive as I am, I expected that the mailing list would be used for
discussion and that the council meeting would be used chiefly for voting 
and intra-council discussion. And since the request to the council to 
vote on a concrete proposal was preceded by a multiple-week, if not 
month-long, mailing list discussion, I assumed that my presence in the 
council meeting was optional.

Had I known that my presence was required, or that my absence from the
meeting would be blamed on me afterward, I would have appeared if possible.


> and questions remain unanswered on the
> ML (why not implement a check in pkgcheck similar to what is in Portage,
> for example)?

On 2023-05-30 [1], I proposed a limit in the range of 1.5 to 2 MiB for
the total package-directory size. I only care a little about the tool 
that checks this limit, but pkgcheck is an obvious choice. I also 
suggested that we review this policy once the number of Go packages has 
doubled or two years after this policy was established (whatever comes 
first).

But I fear you may be referring to another kind of check. You may be 
talking about a check that forbids EGO_SUM in ::gentoo but allows it 
in overlays.

However, as stated before [2], this is not a viable approach. One reason 
why it is not practicable is auditability.


> The blocker is not a council seat, it's about addressing people's
> concerns...

Unfortunately, it appears that I am terrible at convincing everyone that 
the deprecation of EGO_SUM was a mistake. I tried to respond to every 
concern. Often, the response included arguments based on factual data. 
But ultimately, I can only expect to convince some, as the EGO_SUM
question touches the subjective realm of style.

I know that the EGO_SUM situation and the resulting discussion grew huge 
and left many understandably bored or confused, who then turned away.
But that is a pity because it is a relevant discussion for Gentoo's 
long-term success.

The bottom line is that the EGO_SUM discussion yielded no evidence or 
even a slight indication that EGO_SUM was deprecated based on technical 
issues. Instead, it appears that EGO_SUM was deprecated because some 
deemed it unaesthetic.

Understandably, EGO_SUM can be considered ugly. Compared to a traditional
Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust 
packages. However, looking at the bigger picture, EGO_SUM's advantages 
outweigh its disadvantages.

- Flow


1: https://marc.info/?l=gentoo-dev&m=168546196902731 
<25308876-7ac4-8c90-8641-1034cc67c6b0@gentoo.org>
2: https://marc.info/?l=gentoo-dev&m=168569387514376 
<012fa74d-2910-ea90-6008-26cc23604d2f@gentoo.org>


* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-06-30  8:15       ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Florian Schmaus
@ 2023-06-30  8:22         ` Sam James
  2023-06-30  9:38           ` Tim Harder
  2023-07-03 10:17           ` Florian Schmaus
  0 siblings, 2 replies; 23+ messages in thread
From: Sam James @ 2023-06-30  8:22 UTC
  To: gentoo-dev


Florian Schmaus <flow@gentoo.org> writes:

> [in reply to a gentoo-project@ post, but it was asked to continue this
> on gentoo-dev@]
>
> On 28/06/2023 16.46, Sam James wrote:
>> Florian Schmaus <flow@gentoo.org> writes:
>>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>>> I also want to nominate people who I feel contribute a lot to Gentoo and
>>>> I have a lot of interaction with (ordered by name, not priority):
>>>> […]
>>>> flow
>>>
>>> I apologize for the late reply, and thank you for the nomination. I am
>>> honored and accept.
>>>
>>> As many of you know, I am spending a lot of time on the EGO_SUM
>>> situation, as it is one of the most critical issues to solve.
>>>
>>> I have used the last few days to carefully consider whether a seat on
>>> the council is more harmful or beneficial to my efforts regarding
>>> EGO_SUM. On the one hand, council work means I have less time to
>>> improve the EGO_SUM situation. On the other hand, a seat in the
>>> council increases the probability of positively influencing Gentoo's
>>> future, also regarding EGO_SUM.
>>>
>> That's fine and it's great to see more people running!
>
> Excellent that we share this view. :)
>
>
>> But with regard to EGO_SUM: you didn't appear at the meeting where we discussed
>> your previous EGO_SUM proposal,
>
> Naive as I am, I expected that the mailing list would be used for
> discussion and that the council meeting would be used chiefly for
> voting and intra-council discussion. And since the request to the
> council to vote on a concrete proposal was preceded by a
> multiple-week, if not month-long, mailing list discussion, I assumed
> that my presence in the council meeting was optional.
>
> Had I known that my presence was required, or that my absence from the
> meeting would be blamed on me afterward, I would have appeared if
> possible.

I'm not blaming you for anything. But you didn't speak in
#gentoo-council before the meeting (a few days before IIRC) when we
were discussing the problem, I pinged you during the meeting, and you
didn't appear there afterwards.

You also didn't seem to respond to the council decision (or
non-decision) in that meeting either, unless I've missed it.

It seems self-evident that discussion would happen in the meeting before
voting...? What am I misunderstanding?

We regularly discuss things before voting on them. Do you normally
observe council meetings? I don't think what we did in this instance
was at all unusual.

(Also: there's the issue of whether or not the council should really
be voting on overriding an eclass maintainer who would then be forced
to keep something working they don't want to. mgorny raised that.)

>
>
>> and questions remain unanswered on the
>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>> for example)?
>
> On 2023-05-30 [1], I proposed a limit in the range of 1.5 to 2 MiB for
> the total package-directory size. I only care a little about the tool
> that checks this limit, but pkgcheck is an obvious choice. I also
> suggested that we review this policy once the number of Go packages
> has doubled or two years after this policy was established (whatever
> comes first).
>
> But I fear you may be referring to another kind of check. You may be
> talking about a check that forbids EGO_SUM in ::gentoo but allows it
> in overlays.

My position on this has been consistent: a check is needed to statically
determine when the environment size is too big. Copying the Portage
check into pkgcheck (in terms of the metrics) would satisfy this.

That is, regardless of raw size, I'm asking for a calculation based on
the contents of EGO_SUM where, if exceeded, the package will not be
installable on some systems. You didn't have an issue implementing this
for Portage and I've mentioned this a bunch of times since, so I thought
it was clear what I was hoping to see.

I would also like (which is not what I was referring to here) some
limit on the size, given that we already have a limit on the size of
${FILESDIR}, but this is less of a concern for me given it's bounded
by the aforementioned environment size check.

>
> Understandably, EGO_SUM can be considered ugly. Compared to a
> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
> true for Rust packages. However, looking at the bigger picture,
> EGO_SUM's advantages outweigh its disadvantages.
>

Again, I am on record as being fine with the general EGO_SUM approach,
even if I wish we didn't need it, as I see it as inevitable for things
like yarn, .NET, and of course Rust as we already have it.

Just ideally not huge ones, and certainly not huge ones which then
aren't even reliably installable because of environment size.



* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-06-30  8:22         ` Sam James
@ 2023-06-30  9:38           ` Tim Harder
  2023-06-30 11:33             ` Eray Aslan
  2023-07-03 10:17           ` Florian Schmaus
  1 sibling, 1 reply; 23+ messages in thread
From: Tim Harder @ 2023-06-30  9:38 UTC
  To: gentoo-dev

On 2023-06-30 Fri 02:22, Sam James wrote:
> My position on this has been consistent: a check is needed to statically
> determine when the environment size is too big. Copying the Portage
> check into pkgcheck (in terms of the metrics) would satisfy this.
> 
> That is, regardless of raw size, I'm asking for a calculation based on
> the contents of EGO_SUM where, if exceeded, the package will not be
> installable on some systems. You didn't have an issue implementing this
> for Portage and I've mentioned this a bunch of times since, so I thought
> it was clear what I was hoping to see.
> 
> I would also like (which is not what I was referring to here) some
> limit on the size, given that we already have a limit on the size of
> ${FILESDIR}, but this is less of a concern for me given it's bounded
> by the aforementioned environment size check.

Why do we have to keep exporting the related variables that generally
cause these size issues to the environment? I've asked as much on IRC
multiple times (nearly every time this discussion has been brought up)
and the answers I've gotten are some variation on "it's always been that
way" or "not exporting them would break using commands as external
programs" (e.g. calling via xargs).

The first response isn't a great argument and the second response, while
more valid, also feels less important than having a more minimalistic,
exported environment that causes fewer issues like this one and others
such as potentially affecting a package's build system in an unexpected
fashion. See bug #721088 for the related discussion on environment
variable exports.

From my stance, the spec should state that the only variables to be
exported are ones already "semi-standard" and used externally of package
manager internals in the expected fashion, which probably only includes
HOME, TMPDIR, and maybe ROOT. This would of course currently break
packages that use `xargs` while calling internal commands depending on
some of those exported variables, but from a cursory glance at the
gentoo repo, there aren't many ebuilds using that functionality and in
general those that are could be written in an easier to understand
fashion without using xargs. It should also be possible to proxy the
required variables to those commands in various fashions without using
the environment if using commands externally is extremely important to
the few ebuild maintainers who make use of that functionality.
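
A minimal sketch of the allowlist idea described above, written in Python rather than taken from any package manager's code. It assumes the internal state is available as a plain dict; the function name `minimal_env` and the sample values are hypothetical, and the allowlist simply mirrors the three variables named above.

```python
# Variables suggested above as the only ones worth exporting.
EXPORT_ALLOWLIST = ("HOME", "TMPDIR", "ROOT")


def minimal_env(internal_vars, allowlist=EXPORT_ALLOWLIST):
    """Return only the allowlisted variables from a dict of internal state."""
    return {k: v for k, v in internal_vars.items() if k in allowlist}


if __name__ == "__main__":
    internal = {
        "HOME": "/var/tmp/portage/homedir",
        "TMPDIR": "/var/tmp",
        "A": "x" * 200_000,  # large value that is kept out of the exported set
    }
    exported = minimal_env(internal)
    print(sorted(exported))
```

The resulting dict could be handed to an external command via `subprocess.run(cmd, env=exported)`, so the size of a variable such as A would no longer count against the exec limit.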

In short, adding checks to portage and pkgcheck feels like an ill-suited
workaround that foists hacking around the error onto users or developers
due to a poor decision made decades ago on environment handling.

Tim



* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-06-30  9:38           ` Tim Harder
@ 2023-06-30 11:33             ` Eray Aslan
  2023-07-03 10:17               ` Florian Schmaus
  0 siblings, 1 reply; 23+ messages in thread
From: Eray Aslan @ 2023-06-30 11:33 UTC
  To: gentoo-dev

On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> Why do we have to keep exporting the related variables that generally
> cause these size issues to the environment?

I really do not want to make a +1 response but this is an excellent
question that we need to answer before implementing EGO_SUM.

-- 
Eray



* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-06-30  8:22         ` Sam James
  2023-06-30  9:38           ` Tim Harder
@ 2023-07-03 10:17           ` Florian Schmaus
  2023-07-03 11:12             ` [gentoo-dev] EGO_SUM Ulrich Mueller
  2023-07-08 21:21             ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Sam James
  1 sibling, 2 replies; 23+ messages in thread
From: Florian Schmaus @ 2023-07-03 10:17 UTC
  To: gentoo-dev, Sam James


On 30/06/2023 10.22, Sam James wrote:
> Florian Schmaus <flow@gentoo.org> writes:
>> [in reply to a gentoo-project@ post, but it was asked to continue this
>> on gentoo-dev@]
>> On 28/06/2023 16.46, Sam James wrote:
>>> and questions remain unanswered on the
>>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>>> for example)?
>>
>> On 2023-05-30 [1], I proposed a limit in the range of 1.5 to 2 MiB for
>> the total package-directory size. I only care a little about the tool
>> that checks this limit, but pkgcheck is an obvious choice. I also
>> suggested that we review this policy once the number of Go packages
>> has doubled or two years after this policy was established (whatever
>> comes first).
>>
>> But I fear you may be referring to another kind of check. You may be
>> talking about a check that forbids EGO_SUM in ::gentoo but allows it
>> in overlays.
> 
> My position on this has been consistent: a check is needed to statically
> determine when the environment size is too big. Copying the Portage
> check into pkgcheck (in terms of the metrics) would satisfy this.

It is not as easy as merely copying existing portage code into pkgcheck 
(unless I am missing something).

I've talked to arthurzam, and there appears to be a .environment file 
created by pkgcheck, which we could use to approximate the exported 
environment.

Another option would be to have pkgcheck count the EGO_SUM entries. The 
tree-sitter API for Bash, which pkgcheck already uses, seems to allow 
for that. But that would be different from the check in portage. 
Although, IMHO, counting EGO_SUM entries would be sufficient.
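
A minimal sketch of what counting EGO_SUM entries could look like. It uses a plain regular expression instead of the tree-sitter Bash API mentioned above, and it assumes that the array is written as `EGO_SUM=(` at the start of a line, with one double-quoted entry per line and the closing parenthesis on its own line. The function names and the `max_entries` parameter are hypothetical.

```python
import re

# Body of a top-level `EGO_SUM=( ... )` array whose ")" sits on its own line.
_EGO_SUM_RE = re.compile(r'^EGO_SUM=\((.*?)^\s*\)', re.MULTILINE | re.DOTALL)
# Each entry is assumed to be a single double-quoted string.
_ENTRY_RE = re.compile(r'"[^"\n]*"')


def count_ego_sum_entries(ebuild_text):
    """Count quoted entries inside EGO_SUM=( ... ); 0 if the array is absent."""
    match = _EGO_SUM_RE.search(ebuild_text)
    if match is None:
        return 0
    return len(_ENTRY_RE.findall(match.group(1)))


def too_many_entries(ebuild_text, max_entries):
    """True if the ebuild lists more entries than the (hypothetical) limit."""
    return count_ego_sum_entries(ebuild_text) > max_entries
```

A real check would still have to pick the threshold, which the discussion above leaves open.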


> That is, regardless of raw size, I'm asking for a calculation based on
> the contents of EGO_SUM where, if exceeded, the package will not be
> installable on some systems. You didn't have an issue implementing this
> for Portage and I've mentioned this a bunch of times since, so I thought
> it was clear what I was hoping to see.

So pkgcheck counting EGO_SUM entries would be sufficient for the purpose 
of having a static check that notices if the ebuild would likely run 
into the environment limit?

To find a common compromise, I would possibly invest my time in 
developing such a test. Even though I do not deem such a check a strict 
prerequisite to reintroduce EGO_SUM.


>> Understandably, EGO_SUM can be considered ugly. Compared to a
>> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
>> true for Rust packages. However, looking at the bigger picture,
>> EGO_SUM's advantages outweigh its disadvantages.
>>
> 
> Again, I am on record as being fine with the general EGO_SUM approach,
> even if I wish we didn't need it, as I see it as inevitable for things
> like yarn, .NET, and of course Rust as we already have it.
> 
> Just ideally not huge ones, and certainly not huge ones which then
> aren't even reliably installable because of environment size.

Talking about "reliably installable" makes it sound to me like there are 
cases where installing an EGO_SUM-based package sometimes works and 
sometimes does not. But the kernel limit is fixed and not configurable, 
short of patching the source (and assuming no architectures with a page 
size below 4 KiB) [1].

Any developer testing whether an ebuild is installable would immediately 
notice if the ebuild runs into the environment limit.

That said, static code checks are always preferable over dynamic ones.

- Flow


1: 
https://elixir.bootlin.com/linux/v6.4.1/source/include/uapi/linux/binfmts.h#L15



* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-06-30 11:33             ` Eray Aslan
@ 2023-07-03 10:17               ` Florian Schmaus
  2023-07-04  7:13                 ` Tim Harder
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Schmaus @ 2023-07-03 10:17 UTC
  To: gentoo-dev, Eray Aslan


On 30/06/2023 13.33, Eray Aslan wrote:
> On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>> Why do we have to keep exporting the related variables that generally
>> cause these size issues to the environment?
> 
> I really do not want to make a +1 response but this is an excellent
> question that we need to answer before implementing EGO_SUM.

Could you please discuss why you make the reintroduction of EGO_SUM 
dependent on this question?

Portage will show you a warning message if the exported environment 
approaches the kernel limit, and it will show a detailed error message 
if executing an ebuild failed due to the limit being reached. There 
seems to be no reason why you should not be able to allow EGO_SUM again 
without first fixing, for example, https://bugs.gentoo.org/721088.

- Flow


* Re: [gentoo-dev] EGO_SUM
  2023-07-03 10:17           ` Florian Schmaus
@ 2023-07-03 11:12             ` Ulrich Mueller
  2023-07-08 21:21             ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Sam James
  1 sibling, 0 replies; 23+ messages in thread
From: Ulrich Mueller @ 2023-07-03 11:12 UTC
  To: Florian Schmaus; +Cc: gentoo-dev, Sam James

>>>>> On Mon, 03 Jul 2023, Florian Schmaus wrote:

> So pkgcheck counting EGO_SUM entries would be sufficient for the
> purpose of having a static check that notices if the ebuild would
> likely run into the environment limit?

> To find a common compromise, I would possibly invest my time in
> developing such a test. Even though I do not deem such a check a
> strict prerequisite to reintroduce EGO_SUM.

The so-called "environment limit" is 32 pages, i.e. normally 128 KiB.
With the A variable anywhere near this, the size of the Manifest file
would be close to 1 MiB.

IMHO this is way too large to be used on a regular basis. I am aware
that we have some packages with large Manifests (71 packages above
50 KiB, 6 packages above 200 KiB, out of 18812 packages in total),
but these should really remain the exception.
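
The 128 KiB figure above follows from 32 pages of 4 KiB each. A minimal sketch of that arithmetic, together with a check of whether a single `NAME=value` string would stay within the limit, is given here. Counting the trailing NUL byte as part of the limit is an assumption about how the kernel measures the string, and the function names are hypothetical.

```python
PAGES_PER_STRING = 32  # the kernel defines MAX_ARG_STRLEN as PAGE_SIZE * 32


def max_arg_strlen(page_size=4096):
    """Upper bound in bytes for one argv/envp string at the given page size."""
    return PAGES_PER_STRING * page_size


def fits_in_environment(name, value, page_size=4096):
    """True if 'NAME=value' plus a NUL byte stays within the per-string limit."""
    needed = len(f"{name}={value}".encode()) + 1
    return needed <= max_arg_strlen(page_size)


print(max_arg_strlen())          # 131072 bytes
print(max_arg_strlen() // 1024)  # 128 (KiB)
```

The limit applies per string, so it is the single largest variable (here A) that matters, not the sum of all exported variables.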

Ulrich


* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-03 10:17               ` Florian Schmaus
@ 2023-07-04  7:13                 ` Tim Harder
  2023-07-04 10:44                   ` Gerion Entrup
  2023-07-06  6:09                   ` Zoltan Puskas
  0 siblings, 2 replies; 23+ messages in thread
From: Tim Harder @ 2023-07-04  7:13 UTC
  To: gentoo-dev

On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
>On 30/06/2023 13.33, Eray Aslan wrote:
>>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>>>Why do we have to keep exporting the related variables that generally
>>>cause these size issues to the environment?
>>
>>I really do not want to make a +1 response but this is an excellent
>>question that we need to answer before implementing EGO_SUM.
>
>Could you please discuss why you make the reintroduction of EGO_SUM 
>dependent on this question?

Just to be clear, I don't particularly care about EGO_SUM enough to gate
its reintroduction (and don't have any leverage to do so anyway). I'm
just tired of the circular discussions around env issues that all seem
to avoid actual fixes, catering instead to functionality used by a
vanishingly small subset of ebuilds in the main repo that compels a
certain design mostly due to how portage functioned before EAPI 0.

Other than that, supporting EGO_SUM (or any other language ecosystem
trending towards distro-unfriendly releases) is fine as long as devs are
cognizant of how the related global-scope eclass design affects everyone
running or working on the raw repo. I hope devs continue leveraging the
relatively recent benchmark tooling (and perhaps more future support) to
improve their work. Along those lines, it could be nice to see sample
benchmark data in commit messages for large, global-scope eclass work
just to reinforce that it was taken into account.

Tim



* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-04  7:13                 ` Tim Harder
@ 2023-07-04 10:44                   ` Gerion Entrup
  2023-07-04 21:56                     ` Robin H. Johnson
  2023-07-06  6:09                   ` Zoltan Puskas
  1 sibling, 1 reply; 23+ messages in thread
From: Gerion Entrup @ 2023-07-04 10:44 UTC
  To: gentoo-dev

Am Dienstag, 4. Juli 2023, 09:13:30 CEST schrieb Tim Harder:
> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
> >On 30/06/2023 13.33, Eray Aslan wrote:
> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> >>>Why do we have to keep exporting the related variables that generally
> >>>cause these size issues to the environment?
> >>
> >>I really do not want to make a +1 response but this is an excellent
> >>question that we need to answer before implementing EGO_SUM.
> >
> >Could you please discuss why you make the reintroduction of EGO_SUM 
> >dependent on this question?
> 
> Just to be clear, I don't particularly care about EGO_SUM enough to gate
> its reintroduction (and don't have any leverage to do so anyway). I'm
> just tired of the circular discussions around env issues that all seem
> to avoid actual fixes, catering instead to functionality used by a
> vanishingly small subset of ebuilds in the main repo that compels a
> certain design mostly due to how portage functioned before EAPI 0.
> 
> Other than that, supporting EGO_SUM (or any other language ecosystem
> trending towards distro-unfriendly releases) is fine as long as devs are
> cognizant of how the related global-scope eclass design affects everyone
> running or working on the raw repo. I hope devs continue leveraging the
> relatively recent benchmark tooling (and perhaps more future support) to
> improve their work. Along those lines, it could be nice to see sample
> benchmark data in commit messages for large, global-scope eclass work
> just to reinforce that it was taken into account.
> 
> Tim

Hi,

I am just curious about the whole discussion. I did not follow it in the
deepest detail, but what I got is:
- EGO_SUM blows up the Manifest file, since every little Go module needs
  to be respected. A lot of these Manifest files lead to an extremely
  increased Portage tree size. EGO_SUM is just one example (though the
  biggest one). Statically linked languages like Rust etc. have the same
  problem.
- The current solution is to prepackage all modules, put it somewhere on
  a webserver and just manifest that file. This makes the Portage tree
  small in size again, but requires a webserver/mirror and is thus
  unfriendly for overlay devs.

I'm not sure if it was mentioned before but has anyone considered hash
trees / Merkle trees for the manifest file? The idea would be to hash
the standard manifest file a second time if it gets too big and write
down that hash as new manifest file and leave EGO_SUM as is.

When Portage tries to install the package, it can download all modules
and build the "normal" Manifest file as usual, but instead of directly
comparing it to the Manifest in the tree, it can hash it again and compare
that to the provided Manifest. With this, Portage should have more or less
the same guarantees about the validity of the source code, but the
manifest file consists of just two hashes again.
What one would lose is the direct comparison of file names (they are
included in the "meta"-hash, though), or am I missing something?

Gerion


* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-04 10:44                   ` Gerion Entrup
@ 2023-07-04 21:56                     ` Robin H. Johnson
  2023-07-04 23:09                       ` Oskari Pirhonen
  0 siblings, 1 reply; 23+ messages in thread
From: Robin H. Johnson @ 2023-07-04 21:56 UTC
  To: gentoo-dev

On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> I am just curious about the whole discussion. I did not follow it in the
> deepest detail, but what I got is:
> - EGO_SUM blows up the Manifest file, since every little Go module needs
>   to be respected. A lot of these Manifest files lead to an extremely
>   increased Portage tree size. EGO_SUM is just one example (though the
>   biggest one). Statically linked languages like Rust etc. have the same
>   problem.
> - The current solution is to prepackage all modules, put it somewhere on
>   a webserver and just manifest that file. This makes the Portage tree
>   small in size again, but requires a webserver/mirror and is thus
>   unfriendly for overlay devs.
> 
> I'm not sure if it was mentioned before but has anyone considered hash
> trees / Merkle trees for the manifest file? The idea would be to hash
> the standard manifest file a second time if it gets too big and write
> down that hash as new manifest file and leave EGO_SUM as is.
This is out-of-tree/indirect Manifests, that I proposed here, more than
a year ago:
https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
https://marc.info/?l=gentoo-dev&m=165472088822215&w=2

Developing it requires PMS work in addition to package manager
development, because it introduces phases.

- primary fetch of $SRC_URI per ebuild, including indirect Manifest
- primary validation of distfiles
- secondary fetch of $SRC_URI per indirect Manifest
- secondary validation of additional distfiles

A significantly impacted use case is "emerge -f", it now needs to run
downloads twice.

The rest of the posts also go into the matter of duplication within
EGO_SUM & the indirect Manifests: limiting the growth requires some form
of content-addressed layout.

It's absolutely something we should get developed, but it's a lot of
work.

The indirect Manifests still provide a hosting challenge for overlays.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-04 21:56                     ` Robin H. Johnson
@ 2023-07-04 23:09                       ` Oskari Pirhonen
  2023-07-05 18:40                         ` Gerion Entrup
  0 siblings, 1 reply; 23+ messages in thread
From: Oskari Pirhonen @ 2023-07-04 23:09 UTC
  To: gentoo-dev

On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > I am just curious about the whole discussion. I did not follow it in the
> > deepest detail, but what I got is:
> > - EGO_SUM blows up the Manifest file, since every little Go module needs
> >   to be respected. A lot of these Manifest files lead to an extremely
> >   increased Portage tree size. EGO_SUM is just one example (though the
> >   biggest one). Statically linked languages like Rust etc. have the same
> >   problem.
> > - The current solution is to prepackage all modules, put it somewhere on
> >   a webserver and just manifest that file. This makes the Portage tree
> >   small in size again, but requires a webserver/mirror and is thus
> >   unfriendly for overlay devs.
> > 
> > I'm not sure if it was mentioned before but has anyone considered hash
> > trees / Merkle trees for the manifest file? The idea would be to hash
> > the standard manifest file a second time if it gets too big and write
> > down that hash as new manifest file and leave EGO_SUM as is.
> This is out-of-tree/indirect Manifests, that I proposed here, more than
> a year ago:
> https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> 
> Developing it requires PMS work in addition to package manager
> development, because it introduces phases.
> 
> - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> - primary validation of distfiles
> - secondary fetch of $SRC_URI per indirect Manifest
> - secondary validation of additional distfiles
> 
> A significantly impacted use case is "emerge -f", it now needs to run
> downloads twice.
> 

I'm not sure double downloading is required. Consider a flow similar to
this:

1. distfiles are fetched as per the ebuild
2. distfiles are hashed into a temporary Manifest
3. temporary Manifest is hashed and compared with the hashes stored in
   the in-tree Manifest for the direct Manifest
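
A minimal sketch of the three steps above, assuming the temporary
Manifest uses the usual DIST line layout with BLAKE2B and SHA512 and
that its lines are sorted before hashing. The function names and the
choice of SHA512 for the outer hash are hypothetical, not part of any
existing tool:

```python
import hashlib


def dist_line(name: str, data: bytes) -> str:
    """Build one DIST entry for a fetched distfile (step 2)."""
    blake = hashlib.blake2b(data).hexdigest()
    sha = hashlib.sha512(data).hexdigest()
    return f"DIST {name} {len(data)} BLAKE2B {blake} SHA512 {sha}"


def temporary_manifest(distfiles: dict) -> bytes:
    """Assemble the temporary (direct) Manifest from name -> content.

    Sorting the lines is an assumption; it keeps the result independent
    of the order in which the distfiles were fetched.
    """
    lines = sorted(dist_line(name, data) for name, data in distfiles.items())
    return ("\n".join(lines) + "\n").encode()


def verify_indirect(distfiles: dict, expected_sha512: str) -> bool:
    """Hash the temporary Manifest and compare with the in-tree hash (step 3)."""
    digest = hashlib.sha512(temporary_manifest(distfiles)).hexdigest()
    return digest == expected_sha512
```

Nothing here needs a second download: everything is derived from the
distfiles that were fetched as per the ebuild.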

A new Manifest format would be required in order to differentiate the
current ones from an indirect one. This may require PMS changes,
although I suspect amending GLEP 74 may be enough since the PMS seems
to just refer to the GLEP for a description of Manifests.

This would also either rely on a stable ordering of Manifest contents
when generating it or having a separate file listing in the indirect
Manifest which corresponds to the order in the direct Manifest. For the
latter, it should also have separate entries for different package
versions so that every single distfile for every single version of said
package does not need to be fetched in order to build the direct
Manifest.

I'm imagining something along these lines:
    
    INDIRECT true
    PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
    PACKAGE ...

Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
containing the distfiles (and potentially other files if a repo does not
have thin-manifests enabled) and their hashes in the order specified
previously.

The indirect Manifest as described above would be large-ish for a
package that has lots of distfiles, but likely much smaller than if each
distfile had its set of hashes stored directly.
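
To make the idea concrete, here is a rough sketch of a parser for the
format imagined above. The format itself is only a proposal, and the
way the distfile list is separated from the trailing hashes (by
recognising a fixed set of algorithm names) is my own assumption:

```python
# Hypothetical set of algorithm tokens; only used to find where the
# distfile list ends and the ALGO/hash pairs begin.
KNOWN_ALGOS = {"BLAKE2B", "SHA512"}


def parse_indirect(text: str) -> dict:
    """Parse an indirect Manifest into {cpv: {"distfiles": [...], "hashes": {...}}}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != ["INDIRECT", "true"]:
        raise ValueError("not an indirect Manifest")

    entries = {}
    for fields in lines[1:]:
        if len(fields) < 2 or fields[0] != "PACKAGE":
            raise ValueError(f"unexpected entry: {' '.join(fields)}")
        cpv, rest = fields[1], fields[2:]

        # Peel ALGO/hash pairs off the end; what remains is the
        # ordered distfile list.
        hashes = {}
        while len(rest) >= 2 and rest[-2] in KNOWN_ALGOS:
            hashes[rest[-2]] = rest[-1]
            rest = rest[:-2]

        entries[cpv] = {"distfiles": rest, "hashes": hashes}
    return entries
```

The distfile order returned for each PACKAGE entry is the order in
which the direct Manifest would have to be rebuilt before hashing.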

Please correct me if there's some detail I've overlooked.

- Oskari

> The rest of the posts also go into the matter of duplication within
> EGO_SUM & the indirect Manifests: limiting the growth requires some form
> of content-addressed layout.
> 
> It's absolutely something we should get developed, but it's a lot of
> work.
> 
> The indirect Manifests still provide a hosting challenge for overlays.
> 
> -- 
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
> E-Mail   : robbat2@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-04 23:09                       ` Oskari Pirhonen
@ 2023-07-05 18:40                         ` Gerion Entrup
  2023-07-05 19:32                           ` Rich Freeman
  2023-07-06  2:48                           ` Oskari Pirhonen
  0 siblings, 2 replies; 23+ messages in thread
From: Gerion Entrup @ 2023-07-05 18:40 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 4639 bytes --]

Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen:
> On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > just to be curious about the whole discussion. I did not follow in the
> > > deepest detail but what I got is:
> > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > >   to be respected. A lot of these Manifest files lead to an extremely
> > >   increased Portage tree size. EGO_SUM is just one example (though the
> > >   biggest one). Statically linked languages like Rust etc. have the same
> > >   problem.
> > > - The current solution is to prepackage all modules, put it somewhere on
> > >   a webserver and just manifest that file. This makes the Portage tree
> > >   small in size again, but requires a webserver/mirror and is thus
> > >   unfriendly for overlay devs.
> > > 
> > > I'm not sure if it was mentioned before but has anyone considered hash
> > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > the standard manifest file a second time if it gets too big and write
> > > down that hash as new manifest file and leave EGO_SUM as is.
> > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > a year ago:
> > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > 
> > Developing it requires PMS work in addition to package manager
> > development, because it introduces phases.
> > 
> > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > - primary validation of distfiles
> > - secondary fetch of $SRC_URI per indirect Manifest
> > - secondary validation of additional distfiles
> > 
> > A significantly impacted use case is "emerge -f", it now needs to run
> > downloads twice.
> > 
> 
> I'm not sure double downloading is required. Consider a flow similar to
> this:
> 
> 1. distfiles are fetched as per the ebuild
> 2. distfiles are hashed into a temporary Manifest
> 3. temporary Manifest is hashed and compared with the hashes stored in
>    the in-tree Manifest for the direct Manifest

This is exactly what I meant. Web storage is not needed. A second
download process is also not needed. Just an additional Manifest format
is needed for ebuilds with more than n distfiles.


> A new Manifest format would be required in order to differentiate the
> current ones from an indirect one. This may require PMS changes,
> although I suspect amending GLEP 74 may be enough since the PMS seems
> to just refer to the GLEP for a description of Manifests.
> 
> This would also either rely on a stable ordering of Manifest contents
> when generating it or having a separate file listing in the indirect
> Manifest which corresponds to the order in the direct Manifest. For the
> latter, it should also have separate entries for different package
> versions so that every single distfile for every single version of said
> package does not need to be fetched in order to build the direct
> Manifest.
> 
> I'm imagining something along these lines:
>     
>     INDIRECT true
>     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
>     PACKAGE ...

Maybe it is reasonable to skip the distfile names altogether (or just
provide a hash value of the concatenated file names). Then the manifest
would just contain two or three hashes (for as many distfiles as the ebuild
needs). Since these kinds of indirect Manifests should be rarer than
the normal ones, a slightly longer processing time should not have much
impact, I would say.



> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> containing the distfiles (and potentially other files if a repo does not
> have thin-manifests enabled) and their hashes in the order specified
> previously.
> 
> The indirect Manifest as described above would be large-ish for a
> package that has lots of distfiles, but likely much smaller than if each
> distfile had its set of hashes stored directly.

Without storing the filenames, the Manifest file would have the same
small size for any number of distfiles needed.

Gerion


> Please correct me if there's some detail I've overlooked.
> 
> - Oskari
> 
> > The rest of the posts also go into the matter of duplication within
> > EGO_SUM & the indirect Manifests: limiting the growth requires some form
> > of content-addressed layout.
> > 
> > It's absolutely something we should get developed, but it's a lot of
> > work.
> > 
> > The indirect Manifests still provide a hosting challenge for overlays.
> > 
> 
> 
> 


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-05 18:40                         ` Gerion Entrup
@ 2023-07-05 19:32                           ` Rich Freeman
  2023-07-06  2:48                           ` Oskari Pirhonen
  1 sibling, 0 replies; 23+ messages in thread
From: Rich Freeman @ 2023-07-05 19:32 UTC (permalink / raw
  To: gentoo-dev

On Wed, Jul 5, 2023 at 2:40 PM Gerion Entrup <gerion.entrup@flump.de> wrote:
>
> Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen:
> > On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > >
> > > Developing it requires PMS work in addition to package manager
> > > development, because it introduces phases.
> > >
> > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > > - primary validation of distfiles
> > > - secondary fetch of $SRC_URI per indirect Manifest
> > > - secondary validation of additional distfiles
> > >
> > > A significantly impacted use case is "emerge -f", it now needs to run
> > > downloads twice.
> >
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> >
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> >    the in-tree Manifest for the direct Manifest
>
> This is exactly what I meant. Web storage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
>

I suspect that Robin was proposing indirect manifests AND src uris, and
not just indirect manifests.  In any case, if he wasn't, then I'd
suggest it would make sense to have that so that we don't need giant
lists of src_uris or go sums or whatever in ebuilds.  Sure, the
manifests are even larger than the original file references, but those
will still be long.  Plus if a file is used by 5 versions of an ebuild
it will be present in the manifests once per hash function, but in the
ebuilds 5 times.

I agree though that if only the manifests are moved to a fetched file
then you could fetch that on the first pass, though you'd still need
the extra logic to parse it.  I'm not sure it really is much of a
difference to the effort involved.

Aren't go sums already content hashes?  It might make even more sense
to create some kind of modular manifest verification logic in portage
so that the same eclass that handles EGO_SUM could tell the package
manager how to check the integrity of the files that are fetched.
Well, assuming we trust whatever hash function they're using (I'm
afraid to check - maybe this isn't such a great idea...).

-- 
Rich


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-05 18:40                         ` Gerion Entrup
  2023-07-05 19:32                           ` Rich Freeman
@ 2023-07-06  2:48                           ` Oskari Pirhonen
  1 sibling, 0 replies; 23+ messages in thread
From: Oskari Pirhonen @ 2023-07-06  2:48 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 6065 bytes --]

On Wed, Jul 05, 2023 at 20:40:34 +0200, Gerion Entrup wrote:
> Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen:
> > On Tue, Jul 04, 2023 at 21:56:26 +0000, Robin H. Johnson wrote:
> > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote:
> > > > just to be curious about the whole discussion. I did not follow in the
> > > > deepest detail but what I got is:
> > > > - EGO_SUM blows up the Manifest file, since every little Go module needs
> > > >   to be respected. A lot of these Manifest files lead to an extremely
> > > >   increased Portage tree size. EGO_SUM is just one example (though the
> > > >   biggest one). Statically linked languages like Rust etc. have the same
> > > >   problem.
> > > > - The current solution is to prepackage all modules, put it somewhere on
> > > >   a webserver and just manifest that file. This makes the Portage tree
> > > >   small in size again, but requires a webserver/mirror and is thus
> > > >   unfriendly for overlay devs.
> > > > 
> > > > I'm not sure if it was mentioned before but has anyone considered hash
> > > > trees / Merkle trees for the manifest file? The idea would be to hash
> > > > the standard manifest file a second time if it gets too big and write
> > > > down that hash as new manifest file and leave EGO_SUM as is.
> > > This is out-of-tree/indirect Manifests, that I proposed here, more than
> > > a year ago:
> > > https://marc.info/?l=gentoo-dev&m=168280762310716&w=2
> > > https://marc.info/?l=gentoo-dev&m=165472088822215&w=2
> > > 
> > > Developing it requires PMS work in addition to package manager
> > > development, because it introduces phases.
> > > 
> > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest
> > > - primary validation of distfiles
> > > - secondary fetch of $SRC_URI per indirect Manifest
> > > - secondary validation of additional distfiles
> > > 
> > > A significantly impacted use case is "emerge -f", it now needs to run
> > > downloads twice.
> > > 
> > 
> > I'm not sure double downloading is required. Consider a flow similar to
> > this:
> > 
> > 1. distfiles are fetched as per the ebuild
> > 2. distfiles are hashed into a temporary Manifest
> > 3. temporary Manifest is hashed and compared with the hashes stored in
> >    the in-tree Manifest for the direct Manifest
> 
> This is exactly what I meant. Web storage is not needed. A second
> download process is also not needed. Just an additional Manifest format
> is needed for ebuilds with more than n distfiles.
> 
> 
> > A new Manifest format would be required in order to differentiate the
> > current ones from an indirect one. This may require PMS changes,
> > although I suspect amending GLEP 74 may be enough since the PMS seems
> > to just refer to the GLEP for a description of Manifests.
> > 
> > This would also either rely on a stable ordering of Manifest contents
> > when generating it or having a separate file listing in the indirect
> > Manifest which corresponds to the order in the direct Manifest. For the
> > latter, it should also have separate entries for different package
> > versions so that every single distfile for every single version of said
> > package does not need to be fetched in order to build the direct
> > Manifest.
> > 
> > I'm imagining something along these lines:
> >     
> >     INDIRECT true
> >     PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
> >     PACKAGE ...
> 
> Maybe it is reasonable to skip the distfile names altogether (or just
> provide a hash value of the concatenated file names). Then the manifest
> would just contain two or three hashes (for as many distfiles as the ebuild
> needs). Since these kinds of indirect Manifests should be rarer than
> the normal ones, a slightly longer processing time should not have much
> impact, I would say.
> 

My reasoning behind having the list of files is so that the
intermediate/direct Manifest can be accurately recreated. Consider the
following (not-so-)hypothetical Manifest:
    
    DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...
    DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...
    EBUILD package-r1.ebuild 1535 BLAKE2B ... SHA512 ...
    EBUILD package.ebuild 1536 BLAKE2B ... SHA512 ...
    MISC metadata.xml 959 BLAKE2B ... SHA512 ...

It is "well behaved" because pkgdev created it. My main concern is if
$OTHER_TOOLING generates the Manifest in a different order which would
mean the Manifest may be correct, but you get a false negative since the
hashes don't match what is in the in-tree indirect Manifest. Having the
order specified in the indirect Manifest renders this moot because
$OTHER_TOOLING would have to respect this in order to correctly handle
indirect Manifests.
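
A small illustration of that false negative, assuming the outer hash is
simply taken over the Manifest text. The helper name and the `canonical`
switch are made up for the example; sorting stands in for whatever
ordering rule would eventually be specified:

```python
import hashlib


def manifest_digest(lines, canonical=False):
    """SHA512 over Manifest lines, optionally sorted into a fixed order first."""
    if canonical:
        lines = sorted(lines)
    text = "".join(line + "\n" for line in lines)
    return hashlib.sha512(text.encode()).hexdigest()


# The same two entries as emitted by two different (hypothetical) tools.
tool_a = [
    "DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...",
    "DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...",
]
tool_b = list(reversed(tool_a))

# Same content, different order: the outer hashes disagree.
assert manifest_digest(tool_a) != manifest_digest(tool_b)

# With an agreed ordering the hashes match again.
assert manifest_digest(tool_a, canonical=True) == manifest_digest(tool_b, canonical=True)
```

Either a mandated ordering or an explicit file list in the indirect
Manifest gives every tool the same input to hash.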

Additionally, in repos without thin-manifests, the SRC_URI is not enough
to build up the Manifest. This may or may not be an issue depending on
if a repo's metadata/layout.conf is parsed as part of the Manifest
verification process.

> 
> 
> > Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest
> > containing the distfiles (and potentially other files if a repo does not
> > have thin-manifests enabled) and their hashes in the order specified
> > previously.
> > 
> > The indirect Manifest as described above would be large-ish for a
> > package that has lots of distfiles, but likely much smaller than if each
> > distfile had its set of hashes stored directly.
> 
> Without storing the filenames, the Manifest file would have the same
> small size for any number of distfiles needed.
> 

Assuming layout.conf is parsed when the Manifest is verified (thus
handling the thick Manifest case), the file list can be omitted if GLEP
74 is amended to specify an ordering on the entries.

Side note: Portage itself does not seem to care about the ordering. I
tested this by copying a package tree, moving some entries around, and
running `ebuild /path/to/ebuild clean unpack`.

- Oskari

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-04  7:13                 ` Tim Harder
  2023-07-04 10:44                   ` Gerion Entrup
@ 2023-07-06  6:09                   ` Zoltan Puskas
  2023-07-06 19:46                     ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open Hank Leininger
  2023-07-08 20:49                     ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Sam James
  1 sibling, 2 replies; 23+ messages in thread
From: Zoltan Puskas @ 2023-07-06  6:09 UTC (permalink / raw
  To: gentoo-dev

On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote:
> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
> >On 30/06/2023 13.33, Eray Aslan wrote:
> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> >>>Why do we have to keep exporting the related variables that generally
> >>>cause these size issues to the environment?
> >>
> >>I really do not want to make a +1 response but this is an excellent
> >>question that we need to answer before implementing EGO_SUM.
> >
> >Could you please discuss why you make the reintroduction of EGO_SUM 
> >dependent on this question?
> 
> Just to be clear, I don't particularly care about EGO_SUM enough to gate
> its reintroduction (and don't have any leverage to do so anyway). I'm
> just tired of the circular discussions around env issues that all seem
> to avoid actual fixes, catering instead to functionality used by a
> vanishingly small subset of ebuilds in the main repo that compels a
> certain design mostly due to how portage functioned before EAPI 0.
> 
> Other than that, supporting EGO_SUM (or any other language ecosystem
> trending towards distro-unfriendly releases) is fine as long as devs are
> cognizant how the related global-scope eclass design affects everyone
> running or working on the raw repo. I hope devs continue leveraging the
> relatively recent benchmark tooling (and perhaps more future support) to
> improve their work. Along those lines, it could be nice to see sample
> benchmark data in commit messages for large, global-scope eclass work
> just to reinforce that it was taken into account.
> 
> Tim
> 

I've been following the EGO_SUM thread for quite some time now. One other thing
I did not see mentioned in favour of EGO_SUM so far: reproducibility.

The problem with external tarballs is that they are gone once the ebuild is
dropped from the tree. Should a user ever want to roll back to a previous
version of an application, either by checking out an older version of the
portage tree or copying said ebuild into their local overlay, they still cannot
simply run an emerge on it, as they have to somehow recreate the tarball
itself too.

While upstream may not host everything forever, it's pretty much guaranteed to
be available for much longer than Gentoo's custom tarball bundles of
dependencies.

Regarding space we are also likely making a trade-off. By deprecating EGO_SUM we
are saving space in the portage tree but in exchange inflating distfiles as it
will start accumulating the same dependencies potentially multiple times since
now the content is hidden in tarballs containing a combination of dependencies.
This is essentially the source file version of "statically linking".

Finally a personal opinion: I find dependency tarballs opaque. With EGO_SUM the
ebuild defines all the upstream sources it needs to build the package as well as
how to build it, but with the dependency tarball the sources are all hidden, which
makes verification that much harder.

Zoltan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open
  2023-07-06  6:09                   ` Zoltan Puskas
@ 2023-07-06 19:46                     ` Hank Leininger
  2023-07-08 20:49                     ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Sam James
  1 sibling, 0 replies; 23+ messages in thread
From: Hank Leininger @ 2023-07-06 19:46 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]

On Thu, Jul 6, 2023 Zoltan Puskas wrote:
> I've been following the EGO_SUM thread for quite some time now. One
> other thing I did not see mentioned in favour of EGO_SUM so far:
> reproducibility.

> The problem with external tarballs is that they are gone once the
> ebuild is dropped from the tree. Should a user ever want to roll back
> to a previous version of an application, either by checking out an
> older version of the portage tree or copying said ebuild into their
> local overlay, they still cannot simply run an emerge on it, as
> they have to somehow recreate the tarball itself too.

> While upstream may not host everything forever, it's pretty much
> guaranteed to be available for much longer than Gentoo's custom
> tarball bundles of dependencies.

I see this brought up every once in a while in these EGO_SUM threads,
but I think reproducible tarballs are a solved problem, or at least, the
tools exist and we just need to decide how to best equip people with
them.

thesamesam/sam-gentoo-scripts has maint/bump-go which builds these
tarballs smartly and reproducibly:

- use --sort=name to order files inside in a consistent way
- use consistent owner:group (portage:portage)
- use consistent LC and TZ settings
- set a standard timestamp (since 'go mod download' doesn't preserve
  upstream timestamps anyway, this loses no useful information)

With that, multiple developers can independently generate a -deps
tarball for a given Go package version with checksums that match. The
main distro tarball's checksums are verified against Manifest, and then
within it are the list and checksums of the individual downloads which
would be verified by go mod download (right?) and the resulting -deps
files should also match Manifest entries.

So a similar approach could be used in the case of expired ::gentoo
versions being installed, or overlays using -deps files without a way to
host them. Set things up so this can be done easily on demand or perhaps
automatically as needed (maybe through a variation on pkg_nofetch in a
Go eclass; that part is not obvious to me). 
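
For illustration only, the same normalisation ideas can be sketched with
Python's tarfile module. This is not what maint/bump-go does internally;
the function name is made up, and the uid/gid of 0 and the default
timestamp of 0 are arbitrary fixed values chosen for the example:

```python
import io
import tarfile


def deterministic_tar(files: dict, mtime: int = 0) -> bytes:
    """Pack name -> content into an uncompressed tar with normalised metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorted member order, like --sort=name.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime              # one standard timestamp for every member
            info.mode = 0o644
            info.uid = info.gid = 0         # fixed numeric ids (arbitrary here)
            info.uname = info.gname = "portage"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two runs over the same inputs give byte-identical archives regardless of
the order in which the files were collected. Compression is left out on
purpose, since the compressor would have to be made deterministic
separately.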

Thanks,

-- 

Hank Leininger <hlein@korelogic.com>
9606 3BF9 B593 4CBC E31A  A384 6200 F6E3 781E 3DD7

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-06  6:09                   ` Zoltan Puskas
  2023-07-06 19:46                     ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open Hank Leininger
@ 2023-07-08 20:49                     ` Sam James
  1 sibling, 0 replies; 23+ messages in thread
From: Sam James @ 2023-07-08 20:49 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 2324 bytes --]


Zoltan Puskas <zoltan@sinustrom.info> writes:

> On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote:
>> On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
>> >On 30/06/2023 13.33, Eray Aslan wrote:
>> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>> >>>Why do we have to keep exporting the related variables that generally
>> >>>cause these size issues to the environment?
>> >>
>> >>I really do not want to make a +1 response but this is an excellent
>> >>question that we need to answer before implementing EGO_SUM.
>> >
>> >Could you please discuss why you make the reintroduction of EGO_SUM 
>> >dependent on this question?
>> 
>> Just to be clear, I don't particularly care about EGO_SUM enough to gate
>> its reintroduction (and don't have any leverage to do so anyway). I'm
>> just tired of the circular discussions around env issues that all seem
>> to avoid actual fixes, catering instead to functionality used by a
>> vanishingly small subset of ebuilds in the main repo that compels a
>> certain design mostly due to how portage functioned before EAPI 0.
>> 
>> Other than that, supporting EGO_SUM (or any other language ecosystem
>> trending towards distro-unfriendly releases) is fine as long as devs are
>> cognizant how the related global-scope eclass design affects everyone
>> running or working on the raw repo. I hope devs continue leveraging the
>> relatively recent benchmark tooling (and perhaps more future support) to
>> improve their work. Along those lines, it could be nice to see sample
>> benchmark data in commit messages for large, global-scope eclass work
>> just to reinforce that it was taken into account.
>> 
>> Tim
>> 
>
> I've been following the EGO_SUM thread for quite some time now. One other thing
> I did not see mentioned in favour of EGO_SUM so far: reproducibility.
>
> The problem with external tarballs is that they are gone once the ebuild is
> dropped from the tree. Should a user ever want to roll back to a previous
> version of an application, either by checking out an older version of the
> portage tree or copying said ebuild into their local overlay, they still cannot
> simply run an emerge on it, as they have to somehow recreate the tarball
> itself too.

I believe Hank's email covers this.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 377 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-03 10:17           ` Florian Schmaus
  2023-07-03 11:12             ` [gentoo-dev] EGO_SUM Ulrich Mueller
@ 2023-07-08 21:21             ` Sam James
  1 sibling, 0 replies; 23+ messages in thread
From: Sam James @ 2023-07-08 21:21 UTC (permalink / raw
  To: Florian Schmaus; +Cc: gentoo-dev, Sam James

[-- Attachment #1: Type: text/plain, Size: 4314 bytes --]


Florian Schmaus <flow@gentoo.org> writes:

> On 30/06/2023 10.22, Sam James wrote:
>> Florian Schmaus <flow@gentoo.org> writes:
>>> [in reply to a gentoo-project@ post, but it was asked to continue this
>>> on gentoo-dev@]
>>> On 28/06/2023 16.46, Sam James wrote:
>>>> and questions remain unanswered on the
>>>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>>>> for example)?
>>>
>>> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for
>>> the total package-directory size. I only care a little about the tool
>>> that checks this limit, but pkgcheck is an obvious choice. I also
>>> suggested that we review this policy once the number of Go packages
>>> has doubled or two years after this policy was established (whatever
>>> comes first).
>>>
>>> But I fear you may be referring to another kind of check. You may be
>>> talking about a check that forbids EGO_SUM in ::gentoo but allows it
>>> in overlays.
>> My position on this has been consistent: a check is needed to
>> statically determine when the environment size is too big. Copying the
>> Portage check into pkgcheck (in terms of the metrics) would satisfy this.
>
> It is not as easy as merely copying existing portage code into
> pkgcheck (unless I am missing something).
>

That's why I said "in terms of the metrics".

> I've talked to arthurzam, and there appears to be a .environment file
> created by pkgcheck, which we could use to approximate the exported
> environment.
>
> Another option would be to have pkgcheck count the EGO_SUM
> entries. The tree-sitter API for Bash, which pkgcheck already uses,
> seems to allow for that. But that would be different from the check in
> portage. Although, IMHO, counting EGO_SUM entries would be sufficient.

Right.
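
As a rough illustration of such an entry count (not pkgcheck code, and
using a plain regular expression rather than the tree-sitter API), a
sketch could look like this. The function names and the `max_entries`
threshold are hypothetical, and the pattern assumes the array is written
as `EGO_SUM=(` ... `)` with one double-quoted entry per line:

```python
import re

# Assumes the closing parenthesis sits at the start of its own line.
EGO_SUM_RE = re.compile(r"^EGO_SUM=\((.*?)^\)", re.DOTALL | re.MULTILINE)


def ego_sum_entries(ebuild_text: str) -> list:
    """Return the quoted entries of the EGO_SUM array, or [] if there is none."""
    match = EGO_SUM_RE.search(ebuild_text)
    if match is None:
        return []
    return re.findall(r'"([^"]+)"', match.group(1))


def too_many_entries(ebuild_text: str, max_entries: int) -> bool:
    """Flag an ebuild whose EGO_SUM has more entries than the chosen threshold."""
    return len(ego_sum_entries(ebuild_text)) > max_entries
```

The threshold would need to be calibrated against the old broken cases
before it could stand in for the environment size check.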

>
>
>> That is, regardless of raw size, I'm asking for a calculation based on
>> the contents of EGO_SUM where, if exceeded, the package will not be
>> installable on some systems. You didn't have an issue implementing this
>> for Portage and I've mentioned this a bunch of times since, so I thought
>> it was clear what I was hoping to see.
>
> So pkgcheck counting EGO_SUM entries would be sufficient for the
> purpose of having a static check that notices if the ebuild would
> likely run into the environment limit?
>

If you check it actually fires in some of the old broken scenarios
(see Bugzilla), then yes. But I'd be interested in your thoughts on
radhermit's reply (please reply there).

> To find a common compromise, I would possibly invest my time in
> developing such a test. Even though I do not deem such a check a
> strict prerequisite to reintroduce EGO_SUM.

Yes, you've made clear you disagree.

>
>
>>> Intelligibly, EGO_SUM can be considered ugly. Compared to a
>>> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
>>> true for Rust packages. However, looking at the bigger picture,
>>> EGO_SUM's advantages outweigh its disadvantages.
>>>
>> Again, am on record as being fine with the general EGO_SUM approach,
>> even if I wish we didn't need it, as I see it as inevitable for things
>> like yarn, .NET, and of course Rust as we already have it.
>> Just ideally not huge ones, and certainly not huge ones which then
>> aren't even reliably installable because of environment size.
>
> Talking about "reliably installable" makes it sound to me like there
> are cases where installing an EGO_SUM-based package sometimes works and
> sometimes not. But the kernel-limit is fixed and not even
> configurable, besides, of course patching the source (and in the
> absence of architectures with a page size below 4 KiB) [1].
>

ulm's reply notes that this is a limitation in the Linux kernel, so I
have no idea why musl tinderboxes seemed to disproportionately hit these
issues, and I assume one of us is either missing something or it was just
a crazy fluke.

> Any developer testing whether or not an ebuild is installable would
> become immediately aware if the ebuild runs into the environment
> limit, or not.
>

This clearly didn't happen with the previous examples (see what I said
above too), as there were times when they installed for some people, but
not in CI/tinderboxes. I don't know why and it merits investigation.


* [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: Re: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
       [not found]         ` <CAAr7Pr9+zq2NV=7zhj5e+4LWOmNavCrfMstNTqkthk5uxQVNtg@mail.gmail.com>
@ 2023-07-14  7:14           ` Florian Schmaus
  2023-07-14  7:33             ` Sam James
  2023-07-14  8:39             ` [gentoo-dev] Re: Flow's Manifesto and questions for nominees Ulrich Mueller
  0 siblings, 2 replies; 23+ messages in thread
From: Florian Schmaus @ 2023-07-14  7:14 UTC (permalink / raw
  To: Alec Warner, gentoo-dev; +Cc: gentoo-project


Posted to gentoo-dev@ since we are now entering a technical discussion 
again.

For those who did not follow gentoo-project@, the previous posts include:

https://marc.info/?l=gentoo-project&m=168918875000738&w=2
https://marc.info/?l=gentoo-project&m=168881103930591&w=2

On 12/07/2023 21.28, Alec Warner wrote:
> On Wed, Jul 12, 2023 at 12:07 PM Florian Schmaus <flow@gentoo.org> wrote:
>> Apologies for not replying to everyone individually.
>>
>> I thank my fellow council candidates who took the time to reply to this
>> sensitive and obviously controversial matter. I understand that not
>> everyone feels comfortable taking a stance in this discussion.
>>
>> I asked the other council candidates about their opinion on EGO_SUM.
>> Unfortunately, some replies included only a rather shallow answer. A few
>> focused on criticism of my actions and how I approach the issue. Which
>> is obviously fine. I read it all and have empathy for everyone who feels
>> aggravated. You may or may not share the complaints. But let us focus on
>> the actual matter for a moment.
>>
>> Even the voices raised for a restricted reintroduction of EGO_SUM just
>> mention an abstract limit [1]. A concrete limit is not mentioned,
>> although I asked for it and provided my idea including specific limits.
>> Not knowing the concrete figures others have in mind makes it difficult
>> to find a compromise. For example, a fellow council candidate postulated
>> that it would be quicker for me to implement a limit-check in pkgcheck
>> than discuss EGO_SUM. I wish that were the case. Unfortunately it is
>> potentially not trivial to implement if we want such a check to be
>> robust. But even worse, a specific limit must be known before
>> implementing such a check. And we currently have none.
> 
> I think my concern here is that I don't expect the Council to really
> 'vote on a specific limit.' The limit is an implementation detail, it
> can change, it shouldn't require a council vote to change.
> 
> So my advice is "pick something reasonable that you think holds up to
> scrutiny, and implement with that" and "expect the limit to change,
> either because of the scrutiny, or because it might change in the
> future" and implement your check accordingly (so e.g. the limit is
> easily changeable.)

Please find below why this may not be enough.


>> But the real crux of an EGO_SUM reintroduction with a limit is the
>> following. Either the limit is too restrictive, and most packages are
>> affected by it and can not use EGO_SUM, which ultimately only
>> corresponds to the current state. Or the limit only affects a fraction
>> of the packages, so you should not bother having a limit.
> 
> Again the idea is there is already a limit ( the aforementioned
> environment limit ) and one of the goals is to have a QA check that
> says your ebuild is approaching that limit so you can do something
> productive about it, as well as to avoid ebuilds that are not
> installable. So just implement that. If you need a number, I think
> "90% of the env limit" is defensible (but again, any reasonable number
> will do fine.)

EGO_SUM affects two dimensions that could be limited/restricted:
A) the process environment, which may run into the Linux kernel
    environment limit on exec(3)
B) the size of the package directory, where EGO_SUM affects the size of
    ebuilds and the Manifest

I would be happy to put in any effort required to implement A) in 
pkgcheck, as I did for portage, if this check is the only thing that 
keeps us from reintroducing EGO_SUM.
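
To make A) concrete, here is a rough sketch of a "90% of the limit" check on a set of environment variables. It is not how portage or pkgcheck implement anything. It assumes the relevant bound is the Linux per-string limit MAX_ARG_STRLEN (32 pages, i.e. 131072 bytes with 4 KiB pages); the kernel's separate limit on the total size of arguments plus environment is not modelled. The function names and the `percent` parameter are made up for this example.

```python
import os

# Linux limits a single argv/envp string to MAX_ARG_STRLEN = 32 pages.
PAGES_PER_STRING = 32


def max_env_string_bytes() -> int:
    """Per-string limit in bytes for the running system's page size."""
    return os.sysconf("SC_PAGE_SIZE") * PAGES_PER_STRING


def oversized_variables(env, percent=90, limit=None):
    """Return {name: size} for variables above `percent` of the limit.

    `env` is any mapping of variable names to values, e.g. os.environ.
    The size counts "NAME=value" plus the terminating NUL byte, which is
    how the string is passed to the new process.
    """
    if limit is None:
        limit = max_env_string_bytes()
    budget = limit * percent // 100
    flagged = {}
    for name, value in env.items():
        size = len(os.fsencode(f"{name}={value}")) + 1
        if size > budget:
            flagged[name] = size
    return flagged


if __name__ == "__main__":
    for name, size in oversized_variables(os.environ).items():
        print(f"{name}: {size} bytes, close to the per-string limit")
```

Keeping the percentage a parameter matches the suggestion that the limit should be easy to change later.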

Unfortunately, some argue that we need to limit B). Much of the effort I 
put into researching the EGO_SUM situation was analyzing how EGO_SUM's 
impact on package-directory size affects Gentoo. The result of the 
analysis strongly indicates that rather large package-directories can be 
sustained by ::gentoo in the foreseeable future. Especially since we are 
only talking about ~250 EGO_SUM packages currently, and a significant 
fraction of those packages will not have enormous package directories. 
And I also suggested that the policy is reconsidered at least every two 
years or once the number of EGO_SUM packages has doubled (whatever comes 
first).

My investigation of the history of EGO_SUM's deprecation has not 
surfaced any technical issue which justified EGO_SUM's deprecation with 
regard to B). It appears that the desire to limit B) is driven not by 
technical issues but by esthetic preferences, which are highly subjective.

A), however, is a different beast. There is undeniably a kernel-enforced 
limit that we could hit due to an extremely large EGO_SUM (among other 
things). However, the only bug report I know that runs into this kernel 
limit was with texlive (bug #719202). The low number of recorded bugs 
caused by the environment limit matches with the fact that even the 
ebuild with the most EGO_SUM entries that I ever analyzed, 
app-containers/cri-o-1.23.1 (2022-02-16) with 2052 EGO_SUM entries, does 
*not* run into the environment limit.


>> The deprecation of EGO_SUM was and is unnecessary, a security issue, and
>> was almost wholly *not* driven by technical problems. EGO_SUM should be
>> re-instated.
>>
>> I know that some think likewise. I also know that others disagree. The
>> latter group includes some prominent and visible Gentoo developers.
>> People to whom I am thankful for their work on Gentoo and to whom Gentoo
>> owes a lot. However, it is unclear what the majority of Gentoo
>> developers thinks. It could very well be that the consensus amongst
>> Gentoo developers agrees with some of my fellow council candidates and
>> would like to keep the current state. It would be great if we find that
>> out. If we had a mechanism to perform a non-binding opinion poll amongst
>> Gentoo developers, and if that poll turns out that the consensus is to
>> keep EGO_SUM deprecated, then I could save myself a lot of time and effort.
> 
> I'm confused why you are asking about the 'consensus amongst
> developers' and then ask the council to vote.

If I knew that the majority of Gentoo developers is fine with the 
deprecation of EGO_SUM, then I would not put effort into re-instating 
EGO_SUM.


>> However, as of now, my conscience demands that I try to improve this
>> situation for the sake of our users. In a previous mail, I wrote that I
>> seek closure by asking the council to vote on that matter. And I will,
>> of course, accept any outcome of that vote.
> 
> My impression of the situation is that:
>   - Currently if asked, the council would likely vote no.
>   - They have requested you implement a QA check with a limit, and if
> you did that, many swing voters would vote yes.
> 
> My guidance from above is "implement the check with some reasonable
> limit" to unblock your swing voters, so they vote yes...
> 
> We don't need everyone to vote on what the limit is ..it's just
> wasting time IMHO.

It is not about everyone voting on that matter.

It is about asking everyone for their opinion on that matter, in a 
non-binding opinion poll where multiple options can be ranked [1]. 
Chances are that this would surface the consensus amongst Gentoo 
developers, and ideally, the Council would take the result of the poll 
into consideration when voting on that matter.

- Flow


1: I think that it is probably trivial to re-purpose our current voting 
infrastructure to perform an opinion poll using the Condorcet method.
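
As a rough illustration of what a ranked opinion poll evaluated with the Condorcet method involves, here is a small sketch. It is unrelated to the actual Gentoo voting infrastructure; the option names and ballots are invented, and it assumes every ballot ranks every option.

```python
from itertools import combinations


def condorcet_winner(ballots, options):
    """Return the option that beats every other option head-to-head.

    Each ballot is a list of all options, most preferred first.
    Returns None when no such option exists (a tie or a cycle).
    """
    pairwise_wins = {option: 0 for option in options}
    for a, b in combinations(options, 2):
        a_over_b = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            pairwise_wins[a] += 1
        elif b_over_a > a_over_b:
            pairwise_wins[b] += 1
    for option in options:
        if pairwise_wins[option] == len(options) - 1:
            return option
    return None


# Hypothetical options and ballots, for illustration only.
OPTIONS = ["reinstate", "reinstate-with-limit", "keep-deprecated"]
BALLOTS = (
    [["reinstate-with-limit", "reinstate", "keep-deprecated"]] * 3
    + [["reinstate", "reinstate-with-limit", "keep-deprecated"]] * 2
    + [["keep-deprecated", "reinstate-with-limit", "reinstate"]] * 2
)

if __name__ == "__main__":
    print(condorcet_winner(BALLOTS, OPTIONS))
```

When the preferences form a cycle the function returns None, and a real poll would need a tie-breaking rule for that case.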


* [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: Re: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-14  7:14           ` [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: " Florian Schmaus
@ 2023-07-14  7:33             ` Sam James
  2023-07-14  8:19               ` Sam James
  2023-07-14  9:07               ` Florian Schmaus
  2023-07-14  8:39             ` [gentoo-dev] Re: Flow's Manifesto and questions for nominees Ulrich Mueller
  1 sibling, 2 replies; 23+ messages in thread
From: Sam James @ 2023-07-14  7:33 UTC (permalink / raw
  To: gentoo-project; +Cc: Alec Warner, gentoo-dev


Florian Schmaus <flow@gentoo.org> writes:

> [[PGP Signed Part:Undecided]]
> Posted to gentoo-dev@ since we are now entering a technical discussion
> again.
>
> For those who did not follow gentoo-project@, the previous posts include:
>
> https://marc.info/?l=gentoo-project&m=168918875000738&w=2
> https://marc.info/?l=gentoo-project&m=168881103930591&w=2
>
> On 12/07/2023 21.28, Alec Warner wrote:
>> On Wed, Jul 12, 2023 at 12:07 PM Florian Schmaus <flow@gentoo.org> wrote:
>>> Apologies for not replying to everyone individually.
>>>
>>> I thank my fellow council candidates who took the time to reply to this
>>> sensitive and obviously controversial matter. I understand that not
>>> everyone feels comfortable taking a stance in this discussion.
>>>
>>> I asked the other council candidates about their opinion on EGO_SUM.
>>> Unfortunately, some replies included only a rather shallow answer. A few
>>> focused on criticism of my actions and how I approach the issue. Which
>>> is obviously fine. I read it all and have empathy for everyone who feels
>>> aggravated. You may or may not share the complaints. But let us focus on
>>> the actual matter for a moment.
>>>
>>> Even the voices raised for a restricted reintroduction of EGO_SUM just
>>> mention an abstract limit [1]. A concrete limit is not mentioned,
>>> although I asked for it and provided my idea including specific limits.
>>> Not knowing the concrete figures others have in mind makes it difficult
>>> to find a compromise. For example, a fellow council candidate postulated
>>> that it would be quicker for me to implement a limit-check in pkgcheck
>>> than discuss EGO_SUM. I wish that were the case. Unfortunately it is

I think this misrepresents my point. All I said was that a bound should
be added matching what's in Portage right now.

Please in future respond to me directly if you're going to claim something about what I've said.

> [...]
> EGO_SUM affects two dimensions that could be limited/restricted:
> A) the process environment, which may run into the Linux kernel
>    environment limit on exec(3)
> B) the size of the package directory, where EGO_SUM affects the size of
>    ebuilds and the Manifest
>
> [...]
>
> A), however, is a different beast. There is undeniably a
> kernel-enforced limit that we could hit due to an extremely large
> EGO_SUM (among other things). However, the only bug report I know that
> runs into this kernel limit was with texlive (bug #719202). The low
> number of recorded bugs caused by the environment limit matches with
> the fact that even the ebuild with the most EGO_SUM entries that I
> ever analyzed, app-containers/cri-o-1.23.1 (2022-02-16) with 2052
> EGO_SUM entries, does *not* run into the environment limit.
>

I thought I'd given you a list before, but maybe it was someone else.

Anyway, a non-exhaustive list (I remember maybe two more but I got bored):
* https://bugs.gentoo.org/829545 ("app-admin/vault-1.9.1 - find: The environment is too large for exec().")
* https://bugs.gentoo.org/829684 ("app-metrics/prometheus-2.31.1 - find: The environment is too large for exec().")
* https://bugs.gentoo.org/830187 (you're CC'd on this) ("go lang ebuild: SRC_URI too long that it causes "Argument list too long" error")
* https://bugs.gentoo.org/831265 ("sys-cluster/minikube-1.24.0 - find: The environment is too large for exec().")
* a0be89b772474e3336d3de699d71482aa89d2444 ("app-emulation/nerdctl: drop 0.14.0")

Other related bugs (as it's useful as a summary of where we are):
* https://bugs.gentoo.org/540146 ("sys-apps/portage: limit no of exported variables in EAPI 6")
* https://bugs.gentoo.org/720180 ("sys-apps/portage: add support to delay export of "A" variable until last moment")
* https://bugs.gentoo.org/721088 ("[Future EAPI] Don't export A")
* https://bugs.gentoo.org/833567 ("[Future EAPI] src_fetch_extra phase the runs after src_unpack")

I am not aware of a bug (yet?) for radhermit's suggestion wrt external
helpers which is related but different to exporting fewer variables.

thanks,
sam



* [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: Re: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-14  7:33             ` Sam James
@ 2023-07-14  8:19               ` Sam James
  2023-07-14  9:07               ` Florian Schmaus
  1 sibling, 0 replies; 23+ messages in thread
From: Sam James @ 2023-07-14  8:19 UTC (permalink / raw
  To: Sam James; +Cc: gentoo-project, Alec Warner, gentoo-dev


Sam James <sam@gentoo.org> writes:

> Florian Schmaus <flow@gentoo.org> writes:
>
>> [[PGP Signed Part:Undecided]]
>> Posted to gentoo-dev@ since we are now entering a technical discussion
>> again.
>>
>> For those who did not follow gentoo-project@, the previous posts include:
>>
>> https://marc.info/?l=gentoo-project&m=168918875000738&w=2
>> https://marc.info/?l=gentoo-project&m=168881103930591&w=2
>>
>> On 12/07/2023 21.28, Alec Warner wrote:
>>> On Wed, Jul 12, 2023 at 12:07 PM Florian Schmaus <flow@gentoo.org> wrote:
>>>> Apologies for not replying to everyone individually.
>>>>
>>>> I thank my fellow council candidates who took the time to reply to this
>>>> sensitive and obviously controversial matter. I understand that not
>>>> everyone feels comfortable taking a stance in this discussion.
>>>>
>>>> I asked the other council candidates about their opinion on EGO_SUM.
>>>> Unfortunately, some replies included only a rather shallow answer. A few
>>>> focused on criticism of my actions and how I approach the issue. Which
>>>> is obviously fine. I read it all and have empathy for everyone who feels
>>>> aggravated. You may or may not share the complaints. But let us focus on
>>>> the actual matter for a moment.
>>>>
>>>> Even the voices raised for a restricted reintroduction of EGO_SUM just
>>>> mention an abstract limit [1]. A concrete limit is not mentioned,
>>>> although I asked for it and provided my idea including specific limits.
>>>> Not knowing the concrete figures others have in mind makes it difficult
>>>> to find a compromise. For example, a fellow council candidate postulated
>>>> that it would be quicker for me to implement a limit-check in pkgcheck
>>>> than discuss EGO_SUM. I wish that were the case. Unfortunately it is
>
> I think this misrepresents my point. All I said was that a bound should
> be added matching what's in Portage right now.
>
> Please in future respond to me directly if you're going to claim something about what I've said.
>
>> [...]
>> EGO_SUM affects two dimensions that could be limited/restricted:
>> A) the process environment, which may run into the Linux kernel
>>    environment limit on exec(3)
>> B) the size of the package directory, where EGO_SUM affects the size of
>>    ebuilds and the Manifest
>>
>> [...]
>>
>> A), however, is a different beast. There is undeniably a
>> kernel-enforced limit that we could hit due to an extremely large
>> EGO_SUM (among other things). However, the only bug report I know that
>> runs into this kernel limit was with texlive (bug #719202). The low
>> number of recorded bugs caused by the environment limit matches with
>> the fact that even the ebuild with the most EGO_SUM entries that I
>> ever analyzed, app-containers/cri-o-1.23.1 (2022-02-16) with 2052
>> EGO_SUM entries, does *not* run into the environment limit.
>>
>
> I thought I'd given you a list before, but maybe it was someone else.
>
> Anyway, a non-exhaustive list (I remember maybe two more but I got bored):
> * https://bugs.gentoo.org/829545 ("app-admin/vault-1.9.1 - find: The environment is too large for exec().")
> * https://bugs.gentoo.org/829684 ("app-metrics/prometheus-2.31.1 - find: The environment is too large for exec().")
> * https://bugs.gentoo.org/830187 (you're CC'd on this) ("go lang ebuild: SRC_URI too long that it causes "Argument list too long" error")
> * https://bugs.gentoo.org/831265 ("sys-cluster/minikube-1.24.0 - find: The environment is too large for exec().")
> * a0be89b772474e3336d3de699d71482aa89d2444 ("app-emulation/nerdctl: drop 0.14.0")
>

Sorry, as I said this, I came across some more. These are the ones I was
thinking of:
* https://bugs.gentoo.org/830266 ("app-admin/filebeat-7.16.2 fails to compile: Assertion failed: bc_ctl.arg_max >= LINE_MAX (xargs.c: main: 511)")
* https://bugs.gentoo.org/832964 ("sys-cluster/kops-1.21.0 fails to compile: Assertion failed: bc_ctl.arg_max >= LINE_MAX (xargs.c: main: 511)")
* https://bugs.gentoo.org/833961 ("net-p2p/go-ipfs-0.11.0 - Assertion failed: bc_ctl.arg_max >= LINE_MAX (xargs.c: main: 511)")
* https://bugs.gentoo.org/835712 ("dev-util/packer-1.7.9 fails to compile: Assertion failed: bc_ctl.arg_max >= LINE_MAX (xargs.c: main: 511)")

> Other related bugs (as it's useful as a summary of where we are):
> * https://bugs.gentoo.org/540146 ("sys-apps/portage: limit no of exported variables in EAPI 6")
> * https://bugs.gentoo.org/720180 ("sys-apps/portage: add support to delay export of "A" variable until last moment")
> * https://bugs.gentoo.org/721088 ("[Future EAPI] Don't export A")
> * https://bugs.gentoo.org/833567 ("[Future EAPI] src_fetch_extra phase the runs after src_unpack")
>
> I am not aware of a bug (yet?) for radhermit's suggestion wrt external
> helpers which is related but different to exporting fewer variables.
>
> thanks,
> sam



* Re: [gentoo-dev] Re: Flow's Manifesto and questions for nominees
  2023-07-14  7:14           ` [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: " Florian Schmaus
  2023-07-14  7:33             ` Sam James
@ 2023-07-14  8:39             ` Ulrich Mueller
  1 sibling, 0 replies; 23+ messages in thread
From: Ulrich Mueller @ 2023-07-14  8:39 UTC (permalink / raw
  To: Florian Schmaus; +Cc: gentoo-dev

>>>>> On Fri, 14 Jul 2023, Florian Schmaus wrote:

> Posted to gentoo-dev@ since we are now entering a technical discussion
> again.

Please avoid crossposting, because that doesn't work well. (For example,
the posting will have different Reply-To headers in gentoo-project and
in gentoo-dev.)

Ulrich


* [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: Re: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
  2023-07-14  7:33             ` Sam James
  2023-07-14  8:19               ` Sam James
@ 2023-07-14  9:07               ` Florian Schmaus
  1 sibling, 0 replies; 23+ messages in thread
From: Florian Schmaus @ 2023-07-14  9:07 UTC (permalink / raw
  To: gentoo-dev; +Cc: Sam James


On 14/07/2023 09.33, Sam James wrote:
> 
> Florian Schmaus <flow@gentoo.org> writes:
> 
>> [[PGP Signed Part:Undecided]]
>> Posted to gentoo-dev@ since we are now entering a technical discussion
>> again.
>>
>> For those who did not follow gentoo-project@, the previous posts include:
>>
>> https://marc.info/?l=gentoo-project&m=168918875000738&w=2
>> https://marc.info/?l=gentoo-project&m=168881103930591&w=2
>>
>> On 12/07/2023 21.28, Alec Warner wrote:
>>> On Wed, Jul 12, 2023 at 12:07 PM Florian Schmaus <flow@gentoo.org> wrote:
>>>> Apologies for not replying to everyone individually.
>>>>
>>>> I thank my fellow council candidates who took the time to reply to this
>>>> sensitive and obviously controversial matter. I understand that not
>>>> everyone feels comfortable taking a stance in this discussion.
>>>>
>>>> I asked the other council candidates about their opinion on EGO_SUM.
>>>> Unfortunately, some replies included only a rather shallow answer. A few
>>>> focused on criticism of my actions and how I approach the issue. Which
>>>> is obviously fine. I read it all and have empathy for everyone who feels
>>>> aggravated. You may or may not share the complaints. But let us focus on
>>>> the actual matter for a moment.
>>>>
>>>> Even the voices raised for a restricted reintroduction of EGO_SUM just
>>>> mention an abstract limit [1]. A concrete limit is not mentioned,
>>>> although I asked for it and provided my idea including specific limits.
>>>> Not knowing the concrete figures others have in mind makes it difficult
>>>> to find a compromise. For example, a fellow council candidate postulated
>>>> that it would be quicker for me to implement a limit-check in pkgcheck
>>>> than discuss EGO_SUM. I wish that were the case. Unfortunately it is
> 
> I think this misrepresents my point. All I said was that a bound should
> be added matching what's in Portage right now.
> 
> Please in future respond to me directly if you're going to claim something about what I've said.
> 
>> [...]
>> EGO_SUM affects two dimensions that could be limited/restricted:
>> A) the process environment, which may run into the Linux kernel
>>     environment limit on exec(3)
>> B) the size of the package directory, where EGO_SUM affects the size of
>>     ebuilds and the Manifest
>>
>> [...]
>>
>> A), however, is a different beast. There is undeniably a
>> kernel-enforced limit that we could hit due to an extremely large
>> EGO_SUM (among other things). However, the only bug report I know that
>> runs into this kernel limit was with texlive (bug #719202). The low
>> number of recorded bugs caused by the environment limit matches with
>> the fact that even the ebuild with the most EGO_SUM entries that I
>> ever analyzed, app-containers/cri-o-1.23.1 (2022-02-16) with 2052
>> EGO_SUM entries, does *not* run into the environment limit.
>>
> 
> I thought I'd given you a list before, but maybe it was someone else.
> 
> Anyway, a non-exhaustive list (I remember maybe two more but I got bored):
> * https://bugs.gentoo.org/829545 ("app-admin/vault-1.9.1 - find: The environment is too large for exec().")
> * https://bugs.gentoo.org/829684 ("app-metrics/prometheus-2.31.1 - find: The environment is too large for exec().")
> * https://bugs.gentoo.org/830187 (you're CC'd on this) ("go lang ebuild: SRC_URI too long that it causes "Argument list too long" error")
> * https://bugs.gentoo.org/831265 ("sys-cluster/minikube-1.24.0 - find: The environment is too large for exec().")
> * a0be89b772474e3336d3de699d71482aa89d2444 ("app-emulation/nerdctl: drop 0.14.0")

Thanks for providing this valuable information, Sam. I was indeed not 
aware of those bugs. They all seem to have been fixed before 2022-02-16, 
which is the date of the ::gentoo tree I mostly analyzed (selected 
because it was just before EGO_SUM was deprecated).

Limiting the process environment to 90% of the kernel-enforced limit, as 
antarus also suggested (potentially by approximating the EGO_SUM 
entries) would have probably prevented those bugs. As I previously 
wrote, I would be happy to work on a pkgcheck for that, if the limit is 
only about the kernel's process environment limit (A).

However, this still leaves us with some who also seem to demand a limit 
with regard to the package-directory size (B).


> Other related bugs (as it's useful as a summary of where we are):
> * https://bugs.gentoo.org/540146 ("sys-apps/portage: limit no of exported variables in EAPI 6")
> * https://bugs.gentoo.org/720180 ("sys-apps/portage: add support to delay export of "A" variable until last moment")
> * https://bugs.gentoo.org/721088 ("[Future EAPI] Don't export A")
> * https://bugs.gentoo.org/833567 ("[Future EAPI] src_fetch_extra phase the runs after src_unpack")
> 
> I am not aware of a bug (yet?) for radhermit's suggestion wrt external
> helpers which is related but different to exporting fewer variables.

Improving, that is, reducing, what portage exports to child processes of 
the ebuild is sensible. But it is only indirectly related to EGO_SUM and 
not a strict prerequisite to re-introduce EGO_SUM.

- Flow



end of thread, other threads:[~2023-07-14  9:07 UTC | newest]

Thread overview: 23+ messages
     [not found] <2ZKWN4KF.MKEFFMWE.LGPKYP47@RTL7EJXF.RN4PF6UF.MDFBGF3C>
     [not found] ` <be450641-94ff-a0d9-51da-3a7a3abcc6c7@gentoo.org>
     [not found]   ` <b7309a3f-2980-b390-a16a-0518cce1da75@gentoo.org>
     [not found]     ` <87y1k33aoy.fsf@gentoo.org>
2023-06-30  8:15       ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Florian Schmaus
2023-06-30  8:22         ` Sam James
2023-06-30  9:38           ` Tim Harder
2023-06-30 11:33             ` Eray Aslan
2023-07-03 10:17               ` Florian Schmaus
2023-07-04  7:13                 ` Tim Harder
2023-07-04 10:44                   ` Gerion Entrup
2023-07-04 21:56                     ` Robin H. Johnson
2023-07-04 23:09                       ` Oskari Pirhonen
2023-07-05 18:40                         ` Gerion Entrup
2023-07-05 19:32                           ` Rich Freeman
2023-07-06  2:48                           ` Oskari Pirhonen
2023-07-06  6:09                   ` Zoltan Puskas
2023-07-06 19:46                     ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open Hank Leininger
2023-07-08 20:49                     ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Sam James
2023-07-03 10:17           ` Florian Schmaus
2023-07-03 11:12             ` [gentoo-dev] EGO_SUM Ulrich Mueller
2023-07-08 21:21             ` [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.) Sam James
     [not found]     ` <cdf5ddb7-8f65-74cf-5594-3e3eec86c915@gentoo.org>
     [not found]       ` <1913d3c2-5f54-acea-0ed3-930371ea1884@gentoo.org>
     [not found]         ` <CAAr7Pr9+zq2NV=7zhj5e+4LWOmNavCrfMstNTqkthk5uxQVNtg@mail.gmail.com>
2023-07-14  7:14           ` [gentoo-dev] Re: Flow's Manifesto and questions for nominees (was: " Florian Schmaus
2023-07-14  7:33             ` Sam James
2023-07-14  8:19               ` Sam James
2023-07-14  9:07               ` Florian Schmaus
2023-07-14  8:39             ` [gentoo-dev] Re: Flow's Manifesto and questions for nominees Ulrich Mueller
