public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Breakage and frustration
@ 2015-12-13 17:36 Patrick Lauer
  2015-12-13 21:24 ` Sergei Trofimovich
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Patrick Lauer @ 2015-12-13 17:36 UTC
  To: gentoo-dev

Broken breakage


tl;dr: Stuff is broken, and no one seems to care



In August the git "migration" happened, moving our main repository from
old stupid cvs to modern shiny git.

Well, migration is not the word I'd use, because this was an untested
forced migration that is now, months later, still suffering from
regressions and failures. But, hey, no more cvs! That's good because,
reasons.

So at first there was [1] the lack of proper Manifests. Which broke
things for all rsync users for a few hours.

Once that was 'fixed' there was the fun of [2], which made emerge --sync
very expensive because it refetched lots of files. Every time.

The 'fix' to the fix of the fix for that is still in progress ...


And a few little things like [3] happened. Oopsie.
Also there seems to be confusion about how git works, leading to hilarity
like [4].

Now [5] was reported. Who needs ChangeLogs, this is git! Except for all
users, who don't get ChangeLogs. Well, let's add them and [6] not test
what happens. Guess what? Stuff breaks more.
And they are added backwards so that emerge --changelog fails in a
different way. No, I didn't want to read the whole changelog except the
part I cared about.

And fixing that introduces [7] some more regressions that broke updating
@system for about 3.5 days.

The fix to that fix (notice a pattern here?) broke rsync for *all* users
[8]. Almost as if no one ever tests things in a test environment ... but
hey, we're agile, let's fix stuff in production!

And the manifest issues are still [9] making life exciting.

So to summarize, in about 5 months there was user-visible breakage:

- ~1 day downtime for git migration (no updates)
- 8h no Manifests (no updates possible)
- a few days of emerge --sync being stupidly slow
- a few days of emerge-webrsync not updating
- about 3 months of emerge --changelog being broken, just to be broken
in a different way
- 3.5 days of emerge @system being broken
- about a day of emerge --sync needing manual interaction to be able to
update again
- a few days of grub being uninstallable (iow, making installing
impossible for many users)

So all in all emerge --sync && emerge -uND @system being down for >10%
of the time.

Now, I don't know if you use Gentoo, but I do, and I use it at work, so
having this level of randomization happen is not really useful.

Tell me then, please - what can I/we do so that this kind of breakage
stops, and we can actually aim at having a most excellent distro? In the
long run I am considering just creating my own clone of all
infrastructure bits so I can fix things, but it's an option that is
needlessly braindead, wasting effort, and not really useful to users
that are not me.


[1] https://bugs.gentoo.org/show_bug.cgi?id=557184
[2] https://bugs.gentoo.org/show_bug.cgi?id=557192
[3] https://bugs.gentoo.org/show_bug.cgi?id=557344
[4] https://bugs.gentoo.org/show_bug.cgi?id=557400
[5] https://bugs.gentoo.org/show_bug.cgi?id=557826
[6] https://bugs.gentoo.org/show_bug.cgi?id=565574
[7] https://bugs.gentoo.org/show_bug.cgi?id=565694
[8] https://bugs.gentoo.org/show_bug.cgi?id=567074
[9] https://bugs.gentoo.org/show_bug.cgi?id=567830



* Re: [gentoo-dev] Breakage and frustration
  2015-12-13 17:36 [gentoo-dev] Breakage and frustration Patrick Lauer
@ 2015-12-13 21:24 ` Sergei Trofimovich
  2015-12-13 21:41 ` Robin H. Johnson
  2015-12-14  2:15 ` Rich Freeman
  2 siblings, 0 replies; 8+ messages in thread
From: Sergei Trofimovich @ 2015-12-13 21:24 UTC
  To: gentoo-dev; +Cc: patrick


On Sun, 13 Dec 2015 18:36:41 +0100
Patrick Lauer <patrick@gentoo.org> wrote:

> Broken breakage
> 
> 
> tl;dr: Stuff is broken, and no one seems to care
> ...
> [1] https://bugs.gentoo.org/show_bug.cgi?id=557184
RESOLVED
> [2] https://bugs.gentoo.org/show_bug.cgi?id=557192
RESOLVED
> [3] https://bugs.gentoo.org/show_bug.cgi?id=557344
RESOLVED
> [4] https://bugs.gentoo.org/show_bug.cgi?id=557400
RESOLVED
> [5] https://bugs.gentoo.org/show_bug.cgi?id=557826
RESOLVED
> [6] https://bugs.gentoo.org/show_bug.cgi?id=565574
RESOLVED
> [7] https://bugs.gentoo.org/show_bug.cgi?id=565694
RESOLVED
> [8] https://bugs.gentoo.org/show_bug.cgi?id=567074
RESOLVED
> [9] https://bugs.gentoo.org/show_bug.cgi?id=567830
RESOLVED

Tried to follow the email and failed to find unfixed
problems. I believe focusing on actual issues would help.

What is broken currently? Do relevant people know they
are broken (= are there open bugs)?

Or is it an email thread about how not to break things ever
in future?

-- 

  Sergei



* Re: [gentoo-dev] Breakage and frustration
  2015-12-13 17:36 [gentoo-dev] Breakage and frustration Patrick Lauer
  2015-12-13 21:24 ` Sergei Trofimovich
@ 2015-12-13 21:41 ` Robin H. Johnson
  2015-12-14  2:15 ` Rich Freeman
  2 siblings, 0 replies; 8+ messages in thread
From: Robin H. Johnson @ 2015-12-13 21:41 UTC
  To: gentoo-dev

TL;DR summary:
Yes, stuff has broken, but I'd call them reasonable teething issues well
distributed through the stack, and to be compared to the CVS server
moves from a decade ago, rather than CVS just before the Git switch.

On Sun, Dec 13, 2015 at 06:36:41PM +0100, Patrick Lauer wrote:
... (mail re-ordered for related issues)
> Once that was 'fixed' there was the fun of [2], which made emerge --sync
> very expensive because it refetched lots of files. Every time.
The fix for this caused these two bugs:
> And fixing that introduces [7] some more regressions that broke updating
> @system for about 3.5 days.
> - a few days of grub being uninstallable (iow, making installing
> impossible for many users)
> And the manifest issues are still [9] making life exciting.
Bug #567920 describes the issue very succinctly (mtime of a deleted file
needs to be included in the new Manifest mtime calculation).
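
A rough Python sketch of that rule, for illustration only (this is not
the Portage or mastermirror code). The function name and the shape of
its inputs are hypothetical, and it assumes the Manifest mtime is taken
as the newest mtime among the files in the directory:

```python
from typing import Iterable


def manifest_mtime(present_mtimes: Iterable[float],
                   deleted_mtimes: Iterable[float] = ()) -> float:
    """Pick the mtime to stamp on a regenerated Manifest (hypothetical helper).

    present_mtimes: mtimes of the files still in the directory.
    deleted_mtimes: mtimes associated with files removed since the last
                    run (how these are obtained is left open here).
    """
    candidates = list(present_mtimes) + list(deleted_mtimes)
    if not candidates:
        raise ValueError("no mtimes to derive a Manifest mtime from")
    # Taking the maximum over both sets means a deletion moves the
    # Manifest mtime forward even if no remaining file changed.
    return max(candidates)


# Only the remaining files are considered: the mtime stays at 200.
stale = manifest_mtime([100.0, 200.0])
# The deletion (at 300) is included: the mtime advances.
fresh = manifest_mtime([100.0, 200.0], deleted_mtimes=[300.0])
```

With the deletion left out, the regenerated Manifest keeps the old
mtime even though its content changed, which is the situation the bug
describes.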

Both of them can be worked around if the entire path (all staging nodes,
servers and end users) uses --checksum, but that's even more expensive.
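
To illustrate why --checksum helps, here is a small Python sketch that
imitates the two comparison modes. It is only an approximation: rsync's
default quick check compares size and modification time, while
--checksum compares file contents. The helper names are made up, and
SHA-256 stands in for whatever digest rsync actually uses.

```python
import hashlib
import os
import tempfile


def quick_check_same(a: str, b: str) -> bool:
    """Approximate rsync's default quick check: same size and same mtime."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def checksum_same(a: str, b: str) -> bool:
    """Approximate --checksum: compare content digests, ignoring mtime."""
    def digest(path: str) -> str:
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    return digest(a) == digest(b)


def demo() -> tuple:
    """Two files with equal size and mtime but different content."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "Manifest.old")
        new = os.path.join(tmp, "Manifest.new")
        with open(old, "w") as fh:
            fh.write("DIST a-1.tar 10\n")
        with open(new, "w") as fh:
            fh.write("DIST b-1.tar 10\n")
        # Force identical timestamps, as happens when the computed
        # Manifest mtime does not move forward.
        os.utime(old, (1000, 1000))
        os.utime(new, (1000, 1000))
        return quick_check_same(old, new), checksum_same(old, new)
```

The quick check treats the two files as identical and would skip the
transfer; the content comparison notices the difference, at the cost of
reading every file on both sides.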

I have another work-around idea, and that's simply appending a comment
of the latest commit per directory to the changelog, because that will
trigger the manifest being different ;-).

> The fix to that fix (notice a pattern here?) broke rsync for *all* users
> [8]. Almost as if no one ever tests things in a test environment ... but
> hey, we're agile, let's fix stuff in production!
We still never figured out how .git came to be added to the outgoing
data. It was NOT the rsync into the staging directory, because it was only
the directory structure, and none of the files. --exclude=.git WAS used
in most of the rsyncs, but not the final ones from staging to tree
distribution.
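
One possible guard, sketched in Python under the assumption that a
check could run on the staging tree before the final distribution step.
This is not part of the existing scripts, and the function name is
hypothetical. It looks for directories by name, so it would also catch
an empty .git directory structure with none of the files:

```python
import os
import tempfile


def find_vcs_dirs(root: str, name: str = ".git") -> list:
    """Return relative paths of every directory called `name` under `root`."""
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if name in dirnames:
            hits.append(os.path.relpath(os.path.join(dirpath, name), root))
            dirnames.remove(name)  # do not descend into the match
    return sorted(hits)


def demo() -> list:
    """Build a tiny fake staging tree that contains a stray .git directory."""
    with tempfile.TemporaryDirectory() as staging:
        os.makedirs(os.path.join(staging, "app-misc", "foo"))
        os.makedirs(os.path.join(staging, ".git", "objects"))
        return find_vcs_dirs(staging)
```

A publishing step could refuse to continue whenever the returned list
is non-empty, independent of which rsync invocations carry
--exclude=.git.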

> - about 3 months of emerge --changelog being broken, just to be broken
> in a different way
This change (order of changelog entries) was made explicitly to address
your earlier complaint about excess traffic. Why Portage's parsing of the
ChangeLogs still does not handle it is an open question.

> - 3.5 days of emerge @system being broken
> - about a day of emerge --sync needing manual interaction to be able to
> update again
You missed one:
- rsync generation now halts if somebody committed some breakage to the
  tree (missing DIST entry, bad eclass).
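
As an illustration of the "missing DIST entry" case, here is a minimal
Python sketch, not the actual generation script. It assumes Manifest
lines of the form "DIST <filename> <size> <hash-name> <hash> ..."; the
function names and the sample data are made up.

```python
def dist_entries(manifest_text: str) -> set:
    """Collect the filenames that have a DIST line in a Manifest."""
    names = set()
    for line in manifest_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "DIST":
            names.add(fields[1])
    return names


def missing_dist(manifest_text: str, required_distfiles) -> list:
    """Return the required distfiles that have no DIST entry."""
    return sorted(set(required_distfiles) - dist_entries(manifest_text))


def check_or_halt(manifest_text: str, required_distfiles) -> None:
    """Stop instead of publishing a tree with missing DIST entries."""
    missing = missing_dist(manifest_text, required_distfiles)
    if missing:
        raise RuntimeError("missing DIST entries: " + ", ".join(missing))


# Placeholder Manifest content; the hash value is not a real digest.
sample = "DIST foo-1.0.tar.gz 1234 SHA256 abc123\n"
gaps = missing_dist(sample, ["foo-1.0.tar.gz", "foo-1.1.tar.gz"])
```

Halting keeps a broken commit from reaching users, at the price of
delaying every other update until the tree is fixed.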

> 
> So all in all emerge --sync && emerge -uND @system being down for >10%
> of the time.
> 
> Now, I don't know if you use Gentoo, but I do, and I use it at work, so
> having this level of randomization happen is not really useful.
> 
> Tell me then, please - what can I/we do so that this kind of breakage
> stops, and we can actually aim at having a most excellent distro? In the
> long run I am considering just creating my own clone of all
> infrastructure bits so I can fix things, but it's an option that is
> needlessly braindead, wasting effort, and not really useful to users
> that are not me.
Your own infra option would NOT have fixed [2]/[7]/[9].

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85



* Re: [gentoo-dev] Breakage and frustration
  2015-12-13 17:36 [gentoo-dev] Breakage and frustration Patrick Lauer
  2015-12-13 21:24 ` Sergei Trofimovich
  2015-12-13 21:41 ` Robin H. Johnson
@ 2015-12-14  2:15 ` Rich Freeman
  2015-12-14 10:48   ` Peter Stuge
  2 siblings, 1 reply; 8+ messages in thread
From: Rich Freeman @ 2015-12-14  2:15 UTC
  To: gentoo-dev

On Sun, Dec 13, 2015 at 12:36 PM, Patrick Lauer <patrick@gentoo.org> wrote:
> In the long run I am considering just creating my own clone of all
> infrastructure bits so I can fix things

I just wanted to comment that things like this should never be viewed
as a bad thing.  Many contributions to Gentoo arose because somebody
basically took something they saw and made an unofficial fork they
improved upon.  That may or may not have later become official, but it
is often beneficial all the same.

I'd just suggest that anybody doing this document what they do and
publish their sources, so that others can do the same more easily.

My ideal state for Gentoo would be one where very little of what we do
is centrally housed at all, and infra is just a bunch of servers we
tend to use as a matter of convention, but anybody can clone one at
any time and set up their own.  Federated authentication could let
people login to various services without having to actually have
access to anything sensitive.  This would make it much easier for
anybody to offer new services, or to offer patches to infra, and then
infra could perhaps be more in the role of a trusted gatekeeper and
spend less time having to implement every little thing.

That said, it is something that would need to be worked towards.  My
understanding is that infra is already trying to keep the sensitive
stuff separate from the mundane in their newer work, but I'm sure they
have a ton of old stuff that hasn't been migrated in this fashion.  As
with many teams they're probably a bit overloaded, so a big question
is how to make it happen without just throwing complaints on the folks
who are trying their best to keep it all going.

-- 
Rich



* Re: [gentoo-dev] Breakage and frustration
  2015-12-14  2:15 ` Rich Freeman
@ 2015-12-14 10:48   ` Peter Stuge
  2015-12-14 13:58     ` Rich Freeman
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Stuge @ 2015-12-14 10:48 UTC
  To: gentoo-dev

Rich Freeman wrote:
> a big question is how to make it happen without just throwing
> complaints on the folks who are trying their best to keep it all going.

The answer to this is the same as it has always been:

Demonstrate that you are capable and reliable, and, given social
compatibility, after some time you too can become a part of
the team.

This is the way most open source projects, actually most human
projects work. Maintainers are not picked at random when there
is high load, but trusted long-time contributors are promoted
to maintainers when they shine.

The more you complain instead of sending patches, the less likely you
are to ever become part of the solution.


The key point to remember is that it is NOT necessary to be part of
the team in order to contribute solutions. You *first* contribute
solutions and only *then* have a chance of becoming part of the team.

I for one am working in my non-existent spare time on a fast
ChangeLog generator.


//Peter



* Re: [gentoo-dev] Breakage and frustration
  2015-12-14 10:48   ` Peter Stuge
@ 2015-12-14 13:58     ` Rich Freeman
  2015-12-14 14:03       ` Patrick Lauer
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Freeman @ 2015-12-14 13:58 UTC
  To: gentoo-dev

On Mon, Dec 14, 2015 at 5:48 AM, Peter Stuge <peter@stuge.se> wrote:
>
> The key point to remember is that it is NOT necessary to be part of
> the team in order to contribute solutions. You *first* contribute
> solutions and only *then* have a chance of becoming part of the team.
>
> I for one am working in my non-existent spare time on a fast
> ChangeLog generator.
>

++, and thanks.

I was actually trying to think beyond this point though.  Right now we
have lots of infra stuff that the infra team would probably love to
publish except there is no easy way to do it without creating security
issues (intermixing of credentials and code, etc).  We have a few devs
who have access to all of it, but their time needs to be split between
keeping the current state working, and trying to build some future
state which is easier to contribute to.  For the rest of our part, we
do need to put up and start building things.

So, how do we get from that to a future state where our infra servers
have public backups or whatever minus /etc/credentials.include and
/var/spool/private-email and so on?  That is, a future state where the
default is open with necessary exceptions?  And how do we do it with
what we have, or are able to get?

I just want to be constructive.  Clearly the first step is to
acknowledge that the onus is on those who really want to see change to
chip in to make it happen.  We have to be here to serve - this is a
volunteer FOSS project.

-- 
Rich



* Re: [gentoo-dev] Breakage and frustration
  2015-12-14 13:58     ` Rich Freeman
@ 2015-12-14 14:03       ` Patrick Lauer
  2015-12-14 18:42         ` Robin H. Johnson
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Lauer @ 2015-12-14 14:03 UTC
  To: gentoo-dev



On 12/14/2015 02:58 PM, Rich Freeman wrote:
> On Mon, Dec 14, 2015 at 5:48 AM, Peter Stuge <peter@stuge.se> wrote:
>> The key point to remember is that it is NOT necessary to be part of
>> the team in order to contribute solutions. You *first* contribute
>> solutions and only *then* have a chance of becoming part of the team.
>>
>> I for one am working in my non-existent spare time on a fast
>> ChangeLog generator.
>>
> ++, and thanks.
>
> I was actually trying to think beyond this point though.  Right now we
> have lots of infra stuff that the infra team would probably love to
> publish except there is no easy way to do it without creating security
> issues (intermixing of credentials and code, etc).  We have a few devs
> who have access to all of it, but their time needs to be split between
> keeping the current state working, and trying to build some future
> state which is easier to contribute to.  For the rest of our part, we
> do need to put up and start building things.
>
> So, how do we get from that to a future state where our infra servers
> have public backups or whatever minus /etc/credentials.include and
> /var/spool/private-email and so on?  That is, a future state where the
> default is open with necessary exceptions?  And how do we do it with
> what we have, or are able to get?
>
> I just want to be constructive.  Clearly the first step is to
> acknowledge that the onus is on those who really want to see change to
> chip in to make it happen.  We have to be here to serve - this is a
> volunteer FOSS project.
>
Y'know ...

If I had access to things I could help. But I don't, so I can't.

The rather obvious solutions seem to be politically impossible, and the
politically possible solutions are not satisfactory.




* Re: [gentoo-dev] Breakage and frustration
  2015-12-14 14:03       ` Patrick Lauer
@ 2015-12-14 18:42         ` Robin H. Johnson
  0 siblings, 0 replies; 8+ messages in thread
From: Robin H. Johnson @ 2015-12-14 18:42 UTC
  To: gentoo-dev

On Mon, Dec 14, 2015 at 03:03:53PM +0100, Patrick Lauer wrote:
> If I had access to things I could help. But I don't, so I can't.
The scripts of the git->rsync process are already open.
https://gitweb.gentoo.org/infra/mastermirror-scripts.git/
(also in there are the related scripts for the other
distfiles/releases/experimental mirroring).

The only parts not there (from memory):
- credentials to connect to rsync endpoints

And, as I noted, the Manifest issues are problems in Portage, which is
also open code...

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85


