From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gentoo-project+bounces-5158-garchives=archives.gentoo.org@lists.gentoo.org>
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id 209C913832E
	for <garchives@archives.gentoo.org>; Tue,  9 Aug 2016 05:33:39 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 8C836E0B57;
	Tue,  9 Aug 2016 05:33:35 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id ACA3FE0B4F
	for <gentoo-project@lists.gentoo.org>; Tue,  9 Aug 2016 05:33:34 +0000 (UTC)
Received: from katipo2.lan (unknown [IPv6:2406:e001:1:d01:c2f8:daff:fe83:ed01])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	(Authenticated sender: kentnl)
	by smtp.gentoo.org (Postfix) with ESMTPSA id 97B793408EC
	for <gentoo-project@lists.gentoo.org>; Tue,  9 Aug 2016 05:33:32 +0000 (UTC)
Date: Tue, 9 Aug 2016 17:32:55 +1200
From: Kent Fredric <kentnl@gentoo.org>
To: gentoo-project@lists.gentoo.org
Subject: Re: [gentoo-project] Call for agenda items - Council meeting
 2016-08-14
Message-ID: <20160809173255.0ddfa090@katipo2.lan>
In-Reply-To: <febe98a2-e8c9-06c8-aa17-3fbac2788364@gentoo.org>
References: <2e11e445-c25b-b7f2-def1-99aed92308b6@gentoo.org>
	<20160804162443.GA7048@whubbs1.gaikai.biz>
	<20160804231224.7b7462168f1d23e88fe4135c@gentoo.org>
	<20160804222234.GA8357@whubbs1.gaikai.biz>
	<CAGfcS_=TwWJxjh+PUninJssMAVakUaRA5WGZ5cbSwz+XR0qQyA@mail.gmail.com>
	<20160805022658.GA15727@linux1>
	<CAGfcS_kBsnFc9T7qdHX4Eo7X=vRBmnU2V+1xeEihpaBpd9DsYg@mail.gmail.com>
	<20160805142859.GA19008@linux1>
	<CAGfcS_nQd=PncvZ-PpuAjRFNvp4p-8H=03YEmducme=vEBuhZA@mail.gmail.com>
	<20160805153658.GA11058@whubbs1.gaikai.biz>
	<52993bd4-afc9-197e-acda-96db413e6608@gentoo.org>
	<febe98a2-e8c9-06c8-aa17-3fbac2788364@gentoo.org>
Organization: Gentoo
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
Precedence: bulk
List-Post: <mailto:gentoo-project@lists.gentoo.org>
List-Help: <mailto:gentoo-project+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-project+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-project+subscribe@lists.gentoo.org>
List-Id: Gentoo Project discussion list <gentoo-project.gentoo.org>
X-BeenThere: gentoo-project@lists.gentoo.org
Reply-To: gentoo-project@lists.gentoo.org
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 boundary="Sig_/sjzgruXUYQ+WsgeOwKN4zc8"; protocol="application/pgp-signature"
X-Archives-Salt: 35accb0e-a812-49f7-9b0e-e386ad122bea
X-Archives-Hash: 9fcc546c54f0d2ead1afea44b664f6c6

--Sig_/sjzgruXUYQ+WsgeOwKN4zc8
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

On Mon, 8 Aug 2016 19:07:04 -0700
Jack Morgan <jmorgan@gentoo.org> wrote:

> On 08/08/16 05:35, Marek Szuba wrote:
> >
> > Bottom line: I would say we do need some way of streamlining ebuild
> > stabilisation.
>
> I vote we fix this problem. I'm tired of having this same discussion
> every 6 or 12 months. I'd like to see less policy discussion and more
> technical solutions to the problems we face.
>
> I propose calling for volunteers to create a new project that works on
> solving our stabilization problem. I see that looking like the
> following:
>
> 1) project identifies the problem(s) with real data from Bugzilla and
> the portage tree.
>
> 2) new project defines a technical proposal to fixing this issue, then
> presents it to the developer community for feedback. This would
> include defining tools needed or used
>
> 3) start working on solution + define future roadmap
>
>
> All processes and policies should be on the table for negotiating in
> the potential solution. If we need to reinvent the wheel, then let's
> do it.
>
> To be honest, adding more policy just ends up making everyone unhappy
> one way or the other.
>
>

There's a potential technical solution that somewhat alleviates the
need for such rigorous arch testers, without degrading the
stabilisation mechanic to a "blind monkey system that stabilises based
on conjecture".

I've mentioned it before ages ago on the Gentoo Dev list, somewhere.

The idea is basically to instrument portage to have an (optional)
feature that, when turned on, records and submits certain facts about
every failed or successful install, with the objective being to
essentially spread the load of what `tatt` does organically across
the participant base.

1. Firstly, make no demands of homogeneity or even sanity for a user's
system to participate. Everything they throw at this system I'm about
to propose should be considered "valid".

2. Every time a package is installed, or attempted to be installed, the
outcome of that installation is classified as one of the following:

   - installed OK without tests
   - installed OK with tests
   - failed tests
   - failed install
   - failed compile
   - failed configure

Each of these is a single state in a single field.

3. The Name, Version, and SHA1 of the ebuild that generated the report.


4. The USE flags and any other pertinent ( and carefully selected by
Gentoo ) flags are included, each as single fields in a property set,
and decomposed into structured property lists where possible.

5. <arch> satisfaction data for the target package at the time of
installation is recorded.

eg:

   KEYWORDS="arch"  + ACCEPT_KEYWORDS="~arch" -> [ "arch(~)"  ]
   KEYWORDS="~arch" + ACCEPT_KEYWORDS="~arch" -> [ "~arch(~)" ]
   KEYWORDS="arch"  + ACCEPT_KEYWORDS="arch"  -> [ "arch"     ]
   KEYWORDS=""      + ACCEPT_KEYWORDS="**"    -> [ "(**)"     ]

This seems redundant, but this is basically suggesting "hey, if you're
insane and setting lots of different arches for accept keywords, that
would be relevant data to use to ignore your report". This data can also
be used with other data I'll mention later to isolate users with "mixed
keywording" setups.

6. For every dependency listed in *DEPEND, a dictionary/hash of

  "specified atom" -> {
     name -> resolved dependency name
     version -> version of resolved dependency
     arch -> [ satisfied arch spec as in #5 ]
     sha1 -> Some kind of SHA1 that hopefully turns up in gentoo.git
  }


is recorded in the response at the time of the result.

The "satisfied arch spec" field is used to isolate anomalies in
keywording and user keyword mixing and filter out non-target reports
for stabilization data.

7. A Submitter Unique Identifier

8. Possibly a Submitter-Machine Unique Identifier.

9. The whole build log will be included compressed, verbatim.

This latter part will be an independent option to the "reporting"
feature, because it's a slightly more invasive privacy concern than the
others, in that arbitrary code execution can leak private data.

Hence, people who turn this feature on have to know what they're
signing up for.

10. All of the above data is pooled and shipped as a single report, and
submitted to a "report server" and aggregated.
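
To make points 2 to 8 a little more concrete, here is a minimal sketch
of what one such report could look like as a data structure. Everything
in it is an assumption for illustration only: the names `Outcome`,
`keyword_satisfaction`, `ResolvedDep` and `InstallReport`, the JSON
encoding, and the example values are all hypothetical, and none of it
is an existing portage API. The build log from point 9 is left out,
since it would be a separate opt-in.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import Enum


class Outcome(str, Enum):
    # The six mutually exclusive states from point 2.
    OK_WITHOUT_TESTS = "installed OK without tests"
    OK_WITH_TESTS = "installed OK with tests"
    FAILED_TESTS = "failed tests"
    FAILED_INSTALL = "failed install"
    FAILED_COMPILE = "failed compile"
    FAILED_CONFIGURE = "failed configure"


def keyword_satisfaction(keywords, accept_keywords, arch):
    """Return the point-5 tag for one arch, e.g. "amd64(~)".

    Only covers the four cases shown in the table; anything else
    returns None.
    """
    offered = next(
        (k for k in (arch, "~" + arch) if k in keywords.split()), ""
    )
    accept = accept_keywords.split()
    if "~" + arch in accept and offered:
        return offered + "(~)"
    if arch in accept and offered == arch:
        return arch
    if "**" in accept:
        return offered + "(**)"
    return None


@dataclass
class ResolvedDep:
    # One entry of the point-6 dictionary.
    name: str
    version: str
    arch: list
    sha1: str


@dataclass
class InstallReport:
    name: str
    version: str
    ebuild_sha1: str
    outcome: Outcome
    use_flags: list
    keyword_tags: list
    depends: dict = field(default_factory=dict)  # "specified atom" -> ResolvedDep
    submitter_id: str = ""
    machine_id: str = ""

    def to_json(self):
        # Pool everything into a single document, as in point 10.
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values, not taken from any real system.
report = InstallReport(
    name="app-misc/example",
    version="1.0",
    ebuild_sha1="0" * 40,
    outcome=Outcome.OK_WITH_TESTS,
    use_flags=["ssl", "-doc"],
    keyword_tags=[keyword_satisfaction("amd64", "~amd64", "amd64")],
    depends={
        ">=dev-libs/example-lib-2": ResolvedDep(
            name="dev-libs/example-lib",
            version="2.1",
            arch=[keyword_satisfaction("~amd64", "~amd64", "amd64")],
            sha1="0" * 40,
        )
    },
    submitter_id="submitter-1",
)
payload = report.to_json()
```

Keeping the outcome as a single enumerated field and the keyword tags
as short strings is what would make later filtering cheap: a server
could drop "mixed keywording" reports without parsing anything else.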


With all of the above, in the most naive of situations, we can use
that data at the very least to give us a lot more assurance than "well,
30 days passed, and nobody complained", because we'll have a paper trail
of a known, countable number of successful installs, which, while
not representative, are likely to still be more diverse and
reassuring than the deafening silence of no feedback.

And in non-naive situations, the results for given versions can be
aggregated and compared, and factors that are present can be correlated
with failures statistically.

And this would give us a status board of "here's a bunch of
configurations that seem to be statistically more problematic than
others, might be worth investigating".

But there would be no burden to actually dive into the logs unless you
found clusters of failures from different sources failing under the
same scenarios ( And this is why not everyone *has* to send build logs
to be effective, just enough people have to report "x configuration
bad" and some subset of them have to provide elucidating logs ).
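
As a rough illustration of the "correlate factors with failures" idea,
here is a small sketch that ranks factors by how much more often
reports containing them fail than reports without them. This is a
deliberately crude stand-in and not the analysis CPAN Testers actually
runs; the function name `rank_factors`, the factor strings and the
sample data are all made up for the example.

```python
from collections import defaultdict


def rank_factors(reports):
    """Rank factors by (fail rate with factor) - (fail rate without it).

    reports: iterable of (factors, passed) pairs, where factors is a
    set of strings such as USE flags or resolved dependency versions.
    Returns (factor, difference) pairs, most suspicious first.
    """
    reports = list(reports)
    total = len(reports)
    total_fail = sum(1 for _, passed in reports if not passed)
    seen = defaultdict(int)
    failed = defaultdict(int)
    for factors, passed in reports:
        for factor in factors:
            seen[factor] += 1
            if not passed:
                failed[factor] += 1
    ranking = []
    for factor, count in seen.items():
        without = total - count
        if without == 0:
            continue  # present in every report, so it cannot discriminate
        rate_with = failed[factor] / count
        rate_without = (total_fail - failed[factor]) / without
        ranking.append((factor, rate_with - rate_without))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical reports: failures line up with one dependency version.
sample = [
    ({"USE=ssl", "dev-libs/foo-1.2"}, True),
    ({"USE=ssl", "dev-libs/foo-1.3"}, False),
    ({"dev-libs/foo-1.3"}, False),
    ({"dev-libs/foo-1.2"}, True),
]
ranking = rank_factors(sample)
```

With that sample, dev-libs/foo-1.3 comes out on top and USE=ssl lands
in the middle with no signal either way, which is the kind of hint
that tells a human where to start reading logs.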

None of what I mention here is conceptually "new", I've just
re-explained the entire CPAN Testers model in terms relevant to Gentoo,
using Gentoo parts instead of CPAN parts.

And CPAN testers find it *very effective* at being assured they didn't
break anything: They ship a TRIAL release ( akin to our ~arch ), and
then wait a week or so while people download and test it.

And pretty much anyone can become "a tester", there's no barrier to
entry, and no requirements for membership. Just install the tools, get
yourself an ID, and start installing stuff with tests (the default),
and the tools you have will automatically fire off those reports to the
hive, and you get a big pretty matrix of "We're good here", and then
after no red results in some period, they go "hey, yep, we're good" and
ship a stable release.

Or maybe occasional pockets of "you dun goofed" where there will be a
problem you might have to look into ( sometimes those problems are
entirely invalid problems, ... this is somehow typically not an issue )

http://matrix.cpantesters.org/?dist=App-perlbrew+0.76

And you throw variants analysis into the mix and you get those other
facts compared and ranked by "Likelihood to be part of the problem"

http://analysis.cpantesters.org/solved?distv=App-perlbrew-0.76

^ you see here variant analysis found 3 common strings in the logs that
indicated a failure, and it pointed the finger directly at the failing
test as a result. And then in rank #3, you see it's pointing a finger at
CPAN::Perl::Releases as "a possible problem highly correlated with
failures" with the -0.5 theta on version 2.88

Lo and behold, automated differential analysis has found the bug:

https://rt.cpan.org/Ticket/Display.html?id=116517

It still takes a human to

a) decide to look
b) decide the differential factors are useful enough to pursue
c) verify the problem manually by using the guidance given
d) manually file the bug

But the point here is we can actually build some infrastructure that
will give automated tooling some degree of assurance that "this can
probably be safely stabilized now, the testers aren't seeing any issues"

It's also just the sort of data collection that can lend itself to much
more powerful benefits.

The only hard parts are:

1. Making a good server to handle these reports that scales well
2. Making a good client for report generation, collection from PORTAGE
and submission
3. Getting people to turn on the feature
4. Getting enough people using the feature that the majority of the
"easy" stabilizations can happen hands-free.

And we don't even have to do the "Fancy" parts of it now:

 Just pools of "package: arch = 100 pass/0 fail, archb = 10 pass/0 fail"

Would be a great start.
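
Something as small as the following sketch would already produce that
kind of pool. It assumes reports have been boiled down to
(package, arch, passed) tuples; the names `tally` and `summary_line`
and the sample data are hypothetical.

```python
from collections import Counter


def tally(reports):
    """Count passes and fails per (package, arch).

    reports: iterable of (package, arch, passed) tuples.
    """
    counts = Counter()
    for package, arch, passed in reports:
        counts[(package, arch, "pass" if passed else "fail")] += 1
    return counts


def summary_line(counts, package, arches):
    """Format one status line for a package across the given arches."""
    parts = [
        f"{arch} = {counts[(package, arch, 'pass')]} pass/"
        f"{counts[(package, arch, 'fail')]} fail"
        for arch in arches
    ]
    return f"{package}: " + ", ".join(parts)


# Hypothetical reports for one package version on two arches.
sample = (
    [("app-misc/foo-1.0", "amd64", True)] * 3
    + [("app-misc/foo-1.0", "arm", True), ("app-misc/foo-1.0", "arm", False)]
)
line = summary_line(tally(sample), "app-misc/foo-1.0", ["amd64", "arm"])
```

Here `line` reads "app-misc/foo-1.0: amd64 = 3 pass/0 fail, arm = 1
pass/1 fail". What counts as "enough passes" would still be a policy
decision; the sketch only does the counting.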

Because otherwise we're relying 100% on negative feedback, and assuming
that the absence of negative feedback is positive, when in reality
the absence of negative feedback might mean that the problems were too
confusing to report as an explicit bug, that the problems faced were
deemed unimportant by the person in question and they gave up before
they reported it, that the user encountered some other entry barrier
in reporting, ... or maybe that nobody is actually using the package
at all, so it could actually be completely broken and nobody would
notice.

And it seems entirely haphazard to encourage tooling that
*builds* upon that assumption.

At least with the manual stabilization process, you can be assured that
at least one human will personally install, test, and verify a package
works in at least one situation.

With a completely automated stabilization that relies on the absence of
negative feedback to stabilize, you're *not even getting that*.

Why bother with stabilization at all if the entire thing is merely
*conjecture* ?

Even a broken, flawed stabilization workflow done by teams of people
who are bad at testing is better than a stabilization workflow
implemented on conjecture of stability :P


--Sig_/sjzgruXUYQ+WsgeOwKN4zc8
Content-Type: application/pgp-signature
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJXqWsfAAoJEOhUMksTZqggVcYQALbfZOjlgjZML1RlNDX6l9me
eoAxvvD3ymXFfDlZ+P2Bf5GZMOAyXLAgcpB1W9Tay2jD+9gCauJ1mb2kf8jqRbg5
yw3AmgnMyVVjg9BOylrDE7Q1j78fUIH4gL2fM/lHIypL2gufABj10nXMyw57bEZc
xtt673JY5Cp5ktRFFdSTpRxvq/7BHSgQucNwrkPUs8MTGZCQsrYNTyXSpo0yzRCX
apijVAnpR/DNSA8CrPz0ZPSxEnXhHvbY10XqNJQsapjfJUS2m0L89OioZiJOc/YY
bol3SchsPBPK8yusz+fP3j4j9XEjJN7Mv56Pycw2s14zBv0u+pBsg/neso3QVBxn
pVUFveCwdPYued139w1Xr1bfpxm3RbFLYsanGUkJhrAniuLZQ9vTSsLpS1KiN09+
KooVYqdeimt8xADi7oPmlp87Sr9DF1XcGrOUbQBIAvBU0j+pdFXtFcW0k6cXjI9p
BiQVB6weWOQVXApa5Hpx9zeMV2rRq6mBy2igDabj0k2pDI0MLHMO2CPf7sLEi7Zd
3Vs6FEbmHFKc1DHxRL3ZTkespl23EcjIXLgbaettM9uL4TQRb5DqPBfDkmLIc5J7
njvZzyRNH27wU6JVenx1otcXHKGPoHoysJ7Aj22dLz3dF+L35g/NbuxXu7G/feo+
qlU9yLgJDgIGAv1D/pX9
=fwKl
-----END PGP SIGNATURE-----

--Sig_/sjzgruXUYQ+WsgeOwKN4zc8--