On Sat, Mar 23, 2019 at 7:18 AM Alec Warner <antarus@gentoo.org> wrote:

>
>
> On Sat, Mar 23, 2019 at 3:32 AM Michał Górny <mgorny@gentoo.org> wrote:
>
>> Hi,
>>
>> Gentoo still has a major problem with unmaintained packages.
>> I'm not talking about pure 'maintainer-needed' here but packages that
>> have apparent maintainers and stay under the radar for long, harming
>> users in the process.  I'd like to query potential solutions as to how we
>> could improve this and look for new maintainers sooner.
>>
>>
>> The current state
>> =================
>> The definition of an unmaintained package here is a bit blurry.  For our
>> needs, let's say that an unmaintained package is a package that is not
>> getting the attention of any of its maintainers, whose bugs are not looked
>> at, that does not receive version bumps or simply fails to build for
>> a long time.
>>
>> This is especially the case with 'revived herds', i.e. projects that
>> were formed from old herds.  Their main characteristic is that they
>> 'maintain' a large number of loosely-related packages, and their
>> developers take care of only a small subset of them.  Sadly, we still
>> have people who cherish that model, and instead of taking packages they
>> care about themselves, they shove them into one of 'their' herds.
>>
>> So far we're rarely catching such cases directly.  Sometimes it happens
>> when another developer tries to use the package and notices the problem,
>> then finds that it's been reported a long time ago and never received
>> any attention.
>>
>> Sometimes, after retiring a developer we notice that he had 'maintained'
>> packages that were broken for years and never received any attention.
>> There are even real cases of developers taking over broken packages just
>> to prevent them from being lastrited but without ever fixing them.
>>
>> Then, some of the packages are noticed as a result of major API update
>> trackers, such as the openssl-1.1+ tracker or ncurses[tinfo] tracker.
>> Those API changes provoke build failures, and while investigating them
>> we discover that some of the software hasn't seen any upstream attention
>> since 2000 (!), not to mention maintainers that could actually patch
>> the issues.
>>
>>
>> Version bump-based inactivity?
>> ==============================
>> One of the options would be to treat a failure to bump packages as a sign
>> of inactivity.  With euscan and/or repology, we are at least able to
>> partially monitor and report new versions of software (I think someone
>> used to do that but I don't see those reports anymore).  While this
>> still requires some manual processing (esp. given that repology results
>> are sometimes mistaken), it would be a step forward.
>>
>> The counterargument to doing this is that not all version bumps are
>> meaningful to Gentoo.  We'd have to at least be able to filter out
>> development releases if maintainers are not doing them.  Sometimes we
>> also skip releases if they don't introduce anything meaningful to Gentoo
>> users.  Finally, some developers reject new versions of software for
>> various reasons.
>>
>
> I've also considered just using time.
>
> Many *packages* have not been touched in N time. While some software
> doesn't get updates often, even routine maintenance should require edits on
> a fairly regular basis.
>
>
>>
>>
>> Bugzilla-based inactivity?
>> ==========================
>> I've noticed something interesting in Fedora lately.  They have a policy
>> that if a package build failure is reported (note: they are reporting
>> them automatically) and the maintainer does not update it from the 'NEW'
>> state, it is automatically orphaned after 8 weeks.  Effectively,
>> if the maintainer does not take care of the package (or at least pretend
>> to), it is orphaned automatically.
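For illustration, the Fedora-style rule reduces to a simple check on each automatically filed build-failure bug. This is a hypothetical sketch: the `BuildFailureBug` record, its field names and the literal "NEW" status are assumptions made for the example, not Fedora's or Gentoo's actual Bugzilla tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Grace period taken from the Fedora policy described above.
ORPHAN_AFTER = timedelta(weeks=8)


@dataclass
class BuildFailureBug:
    # Hypothetical minimal record of an automatically filed bug.
    status: str      # Bugzilla status string, e.g. "NEW", "ASSIGNED"
    filed_on: date   # when the bug was filed


def should_orphan(bug: BuildFailureBug, today: date) -> bool:
    """True if the bug is still in NEW at least 8 weeks after filing."""
    return bug.status == "NEW" and today - bug.filed_on >= ORPHAN_AFTER
```

Any status change away from NEW defeats the check, which is exactly what makes the 'busywork' workaround possible.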
>>
>> I suppose we might be able to look for a similar policy in Gentoo.
>> However, there are two obvious counterarguments.  Firstly, this would
>> create 'busywork' that people would be required to do in order to
>> prevent their packages from being orphaned.  Secondly, a fair number of
>> developers would just do this 'busywork' for every new bug just to avoid
>> the problem, rendering the measure ineffective.
>>
>
> Avoid letting the perfect be the enemy of the good here. Any metric can be
> gamed by developers; but it turns out we must choose some metric to drive
> the organization. I'm fairly sure not *all* developers will automate this
> busywork, because *some* of us want to see the number of unmaintained
> packages reduced, resulting in a net win.
>
>
>>
>>
>> What can we actually do?
>> ========================
>> Do you have any specific ideas how we could actually improve
>> the situation?  I'm particularly looking for things we could do at least
>> semi-automatically, without having to spend tremendous effort looking
>> through thousands of unhandled bugs manually.
>>
>
> So I'd recommend not committing to a specific implementation, which means
> not triggering off a single signal.
>
> Signals:
> 1) euscan first, because it's the most accurate and plausibly already
> implemented.
> 2) Date-based scanning, because it's trivial to implement.
>
> So now for each package, we have 2 straightforward signals: when was it
> last touched, and how many versions behind is it?
>
> Rules:
> A package is unmaintained if it:
>   - Has not been touched in 5 years
>   - Is behind 3 versions AND hasn't been touched in 2 years
>   - Is behind 5 versions AND hasn't been touched in 1 year
>
> As we add more signals (e.g. build failures or unfixed bugs) we can add
> additional rules.
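The three rules above could be expressed as a list of predicates, so that new signals only mean appending a rule. A minimal sketch, assuming "behind N versions" and "not touched in N years" are read as "at least N"; `PackageSignals` and its fields are hypothetical, standing in for data that might come from euscan and the repository's commit history.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PackageSignals:
    # Hypothetical per-package signals (e.g. from euscan and git log).
    versions_behind: int
    years_since_touched: float


Rule = Callable[[PackageSignals], bool]

# Any single matching rule flags the package as unmaintained.
RULES: List[Rule] = [
    lambda s: s.years_since_touched >= 5,
    lambda s: s.versions_behind >= 3 and s.years_since_touched >= 2,
    lambda s: s.versions_behind >= 5 and s.years_since_touched >= 1,
]


def is_unmaintained(sig: PackageSignals, rules: List[Rule] = RULES) -> bool:
    return any(rule(sig) for rule in rules)
```

Adding a signal such as build failures would mean adding a field to `PackageSignals` and appending a predicate to `RULES`.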
>
> We could generate a QA report per package on the qa reports page.
> If there is an API for requesting the QA report, we could cross-link from
> p.g.o.
>
> -A
>
>
>
>> --
>> Best regards,
>> Michał Górny
>>
>>
As a side observation, I'd like to exempt a package from being flagged as
unmaintained if there's nothing wrong with it.  If upstream is idle and the
package is in a quiet state simply because there's no work that needs doing,
then the package should be left alone.  I think packages should be flagged in
progressive phases.

Phase 1 could determine whether the package warrants attention, and my
proposed metric for this is whether there are outstanding bugs in Bugzilla.
For this purpose an outstanding bug is anything regarding the package,
including revbumps and stablereqs as well as actual defect/QA/build-failure
bugs.  In essence, Bugzilla would serve as a central point of data collection
and a radar for trouble.

Phase 2 could take up any phase 1 candidates and actually audit them for a
lack of maintainership; e.g., a "maintainer wanted" or "maintainer needed"
status could escalate the package in question to phase 2, as could a
timestamp check on the latest activity for the package.  If the package has
"phase 1" status due to an outstanding bug, and either lacks a maintainer
altogether or fails a dormancy test, then the package is promoted to "phase
2".

Phase 3 could be where we take remedial action.  If the package has a
maintainer, this would be a good point to contact them.  This phase could
also include a more comprehensive audit of the package's lack of
maintainership, and so on.  A package that has entered "phase 2" has already
been established as having outstanding bugs AND having failed whatever
automated audit is done to check for being unmaintained.

Phase 4 is the package being officially marked as unmaintained, and at this
point it could probably be put on the treecleaners' radar or handled however
else we wish to handle unmaintained packages.  If the package has a
maintainer who failed to respond during phase 3, this could well raise a
concern of its own about that maintainer's performance.
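To make the progression concrete, the four phases could be sketched as a classifier that is re-run periodically over every package. This is only a sketch under stated assumptions: `PackageState`, its fields, the 365-day dormancy threshold and the 30-day response window are hypothetical placeholders, not existing Gentoo tooling or agreed policy.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

# Hypothetical thresholds; the proposal does not fix concrete values.
DORMANCY_DAYS = 365        # "fails a dormancy test"
RESPONSE_WINDOW_DAYS = 30  # how long a contacted maintainer has to reply


class Phase(IntEnum):
    CLEAN = 0            # no outstanding bugs: leave the package alone
    NEEDS_ATTENTION = 1  # outstanding bugs, but actively maintained
    SUSPECT = 2          # outstanding bugs AND no maintainer or dormant
    REMEDIATION = 3      # remedial action started (contact, deeper audit)
    UNMAINTAINED = 4     # officially unmaintained; hand over to treecleaners


@dataclass
class PackageState:
    open_bugs: int            # revbumps, stablereqs, QA, build failures...
    has_maintainer: bool      # False for maintainer-needed packages
    days_since_activity: int  # from the latest commit or bug timestamp
    remediation_started_days_ago: Optional[int] = None  # None = not started
    maintainer_responded: bool = False


def classify(p: PackageState) -> Phase:
    # Phase 1 gate: a quiet package with no outstanding bugs is exempt.
    if p.open_bugs == 0:
        return Phase.CLEAN
    # Phase 2 gate: no maintainer at all, or the package fails the dormancy test.
    dormant = p.days_since_activity >= DORMANCY_DAYS
    if p.has_maintainer and not dormant:
        return Phase.NEEDS_ATTENTION
    if p.remediation_started_days_ago is None:
        return Phase.SUSPECT
    # Phase 3 lasts while the maintainer has replied or is still within the window.
    if p.maintainer_responded or p.remediation_started_days_ago < RESPONSE_WINDOW_DAYS:
        return Phase.REMEDIATION
    # Phase 4: nobody responded in time (or there was nobody to contact).
    return Phase.UNMAINTAINED
```

Because the classification is recomputed from current state, a package drops back to `CLEAN` as soon as its bugs are closed, which keeps quiet but healthy packages out of the process.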