On Sat, Mar 23, 2019 at 3:32 AM Michał Górny <mgorny@gentoo.org> wrote:
Hi,

Gentoo is still having a major problem of unmaintained packages.
I'm not talking about pure 'maintainer-needed' here but packages that
have apparent maintainers and stay under the radar for long, harming
users in the process.  I'd like to query potential solutions as to how we
could improve this and look for new maintainers sooner.


The current state
=================
The definition of an unmaintained package here is a bit blurry.  For our
needs, let's say that an unmaintained package is a package that is not
getting attention of any of the maintainers, whose bugs are not looked
at, that does not receive version bumps or simply fails to build for
a long time.

This is especially the case with 'revived herds', i.e. projects that
were formed from old herds.  Their main characteristic is that they
'maintain' a large number of loosely-related packages, and their
developers take care of only a small subset of them.  Sadly, we still
have people who cherish that model, and instead of taking packages they
care about themselves, they shove it into one of 'their' herds.

So far we're rarely catching such cases directly.  Sometimes it happens
when another developer tries to use the package and notices the problem,
then finds that it's been reported a long time ago and never received
any attention.

Sometimes, after retiring a developer we notice that he had 'maintained'
packages that were broken for years and never received any attention.
There are even real cases of developers taking over broken packages just
to prevent them from being lastrited but without ever fixing them.

Then, some of the packages are noticed as a result of major API update
trackers, such as the openssl-1.1+ tracker or ncurses[tinfo] tracker.
Those API changes provoke build failures, and while investigating them
we discover that some of the software hasn't seen any upstream attention
since 2000 (!), not to mention maintainers that could actually patch
the issues.


Version bump-based inactivity?
==============================
One of the options would be to treat neglected version bumps as a sign
of inactivity.  With euscan and/or repology, we are at least able to
partially monitor and report new versions of software (I think someone
used to do that but I don't see those reports anymore).  While this
still requires some manual processing (esp. given that repology results
are sometimes mistaken), it would be a step forward.

The counterargument to doing this is that not all version bumps are
meaningful to Gentoo.  We'd have to at least be able to filter out
development releases if maintainers are not doing them.  Sometimes we
also skip releases if they don't introduce anything meaningful to Gentoo
users.  Finally, some developers reject new versions of software for
various reasons.

I've also considered just using time.

Many *packages* have not been touched in N time. While some software doesn't get updates often, even routine maintenance should require edits on a fairly regular basis.
 


Bugzilla-based inactivity?
==========================
I've noticed something interesting in Fedora lately.  They have a policy
that if a package build failure is reported (note: they are reporting
them automatically) and the maintainer does not update it from the 'NEW'
state, it is automatically orphaned after 8 weeks.  Effectively,
if the maintainer does not take care (or at least pretends to)
of the package, it is orphaned automatically.
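
A rough sketch of that rule, only to make the mechanics concrete. The
function name and the plain status string are hypothetical; nothing here
talks to a real Bugzilla instance, and the 8-week window is simply the
figure quoted above.

```python
from datetime import date, timedelta

# Assumption: the window described above; a Gentoo policy could pick another.
ORPHAN_AFTER = timedelta(weeks=8)


def should_orphan(status: str, reported_on: date, today: date) -> bool:
    """True if a build-failure bug has sat in NEW for the whole window."""
    return status == "NEW" and (today - reported_on) >= ORPHAN_AFTER
```

Any status change by the maintainer makes the check return False, which
is exactly why the measure is easy to game.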

I suppose we might be able to look for a similar policy in Gentoo.
However, there are two obvious counterarguments.  Firstly, this would
create 'busywork' that people would be required to do in order to
prevent their packages from being orphaned.  Secondly, a fair number of
developers would just do this 'busywork' to every new bug just to avoid
the problem, rendering the measure ineffective.

Avoid letting the perfect be the enemy of the good here. Any metric can be gamed by developers, but we must choose some metric to drive the organization. I'm fairly sure not *all* developers will automate this busywork, because *some* of us want to see the number of unmaintained packages reduced, so the result is still a net win.
 


What can we actually do?
========================
Do you have any specific ideas how we could actually improve
the situation?  I'm particularly looking for things we could do at least
semi-automatically, without having to spend tremendous effort looking
through thousands of unhandled bugs manually.

So I'd recommend avoiding a specific implementation, which means not triggering off of any single signal.

Signals:
1) euscan first, because it's the most accurate and plausibly already implemented.
2) Date-based scanning; it's trivial to implement.

So now for each package, we have 2 straightforward signals: when was it last touched, and how many versions behind is it?

Rules:
A package is unmaintained if it:
  - Has not been touched in 5 years
  - Is behind 3 versions AND hasn't been touched in 2 years
  - Is behind 5 versions AND hasn't been touched in 1 year

As we add more signals (e.g. doesn't build, or unfixed bugs) we can add additional rules.
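
To make the rules concrete, here is a minimal sketch of how they could be evaluated. Everything named here (PackageSignals, RULES, is_unmaintained) is hypothetical, and I'm assuming "behind N versions" means "at least N" and that the two signals have already been gathered from euscan and the repository history.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageSignals:
    name: str
    last_touched: date    # assumed: date of the last commit to the package
    versions_behind: int  # assumed: count reported by a scanner such as euscan


# (minimum versions behind, minimum years untouched); matching any rule flags
# the package. Further signals can be added as extra rules later.
RULES = [(0, 5), (3, 2), (5, 1)]


def years_untouched(pkg: PackageSignals, today: date) -> float:
    return (today - pkg.last_touched).days / 365.25


def is_unmaintained(pkg: PackageSignals, today: date) -> bool:
    age = years_untouched(pkg, today)
    return any(
        pkg.versions_behind >= behind and age >= years
        for behind, years in RULES
    )
```

Keeping the thresholds in a table rather than in code means the numbers can be tuned without touching the logic.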
 
We could generate a QA report per package on the qa reports page.
If there is an API for requesting the QA report, we could cross-link from p.g.o.

-A



--
Best regards,
Michał Górny