public inbox for gentoo-project@lists.gentoo.org
* [gentoo-project] How to improve detection of unmaintained packages?
@ 2019-03-23  7:32 Michał Górny
  2019-03-23  8:04 ` Joonas Niilola
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Michał Górny @ 2019-03-23  7:32 UTC (permalink / raw
  To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 4055 bytes --]

Hi,

Gentoo still has a major problem with unmaintained packages.
I'm not talking about pure 'maintainer-needed' here but packages that
have apparent maintainers and stay under the radar for long, harming
users in the process.  I'd like to ask for potential solutions for how we
could improve this and look for new maintainers sooner.


The current state
=================
The definition of an unmaintained package here is a bit blurry.  For our
needs, let's say that an unmaintained package is a package that is not
getting attention of any of the maintainers, whose bugs are not looked
at, that does not receive version bumps or simply fails to build for
a long time.

This is especially the case with 'revived herds', i.e. projects that
were formed from old herds.  Their main characteristic is that they
'maintain' a large number of loosely-related packages, and their
developers take care of only a small subset of them.  Sadly, we still
have people who cherish that model, and instead of taking packages they
care about themselves, they shove it into one of 'their' herds.

So far we're rarely catching such cases directly.  Sometimes it happens
when another developer tries to use the package and notices the problem,
then finds that it's been reported a long time ago and never received
any attention.

Sometimes, after retiring a developer, we notice that he had 'maintained'
packages that were broken for years and never received any attention.
There are even real cases of developers taking over broken packages just
to prevent them from being last-rited, but without ever fixing them.

Then, some of the packages are noticed as a result of major API update
trackers, such as the openssl-1.1+ tracker or ncurses[tinfo] tracker.
Those API changes provoke build failures, and while investigating them
we discover that some of the software hasn't seen any upstream attention
since 2000 (!), not to mention maintainers that could actually patch
the issues.


Version bump-based inactivity?
==============================
One of the options would be to treat failure to bump packages as a sign
of inactivity.  With euscan and/or repology, we are at least able to
partially monitor and report new versions of software (I think someone
used to do that but I don't see those reports anymore).  While this
still requires some manual processing (esp. given that repology results
are sometimes mistaken), it would be a step forward.

The counterargument to doing this is that not all version bumps are
meaningful to Gentoo.  We'd have to at least be able to filter out
development releases if maintainers are not doing them.  Sometimes we
also skip releases if they don't introduce anything meaningful to Gentoo
users.  Finally, some developers reject new versions of software for
various reasons.


Bugzilla-based inactivity?
==========================
I've noticed something interesting in Fedora lately.  They have a policy
that if a package build failure is reported (note: they are reporting
them automatically) and the maintainer does not update it from the 'NEW'
state, it is automatically orphaned after 8 weeks.  Effectively,
if the maintainer does not take care (or at least pretends to)
of the package, it is orphaned automatically.

I suppose we might be able to look for a similar policy in Gentoo.
However, there are two obvious counterarguments.  Firstly, this would
create 'busywork' that people would be required to do in order to
prevent their packages from being orphaned.  Secondly, a fair number of
developers would just do this 'busywork' to every new bug just to avoid
the problem, rendering the measure ineffective.
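
A minimal sketch of the Fedora-style rule described above, assuming
a hypothetical in-memory bug record rather than any real Bugzilla API.
The BuildFailureBug type and its field names are illustrative only.
It also shows why the second counterargument bites: merely moving
a bug out of 'NEW' is enough to escape the check.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BuildFailureBug:
    """Hypothetical record of an automatically reported build failure."""
    package: str
    status: str      # e.g. "NEW" or "ASSIGNED"
    reported: date


# The 8-week grace period mentioned in the policy description.
ORPHAN_AFTER = timedelta(weeks=8)


def packages_to_orphan(bugs, today):
    """Return packages with a build-failure bug left in NEW for 8+ weeks."""
    return sorted({
        bug.package
        for bug in bugs
        if bug.status == "NEW" and today - bug.reported >= ORPHAN_AFTER
    })
```

Any status change resets the check in this sketch, so a real policy would
probably need an additional signal to detect bugs that were touched once
and then abandoned.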


What can we actually do?
========================
Do you have any specific ideas how we could actually improve
the situation?  I'm particularly looking for things we could do at least
semi-automatically, without having to spend tremendous effort looking
through thousands of unhandled bugs manually.

-- 
Best regards,
Michał Górny



* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23  7:32 [gentoo-project] How to improve detection of unmaintained packages? Michał Górny
@ 2019-03-23  8:04 ` Joonas Niilola
  2019-03-23  8:48 ` Toralf Förster
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Joonas Niilola @ 2019-03-23  8:04 UTC (permalink / raw
  To: gentoo-project


On 3/23/19 9:32 AM, Michał Górny wrote:
>
> Bugzilla-based inactivity?
> ==========================
> I've noticed something interesting in Fedora lately.  They have a policy
> that if a package build failure is reported (note: they are reporting
> them automatically) and the maintainer does not update it from the 'NEW'
> state, it is automatically orphaned after 8 weeks.  Effectively,
> if the maintainer does not take care (or at least pretends to)
> of the package, it is orphaned automatically.
>
> I suppose we might be able to look for a similar policy in Gentoo.
> However, there are two obvious counterarguments.  Firstly, this would
> create 'busywork' that people would be required to do in order to
> prevent from orphaning their packages.  Secondly, a fair number of
> developers would just do this 'busywork' to every new bug just to avoid
> the problem, rendering the measure ineffective.
>
>

Third: Aren't you afraid this will result in a huge load of packages becoming
"maintainer-needed", getting users involved with them and creating even
more workload for proxy-maint devs? Yes, it's still better than the
current situation, but even now there is still a problem of some PRs
being left to rot unnoticed :\


Anyway I'd like to see some action taken for disbanding inactive 'herds' 
or projects that do not respond to bugs (if there's some easy way to 
measure that?)




* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23  7:32 [gentoo-project] How to improve detection of unmaintained packages? Michał Górny
  2019-03-23  8:04 ` Joonas Niilola
@ 2019-03-23  8:48 ` Toralf Förster
  2019-03-23  8:51   ` Michał Górny
  2019-03-23 14:17 ` Alec Warner
  2019-03-23 20:32 ` Toralf Förster
  3 siblings, 1 reply; 13+ messages in thread
From: Toralf Förster @ 2019-03-23  8:48 UTC (permalink / raw
  To: gentoo-project


[-- Attachment #1.1: Type: text/plain, Size: 318 bytes --]

On 3/23/19 8:32 AM, Michał Górny wrote:
> Secondly, a fair number of
> developers would just do this 'busywork' to every new bug just to avoid
> the problem, rendering the measure ineffective.
What's the rationale behind this for a dev? To keep their claim on the package?

-- 
Toralf
PGP 23217DA7 9B888F45



* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23  8:48 ` Toralf Förster
@ 2019-03-23  8:51   ` Michał Górny
  0 siblings, 0 replies; 13+ messages in thread
From: Michał Górny @ 2019-03-23  8:51 UTC (permalink / raw
  To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 839 bytes --]

On Sat, 2019-03-23 at 09:48 +0100, Toralf Förster wrote:
> On 3/23/19 8:32 AM, Michał Górny wrote:
> > Secondly, a fair number of
> > developers would just do this 'busywork' to every new bug just to avoid
> > the problem, rendering the measure ineffective.
> What's the rationale behind this for a dev? Claiming the package furthermore?
> 

Well, let me expand on how I see it.

If we require that a dev touch *every* bug against the package to
avoid it being orphaned, we introduce a silly case of accidentally
orphaning packages just because a developer failed to touch a single bug.

Therefore, some developers would just touch all bugs ASAP to avoid
the problem.  However, it would only verify that the package is
'maintained' at the time the bug is filed and not afterwards.

-- 
Best regards,
Michał Górny



* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23  7:32 [gentoo-project] How to improve detection of unmaintained packages? Michał Górny
  2019-03-23  8:04 ` Joonas Niilola
  2019-03-23  8:48 ` Toralf Förster
@ 2019-03-23 14:17 ` Alec Warner
  2019-03-23 17:05   ` Raymond Jennings
  2019-03-23 18:25   ` Rich Freeman
  2019-03-23 20:32 ` Toralf Förster
  3 siblings, 2 replies; 13+ messages in thread
From: Alec Warner @ 2019-03-23 14:17 UTC (permalink / raw
  To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 5686 bytes --]

On Sat, Mar 23, 2019 at 3:32 AM Michał Górny <mgorny@gentoo.org> wrote:

> Hi,
>
> Gentoo is still having a major problem of unmaintained packages.
> I'm not talking about pure 'maintainer-needed' here but packages that
> have apparent maintainers and stay under the radar for long, harming
> users in the process.  I'd like to query potential solutions as how we
> could improve this and look for new maintainers sooner.
>
>
> The current state
> =================
> The definition of an unmaintained package here is a bit blurry.  For our
> needs, let's say that an unmaintained package is a package that is not
> getting attention of any of the maintainers, whose bugs are not looked
> at, that does not receive version bumps or simply fails to build for
> a long time.
>
> This is especially the case with 'revived herds', i.e. projects that
> were formed from old herds.  Their main characteristic is that they
> 'maintain' a large number of loosely-related packages, and their
> developers take care of only a small subset of them.  Sadly, we still
> have people who cherish that model, and instead of taking packages they
> care about themselves, they shove it into one of 'their' herds.
>
> So far we're rarely catching such cases directly.  Sometimes it happens
> when another developer tries to use the package and notices the problem,
> then finds that it's been reported a long time ago and never received
> any attention.
>
> Sometimes, after retiring a developer we notice that he had 'maintained'
> packages that were broken for years and never received any attention.
> There are even real cases of developers taking over broken packages just
> to prevent them from being lastrited but without ever fixing them.
>
> Then, some of the packages are noticed as result of major API update
> trackers, such as the openssl-1.1+ tracker or ncurses[tinfo] tracker.
> Those API changes provoke build failures, and while investigating them
> we discover that some of the software hasn't seen any upstream attention
> since 2000 (!), not to mention maintainers that could actually patch
> the issues.
>
>
> Version bump-based inactivity?
> ==============================
> One of the options would be to monitor inactivity as negligence to bump
> packages.  With euscan and/or repology, we are at least able to
> partially monitor and report new versions of software (I think someone
> used to do that but I don't see those reports anymore).  While this
> still requires some manual processing (esp. given that repology results
> are sometimes mistaken), it would be a step forward.
>
> The counterarguments for doing this is that not all version bumps are
> meaningful to Gentoo.  We'd have to at least be able to filter out
> development releases if maintainers are not doing them.  Sometimes we
> also skip releases if they don't introduce anything meaningful to Gentoo
> users.  Finally, some developers reject new versions of software for
> various reasons.
>

I've also considered just using time.

Many *packages* have not been touched in N time. While some software
doesn't get updates often, even routine maintenance should require edits on
a fairly regular basis.


>
>
> Bugzilla-based inactivity?
> ==========================
> I've noticed something interesting in Fedora lately.  They have a policy
> that if a package build failure is reported (note: they are reporting
> them automatically) and the maintainer does not update it from the 'NEW'
> state, it is automatically orphaned after 8 weeks.  Effectively,
> if the maintainer does not take care (or at least pretends to)
> of the package, it is orphaned automatically.
>
> I suppose we might be able to look for a similar policy in Gentoo.
> However, there are two obvious counterarguments.  Firstly, this would
> create 'busywork' that people would be required to do in order to
> prevent from orphaning their packages.  Secondly, a fair number of
> developers would just do this 'busywork' to every new bug just to avoid
> the problem, rendering the measure ineffective.
>

Avoid letting the perfect be the enemy of the good here. Any metric can be
gamed by developers; but it turns out we must choose some metric to drive
the organization. I'm fairly sure not *all* developers will automate this
busywork; because *some* of us want to see the number of unmaintained
packages reduced; resulting in a net-win.


>
>
> What can we actually do?
> ========================
> Do you have any specific ideas how we could actually improve
> the situation?  I'm particularly looking for things we could do at least
> semi-automatically, without having to spend tremendous effort looking
> through thousands of unhandled bugs manually.
>

So I'd recommend avoiding a specific implementation; which means don't
trigger off of a specific signal.

Signals:
1) euscan first, because it's the most accurate and plausibly already
implemented.
2) Date-based scanning; it's trivial to implement.

So now for each package, we have 2 straightforward signals. When was it
last touched, how many versions behind?

Rules:
A package is unmaintained if it:
  - Has not been touched in 5 years
  - Is behind 3 versions AND hasn't been touched in 2 years
  - Is behind 5 versions AND hasn't been touched in 1 year

As we add more signals (e.g. doesn't build, or unfixed bugs) we can add
additional rules.
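
The three rules above can be sketched as a small table-driven check.
This is only an illustration under stated assumptions: the PackageSignals
type and its field names are hypothetical, "behind N versions" is read as
"at least N versions behind", and how the two signals are collected
(euscan output, commit dates) is left out.

```python
from dataclasses import dataclass


@dataclass
class PackageSignals:
    """Hypothetical per-package signals; field names are illustrative."""
    versions_behind: int     # e.g. derived from a euscan report
    years_untouched: float   # time since the package was last touched


# Each rule is (minimum versions behind, minimum years untouched).
# Assumption: "behind N versions" means "at least N versions behind".
RULES = [
    (0, 5),  # not touched in 5 years
    (3, 2),  # behind 3 versions AND not touched in 2 years
    (5, 1),  # behind 5 versions AND not touched in 1 year
]


def is_unmaintained(sig):
    """Return True if any rule matches the package's signals."""
    return any(
        sig.versions_behind >= versions and sig.years_untouched >= years
        for versions, years in RULES
    )
```

Keeping the rules in a list means a new signal or threshold can be added
without touching the matching logic, which fits the idea of not tying
the report to one specific signal.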

We could generate a QA report per package on the qa reports page.
If there is an API for requesting the QA report, we could cross-link from
p.g.o.

-A



> --
> Best regards,
> Michał Górny
>
>


* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 14:17 ` Alec Warner
@ 2019-03-23 17:05   ` Raymond Jennings
  2019-03-23 17:38     ` Michał Górny
  2019-03-23 18:25   ` Rich Freeman
  1 sibling, 1 reply; 13+ messages in thread
From: Raymond Jennings @ 2019-03-23 17:05 UTC (permalink / raw
  To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 7859 bytes --]

On Sat, Mar 23, 2019 at 7:18 AM Alec Warner <antarus@gentoo.org> wrote:

> Rules:
> A package is unmaintained if it:
>   - Has not been touched in 5 years
>   - Is behind 3 versions AND hasn't been touched in 2 years
>   - Is behind 5 versions AND hasn't been touched in 1 years
>
> As we add more signals (e.g. doesn't build, or unfixed bugs) we can add
> additional rules.
>
As a side observation I'd like to exempt a package from being flagged as
unmaintained if there's nothing wrong with it.  If upstream is idle and the
package is in a quiet state simply because there's no work needing done, then
the package should be left alone.  I think packages should be flagged in
progressive phases.

Phase 1 could determine if the package warrants attention, and my proposed
metric for this is if there are outstanding bugs on the bugzilla.  For this
purpose an outstanding bug is anything regarding the package, including
revbumps, stablereqs, as well as actual defect/qa/buildfail related bugs.
In essence, using the bugzilla as a central point of data collection and a
radar for trouble.

Phase 2 could take up any phase 1 candidates to actually audit for a lack
of maintainership, i.e., "maintainer wanted" or "maintainer needed"
status could escalate the package in question to phase 2, as could a
timestamp check on the latest activity for the package.  If the package is
in "phase 1" status due to an outstanding bug, and either lacks a maintainer
altogether or fails a dormancy test, then the package is promoted to "phase
2".

Phase 3 could be where we take remedial action.  If the package has a
maintainer this would be a good point to contact them.  Perhaps a more
comprehensive audit of the package's lack of maintainership, etc etc etc.
A package that has entered "phase 2" has already been established as having
outstanding bugs AND failed whatever automated sort of audit is done to
check for being unmaintained.

Phase 4 is the package being officially marked as unmaintained, and at this
point it could probably be put on treecleaner's radar or however else we
wish to handle unmaintained packages.  If the package has a maintainer that
failed to respond during phase 3, this could well raise a concern of its
own about that maintainer's own performance.
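
The phases above could be sketched roughly as follows.  This is only one
possible reading of the proposal: the PackageState type, its field names
and the phase() function are hypothetical, and phase 2 is treated as
a transient step that always leads on to the phase 3 contact attempt.

```python
from dataclasses import dataclass


@dataclass
class PackageState:
    """Hypothetical inputs to the phased check; names are illustrative."""
    open_bugs: int              # outstanding bugs of any kind on Bugzilla
    has_maintainer: bool
    dormant: bool               # failed the timestamp / dormancy test
    maintainer_responded: bool  # outcome of the phase 3 contact attempt


def phase(pkg):
    """Return the furthest phase a package reaches (0 = left alone)."""
    if pkg.open_bugs == 0:
        return 0  # nothing wrong with it, so it is exempt
    if pkg.has_maintainer and not pkg.dormant:
        return 1  # warrants attention but passes the phase 2 audit
    # Phase 2 reached: outstanding bugs plus no maintainer or dormancy.
    if pkg.has_maintainer and pkg.maintainer_responded:
        return 3  # remedial action in progress with the maintainer
    return 4      # officially unmaintained
```

A package with no maintainer at all goes straight to phase 4 in this
sketch, since there is nobody to contact during phase 3.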


* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 17:05   ` Raymond Jennings
@ 2019-03-23 17:38     ` Michał Górny
  2019-03-23 17:53       ` Raymond Jennings
  0 siblings, 1 reply; 13+ messages in thread
From: Michał Górny @ 2019-03-23 17:38 UTC (permalink / raw
  To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 6896 bytes --]

On Sat, 2019-03-23 at 10:05 -0700, Raymond Jennings wrote:
> As a side observation I'd like to exempt a package from being flagged as
> unmaintained if there's nothing wrong with it.  If upstream is idle and the
> package in a quiet state simply because there's no work needing done, then
> the package should be left alone.

This is the attitude that means that a few months later a single person is
overburdened with a few dozen unmaintained packages all suddenly
falling apart.  Just like ncurses[tinfo].  Or openssl-1.1.

-- 
Best regards,
Michał Górny



* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 17:38     ` Michał Górny
@ 2019-03-23 17:53       ` Raymond Jennings
  0 siblings, 0 replies; 13+ messages in thread
From: Raymond Jennings @ 2019-03-23 17:53 UTC (permalink / raw
  To: gentoo-project

[-- Attachment #1: Type: text/plain, Size: 8161 bytes --]

On Sat, Mar 23, 2019 at 10:38 AM Michał Górny <mgorny@gentoo.org> wrote:

> On Sat, 2019-03-23 at 10:05 -0700, Raymond Jennings wrote:
> > On Sat, Mar 23, 2019 at 7:18 AM Alec Warner <antarus@gentoo.org> wrote:
> >
> > >
> > > On Sat, Mar 23, 2019 at 3:32 AM Michał Górny <mgorny@gentoo.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > Gentoo is still having a major problem of unmaintained packages.
> > > > I'm not talking about pure 'maintainer-needed' here but packages that
> > > > have apparent maintainers and stay under the radar for long, harming
> > > > users in the process.  I'd like to query potential solutions as how we
> > > > could improve this and look for new maintainers sooner.
> > > >
> > > >
> > > > The current state
> > > > =================
> > > > The definition of an unmaintained package here is a bit blurry.  For our
> > > > needs, let's say that an unmaintained package is a package that is not
> > > > getting attention of any of the maintainers, whose bugs are not looked
> > > > at, that does not receive version bumps or simply fails to build for
> > > > a long time.
> > > >
> > > > This is especially the case with 'revived herds', i.e. projects that
> > > > were formed from old herds.  Their main characteristic is that they
> > > > 'maintain' a large number of loosely-related packages, and their
> > > > developers take care of only a small subset of them.  Sadly, we still
> > > > have people who cherish that model, and instead of taking packages they
> > > > care about themselves, they shove it into one of 'their' herds.
> > > >
> > > > So far we're rarely catching such cases directly.  Sometimes it happens
> > > > when another developer tries to use the package and notices the problem,
> > > > then finds that it's been reported a long time ago and never received
> > > > any attention.
> > > >
> > > > Sometimes, after retiring a developer we notice that he had 'maintained'
> > > > packages that were broken for years and never received any attention.
> > > > There are even real cases of developers taking over broken packages just
> > > > to prevent them from being lastrited but without ever fixing them.
> > > >
> > > > Then, some of the packages are noticed as result of major API update
> > > > trackers, such as the openssl-1.1+ tracker or ncurses[tinfo] tracker.
> > > > Those API changes provoke build failures, and while investigating them
> > > > we discover that some of the software hasn't seen any upstream attention
> > > > since 2000 (!), not to mention maintainers that could actually patch
> > > > the issues.
> > > >
> > > >
> > > > Version bump-based inactivity?
> > > > ==============================
> > > > One of the options would be to monitor inactivity as negligence to bump
> > > > packages.  With euscan and/or repology, we are at least able to
> > > > partially monitor and report new versions of software (I think someone
> > > > used to do that but I don't see those reports anymore).  While this
> > > > still requires some manual processing (esp. given that repology results
> > > > are sometimes mistaken), it would be a step forward.
> > > >
> > > > The counterarguments for doing this is that not all version bumps are
> > > > meaningful to Gentoo.  We'd have to at least be able to filter out
> > > > development releases if maintainers are not doing them.  Sometimes we
> > > > also skip releases if they don't introduce anything meaningful to
> Gentoo
> > > > users.  Finally, some developers reject new versions of software for
> > > > various reasons.
> > > >
> > >
> > > I've also considered to just use time.
> > >
> > > Many *packages* have not been touched in N time. While some software
> > > doesn't get updates often, even routine maintenance should require
> edits on
> > > a fairly regular basis.
> > >
> > >
> > > >
> > > > Bugzilla-based inactivity?
> > > > ==========================
> > > > I've noticed something interesting in Fedora lately.  They have a
> policy
> > > > that if a package build failure is reported (note: they are reporting
> > > > them automatically) and the maintainer does not update it from the
> 'NEW'
> > > > state, it is automatically orphaned after 8 weeks.  Effectively,
> > > > if the maintainer does not take care (or at least pretends to)
> > > > of the package, it is orphaned automatically.
> > > >
> > > > I suppose we might be able to look for a similar policy in Gentoo.
> > > > However, there are two obvious counterarguments.  Firstly, this would
> > > > create 'busywork' that people would be required to do in order to
> > > > prevent from orphaning their packages.  Secondly, a fair number of
> > > > developers would just do this 'busywork' to every new bug just to
> avoid
> > > > the problem, rendering the measure ineffective.
> > > >
> > >
> > > Avoid letting the perfect be the enemy of the good here. Any metric
> can be
> > > gamed by developers; but it turns out we must choose some metric to
> drive
> > > the organization. I'm fairly sure not *all* developers will automate
> this
> > > busywork; because *some* of us want to see the number of unmaintained
> > > packages reduced; resulting in a net-win.
> > >
> > >
> > > >
> > > > What can we actually do?
> > > > ========================
> > > > Do you have any specific ideas how we could actually improve
> > > > the situation?  I'm particularly looking for things we could do at
> least
> > > > semi-automatically, without having to spend tremendous effort looking
> > > > through thousands of unhandled bugs manually.
> > > >
> > >
> > > So I'd recommend avoiding a specific implementation; which means don't
> > > trigger off of a specific signal.
> > >
> > > Signals:
> > > 1) euscan first; because it's the most accurate and is plausibly
> > > already implemented.
> > > 2) Date-based scanning; it's trivial to implement.
> > >
> > > So now for each package, we have 2 straightforward signals. When was it
> > > last touched, how many versions behind?
> > >
> > > Rules:
> > > A package is unmaintained if it:
> > >   - Has not been touched in 5 years
> > >   - Is behind 3 versions AND hasn't been touched in 2 years
> > >   - Is behind 5 versions AND hasn't been touched in 1 years
> > >
> > > As we add more signals (e.g. doesn't build, or unfixed bugs) we can add
> > > additional rules.
> > >
> > > We could generate a QA report per package on the qa reports page.
> > > If there is an API for requesting the QA report, we could cross-link
> > > from p.g.o.
> > >
> > > -A
> > >
> > >
> > >
> > > > --
> > > > Best regards,
> > > > Michał Górny
> > > >
> > > >
> > As a side observation I'd like to exempt a package from being flagged as
> > unmaintained if there's nothing wrong with it.  If upstream is idle and
> the
> > package in a quiet state simply because there's no work needing done,
> then
> > the package should be left alone.
>
> This is the attitude that means that a few months later a single person is
> overburdened with a few dozen unmaintained packages all suddenly
> falling apart.  Just like ncurses[tinfo].  Or openssl-1.1.
>

My point is that a package shouldn't be flagged as unmaintained unless
there is actually something about it that needs maintaining.  Packages
with no outstanding work should be weeded out as candidates under the
principle of "if it isn't broke, don't fix it", since there's nothing
wrong with the package staying as it is.

As it is, the phase 4 I proposed is meant to catch broken packages that
either a) have no maintainer at all, or b) have a maintainer who is
completely incommunicado, and not just busy.

To clarify the context, though, could you give an example, however
hypothetical, of "all suddenly falling apart"?  Perhaps you mean a
package that is a widespread dependency, whose revdeps all break at the
same time due to some sort of API change or the like?  Is this what you
meant by ncurses and openssl-1.1?

>
> --
> Best regards,
> Michał Górny
>
>

* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 14:17 ` Alec Warner
  2019-03-23 17:05   ` Raymond Jennings
@ 2019-03-23 18:25   ` Rich Freeman
  2019-03-23 19:22     ` Alec Warner
  2019-03-23 20:14     ` Raymond Jennings
  1 sibling, 2 replies; 13+ messages in thread
From: Rich Freeman @ 2019-03-23 18:25 UTC (permalink / raw
  To: gentoo-project

On Sat, Mar 23, 2019 at 10:17 AM Alec Warner <antarus@gentoo.org> wrote:
>
>
> Avoid letting the perfect be the enemy of the good here.

Indeed, we need to avoid treating packages as unmaintained simply
because they have open bugs.

Many packages have bugs that are fairly trivial in nature, or build
issues that only show up in fairly obscure configurations.  These
often affect only a single user.

If we treeclean the package we don't actually fix the problem - we
just drive it to an overlay.  Instead of a package that works for
11/12 users and has an obscure bug, we now have a package that isn't
getting monitored for security issues or for other QA issues that
might actually be fixed if they were pointed out.

> Rules:
> A package is unmaintained if it:
>   - Has not been touched in 5 years

Do we really want to bump packages just for the sake of saying that
they've been touched?  That seems a bit much.

>   - Is behind 3 versions AND hasn't been touched in 2 years

If we have the ability to detect if a package is behind upstream,
perhaps we should actually file bugs about this so that the maintainer
is aware.

However, the fact that a newer version exists doesn't necessarily mean
that there is a problem with the older version.  For some types of
software a maintainer might be picky about what updates they accept.
For example, they might need to synchronize versions with other
distros that update less often/etc.  They should of course accept
contributions from others willing to test, but the fact that somebody
is maintaining a package on Gentoo doesn't obligate them to always
support the latest version of that package.

Now, obviously if there is a security issue/etc then we should follow
the existing security policies, but those are already established.

-- 
Rich


* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 18:25   ` Rich Freeman
@ 2019-03-23 19:22     ` Alec Warner
  2019-03-23 19:36       ` Rich Freeman
  2019-03-23 20:14     ` Raymond Jennings
  1 sibling, 1 reply; 13+ messages in thread
From: Alec Warner @ 2019-03-23 19:22 UTC (permalink / raw
  To: gentoo-project

On Sat, Mar 23, 2019 at 2:26 PM Rich Freeman <rich0@gentoo.org> wrote:

> On Sat, Mar 23, 2019 at 10:17 AM Alec Warner <antarus@gentoo.org> wrote:
> >
> >
> > Avoid letting the perfect be the enemy of the good here.
>
> Indeed, we need to avoid treating packages as unmaintained simply
> because they have open bugs.
>
> Many packages have bugs that are fairly trivial in nature, or build
> issues that only show up in fairly obscure configurations.  These
> often affect only a single user.
>

So this is why I advocate for building a number of signals, and using a
combination of signals to determine if a package is unmaintained.
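
A minimal sketch of what combining signals could look like, using the
thresholds from the rules quoted below.  The `PackageSignals` type and
the `needs_review` name are hypothetical, and the sketch assumes the
"last touched" date and the "versions behind" count have already been
collected elsewhere (e.g. from git history and euscan).  It flags
packages for review only; it does not decide on removal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageSignals:
    """Hypothetical per-package record; field names are illustrative."""
    name: str
    last_touched: date      # date of the last change to the package
    versions_behind: int    # upstream releases we are behind


def years_since(then: date, today: date) -> float:
    """Approximate number of years between two dates."""
    return (today - then).days / 365.25


def needs_review(pkg: PackageSignals, today: date) -> bool:
    """Apply the three proposed rules; any single match is enough."""
    idle = years_since(pkg.last_touched, today)
    rules = [
        idle >= 5,                                # untouched for 5 years
        pkg.versions_behind >= 3 and idle >= 2,   # 3 behind and 2 years idle
        pkg.versions_behind >= 5 and idle >= 1,   # 5 behind and 1 year idle
    ]
    return any(rules)
```

Keeping the rules in a list means further signals (build failures,
unfixed bugs) can be added as extra entries without touching the rest.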


>
> If we treeclean the package we don't actually fix the problem - we
> just drive it to an overlay.  Now instead of a package that works for
> 11/12 users and has an obscure bug, we now have a package that isn't
> getting monitored for security issues, and other QA issues that might
> actually be fixed if they were pointed out.
>
> > Rules:
> > A package is unmaintained if it:
> >   - Has not been touched in 5 years
>
> Do we really want to bump packages just for the sake of saying that
> they've been touched?  That seems a bit much.
>

I'm not saying "we should absolutely remove packages that have not been
touched in N years" but I am saying "we should review packages that have
not been touched in N years".


>
> >   - Is behind 3 versions AND hasn't been touched in 2 years
>
> If we have the ability to detect if a package is behind upstream,
> perhaps we should actually file bugs about this so that the maintainer
> is aware.
>
> However, the fact that a newer version exists doesn't necessarily mean
> that there is a problem with the older version.  For some types of
> software a maintainer might be picky about what updates they accept.
> For example, they might need to synchronize versions with other
> distros that update less often/etc.  They should of course accept
> contributions from others willing to test, but the fact that somebody
> is maintaining a package on Gentoo doesn't obligate them to always
> support the latest version of that package.
>

> Now, obviously if there is a security issue/etc then we should follow
> the existing security policies, but those are already established.
>

Would you be happier if there was some kind of opt-out or whitelist?

Have you looked at mgorny's recent removals? It's mostly stuff that doesn't
build and hasn't been touched in 5 years, and *yeah*, I want that stuff out
of the tree; it's a net negative for everyone. Keeping packages in the tree
isn't free.


>
> --
> Rich
>
>

* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 19:22     ` Alec Warner
@ 2019-03-23 19:36       ` Rich Freeman
  0 siblings, 0 replies; 13+ messages in thread
From: Rich Freeman @ 2019-03-23 19:36 UTC (permalink / raw
  To: gentoo-project

On Sat, Mar 23, 2019 at 3:22 PM Alec Warner <antarus@gentoo.org> wrote:
>
> I'm not saying "we should absolutely remove packages that have not been touched in N years" but I am saying "we should review packages that have not been touched in N years".

++

> Have you looked at mgorny's recent removals? It's mostly stuff that doesn't build and hasn't been touched in 5 years, and *yeah*, I want that stuff out of the tree; it's a net negative for everyone. Keeping packages in the tree isn't free.

Also, ++

I completely support the general intent.  I'm just trying to maintain
balance as well.  A good approach would be to just auto-file a bug as
a ping and let the maintainer ack it as a first step.  If somebody is
getting a lot of pings maybe look at it more closely, and if a ping is
ignored then definitely react.  Ask maintainers to include in their
ack a brief rationale - it need not be extensive/etc, or even
carefully scrutinized, but it could give some perspective.  "Yes, I'm
aware that upstream has v25 and we're on v20, but API was broken in
v21 without SONAME change and most of the deps in the repo want v20 as
everybody thinks upstream is crazy."  As long as we aren't pinging the
same packages often that shouldn't be a big deal and will also
simplify review.
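
The ping-and-ack flow above could be sketched roughly as follows.  This
is only an illustration: the `Ping` record, the `triage` function, the
8-week grace period and the "3 or more pings" threshold are all
assumptions made for the example, not anything agreed in this thread,
and nothing here talks to Bugzilla.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

ACK_DEADLINE = timedelta(weeks=8)  # assumed grace period before reacting
MANY_PINGS = 3                     # assumed "look more closely" threshold


@dataclass
class Ping:
    """Hypothetical record of one auto-filed ping bug."""
    package: str
    maintainer: str
    filed: date
    ack: Optional[str] = None  # maintainer's brief rationale, if given


def triage(pings: List[Ping], today: date) -> Tuple[List[str], List[str]]:
    """Return (packages whose ping was ignored, maintainers with many pings)."""
    ignored = [
        p.package
        for p in pings
        if p.ack is None and today - p.filed > ACK_DEADLINE
    ]
    per_maintainer = Counter(p.maintainer for p in pings)
    closer_look = sorted(
        m for m, n in per_maintainer.items() if n >= MANY_PINGS
    )
    return ignored, closer_look
```

An acked ping never shows up in the first list, whatever the rationale
says, which matches the idea that the ack need not be carefully
scrutinized.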

-- 
Rich


* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23 18:25   ` Rich Freeman
  2019-03-23 19:22     ` Alec Warner
@ 2019-03-23 20:14     ` Raymond Jennings
  1 sibling, 0 replies; 13+ messages in thread
From: Raymond Jennings @ 2019-03-23 20:14 UTC (permalink / raw
  To: gentoo-project

On Sat, Mar 23, 2019 at 11:26 AM Rich Freeman <rich0@gentoo.org> wrote:

> On Sat, Mar 23, 2019 at 10:17 AM Alec Warner <antarus@gentoo.org> wrote:
> >
> >
> > Avoid letting the perfect be the enemy of the good here.
>
> Indeed, we need to avoid treating packages as unmaintained simply
> because they have open bugs.
>
> Many packages have bugs that are fairly trivial in nature, or build
> issues that only show up in fairly obscure configurations.  These
> often affect only a single user.
>
> If we treeclean the package we don't actually fix the problem - we
> just drive it to an overlay.  Now instead of a package that works for
> 11/12 users and has an obscure bug, we now have a package that isn't
> getting monitored for security issues, and other QA issues that might
> actually be fixed if they were pointed out.
>
> > Rules:
> > A package is unmaintained if it:
> >   - Has not been touched in 5 years
>
> Do we really want to bump packages just for the sake of saying that
> they've been touched?  That seems a bit much.
>
> >   - Is behind 3 versions AND hasn't been touched in 2 years
>
> If we have the ability to detect if a package is behind upstream,
> perhaps we should actually file bugs about this so that the maintainer
> is aware.
>

This is part of the idea behind my plan to have open bugs be the first (but
probably not only, as the later phases demonstrate) symptom of trouble.

Apart from it not being fair to remove the package unless it's actually
broken, it's also a good habit imo to encourage bugs (as long as they're
not frivolous) to be filed simply for documentation purposes.

> However, the fact that a newer version exists doesn't necessarily mean
> that there is a problem with the older version.  For some types of
> software a maintainer might be picky about what updates they accept.
> For example, they might need to synchronize versions with other
> distros that update less often/etc.  They should of course accept
> contributions from others willing to test, but the fact that somebody
> is maintaining a package on Gentoo doesn't obligate them to always
> support the latest version of that package.
>
> Now, obviously if there is a security issue/etc then we should follow
> the existing security policies, but those are already established.
>
> --
> Rich
>
>

* Re: [gentoo-project] How to improve detection of unmaintained packages?
  2019-03-23  7:32 [gentoo-project] How to improve detection of unmaintained packages? Michał Górny
                   ` (2 preceding siblings ...)
  2019-03-23 14:17 ` Alec Warner
@ 2019-03-23 20:32 ` Toralf Förster
  3 siblings, 0 replies; 13+ messages in thread
From: Toralf Förster @ 2019-03-23 20:32 UTC (permalink / raw
  To: gentoo-project


On 3/23/19 8:32 AM, Michał Górny wrote:
> What can we actually do?
> ========================
> Do you have any specific ideas how we could actually improve
> the situation? 
This reminds me of my strategy when I started with the tinderbox.

Because it was intended as a help to improve the QA state, I started by reporting only the most obvious issues for 2 common configs (in fact, my KDE desktop and my server). Much later, once the low-hanging fruit had been picked, I unleashed the tinderbox to look for a wider range of issues.


-- 
Toralf
PGP 23217DA7 9B888F45


Thread overview: 13+ messages
2019-03-23  7:32 [gentoo-project] How to improve detection of unmaintained packages? Michał Górny
2019-03-23  8:04 ` Joonas Niilola
2019-03-23  8:48 ` Toralf Förster
2019-03-23  8:51   ` Michał Górny
2019-03-23 14:17 ` Alec Warner
2019-03-23 17:05   ` Raymond Jennings
2019-03-23 17:38     ` Michał Górny
2019-03-23 17:53       ` Raymond Jennings
2019-03-23 18:25   ` Rich Freeman
2019-03-23 19:22     ` Alec Warner
2019-03-23 19:36       ` Rich Freeman
2019-03-23 20:14     ` Raymond Jennings
2019-03-23 20:32 ` Toralf Förster
