From: "Michał Górny" <mgorny@gentoo.org>
To: gentoo-dev-announce@lists.gentoo.org
Cc: gentoo-dev@lists.gentoo.org
Subject: [gentoo-dev-announce] Incoming NATTkA upgrade
Date: Fri, 16 Apr 2021 17:11:22 +0200
Message-ID: <192cac75b99ca81111f4714bff0490a3e0d5a047.camel@gentoo.org>

Hello, everyone.

TL;DR:

1. There have been a few NATTkA misfires around 2 PM UTC today.
   I'm sorry for the noise.

2. In the next hour, a major NATTkA + pkgcore upgrade should roll out.
   No problems are expected but please contact me if you see weird
   behavior after the upgrade (especially incorrect sanity-check
   results).

3. A workaround has been added that should hopefully finally fix
   occasional misbehavior due to Bugzilla race conditions.  As a side
   effect, NATTkA may be a bit slower in responding to new bugs (up to
   4 minutes of delay).

Full explanation follows.


Infra's been running an old version of NATTkA for quite some time.
The previous upgrade attempt (which involved an incompatible pkgcheck
API change) failed due to some cryptic bugs.  A lot of stable/keywording
requests suddenly started failing -- and it seemed that pkgcheck was
checking keyworded ebuilds in the temporary against old dependencies
in /usr/portage.

I've been doing some new development in NATTkA today, and in order to
deploy it cleanly I've finally decided to try figuring out what's wrong
with new NATTkA + pkgcore.  I've installed the new versions on martin
(the Infra host that used to run NATTkA in the past), and started
testing them.

I didn't notice that puppet had failed to remove the old NATTkA cronjob
from martin.  So when NATTkA was installed again, the cronjob started
running the broken NATTkA version, and it started fighting with
the correct instance over bugs.  As a result, a few bugs have seen
ping-pong between sanity-check+ and sanity-check- results.  After
noticing the problem, I removed the old cronjob.  I apologize for
the bugspam caused by this.

The good news is that I've discovered that upgrading to the latest ~arch
pkgcore & co. (unmasked versions) resolves the problem in question.
Since NATTkA is run on a different host than other services requiring
old pkgcore, I am going to deploy the full set of new versions shortly.
The initial testing run didn't yield any suspicious results, so
hopefully there will be no major problems this time.

The new version also includes a workaround for weird NATTkA behavior --
you might have noticed in the past that NATTkA was readding arch teams
to fixed stabilization requests, or that today it reverted 'package
list' to an earlier state while expanding it.  I've been trying to
figure out what's wrong with NATTkA's logic for a long time, and I've
finally come to the conclusion that the problem is actually in Bugzilla.

I haven't verified the exact cause but it's most likely that Bugzilla is
executing multiple SELECT queries while performing the bug search,
and therefore could end up with a combination of bug properties from
before and after an update.  This is the only way I can explain
bug #779535 [1].
In a single action, CC-ARCHES was added to the bug and the package list
was changed.  However, NATTkA reverted to the old package list while
expanding -- which can happen only if the bug had CC-ARCHES already.
Both the keywords and the package list are grabbed from Bugzilla via
a single REST API query, so my only explanation for this is that
the Bugzilla API returned new keywords but the old package list.
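
The suspected mechanism can be illustrated with a toy model.  This is
only a sketch of a "torn read", not Bugzilla code: the dict, the
function names and the package names are made up for the example, and
the two dictionary lookups stand in for the separate SELECT queries
assumed above.

```python
# Toy model of a torn read.  Nothing here is Bugzilla code; all names
# and values are hypothetical.
bug = {"keywords": [], "package_list": "dev-foo/bar-1"}


def apply_update(bug):
    """One user action: add CC-ARCHES and change the package list."""
    bug["keywords"] = ["CC-ARCHES"]
    bug["package_list"] = "dev-foo/bar-2"


def torn_read(bug, update_between):
    """Read two fields separately, with an update landing in between."""
    package_list = bug["package_list"]   # first read sees the old value
    update_between(bug)                  # the update is applied here
    keywords = list(bug["keywords"])     # second read sees the new value
    return {"keywords": keywords, "package_list": package_list}


result = torn_read(bug, apply_update)
print(result)
```

The reader ends up with CC-ARCHES set but the old package list, which
is a state the bug never actually had.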

To avoid this, NATTkA now skips bugs that were updated less than 60
seconds before the search was run.  These bugs will be deferred to
the next run (i.e. 4 minutes later), and Bugzilla should have synced up
by then.  Of course, this is going to work only if the 'last change
time' field is updated no later than other bug data.
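
The idea of the workaround can be sketched roughly as follows.  This is
a minimal illustration and not NATTkA's actual code: the function name,
the constant and the dict layout are assumptions, and the bugs are
assumed to carry a timezone-aware 'last_change_time' value.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant; the 60 seconds come from the description above.
SETTLE_TIME = timedelta(seconds=60)


def split_settled(bugs, search_time, settle_time=SETTLE_TIME):
    """Split bugs into (ready, deferred) lists.

    A bug is deferred to the next run if it was changed less than
    `settle_time` before `search_time`.
    """
    cutoff = search_time - settle_time
    ready, deferred = [], []
    for bug in bugs:
        if bug["last_change_time"] <= cutoff:
            ready.append(bug)
        else:
            deferred.append(bug)
    return ready, deferred


# Made-up example data: bug 1 changed 10 seconds ago, bug 2 changed
# 5 minutes ago.
now = datetime(2021, 4, 16, 15, 0, 0, tzinfo=timezone.utc)
bugs = [
    {"id": 1, "last_change_time": now - timedelta(seconds=10)},
    {"id": 2, "last_change_time": now - timedelta(minutes=5)},
]
ready, deferred = split_settled(bugs, now)
print([b["id"] for b in ready], [b["id"] for b in deferred])
```

Here bug 2 is processed immediately, while bug 1 is left for the next
run.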

If you have any questions or problems, please do not hesitate to contact
me or report a bug (either on Gentoo Bugzilla, or on NATTkA's GitHub
issue tracker).  That said, I realize there's quite a number of
problems reported already, and I hope I'll be able to start addressing
them ~next month.

[1] https://bugs.gentoo.org/779535#c8

-- 
Best regards,
Michał Górny



