From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gentoo-dev+bounces-80920-garchives=archives.gentoo.org@lists.gentoo.org>
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id 4561C139694
	for <garchives@archives.gentoo.org>; Sat,  3 Jun 2017 11:00:24 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 8F413E0C0D;
	Sat,  3 Jun 2017 11:00:13 +0000 (UTC)
Received: from smtp.gentoo.org (mail.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id 3C257E0BEF
	for <gentoo-dev@lists.gentoo.org>; Sat,  3 Jun 2017 11:00:13 +0000 (UTC)
Received: from localhost (unknown [IPv6:2a01:e34:eeaa:6bd0:4ecc:6aff:fe03:1cfc])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	(Authenticated sender: aballier)
	by smtp.gentoo.org (Postfix) with ESMTPSA id CD792340806
	for <gentoo-dev@lists.gentoo.org>; Sat,  3 Jun 2017 11:00:11 +0000 (UTC)
Date: Sat, 3 Jun 2017 13:00:00 +0200
From: Alexis Ballier <aballier@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints
 (codename: ENFORCED_USE)
Message-ID: <20170603130000.4f88fb14@gentoo.org>
In-Reply-To: <1496411717.29233.5.camel@gentoo.org>
References: <1496071993.31087.1.camel@gentoo.org>
	<20170529200037.2559f80a@gentoo.org>
	<1496093035.12795.3.camel@gentoo.org>
	<20170530094245.40e1cf64@gentoo.org>
	<20170530092245.681d4aeb@snowblower>
	<20170530104654.31b89e10@gentoo.org>
	<20170530095607.1adbc0b8@snowblower>
	<20170530112518.65b4f9e9@gentoo.org>
	<22829.24276.295.969060@a1i15.kph.uni-mainz.de>
	<1496154812.1238.5.camel@gentoo.org>
	<20170530173340.0b575526@gentoo.org>
	<1496167898.1335.1.camel@gentoo.org>
	<20170530204614.61e8e42c@gentoo.org>
	<1496213717.1164.1.camel@gentoo.org>
	<20170531093257.23b66f88@gentoo.org>
	<1496217792.1164.5.camel@gentoo.org>
	<20170531103819.417c2420@gentoo.org>
	<1496235892.25038.1.camel@gentoo.org>
	<20170531193922.477245bb@gentoo.org>
	<1496257344.25758.1.camel@gentoo.org>
	<20170601105523.08a9234e@gentoo.org>
	<1496352685.30502.4.camel@gentoo.org>
	<20170602132758.50a5f734@gentoo.org>
	<1496411717.29233.5.camel@gentoo.org>
Organization: Gentoo
X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu)
Precedence: bulk
List-Post: <mailto:gentoo-dev@lists.gentoo.org>
List-Help: <mailto:gentoo-dev+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-dev+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-dev+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-dev.gentoo.org>
X-BeenThere: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Archives-Salt: 23504e88-6a76-498e-921a-1e62af56835d
X-Archives-Hash: 60cb383907db3957d5e87cfa5662a031

On Fri, 02 Jun 2017 15:55:17 +0200
Micha=C5=82 G=C3=B3rny <mgorny@gentoo.org> wrote:

> On pi=C4=85, 2017-06-02 at 13:27 +0200, Alexis Ballier wrote:
> > On Thu, 01 Jun 2017 23:31:25 +0200
> > Micha=C5=82 G=C3=B3rny <mgorny@gentoo.org> wrote:
> > [...] =20
> > > > There are probably dozens of ways to make that non
> > > > deterministic. Here's one: USE=3D'-*'. Apply '|| ( cli cgi fpm
> > > > apache2 embed phpdbg )' last; this enables 'cli'. Since it's
> > > > the last one, REQUIRED_USE is not satisfied because of 'cli?
> > > > ( ^^ ( readline libedit ) )'. If you process it first, then you
> > > > enable cli and readline and are good.   =20
> > >=20
> > > You just take a second iteration, and things fall into place.
> > > That's not a problem.
> > >  =20
> > > > > > Also, what happens if we applied all the constraints and
> > > > > > obtained some useflags setting that still fails REQUIRED_USE
> > > > > > check ?     =20
> > > > >=20
> > > > > It can't happen. If you can apply all the constraints, then
> > > > > implicitly REQUIRED_USE is satisfied. If you can't apply all
> > > > > the constraints, then it just fails. Of course, we want to
> > > > > ultimately avoid that case.   =20
> > > >=20
> > > > See the php example above. How do you ensure this does not
> > > > happen?
> > > >=20
> > > > Note that your assertion 'If you can apply all the constraints,
> > > > then implicitly REQUIRED_USE is satisfied.' is false: You're
> > > > changing the values of useflags, thus might violate some
> > > > previously checked constraint.   =20
> > >=20
> > > So you reiterate. Either you reach a solution (preferably there's
> > > only a single valid solution you can reach) or you hit a previous
> > > state which indicates a loop and you fail. =20
> >=20
> >=20
> > So, PM, for every ebuild, will need to store the (at most) 2^n
> > states it has already seen while trying to solve REQUIRED_USE and
> > iterate until it cannot proceed anymore or falls into a previous
> > state (so that's 2^n time too). That way we go from a linear time &
> > linear space algorithm to one in exponential time & space. That's
> > probably not a good idea. =20
>=20
> I don't think that's actually going to happen. You'd have to try
> really hard to get over n-1 iterations. I mean, the only simple way I
> can think of is:
>=20
>   b? ( a ) c? ( b ) d? ( c ) e? ( d ) ...
>=20
> and this only means n-1 iterations. Can you think of a better way to
> force multiple iterations that can be written without making
> REQUIRED_USE completely unreadable? How likely is that going to happen
> accidentally?

I don't see any reason why it should terminate after n iterations; I
don't see any example of how to do more either. I can try to think more
about it, but maybe it is not even needed, see below.

> > I think it's better to limit the number of iterations to some
> > constant. I'd say 1, then fail if REQUIRED_USE is still not
> > satisfied. Maybe 2 or 3 can be useful but I think it'd be much
> > harder to provide automated checks for that and much harder for
> > ebuild writers to understand what will happen with the REQUIRED_USE
> > they're about to write.=20
>=20
> Well, I don't mind setting that. All my tests (that weren't
> deliberately abusing the algorithm) were able to finish in a single
> iteration. 2 or 3 should probably be safe for all the reasonable
> uses. However, if we're not going to verify all possible values on
> repoman side, I think it would be better to have a larger limit for
> users, to not have things explode on them unnecessarily.


Look, if we assume left to right solving, only one iteration possible
and all the constraints to be of the form 'p_1? p_2? ... p_n? ( q_1 ...
q_m )' where p_i and q_i are either 'useflag' or '!useflag', then we
only have to check when such a clause can change a previously satisfied
clause to unsatisfied.

For two clauses:=20
"p_1? p_2? ... p_n? ( q_1 ... q_m )" and "p'_1? p'_2? ... p'_{n'}?
( q'_1 ... q'_{m'} )", assuming the first is satisfied, when can solving
the 2nd break the 1st?

It must be that:
1.The conditions are compatible: No p_i is the negation of a p'_j.
2.Solving the 1st does not make the 2nd trivially true: No q_i is
  the negation of a p'_j.
3.Solving the 2nd does not make the 1st trivially true afterwards: No
  p_i is the negation of a q'_j.
4.Solving the 2nd does break the 1st assumption: (A q_i is
  the negation of a q_'j) or (a q'_j is some p_i and one of q_1 ... q_m
  might be false).

The only a priori non polynomial part here is "one of q_1 ... q_m
might be false". If we do not check that (only check that some q'_j is
some p_i), we are still guaranteed that the 2nd clause cannot break the
1st but we might end up rejecting otherwise valid constructs.

Doing that, we have a check guaranteeing that the above algorithm will
always provide a valid answer for 'clause_1 clause_2' in
O(|clause_1|*|clause_2|) time.
For a complete REQUIRED_USE=3D'clause_1 ... clause_k', it is sufficient
to check that, for any i>j, clause_i cannot break clause_j in the above
sense. This gives a polynomial algorithm for being certain that a
REQUIRED_USE constraint guarantees the existence of a valid solution
after one pass.


We can do even better: Consider a graph where the nodes are the clauses
and there is an edge 'a -> b' if 'a' can break 'b' in the above sense.
This means 'a' must come left of 'b' in REQUIRED_USE. Hence, repoman
can do a topological sort of that graph and suggest a proper reordering
of the clauses or, when not possible, exhibit a cycle causing a problem
asking the developer to fix it.


Let's run a few examples to see if dropping the condition in point 4.
is not too much to ask:
"
   || ( aqua wayland X )
   xinerama? ( X )
"
Becomes:
"
  !X? ( !wayland? ( aqua ) )
  xinerama? ( X )
"
Points 1. and 2. above hold, point 3 fails with p_i =3D !X, q'_j =3D X. So
they're good and the constraint is valid in the above sense.


Another example, simplified form from php:
"
  cli? ( ^^ ( readline libedit ) )=20
  || ( cli cgi )
"
Becomes:
"
(a)   cli? ( !libedit? ( readline ) )
(b)   cli? ( libedit? ( !readline ) )
(c)   !cgi? ( cli )
"

Let's build the graph: No edge between (a) and (b) because they fail
point 1. No edge (a) -> (c) nor (b) -> (c): They fail point 4 since
"readline" does not appear in (c). There are edges (c) -> (a) and (c) ->
(b): Point 1 to 3 hold, point 4 holds because of 'a q'_j is some p_i'
with q'_j =3D p_i =3D "cli". The ordering is thus invalid because (a) and
(b) come before (c) and there are edges going from (c) to both of them.
A topological sort will give you something like "(c) (a) (b)" and
repoman can suggest you to rewrite that as:
"
(c)   !cgi? ( cli )
(a)   cli? ( !libedit? ( readline ) )
(b)   cli? ( libedit? ( !readline ) )
"
In addition, if we can ensure the topological sort is somewhat stable
(*), then it is possible to infer back that (a) (b) come from "cli? ( ^^
( readline libedit ) )" and (c) comes from "|| ( cli cgi )" so the
repoman warning/error could look like:
Hey, your REQUIRED_USE does not allow easy solving in one pass, would
you mind using "|| ( cli cgi ) cli? ( ^^ ( readline libedit ) )"
instead ?

(*) Ensuring clauses expanded from the same construct in REQUIRED_USE
stay together and thus can be grouped back to their original form. This
might not be possible or easy though.


A last example: The mysql/mysqli from php you mentioned in an earlier
email here.
"
(a)   recode? ( !imap !mysqli )
(b)   mysql? !pdo? ( mysqli )
"
There are edges (a) -> (b) and (b) -> (a): Points 1 to 3 hold; Point 4
holds because 'A q_i is the negation of a q_'j' with mysqli/!mysqli.
The topological sort will fail since there is a cycle and will exhibit
this cycle. repoman warning/error could look like:
Hey, your constraint does not allow automatic solving, there is a cycle
'recode? ( !imap !mysqli )' -> 'mysql? !pdo? ( mysqli )' -> 'recode?
( !imap !mysqli )'.


This whole thing definitely needs more thought and feedback but for now
those extra restrictions seem quite natural to me, allow easy solving
on the PM side and allow to have useful feedback from repoman.

Alexis.