From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by finch.gentoo.org (Postfix) with ESMTPS id 4561C139694 for ; Sat, 3 Jun 2017 11:00:24 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id 8F413E0C0D; Sat, 3 Jun 2017 11:00:13 +0000 (UTC) Received: from smtp.gentoo.org (mail.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id 3C257E0BEF for ; Sat, 3 Jun 2017 11:00:13 +0000 (UTC) Received: from localhost (unknown [IPv6:2a01:e34:eeaa:6bd0:4ecc:6aff:fe03:1cfc]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: aballier) by smtp.gentoo.org (Postfix) with ESMTPSA id CD792340806 for ; Sat, 3 Jun 2017 11:00:11 +0000 (UTC) Date: Sat, 3 Jun 2017 13:00:00 +0200 From: Alexis Ballier To: gentoo-dev@lists.gentoo.org Subject: Re: [gentoo-dev] [RFC] Forced/automatic USE flag constraints (codename: ENFORCED_USE) Message-ID: <20170603130000.4f88fb14@gentoo.org> In-Reply-To: <1496411717.29233.5.camel@gentoo.org> References: <1496071993.31087.1.camel@gentoo.org> <20170529200037.2559f80a@gentoo.org> <1496093035.12795.3.camel@gentoo.org> <20170530094245.40e1cf64@gentoo.org> <20170530092245.681d4aeb@snowblower> <20170530104654.31b89e10@gentoo.org> <20170530095607.1adbc0b8@snowblower> <20170530112518.65b4f9e9@gentoo.org> <22829.24276.295.969060@a1i15.kph.uni-mainz.de> <1496154812.1238.5.camel@gentoo.org> <20170530173340.0b575526@gentoo.org> <1496167898.1335.1.camel@gentoo.org> <20170530204614.61e8e42c@gentoo.org> <1496213717.1164.1.camel@gentoo.org> <20170531093257.23b66f88@gentoo.org> <1496217792.1164.5.camel@gentoo.org> <20170531103819.417c2420@gentoo.org> <1496235892.25038.1.camel@gentoo.org> <20170531193922.477245bb@gentoo.org> <1496257344.25758.1.camel@gentoo.org> <20170601105523.08a9234e@gentoo.org> <1496352685.30502.4.camel@gentoo.org> <20170602132758.50a5f734@gentoo.org> <1496411717.29233.5.camel@gentoo.org> Organization: Gentoo X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu) Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-dev@lists.gentoo.org Reply-to: gentoo-dev@lists.gentoo.org MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Archives-Salt: 23504e88-6a76-498e-921a-1e62af56835d X-Archives-Hash: 60cb383907db3957d5e87cfa5662a031 On Fri, 02 Jun 2017 15:55:17 +0200 Micha=C5=82 G=C3=B3rny wrote: > On pi=C4=85, 2017-06-02 at 13:27 +0200, Alexis Ballier wrote: > > On Thu, 01 Jun 2017 23:31:25 +0200 > > Micha=C5=82 G=C3=B3rny wrote: > > [...] =20 > > > > There are probably dozens of ways to make that non > > > > deterministic. Here's one: USE=3D'-*'. Apply '|| ( cli cgi fpm > > > > apache2 embed phpdbg )' last; this enables 'cli'. Since it's > > > > the last one, REQUIRED_USE is not satisfied because of 'cli? > > > > ( ^^ ( readline libedit ) )'. If you process it first, then you > > > > enable cli and readline and are good. =20 > > >=20 > > > You just take a second iteration, and things fall into place. > > > That's not a problem. > > > =20 > > > > > > Also, what happens if we applied all the constraints and > > > > > > obtained some useflags setting that still fails REQUIRED_USE > > > > > > check ? =20 > > > > >=20 > > > > > It can't happen. If you can apply all the constraints, then > > > > > implicitly REQUIRED_USE is satisfied. If you can't apply all > > > > > the constraints, then it just fails. Of course, we want to > > > > > ultimately avoid that case. =20 > > > >=20 > > > > See the php example above. How do you ensure this does not > > > > happen? > > > >=20 > > > > Note that your assertion 'If you can apply all the constraints, > > > > then implicitly REQUIRED_USE is satisfied.' is false: You're > > > > changing the values of useflags, thus might violate some > > > > previously checked constraint. =20 > > >=20 > > > So you reiterate. Either you reach a solution (preferably there's > > > only a single valid solution you can reach) or you hit a previous > > > state which indicates a loop and you fail. =20 > >=20 > >=20 > > So, PM, for every ebuild, will need to store the (at most) 2^n > > states it has already seen while trying to solve REQUIRED_USE and > > iterate until it cannot proceed anymore or falls into a previous > > state (so that's 2^n time too). That way we go from a linear time & > > linear space algorithm to one in exponential time & space. That's > > probably not a good idea. =20 >=20 > I don't think that's actually going to happen. You'd have to try > really hard to get over n-1 iterations. I mean, the only simple way I > can think of is: >=20 > b? ( a ) c? ( b ) d? ( c ) e? ( d ) ... >=20 > and this only means n-1 iterations. Can you think of a better way to > force multiple iterations that can be written without making > REQUIRED_USE completely unreadable? How likely is that going to happen > accidentally? I don't see any reason why it should terminate after n iterations; I don't see any example of how to do more either. I can try to think more about it, but maybe it is not even needed, see below. > > I think it's better to limit the number of iterations to some > > constant. I'd say 1, then fail if REQUIRED_USE is still not > > satisfied. Maybe 2 or 3 can be useful but I think it'd be much > > harder to provide automated checks for that and much harder for > > ebuild writers to understand what will happen with the REQUIRED_USE > > they're about to write.=20 >=20 > Well, I don't mind setting that. All my tests (that weren't > deliberately abusing the algorithm) were able to finish in a single > iteration. 2 or 3 should probably be safe for all the reasonable > uses. However, if we're not going to verify all possible values on > repoman side, I think it would be better to have a larger limit for > users, to not have things explode on them unnecessarily. Look, if we assume left to right solving, only one iteration possible and all the constraints to be of the form 'p_1? p_2? ... p_n? ( q_1 ... q_m )' where p_i and q_i are either 'useflag' or '!useflag', then we only have to check when such a clause can change a previously satisfied clause to unsatisfied. For two clauses:=20 "p_1? p_2? ... p_n? ( q_1 ... q_m )" and "p'_1? p'_2? ... p'_{n'}? ( q'_1 ... q'_{m'} )", assuming the first is satisfied, when can solving the 2nd break the 1st? It must be that: 1.The conditions are compatible: No p_i is the negation of a p'_j. 2.Solving the 1st does not make the 2nd trivially true: No q_i is the negation of a p'_j. 3.Solving the 2nd does not make the 1st trivially true afterwards: No p_i is the negation of a q'_j. 4.Solving the 2nd does break the 1st assumption: (A q_i is the negation of a q_'j) or (a q'_j is some p_i and one of q_1 ... q_m might be false). The only a priori non polynomial part here is "one of q_1 ... q_m might be false". If we do not check that (only check that some q'_j is some p_i), we are still guaranteed that the 2nd clause cannot break the 1st but we might end up rejecting otherwise valid constructs. Doing that, we have a check guaranteeing that the above algorithm will always provide a valid answer for 'clause_1 clause_2' in O(|clause_1|*|clause_2|) time. For a complete REQUIRED_USE=3D'clause_1 ... clause_k', it is sufficient to check that, for any i>j, clause_i cannot break clause_j in the above sense. This gives a polynomial algorithm for being certain that a REQUIRED_USE constraint guarantees the existence of a valid solution after one pass. We can do even better: Consider a graph where the nodes are the clauses and there is an edge 'a -> b' if 'a' can break 'b' in the above sense. This means 'a' must come left of 'b' in REQUIRED_USE. Hence, repoman can do a topological sort of that graph and suggest a proper reordering of the clauses or, when not possible, exhibit a cycle causing a problem asking the developer to fix it. Let's run a few examples to see if dropping the condition in point 4. is not too much to ask: " || ( aqua wayland X ) xinerama? ( X ) " Becomes: " !X? ( !wayland? ( aqua ) ) xinerama? ( X ) " Points 1. and 2. above hold, point 3 fails with p_i =3D !X, q'_j =3D X. So they're good and the constraint is valid in the above sense. Another example, simplified form from php: " cli? ( ^^ ( readline libedit ) )=20 || ( cli cgi ) " Becomes: " (a) cli? ( !libedit? ( readline ) ) (b) cli? ( libedit? ( !readline ) ) (c) !cgi? ( cli ) " Let's build the graph: No edge between (a) and (b) because they fail point 1. No edge (a) -> (c) nor (b) -> (c): They fail point 4 since "readline" does not appear in (c). There are edges (c) -> (a) and (c) -> (b): Point 1 to 3 hold, point 4 holds because of 'a q'_j is some p_i' with q'_j =3D p_i =3D "cli". The ordering is thus invalid because (a) and (b) come before (c) and there are edges going from (c) to both of them. A topological sort will give you something like "(c) (a) (b)" and repoman can suggest you to rewrite that as: " (c) !cgi? ( cli ) (a) cli? ( !libedit? ( readline ) ) (b) cli? ( libedit? ( !readline ) ) " In addition, if we can ensure the topological sort is somewhat stable (*), then it is possible to infer back that (a) (b) come from "cli? ( ^^ ( readline libedit ) )" and (c) comes from "|| ( cli cgi )" so the repoman warning/error could look like: Hey, your REQUIRED_USE does not allow easy solving in one pass, would you mind using "|| ( cli cgi ) cli? ( ^^ ( readline libedit ) )" instead ? (*) Ensuring clauses expanded from the same construct in REQUIRED_USE stay together and thus can be grouped back to their original form. This might not be possible or easy though. A last example: The mysql/mysqli from php you mentioned in an earlier email here. " (a) recode? ( !imap !mysqli ) (b) mysql? !pdo? ( mysqli ) " There are edges (a) -> (b) and (b) -> (a): Points 1 to 3 hold; Point 4 holds because 'A q_i is the negation of a q_'j' with mysqli/!mysqli. The topological sort will fail since there is a cycle and will exhibit this cycle. repoman warning/error could look like: Hey, your constraint does not allow automatic solving, there is a cycle 'recode? ( !imap !mysqli )' -> 'mysql? !pdo? ( mysqli )' -> 'recode? ( !imap !mysqli )'. This whole thing definitely needs more thought and feedback but for now those extra restrictions seem quite natural to me, allow easy solving on the PM side and allow to have useful feedback from repoman. Alexis.