From: Paul de Vrieze
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] Re: [gentoo-dev] EBUILD_FORMAT support
Date: Fri, 26 Aug 2005 13:19:37 +0200
Message-Id: <200508261319.37485.pauldv@gentoo.org>
In-Reply-To: <20050826073529.GP1701@nightcrawler>
References: <20050707002002.GH20687@lightning.stealer.net> <200508251234.00876.pauldv@gentoo.org> <20050826073529.GP1701@nightcrawler>

On Friday 26 August 2005 09:35, Brian Harring wrote:
> Any parser that doesn't support full bash syntax isn't acceptable from
> where I sit; re: slow down, 2.1 is around 33% faster sourcing the
> whole tree (some cases 60% faster, some 5%, etc). The speed up's are
> also what allow template's to be swapped, the eapi concept.

For the toplevel of the ebuilds there are many things that are not
allowed. Basically, things must be deterministic for the cache to work. I
have built an extension that parses 98% of current ebuilds properly,
and does so much faster (more than 10 times) than the bash/ecache way. It
takes the shape of a python module written in C. It just ignores the
functions, so anything is allowed in there. As such, the parser only needs
to understand enough bash to handle the toplevel. Even variable
substitution and inherit are supported. What's not supported are various
kinds of uncommon substitution tricks that should probably not happen in
the toplevel either.

EAPI could also be used to express capabilities. Say portage supports
version 2-relaxed and version 2-strict. 2-relaxed has all the bash freedom
and is parsed using bash. 2-strict would allow parsing by a faster parser
module, but would limit the bash freedom. I'm not saying we have to do
this, but if ebuild and eclass EAPI declarations follow a few very simple
rules that are normally obeyed anyway, it would be possible to support
this in the future.

One of the problems I see with the current ebuild format is that it is
impossible to make incompatible changes at all. This means that many
features that might be desired cannot be implemented. EAPI can relieve
that. To make this practical, there should be an easy way to get the EAPI
of a package.
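A minimal sketch of what such a cheap EAPI lookup might look like, assuming the EAPI assignment is unconditional and sits on its own line at the toplevel. The names read_eapi and pick_parser are hypothetical and not part of portage or of the parser discussed here; "2-strict" and "2-relaxed" are only the example names used above.

```python
import re

# Matches an assignment such as EAPI=2-strict or EAPI="2-strict" at the
# start of a line (leading blanks allowed). Assumption: the value is a
# plain literal, optionally quoted, with no variable substitution.
# Limitation: an indented EAPI line inside a function would match too,
# which is why the assignment is assumed to come before any function.
EAPI_RE = re.compile(
    r"""^[ \t]*EAPI[ \t]*=[ \t]*(["']?)([^"'\s#]*)\1""",
    re.MULTILINE,
)


def read_eapi(ebuild_text, default=None):
    """Return the first toplevel EAPI value found, or `default` if none."""
    match = EAPI_RE.search(ebuild_text)
    return match.group(2) if match else default


def pick_parser(eapi):
    """Hypothetical dispatch: strict EAPIs may use a fast native parser,
    everything else falls back to sourcing the ebuild with bash."""
    if eapi is not None and eapi.endswith("-strict"):
        return "native"
    return "bash"


if __name__ == "__main__":
    sample = '# Copyright header\nEAPI="2-strict"\ninherit eutils\n'
    eapi = read_eapi(sample)
    print(eapi, pick_parser(eapi))
```

Because the lookup never executes anything, it would keep working even on an ebuild whose remaining syntax the reader does not understand.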
>
> I'd note limiting the bash capabilities is a restriction that
> transcends anything EAPI should supply; changes to what's possible in
> the language (a subset of bash syntax as you're suggesting) are a
> seperate format from where I draw the line in the sand.

What I suggest is making a policy that would make this possible in the
future. Note that I do not wish to restrict any bash functionality in the
various functions in the ebuild.

> Mainly, limiting the syntax has the undesired affect of deviating from
> what users/devs know already; mistakes *will* occur. QA tools can be
> written, but people are fallable; both in writing a QA tool, and
> abiding by the syntax subset allowed.

The QA tool would just run the parser. If the parser chokes (which it
doesn't do easily), then the ebuild does not conform to the correct
syntax. It's even possible to just compare the variables returned with
those that bash produces. If they don't match, the format is wrong for the
C parser.

>
> > The restriction I propose would be:
> > - If EAPI is defined in the ebuild it should be unconditional, on
> >   it's own line in the toplevel of the ebuild before any functions are
> >   defined. (preferably the first element after the comments and
> >   whitespace)
> >
> > - If EAPI is not defined in the ebuild, but in an eclass, the inherit
> >   chain should be unconditional and direct. Further more in the
> >   eclass the above rules should be followed.
> >
> > Please note that many of the conditions are allready true for current
> > ebuilds, just portage can "handle" more.
>
> inherit chain must be unconditional anyways. re: eapi placement, I
> would view that as somewhat arbitrary; the question is what gain it
> would give.

The gain of putting it at the top is that there are fewer chances for
parsers to have choked on incompatible syntax before they reach it. If
EAPI is at the top, at some point incompatible syntax might be allowed,
and older parsers could still retrieve the EAPI.
Of course any syntax that works on 'egrep "^[ \t]*EAPI[ \t]*="' should be
no problem.

>
> I'd wonder about the parsing speed of your parser; the difference
> between parsing ebuilds and running from cache metadata is several
> orders of magnitude differant- the current cache backend flat_list
> and portage design properly corrected ought to widen the gap too.
> General cache lookup is slow due to-
> A) bad call patterns, allowed by the api; N calls to get different
>    bits of metadata from a cpv, resulting in potentially N to disk set
>    of ops.
> B) default cache requires opening/closing a file per cpv lookup;
>    syscall's are killer here.
> C) every metadata lookup incurs 2 stats, ebuild and cache file.

This parser was part of a stranded rewrite attempt. One of its features
was that it regarded packages and package instances (specific files) as
objects whose attributes would be lazily evaluated. That means that it
would parse if the data was not available, and look it up otherwise. The
speed of "emerge -s" is stunning in that program, as it uses a directory
search which is orders of magnitude faster than python doing the same
thing.

> Getting to the point; cache is 100x to 400x faster then sourcing for
> <=2.0.51. Haven't tested it under 2.1, should be different due to
> cache and regen fixups/rewrites.

Don't forget that bash must be execed for normal parses, and that python
has extremely slow string handling when not using one of the standard
parsing modules (that work in C). To put my money where my mouth is, I've
tarred up my code and put it on my dev space:

http://dev.gentoo.org/~pauldv/portage_native-0.1.tar.bz2

Just run make in the extracted dir. The binary created is xbuildparse, a
standalone parser that takes the ebuild as argument. It will look for
eclasses in /usr/portage/eclass.

The python module can be built with "make xbuildparse.so", and includes a
little bit of help reachable through the normal python way.
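The lazy evaluation mentioned above (parse if not available, look up otherwise) could be sketched roughly as follows. This is an illustration only and not the code from the tarball: the class PackageInstance, the parse callable and the dict-like cache are all hypothetical names introduced for this example.

```python
class PackageInstance:
    """One specific ebuild file whose metadata is loaded on first use.

    Assumptions for this sketch: `parse` is any callable taking a path
    and returning a dict of variables, and `cache` is a dict-like store
    keyed by path. Neither reflects a real portage interface.
    """

    def __init__(self, path, parse, cache):
        self._path = path
        self._parse = parse
        self._cache = cache
        self._metadata = None

    def _load(self):
        # Look the metadata up if some store already has it; parse the
        # ebuild only when nothing is available yet.
        if self._metadata is None:
            found = self._cache.get(self._path)
            if found is None:
                found = self._parse(self._path)
                self._cache[self._path] = found
            self._metadata = found
        return self._metadata

    def __getattr__(self, name):
        # Only reached for attributes not found the normal way, so
        # pkg.DEPEND triggers loading while pkg._path does not.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._load()[name]
        except KeyError:
            raise AttributeError(name) from None


if __name__ == "__main__":
    def fake_parse(path):
        # Stand-in for a real parser; returns fixed example data.
        return {"SLOT": "4.2", "DEPEND": "sys-devel/libtool"}

    shared_cache = {}
    pkg = PackageInstance("db-4.2.52_p2.ebuild", fake_parse, shared_cache)
    print(pkg.SLOT)  # the parse happens here, on first attribute access
```

With this shape, creating objects for a whole tree costs nothing until an attribute is actually requested, so a search only pays for the packages it inspects.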
>
> Back to the point, essentially, EAPI matters in two places;
> 1) metadata transfer from the ebuild env into python side during
>    depends phase; has to know what to transfer key wise.
> 2) actual ebuild build phase executions; if it isn't the depends phase,
>    eapi being required so that the parser can swap drop in the
>    appropriate ebuild env template.

I think it also matters in actually allowing future incompatible versions
of ebuild formats. I don't mean to say goodbye to the current format, but
when redesigning the format, we should now design it for extensibility.

> The restrictions suggested for EAPI would only make sense if eyeing
> #1, an alternative parser; no reason to drop the cache unless the
> parser is capable of hitting the same runtime performance the cache
> can hit (frankly, it's not possible from where I'm sitting although
> the gap can be narrowed).

You're probably right, but the time needed to parse an ebuild can be
reduced so much that parsing will no longer be the issue; building the
right tree will be:

time ./xbuildparse /usr/portage/sys-libs/db/db-4.2.52_p2.ebuild &>/dev/null

real    0m0.054s
user    0m0.048s
sys     0m0.002s

Please note that the parser is incomplete, does have some small bugs
(don't try it on flag-o-matic, as it somehow goes into an endless loop),
and could probably do some things smarter.

> So... the EAPI limitations, not much for due to the conclusion above.
>
> Interested in the parser however, since ebd is effectively a pipe
> hack so that pythonic portage can control ebuild.sh. I (and others)
> have been after a bashlib for a while, just no one has crunched down
> and done it (easier said then done I suspect).

See the link above. It does not fully understand every bash statement
around. Importantly, it currently does not understand the "if" statement.
This is easy to add though; it just wasn't added out of "policy". But
given that even my own ebuilds (like db) use it, it should probably be
added.

I do believe that the parser could be made useful for most ebuilds. This
would however still mean a small restriction in allowed syntax. The parser
module has basically one function, "parse": it parses an ebuild and its
eclasses, and returns a list of variables. Not all variables are
substituted though; I have a python function that does this. If people are
interested I can take a look at sanitizing my whole tree and providing it.

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
-- 
gentoo-dev@gentoo.org mailing list