From: "Emma Strubell"
To: gentoo-portage-dev@lists.gentoo.org
Date: Mon, 1 Dec 2008 21:23:04 -0500
Subject: Re: [gentoo-portage-dev] Re: search functionality in emerge

yes, yes, i know, you're right :]

and thanks a bunch for the outline! about the compression, I agree that it would be a good idea, but I don't know how to implement it. not that it would be difficult... I'm guessing there's a gzip module for python that would make it pretty straightforward? I think I'm getting ahead of myself, though. I haven't even implemented the suffix tree yet!
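On the gzip question: Python's standard library does ship a `gzip` module, and `gzip.open()` returns a file object that can be used like an ordinary one. The following is a minimal Python 3 sketch, not Portage code. It assumes the index is a plain dictionary serialized as JSON, and the helper names `save_index`/`load_index` are hypothetical.

```python
import gzip
import json
import os


def save_index(index, path):
    """Write a dictionary to `path` as gzip-compressed JSON."""
    # "wt" opens the compressed file in text mode so json.dump can write str.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Return the dictionary stored at `path`, or None if there is no index."""
    if not os.path.exists(path):
        return None
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Because `load_index` returns None when the file is missing, a caller can treat "no index yet" and "index turned off" the same way and fall back to the old, uncached code path.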

Emma

On Mon, Dec 1, 2008 at 7:20 PM, Tambet <qtvali@gmail.com> wrote:
2008/12/2 Emma Strubell <emma.strubell@gmail.com> wrote:
> True, true. Like I said, I don't really use overlays, so excuse my ignorance.

Do you know the order of doing things?

Rules of Optimization:
  • Rule 1: Don't do it.
  • Rule 2 (for experts only): Don't do it yet.

What this actually means: functionality comes first, readability comes next, and optimization comes last. Unless, that is, you are creating a fancy 3D engine for a kung fu game.

If you are going to exclude overlays, you are removing functionality, and indeed absolutely has-to-be-there functionality, because no one would intuitively expect a search function to search only one subset of packages, however reasonable that subset might be. So you can't, just can't, add this into the portage base; at most you could write yet another external search package for portage.

I looked at this code a bit:
Portage's "__init__.py" contains the comment "# search functionality". After this comment, there is a nice and simple search class.
It also contains the method "def action_sync(...)", which contains the synchronization stuff.

Now, the search class is initialized by setting up 3 databases: porttree, bintree and vartree, whatever those are. Those are kept in the self._dbs array, and porttree is in self._portdb.

It contains some more methods:
_findname(...) returns the result of self._portdb.findname(...) with the same parameters, or None if it does not exist.
Other methods do similar things: each maps to one or another method.
execute does the real search...
Now, "for package in self.portdb.cp_all()" is the important part here: it currently loops over the whole portage tree, and all kinds of matching are done inside the loop.
self.portdb obviously points to porttree.py (unless it points to a fake tree).
cp_all takes all porttrees and does a simple file search inside them. This method should contain the optional index search.
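To make the shape of that loop concrete, here is a hypothetical, heavily simplified sketch. `FakePortDB` stands in for `self.portdb` and models only `cp_all()`. The name-only, case-insensitive substring match is an assumption; the real search class does more kinds of matching inside the loop.

```python
import re


class FakePortDB:
    """Hypothetical stand-in for the porttree database; only cp_all() is modelled."""

    def __init__(self, packages):
        self._packages = list(packages)

    def cp_all(self):
        # The real method scans every tree on disk; here we return a fixed list.
        return sorted(self._packages)


def search_names(portdb, term):
    """Return every "category/package" whose package name contains `term`."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    matches = []
    # Every query walks the complete package list, which is what makes search slow.
    for package in portdb.cp_all():
        name = package.split("/", 1)[1]
        if pattern.search(name):
            matches.append(package)
    return matches
```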
        self.porttrees = [self.porttree_root] + \
            [os.path.realpath(t) for t in self.mysettings["PORTDIR_OVERLAY"].split()]
So self.porttrees contains a list of trees: the first is the main tree (porttree_root), and the others are overlays.
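For experimenting outside Portage, the setup above can be approximated in isolation. `build_porttrees` mirrors the quoted line: main tree first, then the realpath of each whitespace-separated overlay entry. `scan_trees` is a hypothetical, much simplified stand-in for cp_all's file search. It assumes a `<tree>/<category>/<package>/` layout and a caller-supplied category list.

```python
import os


def build_porttrees(porttree_root, portdir_overlay):
    """Main tree first, then each overlay from a PORTDIR_OVERLAY-style string."""
    return [porttree_root] + [os.path.realpath(t) for t in portdir_overlay.split()]


def scan_trees(porttrees, categories):
    """Return sorted "category/package" names found in any of the trees."""
    found = set()
    for root in porttrees:
        for cat in categories:
            cat_dir = os.path.join(root, cat)
            if not os.path.isdir(cat_dir):
                continue
            for pkg in os.listdir(cat_dir):
                # Each package is a directory inside its category directory.
                if os.path.isdir(os.path.join(cat_dir, pkg)):
                    found.add(cat + "/" + pkg)
    return sorted(found)
```

Passing `porttrees[1:]` instead of the full list scans only the overlays, which is the slice the index-based variant relies on.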

Now, supporting overlay search too will not make what you have to do any harder.

You have to create a method def cp_index(self), which will return a dictionary containing package names as keys. The "for oroot ..." loop will then iterate over "self.porttrees[1:]" instead of "self.porttrees", so it only searches the overlays, and d = {} will be replaced with d = self.cp_index(). If the index is not there, the old version will be used (thus, you have to make an internal porttrees variable, which contains either all trees or all except the first).
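The fallback described above can be sketched as follows. This is a hypothetical class, not Portage's portdbapi. `scan(root)` stands in for the existing per-tree file search, and the index is assumed to be a dictionary keyed by main-tree package names, or None when no index has been built.

```python
class IndexedPortTree:
    """Hypothetical sketch of cp_all() consulting an optional main-tree index."""

    def __init__(self, porttrees, scan, index=None):
        self.porttrees = porttrees  # main tree first, then overlays
        self._scan = scan           # scan(root) -> iterable of "cat/pkg" names
        self._index = index         # dict keyed by "cat/pkg", or None

    def cp_index(self):
        """Return a fresh dict of indexed main-tree packages, or None."""
        if self._index is None:
            return None
        return dict.fromkeys(self._index)

    def cp_all(self):
        d = self.cp_index()
        if d is None:
            # No index: old behaviour, scan every tree.
            d = {}
            trees = self.porttrees
        else:
            # The index covers the main tree, so only overlays are scanned.
            trees = self.porttrees[1:]
        for oroot in trees:
            for cp in self._scan(oroot):
                d[cp] = None
        return sorted(d)
```

Keeping the choice of `trees` inside `cp_all` means callers never need to know whether an index exists.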

Other methods used by search are xmatch and aux_get: the first is used several times, and the last one is used to get the description. You have to cache the results of those specific queries and make those methods use your cache; as you can see, those parts of portage are already able to use overlays. Thus, you again have to put your code at the beginning of those functions: create index_xmatch and index_aux_get methods, then make xmatch and aux_get call them and return their results unless those are None (or some other marker, in case None is already a legal result). If they return None, the old code will run and do its job. If the index has not been created, the result is None. In the index_** methods, just check whether the query is one you can answer, and if it is, answer it.
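Here is a minimal sketch of that "ask the index first, fall back on None" pattern, shown for aux_get only (index_xmatch would follow the same shape). `MetadataDB` is a hypothetical stand-in. The `metadata` dict plays the role of the existing slow lookup, and the index layout `{cpv: {key: value}}` is an assumption. None works as the "cannot answer" signal here because aux_get itself always returns a list.

```python
class MetadataDB:
    """Hypothetical stand-in showing index_aux_get in front of aux_get."""

    def __init__(self, metadata, index=None):
        self._metadata = metadata  # stand-in for the existing (slow) lookup
        self._index = index        # {cpv: {key: value}} or None if not built
        self.slow_lookups = 0

    def index_aux_get(self, cpv, keys):
        """Answer from the index if it can, otherwise return None."""
        if self._index is None:
            return None
        entry = self._index.get(cpv)
        if entry is None or any(k not in entry for k in keys):
            return None
        return [entry[k] for k in keys]

    def aux_get(self, cpv, keys):
        result = self.index_aux_get(cpv, keys)
        if result is not None:
            return result
        # Old code path: runs when there is no index or it can't answer.
        self.slow_lookups += 1
        return [self._metadata[cpv][k] for k in keys]
```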

Obviously, the simplest way to create your index is to delete the old index, then use those same methods to query for all the necessary information. The fastest way would be to add index updating directly into sync, which you could do later.
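The "delete, then re-query" approach might look roughly like this Python 3 sketch. It assumes a database object with `cp_all()`, `cp_list(cp)` and `aux_get(cpv, keys)` methods. Those names follow portdbapi, but the sketch has not been checked against Portage. The on-disk format, gzip-compressed JSON with `"cp"` and `"aux"` sections, is likewise a hypothetical choice.

```python
import gzip
import json
import os


def rebuild_index(db, path, keys=("DESCRIPTION",)):
    """Delete any old index at `path`, rebuild it by querying `db`, and save it.

    `db` is assumed to provide cp_all(), cp_list(cp) and aux_get(cpv, keys).
    """
    if os.path.exists(path):
        os.remove(path)
    index = {"cp": {}, "aux": {}}
    for cp in db.cp_all():
        cpvs = list(db.cp_list(cp))
        index["cp"][cp] = cpvs
        for cpv in cpvs:
            # Store exactly what search will later ask aux_get for.
            index["aux"][cpv] = dict(zip(keys, db.aux_get(cpv, list(keys))))
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(index, f)
    return index
```

This trades speed for simplicity: every package is queried through the slow path once, which is why hooking updates into sync would be faster later on.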

Please also make commands to turn the index on and off (the latter should also delete it to save disk space). The default should be off until it's fast, small and reliable. Also note that if the index is kept on the hard drive, it might be faster if it's compressed (gz, for example): reading and decompressing a compressed file takes less time, though more processing power, than reading the uncompressed file in full.
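One possible shape for the on/off switch follows, as a sketch only. The command names, the default path and the `build` callback are all hypothetical, not existing emerge options. Turning the index off deletes the file, and since nothing is built until "on" is requested, the index stays off by default.

```python
import argparse
import os

# Hypothetical default location; not a path Portage actually uses.
DEFAULT_INDEX_PATH = "search-index.json.gz"


def set_index(enabled, path, build):
    """Build the index (via `build(path)`) or delete it; return whether it exists."""
    if enabled:
        build(path)
    elif os.path.exists(path):
        # Turning the index off also frees its disk space.
        os.remove(path)
    return os.path.exists(path)


def main(argv=None, build=None):
    parser = argparse.ArgumentParser(description="Turn the search index on or off.")
    parser.add_argument("state", choices=["on", "off"])
    parser.add_argument("--path", default=DEFAULT_INDEX_PATH)
    args = parser.parse_args(argv)
    if args.state == "on" and build is None:
        parser.error("no index builder supplied")
    return set_index(args.state == "on", args.path, build)
```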
Good luck!

Emma Strubell wrote:
> 2) does anyone really need to search an overlay anyway?

Of course. Take large (semi-)official overlays like sunrise. They can easily be seen as a second portage tree.

On Mon, Dec 1, 2008 at 5:17 PM, René 'Necoro' Neumann <lists@necoro.eu> wrote:


