yes, yes, i know, you're right :] and thanks a bunch for the outline!

about the compression, I agree that it would be a good idea, but I don't know how to implement it. not that it would be difficult... I'm guessing there's a gzip module for python that would make it pretty straightforward? I think I'm getting ahead of myself, though. I haven't even implemented the suffix tree yet!

Emma

On Mon, Dec 1, 2008 at 7:20 PM, Tambet wrote:
> 2008/12/2 Emma Strubell
>
>> True, true. Like I said, I don't really use overlays, so excuse my
>> ignorance.
>>
>
> Do you know the order of doing things:
>
> Rules of Optimization:
>
>    - Rule 1: Don't do it.
>    - Rule 2 (for experts only): Don't do it yet.
>
> What this actually means - functionality comes first. Readability comes
> next. Optimization comes last. Unless you are creating a fancy 3D engine for
> a kung fu game.
>
> If you are going to exclude overlays, you are removing functionality - and,
> indeed, absolutely has-to-be-there functionality, because no one would
> intuitively expect a search function to search only one subset of packages,
> however reasonable this subset would be. So, you can't, just can't, add this
> package into portage base - you could only write another external search
> package for portage.
>
> I looked at this code a bit and:
> Portage's "__init__.py" contains the comment "# search functionality". After
> this comment, there is a nice and simple search class.
> It also contains the method "def action_sync(...)", which contains
> synchronization stuff.
>
> Now, the search class will be initialized by setting up 3 databases - porttree,
> bintree and vartree, whatever those are. Those will be in the self._dbs array
> and porttree will be in self._portdb.
>
> It contains some more methods:
> _findname(...) will return the result of self._portdb.findname(...) with the
> same parameters, or None if it does not exist.
> Other methods will do similar things - map one or another method.
> execute will do the real search...
> Now - "for package in self.portdb.cp_all()" is important here ...it
> currently loops over the whole portage tree. All kinds of matching will be
> done inside.
> self.portdb obviously points to porttree.py (unless it points to the fake
> tree).
> cp_all will take all porttrees and do a simple file search inside. This
> method should contain an optional index search.
>
>     self.porttrees = [self.porttree_root] + \
>         [os.path.realpath(t) for t in self.mysettings["PORTDIR_OVERLAY"].split()]
>
> So, self.porttrees contains a list of trees - the first of them is the root,
> the others are overlays.
>
> Now, what you have to do will not be harder just because of having overlay
> search, too.
>
> You have to create a method def cp_index(self), which will return a dictionary
> containing package names as keys. For oroot... will be "self.porttrees[1:]",
> not "self.porttrees" - this will only search overlays. d = {} will be
> replaced with d = self.cp_index(). If the index is not there, the old version
> will be used (thus, you have to make an internal porttrees variable, which
> contains all trees or all except the first).
>
> Other methods used by search are xmatch and aux_get - the first is used
> several times and the last one is used to get the description. You have to
> cache the results of those specific queries and make them use your cache - as
> you can see, those parts of portage are already able to use overlays. Thus,
> you have to put your code again at the beginning of those functions - create
> index_xmatch and index_aux_get methods, then make xmatch and aux_get call them
> and return their results unless those are None (or something else in case None
> is already a legal result) - if they return None, the old code will run and do
> its job. If the index is not created, the result is None. In the index_**
> methods, just check if the query is one you can answer and if it is, then
> answer it.
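A minimal sketch of the "ask the index first, fall back to the old code" idea described above. The `IndexedTree` class, its dict-based storage and the `slow_calls` counter are stand-ins invented for illustration; this is not Portage's real `portdbapi`, and only the shape of the `index_aux_get` / `aux_get` pair follows the outline.

```python
class IndexedTree:
    """Hypothetical stand-in for the tree database, not Portage's class."""

    def __init__(self, packages, index=None):
        self._packages = packages  # pretend this is the slow on-disk tree
        self._index = index        # dict of cached metadata, or None if no index
        self.slow_calls = 0        # counts how often the old code path ran

    def index_aux_get(self, cpv, keys):
        """Answer from the index, or return None if it cannot answer."""
        if self._index is None:
            return None            # index was never created
        entry = self._index.get(cpv)
        if entry is None or any(key not in entry for key in keys):
            return None            # query is not one the index covers
        return [entry[key] for key in keys]

    def aux_get(self, cpv, keys):
        """Try the index first; run the old lookup if it returns None."""
        cached = self.index_aux_get(cpv, keys)
        if cached is not None:
            return cached
        # Old code path: here just a dict lookup standing in for file access.
        self.slow_calls += 1
        metadata = self._packages[cpv]
        return [metadata[key] for key in keys]


if __name__ == "__main__":
    packages = {"app-editors/vim-7.2": {"DESCRIPTION": "Vim", "SLOT": "0"}}
    index = {"app-editors/vim-7.2": {"DESCRIPTION": "Vim"}}
    tree = IndexedTree(packages, index)
    print(tree.aux_get("app-editors/vim-7.2", ["DESCRIPTION"]), tree.slow_calls)
    print(tree.aux_get("app-editors/vim-7.2", ["SLOT"]), tree.slow_calls)
```

Returning None for "cannot answer" keeps the wrapper tiny, but it only works while None is never a legitimate result; otherwise a dedicated sentinel object would be needed, as the outline already hints.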
>
> Obviously, the simplest way to create your index is to delete the index, then
> use those same methods to query for all necessary information - and the
> fastest way would be to add index updating directly into sync, which you could
> do later.
>
> Please, also, make commands to turn the index on and off (the last one should
> also delete it to save disk space). Default should be off until it's fast,
> small and reliable. Also notice that if the index is kept on the hard drive,
> it might be faster if it's compressed (gz, for example) - reading and
> decompressing the smaller file takes less time, though more processing power,
> than reading the uncompressed index fully out.
>
> Good luck!
>
> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Emma Strubell wrote:
>>> > 2) does anyone really need to search an overlay anyway?
>>>
>>> Of course. Take large (semi-)official overlays like sunrise. They can
>>> easily be seen as a second portage tree.
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v2.0.9 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iEYEARECAAYFAkk0YpEACgkQ4UOg/zhYFuD3jQCdG/ChDmyOncpgUKeMuqDxD1Tt
>>> 0mwAn2FXskdEAyFlmE8shUJy7WlhHr4S
>>> =+lCO
>>> -----END PGP SIGNATURE-----
>>>
>>> On Mon, Dec 1, 2008 at 5:17 PM, René 'Necoro' Neumann wrote:
>>
>>
>
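For the compression question, a rough sketch of how a gzip-compressed index could be written, read back and deleted with Python's standard `gzip` module. The JSON format, the function names and the idea of one index file per path are all assumptions made for illustration; nothing here comes from Portage itself, and whether compression is actually faster would still have to be measured.

```python
import gzip
import json
import os


def save_index(index, path):
    """Write the index dict as gzip-compressed JSON (format is an assumption)."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(index, handle)


def load_index(path):
    """Return the index dict, or None if no index file exists."""
    if not os.path.exists(path):
        return None  # callers fall back to the old full-tree search
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def delete_index(path):
    """Turning the index off also removes the file to save disk space."""
    if os.path.exists(path):
        os.remove(path)
```

Having `load_index` return None when the file is missing matches the "index is off, use the old code" default from the outline, so the search code needs only one check.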