From: tvali <qtvali@gmail.com>
To: gentoo-portage-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] search functionality in emerge
Date: Sun, 23 Nov 2008 22:00:27 +0200 [thread overview]
Message-ID: <cea53e3c0811231200u5b91aa8dqee4435ccdb6b8663@mail.gmail.com> (raw)
In-Reply-To: <5a8c638a0811231049g56506b9flc0986705a24094f0@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 6484 bytes --]
Yes if it would be a low-level implementation to portage, speeding up it's
native code for searching and using indexes, then it would make everything
faster, including emerge (because emerge does search first for package
relations). I have actually wanted to do it myself several years ago, so
reacting here to have my ideas discussed, too.
Douglas Anderson 16:46 reply is about locks and I think that it would need
to rethink portages locking methods - what, when and why it locks. This is
probably quite hard task by itself. Anyway, as portage actually lets user
make two emerges at the same time, locks might be OK, too.
I think that the best thing would be bottom-up refactoring - first to make a
list of lowest-level functions, which have to do with reading data from
portage tree or writing into it; then making indexer class, which will be
used by all of those low-level functions.
To have it OOP, it should be implemented in such way:
- Low-level portage tree handler does everything with portage tree, no
function in portage uses it directly.
- Tree handler has several needed and several optional methods - so that
implementing new handler would be easy, but creating things like native
regex search would be possible.
- One could implement a new tree handler with SQLite or other interface
instead of filesystem and do other tricks through this interface; for
example, boost it.
So, nice way to go would be:
1. Implementing portage tree handler and it's proxy, which uses current
portage tree in non-indexed way and simply gives methods for the same kind
of access, as currently implemented one.
2. Refactoring portage to rely only on portage tree handler and use
direct portage tree nowhere. To test if it is so, portage tree should be
moved to another directory, about which only this handler knows, and check
if portage works well. Indexing all places, where portage uses it's tree
handler (by std. comment, for example) and making clear, which methods would
contain all boostable code of it.
3. Implementing those methods in proxy, which could simulate fast regex
search and other stuff using simplest possible interface of portage tree
handler (smth. like four methods add, remove, get, list). Proxy should be
able to use handler's methods if they are implemented.
4. Refactoring portage to use advanced methods in proxy.
5. Now, having taken all code together into one place and looking this
nice and readable code, real optimizations could be discussed here, for
example. Ideally, i think, portage would have such tree handlers:
- Filesystem handler - fast searches over current portage tree
structure.
- SQL handler - rewrite of tree functions into SQL queries.
- Network-based handler - maybe sometimes it would nice to have
portage tree only in one machine of cluster, for example if I
want to have
100 really small computers with fast connection with mother-computer and
portage tree is too big to be wholly copied to all of them.
- Memory-buffered handler with daemon, which is actually proxy to some
other handler - daemon, which reads whole tree (from filesystem
or SQL) into
memory on boot or first use, creates really fast index (because
now it does
matter to have better indexing) and optionally deletes some [less needed]
parts of it's index from memory when it's becoming full and behaves as
really simple proxy if it stays full. This should be implemented after
critical parts of filesystem or SQL handler.
2008/11/23 Emma Strubell <emma.strubell@gmail.com>
> Wow, that's extremely helpful!! I happen to particularly enjoy tries, so
> the suffix trie sounds like a great idea. The trie class example is really
> helpful too, because this will be my first time programming in Python, and
> it's a bit easier to figure out what's going on syntax-wise in that simple
> trie class than in the middle of the portage source code!
>
> Seriously, thanks again :]
>
>
> On Sun, Nov 23, 2008 at 11:56 AM, Lucian Poston <lucianposton@gmail.com>wrote:
>
>> > Thanks for the replies! I know there are a couple programs out there
>> that
>> > basically already do what I'm looking to do... Unfortunately I wasn't
>> aware
>> > of these pre-existing utilities until after I submitted my project
>> proposal
>> > to my professor. So, I'm looking to implement a better search myself.
>> > Preferably by editing the existing portage code, not writing a separate
>> > program. So if anyone can offer any help regarding the actual
>> implementation
>> > of search in portage, I would greatly appreciate it!
>>
>> Most of the search implementation is in
>> /usr/lib/portage/pym/_emerge/__init__.py in class search. The class's
>> execute() method simply iterates over all packages (and descriptions
>> and package sets) and matches against the searchkey. You might need
>> to look into pym/portage/dbapi/porttree.py for portdbapi as well.
>>
>> If you intend to index and support fast regex lookup, then you need to
>> do some fancy indexing, which I'm not terribly familiar with. You
>> could follow in the steps of eix[1] or other indexed search utilities
>> and design some sort of index layout, which is easier than the
>> following thought. You might consider implementing a suffix trie or
>> similar that has sublinear regexp lookup and marshalling the structure
>> for the index. I couldn't find a python implementation for something
>> like this, but here is a general trie class[2] that you might start
>> with if you go that route. There is a perl module[3],
>> Tie::Hash::Regex, that does that, but properly implementing that in
>> python would be a chore. :)
>>
>> That project sounds interesting and fun. Good luck!
>>
>> Lucian Poston
>>
>> [1] https://projects.gentooexperimental.org/eix/wiki/IndexFileLayout
>> [2]
>> http://www.koders.com/python/fid7B6BC1651A9E8BBA547552FE3F039479A4DECC45.aspx
>> [3]
>> http://search.cpan.org/~davecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm<http://search.cpan.org/%7Edavecross/Tie-Hash-Regex-1.02/lib/Tie/Hash/Regex.pm>
>>
>>
>
--
tvali
Kuskilt foorumist: http://www.cooltests.com - kui inglise keelt oskad.
Muide, üle 120 oled väga tark, üle 140 oled geenius, mingi 170 oled ju mingi
täica pea nagu prügikast...
[-- Attachment #2: Type: text/html, Size: 7402 bytes --]
next prev parent reply other threads:[~2008-11-23 20:00 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-23 12:17 [gentoo-portage-dev] search functionality in emerge Emma Strubell
2008-11-23 14:01 ` tvali
2008-11-23 14:33 ` Pacho Ramos
2008-11-23 14:43 ` Emma Strubell
2008-11-23 16:56 ` Lucian Poston
2008-11-23 18:49 ` Emma Strubell
2008-11-23 20:00 ` tvali [this message]
2008-11-23 21:20 ` Mike Auty
2008-11-23 21:59 ` René 'Necoro' Neumann
2008-11-24 0:53 ` tvali
2008-11-24 9:34 ` René 'Necoro' Neumann
2008-11-24 9:48 ` Fabian Groffen
2008-11-24 14:30 ` tvali
2008-11-24 15:14 ` tvali
2008-11-24 15:15 ` René 'Necoro' Neumann
2008-11-24 15:18 ` tvali
2008-11-24 17:15 ` tvali
2008-11-30 23:42 ` Emma Strubell
2008-12-01 7:34 ` [gentoo-portage-dev] " Duncan
2008-12-01 10:40 ` Emma Strubell
2008-12-01 17:52 ` Zac Medico
2008-12-01 21:25 ` Emma Strubell
2008-12-01 21:52 ` Tambet
2008-12-01 22:08 ` Emma Strubell
2008-12-01 22:17 ` René 'Necoro' Neumann
2008-12-01 22:47 ` Emma Strubell
2008-12-02 0:20 ` Tambet
2008-12-02 2:23 ` Emma Strubell
2008-12-02 10:21 ` Alec Warner
2008-12-02 12:42 ` Tambet
2008-12-02 13:51 ` Tambet
2008-12-02 19:54 ` Alec Warner
2008-12-02 21:47 ` Tambet
2008-12-02 17:42 ` Tambet
2008-11-23 14:56 ` [gentoo-portage-dev] " Douglas Anderson
2008-11-24 3:12 ` Marius Mauch
2008-11-24 5:01 ` devsk
2008-11-24 6:25 ` Marius Mauch
2008-11-24 6:47 ` [gentoo-portage-dev] " Duncan
2009-02-12 19:16 ` [gentoo-portage-dev] " René 'Necoro' Neumann
[not found] ` <5a8c638a0902121258s7402d9d7l1ad2b9a8ecf9820d@mail.gmail.com>
2009-02-12 21:01 ` Fwd: " Emma Strubell
2009-02-12 21:05 ` Mike Auty
2009-02-12 21:14 ` Emma Strubell
2009-02-13 13:37 ` Marijn Schouten (hkBst)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cea53e3c0811231200u5b91aa8dqee4435ccdb6b8663@mail.gmail.com \
--to=qtvali@gmail.com \
--cc=gentoo-portage-dev@lists.gentoo.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox