From: tvali
Date: Mon, 24 Nov 2008 02:53:02 +0200
To: gentoo-portage-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] search functionality in emerge

For a daemon that notices filesystem changes, http://pyinotify.sourceforge.net/ would be a good choice.
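pyinotify receives change events from the Linux kernel's inotify facility. As a much cruder, portable stand-in (a different technique, not pyinotify's API), an overlay could be checked for changes by comparing snapshots of modification times. A minimal sketch using only the standard library; `take_snapshot` and `diff_snapshots` are hypothetical names:

```python
import os
from typing import Dict, Set, Tuple

# Maps every directory and file path below the tree root to its mtime in ns.
Snapshot = Dict[str, int]


def take_snapshot(root: str) -> Snapshot:
    """Record the modification time of every directory and file below root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        snap[dirpath] = os.stat(dirpath).st_mtime_ns
        for name in filenames:
            path = os.path.join(dirpath, name)
            snap[path] = os.stat(path).st_mtime_ns
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, removed, modified
```

Polling walks the whole tree on every check, which is exactly the cost an inotify-based daemon avoids; the sketch only shows what "noticing changes" has to produce for a re-indexer.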

If many different applications read the portage tree directly without going through any portage API (which I think is a bad practice and should be deprecated), then there is a kind of "hack": use http://www.freenet.org.nz/python/lufs-python/ to create a new filesystem (damn, now I would like to have some time to join this game). I hope it can be built everywhere Gentoo is supposed to work, but it's no problem if it can't; you can implement things so that it is not required. I totally agree that the filesystem is a bottleneck, but I guess this suffix trie would check for directories first. Now, with this custom filesystem, which serves the portage tree through what is effectively an odd API, you can keep backwards compatibility and still build your own thing.
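The suffix trie idea discussed in this thread can be illustrated with a small in-memory version. This is a rough sketch under stated assumptions (the class and method names are hypothetical, and names are matched case-insensitively), not the index actually proposed for emerge:

```python
from typing import Dict, Set

NAMES = "$names"  # reserved key; never clashes with a one-character edge label


class SuffixTrie:
    """Substring index over package names (hypothetical helper).

    Every suffix of every lower-cased name is inserted, so any substring of
    a name is a prefix of one of its suffixes; each node remembers which
    names pass through it."""

    def __init__(self) -> None:
        self.root: Dict = {}
        self.all_names: Set[str] = set()

    def add(self, name: str) -> None:
        self.all_names.add(name)
        key = name.lower()
        for start in range(len(key)):
            node = self.root
            for ch in key[start:]:
                node = node.setdefault(ch, {})
                node.setdefault(NAMES, set()).add(name)

    def search(self, query: str) -> Set[str]:
        """Return every added name that contains query (case-insensitive)."""
        if not query:
            return set(self.all_names)
        node = self.root
        for ch in query.lower():
            if ch not in node:
                return set()
            node = node[ch]
        return set(node[NAMES])
```

Inserting every suffix costs memory quadratic in the name length; compressed suffix trees or suffix arrays are the usual remedy if this approach were taken further.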

Consider the following classes (the numbers show the implementation order; I don't specify here whether the proxies are abstract classes, base classes or something else, the list just shows some relations between imaginary objects):
  • 1. PortageTreeApi - a proxy for different portage tree backends: filesystem, SQL or other.
  • 2. PortageTreeCachedApi - the same as the previous one, but with a boosted in-memory cache. It should be able to save its state, which simply means writing its internal variables to a file.
  • 3. PortageTreeDaemon - has an interface compatible with PortageTreeApi; this daemon serves the portage tree to PortageTreeFS and also directly. In reality, it should be the base class of PortageTreeApi and PortageTreeCachedApi, so that they can be used directly as daemons. When the cached API is used as a daemon, it should be able to detect filesystem changes, so implementations should provide change-trigger callbacks.
  • 4. PortageTreeFS - a filesystem that can expose any of these as a mountable filesystem. It can be connected to PortageTreeApi or PortageTreeDaemon. This creates filesystems that can be used for backwards compatibility. It cannot be used on architectures that don't support lufs-python or an equivalent.
  • 6. PortageTreeServer - a server that serves data from PortageTreeDaemon, PortageTreeCachedApi or PortageTreeApi to other computers.
  • Implementations can be proxied through PortageTreeApi, PortageTreeCachedApi or PortageTreeDaemon.
    • 5. PortageTreeImplementationAsSqlDb
    • 1. PortageTreeImplementationAsFilesystem
    • 3. PortageTreeImplementationAsDaemon - client, actually.
    • 6. PortageTreeImplementationAsServer - client, too.
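A rough sketch of how the PortageTreeApi / PortageTreeImplementationAsFilesystem pair from the list above might look. The method names (`categories`, `packages`, `ebuilds`) and the `all_packages` helper are assumptions, not an existing portage API; the filesystem backend reads the usual category/package/*.ebuild layout and the category list in `profiles/categories`:

```python
import os
from abc import ABC, abstractmethod
from typing import List


class PortageTreeApi(ABC):
    """Backend-neutral access to a portage tree (hypothetical interface).

    Only purely storage-specific primitives live here."""

    @abstractmethod
    def categories(self) -> List[str]: ...

    @abstractmethod
    def packages(self, category: str) -> List[str]: ...

    @abstractmethod
    def ebuilds(self, category: str, package: str) -> List[str]: ...


class PortageTreeImplementationAsFilesystem(PortageTreeApi):
    """Reads the on-disk category/package/*.ebuild layout."""

    def __init__(self, root: str) -> None:
        self.root = root

    def categories(self) -> List[str]:
        # The tree lists its categories in profiles/categories, one per line;
        # listed categories without a directory are skipped.
        path = os.path.join(self.root, "profiles", "categories")
        with open(path, encoding="utf-8") as f:
            names = [line.strip() for line in f if line.strip()]
        return sorted(c for c in names if os.path.isdir(os.path.join(self.root, c)))

    def packages(self, category: str) -> List[str]:
        base = os.path.join(self.root, category)
        return sorted(p for p in os.listdir(base) if os.path.isdir(os.path.join(base, p)))

    def ebuilds(self, category: str, package: str) -> List[str]:
        base = os.path.join(self.root, category, package)
        return sorted(f for f in os.listdir(base) if f.endswith(".ebuild"))


def all_packages(tree: PortageTreeApi) -> List[str]:
    """Higher-level helper kept outside the API, built only on its primitives."""
    return [f"{c}/{p}" for c in tree.categories() for p in tree.packages(c)]
```

Keeping `all_packages` outside the abstract class follows the idea that PortageTreeApi stays purely storage-specific while higher-level logic lives elsewhere.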
So, 1 - creating PortageTreeApi and PortageTreeImplementationAsFilesystem is, at first, a pure refactoring task. Adding more advanced functions to PortageTreeApi afterwards is basically refactoring, too. PortageTreeApi should not become too complex or take on advanced tasks that are not purely database-specific; a common base class could implement the higher-level things instead.
Then, 2 - this is finishing your schoolwork, but not yet in the most powerful way, as at that point we only have an index and the first search is still slow. Initially this cache cannot provide data about changes in the portage tree (that could be implemented with some versioning once this new API is the only place where the tree is updated), so it should have an index update command and be used only for search.
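Step 2 as a sketch: a cached wrapper whose index is refreshed only by an explicit update command and whose state is saved as JSON. `DictTree` is a hypothetical stand-in backend so the example runs on its own; any object with `categories()` and `packages()` would do:

```python
import json
from typing import Dict, List, Protocol


class TreeBackend(Protocol):
    def categories(self) -> List[str]: ...
    def packages(self, category: str) -> List[str]: ...


class DictTree:
    """Hypothetical stand-in backend: {category: [package, ...]}."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self.data = data

    def categories(self) -> List[str]:
        return sorted(self.data)

    def packages(self, category: str) -> List[str]:
        return sorted(self.data[category])


class PortageTreeCachedApi:
    """In-memory package index over a backend.

    The index changes only when update_index() is called, so it can go
    stale in between; that is why it is used for search only."""

    def __init__(self, backend: TreeBackend) -> None:
        self.backend = backend
        self.index: Dict[str, List[str]] = {}

    def update_index(self) -> None:
        self.index = {c: self.backend.packages(c) for c in self.backend.categories()}

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return [f"{c}/{p}" for c, pkgs in sorted(self.index.items())
                for p in pkgs if term in p.lower()]

    def save_state(self, path: str) -> None:
        # "Saving state" is just writing the internal index to a file.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.index, f)

    def load_state(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            self.index = json.load(f)
```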
Then, 3 - having a portage tree daemon means that things can really be cached now, and this cache can be kept in memory; it also means updates on filesystem changes.
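Step 3's change-trigger callbacks could be as simple as a registry that the file watcher fires, with the in-memory cache dropping only the affected entries. All class names here are hypothetical, and the watcher itself (pyinotify or polling) is left out:

```python
from typing import Callable, Dict, Iterable, List, Set

ChangeCallback = Callable[[Set[str]], None]


class ChangeTriggers:
    """Registry of change-trigger callbacks. A watcher calls fire() with
    the changed category/package keys."""

    def __init__(self) -> None:
        self._callbacks: List[ChangeCallback] = []

    def register(self, cb: ChangeCallback) -> None:
        self._callbacks.append(cb)

    def fire(self, changed: Iterable[str]) -> None:
        keys = set(changed)
        for cb in self._callbacks:
            cb(keys)


class InMemoryCache:
    """Keeps per-package data in memory and reloads only what changed."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self.loader = loader
        self.entries: Dict[str, str] = {}
        self.loads = 0  # counts backend reads, to show the cache is working

    def get(self, key: str) -> str:
        if key not in self.entries:
            self.entries[key] = self.loader(key)
            self.loads += 1
        return self.entries[key]

    def invalidate(self, changed: Set[str]) -> None:
        for key in changed:
            self.entries.pop(key, None)
```

Invalidating per key keeps the rest of the cache warm, which is the point of running it inside a long-lived daemon.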
Then, 4 - having PortageTreeFS means that you can now easily put the portage tree on a faster medium without losing backwards compatibility.
Now, 5 - an implementation as an SQL DB is logical, as SQL is a standardized and common language for building fast databases.
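A minimal sketch of step 5 using Python's built-in `sqlite3` module; the table layout (category, name, description) and the class's methods are assumptions for illustration, not portage's metadata format:

```python
import sqlite3
from typing import Iterable, List, Tuple


class PortageTreeImplementationAsSqlDb:
    """Hypothetical SQLite backend offering categories()/packages()/search()."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS packages ("
            " category TEXT NOT NULL,"
            " name TEXT NOT NULL,"
            " description TEXT NOT NULL DEFAULT '',"
            " PRIMARY KEY (category, name))"
        )

    def load(self, rows: Iterable[Tuple[str, str, str]]) -> None:
        # The connection context manager commits the transaction on success.
        with self.conn:
            self.conn.executemany("INSERT OR REPLACE INTO packages VALUES (?, ?, ?)", rows)

    def categories(self) -> List[str]:
        cur = self.conn.execute("SELECT DISTINCT category FROM packages ORDER BY category")
        return [r[0] for r in cur]

    def packages(self, category: str) -> List[str]:
        cur = self.conn.execute(
            "SELECT name FROM packages WHERE category = ? ORDER BY name", (category,))
        return [r[0] for r in cur]

    def search(self, term: str) -> List[str]:
        # SQLite's LIKE is case-insensitive for ASCII characters by default.
        pattern = f"%{term}%"
        cur = self.conn.execute(
            "SELECT category || '/' || name FROM packages"
            " WHERE name LIKE ? OR description LIKE ? ORDER BY category, name",
            (pattern, pattern))
        return [r[0] for r in cur]
```

`%` and `_` inside the search term act as LIKE wildcards here; a real implementation would escape them. A leading `%` also keeps SQLite from using the primary-key index, so this search is a full table scan.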
Eventually, 6 - this really has nothing to do with speeding up search, but on a fast network it could still speed up emerge by removing the need for emerge --sync on local networks.
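Step 6 could be prototyped with the standard library's XML-RPC modules: the server exposes the tree methods, and `PortageTreeImplementationAsServer` is just a client with the same interface. `TreeService`, `start_server` and the sample data are hypothetical, and there is no authentication, so this only makes sense on a trusted local network:

```python
import threading
from typing import Dict, List
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class TreeService:
    """Server side: wraps tree data; public methods become RPC calls."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self.data = data

    def categories(self) -> List[str]:
        return sorted(self.data)

    def packages(self, category: str) -> List[str]:
        return sorted(self.data.get(category, []))


def start_server(service: TreeService, host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(service)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class PortageTreeImplementationAsServer:
    """Client side: same interface, but every call goes over the network."""

    def __init__(self, url: str) -> None:
        self.proxy = ServerProxy(url)

    def categories(self) -> List[str]:
        return self.proxy.categories()

    def packages(self, category: str) -> List[str]:
        return self.proxy.packages(category)
```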

I think synchronization could then also be considered for these classes; CachedApi almost needs it to be fast over server-client connections. After that, ImplementationAsSync and ImplementationAsWebRsSync could be added, and a sync server built on top of this daemon. As I suspect that emerge --sync is currently ultraslow as well (I see no point in waiting a long time to get a few new items, as currently seems to happen), this would boost another life-critical part of portage.

So, I hope that helps a bit. Good luck!

2008/11/23 René 'Necoro' Neumann <lists@necoro.eu>

Mike Auty schrieb:
> Finally there are overlays, and since these can change outside of an
> "emerge --sync" (as indeed can the main tree), you'll have to reindex
> these before each search request, or give the user stale data until they
> manually reindex.

Determining whether there has been a change to the ebuild system is a
major point in the whole thing. What good is a great index if
it does not notice the changes the user made in his own local overlay?
:) Manually re-indexing is not a good choice, I think...

If somebody comes up here with a good (and fast) solution, this would be a nice thing ;) (need it myself).

Regards,
René

--
tvali

Found on some forum: http://www.cooltests.com (if you know English). By the way, above 120 you're very smart, above 140 you're a genius, and at around 170 your head is stuffed as full as a trash can...