* [gentoo-dev] [RFC] Treewide metadata.xml
From: Danny van Dyk @ 2005-05-27 10:38 UTC
To: gentoo-dev

Hi all,

I'd like some feedback on an idea that has been on my mind for some time.

Currently, we (should) have one metadata.xml file per package (8635 in the whole tree). This is quite handy if you want to look up information about a package you know by name.

On the other hand, if you want to search for a package name via metadata, you have to traverse the whole tree. That is quite unhandy without a cache, and not yet implemented AFAIK.

I would like to propose the following changes:
Let's keep the metadata.xml in each package's directory in _CVS only_. Don't propagate them via rsync. Instead, use a script to compile all metadata.xml files into one central (XML) file. (This would probably need slight changes to the DTD.) This file would then be placed into gentoo-portage/metadata/, and Portage, devs and users could easily parse it.

TIA for any feedback on this!

PS: Ciaran: Leave me alive please ;-)
PPS: Yes, I'd volunteer to write that compilation script.

Danny
--
Danny van Dyk <kugelfang@gentoo.org>
Gentoo/AMD64 Project, Gentoo Scientific Project
--
gentoo-dev@gentoo.org mailing list
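A minimal sketch of the compilation step proposed above, assuming a tree laid out as category/package/metadata.xml. The `packages` root element and the `category`/`name` attributes are hypothetical; the thread only says the DTD would need slight changes, not what they would be. The function name `compile_metadata` is likewise made up for illustration.

```python
import os
import xml.etree.ElementTree as ET


def compile_metadata(tree_root, out_path):
    """Merge every <category>/<package>/metadata.xml under tree_root into one file.

    Returns the number of packages written.
    """
    # Hypothetical wrapper element; not part of any existing DTD.
    root = ET.Element("packages")
    for category in sorted(os.listdir(tree_root)):
        cat_dir = os.path.join(tree_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for package in sorted(os.listdir(cat_dir)):
            meta = os.path.join(cat_dir, package, "metadata.xml")
            if not os.path.isfile(meta):
                continue  # packages without metadata.xml are simply skipped
            pkg = ET.parse(meta).getroot()  # the package's own root element
            # Record where the entry came from (assumed attribute names).
            pkg.set("category", category)
            pkg.set("name", package)
            root.append(pkg)
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(root)
```

Calling `compile_metadata("/path/to/cvs/tree", "metadata/packages.xml")` would produce the single file; both paths here are placeholders. Note that this sketch aborts on the first malformed file, so a real script would want to validate each input first.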
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Michael Hanselmann @ 2005-05-27 10:41 UTC
To: gentoo-dev

Hello Danny

> Let's keep the metadata.xml in each package's directory in _CVS only_. Don't propagate them via rsync. Instead, use a script to compile all metadata.xml files into one central (XML) file. (This would probably need slight changes to the DTD.) This file would then be placed into gentoo-portage/metadata/, and Portage, devs and users could easily parse it.

Sounds good to me; it would also save some inodes and blocks on the hard drive.

Implementing the compilation should be easy using XSLT.

Greets,
Michael
--
Gentoo Linux Developer using m0n0wall | http://hansmi.ch/
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Simon Stelling @ 2005-05-27 11:01 UTC
To: gentoo-dev

Hi,

Danny van Dyk wrote:
> On the other hand, if you want to search for a package name via metadata, you have to traverse the whole tree. That is quite unhandy without a cache, and not yet implemented AFAIK.

It is implemented, see app-portage/eix or app-portage/esearch, but both depend on a cache, which has to be updated regularly.

> I would like to propose the following changes:
> Let's keep the metadata.xml in each package's directory in _CVS only_. Don't propagate them via rsync. Instead, use a script to compile all metadata.xml files into one central (XML) file. (This would probably need slight changes to the DTD.) This file would then be placed into gentoo-portage/metadata/, and Portage, devs and users could easily parse it.

Sounds good, if your script validates the per-package metadata.xml before transforming it into the global one. It'd really suck if a single missing '>' could screw up the whole tree's metadata. This shouldn't be a problem, especially if you transform the information with XSLT.

I definitely like the idea; it should speed up emerge -s enormously and save quite some inodes. A reduction of 8% of files sounds really good.

Greetings,
blubb
--
blubb
Gentoo/AMD64 Developer
http://www.blubb.li/
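The validation concern above can be sketched roughly as follows: parse each per-package file on its own and keep the failures out of the merged result. This only checks well-formedness and the root element name, not validity against the DTD, and the function name `load_valid` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def load_valid(paths):
    """Parse each file separately.

    Returns (good, bad): good maps path -> root element, bad is a list of
    (path, reason) for files that must not reach the combined file.
    """
    good, bad = {}, []
    for path in paths:
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as exc:
            # e.g. a single missing '>' in one package's file
            bad.append((path, str(exc)))
            continue
        if root.tag != "pkgmetadata":
            bad.append((path, "unexpected root element <%s>" % root.tag))
            continue
        good[path] = root
    return good, bad
```

With this shape, one broken file ends up in `bad` (where it can be reported to its maintainer) while every other package still makes it into the combined output.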
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Brian Harring @ 2005-05-27 11:28 UTC
To: gentoo-dev

On Fri, May 27, 2005 at 01:01:56PM +0200, Simon Stelling wrote:
> > I would like to propose the following changes:
> > Let's keep the metadata.xml in each package's directory in _CVS only_. Don't propagate them via rsync. Instead, use a script to compile all metadata.xml files into one central (XML) file. (This would probably need slight changes to the DTD.) This file would then be placed into gentoo-portage/metadata/, and Portage, devs and users could easily parse it.
>
> Sounds good, if your script validates the per-package metadata.xml before transforming it into the global one. It'd really suck if a single missing '>' could screw up the whole tree's metadata. This shouldn't be a problem, especially if you transform the information with XSLT.
>
> I definitely like the idea; it should speed up emerge -s enormously

Unlikely... stable portage knows of metadata.xml *explicitly* in two places, repoman's commit code and digest checking, neither of which comes into play for an emerge -s. You'll remove one entry from the listdir returns for a package directory, per package directory, and that's about it.

What's the gain, aside from the implication of collapsing it into a single file? Honestly, my only use for metadata.xml is looking up who I get to poke about fixing broken ebuilds...
~harring
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Danny van Dyk @ 2005-05-27 11:47 UTC
To: gentoo-dev

Hi Brian

> What's the gain, aside from the implication of collapsing it into a single file? Honestly, my only use for metadata.xml is looking up who I get to poke about fixing broken ebuilds...

The gain is:
... that you portage people could use it for emerge -s instead of using a DESCRIPTION-cache.
... we don't need to find the metadata.xml file before parsing it.
... reducing the number of files in the (public) tree.

Danny
--
Danny van Dyk <kugelfang@gentoo.org>
Gentoo/AMD64 Project, Gentoo Scientific Project
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Ciaran McCreesh @ 2005-05-27 11:52 UTC
To: gentoo-dev

On Fri, 27 May 2005 13:47:37 +0200 Danny van Dyk <kugelfang@gentoo.org> wrote:
| > What's the gain, aside from implication of collapsing it into a single file? Honestly my only use for metadata.xml is looking up who I get to poke about fixing broken ebuilds...
| The gain is:
| ... that you portage people could use it for emerge -s instead of using a DESCRIPTION-cache.

Eww! emerge having to parse XML? Bad! Bad! Bad! If you want a search tool, install one.

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Brian Harring @ 2005-05-27 12:21 UTC
To: gentoo-dev

On Fri, May 27, 2005 at 01:47:37PM +0200, Danny van Dyk wrote:
> Hi Brian
> > What's the gain, aside from implication of collapsing it into a single file? Honestly my only use for metadata.xml is looking up who I get to poke about fixing broken ebuilds...
> The gain is:
> ... that you portage people could use it for emerge -s instead of using a DESCRIPTION-cache.

'you portage people'? :)

> ... we don't need to find the metadata.xml file before parsing it.

Portage's emerge -s doesn't use metadata.xml. Guessing you meant emerge -S (--searchdesc), but that, too, doesn't use metadata.xml.

So, a few implications in what you mean/are after, then.

1) This global description cache would have to be duplicated, and recreated on cvs->rsync runs. Why? Unless you're padding extra bytes in the description cache, updates _will_ kill performance. Personally, I'm not much for it, because there is a minimal window for cvs->rsync infra-side to get its thing done, and this will jack up the runtime.

2) You're still doing entry by entry. Y'all are assuming that having this data shoved into one file is going to make it quicker for reads (in reality, you're still reading 19000+ records, only now out of a single file). This may be quicker due to syscall overhead, but I posit the drawbacks aren't worth it.

3) This complicates the hell out of cache updates, and still suffers the same issue eix/esearch suffer, namely that it's not sensitive to cache updates. If we make it sensitive to cache updates, you're looking at regen runtimes going through the roof (see the #1 comment on updates). This holds regardless of whether it's a duplication approach or the description is stored in its own db outside of the normal flat_list cache files.

4) This proposal breaks the cache up into separate chunks. That's the cache backend's decision, frankly, and _cannot_ be imposed onto the cache backend implementation from above. I moved eclass data into the cache backend in cvs head explicitly for the purpose of allowing the cache to be effectively standalone, and able to be bound to a remote tree. If you force this change from above, it breaks the cache design (pure and simple), and ultimately isn't what you're after (see below).

Frankly, any comments that this is going to make things faster are ignoring the existing code. Why is emerge -S so damned slow? Better question: why is it that a mysql cache backend is _still_ so damned slow on emerge -S? That should be hella fast compared to opening 19000 files, right?

Because the current stable cache design allows *only* for individual record lookups. In other words, even with an rdbms implementation, it goes record by record.

What is needed is a way to hand off to the cache "hey you, give me all cpv's that have metadata that matches this criteria". Move the lookup/searching into the cache backend, which is already built into the cache refactoring I wrote for cvs head.

If you want to collapse all of the description data into some faster lookup, fine, do so _strictly_ within that cache backend, and modify that class so that it has an appropriate get_matches lookup that's able to do a specific metadata lookup faster.

People are free to disagree, mind you, but this talk of speed gains frankly seems to be missing the boat on how our cache actually works, let alone the issues with it.

Collapsing all metadata down into a single file, yeah, that would be nifty from the standpoint of fewer files/less wasted space on filesystems. Centralized DESCRIPTION cache implemented in xml? Eh...
~brian
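To make the difference between record-by-record lookups and a backend-side search concrete, here is a toy sketch. It is not Portage's cache code: the class, the table schema, and the signature of `get_matches` are all assumptions for illustration (only the method name comes from the message above), and SQLite stands in for whatever rdbms a backend might use.

```python
import sqlite3


class SqliteCache:
    """Toy cache backend mapping cpv -> DESCRIPTION (illustrative schema)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cache (cpv TEXT PRIMARY KEY, description TEXT)")

    def __setitem__(self, cpv, description):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (cpv, description))

    def __getitem__(self, cpv):
        # Individual record lookup: one query per cpv.
        row = self.db.execute(
            "SELECT description FROM cache WHERE cpv = ?", (cpv,)).fetchone()
        if row is None:
            raise KeyError(cpv)
        return row[0]

    def keys(self):
        return [r[0] for r in
                self.db.execute("SELECT cpv FROM cache ORDER BY cpv")]

    def get_matches(self, substring):
        # The search runs inside the backend as a single query.
        # LIKE wildcards (% and _) in `substring` are not escaped here.
        rows = self.db.execute(
            "SELECT cpv FROM cache WHERE description LIKE ? ORDER BY cpv",
            ("%" + substring + "%",))
        return [r[0] for r in rows]


def search_record_by_record(cache, substring):
    """What a caller is forced to do if only per-record lookups exist."""
    needle = substring.lower()
    return [cpv for cpv in cache.keys() if needle in cache[cpv].lower()]
```

Both paths return the same matches; the difference is that `search_record_by_record` issues one lookup per record no matter how fast the backend is, while `get_matches` lets the backend decide how to answer the whole query.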
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Simon Stelling @ 2005-05-27 15:51 UTC
To: gentoo-dev

Brian Harring wrote:
>> I definitely like the idea; it should speed up emerge -s enormously
>
> Unlikely... stable portage knows of metadata.xml *explicitly* in two places, repoman's commit code and digest checking, neither of which comes into play for an emerge -s. You'll remove one entry from the listdir returns for a package directory, per package directory, and that's about it.

You're right, I mixed up description with longdescription.

--
blubb
Gentoo/AMD64 Developer
http://www.blubb.li/
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: James Northrup @ 2005-05-28 2:08 UTC
To: gentoo-dev

I like the xml generation pass, but I have a few questions:

Does the xml-publish transaction lend a greater window of opportunity for mirror rsync de-sync?

Does it make sense for digest info to be sucked up into the xml pass, to be superseded by the legacy files where present, a la ebuild x-y/z.ebuild digest?

Should the xml be digested?

On May 27, 2005, at 8:51 AM, Simon Stelling wrote:
> Brian Harring wrote:
>>> I definitely like the idea; it should speed up emerge -s enormously
>>
>> Unlikely... stable portage knows of metadata.xml *explicitly* in two places, repoman's commit code, and digest checking, neither of which
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Paul de Vrieze @ 2005-05-27 12:08 UTC
To: gentoo-dev

On Friday 27 May 2005 13:01, Simon Stelling wrote:
> Sounds good, if your script validates the per-package metadata.xml before transforming it into the global one. It'd really suck if a single missing '>' could screw up the whole tree's metadata. This shouldn't be a problem, especially if you transform the information with XSLT.
>
> I definitely like the idea; it should speed up emerge -s enormously and save quite some inodes. A reduction of 8% of files sounds really good.

My script does that (it was written before repoman and the server did xml parsing). Check it out ;-)

Paul
--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Paul de Vrieze @ 2005-05-27 11:05 UTC
To: gentoo-dev

On Friday 27 May 2005 12:38, Danny van Dyk wrote:
> Hi all,
>
> I'd like some feedback on an idea that has been on my mind for some time.
>
> Currently, we (should) have one metadata.xml file per package (8635 in the whole tree). This is quite handy if you want to look up information about a package you know by name.
>
> On the other hand, if you want to search for a package name via metadata, you have to traverse the whole tree. That is quite unhandy without a cache, and not yet implemented AFAIK.
>
> I would like to propose the following changes:
> Let's keep the metadata.xml in each package's directory in _CVS only_. Don't propagate them via rsync. Instead, use a script to compile all metadata.xml files into one central (XML) file. (This would probably need slight changes to the DTD.) This file would then be placed into gentoo-portage/metadata/, and Portage, devs and users could easily parse it.
>
> TIA for any feedback on this!

Guys, I've actually implemented a script that creates such a file (except for categories) and provided support in the DTD for it. I'd need to search for where I have the script (it was fairly simple, but did a lot of parsing to ensure correctness). It was meant to be a kind of alternative to packages.gentoo.org, with all the nice links to herds, maintainers, etc. (even reverse search). The biggest problem was that the stuff was kind of slow, especially searching by XSLT.

The script is kind of slow because of all the correction stuff, but it is workable. You can find it (and the accompanying xslt script) in: http://dev.gentoo.org/~pauldv/pkgList.tar.bz2

Paul
--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
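As an illustration of the reverse search mentioned above (herd to packages), here is a small sketch over a combined file. It is not taken from pkgList.tar.bz2: the `packages` wrapper and the `category`/`name` attributes are a hypothetical layout for the central file, and `herd_index` is a made-up name. Only the `pkgmetadata` and `herd` elements come from the existing per-package format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def herd_index(combined_root):
    """Build a herd -> [category/package, ...] index from a combined tree."""
    index = defaultdict(list)
    for pkg in combined_root.iter("pkgmetadata"):
        # Assumed attributes identifying the package inside the central file.
        name = "%s/%s" % (pkg.get("category"), pkg.get("name"))
        for herd in pkg.findall("herd"):
            index[(herd.text or "").strip()].append(name)
    return dict(index)
```

Building the index once in a general-purpose language and then answering lookups from the dictionary avoids re-scanning the whole document per query, which may be one way around the slow XSLT searches described above.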
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Danny van Dyk @ 2005-05-27 11:17 UTC
To: gentoo-dev

Hi Paul,

Paul de Vrieze wrote:
> The script is kind of slow because of all the correction stuff, but it is workable. You can find it (and the accompanying xslt script) in: http://dev.gentoo.org/~pauldv/pkgList.tar.bz2

Hm, can you chmod a+r it please? ;-)

Danny
--
Danny van Dyk <kugelfang@gentoo.org>
Gentoo/AMD64 Project, Gentoo Scientific Project
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Paul de Vrieze @ 2005-05-27 12:06 UTC
To: gentoo-dev

On Friday 27 May 2005 13:17, Danny van Dyk wrote:
> Hi Paul,
>
> Paul de Vrieze wrote:
> > The script is kind of slow because of all the correction stuff, but it is workable. You can find it (and the accompanying xslt script) in: http://dev.gentoo.org/~pauldv/pkgList.tar.bz2
>
> Hm, can you chmod a+r it please? ;-)

Done!

Paul
--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
* Re: [gentoo-dev] [RFC] Treewide metadata.xml
From: Aaron Walker @ 2005-05-27 11:33 UTC
To: gentoo-dev

Danny van Dyk wrote:
> On the other hand, if you want to search for a package name via metadata, you have to traverse the whole tree. That is quite unhandy without a cache, and not yet implemented AFAIK.

herdstat can do this. ciaranm poked me the other day to see if I could implement a cache for it (which I mostly finished last night), so hopefully that'll be in a release soon.

> I would like to propose the following changes:
> Let's keep the metadata.xml in each package's directory in _CVS only_. Don't propagate them via rsync. Instead, use a script to compile all metadata.xml files into one central (XML) file. (This would probably need slight changes to the DTD.) This file would then be placed into gentoo-portage/metadata/, and Portage, devs and users could easily parse it.
>
> TIA for any feedback on this!
>
> PS: Ciaran: Leave me alive please ;-)
> PPS: Yes, I'd volunteer to write that compilation script.

I think it's a good idea. Regarding the compilation script, there's already-written code in herdstat that could handle this, but most likely that's not an option (I can hear the "C++ sucks" flames already ;p).

Cheers
--
Politics makes strange bedfellows, and journalism makes strange politics.
 -- Amy Gorin

Aaron Walker <ka0ttic@gentoo.org>
[ BSD | cron | forensics | shell-tools | commonbox | netmon | vim | web-apps ]