* [gentoo-dev] Inviting you to project "PackageMap"
From: Sebastian Pipping @ 2009-06-12 7:42 UTC
To: PackageKit users and developers list; +Cc: gentoo-dev

Hello!

Quick (re-)introduction: My task for Gentoo/Google Summer of Code 2009 is to give Gentoo a Debian popcon equivalent, a tool to collect statistics on "what package is installed how often". To achieve this goal I'm extending Smolt (a tool currently doing similar things with hardware information) by fine-tunable software stats gathering.

The plan we have for Smolt is to make it cross-distro, not just fit Gentoo or Fedora. One point where the consequences and benefits of such an approach can be seen clearly is with counting packages from different distros into the same buckets. What do I mean by that? Debian's Git counts the same as Gentoo's Git, which counts the same as Fedora's; you know the list.

With packages counted across distros we can suddenly answer questions that we currently cannot answer, among them:

 - What globally popular packages are missing in distro X?
   Let's say we don't have a package for product P. Do other distros have one? They do, maybe we need one, too? They don't, maybe P is not that important then?

 - How many Linux users are approximately using program X in total?
   Not just on Ubuntu or Arch - all across Linux, BSD, Solaris!

 - Does distro X have 10 times the packages of Y or is it just different splitting?

To count into the same bucket we use global identifiers for the "products" that fall out of a package. Gentoo package "dev-util/git" can produce product "cpe://a:git:git", Debian's "git-core" can, too. That string before is a CPE URI [1], a concept close to package naming in Java. This "intermediate language" allows us to relate package names from distro X with those of distro Y and answer various questions from that data.

To do such mapping we need code (or a "service") that does the mapping for us and a base of collected data that the service can operate on. Together, these two make up project "PackageMap".

I have started populating the database with packages (currently 312 in number) made from information extracted from the Gentoo tree and the National Vulnerability Database. The latter holds many CPEs.

Let me state clearly that packagemap is not about Gentoo in particular. Sure, the initial data has lots of Gentoo in it but the whole point of the project is to get information and people from different distros together.

To see what these 312 package maps look like at the moment, it is best to click through the database folder yourself:

  http://git.goodpoint.de/?p=packagemap.git;a=tree;f=database

Also, there are a Relax NG schema and a DTD for validation, more documentation than I usually write and a few scripts:

  http://git.goodpoint.de/?p=packagemap.git;a=tree

By now I hope you have gained interest in what this can become. Your active participation is highly appreciated. A few minutes from everyone can make a huge difference here. If you want write access to the repo - mail me: sebastian@pipping.org.

Please have a look at the Git repository linked above and ask questions. I propose to keep the related Gentoo stuff on gentoo-dev and everything else on the packagekit list. I hope that works out well.

Thanks for reading up to this point.

Sebastian

PS: I'm aware "hartwork.org" might not make a good longterm location for DTDs, XML namespaces and such for a cross-distro project. Any ideas where to put them best?

[1] http://cpe.mitre.org/
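A minimal sketch of the "same bucket" counting described in the message above, assuming a simple in-memory lookup table. Only the two Git entries and their CPE come from the message; the install counts, the unmapped package name and the function name are made up for illustration and say nothing about how PackageMap or Smolt actually store data.

```python
from collections import Counter

# Hypothetical mapping data: (distro, package name) -> CPE product URI.
# The two git entries are taken from the message; nothing else is real data.
PACKAGE_TO_CPE = {
    ("gentoo", "dev-util/git"): "cpe://a:git:git",
    ("debian", "git-core"): "cpe://a:git:git",
}


def count_into_buckets(install_counts):
    """Sum per-distro install counts into one bucket per CPE.

    Returns the buckets plus the list of packages that have no mapping yet.
    """
    buckets = Counter()
    unmapped = []
    for (distro, package), installs in install_counts.items():
        cpe = PACKAGE_TO_CPE.get((distro, package))
        if cpe is None:
            unmapped.append((distro, package))
        else:
            buckets[cpe] += installs
    return buckets, unmapped


# Made-up numbers, for illustration only.
sample = {
    ("gentoo", "dev-util/git"): 120,
    ("debian", "git-core"): 300,
    ("debian", "some-unmapped-package"): 7,
}
buckets, unmapped = count_into_buckets(sample)
print(buckets["cpe://a:git:git"], unmapped)
```

Keeping the unmapped packages in a separate list is a design choice of this sketch: it shows where mapping work is still missing without skewing the per-product totals.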
[parent not found: <15e53e180906120130md68cd94nba61fa5560c73eb4@mail.gmail.com>]
* [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap"
From: Sebastian Pipping @ 2009-06-12 9:54 UTC
To: PackageKit users and developers list; +Cc: gentoo-dev

Richard Hughes wrote:
> I'm slightly worried about it being called a service. Is it going to
> be a new process that just does the mapping or is this a bad choice of
> words? If it is a new process then I'm not sure such a thing will
> catch on.

I'm not yet sure how a mapper will keep its data fresh, as its usefulness depends on that. Ignore my "service" for now.

> I'm also worried that a package manager has to read in and parse
> thousands of small files.

While you mention "package manager" - with the current concept the data will not be precise enough for use with a package manager.

> Why did you decide to write each project as
> a single xml file?

 - The other 99% of the database stays valid XML if a single file is invalid

 - To better fit the version controlled environment

> Parsing and reading 10,000 files (in multiple directories) might take
> a few seconds, and would have to be copied into memory (few Mb) to
> query quickly.

Correct.

> Which has to be invalidated if any of the files or
> directories change. Why didn't you just put them in a sqlite database
> that can be queried in a few ms, without dragging in an xml parser?
> Also 10,000 files take up way more space (and takes longer to install
> and update) than a single database file.

I like your idea about sqlite. Maybe keeping XML as the format to edit and querying an sqlite export snapshot is something to try.

> XML might be
> useful for storing the data, but not for querying.

Good point.

Sebastian
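The "edit XML, query an sqlite export snapshot" idea from the reply above could look roughly like the sketch below. It uses Python's standard sqlite3 module; the table name, column names and the two sample rows are assumptions for illustration, and the step that would read the rows out of the per-package XML files is left out.

```python
import sqlite3

# Hypothetical rows, as they might be extracted from the per-package XML files.
rows = [
    ("gentoo", "dev-util/git", "cpe://a:git:git"),
    ("debian", "git-core", "cpe://a:git:git"),
]


def build_snapshot(rows, path=":memory:"):
    """Write all mappings into one sqlite database (the export snapshot)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE package_map ("
        " distro TEXT NOT NULL,"
        " package TEXT NOT NULL,"
        " cpe TEXT NOT NULL,"
        " PRIMARY KEY (distro, package))"
    )
    # Index the CPE column so lookups in the other direction stay fast too.
    conn.execute("CREATE INDEX idx_package_map_cpe ON package_map (cpe)")
    conn.executemany("INSERT INTO package_map VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def packages_for(conn, cpe):
    """Return all (distro, package) pairs that map to the given CPE."""
    cur = conn.execute(
        "SELECT distro, package FROM package_map WHERE cpe = ? ORDER BY distro",
        (cpe,),
    )
    return cur.fetchall()


conn = build_snapshot(rows)
result = packages_for(conn, "cpe://a:git:git")
print(result)
```

In this arrangement the XML files would stay the version-controlled source of truth and the snapshot would simply be regenerated whenever they change, so nothing needs to be invalidated by hand.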
* Re: [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap"
From: Tiziano Müller @ 2009-06-17 12:08 UTC
To: gentoo-dev

On Friday, 12.06.2009, at 11:54 +0200, Sebastian Pipping wrote:
> Richard Hughes wrote:
> > I'm slightly worried about it being called a service. Is it going to
> > be a new process that just does the mapping or is this a bad choice of
> > words? If it is a new process then I'm not sure such a thing will
> > catch on.
>
> I'm not yet sure about how a mapper will keep its data
> fresh as the use of it is dependent on that.
> Ignore my "service" for now.
>
>
> > I'm also worried that a package manager has to read in and parse
> > thousands of small files.
>
> While you mention "package manager" - with the current concept
> the data will not be precise enough for use with a package manager.
>
>
> > Why did you decide to write each project as
> > a single xml file?
>
> - The other 99% of the database stay valid XML if a single
> file is invalid
>
> - To better fit the version controlled environment
>
>
> > Parsing and reading 10,000 files (in multiple directories) might take
> > a few seconds, and would have to be copied into memory (few Mb) to
> > query quickly.
>
> Correct.
>
>
> > Which has to be invalidated if any of the files or
> > directories change. Why didn't you just put them in a sqlite database
> > that can be queried in a few ms, without dragging in an xml parser?
> > Also 10,000 files take up way more space (and takes longer to install
> > and update) than a single database file.
>
> I like your idea about sqlite. Maybe keeping the data to edit XML
> and query and sqlite export snapshot is something to try.

Why not use an XML database like dbxml? Maybe you could just specify the XML files as storage and then dbxml would do the rest.

>
>
> > XML might be
> > useful for storing the data, but not for querying.
>
> Good point.

Using XPath and XQuery you can do queries on XML as well.

Cheers,
Tiziano

-- 
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility: Samba, PostgreSQL, CPP, Python, sysadmin, GLEP Editor
E-Mail : dev-zero@gentoo.org
GnuPG FP : F327 283A E769 2E36 18D5 4DE2 1B05 6A63 AE9C 1E30
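To illustrate the point about querying XML directly, here is a small sketch using the limited XPath subset built into Python's xml.etree.ElementTree. The element and attribute names (packagemap, product, package, cpe, distro) are invented for this example; the real PackageMap schema may look quite different, and full XPath or XQuery would need a separate library or an XML database.

```python
import xml.etree.ElementTree as ET

# Invented document layout, only for illustrating an XPath-style lookup.
DOC = """\
<packagemap>
  <product cpe="cpe://a:git:git">
    <package distro="gentoo">dev-util/git</package>
    <package distro="debian">git-core</package>
  </product>
</packagemap>
"""

root = ET.fromstring(DOC)

# Select the Debian package name(s) that belong to the Git product.
query = "./product[@cpe='cpe://a:git:git']/package[@distro='debian']"
names = [element.text for element in root.findall(query)]
print(names)
```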
* [gentoo-dev] Re: Inviting you to project "PackageMap"
From: Steven J Long @ 2009-06-12 13:00 UTC
To: gentoo-dev; +Cc: packagekit

Richard Hughes wrote:
> Sebastian Pipping wrote:
>> To do such mapping we need code (or a "service") that does the mapping
>> for us and base of collected data that the service can operate on. Both
>> of these is project "PackageMap"
>
You might as well use Gentoo's version specification for your internal format, as it's the most comprehensive. The most you need to add is debian epochs.

> I'm also worried that a package manager has to read in and parse
> thousands of small files. Why did you decide to write each project as
> a single xml file?
> <snip>
> I agree with the concept, but not the implementation. All you're
> trying to provide is a packagename <-> ID database. XML might be
> useful for storing the data, but not for querying.
>
XML was never meant for data-storage for such record-sets: it was designed for data *interchange* between incompatible database engines, and as a friendlier SGML for user-defined data (which some poor DBA/coder would otherwise end up having to pull in from Excel, in most cases. The cleanup in such cases can take days, depending on how long the executive in-question has kept it as a pet-project;)

igli.

-- 
#friendly-coders -- We're friendly but we're not /that/ friendly ;-)
* Re: [gentoo-dev] Re: Inviting you to project "PackageMap"
From: Sebastian Pipping @ 2009-06-13 3:55 UTC
To: gentoo-dev; +Cc: packagekit

Steven J Long wrote:
> You might as well use Gentoo's version specification for your internal
> format, as it's the most comprehensive. The most you need to add is
> debian epochs.

I'm not sure what you are referring to. Please share more details or pointers.

> XML was never meant for data-storage for such record-sets: it was designed
> for data *interchange* [..]

Interesting point. What would you use as an alternative that works well with a version control system?

Sebastian
* [gentoo-dev] Re: Re: Inviting you to project "PackageMap"
From: Steven J Long @ 2009-07-11 21:38 UTC
To: gentoo-dev; +Cc: packagekit

Sorry for the delay in answering this one, been up to here with RL, and I didn't have access to the debian and BSD bookmarks.

Sebastian Pipping wrote:
> Steven J Long wrote:
>> You might as well use Gentoo's version specification for your internal
>> format, as it's the most comprehensive. The most you need to add is
>> debian epochs.
>
> I'm not sure what you are referring to.
> Please share more details or pointers.
>
There are two aspects: the category grouping, and the version specifier. Most distros have a flat namespace, which gets kinda hairy. One level of grouping above that makes a BIG difference.

If you take a look here:

  http://sources.gentoo.org/viewcvs.py/portage/main/trunk/pym/portage/versions.py?view=markup

you can see the version RE; note it can handle arbitrary levels of patch (pre etc.)

This is what we use in update[1] to split an arbitrary CPV:

  CPV='^(.*-.*|virtual)/(.*)-([0-9]+)((\.[0-9]+)*)([a-z]?)((_(pre|p|beta|alpha|rc)[0-9]*)*)(-r([0-9.]+))?$'

(That's all one line.) If it doesn't match that, the CPV is not valid, and with the one RE match, we have all the constituent parts, though we don't often compare versions and handle that separately.

We've removed the cvs. prefix since it's supposed to be deprecated, and added the inter-rev bit for prefix portage. cvs. is still kept in our version comparison code as it's a pita to rewrite, and we're waiting for some sort of outcome to handling vcs builds; simply swapping two letters is clearly optimal.

The "official" reasons given for deprecating cvs. were:

 1) no-one knew about it.
 2) those who did apparently found it "hard to deal with", though one could be forgiven for thinking that's due to novelty; -rNN is supposedly "hard" too.
 3) it's vendor-specific.

We removed it for getCPV() as we call that function a *lot*; much more often than verCompare() and every little bit helps. Adding it back is a doddle. A prefix here is more efficient than yet another suffix.

I've yet to see an upstream version that can't be handled adequately by the Gentoo versioning scheme. Granted you get the odd insane upstream; in those kind of cases, I'd rather go by date (which can ofc be handled easily) or svn id. A distro really should not have any need to push a version more than once a day, and the distro release bit at the end allows one to deal with the occasional mistake. Ultimately it's down to the maintainer, ofc, but I'd question the stability of a project that releases more than once per day; it does occasionally happen with bugfixes, usually with some sort of patch revision.

WRT debian epochs, they're described here:

  http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version

You should also take a look at how *BSD handle eg LIB_DEPENDS (the LDEPEND variable I've occasionally suggested in the past):

  http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-depend.html

I personally am not too fussed about the other *_DEPENDS mentioned therein; you might want to add them for completeness.

>
>> XML was never meant for data-storage for such record-sets: it was
>> designed for data *interchange* [..]
>
> Interesting point. What would you use as an alternative that
> works well with a version control system?
>
Ah, didn't realise you wanted to keep this all in a vcs. Personally I'd use a format similar to the metadata section part of an ebuild, so simple shell assignments. I wouldn't worry about whether strings are single or double-quoted (or even if they're quoted at all, myself); just ban all backslashes, and have some sort of script that runs to verify pre-commit.

There's no need to start going off into some nutty anxiety attack about "oh but sh allows X and BASH allows Y" imo; just verify that the data lexes fine: people make mistakes all the time. (It's only mildly tricky in C, so should be easy enough however you implement.)

Having said that, XML is fine for microformats; Gentoo GuideXML is one of the nicest applications of XML I've seen (and was a factor in us going with Gentoo, believe it or not.) Nor is this a massive dataset. I just think most of your users will be more comfortable.

Good luck with it, whatever format you go with.

HTH,
Steve.

[1] http://forums.gentoo.org/viewtopic-t-546828.html -- the version on there is very much out of date (I'll get that modded soon, Naib ty for your lovely comments;) so if any Gentoo-user does want to try it:

  git clone git://weaver.gentooenterprise.com/update.git

Please be aware this is under AGPL3+ for non-commercial use only, so if that bothers you, don't clone it. (Written with a work colleague, on work time. He wasn't happy with CC-NCSA as he's committed to FSF for some reason, a stance I have to say I am coming to agree with, but that's another [off;]topic.)

-- 
#friendly-coders -- We're friendly but we're not /that/ friendly ;-)
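As an illustration of the "one RE match gives all the constituent parts" remark in the message above, here is a rough Python port of that CPV expression. It assumes the suffix alternation is meant to read beta|alpha (the archived text shows a space there), and the function name and result keys are invented for this sketch; it is not the code from update or from Portage.

```python
import re

# Port of the CPV expression quoted in the message, assuming "beta|alpha".
CPV_RE = re.compile(
    r"^(.*-.*|virtual)/(.*)-([0-9]+)((\.[0-9]+)*)([a-z]?)"
    r"((_(pre|p|beta|alpha|rc)[0-9]*)*)(-r([0-9.]+))?$"
)


def split_cpv(cpv):
    """Split 'category/name-version[-rN]' into parts, or return None if invalid."""
    match = CPV_RE.match(cpv)
    if match is None:
        return None
    # Groups 3, 4, 6 and 7 together form the upstream version string.
    version = match.group(3) + match.group(4) + match.group(6) + match.group(7)
    return {
        "category": match.group(1),
        "name": match.group(2),
        "version": version,
        "revision": match.group(11),  # None when there is no -rN part
    }


print(split_cpv("dev-util/git-1.6.3.1-r1"))
print(split_cpv("sys-apps/portage-2.2_rc33"))
print(split_cpv("git-1.6.3.1"))  # no category, so not a valid CPV
```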
* Re: [gentoo-dev] Inviting you to project "PackageMap"
From: Petteri Räty @ 2009-06-12 18:27 UTC
To: gentoo-dev; +Cc: PackageKit users and developers list

Sebastian Pipping wrote:
>
> To count into the same bucket we use global identifiers for the
> "products" that fall out of a package. Gentoo package "dev-util/git"
> can produce product "cpe://a:git:git", Debian's "git-core" can, too.
> That string before is a CPE URI [1], a concept close to package naming
> in Java. This "intermediate language" allows us to relate package names
> from distro X with those of distro Y and answer various questions from
> that data.
>
> To do such mapping we need code (or a "service") that does the mapping
> for us and base of collected data that the service can operate on. Both
> of these is project "PackageMap"
>

Instead of manually populating a database wouldn't it make more sense to parse this information from package metadata.xml?

Regards,
Petteri
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap" 2009-06-12 18:27 ` [gentoo-dev] " Petteri Räty @ 2009-06-12 21:43 ` Sebastian Pipping 2009-06-13 15:53 ` Petteri Räty 0 siblings, 1 reply; 29+ messages in thread From: Sebastian Pipping @ 2009-06-12 21:43 UTC (permalink / raw To: PackageKit users and developers list; +Cc: gentoo-dev Petteri Räty wrote: > Sebastian Pipping wrote: >> To count into the same bucket we use global identifiers for the >> "products" that fall out of a package. Gentoo package "dev-util/git" >> can produce product "cpe://a:git:git", Debian's "git-core" can, too. >> That string before is a CPE URI [1], a concept close to package naming >> in Java. This "intermediate language" allows us to relate package names >> from distro X with those of distro Y and answer various questions from >> that data. >> >> To do such mapping we need code (or a "service") that does the mapping >> for us and base of collected data that the service can operate on. Both >> of these is project "PackageMap" > > Instead of manually populating a database wouldn't it make more sense to > parse this information from package metadata.xml? Which information exactly? Please elaborate on that. Sebastian ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Petteri Räty @ 2009-06-13 15:53 UTC
To: gentoo-dev; +Cc: PackageKit users and developers list

Sebastian Pipping wrote:
> Petteri Räty wrote:
>> Sebastian Pipping wrote:
>>> To count into the same bucket we use global identifiers for the
>>> "products" that fall out of a package. Gentoo package "dev-util/git"
>>> can produce product "cpe://a:git:git", Debian's "git-core" can, too.
>>> That string before is a CPE URI [1], a concept close to package naming
>>> in Java. This "intermediate language" allows us to relate package names
>>> from distro X with those of distro Y and answer various questions from
>>> that data.
>>>
>>> To do such mapping we need code (or a "service") that does the mapping
>>> for us and base of collected data that the service can operate on. Both
>>> of these is project "PackageMap"
>> Instead of manually populating a database wouldn't it make more sense to
>> parse this information from package metadata.xml?
>
> Which information exactly? Please elaborate on that.
>
> Sebastian
>

I mean making metadata.xml the authoritative source for mapping CPE to Gentoo packages. I don't want to see the situation when adding new packages to the tree would need some mapping being done in an external web service. We should of course try to provide as much automation as possible for creating the value for metadata.xml.

Regards,
Petteri
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Sebastian Pipping @ 2009-06-13 19:03 UTC
To: PackageKit users and developers list; +Cc: gentoo-dev

Petteri Räty wrote:
> I mean making metadata.xml the authoritative source for mapping CPE to
> Gentoo packages. I don't want to see the situation when adding new
> packages to the tree would need some mapping being done in an external
> web service.

Well, it's nothing more than a git commit and push once you have sent me your public SSH key :-D

One of the stronger points for collaborating at the source is that people who are not Gentoo devs (yet) and therefore have no write access to the Gentoo tree can still extend and fix the Gentoo packagemap entries. Doing it downstream would hurt the whole project in several ways.

Sebastian
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Petteri Räty @ 2009-06-13 19:16 UTC
To: gentoo-dev; +Cc: PackageKit users and developers list

Sebastian Pipping wrote:
> Petteri Räty wrote:
>> I mean making metadata.xml the authoritative source for mapping CPE to
>> Gentoo packages. I don't want to see the situation when adding new
>> packages to the tree would need some mapping being done in an external
>> web service.
>
> Well, it's a nothing more than git commit and push once you
> sent me your public SSH key :-D
>
> One of the stronger points for collaborating at the source is that
> people who are not Gentoo devs (yet) and therefore have no write access
> to the Gentoo tree can still extend and fix the Gentoo packagemap
> entries. Doing it downstream would hurt the whole project
> in several ways.
>
>

If there's no entry in metadata.xml you can add it to your service and submit it via the usual means to the Gentoo repository. Based on my experience I am just guessing that doing this externally won't be viable in the long term. If you succeed that way, all power to you.

Regards,
Petteri
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Robert Buchholz @ 2009-06-15 13:52 UTC
To: gentoo-dev; +Cc: Sebastian Pipping, PackageKit users and developers list

On Saturday 13 June 2009, Sebastian Pipping wrote:
> One of the stronger points for collaborating at the source is that
> people who are not Gentoo devs (yet) and therefore have no write
> access to the Gentoo tree can still extend and fix the Gentoo
> packagemap entries. Doing it downstream would hurt the whole project
> in several ways.

To drive the project forward and find cross-distro acceptance, the packagemap repo/server has to be the authoritative source of information for distributions that participate.

However, I see advantages in a distributed model to collect the information. Gentoo developers could feed <cpe> tags into the metadata.xml of the tree and do not need to sign up to commit to the third-party packagemap repository. Synchronizing changed tags to the packagemap repository should be easy to automate. Changes in the repository could be propagated back to the tree by a designated team of Gentoo developers interested in the packagemap project.

I have a feeling other distributions might also favor a model where they have more control over the data without giving all their devs access to one big repo.

Robert
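A rough sketch of the "feed <cpe> tags into metadata.xml and synchronize them" idea from the message above. The <cpe> element is only a proposal in this thread, so the sample document, both function names and the shape of the central data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml content carrying the proposed <cpe> tag.
METADATA_XML = """\
<pkgmetadata>
  <herd>no-herd</herd>
  <cpe>cpe://a:git:git</cpe>
</pkgmetadata>
"""


def extract_cpes(metadata_xml):
    """Return the set of CPE strings found in one metadata.xml document."""
    root = ET.fromstring(metadata_xml)
    return {el.text.strip() for el in root.iter("cpe") if el.text}


def plan_sync(tree, central):
    """Return the packages whose CPE set in the tree differs from the central copy."""
    return {pkg: cpes for pkg, cpes in tree.items() if central.get(pkg) != cpes}


# What the distro tree says versus what the central repository already holds.
tree = {"dev-util/git": extract_cpes(METADATA_XML)}
central = {}
pending = plan_sync(tree, central)
print(pending)
```

Propagating changes back from the central repository to the tree would be the same comparison with the two arguments swapped, followed by a manual review step by the designated team.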
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Sebastian Pipping @ 2009-06-15 17:04 UTC
To: Robert Buchholz
Cc: gentoo-dev, PackageKit users and developers list, Paul Wise, Petteri Räty

Robert Buchholz wrote:
> On Saturday 13 June 2009, Sebastian Pipping wrote:
>> One of the stronger points for collaborating at the source is that
>> people who are not Gentoo devs (yet) and therefore have no write
>> access to the Gentoo tree can still extend and fix the Gentoo
>> packagemap entries. Doing it downstream would hurt the whole project
>> in several ways.
>
> To drive the project forward and find cross-distro acceptance, the
> packagemap repo/server has to be the authoritative source of information
> for distributions that participate.
>
> However, I see advantages in a distributed model to collect the
> information. Gentoo developers could feed <cpe> tags into the
> metadata.xml of the tree and do not need to sign up to commit to the
> third-party packagemap repository. Synchronizing changed tags to the
> packagemap repository should be easy to automate. Changes in the
> repository could be propagated back to the tree by a designated team of
> Gentoo developers interested in the packagemap project.
>
> I have a feeling other distributions might also favor a model where they
> have more control about the data without giving all their devs access
> to one big repo.

Paul Wise of Debian also expressed interest in doing database building at distro level, so that's one more point /for/ your feeling.

However there are a few more things to take into account, please have a look at my reply to Paul:

  http://lists.alioth.debian.org/pipermail/popcon-developers/2009-June/001759.html

Sorry for not CC'ing you, I should have thought of that.

Thinking the other way around: Is there anything we could do to make the central place approach work and feel better for everybody?

Sebastian
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Robert Buchholz @ 2009-06-15 18:24 UTC
To: Sebastian Pipping
Cc: gentoo-dev, PackageKit users and developers list, Paul Wise, Petteri Räty

On Monday 15 June 2009, Sebastian Pipping wrote:
> However there are a few more things to take into account,
> please have a look at my reply to Paul:
> http://lists.alioth.debian.org/pipermail/popcon-developers/2009-June/001759.html
>
> Sorry for not CC'ing you, I should have thought of that.
>
> Thinking the other way around: Is there anything we could do
> to make the central place approach work and feel better for
> everybody?

The consumers of the PackageMap will always only use the central database. It is only the populators of the database that would be distributed.

I am convinced the project will be more viable if people can choose their level of contribution. Many developers just won't care enough to take the extra hassle. If you make it easy enough for them to contribute to the CPE mapping, i.e. update their debian/controls or metadata.xml, they will (or not :-). Other developers that care more can then extract the data and merge it at the database and do extra maintenance tasks such as updating the substitution map.

If you make merging easy, I don't see how this hurts the project.

Robert
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Sebastian Pipping @ 2009-06-15 19:13 UTC
To: Robert Buchholz
Cc: gentoo-dev, PackageKit users and developers list, Paul Wise, Petteri Räty

Robert Buchholz wrote:
> The consumers of the PackageMap will always only use the central
> database.

I'm not sure about that. I rather assume it will happen, especially use that ignores the substitution map.

> I am convinced the project will be more viable if people can choose
> their level of contribution. Many developers just won't care enough to
> take the extra hassle.

Agreed. However, I don't see a huge difference in the level of extra hassle. The most difficult thing in my eyes at the moment is doing who's-the-vendor research, which is the same at both ends. Maybe collaborating at a central place can add some fun that adding "some field I don't really care about" downstream cannot.

Btw on Gentoo, putting it in metadata.xml might add to the risk of a checksum mismatch, at least for extra edits. No idea if QA tools will catch 90% of that happening.

> If you make merging easy, I don't see how this hurts the project.

I don't see how easy merging compensates for the issues I brought up.

Can I have a few more voices on this? Would you clearly feel more comfortable and motivated to contribute to PackageMap if it works at your distro's source package?

Sebastian
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
From: Petteri Räty @ 2009-06-15 20:27 UTC
To: Sebastian Pipping; +Cc: gentoo-dev, PackageKit users and developers list

Sebastian Pipping wrote:
>
>
> Can I have a few more voices on this?: Would you clearly feel more
> comfortable and motivated to contribute to PackageMap if it works
> at your distro's source package?
>

You are somewhat missing the point. My point is that most developers probably don't want to care about what happens in PackageMap upstream at all, so unless you mandate something in metadata.xml you will be relying on others to keep the information between Portage and your service in sync. But if it's in metadata.xml as a mandatory attribute then developers will be automatically adding the value when they create a new pkg.

Regards,
Petteri
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-15 20:27 ` Petteri Räty
@ 2009-06-17 0:34 ` Sebastian Pipping
  2009-06-17 9:37 ` Marijn Schouten (hkBst)
  2009-06-20 13:16 ` Petteri Räty
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-17 0:34 UTC
To: Petteri Räty
Cc: gentoo-dev, PackageKit users and developers list, Paul Wise, Robert Buchholz, Christian Faulhammer

I start to understand the real benefits of moving a larger
part of the maintenance down to the distro level as you proposed.

Okay, let's add support for CPEs at distro package level
and sync up and down with the central packagemap database.
Please contact me for collaboration on sync scripts
and "modeling" of details.



Sebastian
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-17 0:34 ` Sebastian Pipping
@ 2009-06-17 9:37 ` Marijn Schouten (hkBst)
  2009-06-18 0:09 ` Sebastian Pipping
  2009-06-20 13:16 ` Petteri Räty
  1 sibling, 1 reply; 29+ messages in thread
From: Marijn Schouten (hkBst) @ 2009-06-17 9:37 UTC
To: gentoo-dev
Cc: Petteri Räty, PackageKit users and developers list, Paul Wise, Robert Buchholz, Christian Faulhammer

Sebastian Pipping wrote:
> I start to understand the real benefits of moving a larger
> part of the maintenance down to the distro level as you proposed.
>
> Okay, let's add support for CPEs at distro package level
> and sync up and down with the central packagemap database.
> Please contact me for collaboration on sync scripts
> and "modeling" of details.

Do we not already have enough information available to automatically
determine derived unique identifiers like CPE?

We have the package homepage and the package name (and the package
category) and the combination should be enough information to do direct
comparisons to data gathered from other repos (assuming they also
contain such data). For example you can determine automatically that
gentoo:dev-scheme/gambit and debian:gambc are the same package because
although their names differ they have the same homepage and share a
category.

To create the database, every time you see a package you get its
metadata from its home repo. Use those values to compare to existing
CPEs. If it is not yet in the database create a new entry (CPE) for it
with all the metadata like homepage, categories,
other-stuff-that-is-useful that is available. Every time you get a
match you may want to improve the metadata of the CPE with the metadata
of the newly added match. The very least you want to do is record the
addition of the new match. For example if you just automatically
determined that debian:gambc matches the CPE you already have for
gentoo:dev-scheme/gambit then you add "debian:gambc" to the list of
matches. This should get you 99.99% of all packages.

You can arrange to be able to provide hints to the system for cases
where it isn't able to do the correct derivation automatically. This
can be done by adding this information to an empty CPE-database. For
example if the system wouldn't be able to match
gentoo:dev-scheme/gambit and debian:gambc, then you can create a CPE
entry that contains both in its matchlist. The first thing that your
program should then do to populate the database is automatically fill
out the rest of that CPE by querying the gentoo and debian repos.

Users will be able to use names from any repo that they please in
interactions with packagekit's package manager (wrapper). For example
they could do: packagekit install debian:gambc. This is a lot more
intuitive than using CPEs directly (I don't know if this is what is
intended).

There does not seem to be a need to do any manual conversion, enlist
help from a lot of distro packagers or add CPE to our metadata.

Is this the way you are also intending this to work? If not, why?

Marijn

-- 
If you cannot read my mind, then listen to what I say.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode
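
The population procedure described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Entry` record, the `add_package` helper, the exact-match key on homepage plus category, and the `example.org` homepage are all hypothetical and not part of PackageMap. Real data would need the fuzzier matching and the manual hints that the message mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One database entry: a product plus the distro packages mapped to it."""
    homepage: str
    category: str
    matches: list = field(default_factory=list)


def add_package(db, repo, name, homepage, category):
    """Attach repo:name to the entry with the same homepage and category,
    creating a new entry when no existing one matches.

    Assumption: categories have already been mapped to a common vocabulary.
    """
    key = (homepage.rstrip("/").lower(), category.lower())
    entry = db.get(key)
    if entry is None:
        entry = Entry(homepage=homepage, category=category)
        db[key] = entry
    # Record the new match on the (possibly pre-existing) entry.
    entry.matches.append(f"{repo}:{name}")
    return entry


db = {}
# Hypothetical homepage; only the package names come from the thread.
add_package(db, "gentoo", "dev-scheme/gambit", "http://example.org/gambit/", "scheme")
add_package(db, "debian", "gambc", "http://example.org/gambit", "scheme")
print(db[("http://example.org/gambit", "scheme")].matches)
```

Both names end up in one matchlist here because the two homepages differ only by a trailing slash, which the key ignores.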
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-17 9:37 ` Marijn Schouten (hkBst)
@ 2009-06-18 0:09 ` Sebastian Pipping
  2009-06-18 9:07 ` Marijn Schouten (hkBst)
  [not found] ` <1245295820.11471.223.camel@chianamo.mine.nu>
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-18 0:09 UTC
To: PackageKit users and developers list
Cc: gentoo-dev, Paul Wise, Christian Faulhammer, Petteri Räty, Robert Buchholz

Marijn Schouten (hkBst) wrote:
> Sebastian Pipping wrote:
>> I start to understand the real benefits of moving a larger
>> part of the maintenance down to the distro level as you proposed.
>
>> Okay, let's add support for CPEs at distro package level
>> and sync up and down with the central packagemap database.
>> Please contact me for collaboration on sync scripts
>> and "modeling" of details.
>
> Do we not already have enough information available to automatically determine
> derived unique identifiers like CPE?
>
> We have the package homepage and the package name (and the package category) and
> the combination should be enough information to do direct comparisons to data
> gathered from other repos (assuming they also contain such data).

You are asking a valid question. The homepage links can be a great
helper in mapping and they have been of help already for the mapping
of the first 1000 Gentoo packages in packagemap.

However it might not be as easy as you make it sound, as there are
a few things that complicate things and produce extra work:

- In many cases a project can be reached from several URLs.
  For a project on SF.net you might have
  - http://sf.net/projects/${name}
  - http://${name}.sf.net/
  - http://www.${name}.org/
  That case can be handled rather easily but there are many more
  special cases and a manual map may be required for stuff that's
  not hosted on a larger hosting site.

- Split packages (think Git or Qt) may all have the same homepage.
  In Debian the source package might help there, in Gentoo you'd
  have to do common prefix detection or so, that's special
  cases again, and continuous review that it still does what you need.

> For example you can determine automatically that gentoo:dev-scheme/gambit and
> debian:gambc are the same package because although their names differ they have
> the same homepage and share a category.

To detect equal categories you need a map for categories for all
participating distros. Yes, it's smaller than mapping all packages
but it involves a manual map and keeping it in sync.

Another word on homepage collisions: A few days ago I wrote
a script that builds a map from homepages to packagenames for the
whole Gentoo tree (code/gentoo/gentoo-world-to-homepage-map.sh).
The generated table from my run was 12330 lines long, each line for
a different package.

If you run an analysis over that table you see that many
homepages appear many more times than just once.
Here's the top ten:

  68 http://www.gnome.org/
  67 http://www.gentoo.org/
  58 http://www.gentoo.org/proj/en/perl/
  42 http://lingucomponent.openoffice.org/
  26 http://www.kde.org/
  25 http://www.gentoo.org
  20 http://sourceforge.net/projects/synce/
  19 http://www.trolltech.com/
  19 http://search.cpan.org/~rjbs/
  18 http://opensuse.foehr-it.de/

The command I used is

  $ sed 's|  *.*$||' homepage-to-package.txt \
      | sort | uniq -c | sort -n -r | head -n 10

I think these three cases alone show that it would be
- also a lot of work
- be many special cases
- still require manual mappings here and there

Another disadvantage is the current static XML approach of
packagemap is language independent. We can easily build
tools for packagemap in any language that has an XML parser.
If the data actually is the code we suddenly have to keep
code from different languages in precise special case sync.

I'm not sure if the approach you describe is less work in total.
I guess to find out we'd have to do both in parallel :-)

It could be interesting how much the list of homepages
in say Debian packages and Gentoo packages overlap.



Sebastian
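
For readers who prefer a script over the sed/sort/uniq pipeline quoted above, the same top-N count can be sketched in Python. It assumes, as the pipeline does, that each line of the table starts with the homepage followed by whitespace and the package name; the function name `top_homepages` and the sample lines are hypothetical and not taken from the PackageMap repository.

```python
from collections import Counter


def top_homepages(lines, n=10):
    """Count how often each homepage (the first whitespace-separated
    field) occurs and return the n most frequent (homepage, count) pairs."""
    counts = Counter(
        line.split(None, 1)[0]  # keep only the homepage column
        for line in lines
        if line.strip()         # skip blank lines
    )
    return counts.most_common(n)


# Hypothetical sample input, not real data from the Gentoo tree.
sample = [
    "http://example.org/suite/ cat-a/pkg-one",
    "http://example.org/suite/ cat-a/pkg-two",
    "http://example.org/tool/ cat-b/pkg-three",
]
for homepage, count in top_homepages(sample):
    print(count, homepage)
```

Any homepage with a count above one cannot identify a package on its own, which is the collision problem the message describes.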
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-18 0:09 ` Sebastian Pipping
@ 2009-06-18 9:07 ` Marijn Schouten (hkBst)
  2009-06-19 18:53 ` Sebastian Pipping
  [not found] ` <1245295820.11471.223.camel@chianamo.mine.nu>
  1 sibling, 1 reply; 29+ messages in thread
From: Marijn Schouten (hkBst) @ 2009-06-18 9:07 UTC
To: gentoo-dev
Cc: PackageKit users and developers list, Paul Wise, Christian Faulhammer, Petteri Räty, Robert Buchholz

Sebastian Pipping wrote:
> Marijn Schouten (hkBst) wrote:
>> Sebastian Pipping wrote:
>>> I start to understand the real benefits of moving a larger
>>> part of the maintenance down to the distro level as you proposed.
>>> Okay, let's add support for CPEs at distro package level
>>> and sync up and down with the central packagemap database.
>>> Please contact me for collaboration on sync scripts
>>> and "modeling" of details.
>> Do we not already have enough information available to automatically determine
>> derived unique identifiers like CPE?
>>
>> We have the package homepage and the package name (and the package category) and
>> the combination should be enough information to do direct comparisons to data
>> gathered from other repos (assuming they also contain such data).
>
> You are asking a valid question. The homepage links can be a great
> helper in mapping and they have been of help already for the mapping
> of the first 1000 Gentoo packages in packagemap.
>
> However it might not be as easy you make it sound, as there are
> a few things that complicate things and produce extra work:
>
> - In many cases a project can be reached from several URLs.
>   For a project on SF.net you might have
>   - http://sf.net/projects/${name}
>   - http://${name}.sf.net/
>   - http://www.${name}.org/
>   That case can be handled rather easily but there are many more
>   special cases and a manual map may be required for stuff that's
>   not hosted on a larger hosting site.

But homepage is just ONE of the things that help you to identify a
package. Some packages that are the same will have different homepages
and some packages which are different will have the same homepage. If
you take just homepage, package name into account and the fact that
packages from the same repo are different, you can probably match over
95% of all packages correctly.

> - Split packages (think Git or Qt) may all have the same homepage.
>   In Debian the source package might help there, in Gentoo you'd
>   have to do common prefix detection or so, that's special
>   cases again, and continuous review that it still does what you need.

Neither of the gits gentoo has seems very split, so I'll only address
qt. Gentoo has qt-core and qt-svg (and many more). I would say that
they would each have to get a different CPE and that none of them is
equivalent to a package in another or the same distro that has all of
qt combined. Packages that get manually split are a minority AFAIK,
though texlive is another big one that comes to mind.

Debian does splitting into ``normal'' and ``devel'' packages. Has it
been decided what to do with those?

Now that you got me thinking about split packages, I realize that the
exact files installed by a package are also all by themselves a way to
get over 95% correct matching. For distros (like Gentoo) that have
packages that have flags that influence the list of installed files you
must decide whether to add them to the database last, or whether you
will try to use an imprecise file list.

>> For example you can determine automatically that gentoo:dev-scheme/gambit and
>> debian:gambc are the same package because although their names differ they have
>> the same homepage and share a category.
>
> To detect equal categories you need a map for categories for all
> participating distros. Yes, it's smaller than mapping all packages
> but it involves a manual map and keeping it in sync.

No, there need not be a manual mapping. There is no reason to do
true/false comparisons. All we need is a distance function, like for
example Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance). Actually on second
thought Levenshtein distance is probably not what we want, since we
would be more interested in how much strings have in common than in how
much they differ. I think the idea is clear though.

> Another word on homepage collisions: A few days before I wrote
> a script that builds a map from homepages to packagenames for the
> whole Gentoo tree (code/gentoo/gentoo-world-to-homepage-map.sh).
> The generated table from my run was 12330 lines long, each line for
> a different package.
>
> If you run an analysis over that table you see that many
> homepages appear many more times than just once.
> Here's the top ten:
>
>   68 http://www.gnome.org/
>   67 http://www.gentoo.org/
>   58 http://www.gentoo.org/proj/en/perl/
>   42 http://lingucomponent.openoffice.org/
>   26 http://www.kde.org/
>   25 http://www.gentoo.org
>   20 http://sourceforge.net/projects/synce/
>   19 http://www.trolltech.com/
>   19 http://search.cpan.org/~rjbs/
>   18 http://opensuse.foehr-it.de/

texlive with (http://www.tug.org/texlive/) seems to be missing from
this list.

$ eix -H http://www.tug.org/texlive/ | tail -n 1
Found 79 matches.

I suspect you used grep (or whatever) to construct your data, instead
of using the package manager or a tool that knows how to extract the
data available in packages (and eclasses).

> The command I used is
>
>   $ sed 's|  *.*$||' homepage-to-package.txt \
>       | sort | uniq -c | sort -n -r | head -n 10
>
> I think this three cases alone show that it would be

I'm not sure which 3 cases you mean.

> - also a lot of work
> - be many special cases
> - still require manual mappings here and there
>
> Another disadvantage is the current static XML approach of
> packagemap is language independent. We can easily build
> tools for packagemap in any language that has an XML parser.

I agree that XML is a disadvantage, but not that it is language
independent. ;P

> If the data actually is the code we suddenly have to keep
> code from different languages in precise special case sync.

I did not argue for a data format nor for a specific language nor
coding style nor anything that seems to match what you are saying here;
I only spoke about how to populate the CPE database.

> I'm not sure if the approach you describe is less work in total.
> I guess to find out we'd have to do both in parallel :-)
>
> It could be interesting how much the list of homepages
> in say Debian packages and Gentoo packages overlap.

It would certainly be interesting.

Marijn

-- 
If you cannot read my mind, then listen to what I say.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode
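
The idea of scoring "how much strings have in common" instead of doing true/false comparisons can be illustrated with Python's standard `difflib.SequenceMatcher`, whose `ratio()` grows with the amount of shared text. This is one possible similarity measure chosen for the sketch, not something proposed in the thread; the helper names and the candidate category names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0 and 1; higher means the two strings
    share more matching text (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, candidates):
    """Pick the candidate that is most similar to name."""
    return max(candidates, key=lambda c: similarity(name, c))


# Hypothetical category names from another distro.
print(best_match("dev-scheme", ["games", "scheme", "editors"]))
```

A score like this still needs a threshold and some review, since unrelated categories can share substrings by accident.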
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-18 9:07 ` Marijn Schouten (hkBst)
@ 2009-06-19 18:53 ` Sebastian Pipping
  0 siblings, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-19 18:53 UTC
To: PackageKit users and developers list
Cc: gentoo-dev, Petteri Räty, Christian Faulhammer, Paul Wise, Robert Buchholz

Marijn Schouten (hkBst) wrote:
> Neither of the gits gentoo has seems very split,

I was referring to git in Debian here:

  Package: git-core
  Binary: git-core, git-doc, git-arch, git-cvs, git-svn, git-email,
   git-daemon-run, git-gui, gitk, gitweb

> texlive with (http://www.tug.org/texlive/) seems to be missing from this list.
>
> $ eix -H http://www.tug.org/texlive/ | tail -n 1
> Found 79 matches.
>
> I suspect you used grep (or whatever) to construct your data, instead of using
> the package manager or a tool that knows how to extract the data available in
> packages (and eclasses).

True, grep and friends.

> I'm not sure which 3 cases you mean.

I was referring to what I said before, in summary:

 1) non-unique homepages
 2) extra work for split packages
 3) extra work for category mapping

> I did not argue for a data format nor for a specific language nor coding style
> nor anything that seems to match what you are saying here; I only spoke about
> how to populate the CPE database.

I understood you wanted to replace the XML collection with
mapping code. I got you wrong then.

I agree that combining automated fill of the database
with manual can speed things up a lot.



Sebastian
[parent not found: <1245295820.11471.223.camel@chianamo.mine.nu>]
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  [not found] ` <1245295820.11471.223.camel@chianamo.mine.nu>
@ 2009-06-18 22:33 ` Sebastian Pipping
  [not found] ` <1245382383.14805.281.camel@chianamo.mine.nu>
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-18 22:33 UTC
To: Paul Wise
Cc: PackageKit users and developers list, gentoo-dev, Christian Faulhammer, Petteri Räty, Robert Buchholz

Paul Wise wrote:
> On Thu, 2009-06-18 at 02:09 +0200, Sebastian Pipping wrote:
>
>> It could be interesting how much the list of homepages
>> in say Debian packages and Gentoo packages overlap.
>
> Debian sid amd64 binary packages:
>
> $ grep -h ^Homepage /var/lib/apt/lists/mirror.internode.on.net_debian_dists_unstable_main_binary-amd64_Packages | sort | uniq -c | sort -n -r | head -n 10
>     154 Homepage: http://www.go-oo.org
>     149 Homepage: http://www.kde.org/
>     107 Homepage: http://i18n.kde.org/
>      97 Homepage: http://www.tug.org/texlive
>      90 Homepage: http://www.mono-project.com/
>      83 Homepage: http://xcb.freedesktop.org
>      67 Homepage: http://xmms2.xmms.se/
>      63 Homepage: http://www.kde.org
>      59 Homepage: http://www.ruby-lang.org/
>      59 Homepage: http://www.cs.wustl.edu/~schmidt/ACE.html
>
> Debian sid source packages:
>
> $ grep -h ^Homepage /var/lib/apt/lists/mirror.internode.on.net_debian_dists_unstable_main_source_Sources | sort | uniq -c | sort -n -r | head -n 10
>      23 Homepage: http://www.tryton.org/
>      23 Homepage: http://www.Rmetrics.org
>      19 Homepage: http://www.xfce.org/
>      19 Homepage: http://www.apertium.org
>      19 Homepage: http://goodies.xfce.org/
>      16 Homepage: http://www.schoolsplay.org/
>      16 Homepage: http://savonet.sourceforge.net/
>      16 Homepage: http://gtk2-perl.sourceforge.net/
>      14 Homepage: http://www.ggzgamingzone.org/
>      13 Homepage: http://www.kde.org/

Can you share the list files and the scripts you used to generate
them with me? I'd like to determine the subset of URLs that appear
exactly once in both gentoo and debian source packages.



Sebastian
[parent not found: <1245382383.14805.281.camel@chianamo.mine.nu>]
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  [not found] ` <1245382383.14805.281.camel@chianamo.mine.nu>
@ 2009-06-19 17:36 ` Sebastian Pipping
  2009-06-19 21:47 ` Sebastian Pipping
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-19 17:36 UTC
To: Paul Wise
Cc: PackageKit users and developers list, gentoo-dev, Christian Faulhammer, Petteri Räty, Robert Buchholz

Paul Wise wrote:
> The scripts were in my mail and the files are on every Debian mirror:
>
> wget -O - http://ftp.debian.org/debian/dists/unstable/main/binary-amd64/Packages | grep -h ^Homepage | sort | uniq -c | sort -n -r | head -n 10
> wget -O - http://ftp.debian.org/debian/dists/unstable/main/source/Sources | grep -h ^Homepage | sort | uniq -c | sort -n -r | head -n 10

I see, thanks.

I wrote:
> I'd like to determine the subset of URLs that appear
> exactly once in both gentoo and debian source packages.

I made a script for this job now.
With zero normalization I get this result:

  Mappable homepages in Debian:   6222
  Mappable homepages in Gentoo:   9582
  Shared (without normalization): 1183

That's about 11% of the Gentoo tree.
The script is up here:

  http://git.goodpoint.de/?p=packagemap.git;a=tree;f=code/debian



Sebastian
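
The "appears exactly once in both" computation described in the message above can be sketched like this. It is not the script from the repository, only an illustration of the idea: the function names and the sample lists are hypothetical, and "mappable" is taken here to mean a homepage used by exactly one package within a distro.

```python
from collections import Counter


def mappable(homepages):
    """Return the homepages that exactly one package in a distro uses."""
    counts = Counter(homepages)
    return {url for url, n in counts.items() if n == 1}


def shared(debian_homepages, gentoo_homepages):
    """Homepages that are unique within each distro and present in both."""
    return mappable(debian_homepages) & mappable(gentoo_homepages)


# Hypothetical sample data, one homepage per package.
debian = ["http://example.org/a", "http://example.org/b", "http://example.org/b"]
gentoo = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
print(sorted(shared(debian, gentoo)))
```

In the sample, `http://example.org/b` is dropped because two Debian packages share it, so it cannot be mapped one-to-one.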
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-19 17:36 ` Sebastian Pipping
@ 2009-06-19 21:47 ` Sebastian Pipping
  0 siblings, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-19 21:47 UTC
To: PackageKit users and developers list
Cc: Paul Wise, Robert Buchholz, Christian Faulhammer, Petteri Räty, gentoo-dev

Sebastian Pipping wrote:
>> I'd like to determine the subset of URLs that appear
>> exactly once in both gentoo and debian source packages.
>
>   Mappable homepages in Debian:   6222
>   Mappable homepages in Gentoo:   9582
>   Shared (without normalization): 1183

With normalization for SourceForge, Google Code, Alioth, Savannah,
Berlios, RubyForge, Gna, PyPI the number of directly mappable packages
increases by about 500:

  Mappable homepages in Debian: 6222
  Mappable homepages in Gentoo: 9582
  Shared (w/o normalization):   1183
  Shared (w/ normalization):    1670



Sebastian
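
What such a normalization step might look like is sketched below for the SourceForge URL forms mentioned earlier in the thread. The actual rules used by the script are not shown in these messages, so everything here is an assumption: the function name `normalize_homepage`, the `sourceforge:<name>` canonical form, and the decision to drop the scheme, a leading `www.` and a trailing slash.

```python
import re
from urllib.parse import urlsplit


def normalize_homepage(url):
    """Map equivalent homepage spellings to one canonical string.

    Assumed rules: ignore scheme, case, a leading "www." and a trailing
    slash; fold the two SourceForge URL styles into "sourceforge:<name>".
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    path = parts.path.rstrip("/")
    if host.startswith("www."):
        host = host[4:]

    # Style 1: http://sf.net/projects/<name>
    m = re.fullmatch(r"/projects/([^/]+)", path)
    if host in ("sf.net", "sourceforge.net") and m:
        return "sourceforge:" + m.group(1)

    # Style 2: http://<name>.sf.net/
    m = re.fullmatch(r"([^.]+)\.(?:sf|sourceforge)\.net", host)
    if m and path == "":
        return "sourceforge:" + m.group(1)

    return host + path


print(normalize_homepage("http://sf.net/projects/synce/"))
print(normalize_homepage("http://synce.sourceforge.net"))
```

With rules like these, `http://www.gentoo.org/` and `http://www.gentoo.org` also collapse to the same key, which is one of the duplicates visible in the Gentoo top-ten list quoted earlier in the thread.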
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-17 0:34 ` Sebastian Pipping
  2009-06-17 9:37 ` Marijn Schouten (hkBst)
@ 2009-06-20 13:16 ` Petteri Räty
  2009-06-20 17:28 ` Sebastian Pipping
  2009-07-14 16:49 ` Sebastian Pipping
  1 sibling, 2 replies; 29+ messages in thread
From: Petteri Räty @ 2009-06-20 13:16 UTC
To: Sebastian Pipping
Cc: gentoo-dev, PackageKit users and developers list

Sebastian Pipping wrote:
> I start to understand the real benefits of moving a larger
> part of the maintenance down to the distro level as you proposed.
>
> Okay, let's add support for CPEs at distro package level
> and sync up and down with the central packagemap database.
> Please contact me for collaboration on sync scripts
> and "modeling" of details.
>
>
>
> Sebastian

You need to come up with the needed DTD changes for metadata.xml. Last
time the schema was changed it was done with a GLEP, so writing one
seems prudent here too, especially if we are going to make the value
mandatory after it has been added to all existing packages. Also
documentation (devmanual, developer handbook come to mind at least)
concerning metadata.xml needs to be updated to document the new stuff.

Regards,
Petteri
* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-20 13:16 ` Petteri Räty
@ 2009-06-20 17:28 ` Sebastian Pipping
  2009-07-14 16:49 ` Sebastian Pipping
  1 sibling, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-20 17:28 UTC
To: Petteri Räty
Cc: gentoo-dev, PackageKit users and developers list

Petteri Räty wrote:
> You need to come up with the needed DTD changes for metadata.xml. Last
> time the schema was changed it was done with a GLEP so writing one seems
> prudent here too especially if we are going to make the value mandatory
> after it has been added to all existing packages. Also documentation
> (devmanual, developer handbook come to mind at least) concerning
> metadata.xml needs to be updated to document the new stuff.

Makes sense, added to my todo list.



Sebastian
* Re: [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-20 13:16 ` Petteri Räty
  2009-06-20 17:28 ` Sebastian Pipping
@ 2009-07-14 16:49 ` Sebastian Pipping
  2009-07-20 2:03 ` [GLEP] CPE names in metadata (was Re: [gentoo-dev] Inviting you to project "PackageMap") Sebastian Pipping
  1 sibling, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-07-14 16:49 UTC
To: Petteri Räty
Cc: gentoo-dev, Robert Buchholz

Petteri Räty wrote:
> You need to come up with the needed DTD changes for metadata.xml. Last
> time the schema was changed it was done with a GLEP so writing one seems
> prudent here

I have started
- writing a GLEP
- extending the DTD
- extending a sample metadata.xml

Related gitweb over here:
http://git.goodpoint.de/?p=metadata-xml-cpe-glep.git

Would be great to get some review.



Sebastian
* [GLEP] CPE names in metadata (was Re: [gentoo-dev] Inviting you to project "PackageMap")
  2009-07-14 16:49 ` Sebastian Pipping
@ 2009-07-20 2:03 ` Sebastian Pipping
  0 siblings, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-07-20 2:03 UTC
To: gentoo-dev

Sebastian Pipping wrote:
> I have started
> - writing a GLEP
> - extending the DTD
> - extending a sample metadata.xml
>
> Related gitweb over here:
> http://git.goodpoint.de/?p=metadata-xml-cpe-glep.git

Especially as this is my first GLEP and it will affect most of you
in the long run, I depend on your feedback here.

Just added more words on CPE names and replaced the given example.
Please have a look and tear it apart :-)

Thanks in advance,



Sebastian
* [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap"
  2009-06-15 19:13 ` Sebastian Pipping
  2009-06-15 20:27 ` Petteri Räty
@ 2009-06-15 21:27 ` Christian Faulhammer
  1 sibling, 0 replies; 29+ messages in thread
From: Christian Faulhammer @ 2009-06-15 21:27 UTC
To: gentoo-dev

Hi,

Sebastian Pipping <webmaster@hartwork.org>:
> > I am convinced the project will be more viable if people can choose
> > their level of contribution. Many developers just won't care enough
> > to take the extra hassle.
>
> Agreed. However, I don't see a huge difference in level of
> extra hassle. The most difficult thing is doing who's-the-vendor
> research in my eyes atm which is the same at both ends.
> Maybe collaborating at a central place can add some fun that
> adding "some field I don't really care about" downstream cannot.

I agree with Petteri here, adding the cpe information into our
metadata.xml makes forgetting entry submissions to PackageMap really
easy for everyone. Thus automatic extraction of information from your
side from metadata.xml files should be one option to gather the
information.

V-Li

-- 
Christian Faulhammer, Gentoo Lisp project
<URL:http://www.gentoo.org/proj/en/lisp/>, #gentoo-lisp on FreeNode

<URL:http://gentoo.faulhammer.org/>