public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] Inviting you to project "PackageMap"
@ 2009-06-12  7:42 Sebastian Pipping
       [not found] ` <15e53e180906120130md68cd94nba61fa5560c73eb4@mail.gmail.com>
  2009-06-12 18:27 ` [gentoo-dev] " Petteri Räty
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-12  7:42 UTC (permalink / raw
  To: PackageKit users and developers list; +Cc: gentoo-dev

Hello!


Quick (re-)introduction:  My task for Gentoo/Google Summer of Code 2009
is to give Gentoo a Debian popcon equivalent, a tool to collect
statistics on "what package is installed how often".  To achieve this
goal I'm extending Smolt (a tool currently doing similar things with
hardware information) by fine-tunable software stats gathering.


The plan we have for Smolt is to make it cross-distro, not just fit
Gentoo or Fedora.  One point where the consequences and benefits of such
an approach can be seen clearly is with

  counting packages from different distros into the same buckets.

What do I mean by that?  Debian's Git counts for Gentoo's Git counts for
Fedora's, you know the list.  With packages counted across distros
we can suddenly answer questions that we currently cannot answer, among them

 - What globally popular packages are missing in distro X?
   Let's say we don't have a package for product P.  Do other distros
   have one?  They do, maybe we need one, too?  They don't, maybe P is
   not that important then?

 - How many Linux users are approximately using program X in total?
   Not just on Ubuntu or Arch - all across Linux, BSD, Solaris!

 - Does distro X have 10 times the packages of Y or is it just
   different splitting?

To count into the same bucket we use global identifiers for the
"products" that fall out of a package.  Gentoo package "dev-util/git"
can produce product "cpe://a:git:git", Debian's "git-core" can, too.
The string above is a CPE URI [1], a concept close to package naming
in Java.  This "intermediate language" allows us to relate package names
from distro X with those of distro Y and answer various questions from
that data.
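
A minimal sketch of the bucket idea, in Python.  Only the two Git
entries come from the text above; the Vim entry, the install counts and
the function names are made up for illustration and are not part of
PackageMap itself.

```python
from collections import Counter

# Hypothetical mapping data: (distro, package name) -> CPE URI.
# Only the two Git rows are taken from the mail; the Vim row is assumed.
PACKAGE_TO_CPE = {
    ("gentoo", "dev-util/git"): "cpe://a:git:git",
    ("debian", "git-core"): "cpe://a:git:git",
    ("gentoo", "app-editors/vim"): "cpe://a:vim:vim",
}


def count_products(installs):
    """Sum install counts per CPE, merging packages from all distros.

    `installs` is an iterable of (distro, package, count) tuples.
    Packages without a known CPE are skipped.
    """
    buckets = Counter()
    for distro, package, count in installs:
        cpe = PACKAGE_TO_CPE.get((distro, package))
        if cpe is not None:
            buckets[cpe] += count
    return buckets


def missing_in(distro):
    """Return the CPEs that some distro packages but `distro` does not."""
    have = {cpe for (d, _), cpe in PACKAGE_TO_CPE.items() if d == distro}
    return set(PACKAGE_TO_CPE.values()) - have
```

With data like this, count_products() puts Gentoo's and Debian's Git
into one bucket, and missing_in("debian") lists products that other
distros package but Debian (in this toy data) does not.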

To do such mapping we need code (or a "service") that does the mapping
for us and a base of collected data that the service can operate on.
Together, these two make up project "PackageMap".

I have started populating the database with packages (currently 312
in number) made from information extracted from the Gentoo tree
and the National Vulnerability Database.  The latter holds many CPEs.
Let me state clearly that packagemap is not about Gentoo in particular.
Sure, the initial data has lots of Gentoo in it but the whole point of
the project is to get information and people from different distros
together.

To see what these 312 package maps look like at the moment, it is best
to click through the database folder yourself:
http://git.goodpoint.de/?p=packagemap.git;a=tree;f=database

Also, there are a RELAX NG schema and a DTD for validation, more
documentation than I usually write and a few scripts:
http://git.goodpoint.de/?p=packagemap.git;a=tree

  By now I hope you have gained interest in what this can become.
  Your active participation is highly appreciated.
  A few minutes from everyone can make a huge difference here.
  If you want write access to the repo - mail me: sebastian@pipping.org.

Please have a look at the Git repository linked above and ask questions.
I propose to keep the related Gentoo stuff on gentoo-dev and everything
else on the packagekit list.  I hope that works out well.

Thanks for reading up to this point.



Sebastian



PS: I'm aware "hartwork.org" might not make a good longterm location for
    DTDs, XML namespaces and such for a cross-distro project.  Any ideas
    where to put them best?

[1] http://cpe.mitre.org/






* [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap"
       [not found] ` <15e53e180906120130md68cd94nba61fa5560c73eb4@mail.gmail.com>
@ 2009-06-12  9:54   ` Sebastian Pipping
  2009-06-17 12:08     ` Tiziano Müller
  2009-06-12 13:00   ` [gentoo-dev] " Steven J Long
  1 sibling, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-12  9:54 UTC (permalink / raw
  To: PackageKit users and developers list; +Cc: gentoo-dev

Richard Hughes wrote:
> I'm slightly worried about it being called a service. Is it going to
> be a new process that just does the mapping or is this a bad choice of
> words? If it is a new process then I'm not sure such a thing will
> catch on.

I'm not yet sure how a mapper will keep its data fresh,
and its usefulness depends on that.
Ignore my "service" for now.


> I'm also worried that a package manager has to read in and parse
> thousands of small files.

While you mention "package manager" - with the current concept
the data will not be precise enough for use with a package manager.


> Why did you decide to write each project as
> a single xml file?

 - The other 99% of the database stay valid XML if a single
   file is invalid

 - To better fit the version controlled environment


> Parsing and reading 10,000 files (in multiple directories) might take
> a few seconds, and would have to be copied into memory (few Mb) to
> query quickly.

Correct.


> Which has to be invalidated if any of the files or
> directories change. Why didn't you just put them in a sqlite database
> that can be queried in a few ms, without dragging in an xml parser?
> Also 10,000 files take up way more space (and takes longer to install
> and update) than a single database file.

I like your idea about sqlite.  Maybe keeping the data in XML for
editing and querying an exported sqlite snapshot is something to try.
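
A rough sketch of what such an export could look like, in Python.  The
XML layout used here (a <product cpe="..."> root with <package distro=
"..." name="..."/> children), the table name and the function name are
assumptions for illustration only; the real PackageMap files follow the
schema in the repository.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def export_snapshot(xml_dir, db_path):
    """Read every *.xml file below xml_dir into one SQLite database.

    Assumed (hypothetical) file layout:
        <product cpe="cpe://a:git:git">
          <package distro="gentoo" name="dev-util/git"/>
        </product>
    """
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS package_map")
    con.execute(
        "CREATE TABLE package_map (distro TEXT, package TEXT, cpe TEXT)"
    )
    for path in sorted(Path(xml_dir).rglob("*.xml")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            # One broken file does not invalidate the rest of the data.
            continue
        cpe = root.get("cpe")
        for pkg in root.iter("package"):
            con.execute(
                "INSERT INTO package_map VALUES (?, ?, ?)",
                (pkg.get("distro"), pkg.get("name"), cpe),
            )
    con.commit()
    return con
```

The XML files stay the version-controlled source that people edit; the
SQLite file is a throwaway snapshot that can be regenerated at any time
and queried with plain SQL.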


> XML might be
> useful for storing the data, but not for querying.

Good point.



Sebastian




* [gentoo-dev]  Re: Inviting you to project "PackageMap"
       [not found] ` <15e53e180906120130md68cd94nba61fa5560c73eb4@mail.gmail.com>
  2009-06-12  9:54   ` [gentoo-dev] Re: [packagekit] " Sebastian Pipping
@ 2009-06-12 13:00   ` Steven J Long
  2009-06-13  3:55     ` Sebastian Pipping
  1 sibling, 1 reply; 29+ messages in thread
From: Steven J Long @ 2009-06-12 13:00 UTC (permalink / raw
  To: gentoo-dev; +Cc: packagekit

Richard Hughes wrote:

>  Sebastian Pipping wrote:
>> To do such mapping we need code (or a "service") that does the mapping
>> for us and base of collected data that the service can operate on.  Both
>> of these is project "PackageMap"
>
You might as well use Gentoo's version specification for your internal
format, as it's the most comprehensive. The most you need to add is
debian epochs.

> I'm also worried that a package manager has to read in and parse
> thousands of small files. Why did you decide to write each project as
> a single xml file?
>
<snip>
> I agree with the concept, but not the implementation. All you're
> trying to provide is a packagename <-> ID database. XML might be
> useful for storing the data, but not for querying.
>
XML was never meant for data-storage for such record-sets: it was designed
for data *interchange* between incompatible database engines, and as a
friendlier SGML for user-defined data (which some poor DBA/coder would
otherwise end up having to pull in from Excel, in most cases. The cleanup
in such cases can take days, depending on how long the executive in-question
has kept it as a pet-project;)

igli.
--
#friendly-coders -- We're friendly but we're not /that/ friendly ;-)





* Re: [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-12  7:42 [gentoo-dev] Inviting you to project "PackageMap" Sebastian Pipping
       [not found] ` <15e53e180906120130md68cd94nba61fa5560c73eb4@mail.gmail.com>
@ 2009-06-12 18:27 ` Petteri Räty
  2009-06-12 21:43   ` [packagekit] " Sebastian Pipping
  1 sibling, 1 reply; 29+ messages in thread
From: Petteri Räty @ 2009-06-12 18:27 UTC (permalink / raw
  To: gentoo-dev; +Cc: PackageKit users and developers list


Sebastian Pipping wrote:
> 
> To count into the same bucket we use global identifiers for the
> "products" that fall out of a package.  Gentoo package "dev-util/git"
> can produce product "cpe://a:git:git", Debian's "git-core" can, too.
> That string before is a CPE URI [1], a concept close to package naming
> in Java.  This "intermediate language" allows us to relate package names
> from distro X with those of distro Y and answer various questions from
> that data.
> 
> To do such mapping we need code (or a "service") that does the mapping
> for us and base of collected data that the service can operate on.  Both
> of these is project "PackageMap"
> 

Instead of manually populating a database wouldn't it make more sense to
parse this information from package metadata.xml?

Regards,
Petteri






* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-12 18:27 ` [gentoo-dev] " Petteri Räty
@ 2009-06-12 21:43   ` Sebastian Pipping
  2009-06-13 15:53     ` Petteri Räty
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-12 21:43 UTC (permalink / raw
  To: PackageKit users and developers list; +Cc: gentoo-dev

Petteri Räty wrote:
> Sebastian Pipping wrote:
>> To count into the same bucket we use global identifiers for the
>> "products" that fall out of a package.  Gentoo package "dev-util/git"
>> can produce product "cpe://a:git:git", Debian's "git-core" can, too.
>> That string before is a CPE URI [1], a concept close to package naming
>> in Java.  This "intermediate language" allows us to relate package names
>> from distro X with those of distro Y and answer various questions from
>> that data.
>>
>> To do such mapping we need code (or a "service") that does the mapping
>> for us and base of collected data that the service can operate on.  Both
>> of these is project "PackageMap"
> 
> Instead of manually populating a database wouldn't it make more sense to
> parse this information from package metadata.xml?

Which information exactly?  Please elaborate on that.



Sebastian




* Re: [gentoo-dev]  Re: Inviting you to project "PackageMap"
  2009-06-12 13:00   ` [gentoo-dev] " Steven J Long
@ 2009-06-13  3:55     ` Sebastian Pipping
  2009-07-11 21:38       ` [gentoo-dev] " Steven J Long
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-13  3:55 UTC (permalink / raw
  To: gentoo-dev; +Cc: packagekit

Steven J Long wrote:
> You might as well use Gentoo's version specification for your internal
> format, as it's the most comprehensive. The most you need to add is
> debian epochs.

I'm not sure what you are referring to.
Please share more details or pointers.


> XML was never meant for data-storage for such record-sets: it was designed
> for data *interchange* [..]

Interesting point.  What would you use as an alternative that
works well with a version control system?



Sebastian





* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-12 21:43   ` [packagekit] " Sebastian Pipping
@ 2009-06-13 15:53     ` Petteri Räty
  2009-06-13 19:03       ` Sebastian Pipping
  0 siblings, 1 reply; 29+ messages in thread
From: Petteri Räty @ 2009-06-13 15:53 UTC (permalink / raw
  To: gentoo-dev; +Cc: PackageKit users and developers list


Sebastian Pipping wrote:
> Petteri Räty wrote:
>> Sebastian Pipping wrote:
>>> To count into the same bucket we use global identifiers for the
>>> "products" that fall out of a package.  Gentoo package "dev-util/git"
>>> can produce product "cpe://a:git:git", Debian's "git-core" can, too.
>>> That string before is a CPE URI [1], a concept close to package naming
>>> in Java.  This "intermediate language" allows us to relate package names
>>> from distro X with those of distro Y and answer various questions from
>>> that data.
>>>
>>> To do such mapping we need code (or a "service") that does the mapping
>>> for us and base of collected data that the service can operate on.  Both
>>> of these is project "PackageMap"
>> Instead of manually populating a database wouldn't it make more sense to
>> parse this information from package metadata.xml?
> 
> Which information exactly?  Please elaborate on that.
> 
> Sebastian
> 

I mean making metadata.xml the authoritative source for mapping CPE to
Gentoo packages. I don't want to see the situation when adding new
packages to the tree would need some mapping being done in an external
web service. We should of course try to provide as much automation as
possible for creating the value for metadata.xml.

Regards,
Petteri




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-13 15:53     ` Petteri Räty
@ 2009-06-13 19:03       ` Sebastian Pipping
  2009-06-13 19:16         ` Petteri Räty
  2009-06-15 13:52         ` Robert Buchholz
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-13 19:03 UTC (permalink / raw
  To: PackageKit users and developers list; +Cc: gentoo-dev

Petteri Räty wrote:
> I mean making metadata.xml the authoritative source for mapping CPE to
> Gentoo packages. I don't want to see the situation when adding new
> packages to the tree would need some mapping being done in an external
> web service.

Well, it's nothing more than a git commit and push once you have
sent me your public SSH key :-D

One of the stronger points for collaborating at the source is that
people who are not Gentoo devs (yet) and therefore have no write access
to the Gentoo tree can still extend and fix the Gentoo packagemap
entries.  Doing it downstream would hurt the whole project
in several ways.




Sebastian




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-13 19:03       ` Sebastian Pipping
@ 2009-06-13 19:16         ` Petteri Räty
  2009-06-15 13:52         ` Robert Buchholz
  1 sibling, 0 replies; 29+ messages in thread
From: Petteri Räty @ 2009-06-13 19:16 UTC (permalink / raw
  To: gentoo-dev; +Cc: PackageKit users and developers list


Sebastian Pipping wrote:
> Petteri Räty wrote:
>> I mean making metadata.xml the authoritative source for mapping CPE to
>> Gentoo packages. I don't want to see the situation when adding new
>> packages to the tree would need some mapping being done in an external
>> web service.
> 
> Well, it's a nothing more than git commit and push once you
> sent me your public SSH key :-D
> 
> One of the stronger points for collaborating at the source is that
> people who are not Gentoo devs (yet) and therefore have no write access
> to the Gentoo tree can still extend and fix the Gentoo packagemap
> entries.  Doing it downstream would hurt the whole project
> in several ways.
> 
> 

If there's no entry in metadata.xml, you can add it to your service and
submit it via the usual means to the Gentoo repository. Based on my
experience I am just guessing that doing this externally won't be viable
in the long term. If you succeed that way, all power to you.

Regards,
Petteri




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-13 19:03       ` Sebastian Pipping
  2009-06-13 19:16         ` Petteri Räty
@ 2009-06-15 13:52         ` Robert Buchholz
  2009-06-15 17:04           ` Sebastian Pipping
  1 sibling, 1 reply; 29+ messages in thread
From: Robert Buchholz @ 2009-06-15 13:52 UTC (permalink / raw
  To: gentoo-dev; +Cc: Sebastian Pipping, PackageKit users and developers list


On Saturday 13 June 2009, Sebastian Pipping wrote:
> One of the stronger points for collaborating at the source is that
> people who are not Gentoo devs (yet) and therefore have no write
> access to the Gentoo tree can still extend and fix the Gentoo
> packagemap entries.  Doing it downstream would hurt the whole project
> in several ways.

To drive the project forward and find cross-distro acceptance, the 
packagemap repo/server has to be the authoritative source of information
for distributions that participate.

However, I see advantages in a distributed model to collect the 
information. Gentoo developers could feed <cpe> tags into the 
metadata.xml of the tree and do not need to sign up to commit to the 
third-party packagemap repository. Synchronizing changed tags to the 
packagemap repository should be easy to automate. Changes in the 
repository could be propagated back to the tree by a designated team of 
Gentoo developers interested in the packagemap project.
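
As a sketch of the automated direction (tree to packagemap), the
following Python collects CPEs from metadata.xml files.  It assumes a
hypothetical <cpe> element inside metadata.xml and a
category/package/metadata.xml directory layout; neither the element nor
the function name is an agreed format, they only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_cpes(tree_root):
    """Map 'category/package' to the CPEs listed in its metadata.xml.

    Assumes a hypothetical <cpe>...</cpe> element somewhere inside
    metadata.xml; packages without one are left out of the result.
    """
    result = {}
    for meta in sorted(Path(tree_root).glob("*/*/metadata.xml")):
        package = f"{meta.parent.parent.name}/{meta.parent.name}"
        try:
            root = ET.parse(str(meta)).getroot()
        except ET.ParseError:
            continue  # skip unparsable files instead of aborting the run
        cpes = [e.text.strip() for e in root.iter("cpe") if e.text]
        if cpes:
            result[package] = cpes
    return result
```

A sync script could run this over a checkout of the tree and diff the
result against the central repository to find changed tags.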

I have a feeling other distributions might also favor a model where they 
have more control over the data without giving all their devs access
to one big repo.


Robert



* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-15 13:52         ` Robert Buchholz
@ 2009-06-15 17:04           ` Sebastian Pipping
  2009-06-15 18:24             ` Robert Buchholz
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-15 17:04 UTC (permalink / raw
  To: Robert Buchholz
  Cc: gentoo-dev, PackageKit users and developers list, Paul Wise,
	Petteri Räty

Robert Buchholz wrote:
> On Saturday 13 June 2009, Sebastian Pipping wrote:
>> One of the stronger points for collaborating at the source is that
>> people who are not Gentoo devs (yet) and therefore have no write
>> access to the Gentoo tree can still extend and fix the Gentoo
>> packagemap entries.  Doing it downstream would hurt the whole project
>> in several ways.
> 
> To drive the project forward and find cross-distro acceptance, the 
> packagemap repo/server has to be the authoritative source of information
> for distributions that participate.
> 
> However, I see advantages in a distributed model to collect the 
> information. Gentoo developers could feed <cpe> tags into the 
> metadata.xml of the tree and do not need to sign up to commit to the 
> third-party packagemap repository. Synchronizing changed tags to the 
> packagemap repository should be easy to automate. Changes in the 
> repository could be propagated back to the tree by a designated team of 
> Gentoo developers interested in the packagemap project.
> 
> I have a feeling other distributions might also favor a model where they 
> have more control about the data without giving all their devs access 
> to one big repo.

Paul Wise of Debian also expressed interest in doing database building
at distro level, so that's one more point /for/ your feeling.

However there are a few more things to take into account,
please have a look at my reply to Paul:
http://lists.alioth.debian.org/pipermail/popcon-developers/2009-June/001759.html

Sorry for not CC'ing you, I should have thought of that.

Thinking the other way around:  Is there anything we could do
to make the central place approach work and feel better for everybody?



Sebastian





* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-15 17:04           ` Sebastian Pipping
@ 2009-06-15 18:24             ` Robert Buchholz
  2009-06-15 19:13               ` Sebastian Pipping
  0 siblings, 1 reply; 29+ messages in thread
From: Robert Buchholz @ 2009-06-15 18:24 UTC (permalink / raw
  To: Sebastian Pipping
  Cc: gentoo-dev, PackageKit users and developers list, Paul Wise,
	Petteri Räty


On Monday 15 June 2009, Sebastian Pipping wrote:
> However there are a few more things to take into account,
> please have a look at my reply to Paul:
> http://lists.alioth.debian.org/pipermail/popcon-developers/2009-June/
>001759.html
>
> Sorry for not CC'ing you, I should have thought of that.
>
> Thinking the other way around:  Is there anything we could do
> to make the central place approach work and feel better for
> everybody?

The consumers of the PackageMap will always only use the central 
database. It is only the populators of the database that would be 
distributed.

I am convinced the project will be more viable if people can choose 
their level of contribution. Many developers just won't care enough to 
take the extra hassle. If you make it easy enough for them to 
contribute to the CPE mapping, i.e. update their debian/controls or 
metadata.xml, they will (or not :-). Other developers that care more 
can then extract the data and merge it at the database and do extra 
maintenance tasks such as updating the substitution map.

If you make merging easy, I don't see how this hurts the project.


Robert



* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-15 18:24             ` Robert Buchholz
@ 2009-06-15 19:13               ` Sebastian Pipping
  2009-06-15 20:27                 ` Petteri Räty
  2009-06-15 21:27                 ` [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap" Christian Faulhammer
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-15 19:13 UTC (permalink / raw
  To: Robert Buchholz
  Cc: gentoo-dev, PackageKit users and developers list, Paul Wise,
	Petteri Räty

Robert Buchholz wrote:
> The consumers of the PackageMap will always only use the central 
> database.

I'm not sure about that.  I rather assume that use outside the central
database will happen, especially use that ignores the substitution map.


> I am convinced the project will be more viable if people can choose 
> their level of contribution. Many developers just won't care enough to 
> take the extra hassle.

Agreed.  However, I don't see a huge difference in the level of
extra hassle.  In my eyes the most difficult thing at the moment is the
who's-the-vendor research, which is the same at both ends.
Maybe collaborating at a central place can add some fun that
adding "some field I don't really care about" downstream cannot.

Btw on Gentoo putting it in metadata.xml might be adding to the risk
of a checksum mismatch, at least for extra edits.  No idea if
QA tools will catch 90% of that happening.


> If you make merging easy, I don't see how this hurts the project.

I don't see how easy merging compensates for the issues I brought up.


Can I have a few more voices on this?:  Would you clearly feel more
comfortable and motivated to contribute to PackageMap if it works
at your distro's source package?



Sebastian





* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-15 19:13               ` Sebastian Pipping
@ 2009-06-15 20:27                 ` Petteri Räty
  2009-06-17  0:34                   ` Sebastian Pipping
  2009-06-15 21:27                 ` [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap" Christian Faulhammer
  1 sibling, 1 reply; 29+ messages in thread
From: Petteri Räty @ 2009-06-15 20:27 UTC (permalink / raw
  To: Sebastian Pipping; +Cc: gentoo-dev, PackageKit users and developers list


Sebastian Pipping wrote:
> 
> 
> Can I have a few more voices on this?:  Would you clearly feel more
> comfortable and motivated to contribute to PackageMap if it works
> at your distro's source package?
> 

You are somewhat missing the point. My point is that most developers
probably don't want to care about what happens in PackageMap upstream at
all so unless you mandate something to metadata.xml you will be relying
on others to keep the information between Portage and your service in
sync. But if it's in metadata.xml as a mandatory attribute then
developers will be automatically adding the value when they create a new
pkg.

Regards,
Petteri




* [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap"
  2009-06-15 19:13               ` Sebastian Pipping
  2009-06-15 20:27                 ` Petteri Räty
@ 2009-06-15 21:27                 ` Christian Faulhammer
  1 sibling, 0 replies; 29+ messages in thread
From: Christian Faulhammer @ 2009-06-15 21:27 UTC (permalink / raw
  To: gentoo-dev


Hi,

Sebastian Pipping <webmaster@hartwork.org>:
> > I am convinced the project will be more viable if people can choose 
> > their level of contribution. Many developers just won't care enough
> > to take the extra hassle.
> 
> Agreed.  However, I don't see a huge difference in level of
> extra hassle.  The most difficult thing is doing who's-the-vendor
> research in my eyes atm which is the same at both ends.
> Maybe collaborating at a central place can add some fun that
> adding "some field I don't really care about" downstream cannot.

 I agree with Petteri here: with the CPE information in our
metadata.xml, nobody has to remember to submit entries to PackageMap
separately.  Thus automatic extraction of information from
metadata.xml files on your side should be one option to gather the
information.

V-Li

-- 
Christian Faulhammer, Gentoo Lisp project
<URL:http://www.gentoo.org/proj/en/lisp/>, #gentoo-lisp on FreeNode

<URL:http://gentoo.faulhammer.org/>



* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-15 20:27                 ` Petteri Räty
@ 2009-06-17  0:34                   ` Sebastian Pipping
  2009-06-17  9:37                     ` Marijn Schouten (hkBst)
  2009-06-20 13:16                     ` Petteri Räty
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-17  0:34 UTC (permalink / raw
  To: Petteri Räty
  Cc: gentoo-dev, PackageKit users and developers list, Paul Wise,
	Robert Buchholz, Christian Faulhammer

I am starting to understand the real benefits of moving a larger
part of the maintenance down to the distro level as you proposed.

Okay, let's add support for CPEs at distro package level
and sync up and down with the central packagemap database.
Please contact me for collaboration on sync scripts
and "modeling" of details.



Sebastian




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-17  0:34                   ` Sebastian Pipping
@ 2009-06-17  9:37                     ` Marijn Schouten (hkBst)
  2009-06-18  0:09                       ` Sebastian Pipping
  2009-06-20 13:16                     ` Petteri Räty
  1 sibling, 1 reply; 29+ messages in thread
From: Marijn Schouten (hkBst) @ 2009-06-17  9:37 UTC (permalink / raw
  To: gentoo-dev
  Cc: Petteri Räty, PackageKit users and developers list,
	Paul Wise, Robert Buchholz, Christian Faulhammer

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian Pipping wrote:
> I start to understand the real benefits of moving a larger
> part of the maintenance down to the distro level as you proposed.
> 
> Okay, let's add support for CPEs at distro package level
> and sync up and down with the central packagemap database.
> Please contact me for collaboration on sync scripts
> and "modeling" of details.

Do we not already have enough information available to automatically determine
derived unique identifiers like CPE?

We have the package homepage and the package name (and the package category) and
the combination should be enough information to do direct comparisons to data
gathered from other repos (assuming they also contain such data).

For example you can determine automatically that gentoo:dev-scheme/gambit and
debian:gambc are the same package because although their names differ they have
the same homepage and share a category.

To create the database, every time you see a package you get its metadata from
its home repo. Use those values to compare to existing CPEs. If it is not yet in
the database create a new entry (CPE) for it with all the metadata like
homepage, categories, other-stuff-that-is-useful that is available. Every time
you get a match you may want to improve the metadata of the CPE with the
metadata of the newly added match. The very least you want to do is record the
addition of the new match. For example if you just automatically determined that
debian:gambc matches the CPE you already have for gentoo:dev-scheme/gambit then
you add "debian:gambc" to the list of matches.

This should get you 99.99% of all packages. You can arrange to be able to
provide hints to the system for cases where it isn't able to do the correct
derivation automatically. This can be done by adding this information to an
empty CPE-database. For example if the system wouldn't be able to match
gentoo:dev-scheme/gambit and debian:gambc, then you can create a CPE entry that
contains both in its matchlist. The first thing that your program should then do
to populate the database is automatically fill out the rest of that CPE by
querying the gentoo and debian repos.
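
The procedure above can be sketched roughly as follows, in Python.  This
is only an illustration: it matches on a normalized homepage alone and
leaves the category comparison out, the homepage URLs in the usage note
are placeholders, and all function and field names are invented here.

```python
from urllib.parse import urlsplit


def normalize_homepage(url):
    """Reduce a homepage URL to 'host/path' so trivial variants compare equal."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def build_database(packages, hints=()):
    """Group (repo_name, homepage) records into entries.

    `hints` is an iterable of name groups that must share one entry even
    if their homepages differ (the manual override described above).
    Each entry is a dict with a set of homepages and a list of matches.
    """
    entries = []
    by_name = {}
    for group in hints:
        entry = {"homepages": set(), "matches": []}
        entries.append(entry)
        for name in group:
            by_name[name] = entry

    by_homepage = {}
    for name, homepage in packages:
        key = normalize_homepage(homepage)
        # A hint wins over the homepage lookup.
        entry = by_name.get(name) or by_homepage.get(key)
        if entry is None:
            entry = {"homepages": set(), "matches": []}
            entries.append(entry)
        entry["homepages"].add(key)
        entry["matches"].append(name)  # record the newly added match
        by_homepage.setdefault(key, entry)
    return entries
```

Feeding it ("gentoo:dev-scheme/gambit", "http://www.example.org/gambit/")
and ("debian:gambc", "http://example.org/gambit") yields a single entry
listing both names; a hint group can force the same result when the
homepages do not agree.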

Users will be able to use names from any repo that they please in interactions
with packagekit's package manager (wrapper). For example they could do:
packagekit install debian:gambc. This is a lot more intuitive than using CPEs
directly (I don't know if this is what is intended).

There does not seem to be a need to do any manual conversion, enlist help from a
lot of distro packagers or add CPE to our metadata.

Is this the way you are also intending this to work? If not, why?

Marijn

- --
If you cannot read my mind, then listen to what I say.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAko4uUkACgkQp/VmCx0OL2ySvwCfQHwn2R/yC9EHx8KFjOE0B3f9
CCwAnRXqFX8q0Kt3MlMS9e63PC0LaiV+
=Y0gZ
-----END PGP SIGNATURE-----




* Re: [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap"
  2009-06-12  9:54   ` [gentoo-dev] Re: [packagekit] " Sebastian Pipping
@ 2009-06-17 12:08     ` Tiziano Müller
  0 siblings, 0 replies; 29+ messages in thread
From: Tiziano Müller @ 2009-06-17 12:08 UTC (permalink / raw
  To: gentoo-dev



Am Freitag, den 12.06.2009, 11:54 +0200 schrieb Sebastian Pipping:
> Richard Hughes wrote:
> > I'm slightly worried about it being called a service. Is it going to
> > be a new process that just does the mapping or is this a bad choice of
> > words? If it is a new process then I'm not sure such a thing will
> > catch on.
> 
> I'm not yet sure about how a mapper will keep its data
> fresh as the use of it is dependent on that.
> Ignore my "service" for now.
> 
> 
> > I'm also worried that a package manager has to read in and parse
> > thousands of small files.
> 
> While you mention "package manager" - with the current concept
> the data will not be precise enough for use with a package manager.
> 
> 
> > Why did you decide to write each project as
> > a single xml file?
> 
>  - The other 99% of the database stay valid XML if a single
>    file is invalid
> 
>  - To better fit the version controlled environment
> 
> 
> > Parsing and reading 10,000 files (in multiple directories) might take
> > a few seconds, and would have to be copied into memory (few Mb) to
> > query quickly.
> 
> Correct.
> 
> 
> > Which has to be invalidated if any of the files or
> > directories change. Why didn't you just put them in a sqlite database
> > that can be queried in a few ms, without dragging in an xml parser?
> > Also 10,000 files take up way more space (and takes longer to install
> > and update) than a single database file.
> 
> I like your idea about sqlite.  Maybe keeping the data to edit in XML
> and querying an exported sqlite snapshot is something to try.
Why not use an XML database like dbxml?
Maybe you could just specify the XML files as storage and then dbxml
would do the rest.


> 
> 
> > XML might be
> > useful for storing the data, but not for querying.
> 
> Good point.

Using XPath and XQuery you can do queries on XML as well.
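
As a rough illustration of the XPath point, here is a minimal Python sketch
using the standard library's ElementTree, which supports a limited XPath
subset. The element and attribute names (product, cpe, match, distro) and the
gambit CPE string are invented for this example; they are assumptions and not
the actual packagemap schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names, attribute names and the gambit
# CPE are made up for illustration, not taken from packagemap itself.
SAMPLE = """
<packagemap>
  <product cpe="cpe://a:git:git">
    <match distro="gentoo">dev-util/git</match>
    <match distro="debian">git-core</match>
  </product>
  <product cpe="cpe://a:example:gambit">
    <match distro="gentoo">dev-scheme/gambit</match>
    <match distro="debian">gambc</match>
  </product>
</packagemap>
"""


def packages_for(root, cpe, distro):
    """Return the package names recorded for one product in one distro."""
    path = f".//product[@cpe='{cpe}']/match[@distro='{distro}']"
    return [node.text for node in root.findall(path)]


root = ET.fromstring(SAMPLE)
print(packages_for(root, "cpe://a:git:git", "debian"))
```

The same query works whether the data lives in one file or is assembled from
many small ones, so it says nothing about the storage question by itself.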

Cheers,
Tiziano

-- 
Tiziano Müller
Gentoo Linux Developer, Council Member
Areas of responsibility:
  Samba, PostgreSQL, CPP, Python, sysadmin, GLEP Editor
E-Mail   : dev-zero@gentoo.org
GnuPG FP : F327 283A E769 2E36 18D5  4DE2 1B05 6A63 AE9C 1E30



* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-17  9:37                     ` Marijn Schouten (hkBst)
@ 2009-06-18  0:09                       ` Sebastian Pipping
  2009-06-18  9:07                         ` Marijn Schouten (hkBst)
       [not found]                         ` <1245295820.11471.223.camel@chianamo.mine.nu>
  0 siblings, 2 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-18  0:09 UTC (permalink / raw
  To: PackageKit users and developers list
  Cc: gentoo-dev, Paul Wise, Christian Faulhammer, Petteri Räty,
	Robert Buchholz

Marijn Schouten (hkBst) wrote:
> Sebastian Pipping wrote:
>> I start to understand the real benefits of moving a larger
>> part of the maintenance down to the distro level as you proposed.
> 
>> Okay, let's add support for CPEs at distro package level
>> and sync up and down with the central packagemap database.
>> Please contact me for collaboration on sync scripts
>> and "modeling" of details.
> 
> Do we not already have enough information available to automatically determine
> derived unique identifiers like CPE?
> 
> We have the package homepage and the package name (and the package category) and
> the combination should be enough information to do direct comparisons to data
> gathered from other repos (assuming they also contain such data).

You are asking a valid question.  The homepage links can be a great
helper in mapping and they have been of help already for the mapping
of the first 1000 Gentoo packages in packagemap.

However it might not be as easy as you make it sound, as there are
a few things that complicate matters and produce extra work:

 - In many cases a project can be reached from several URLs.
   For a project on SF.net you might have
   - http://sf.net/projects/${name}
   - http://${name}.sf.net/
   - http://www.${name}.org/
   That case can be handled rather easily but there are many more
   special cases and a manual map may be required for stuff that's
   not hosted on a larger hosting site.

 - Split packages (think Git or Qt) may all have the same homepage.
   In Debian the source package might help there, in Gentoo you'd
   have to do common prefix detection or so, that's special
   cases again, and continuous review that it still does what you need.
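
The SF.net case from the first point can be sketched roughly as follows in
Python. This is only an illustration under assumptions: the function name
normalize_homepage and the "sourceforge:" key format are made up here, and the
third variant (http://www.${name}.org/) cannot be recognized by pattern alone,
so it would still need a manual map.

```python
import re
from urllib.parse import urlsplit


def normalize_homepage(url):
    """Map SourceForge URL variants to one key; lightly clean other URLs."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")

    # http://sf.net/projects/NAME and http://sourceforge.net/projects/NAME
    if host in ("sf.net", "www.sf.net", "sourceforge.net", "www.sourceforge.net"):
        m = re.match(r"^/projects/([^/]+)", path)
        if m:
            return "sourceforge:" + m.group(1).lower()

    # http://NAME.sf.net/ and http://NAME.sourceforge.net/
    m = re.match(r"^([^.]+)\.(?:sf|sourceforge)\.net$", host)
    if m and m.group(1) != "www":
        return "sourceforge:" + m.group(1).lower()

    # Fallback: drop scheme, query, leading "www." and trailing slash.
    if host.startswith("www."):
        host = host[4:]
    return host + path


print(normalize_homepage("http://sf.net/projects/synce"))
print(normalize_homepage("http://synce.sf.net/"))
```

Both calls yield the same key, so the two spellings would land in the same
bucket when homepages are compared across distros.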


> For example you can determine automatically that gentoo:dev-scheme/gambit and
> debian:gambc are the same package because although their names differ they have
> the same homepage and share a category.

To detect equal categories you need a map for categories for all
participating distros.  Yes, it's smaller than mapping all packages
but it involves a manual map and keeping it in sync.

Another word on homepage collisions:  A few days ago I wrote
a script that builds a map from homepages to package names for the
whole Gentoo tree (code/gentoo/gentoo-world-to-homepage-map.sh).
The generated table from my run was 12330 lines long, each line for
a different package.

If you run an analysis over that table you see that many
homepages appear far more often than just once.
Here's the top ten:

     68 http://www.gnome.org/
     67 http://www.gentoo.org/
     58 http://www.gentoo.org/proj/en/perl/
     42 http://lingucomponent.openoffice.org/
     26 http://www.kde.org/
     25 http://www.gentoo.org
     20 http://sourceforge.net/projects/synce/
     19 http://www.trolltech.com/
     19 http://search.cpan.org/~rjbs/
     18 http://opensuse.foehr-it.de/

The command I used is

  $ sed 's|  *.*$||' homepage-to-package.txt \
    | sort | uniq -c | sort -n -r | head -n 10
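
For anyone who prefers to do the same count outside the shell, a rough Python
equivalent of that pipeline could look like this. It assumes each line of the
table is "homepage, whitespace, package name", which is what the sed
expression implies; the sample lines are illustrative, not taken from the real
table.

```python
from collections import Counter


def top_homepages(lines, n=10):
    """Count how often each homepage (first field of a line) occurs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip empty lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Illustrative input only; the real table has one line per package.
sample = [
    "http://www.gnome.org/  gnome-base/gnome",
    "http://www.gnome.org/  gnome-base/libgnome",
    "http://www.kde.org/  kde-base/kdelibs",
]
for homepage, count in top_homepages(sample):
    print(f"{count:7d} {homepage}")
```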

I think these three cases alone show that it would
- also be a lot of work
- involve many special cases
- still require manual mappings here and there

Another disadvantage is the current static XML approach of
packagemap is language independent.  We can easily build
tools for packagemap in any language that has an XML parser.
If the data actually is the code we suddenly have to keep
code from different languages in precise special case sync.

I'm not sure if the approach you describe is less work in total.
I guess to find out we'd have to do both in parallel :-)

It could be interesting how much the list of homepages
in say Debian packages and Gentoo packages overlap.



Sebastian




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-18  0:09                       ` Sebastian Pipping
@ 2009-06-18  9:07                         ` Marijn Schouten (hkBst)
  2009-06-19 18:53                           ` Sebastian Pipping
       [not found]                         ` <1245295820.11471.223.camel@chianamo.mine.nu>
  1 sibling, 1 reply; 29+ messages in thread
From: Marijn Schouten (hkBst) @ 2009-06-18  9:07 UTC (permalink / raw
  To: gentoo-dev
  Cc: PackageKit users and developers list, Paul Wise,
	Christian Faulhammer, Petteri Räty, Robert Buchholz

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sebastian Pipping wrote:
> Marijn Schouten (hkBst) wrote:
>> Sebastian Pipping wrote:
>>> I start to understand the real benefits of moving a larger
>>> part of the maintenance down to the distro level as you proposed.
>>> Okay, let's add support for CPEs at distro package level
>>> and sync up and down with the central packagemap database.
>>> Please contact me for collaboration on sync scripts
>>> and "modeling" of details.
>> Do we not already have enough information available to automatically determine
>> derived unique identifiers like CPE?
>>
>> We have the package homepage and the package name (and the package category) and
>> the combination should be enough information to do direct comparisons to data
>> gathered from other repos (assuming they also contain such data).
> 
> You are asking a valid question.  The homepage links can be a great
> helper in mapping and they have been of help already for the mapping
> of the first 1000 Gentoo packages in packagemap.
> 
> However it might not be as easy as you make it sound, as there are
> a few things that complicate matters and produce extra work:
> 
>  - In many cases a project can be reached from several URLs.
>    For a project on SF.net you might have
>    - http://sf.net/projects/${name}
>    - http://${name}.sf.net/
>    - http://www.${name}.org/
>    That case can be handled rather easily but there are many more
>    special cases and a manual map may be required for stuff that's
>    not hosted on a larger hosting site.

But homepage is just ONE of the things that help you to identify a package. Some
packages that are the same will have different homepages and some packages which
are different will have the same homepage. If you take into account just the
homepage, the package name and the fact that packages from the same repo are
different, you can probably match over 95% of all packages correctly.

>  - Split packages (think Git or Qt) may all have the same homepage.
>    In Debian the source package might help there, in Gentoo you'd
>    have to do common prefix detection or so, that's special
>    cases again, and continuous review that it still does what you need.

Neither of the gits gentoo has seems very split, so I'll only address qt. Gentoo
has qt-core and qt-svg (and many more). I would say that they would each have to
get a different CPE and that none of them is equivalent to a package in another
or the same distro that has all of qt combined. Packages that get manually split
are a minority AFAIK, though texlive is another big one that comes to mind.
Debian does splitting into ``normal'' and ``devel'' packages. Has it been
decided what to do with those?
Now that you got me thinking about split packages, I realize that the exact
files installed by a package are also all by themselves a way to get over 95%
correct matching. For distros (like Gentoo) that have packages that have flags
that influence the list of installed files you must decide whether to add them
to the database last, or whether you will try to use an imprecise file list.
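
One way to turn "the exact files installed" into a number is the Jaccard
index of the two file lists. The following Python sketch is only an
illustration; the function name and the example paths are assumptions, and
the 95% figure above is an estimate that this code does not verify.

```python
def file_list_similarity(files_a, files_b):
    """Jaccard index of two installed-file lists: |A & B| / |A | B|."""
    a, b = set(files_a), set(files_b)
    if not a and not b:
        return 0.0  # two empty lists carry no evidence either way
    return len(a & b) / len(a | b)


# Hypothetical file lists for the same program packaged by two distros.
gentoo_files = ["/usr/bin/foo", "/usr/lib/libfoo.so", "/usr/share/man/man1/foo.1"]
debian_files = ["/usr/bin/foo", "/usr/lib/libfoo.so", "/usr/share/doc/foo/README"]
print(file_list_similarity(gentoo_files, debian_files))
```

A threshold on this score would have to be tuned, since flags and distro
policy (documentation paths, split packages) change the file lists.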

>> For example you can determine automatically that gentoo:dev-scheme/gambit and
>> debian:gambc are the same package because although their names differ they have
>> the same homepage and share a category.
> 
> To detect equal categories you need a map for categories for all
> participating distros.  Yes, it's smaller than mapping all packages
> but it involves a manual map and keeping it in sync.

No, there need not be a manual mapping. There is no reason to do true/false
comparisons. All we need is a distance function, like for example Levenshtein
distance (http://en.wikipedia.org/wiki/Levenshtein_distance). Actually on second
thought Levenshtein distance is probably not what we want, since we would be
more interested in how much strings have in common than in how much they differ.
I think the idea is clear though.
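
A measure of "how much strings have in common" is available in Python's
standard library as difflib.SequenceMatcher, whose ratio() is higher the more
two strings share. The sketch below is a stand-in for whatever distance
function would really be chosen; the helper names and the candidate list are
assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score in [0, 1]; higher means the strings share more."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, candidates):
    """Return (candidate, score) for the candidate most similar to name."""
    scored = ((c, similarity(name, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical candidate names from another repo.
print(best_match("gambit", ["texlive", "gambc", "git-core"]))
```

The same function could be applied to categories or homepages and the scores
combined, instead of requiring exact equality on any single field.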

> Another word on homepage collisions:  A few days ago I wrote
> a script that builds a map from homepages to package names for the
> whole Gentoo tree (code/gentoo/gentoo-world-to-homepage-map.sh).
> The generated table from my run was 12330 lines long, each line for
> a different package.
> 
> If you run an analysis over that table you see that many
> homepages appear far more often than just once.
> Here's the top ten:
> 
>      68 http://www.gnome.org/
>      67 http://www.gentoo.org/
>      58 http://www.gentoo.org/proj/en/perl/
>      42 http://lingucomponent.openoffice.org/
>      26 http://www.kde.org/
>      25 http://www.gentoo.org
>      20 http://sourceforge.net/projects/synce/
>      19 http://www.trolltech.com/
>      19 http://search.cpan.org/~rjbs/
>      18 http://opensuse.foehr-it.de/

texlive (with homepage http://www.tug.org/texlive/) seems to be missing from this list.

$ eix -H http://www.tug.org/texlive/ | tail -n 1
Found 79 matches.

I suspect you used grep (or whatever) to construct your data, instead of using
the package manager or a tool that knows how to extract the data available in
packages (and eclasses).

> The command I used is
> 
>   $ sed 's|  *.*$||' homepage-to-package.txt \
>     | sort | uniq -c | sort -n -r | head -n 10
> 
> I think these three cases alone show that it would

I'm not sure which 3 cases you mean.

> - also be a lot of work
> - involve many special cases
> - still require manual mappings here and there
> 
> Another disadvantage is the current static XML approach of
> packagemap is language independent.  We can easily build
> tools for packagemap in any language that has an XML parser.

I agree that XML is a disadvantage, but not that it is language independent. ;P

> If the data actually is the code we suddenly have to keep
> code from different languages in precise special case sync.

I did not argue for a data format nor for a specific language nor coding style
nor anything that seems to match what you are saying here; I only spoke about
how to populate the CPE database.

> I'm not sure if the approach you describe is less work in total.
> I guess to find out we'd have to do both in parallel :-)
> 
> It could be interesting how much the list of homepages
> in say Debian packages and Gentoo packages overlap.

It would certainly be interesting.

Marijn

- --
If you cannot read my mind, then listen to what I say.

Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAko6A7QACgkQp/VmCx0OL2wl/wCgpSNzob7skilge+56ynbmawHY
/1EAoJnOOG2Bix0IpWqySP063AJIWDta
=L9t+
-----END PGP SIGNATURE-----




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
       [not found]                         ` <1245295820.11471.223.camel@chianamo.mine.nu>
@ 2009-06-18 22:33                           ` Sebastian Pipping
       [not found]                             ` <1245382383.14805.281.camel@chianamo.mine.nu>
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-18 22:33 UTC (permalink / raw
  To: Paul Wise
  Cc: PackageKit users and developers list, gentoo-dev,
	Christian Faulhammer, Petteri Räty, Robert Buchholz

Paul Wise wrote:
> On Thu, 2009-06-18 at 02:09 +0200, Sebastian Pipping wrote:
> 
>> It could be interesting how much the list of homepages
>> in say Debian packages and Gentoo packages overlap.
> 
> Debian sid amd64 binary packages:
> 
> $ grep -h ^Homepage /var/lib/apt/lists/mirror.internode.on.net_debian_dists_unstable_main_binary-amd64_Packages | sort | uniq -c | sort -n -r | head -n 10
>     154 Homepage: http://www.go-oo.org
>     149 Homepage: http://www.kde.org/
>     107 Homepage: http://i18n.kde.org/
>      97 Homepage: http://www.tug.org/texlive
>      90 Homepage: http://www.mono-project.com/
>      83 Homepage: http://xcb.freedesktop.org
>      67 Homepage: http://xmms2.xmms.se/
>      63 Homepage: http://www.kde.org
>      59 Homepage: http://www.ruby-lang.org/
>      59 Homepage: http://www.cs.wustl.edu/~schmidt/ACE.html
> 
> Debian sid source packages:
> 
> $ grep -h ^Homepage /var/lib/apt/lists/mirror.internode.on.net_debian_dists_unstable_main_source_Sources | sort | uniq -c | sort -n -r | head -n 10
>      23 Homepage: http://www.tryton.org/
>      23 Homepage: http://www.Rmetrics.org
>      19 Homepage: http://www.xfce.org/
>      19 Homepage: http://www.apertium.org
>      19 Homepage: http://goodies.xfce.org/
>      16 Homepage: http://www.schoolsplay.org/
>      16 Homepage: http://savonet.sourceforge.net/
>      16 Homepage: http://gtk2-perl.sourceforge.net/
>      14 Homepage: http://www.ggzgamingzone.org/
>      13 Homepage: http://www.kde.org/

Can you share the list files and the scripts you used
to generate them with me?

I'd like to determine the subset of URLs that appear
exactly once in both gentoo and debian source packages.



Sebastian





* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
       [not found]                             ` <1245382383.14805.281.camel@chianamo.mine.nu>
@ 2009-06-19 17:36                               ` Sebastian Pipping
  2009-06-19 21:47                                 ` Sebastian Pipping
  0 siblings, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-19 17:36 UTC (permalink / raw
  To: Paul Wise
  Cc: PackageKit users and developers list, gentoo-dev,
	Christian Faulhammer, Petteri Räty, Robert Buchholz

Paul Wise wrote:
> The scripts were in my mail and the files are on every Debian mirror:
> 
> wget -O - http://ftp.debian.org/debian/dists/unstable/main/binary-amd64/Packages | grep -h ^Homepage | sort | uniq -c | sort -n -r | head -n 10
> wget -O - http://ftp.debian.org/debian/dists/unstable/main/source/Sources | grep -h ^Homepage | sort | uniq -c | sort -n -r | head -n 10

I see, thanks.


I wrote:
> I'd like to determine the subset of URLs that appear
> exactly once in both gentoo and debian source packages.

I made a script for this job now.  With zero normalization
I get this result:

  Mappable homepages in Debian: 6222
  Mappable homepages in Gentoo: 9582
  Shared (without normalization): 1183

That's about 11% of the Gentoo tree.

The script is up here:
http://git.goodpoint.de/?p=packagemap.git;a=tree;f=code/debian
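
Independently of the linked script, the computation itself ("homepages that
appear exactly once on both sides") can be sketched in a few lines of Python.
This is not the script from the repository; the function names and the sample
data are assumptions for illustration.

```python
from collections import Counter


def unique_homepages(pairs):
    """Map homepage -> package for homepages used by exactly one package."""
    pairs = list(pairs)  # allow any iterable; we need two passes
    counts = Counter(homepage for homepage, _ in pairs)
    return {homepage: pkg for homepage, pkg in pairs if counts[homepage] == 1}


def shared_unique(gentoo_pairs, debian_pairs):
    """Pair up packages whose homepage is unique in both distros."""
    gentoo = unique_homepages(gentoo_pairs)
    debian = unique_homepages(debian_pairs)
    return {url: (gentoo[url], debian[url]) for url in gentoo.keys() & debian.keys()}


# Hypothetical (homepage, package) pairs.
gentoo_sample = [
    ("http://example.org/a", "app-misc/a"),
    ("http://www.kde.org/", "kde-base/x"),
    ("http://www.kde.org/", "kde-base/y"),
]
debian_sample = [
    ("http://example.org/a", "a"),
    ("http://www.kde.org/", "kdelibs"),
]
print(shared_unique(gentoo_sample, debian_sample))
```

Homepages that occur more than once on either side are dropped before the
intersection, which is why shared project sites such as kde.org never count
as mappable here.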



Sebastian





* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-18  9:07                         ` Marijn Schouten (hkBst)
@ 2009-06-19 18:53                           ` Sebastian Pipping
  0 siblings, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-19 18:53 UTC (permalink / raw
  To: PackageKit users and developers list
  Cc: gentoo-dev, Petteri Räty, Christian Faulhammer, Paul Wise,
	Robert Buchholz

Marijn Schouten (hkBst) wrote:
> Neither of the gits gentoo has seems very split,

I was referring to git in Debian here:

  Package: git-core
  Binary: git-core, git-doc, git-arch, git-cvs, git-svn,
    git-email, git-daemon-run, git-gui, gitk, gitweb


> texlive (with homepage http://www.tug.org/texlive/) seems to be missing from this list.
> 
> $ eix -H http://www.tug.org/texlive/ | tail -n 1
> Found 79 matches.
> 
> I suspect you used grep (or whatever) to construct your data, instead of using
> the package manager or a tool that knows how to extract the data available in
> packages (and eclasses).

True, grep and friends.


> I'm not sure which 3 cases you mean.

I was referring to what I said before, in summary:
1) non-unique homepages
2) extra work for split packages
3) extra work for category mapping


> I did not argue for a data format nor for a specific language nor coding style
> nor anything that seems to match what you are saying here; I only spoke about
> how to populate the CPE database.

I understood you wanted to replace the XML collection with mapping code.
I got you wrong then.  I agree that combining automated fill of the
database with manual can speed things up a lot.



Sebastian





* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-19 17:36                               ` Sebastian Pipping
@ 2009-06-19 21:47                                 ` Sebastian Pipping
  0 siblings, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-19 21:47 UTC (permalink / raw
  To: PackageKit users and developers list
  Cc: Paul Wise, Robert Buchholz, Christian Faulhammer,
	Petteri Räty, gentoo-dev

Sebastian Pipping wrote:
>> I'd like to determine the subset of URLs that appear
>> exactly once in both gentoo and debian source packages.
> 
>   Mappable homepages in Debian: 6222
>   Mappable homepages in Gentoo: 9582
>   Shared (without normalization): 1183

With normalization for

  SourceForge, Google Code, Alioth, Savannah,
  Berlios, RubyForge, Gna, PyPI

the number of directly mappable packages increases
by about 500:

  Mappable homepages in Debian: 6222
  Mappable homepages in Gentoo: 9582
  Shared (w/o normalization): 1183
  Shared (w/  normalization): 1670



Sebastian




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-17  0:34                   ` Sebastian Pipping
  2009-06-17  9:37                     ` Marijn Schouten (hkBst)
@ 2009-06-20 13:16                     ` Petteri Räty
  2009-06-20 17:28                       ` Sebastian Pipping
  2009-07-14 16:49                       ` Sebastian Pipping
  1 sibling, 2 replies; 29+ messages in thread
From: Petteri Räty @ 2009-06-20 13:16 UTC (permalink / raw
  To: Sebastian Pipping; +Cc: gentoo-dev, PackageKit users and developers list


Sebastian Pipping wrote:
> I start to understand the real benefits of moving a larger
> part of the maintenance down to the distro level as you proposed.
> 
> Okay, let's add support for CPEs at distro package level
> and sync up and down with the central packagemap database.
> Please contact me for collaboration on sync scripts
> and "modeling" of details.
> 
> 
> 
> Sebastian

You need to come up with the needed DTD changes for metadata.xml. Last
time the schema was changed it was done with a GLEP so writing one seems
prudent here too especially if we are going to make the value mandatory
after it has been added to all existing packages. Also documentation
(devmanual, developer handbook come to mind at least) concerning
metadata.xml needs to be updated to document the new stuff.

Regards,
Petteri




* Re: [packagekit] [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-20 13:16                     ` Petteri Räty
@ 2009-06-20 17:28                       ` Sebastian Pipping
  2009-07-14 16:49                       ` Sebastian Pipping
  1 sibling, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-06-20 17:28 UTC (permalink / raw
  To: Petteri Räty; +Cc: gentoo-dev, PackageKit users and developers list

Petteri Räty wrote:
> You need to come up with the needed DTD changes for metadata.xml. Last
> time the schema was changed it was done with a GLEP so writing one seems
> prudent here too especially if we are going to make the value mandatory
> after it has been added to all existing packages. Also documentation
> (devmanual, developer handbook come to mind at least) concerning
> metadata.xml needs to be updated to document the new stuff.

Makes sense, added to my todo list.




Sebastian




* [gentoo-dev]  Re: Re: Inviting you to project "PackageMap"
  2009-06-13  3:55     ` Sebastian Pipping
@ 2009-07-11 21:38       ` Steven J Long
  0 siblings, 0 replies; 29+ messages in thread
From: Steven J Long @ 2009-07-11 21:38 UTC (permalink / raw
  To: gentoo-dev; +Cc: packagekit

Sorry for the delay in answering this one; I've been up to here with RL, and I
didn't have access to the debian and BSD bookmarks.

Sebastian Pipping wrote:

> Steven J Long wrote:
>> You might as well use Gentoo's version specification for your internal
>> format, as it's the most comprehensive. The most you need to add is
>> debian epochs.
> 
> I'm not sure what you are referring to.
> Please share more details or pointers.
>
There's two aspects: the category grouping, and the version specifier.

Most distros have a flat namespace, which gets kinda hairy. One level of
grouping above that makes a BIG difference.

If you take a look here:
http://sources.gentoo.org/viewcvs.py/portage/main/trunk/pym/portage/versions.py?view=markup
you can see the version RE; note it can handle arbitrary levels of patch
(pre etc.) This is what we use in update[1] to split an arbitrary CPV:
CPV='^(.*-.*|virtual)/(.*)-([0-9]+)((\.[0-9]+)*)([a-z]?)((_(pre|p|beta|alpha|rc)[0-9]*)*)(-r([0-9.]+))?$'

(That's all one line.) If it doesn't match that, the CPV is not valid, and
with the one RE match, we have all the constituent parts, though we don't
often compare versions and handle that separately.
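
To show what the single match yields, here is a small Python sketch that
applies the same expression. Two assumptions: the "|" between "beta" and
"alpha" is taken to have been lost where the mail was wrapped, and the
function name split_cpv and the dictionary keys are made up for this example
(they are not the names used in update or in portage).

```python
import re

# The RE from above on one logical line. The "|" between "beta" and
# "alpha" is an assumption about where the mail wrapped the line.
CPV_RE = re.compile(
    r"^(.*-.*|virtual)/(.*)-([0-9]+)((\.[0-9]+)*)([a-z]?)"
    r"((_(pre|p|beta|alpha|rc)[0-9]*)*)(-r([0-9.]+))?$"
)


def split_cpv(cpv):
    """Split category/package-version[-rN]; return None if it is not valid."""
    m = CPV_RE.match(cpv)
    if m is None:
        return None
    return {
        "category": m.group(1),
        "package": m.group(2),
        # major number + dotted parts + optional letter + suffixes
        "version": m.group(3) + m.group(4) + m.group(6) + m.group(7),
        "revision": m.group(11),  # None when there is no -rN part
    }


print(split_cpv("dev-util/git-1.6.3.1-r1"))
print(split_cpv("sys-apps/portage-2.2_rc33"))
print(split_cpv("not-a-cpv"))
```

Comparing two versions still needs separate logic; the match only validates
the string and hands back its parts.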

We've removed cvs. prefix since it's supposed to be deprecated, and added
the inter-rev bit for prefix portage. cvs. is still kept in our version
comparison code as it's a pita to rewrite, and we're waiting for some sort
of outcome to handling vcs builds; simply swapping two letters is clearly
optimal.

The "official" reasons given for deprecating cvs. were
1) no-one knew about it.
2) those who did apparently found it "hard to deal with" though one could be
forgiven for thinking that's due to novelty; -rNN is supposedly "hard" too.
3) it's vendor-specific.

We removed it for getCPV() as we call that function a *lot*; much more often
than verCompare() and every little bit helps. Adding it back is a doddle.
A prefix here is more efficient than yet another suffix.

I've yet to see an upstream version that can't be handled adequately by the
Gentoo versioning scheme. Granted you get the odd insane upstream; in those
kind of cases, I'd rather go by date (which can ofc be handled easily) or
svn id. A distro really should not have any need to push a version more
than once a day, and the distro release bit at the end allows one to deal
with the occasional mistake. Ultimately it's down to the maintainer, ofc,
but I'd question the stability of a project that releases more than once per
day; it does occasionally happen with bugfixes, usually with some sort of
patch revision.

WRT debian epochs, they're described here:
http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version

You should also take a look at how *BSD handle eg LIB_DEPENDS (the LDEPEND
variable I've occasionally suggested in the past):
http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-depend.html

I personally am not too fussed about the other *_DEPENDS mentioned therein;
you might want to add them for completeness.

> 
>> XML was never meant for data-storage for such record-sets: it was
>> designed for data *interchange* [..]
> 
> Interesting point.  What would you use as an alternative that
> works well with a version control system?
>
Ah didn't realise you wanted to keep this all in a vcs. Personally I'd use
a format similar to the metadata section part of an ebuild, so simple
shell assignments. I wouldn't worry about whether strings are single or
double-quoted (or even if they're quoted at all, myself); just ban all
backslashes, and have some sort of script that runs to verify pre-commit.

There's no need to start going off into some nutty anxiety attack about "oh
but sh allows X and BASH allows Y" imo; just verify that the data lexes
fine: people make mistakes all the time. (It's only mildly tricky in C, so
should be easy enough however you implement.)

Having said that, XML is fine for microformats; Gentoo GuideXML is one of
the nicest applications of XML I've seen (and was a factor in us going with
Gentoo, believe it or not.) Nor is this a massive dataset. I just think most
of your users will be more comfortable.

Good luck with it, whatever format you go with.

HTH,
Steve.

[1] http://forums.gentoo.org/viewtopic-t-546828.html -- the version on there
is very much out of date (I'll get that modded soon, Naib ty for your lovely
comments;) so if any Gentoo-user does want to try it:
git clone git://weaver.gentooenterprise.com/update.git
Please be aware this is under AGPL3+ for non-commercial use only, so if that
bothers you, don't clone it. (Written with a work colleague, on work time.
He wasn't happy with CC-NCSA as he's committed to FSF for some reason, a
stance I have to say I am coming to agree with, but that's another
[off;]topic.)
-- 
#friendly-coders -- We're friendly but we're not /that/ friendly ;-)





* Re: [gentoo-dev] Inviting you to project "PackageMap"
  2009-06-20 13:16                     ` Petteri Räty
  2009-06-20 17:28                       ` Sebastian Pipping
@ 2009-07-14 16:49                       ` Sebastian Pipping
  2009-07-20  2:03                         ` [GLEP] CPE names in metadata (was Re: [gentoo-dev] Inviting you to project "PackageMap") Sebastian Pipping
  1 sibling, 1 reply; 29+ messages in thread
From: Sebastian Pipping @ 2009-07-14 16:49 UTC (permalink / raw
  To: Petteri Räty; +Cc: gentoo-dev, Robert Buchholz

Petteri Räty wrote:
> You need to come up with the needed DTD changes for metadata.xml. Last
> time the schema was changed it was done with a GLEP so writing one seems
> prudent here

I have started
- writing a GLEP
- extending the DTD
- extending a sample metadata.xml

Related gitweb over here:
http://git.goodpoint.de/?p=metadata-xml-cpe-glep.git

Would be great to get some review.



Sebastian




* [GLEP] CPE names in metadata (was Re: [gentoo-dev] Inviting you to project "PackageMap")
  2009-07-14 16:49                       ` Sebastian Pipping
@ 2009-07-20  2:03                         ` Sebastian Pipping
  0 siblings, 0 replies; 29+ messages in thread
From: Sebastian Pipping @ 2009-07-20  2:03 UTC (permalink / raw
  To: gentoo-dev

Sebastian Pipping wrote:
> I have started
> - writing a GLEP
> - extending the DTD
> - extending a sample metadata.xml
> 
> Related gitweb over here:
> http://git.goodpoint.de/?p=metadata-xml-cpe-glep.git

Especially as this is my first GLEP and it will affect most of you in
the long run, I depend on your feedback here.
Just added more words on CPE names and replaced the given example.

Please have a look and tear it apart :-)

Thanks in advance,



Sebastian



end of thread

Thread overview: 29+ messages
2009-06-12  7:42 [gentoo-dev] Inviting you to project "PackageMap" Sebastian Pipping
     [not found] ` <15e53e180906120130md68cd94nba61fa5560c73eb4@mail.gmail.com>
2009-06-12  9:54   ` [gentoo-dev] Re: [packagekit] " Sebastian Pipping
2009-06-17 12:08     ` Tiziano Müller
2009-06-12 13:00   ` [gentoo-dev] " Steven J Long
2009-06-13  3:55     ` Sebastian Pipping
2009-07-11 21:38       ` [gentoo-dev] " Steven J Long
2009-06-12 18:27 ` [gentoo-dev] " Petteri Räty
2009-06-12 21:43   ` [packagekit] " Sebastian Pipping
2009-06-13 15:53     ` Petteri Räty
2009-06-13 19:03       ` Sebastian Pipping
2009-06-13 19:16         ` Petteri Räty
2009-06-15 13:52         ` Robert Buchholz
2009-06-15 17:04           ` Sebastian Pipping
2009-06-15 18:24             ` Robert Buchholz
2009-06-15 19:13               ` Sebastian Pipping
2009-06-15 20:27                 ` Petteri Räty
2009-06-17  0:34                   ` Sebastian Pipping
2009-06-17  9:37                     ` Marijn Schouten (hkBst)
2009-06-18  0:09                       ` Sebastian Pipping
2009-06-18  9:07                         ` Marijn Schouten (hkBst)
2009-06-19 18:53                           ` Sebastian Pipping
     [not found]                         ` <1245295820.11471.223.camel@chianamo.mine.nu>
2009-06-18 22:33                           ` Sebastian Pipping
     [not found]                             ` <1245382383.14805.281.camel@chianamo.mine.nu>
2009-06-19 17:36                               ` Sebastian Pipping
2009-06-19 21:47                                 ` Sebastian Pipping
2009-06-20 13:16                     ` Petteri Räty
2009-06-20 17:28                       ` Sebastian Pipping
2009-07-14 16:49                       ` Sebastian Pipping
2009-07-20  2:03                         ` [GLEP] CPE names in metadata (was Re: [gentoo-dev] Inviting you to project "PackageMap") Sebastian Pipping
2009-06-15 21:27                 ` [gentoo-dev] Re: [packagekit] Inviting you to project "PackageMap" Christian Faulhammer
