public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [RFC] euscan: Need to add more upstream info in metadata.xml
@ 2012-08-10 11:11 Federico "fox" Scrinzi
  2012-08-10 12:03 ` Gilles Dartiguelongue
  0 siblings, 1 reply; 13+ messages in thread
From: Federico "fox" Scrinzi @ 2012-08-10 11:11 UTC (permalink / raw
  To: gentoo-dev


Hi everybody!

euscan is available in portage as a dev package
(app-portage/euscan-9999). This tool checks whether a given
package/ebuild has new upstream versions. It uses several
heuristics to scan upstream and collect new versions and their URLs.

euscan can use either custom "handlers" for well-known upstreams (github,
pypi, cpan, sourceforge, google-code, etc.) or directory scanning
based on SRC_URI. If the directory scan fails for some reason, euscan
falls back to brute force (generating possible next version numbers and
trying to fetch those packages).
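
A minimal sketch of the brute-force idea, for illustration only: it
assumes purely numeric, dot-separated versions, and the function names
and the max_bump parameter are hypothetical, not euscan's actual code.

```python
def next_version_candidates(version, max_bump=2):
    """Return plausible next versions by bumping each numeric component."""
    parts = [int(p) for p in version.split(".")]
    candidates = []
    for i in range(len(parts)):
        for bump in range(1, max_bump + 1):
            # Bump component i and reset everything after it to zero.
            new = parts[:i] + [parts[i] + bump] + [0] * (len(parts) - i - 1)
            candidates.append(".".join(str(p) for p in new))
    return candidates


def candidate_urls(url_template, version):
    """Fill each candidate version into a URL template with a {v} slot."""
    return [url_template.format(v=v) for v in next_version_candidates(version)]


print(next_version_candidates("2.21"))
```

Each candidate URL would then be probed; a successful fetch means a new
upstream version exists.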

The problem that we're facing with euscan is that some upstreams
use strange version numbers, or the list of available versions
is in a location that is totally different from SRC_URI.

Examples:
- MySQL: most MySQL mirrors are not browsable (euscan always falls back
to brute force)
- webalizer uses strange version numbers upstream
(ftp://ftp.mrunix.net/pub/webalizer/); in this case euscan should be
aware that 2.21-02 is the upstream version number and scan the ftp
directory for webalizer-(\d+).(\d+)-(\d+).tar.gz. The latest
version of webalizer, 2.23.05, is not recognized by euscan and is not
available in Gentoo.
- Authen-SASL-Cyrus uses "-server" in its upstream version numbers
http://www.cpan.org/authors/id/P/PB/PBOETTCH/
- XML-Tidy uses strange letters in its version numbers


We thought about how to solve this issue and agreed that the best way
to handle each specific case is to add some more information to
metadata.xml.

In Debian, uscan uses information from the debian/watch file inside
Debian packages. Since so much work has already been done there, we
thought about taking this info from the watch files and saving it in
metadata.xml for euscan to use.

I wrote a simple script that patches metadata.xml, adding an experimental
<watch> tag with data from Debian packages:
https://github.com/volpino/euscan/blob/master/bin/euscan_patch_metadata

A basic watch entry contains a base URL to scan and a pattern to search
for in it.
Example:
 base: http://icedtea.classpath.org/download/source/
 pattern: icedtea-([\d\.]+).tar.gz
This means "open that URL and look for links that match the
pattern".
This is useful, for example, when it is not possible to derive the base
URL from SRC_URI (icedtea's SRC_URI is
http://icedtea.classpath.org/hg/release/icedtea7-forest-2.2/hotspot/archive/889dffcf4a54.tar.gz)
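
A minimal sketch of "open that URL and look for matching links". To keep
it runnable offline, it works on an HTML string instead of fetching the
base URL; the sample page content, the LinkCollector class and the
scan_page function are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scan_page(html, pattern):
    """Return the versions captured by `pattern` in the page's links."""
    collector = LinkCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    versions = []
    for link in collector.links:
        # Match the file name only, so absolute and relative links both work.
        match = regex.fullmatch(link.rsplit("/", 1)[-1])
        if match:
            versions.append(match.group(1))
    return versions


# Hypothetical page content standing in for the real directory listing.
SAMPLE = """
<a href="icedtea-2.1.tar.gz">icedtea-2.1.tar.gz</a>
<a href="icedtea-2.2.1.tar.gz">icedtea-2.2.1.tar.gz</a>
<a href="icedtea-2.2.1.tar.gz.sig">signature</a>
<a href="README">README</a>
"""

versions = scan_page(SAMPLE, r"icedtea-([\d\.]+).tar.gz")
print(versions)
```

Using a full match on the file name keeps signature files and unrelated
links out of the result.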

Advanced usage with directory pattern:
Example:
 base: http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-([\d\.]+)
 pattern: mysql-([\d\.]+).tar.gz
This scans all directories that match the base pattern, looking for
links that match the file pattern.
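
A minimal sketch of the directory-pattern case. The mirror layout is a
made-up in-memory dictionary rather than a real listing, and
scan_directories is a hypothetical helper, not euscan's API.

```python
import re

# Hypothetical mirror layout: directory name -> files in that directory.
MIRROR = {
    "MySQL-5.1": ["mysql-5.1.63.tar.gz", "mysql-5.1.65.tar.gz"],
    "MySQL-5.5": ["mysql-5.5.27.tar.gz"],
    "Connector-J": ["mysql-connector-java-5.1.21.tar.gz"],
}


def scan_directories(listing, dir_pattern, file_pattern):
    """Return versions from files matching file_pattern inside matching dirs."""
    dir_re = re.compile(dir_pattern)
    file_re = re.compile(file_pattern)
    found = []
    for directory in sorted(listing):
        if not dir_re.fullmatch(directory):
            continue  # skip directories the base pattern does not cover
        for name in listing[directory]:
            match = file_re.fullmatch(name)
            if match:
                found.append(match.group(1))
    return found


found = scan_directories(MIRROR, r"MySQL-([\d\.]+)", r"mysql-([\d\.]+).tar.gz")
print(found)
```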

We also need some options for mangling versions and download URLs: these
options can contain regexps or names of mangling rules (e.g. "cpan"
means "apply the mangling rules for CPAN versions").

Version mangling example:
As mentioned above, webalizer uses both dots and hyphens in version
numbers, so an option like this is required: versionmangle="s/-/./"

Download url mangling example:
A page scan on berlios returns a URL like this:
http://prdownload.berlios.de/mirageiv/mirage-0.9.tar.gz which should be
mangled to get a working download URL, with an option like
downloadurlmangle="s/prdownload/download/"

(for more info see uscan manpage)

Another example: upstream versions of dev-perl/Math-BaseCnv and XML-Tidy
look like 1.8.B59BrZ, which should be mangled to 1.8.
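
A minimal sketch of how such s/from/to/ options could be applied in
Python. The helper name apply_sed_rule is hypothetical, the rule parser
is deliberately naive (it assumes no '/' inside the regex or the
replacement), and the last rule is the regex from the XML-Tidy example
in this mail.

```python
import re


def apply_sed_rule(rule, text):
    """Apply a rule of the form s/regex/replacement/ (optionally with a g flag)."""
    if not rule.startswith("s/"):
        raise ValueError("unsupported rule: %r" % rule)
    # Naive split: assumes neither side contains a literal '/'.
    _, pattern, replacement, flags = rule.split("/")
    # Perl writes backreferences as $1; Python's re module expects \1.
    replacement = re.sub(r"\$(\d+)", r"\\\1", replacement)
    count = 0 if "g" in flags else 1  # without 'g', replace the first match only
    return re.sub(pattern, replacement, text, count=count)


# Version mangling: webalizer's upstream 2.21-02 becomes 2.21.02.
print(apply_sed_rule("s/-/./", "2.21-02"))

# Download URL mangling for berlios.
print(apply_sed_rule(
    "s/prdownload/download/",
    "http://prdownload.berlios.de/mirageiv/mirage-0.9.tar.gz",
))

# Stripping the trailing letters from 1.8.B59BrZ.
print(apply_sed_rule(r"s/(\d+)((\.\d+)*).*/$1$2/", "1.8.B59BrZ"))
```

Named rules such as "cpan" would need a separate lookup table, since they
are not regexps.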

To summarize, we need:
- a base URL and a file pattern to search for new upstream versions when
SRC_URI is not suitable
- some options for mangling the data retrieved from the upstream scan,
whether it uses the base URL and pattern or the remote-id information

So our problem is: how can we store this data in a very flexible and
efficient way?
Proposed solutions:

1) Add an euscan tag with a custom namespace
Example:
<euscan xmlns="http://euscan.iksaif.net">
 <transformation>
   <regexp><from>a</from><to>b</to></regexp>
   <cpan-mangle/>
   <gentoo-mangle/>
 </transformation>
</euscan>
Which means: apply regex s/a/b/ then apply cpan mangling rules and then
gentoo mangling rules.
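
A minimal sketch of how a tool could read that snippet and apply the
steps in order. The cpan-mangle and gentoo-mangle rules are placeholders
that return the version unchanged, because their real behaviour is not
defined here; load_steps and apply_steps are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://euscan.iksaif.net}"

SNIPPET = """
<euscan xmlns="http://euscan.iksaif.net">
 <transformation>
   <regexp><from>a</from><to>b</to></regexp>
   <cpan-mangle/>
   <gentoo-mangle/>
 </transformation>
</euscan>
"""

# Placeholder named rules: identity functions, for illustration only.
RULES = {
    "cpan-mangle": lambda version: version,
    "gentoo-mangle": lambda version: version,
}


def load_steps(xml_text):
    """Return the transformation steps in document order."""
    root = ET.fromstring(xml_text)
    steps = []
    for node in root.find(NS + "transformation"):
        name = node.tag[len(NS):]  # strip the namespace prefix
        if name == "regexp":
            steps.append(("regexp", node.findtext(NS + "from"),
                          node.findtext(NS + "to")))
        else:
            steps.append((name,))
    return steps


def apply_steps(steps, version, named_rules):
    """Apply each step in turn: regex substitutions first match only."""
    for step in steps:
        if step[0] == "regexp":
            version = re.sub(step[1], step[2], version, count=1)
        else:
            version = named_rules[step[0]](version)
    return version


steps = load_steps(SNIPPET)
print(apply_steps(steps, "1.0a", RULES))
```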

2) Change the remote-id tag quite heavily:
   -  adding versionmangling and downloadmangling options that contain
regexes
   -  adding a new remote-id type called, for example, url; that tag will
contain the base URL and the pattern

3) Add a watch tag to <upstream> with versionmangling and
downloadmangling options. This tag can have a type (and in that case the
data from remote-id is used) or can contain the base url and the file
pattern. (this is what is currently implemented for our tests).


So before going further, we would like some feedback from you on these
approaches.
What do you think about them? Which do you prefer? Do you think there's
a better approach, or that some steps could be done more efficiently?



Other examples:

dev-perl/XML-Tidy: # We have to strip trailing letters in version and
then apply cpan mangling rules
<upstream>
  <remote-id type="cpan">XML-Tidy</remote-id>
  <remote-id type="cpan-module">XML::Tidy</remote-id>
  <watch type="cpan" versionmangle="s/(\d+)((\.\d+)*).*/$1$2/;cpan">
  </watch>
</upstream>

sys-fs/dfc:  # Download hosting sucks and has a download id in the URL
<upstream>
  <watch version="3">
    http://projects.gw-computing.net/projects/dfc/files
    /attachments/download/[0-9]+/dfc-(.*)\.tar\.gz
  </watch>
</upstream>

sys-dev/gcc:  # Tons of files in SRC_URI, let’s be more efficient

media-plugins/vdr-cpumon  # 0.0.6a == 0.0.6_p1, so it needs version
mangling
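
A minimal sketch of that kind of mangling. Only the 0.0.6a == 0.0.6_p1
case comes from this mail; mapping b to _p2, c to _p3 and so on is an
assumption, and the function name is hypothetical.

```python
import re


def letter_suffix_to_patchlevel(version):
    """Turn a trailing lowercase letter into a Gentoo-style _pN suffix."""
    match = re.fullmatch(r"([\d.]+)([a-z])", version)
    if not match:
        return version  # nothing to mangle
    number, letter = match.groups()
    # Assumption: a -> _p1, b -> _p2, and so on.
    return "%s_p%d" % (number, ord(letter) - ord("a") + 1)


print(letter_suffix_to_patchlevel("0.0.6a"))
```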

app-admin/webalizer:
<upstream>
  <watch version="3" versionmangle="s/-/./">
    http://www.mrunix.net/webalizer/download.html
    webalizer-(.*)-src\.tgz
  </watch>
</upstream>
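
A minimal sketch of reading such a <watch> element, assuming the body is
either "base URL, whitespace, file pattern" or a single URL whose last
path component is the pattern. parse_watch is a hypothetical helper and
the file name is only used to exercise the pattern.

```python
import re
import xml.etree.ElementTree as ET

WEBALIZER = r"""
<upstream>
  <watch version="3" versionmangle="s/-/./">
    http://www.mrunix.net/webalizer/download.html
    webalizer-(.*)-src\.tgz
  </watch>
</upstream>
"""


def parse_watch(upstream_xml):
    """Return one dict per <watch> element found under <upstream>."""
    entries = []
    for watch in ET.fromstring(upstream_xml).iter("watch"):
        tokens = (watch.text or "").split()
        if len(tokens) == 2:
            base, pattern = tokens  # base URL, then file pattern
        elif len(tokens) == 1:
            # Single token: treat the last path component as the pattern.
            base, _, pattern = tokens[0].rpartition("/")
        else:
            base = pattern = None  # e.g. type="cpan" relies on remote-id
        entries.append({
            "base": base,
            "pattern": pattern,
            "type": watch.get("type"),
            "versionmangle": watch.get("versionmangle"),
        })
    return entries


entry = parse_watch(WEBALIZER)[0]
match = re.fullmatch(entry["pattern"], "webalizer-2.23-05-src.tgz")
upstream_version = match.group(1)
print(entry["base"], upstream_version, entry["versionmangle"])
```

Applying the versionmangle rule to the captured 2.23-05 then gives
2.23.05.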

kde-base/okular:
<upstream>
 <watch version="3">ftp://ftp.kde.org/pub/kde/stable/([\d\.]*)/src/okular-([\d\.]*).tar.xz</watch>
 <watch version="3">ftp://ftp.kde.org/pub/kde/stable/((?:\d\.)+\d)/src/okular-((?:\d\.)+\d).tar.xz</watch>
</upstream>

sci-geosciences/grass:
<upstream>
 <watch version="3">http://grass.osgeo.org/grass64/source/grass-([\d\.]*(?:RC\d){0,1}).tar.gz</watch>
</upstream>
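
A quick way to check what the okular and grass patterns above accept.
This sketch tests only the file-name part of each pattern, and the file
names are made up for illustration.

```python
import re

LOOSE = r"okular-([\d\.]*).tar.xz"
STRICT = r"okular-((?:\d\.)+\d).tar.xz"
GRASS = r"grass-([\d\.]*(?:RC\d){0,1}).tar.gz"


def extract(pattern, filename):
    """Return the captured version, or None if the name does not match."""
    match = re.fullmatch(pattern, filename)
    return match.group(1) if match else None


# File names below are made up for illustration.
for name in ("okular-4.9.0.tar.xz", "okular-4.10.0.tar.xz"):
    print(name, extract(LOOSE, name), extract(STRICT, name))
for name in ("grass-6.4.2.tar.gz", "grass-6.4.3RC1.tar.gz"):
    print(name, extract(GRASS, name))
```

Note that the second okular pattern only accepts single-digit version
components, so a name like okular-4.10.0.tar.xz is matched by the first
pattern only.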

-- 
f.

  "Always code as if the guy who ends up maintaining your code will be a
   violent psychopath who knows where you live."
  (Martin Golding)





* Re: [gentoo-dev] [RFC] euscan: Need to add more upstream info in metadata.xml
  2012-08-10 11:11 [gentoo-dev] [RFC] euscan: Need to add more upstream info in metadata.xml Federico "fox" Scrinzi
@ 2012-08-10 12:03 ` Gilles Dartiguelongue
  2012-08-10 14:21   ` [gentoo-dev] SRC_URI " Jeroen Roovers
  2012-08-10 20:04   ` [gentoo-dev] [RFC] euscan: Need to add more upstream info " Corentin Chary
  0 siblings, 2 replies; 13+ messages in thread
From: Gilles Dartiguelongue @ 2012-08-10 12:03 UTC (permalink / raw
  To: gentoo-dev

Having done some Debian packaging for work, I find watch files from
Debian really helpful. Changing the format to an XML-compatible one does
not seem like hard work, so I'll probably leave that up for others to
discuss.

Since you are proposing this, a side question is:
Why should we write SRC_URI in ebuilds if that info is now available in
metadata.xml ? (granted that we might still want to keep over-riding
this information in ebuilds)

-- 
Gilles Dartiguelongue <eva@gentoo.org>
Gentoo




* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-10 12:03 ` Gilles Dartiguelongue
@ 2012-08-10 14:21   ` Jeroen Roovers
  2012-08-10 14:24     ` Diego Elio Pettenò
                       ` (2 more replies)
  2012-08-10 20:04   ` [gentoo-dev] [RFC] euscan: Need to add more upstream info " Corentin Chary
  1 sibling, 3 replies; 13+ messages in thread
From: Jeroen Roovers @ 2012-08-10 14:21 UTC (permalink / raw
  To: gentoo-dev

On Fri, 10 Aug 2012 14:03:23 +0200
Gilles Dartiguelongue <eva@gentoo.org> wrote:

> Since you are proposing this, a side question is:
> Why should we write SRC_URI in ebuilds if that info is now available
> in metadata.xml ? (granted that we might still want to keep
> over-riding this information in ebuilds)

1) The information in metadata.xml is inaccurate, it's a hint. When it
   fails, nothing of value is lost since the ebuild (supposedly) has
   what you want.
2) SRC_URI is precise.
3) SRC_URI can change over time, and across versions (even with all the
   variables in place).
4) Backward compatibility.
5) The inversion of your question: Why should we start handling SRC_URI
   outside ebuilds and eclasses? Or, how would that be practical,
   advantageous, an improvement on the current situation.


     jer



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-10 14:21   ` [gentoo-dev] SRC_URI " Jeroen Roovers
@ 2012-08-10 14:24     ` Diego Elio Pettenò
  2012-08-10 20:05     ` Corentin Chary
  2012-08-13 10:51     ` Gilles Dartiguelongue
  2 siblings, 0 replies; 13+ messages in thread
From: Diego Elio Pettenò @ 2012-08-10 14:24 UTC (permalink / raw
  To: gentoo-dev

On 10/08/2012 07:21, Jeroen Roovers wrote:
> 3) SRC_URI can change over time, and across versions (even with all the
>    variables in place).

I agree with Jeroen here — in particular see things that come from
alioth such as sys-apps/pcsc-lite and app-crypt/ccid: the SRC_URI
actually has to change for each ebuild because there is one extra number
that is used...

-- 
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/



* Re: [gentoo-dev] [RFC] euscan: Need to add more upstream info in metadata.xml
  2012-08-10 12:03 ` Gilles Dartiguelongue
  2012-08-10 14:21   ` [gentoo-dev] SRC_URI " Jeroen Roovers
@ 2012-08-10 20:04   ` Corentin Chary
  1 sibling, 0 replies; 13+ messages in thread
From: Corentin Chary @ 2012-08-10 20:04 UTC (permalink / raw
  To: gentoo-dev

On Fri, Aug 10, 2012 at 2:03 PM, Gilles Dartiguelongue <eva@gentoo.org> wrote:
> Having done some debian packaging for work, I find watch files from
> debian really helpful. Changing the format to a XML compatible one does
> not seem like a hard work so I'll probably leave that up for others to
> discuss.
>
> Since you are proposing this, a side question is:
> Why should we write SRC_URI in ebuilds if that info is now available in
> metadata.xml ? (granted that we might still want to keep over-riding
> this information in ebuilds)

It's not (only) SRC_URI. Sometimes it's completely different; sometimes
<watch> would contain only versionmangle, since SRC_URI contains
enough information for euscan... SRC_URI serves a totally different
purpose :).

-- 
Corentin Chary
http://xf.iksaif.net



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-10 14:21   ` [gentoo-dev] SRC_URI " Jeroen Roovers
  2012-08-10 14:24     ` Diego Elio Pettenò
@ 2012-08-10 20:05     ` Corentin Chary
  2012-08-10 20:12       ` Diego Elio Pettenò
  2012-08-13 10:51     ` Gilles Dartiguelongue
  2 siblings, 1 reply; 13+ messages in thread
From: Corentin Chary @ 2012-08-10 20:05 UTC (permalink / raw
  To: gentoo-dev

On Fri, Aug 10, 2012 at 4:21 PM, Jeroen Roovers <jer@gentoo.org> wrote:
> On Fri, 10 Aug 2012 14:03:23 +0200
> Gilles Dartiguelongue <eva@gentoo.org> wrote:
>
>> Since you are proposing this, a side question is:
>> Why should we write SRC_URI in ebuilds if that info is now available
>> in metadata.xml ? (granted that we might still want to keep
>> over-riding this information in ebuilds)
>
> 1) The information in metadata.xml is inaccurate, it's a hint. When it
>    fails, nothing of value is lost since the ebuild (supposedly) has
>    what you want.
> 2) SRC_URI is precise.
> 3) SRC_URI can change over time, and across versions (even with all the
>    variables in place).
> 4) Backward compatibility.
> 5) The inversion of your question: Why should we start handling SRC_URI
>    outside ebuilds and eclasses? Or, how would that be practical,
>    advantageous, an improvement on the current situation.

Right, our proposal is not here to replace SRC_URI, it's here to fix
the cases where SRC_URI can't be sanely used to guess new upstream
versions (strange mangling rules, unbrowsable directories, etc...).

-- 
Corentin Chary
http://xf.iksaif.net



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-10 20:05     ` Corentin Chary
@ 2012-08-10 20:12       ` Diego Elio Pettenò
  2012-08-11 11:55         ` Corentin Chary
  0 siblings, 1 reply; 13+ messages in thread
From: Diego Elio Pettenò @ 2012-08-10 20:12 UTC (permalink / raw
  To: gentoo-dev

On 10/08/2012 13:05, Corentin Chary wrote:
> Right, our proposal is not here to replace SRC_URI, it's here to fix
> the cases where SRC_URI can't be sanely used to guess new upstream
> versions (strange mangling rules, unbrowsable directories, etc...).

Yes I guess Jeroen was just saying why we shouldn't abandon it as Gilles
proposed.

FWIW for the rest it feels right to me. Although this starts to add up
to the reasons why at least metadata.xml should be validated by schema,
and not DTD.

-- 
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-10 20:12       ` Diego Elio Pettenò
@ 2012-08-11 11:55         ` Corentin Chary
  2012-08-11 22:43           ` Alec Warner
  0 siblings, 1 reply; 13+ messages in thread
From: Corentin Chary @ 2012-08-11 11:55 UTC (permalink / raw
  To: gentoo-dev

On Fri, Aug 10, 2012 at 10:12 PM, Diego Elio Pettenò
<flameeyes@flameeyes.eu> wrote:
> On 10/08/2012 13:05, Corentin Chary wrote:
>> Right, our proposal is not here to replace SRC_URI, it's here to fix
>> the cases where SRC_URI can't be sanely used to guess new upstream
>> versions (strange mangling rules, unbrowsable directories, etc...).
>
> Yes I guess Jeroen was just saying why we shouldn't abandon it as Gilles
> proposed.
>
> FWIW for the rest it feels right to me. Although this starts to add up
> to the reasons why at least metadata.xml should be validated by schema,
> and not DTD.

Maybe... We plan to use <watch xmlns="http://euscan.iksaif.net"> to
avoid editing metadata.dtd (for now).
What do you think about the format proposals? The current format looks
like what was given in the examples, but mgorny feels that something
more XML-ish would be better.

-- 
Corentin Chary
http://xf.iksaif.net



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-11 11:55         ` Corentin Chary
@ 2012-08-11 22:43           ` Alec Warner
  2012-08-11 23:53             ` Diego Elio Pettenò
  0 siblings, 1 reply; 13+ messages in thread
From: Alec Warner @ 2012-08-11 22:43 UTC (permalink / raw
  To: gentoo-dev

On Sat, Aug 11, 2012 at 1:55 PM, Corentin Chary <iksaif@gentoo.org> wrote:
> On Fri, Aug 10, 2012 at 10:12 PM, Diego Elio Pettenò
> <flameeyes@flameeyes.eu> wrote:
>> On 10/08/2012 13:05, Corentin Chary wrote:
>>> Right, our proposal is not here to replace SRC_URI, it's here to fix
>>> the cases where SRC_URI can't be sanely used to guess new upstream
>>> versions (strange mangling rules, unbrowsable directories, etc...).
>>
>> Yes I guess Jeroen was just saying why we shouldn't abandon it as Gilles
>> proposed.
>>
>> FWIW for the rest it feels right to me. Although this starts to add up
>> to the reasons why at least metadata.xml should be validated by schema,
>> and not DTD.
>
> Maybe .. We plan to use <watch xmlns="http://euscan.iksaif.net"> to
> avoid editing metadata.dtd (for now).
> What do you think about format propositions ? Current format looks
> like what was given in the examples, but mgorny feels that something
> more xmlish would be better.

If you want metadata.dtd patched; please file a bug against www@ and
someone will look at it (you may have to poke us a few times... ;))

>
> --
> Corentin Chary
> http://xf.iksaif.net
>



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-11 22:43           ` Alec Warner
@ 2012-08-11 23:53             ` Diego Elio Pettenò
  2012-08-13 10:25               ` Dirkjan Ochtman
  0 siblings, 1 reply; 13+ messages in thread
From: Diego Elio Pettenò @ 2012-08-11 23:53 UTC (permalink / raw
  To: gentoo-dev

On 11/08/2012 15:43, Alec Warner wrote:
> If you want metadata.dtd patched; please file a bug against www@ and
> someone will look at it (you may have to poke us a few times... ;))

Can we have xmlschema instead? You know so that things like broken email
addresses in <maintainer> can be caught...

-- 
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-11 23:53             ` Diego Elio Pettenò
@ 2012-08-13 10:25               ` Dirkjan Ochtman
  2012-08-13 14:50                 ` Diego Elio Pettenò
  0 siblings, 1 reply; 13+ messages in thread
From: Dirkjan Ochtman @ 2012-08-13 10:25 UTC (permalink / raw
  To: gentoo-dev

On Sun, Aug 12, 2012 at 1:53 AM, Diego Elio Pettenò
<flameeyes@flameeyes.eu> wrote:
> Can we have xmlschema instead? You know so that things like broken email
> addresses in <maintainer> can be caught...

https://bugs.gentoo.org/show_bug.cgi?id=384457

IMO RELAX NG (or Schematron) would be much better than XML Schema.

Cheers,

Dirkjan



* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-10 14:21   ` [gentoo-dev] SRC_URI " Jeroen Roovers
  2012-08-10 14:24     ` Diego Elio Pettenò
  2012-08-10 20:05     ` Corentin Chary
@ 2012-08-13 10:51     ` Gilles Dartiguelongue
  2 siblings, 0 replies; 13+ messages in thread
From: Gilles Dartiguelongue @ 2012-08-13 10:51 UTC (permalink / raw
  To: gentoo-dev

Le vendredi 10 août 2012 à 16:21 +0200, Jeroen Roovers a écrit :
> On Fri, 10 Aug 2012 14:03:23 +0200
> Gilles Dartiguelongue <eva@gentoo.org> wrote:
> 
> > Since you are proposing this, a side question is:
> > Why should we write SRC_URI in ebuilds if that info is now available
> > in metadata.xml ? (granted that we might still want to keep
> > over-riding this information in ebuilds)

I was not suggesting erasing SRC_URI from all ebuilds, but for things
that have well-defined rules, like gnome packages, defining SRC_URI is
mostly a useless exercise.


> 5) The inversion of your question: Why should we start handling SRC_URI
>    outside ebuilds and eclasses? Or, how would that be practical,
>    advantageous, an improvement on the current situation.
> 

I will add, against my own question, that eclasses currently handle this
very well (see the gnome.org eclass). So maybe there is nothing to debate
here :)




* Re: [gentoo-dev] SRC_URI in metadata.xml
  2012-08-13 10:25               ` Dirkjan Ochtman
@ 2012-08-13 14:50                 ` Diego Elio Pettenò
  0 siblings, 0 replies; 13+ messages in thread
From: Diego Elio Pettenò @ 2012-08-13 14:50 UTC (permalink / raw
  To: gentoo-dev

On 13/08/2012 03:25, Dirkjan Ochtman wrote:
> IMO RELAX NG (or Schematron) would be much better than XML Schema.

They are generally equivalent enough that you don't have to worry about
which one you choose... Relax NG works for me; I would have to convert
them to rnc anyway to load into emacs's nxml.

-- 
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/


