Message-ID: <5024EC6C.3070300@anche.no>
Date: Fri, 10 Aug 2012 13:11:40 +0200
From: "Federico \"fox\" Scrinzi"
To: gentoo-dev@lists.gentoo.org
Reply-to: gentoo-dev@lists.gentoo.org
Subject: [gentoo-dev] [RFC] euscan: Need to add more upstream info in metadata.xml

Hi everybody!

euscan is available in Portage as a development package (app-portage/euscan-9999). This tool checks whether a given package/ebuild has new upstream versions. It uses different heuristics to scan upstream and collect new versions and their URLs.

euscan can either use custom "handlers" for well-known upstreams (github, pypi, cpan, sourceforge, google-code, etc.) or scan directories derived from SRC_URI. If the directory scan fails for some reason, euscan falls back to brute force (generating possible next version numbers and trying to fetch those packages).

The problem we are facing with euscan is that some upstreams use strange version numbers, or publish the list of available versions in a location that is totally different from SRC_URI.

Examples:
- MySQL: most MySQL mirrors are not browsable (euscan always falls back to brute force)
- webalizer uses strange version numbers upstream (ftp://ftp.mrunix.net/pub/webalizer/). In this case euscan should be aware that 2.21-02 is the upstream version number and scan the FTP directory for webalizer-(\d+).(\d+)-(\d+).tar.gz. The latest version of webalizer, 2.23.05, is not recognized by euscan and is not available in Gentoo.
- Authen-SASL-Cyrus uses "-server" in upstream version numbers: http://www.cpan.org/authors/id/P/PB/PBOETTCH/
- XML-Tidy uses strange letters in its version numbers

We thought about how to solve this issue and agreed that the best way to handle each specific case is to add some more information to metadata.xml.
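To make the webalizer case concrete, here is a minimal sketch of matching upstream file names against such a pattern and ordering the versions found. It is not euscan's actual code: the function name find_versions and the sample listing are made up for illustration, and the dots in the pattern are escaped on the assumption that literal dots were intended.

```python
import re

# Pattern adapted from the webalizer example; dots are escaped here so they
# only match literal dots (an assumption about the intended meaning).
WEBALIZER_RE = re.compile(r"webalizer-(\d+)\.(\d+)-(\d+)\.tar\.gz")


def find_versions(filenames):
    """Return upstream version strings such as '2.21-02', sorted numerically.

    Hypothetical helper for illustration only.
    """
    found = []
    for name in filenames:
        match = WEBALIZER_RE.fullmatch(name)
        if match:
            major, minor, patch = match.groups()
            # Sort on integers, but keep the original zero-padded spelling.
            key = (int(major), int(minor), int(patch))
            found.append((key, f"{major}.{minor}-{patch}"))
    return [version for _key, version in sorted(found)]


# Hypothetical directory listing, not fetched from the real FTP server.
listing = [
    "webalizer-2.23-05.tar.gz",
    "webalizer-2.21-02.tar.gz",
    "webalizer-2.23-05-src.tgz",
    "README",
]

versions = find_versions(listing)
print(versions)
```

The last element of the returned list is the newest upstream version, still in upstream's own spelling; turning it into a Gentoo-style version is a separate mangling step.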
In Debian, uscan uses information from debian/watch inside Debian packages. Since so much work has already been done there, we thought about taking this info from the watch files and saving it in metadata.xml so that euscan can use it.

I wrote a simple script that patches metadata.xml, adding an experimental tag with data from Debian packages:
https://github.com/volpino/euscan/blob/master/bin/euscan_patch_metadata

Basic watch data contains a base URL to scan and a pattern to search for in it.

Example:
base: http://icedtea.classpath.org/download/source/
pattern: icedtea-([\d\.]+).tar.gz

This means "open that URL and search for the links that match that pattern". It is useful, for example, when it is not possible to retrieve the base URL from SRC_URI (icedtea's SRC_URI is http://icedtea.classpath.org/hg/release/icedtea7-forest-2.2/hotspot/archive/889dffcf4a54.tar.gz).

Advanced usage with a directory pattern:

Example:
base: http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-([\d\.]+)
pattern: mysql-([\d\.]+).tar.gz

This scans all directories that match the base pattern, looking for links that match the file pattern.

We also need some options for mangling versions and download URLs. These options can contain regexps or names of mangling rules (e.g. "cpan" means "apply the mangling rules for CPAN versions").

Version mangling example:
As mentioned above, webalizer uses both dots and hyphens in version numbers, so an option like this is required:
versionmangle="s/-/./"

Download URL mangling example:
A page scan on berlios returns a URL like this:
http://prdownload.berlios.de/mirageiv/mirage-0.9.tar.gz
which should be mangled to get a working download URL, with an option like
downloadurlmangle="s/prdownload/download/"
(for more info see the uscan manpage)

Another example:
dev-perl/Math-BaseCnv and XML-Tidy use strange upstream version numbers like 1.8.B59BrZ, which should be mangled to 1.8.

Summarizing, we need:
- a base URL and a file pattern to search for new upstream versions when SRC_URI is not suitable
- some
options for mangling the data retrieved from the upstream scan, whether that scan uses the base URL and pattern or the remote-id information

So our problem is: how can we store this data in a flexible and efficient way?

Proposed solutions:

1) Add a euscan tag with a custom namespace
Example:
ab
This means: apply the regex s/a/b/, then apply the cpan mangling rules, and then the gentoo mangling rules.

2) Change the remote-id tag quite heavily:
- add versionmangling and downloadmangling options that contain regexes
- add a new remote-id type, called for example url, whose tag will contain the base URL and the pattern

3) Add a watch tag with versionmangling and downloadmangling options. This tag can either have a type (in which case the data from remote-id is used) or contain the base URL and the file pattern. (This is what is currently implemented for our tests.)

So before going further, we would like some feedback from you on these approaches. What do you think about them? Which do you prefer? Do you think there's a better approach, or that some steps could be done more efficiently?

Other examples:

dev-perl/XML-Tidy:
# We have to strip trailing letters in the version and then apply cpan mangling rules
XML-Tidy
XML::Tidy

sys-fs/dfc:
# Download hosting sux and has a download id in the URL
http://projects.gw-computing.net/projects/dfc/files
/attachments/download/[0-9]+/dfc-(.*)\.tar\.gz

sys-devel/gcc:
# Tons of files in SRC_URI, let's be more efficient

media-plugins/vdr-cpumon
# 0.0.6a == 0.0.6_p1, so this should need versionmangling

app-admin/webalizer:
http://www.mrunix.net/webalizer/download.html
webalizer-(.*)-src\.tgz

kde-base/okular:
ftp://ftp.kde.org/pub/kde/stable/([\d\.]*)/src/okular-([\d\.]*).tar.xz
ftp://ftp.kde.org/pub/kde/stable/((?:\d\.)+\d)/src/okular-((?:\d\.)+\d).tar.xz

sci-geosciences/grass:
http://grass.osgeo.org/grass64/source/grass-([\d\.]*(?:RC\d){0,1}).tar.gz

-- 
f.
"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." (Martin Golding) --------------enig813F8A40F0427B063EA06BED Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJQJOxwAAoJEA+rdfOgGL72iwAP/RqGdNUdqb309bQR5MtHBmZ3 fHb1UEKlIhqJ7YW4rL18xdQcPuOe/tFY7V5rGGimpgXgf5B3VfMumN1tlbc5iPyt +sfw+kyLOBWS4r2lO6GkgbTbiqjCwHHagMpmtyAh6LbT7k/gAB+bZpfLi5tFx2MI wb28cMJ89uDJ3nvgsy6vdkvxSD/2c/54WyVTHWY/G/IsQ4hS6G3bzF8t8FtNCU+u qkCOoq+09vHnQvBEACVI7wtqm2o63HROBdKjtryGm9BXVAQga+yl+i6J9/tcusx0 O3mAdZgFkHooBh5ZJdTHuQDT9jqk+DEgma4uKMhab4oaxaJpl/xWhUAAouOZdMHK c9FEjlhbC492FVQGsmtcWTQGxoI8ZT8a3ntQ9PKQk5HceHqJM6GPhcYHX/0r9ngB EKzsW844qWTL12nY2NiQsaCZRcZxpij9JUyU6+z6AYgJih6WSWYS9MdeyOvEaB3K lomW23XwZxmPv/phM9QxHDXygV6UxT629P0gGRt8nD3d+NVo0zeF/O4CiXXFGp8n 0t1NRGtGRCcw7ntNlawal2sQvzTl5+hH3C63E+Avv7z4rJ7o7Y0jGdT9H+OYceTc mC7Y1VN9iVQhHT227Hqb9FkGABRWIxGUgtOdIBy5gsVB7Cfd3jgmifaWVP7Jhe04 3KymVP7R5mBsDnH8MLFo =tTNV -----END PGP SIGNATURE----- --------------enig813F8A40F0427B063EA06BED--