* [gentoo-soc] Automatically generated overlay of R packages - final report
@ 2012-08-21 16:27 André Erdmann
From: André Erdmann @ 2012-08-21 16:27 UTC
To: gentoo-soc; +Cc: Denis Dupeyron
Hi everyone,
== Brief summary of this project ==
The aim of this project is to create scripts that automate the process
of overlay creation/maintenance for R packages from repositories such
as CRAN and Bioconductor.
Longer:
For the ebuild creation of a single package one needs to extract the
package, copy-paste data from its description file to the ebuild and
look up dependencies, which is time-consuming. Although feasible for a
small number of packages, this is practically impossible to do by hand
for repositories like CRAN (> 3500 packages), especially because it
also requires tracking changes (new / updated / removed packages). The
solution is to automate that process and this is what this project is
about.
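To illustrate the manual step being automated, here is a minimal sketch of
reading a package's DESCRIPTION file into a dictionary. It is not roverlay's
actual code: the function name parse_description and the sample package data
are made up for illustration, and it only assumes the usual "Field: value"
layout with indented continuation lines.

```python
def parse_description(text):
    """Parse 'Field: value' text into a dict, joining continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            # skip blank lines
            continue
        if line[0] in " \t" and current is not None:
            # an indented line continues the previous field's value
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Hypothetical DESCRIPTION content, for illustration only.
SAMPLE = """Package: examplepkg
Version: 1.2-3
Title: An Example
 Package With a Wrapped Title
Depends: R (>= 2.10), methods
License: GPL-2
"""

info = parse_description(SAMPLE)
print(info["Package"], info["Version"], info["Depends"])
```

The resulting dictionary holds the values one would otherwise copy by hand
into the ebuild (description, license, dependencies).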
The project's git repository is located at
http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git
== Current state of this project and future directions ==
The "Automatically generated overlay of R packages" project has now
reached the end of GSoC 2012's coding period. The result, briefly
described below, is roverlay, a script and a set of modules written in
Python. I tried to keep the code as extensible as possible, so that
future additions such as other ways to get R packages (git, svn, ...)
are easy.
It has two user-accessible main parts:
* overlay creation, which accepts R packages as input and creates a
portage overlay for them
* repository management, which downloads R packages from remotes and
uses them as input for overlay creation
The minimal requirement for downloading packages is that a remote
offers HTTP access to its packages. The preferred way is rsync, which
is used for CRAN and BIOC. HTTP support was added later to include
repos like R-Forge and Omegahat.
Overlay creation is able to work incrementally so that existing
ebuilds don't have to be recreated. It involves several tasks:
* reading R package metadata and fixing errors like misspelled data
fields along the way, e.g. 'Depents' is read as 'Depends'. Package
reading is configurable.
* ebuild creation, which tries to create an ebuild for an R package
using its metadata
-> dependency resolution that creates correct DEPEND/RDEPEND ebuild
variables. It is implemented as a dictionary lookup extended by
version-relative lookups.
* overlay writing
-> per-package metadata.xml/Manifest creation
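Two of the tasks above, correcting misspelled field names and the
dictionary-based dependency resolution, can be sketched roughly as follows.
This is only an illustration under stated assumptions and not how roverlay
implements them: FIELD_ALIASES, DEP_MAP, canonical_field and resolve are
hypothetical names, and the two dictionary entries are sample data. The
version-relative part is reduced to carrying the operator and version of an R
dependency such as "R (>= 2.10)" over to the Gentoo atom.

```python
import re

# Hypothetical alias table: lowercased (possibly misspelled) name -> field.
FIELD_ALIASES = {
    "depends": "Depends",
    "depents": "Depends",
    "suggest": "Suggests",
    "suggests": "Suggests",
}

# Hypothetical dependency dictionary: R-side name -> Gentoo package.
DEP_MAP = {
    "r": "dev-lang/R",
    "zlib": "sys-libs/zlib",
}

# Matches "name" or "name (op version)", e.g. "R (>= 2.10)".
DEP_RE = re.compile(
    r"^\s*(?P<name>[^\s(]+)\s*"
    r"(?:\(\s*(?P<op>>=|<=|==|>|<)\s*(?P<ver>[^\s)]+)\s*\))?\s*$"
)


def canonical_field(name):
    """Map a field name to its canonical spelling if an alias is known."""
    stripped = name.strip()
    return FIELD_ALIASES.get(stripped.lower(), stripped)


def resolve(dep_str):
    """Return a Gentoo dependency atom for dep_str, or None if unresolvable."""
    match = DEP_RE.match(dep_str)
    if match is None:
        return None
    package = DEP_MAP.get(match.group("name").lower())
    if package is None:
        return None
    op, version = match.group("op"), match.group("ver")
    if op is None:
        # no version constraint: plain package name
        return package
    if op == "==":
        # exact-version atoms use a single '='
        op = "="
    # Simplification: the version string is passed through unchanged.
    return "{}{}-{}".format(op, package, version)


print(canonical_field("Depents"))
print(resolve("R (>= 2.10)"))
print(resolve("nosuchlib"))
```

A dependency that is missing from the dictionary yields None, which
corresponds to the "dependency unresolvable" failure case.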
Currently, the ebuild creation success rate is slightly higher than
95%. About 900 out of 32000 creations fail for various reasons: OS
type not supported, unresolvable dependencies, unsupported R package
format (.Z-compressed tarballs, ...).
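As a quick sanity check of these figures, taking the approximate counts above
at face value (the variable names are just for illustration):

```python
failed = 900      # approximate number of failed ebuild creations
total = 32000     # approximate number of attempted ebuild creations

failure_rate = failed / total
success_rate = 1 - failure_rate

summary = "{:.1%} failed, {:.1%} succeeded".format(failure_rate, success_rate)
print(summary)  # 2.8% failed, 97.2% succeeded
```

That is roughly 97%, consistent with a success rate above 95%.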
Extensive documentation is available at [0] and covers usage,
configuration, installation, what to expect and how roverlay works.
All in all, I accomplished most objectives of my proposal. Some were
dropped, such as getting packages via svn, and some were added, such
as getting packages via HTTP and the version-relative dependency
resolution. What's really missing is the integration into Gentoo's
infrastructure so that end users can add the resulting overlay using
Layman. This will hopefully happen in the near future. From here on,
I'll focus on adding features based on real-world/production usage
needs.
Finally, I'd like to thank Denis (Calchan), my mentor, for his guidance
over the last few months. I don't tend to ask many questions, but
whenever I had one, he was able to answer it ;) Overall, taking part
in GSoC for Gentoo has been a good experience.
[0] http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git;a=blob_plain;f=doc/html/usage.html;hb=HEAD
--
Regards,
André E.