From: Auke Booij
To: gentoo-soc@lists.gentoo.org, gentoo-science@lists.gentoo.org
Date: Mon, 12 Jul 2010 22:38:53 +0200
Subject: [gentoo-science] G-CRAN weekly report #7 (warning: big read)

As the subject says, this report is pretty long. It's intended for those who haven't closely followed my work up until now and would like to catch up, so go grab a cup of coffee if you really want to read this to the end.

Subjects in this report (in order):
- intro of the project
- what I have been up to last week
- instructions on installing packages from Bioconductor and CRAN
- g-common, the interface (or actually lack of interface) this project will have
- plans for the coming week and next week

Perhaps an introduction to the circumstances is in order. R is a language for statisticians. With statistics being such a wide topic, there are thousands of additional packages you can install to further analyze data, and the Bioconductor project adds another field to R by introducing genomics. My job is to cleanly enable Gentoo users to install the latest versions of these packages systemwide, as opposed to directly calling R's package installers and ending up with dangling files.

A week ago, I was at the point where some packages installed correctly, but there were some rough edges too. For packages not relying on external (non-R) libraries, this should all be smoothed out now.

I spent a lot of time this past week communicating with several parties. There was a minor issue with the Bioconductor repositories, I've spoken to some people about g-common, talked a bit with the CRAN maintainers and had some technical discussions with rafaelmartins, who's a gsoc student working on g-octave, as you may know.

Then there are some helpful dependency resolution changes. Dependencies on R packages now work perfectly fine, and external dependencies are going to be tackled soon (but it won't be pretty). So why is this helpful? It means you can install most Bioconductor packages flawlessly.

As promised in an earlier email to the gentoo-science ML, some instructions. Please note that this will of course not be the way you'll eventually use g-cran, but I'm still working on the interface (more on that later).

First, create two overlays. I'm simply calling them bioconductor_1 and bioconductor_2. One of them primarily contains code, the other consists primarily of gene databases.

# mkdir -p /usr/local/portage/bioconductor_1/profiles
# mkdir -p /usr/local/portage/bioconductor_2/profiles

Now we need to set the repo_name and categories of these overlays, too.

# echo "bioconductor_1" >> /usr/local/portage/bioconductor_1/profiles/repo_name
# echo "bioconductor_2" >> /usr/local/portage/bioconductor_2/profiles/repo_name
# echo "dev-R" >> /usr/local/portage/bioconductor_1/profiles/categories
# echo "dev-R" >> /usr/local/portage/bioconductor_2/profiles/categories

It's time to actually get the tree. Make sure you've installed g-cran (it's in the science overlay), sync the repositories and then generate the tree:

# g-cran /usr/local/portage/bioconductor_1 sync http://www.bioconductor.org/packages/devel/bioc
# g-cran /usr/local/portage/bioconductor_2 sync http://www.bioconductor.org/packages/devel/data/annotation
# g-cran /usr/local/portage/bioconductor_1 generate-tree
# g-cran /usr/local/portage/bioconductor_2 generate-tree

You can now add the overlays to your favorite package manager and start emerging (*ahem* - installing) packages. If all is well, you should be able to install, for example, dev-R/zebrafishdb (this is a bioconductor_2 database package that pulls in several bioconductor_1 packages). I have absolutely no clue as to what you can do with these packages, but I suppose some biology fans out there can clarify that.

Now, it may be that portage complains about missing Manifest files. If that's the case, then also run:

# for x in /usr/local/portage/bioconductor_{1,2}/dev-R/*; do touch "${x}/Manifest"; done

I hope that does the trick. Please tell me if it does, and whether it's needed at all. Once you've done this and the trick actually works, you should be able to install dev-R/zebrafishdb.
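
The overlay skeleton steps above are the same for every repository, so they can be wrapped in a small shell function. This is only a sketch: make_overlay_skeleton is a hypothetical helper, not part of g-cran, and unlike the commands above it overwrites repo_name and checks categories before appending, so that running it twice leaves the files unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with g-cran): create an overlay skeleton.
#   $1 = overlay directory, $2 = repository name
make_overlay_skeleton() {
    overlay_dir=$1
    repo_name=$2

    mkdir -p "$overlay_dir/profiles" || return 1

    # repo_name should hold exactly one line, so overwrite instead of appending
    printf '%s\n' "$repo_name" > "$overlay_dir/profiles/repo_name"

    # add the dev-R category only if it is not listed yet
    if ! grep -qx 'dev-R' "$overlay_dir/profiles/categories" 2>/dev/null; then
        printf '%s\n' 'dev-R' >> "$overlay_dir/profiles/categories"
    fi
}

# Example (as root), mirroring the commands above:
#   make_overlay_skeleton /usr/local/portage/bioconductor_1 bioconductor_1
#   make_overlay_skeleton /usr/local/portage/bioconductor_2 bioconductor_2
```

After creating the skeleton, the g-cran sync and generate-tree calls are run exactly as shown above.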

If you don't need no stinkin' databases of deoxyribonucleic acid, but are interested in CRAN, just create a cran overlay as we did for bioconductor_1 and bioconductor_2, but use http://cran.r-project.org as the source repository, and 'cran' for the overlay name. Better yet, find a mirror close to you at http://cran.r-project.org/mirrors.html

Okay, so that was quite a journey to get a simple sqlite database of gene data. g-common is what will be making all this easier. Unfortunately I haven't heard much lately from the other two students I was cooperating with before, so I'm going to invent something of my own. The plan has remained roughly the same, but time after time I'm struggling to explain it, so please bear with me as you read this.

[start explanation of g-common]

Current projects to install non-ebuild packages generate ebuild files on request, put them in an overlay and tell portage to install them. The problem with this approach is that the ebuilds are only generated when you know what you want to install, i.e. the overlay doesn't get fully populated upfront. This approach implies you cannot search for packages in such repositories, you cannot depend on packages in such repositories, and you can't trivially update packages in such repositories.

I'd like to generate a full package tree at sync time, no matter if you want to use it or not. Further, this syncing should work like any other overlay: ideally, support for non-ebuild repositories is transparent to the users.

I'm going to do this via an abstraction layer called g-common, for which support needs to be written for all package managers. But once that support is written, and the non-ebuild repository reading code is adjusted to work with g-common, there is nothing stopping you from using a non-ebuild repository like a regular ebuild overlay.
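
To illustrate why a fully populated tree matters: once every package has a directory in the overlay, searching becomes a plain walk over the category/package directories. The following is a minimal sketch, assuming the usual <overlay>/<category>/<package> layout. search_overlay is a hypothetical helper, not part of g-cran or g-common, and it does a simple case-sensitive substring match on the package name.

```shell
#!/bin/sh
# Hypothetical helper: print category/package for every package directory in
# an overlay whose name contains the given substring (case-sensitive).
#   $1 = overlay directory, $2 = search term
search_overlay() {
    overlay_dir=$1
    term=$2

    for pkg_dir in "$overlay_dir"/*/*; do
        # skip plain files and the unexpanded pattern of an empty overlay
        [ -d "$pkg_dir" ] || continue

        pkg=$(basename "$pkg_dir")
        category=$(basename "$(dirname "$pkg_dir")")

        # profiles/ holds repository metadata, not packages
        [ "$category" = "profiles" ] && continue

        case $pkg in
            *"$term"*) printf '%s/%s\n' "$category" "$pkg" ;;
        esac
    done
}

# Example:
#   search_overlay /usr/local/portage/bioconductor_2 zebrafish
```

With ebuilds generated only on request, the same walk would find nothing until after the package had already been asked for by name.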

How this works is not exactly trivial to explain, but the important part is that even though tools like g-cran are doing the actual work, the package manager thinks it's dealing with a regular PMS-worthy tree. At sync time, the package manager simply calls the g-common method for syncing a tree, which in turn calls the appropriate repository driver to fetch the new package listing from the true remote repository. To integrate this well, some patching is needed. At install time, all the various src_unpack, src_install, etc. phases result in calls to g-common, and again those result in calls to the appropriate repository driver, which then executes the phase, all in a way that is more or less PMS-compliant. Call it over-engineering, but it'll feel like magic and I'm going to prove it.

[end explanation of g-common]

The plan for this week is to /finally/ get some work done on g-common and perhaps prepare the code for external dependency resolution. On Saturday, I'm unfortunately leaving for vacation, so you won't see me doing much. After that vacation, first of all there's GUADEC 2010, which I'm going to attend, but of course I'm also going to continue developing g-common and finish external dependency resolution.

Now, if you've come to this point in my email, I'd really like to thank you, because I know how easy it is to simply mark an email as read and move on. You are why I'm developing this, thanks a lot!

The next weekly report will be in two weeks.

Auke Booij / tulcod