public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] Proposal: R_Overlay: Automated overlay maintenance
@ 2013-05-02 22:17 André Erdmann
From: André Erdmann @ 2013-05-02 22:17 UTC
  To: gentoo-soc; +Cc: Denis Dupeyron

Hello,

I've submitted a proposal today about extending last year's GSoC
project "Automatically generated overlay of R packages" [0] with focus
on automated overlay maintenance. It's included in this mail for
public review. Feel free to comment ;)

Kind Regards,
André Erdmann

[0] http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git;a=summary

=== proposal starts here ===

--- Table of contents ---

1 Abstract
2 Objective
3 Implementation Ideas
4 Deliverables
5 Timeline
6 Biography / About me
7 Extra information

--- Table of contents ---

1 Abstract
---

roverlay is a program that creates ebuilds for R packages and makes
them available as an overlay. It's the result of last year's GSoC project
"Automatically generated overlay of R packages".

This project will extend overlay creation and add automated overlay maintenance.


2 Objective
---

To give a short review of what has been done since the end of last
year's GSoC, these features have been added:

* faster Manifest file creation using the portage libs directly (still
experimental)
* creation of a package mirror directory ("overlay DISTDIR")
* package rules that allow controlling various aspects of package
processing (i.e. ebuild creation)

Overall, roverlay's code size increased by about 30% (+3800 lines),
whereas its user-targeted documentation increased by 22% (+500 lines).

There's still work to do in order to get a fully automated overlay. As
mentioned before, this project/proposal focuses on two major areas,
(a) extending ebuild and overlay creation and (b) adding automated
overlay maintenance, with the latter one being more important.

The aim of automated overlay maintenance is to provide an overlay that
can be deployed to end-users without requiring interaction by the
overlay maintainer (ideally). This includes verification of ebuilds
and the entire overlay (typically after running roverlay) as well as a
simple status web page. A tinderbox approach may also be implemented
(in addition to structural testing).

As usual, proper documentation is considered essential.


3 Implementation Ideas
---

This section lists a few features/enhancements that I plan to
implement. It's not a definitive list of things that will be done, but
rather gives you an idea of what the result will look like.


3.1 Extending Ebuild/Overlay Creation


3.1.1 Control Flow

Currently, the package rules system is able to ignore an R package
entirely and to set an ebuild's KEYWORDS variable. This project will
extend it with the ability to:

* "Relocate" packages

-> change the category and/or name of an ebuild
-> rename the (local) src files using arrows in SRC_URI
    This is necessary because R package names are sometimes too generic.

* Modify ebuild variables at ebuild creation time via package rules
* Patch ebuilds after creating the overlay (but before writing
Manifest files and testing it)
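
As a rough illustration of the "relocate" idea, the following sketch
builds a SRC_URI entry that renames the fetched file with the arrow
syntax. The function name, the example URL and the "R_" prefix are
assumptions made for this example only; they are not taken from
roverlay.

```python
import posixpath
from urllib.parse import urlparse


def src_uri_entry(uri, local_name=None):
    """Return a SRC_URI entry, renaming the distfile if requested.

    Hypothetical helper for illustration, not part of roverlay.
    """
    upstream_name = posixpath.basename(urlparse(uri).path)
    if not local_name or local_name == upstream_name:
        # No rename requested (or nothing to rename): the plain URI suffices.
        return uri
    # "URI -> name" asks the package manager to store the file as <name>.
    return "{} -> {}".format(uri, local_name)


if __name__ == "__main__":
    print(src_uri_entry(
        "http://example.org/src/contrib/base_1.0.tar.gz",
        "R_base_1.0.tar.gz",
    ))
```

A package rule could then supply the local name, so that a generic
upstream file name does not collide with other files in DISTDIR.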


3.1.2 Misc Features

* Add SLOT handling to dependency resolution

* Replace the r_suggests USE flag with a USE_EXPAND variable so that
users can select optional dependencies on a per-package basis

* Bypass ebuild generation: insert hand-written ebuilds into the overlay

Doing that at overlay creation time has two advantages: the inserted
ebuild safely replaces any generated one and it won't be excluded from
overlay verification.

* Run incremental overlay creation for a given set of packages

More importantly, regenerate ebuilds on tarball (checksum) change,
which happens when upstream (CRAN etc.) changes a package's content
without renaming it. Regenerated ebuilds could also be revbumped,
which would allow easy end-user upgrades.

* Generate meaningful statistics for the web page / QA tools
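
The checksum-triggered revbump mentioned above could look roughly like
the sketch below. It assumes that the checksum recorded at ebuild
creation time is available for comparison; the function names are made
up for this example and do not exist in roverlay.

```python
import re

# Splits "1.2.3-r4" into base "1.2.3" and revision "4"; the revision is optional.
_REV_RE = re.compile(r"^(?P<base>.+?)(?:-r(?P<rev>\d+))?$")


def revbump(version):
    """Return the version with its -rN suffix incremented by one."""
    match = _REV_RE.match(version)
    if match is None:
        raise ValueError("empty version string")
    rev = int(match.group("rev") or 0)
    return "{}-r{}".format(match.group("base"), rev + 1)


def next_version(version, old_checksum, new_checksum):
    """Revbump only if the tarball content changed (hypothetical policy)."""
    if old_checksum == new_checksum:
        return version
    return revbump(version)


if __name__ == "__main__":
    print(next_version("1.2.3", "abc", "def"))
```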


3.1.3 Console

roverlay already features the depres console, which can be used to
create and test dependency rules in a rather limited way.

A reimplementation of this console would allow controlling each step of
overlay creation interactively, for example:

* "forge" packages - create fake package information from user input
* create ebuilds for packages, add them to an overlay
* test subsystems like dependency resolution and package rules

The new console would also feature a more user-friendly interface,
e.g. readline support (-> tab completion).

This is a nice-to-have feature and may be dropped in favor of others.


3.2 Automated Overlay Maintenance

The main goal here is to require as little maintainer interaction as possible.


3.2.1 Overlay Verification

* Structural testing: verify the overlay and all of its ebuilds
(possibly using repoman.checks etc)

-> Ensure that all dependencies of an ebuild are actually satisfiable

* Black-box testing: tinderbox approach - (try to) build all packages (ebuilds)

An ebuild that doesn't pass structural testing will be removed from
the overlay. Test results will be logged and included in the status
web page.
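
To illustrate the dependency check, here is a minimal sketch that
removes packages with unsatisfiable dependencies. It assumes a heavily
simplified model in which dependencies are plain package names; real
dependency atoms (versions, slots, USE conditionals) are ignored. All
names in it are hypothetical.

```python
def prune_unsatisfiable(overlay, external):
    """Return (kept, removed) sets of package names.

    overlay:  dict mapping package name -> iterable of dependency names
    external: set of names provided outside the overlay (e.g. the main tree)
    """
    kept = {name: set(deps) for name, deps in overlay.items()}
    removed = set()
    changed = True
    # Removing a package can break its reverse dependencies, so repeat
    # until a full pass removes nothing.
    while changed:
        changed = False
        available = set(external) | set(kept)
        for name in sorted(kept):
            if not kept[name] <= available:
                del kept[name]
                removed.add(name)
                changed = True
    return set(kept), removed


if __name__ == "__main__":
    overlay = {"a": {"b", "sys"}, "b": {"missing"}, "c": {"sys"}}
    print(prune_unsatisfiable(overlay, {"sys"}))
```

Note that "a" is removed as well in this example, because its
dependency "b" cannot be satisfied.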


3.2.2 Other QA Tools

To be figured out.

Currently, I plan to provide a script that reports the overlay's
status and provides easy access to overlay statistics and log
messages.

This script should support several output formats, at least
human-readable text and HTML, so that it can serve as a base for
creating a status web page.
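
A minimal sketch of how such a script might select its output format
is shown below. The statistics keys and the function names are
invented for this example; the actual script may be structured quite
differently.

```python
from html import escape


def render_text(stats):
    """Render statistics as aligned "key : value" lines."""
    width = max((len(str(key)) for key in stats), default=0)
    return "\n".join(
        "{:<{w}} : {}".format(str(key), value, w=width)
        for key, value in sorted(stats.items())
    )


def render_html(stats):
    """Render statistics as a simple HTML table."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            escape(str(key)), escape(str(value))
        )
        for key, value in sorted(stats.items())
    )
    return "<table>\n{}\n</table>".format(rows)


# Output formats known to the (hypothetical) status script.
RENDERERS = {"text": render_text, "html": render_html}


def render(stats, fmt="text"):
    return RENDERERS[fmt](stats)


if __name__ == "__main__":
    print(render({"ebuilds": 10, "failed": 2}, "text"))
```

Keeping the renderers in a table makes it easy to add further output
formats later on.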


3.2.3 Overlay Snapshots

The idea here is to add version control to the overlay, which allows
restoring a previous (known-to-work) state of the overlay.

A real "known-to-work" state would also have to include all R
packages, but that's not practical (could be solved with hard links /
file copies, though).

The main purpose of this feature is to recover from roverlay.py
failure, e.g. if the result doesn't meet one's expectations, without
the need to regenerate the entire overlay (which could easily take
hours).

Possible solutions:

1. use a full-featured version control system, e.g. git.

The drawback of this solution is that git's history will needlessly
grow (as mentioned above, there's no real "known-to-work" state, so a
complete history is useless). This could be remedied by use of
git-filter-branch, but it would add a layer of complexity without any
real advantage.

2. create tarballs and keep a certain set of them, e.g. keep the last
seven days (or roverlay.py runs) plus one tarball for each of the last
12 weeks.

I'd definitely opt for 2 here.
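
The retention policy of solution 2 could be sketched as follows. It
assumes that each snapshot is identified by its creation date and
interprets "one tarball per week" as "the newest snapshot of each
calendar week (Monday to Sunday)"; both are assumptions made for this
example.

```python
import datetime


def week_start(day):
    """Return the Monday of the calendar week that contains day."""
    return day - datetime.timedelta(days=day.weekday())


def snapshots_to_keep(dates, today, days=7, weeks=12):
    """Return the set of snapshot dates that should not be deleted."""
    keep = set()
    newest_per_week = {}
    for day in dates:
        age = (today - day).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age < days:
            keep.add(day)  # daily snapshots of the last <days> days
        weeks_ago = (week_start(today) - week_start(day)).days // 7
        if weeks_ago < weeks:
            current = newest_per_week.get(weeks_ago)
            if current is None or day > current:
                newest_per_week[weeks_ago] = day
    keep.update(newest_per_week.values())
    return keep


if __name__ == "__main__":
    today = datetime.date(2013, 9, 1)
    dates = [today - datetime.timedelta(days=n) for n in range(100)]
    for day in sorted(snapshots_to_keep(dates, today)):
        print(day)
```

Using fixed calendar weeks instead of rolling seven-day windows keeps
the weekly snapshot of a finished week stable, so it is not replaced
by a newer one on the next run.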


3.3 User Tools

Generally, users are expected to simply add the overlay with layman
and use it like any other, but certain cases might require user
interaction:

* upstream changes the package's content without renaming it (see 3.1.2)

This invalidates package files downloaded by the user. Rebuilding
these packages (ebuilds) may be advantageous, too.

Removing and/or refetching the files is rather easy; a postsync script
could do the job, possibly with code-side support in roverlay.py to
speed things up (i.e., don't scan the whole overlay after each sync).

The tricky part is to determine the list of packages that should be
rebuilt. Simply rebuilding any ebuild whose package file has been
removed from ${DISTDIR} is not accurate for at least two reasons
(package not installed, user does not keep distfiles). A possible
solution is to rev-bump ebuilds in roverlay.py whenever upstream
changes a package.
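
The detection step of such a refetch tool could look roughly like the
sketch below (written in Python here, although the tool itself may end
up as a shell script). It assumes that the expected SHA256 checksums
have already been read from the overlay's Manifest files into a dict;
parsing Manifest files is not shown, and the function names are
hypothetical.

```python
import hashlib
import os


def file_sha256(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_stale_distfiles(distdir, expected):
    """Return names of local files whose checksum differs from the overlay's.

    expected: dict mapping file name -> expected SHA256 hex digest
    Files that are not present in distdir are skipped, since there is
    nothing to remove or refetch for them.
    """
    stale = []
    for name, checksum in sorted(expected.items()):
        path = os.path.join(distdir, name)
        if os.path.isfile(path) and file_sha256(path) != checksum:
            stale.append(name)
    return stale
```

This only finds files that need to be removed or refetched; it does
not solve the rebuild problem described above.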


4 Deliverables
---

Coding will be done in Python as that's the programming language in
which roverlay.py is written. Some tools may be written as shell
scripts, e.g. the user refetch tool. Documentation will continue to
use reStructuredText.

4.1 Final (September 23)

ebuild/overlay creation (roverlay.py):
   all features as listed in 3.1.1 and 3.1.2
   roverlay console

automated overlay maintenance:
   overlay verification using both structural testing and tinderboxing
   snapshot create/restore functionality
   overlay status script and web page

user tools: refetch tool

documentation:
   roverlay.py's new features and automated overlay maintenance fully documented
   The user refetch tool probably doesn't need much in-depth documentation



4.2 Mid-term (July 29)

ebuild/overlay creation (roverlay.py):
   all features as listed in 3.1.1 and 3.1.2

automated overlay maintenance:
   overlay verification using structural testing
   snapshot create/restore functionality
   basic overlay status script

user tools: refetch tool

documentation: partially done, roverlay.py's new features documented
(in doc/rst/usage.rst)



5 Timeline
---

* May 28 - Jun 16: implement control flow features as described in 3.1.1
* Jun 17 - Jun 30: misc features as listed in 3.1.2
* Jul 1 - Jul 14: add structural testing
* Jul 15 - Jul 21: write the user refetch and overlay snapshot/restore tools
* Jul 22 - Jul 28: basic qa script
* Jul 29 - Aug 4: Mid-term evaluations / write documentation
* Aug 5 - Aug 18: add tinderboxing
* Aug 19 - Aug 25: extend the qa script, simple status web page
* Aug 26 - Sep 8: roverlay console (see 3.1.3)
* Sep 9 - Sep 23 (Mo): write/improve documentation


6 Biography / About me
---

I'm a twenty-one-year-old undergraduate student from Stuttgart,
Germany. My major subject is Computer Science, in which I'll get my
bachelor's degree in 2014.

I've been using Gentoo for 4 years now and I'm about to become an
official dev in the near future. As for other open source activities, I
contribute to the TLP project (https://github.com/linrunner/TLP) on a
regular basis (since mid 2011), sometimes in the form of patches, but
normally (and more importantly, in my opinion) by doing code reviews. I
also maintain a small portage overlay for TLP
(https://github.com/dywisor/tlp-portage).


7 Extra information
---

7.1 Use the tools that you will use in your project to make changes to code

My bug tracker activity is low. I've recently reported a minor build
issue and proposed a patch (bug #467728,
https://bugs.gentoo.org/show_bug.cgi?id=467728).


7.2 Participate in our development community

You can find a mailing list entry from me at <this mail>.


7.3 Contact Info

email : dywi at mailerd.de
irc   : dywi at irc.freenode.net

home mailing address: <removed>
phone number: <removed>


7.4 Working hours

Mo - Sa, 8 am - 9 pm UTC

Actual working time sums up to about 20 hours per week from May 28
until Jul 20 and about 35 hours per week after that.

