From: Corentin Chary
To: gentoo-dev@lists.gentoo.org
Date: Mon, 19 Sep 2011 10:39:11 +0200
Subject: Re: [gentoo-dev] euscan proof of concept (like debian's uscan)
References: <4E76704D.9080808@gentoo.org>

On Mon, Sep 19, 2011 at 9:35 AM, Dirkjan Ochtman wrote:
> On Mon, Sep 19, 2011 at 00:27, "Paweł Hajdan, Jr."
> wrote:
>> Okay, I think this is pretty cool and we should find it a new home in
>> the Gentoo infrastructure.
>>
>> I was thinking about http://qa-reports.gentoo.org/ with the repo at
>> http://git.overlays.gentoo.org/gitweb/?p=proj/qa-scripts.git;a=summary
>>
>> I can act as a proxy committer and reviewer for that code. Could you
>> break it up into some smaller parts (preferably backend first) and send
>> to me for review (if you're interested)?
>>
>> How long does it take to generate the reports?
>
> +1 I think it would be good to run this on Gentoo infra, and I
> wouldn't mind helping out.
>
> Bikeshedding: not sure "reports" is the best name for this, as reports
> implies something more static?

Here is how it works: each week I launch this script on lt server. I've
got ~30 trees installed with layman. The server is an AMD X2 4600+ with
4GB of RAM and two 80GB HDs in RAID1 using ext4. My network bandwidth is
20Mbps down, 1Mbps up.

#!/bin/sh

## Set up some vars to use the local portage tree
export PATH=${HOME}/euscan/bin:${PATH}
export PYTHONPATH=${HOME}/euscan/pym:${PYTHONPATH}
export PORTAGE_CONFIGROOT=${HOME}/local
export ROOT=${HOME}/local
export EIX_CACHEFILE=${HOME}/local/var/cache/eix

## Go to euscanwww dir
cd ${HOME}/euscan/euscanwww/

## Update local trees
## Bottleneck: disk and network bandwidth
## Time: less than 30mn
emerge --sync --root=${ROOT} --config-root=${PORTAGE_CONFIGROOT}
ROOT="/" layman -S --config=${ROOT}/etc/layman/layman.cfg

## Also update the eix database, because we use eix internally
## Bottleneck: disk and cpu
## Time: 30mn ~ 1h
eix-update

## Scan portage (packages, versions)
## Bottleneck: disk and cpu
## Time: < 15mn
## Note: this script uses eix to get a list of packages and versions
python manage.py scan-portage --all --purge-versions --purge-packages

## Scan metadata (herds, maintainers, homepages, ...)
## Bottleneck: disk
## Time: 1h ~ 1h30
## Note: this script uses gentoolkit to fetch metadata
python manage.py scan-metadata --all --progress

## Scan upstream packages
## Bottleneck: disk, network bandwidth and latency, cpu
## Time: up to 6h
## Note: euscan is called on each package. euscan has a slow startup
## caused by gentoolkit/portage.
## gparallel is used here to limit the load caused by euscan, and to
## launch up to 16 euscan instances at a time on this machine.
## This part is the longest, but it scales very well.
eix --only-names -x | gparallel --load 4 --jobs 800% euscan >> ${HOME}/logs/euscan-upstream.log
python manage.py scan-upstream --feed --purge-versions < ${HOME}/logs/euscan-upstream.log

## Update counters (6)
## Time: some minutes
## Bottleneck: cpu
## Note: this script could probably be implemented faster using raw SQL queries
python manage.py update-counters

> Also not sure how much it has to do
> with QA.
> How much of it constitutes the backend, in your opinion? It seems
> there are two parts, right now:
>
> 1. euscan script, to find new versions for a single package
> 2. the django www app, including storage for the version data

Yes, exactly. Here is how the tree is structured currently:

euscan script
bin/ -- contains the euscan python "binary"
pym/ -- contains most of the code used by the euscan script
pym/euscan/handlers -- contains specific site handlers (rubygems,
pypi, pecl, pear, ..)

euscanwww django app
euscanwww/ -- contains everything for the django application; all the
django application needs is a working portage tree and euscan available
in the $PATH

> IMO it would be nice to have a somewhat generic REST-style service
> exposing the data, and build a simple UI on top of that. In
> particular, I have different ideas about what the UI should look like,
> so it would be nice if different people could experiment (and/or
> integrate in other services like znurt.org).

I already added some very basic JSON formatting (note that it also
exposes the internal key id, which is bad, but I just wanted to
experiment). All you need to do is append "/json" to a URL.

For example:
- http://euscan.iksaif.net/maintainers/4/json
- http://euscan.iksaif.net/package/app-accessibility/brltty/json

This could be a lot better; we just need to define an API, and the
implementation will be easy.

A first step would be to make an ebuild for euscan, and another for
euscanwww, so that anyone can easily install it and play with it.

Feel free to ping me on IRC: I'm on #gentoo-sunrise, and my nickname is
"iksaif".

-- 
Corentin Chary
http://xf.iksaif.net
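
PS: a minimal sketch of how a client might use the "/json" convention
described above. The helper names json_url and fetch_json are
hypothetical (they are not part of euscan or euscanwww), stripping a
trailing slash before appending is my own assumption, and the layout of
the returned JSON is not specified here, so the sketch just hands back
whatever the server sends.

```python
import json
from urllib.request import urlopen


def json_url(page_url: str) -> str:
    """Build the JSON variant of a euscanwww page URL by appending "/json".

    Assumption: a trailing slash on the page URL is dropped first so the
    result never contains "//json".
    """
    return page_url.rstrip("/") + "/json"


def fetch_json(page_url: str, timeout: float = 10.0):
    """Fetch and decode the JSON document for a page (needs network access).

    The structure of the decoded object is not defined yet, so it is
    returned unchanged.
    """
    with urlopen(json_url(page_url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_json() to actually hit the server.
    print(json_url("http://euscan.iksaif.net/package/app-accessibility/brltty"))
```

Keeping the URL construction separate from the network call makes it
easy to swap in a proper API path once one has been defined.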