From: Antanas Uršulis
Date: Mon, 29 Apr 2013 02:48:23 +0100
Subject: Re: [gentoo-soc] GSoC 2013: Log collector
To: gentoo-soc@lists.gentoo.org

Hi Diego,

Thanks for the quick reply, and sorry it took me this long to get back
to you - a couple of other responsibilities have been piling up lately.

I've tried to assess what this project would involve; maybe this could
be a starting point for the proposal. A lot of this describes how the
current solution works, so changes might be necessary:

- conceptually the system should have 3 components: a log
  collector & analyser, a storage backend and a frontend
- it would be integrated with portage:
--- portage would implement a client which can submit logs to the
    collector, possibly providing information on why the package failed
--- this connection between portage and the collector should rely on
    as few dependencies as possible (because any packages providing
    that functionality might break)
--- it should support IPv6 because that's what is used between the
    container on the tinderbox and the box itself (here's a question
    though: is there any technical reason why IPv6 was used? I admit I
    didn't look into this too deeply)
- the collector & analyser:
--- receives logs over some protocol
--- should be able to group logs (receive several log files for a
    failing package and keep them together) (this, depending on the
    implementation, might be part of the portage integration)
--- matches each line against a regexp; we can look into something
    more extensible if required
--- organises the files by hostname and submits them to the storage
    backend
- the storage backend:
--- I could start with Amazon's AWS and then move to something
    standalone (how much data is there to store, actually? 1/10/100 GB?
    and how large can a single log file become?)
--- keeps the logs and also a simple database that would hold
    information about the log groups (package, date, links to log
    files, etc.)
- the frontend:
--- displays a list of packages that have matches
--- should be integrated with bugzilla; one can see open bugs for a
    selected package, and also file a new bug
--- should be password protected

Comments/additions greatly appreciated.

Now, regarding the Gentoo application template: a long time ago I
submitted a one-line workaround patch[1] for openoffice, but that
probably doesn't qualify. Could you point me (a general direction is OK)
towards something I could fix for my application?

Cheers,
Antanas

[1] https://bugs.gentoo.org/show_bug.cgi?id=306211

On Mon, Apr 22, 2013 at 4:55 PM, Diego Elio Pettenò wrote:
> Hi Antanas,
>
> First of all, thanks for showing your interest in this project. I'll be
> the assigned mentor for the project if you're going to be working on
> it.
>
> To answer your concerns (in quite an unsorted fashion, I apologize), I
> would start by saying that what we're looking for with this project
> is not just an average log collector, but one that is
> integrated explicitly with components such as Portage and Bugzilla;
> the target of this project, for me, is to be able to use it for the
> tinderboxes I've been running (which are currently on hold because I'm
> too busy with training at a new job).
>
> While I'm not discounting *any* particular technology right away,
> we're looking for something that works quickly and relatively
> easily... sometimes, while technology is already out there that could
> work, it doesn't suit the workflow as it is right now. At the same
> time, as far as I'm concerned you can start by keeping the use of
> Amazon's AWS services, if it helps you. The end goal is to avoid using
> it, but there is no hurry with that, as the costs associated with it
> are very marginal right now. The only important part here is that the
> data you store on AWS is not persistent when using the final
> collector.
>
> I don't have any problem with "wasting" the 10 days - but this does
> not give any "get out of trouble free" card for the evaluation, so
> keep it in mind, you might have to do some harder work in the first
> few days. But that does not say much, some people actually perform
> better under pressure, so it's up to you if you want to go for it or
> not :)
>
> If there is anything else you want to know, feel free to ask on the
> mailing list and I'll answer ASAP.
>
> Thanks,
> Diego
>
> Diego Elio Pettenò - Flameeyes
> flameeyes@flameeyes.eu - http://blog.flameeyes.eu/
>
>
> On Mon, Apr 22, 2013 at 2:19 AM, Antanas Uršulis wrote:
>> Hello,
>>
>> I'm a final year Computer Science undergraduate at the University of
>> Cambridge (UK). My main languages are C and C++, but my bachelor's project
>> involves working with and modifying the CIEL task-parallel execution system
>> written in Python, and I have also been using Python for my scripting needs.
>> I have been a Gentoo user since 2009 (switched from *buntu), if I
>> remember correctly.
>>
>> I am interested in the "Log collector/analyzer for tinderbox" GSoC project,
>> not only because I would like to help the developers of my favourite Linux
>> distribution, but also because I could see myself using an extension of the tool to
>> oversee my own systems (for example, computers at the Lithuanian National
>> Olympiad in Informatics, where I am a member of the technical staff).
>>
>> Of course, there are some things I would like to discuss. Firstly, are there
>> already any thoughts on a replacement for Amazon's SimpleDB storage? I would
>> see that as a major part of the project. Second, if I were to think of this
>> as a more general tool, with the possibility to support various log
>> providers (portage, apache, etc.) and analyzers in the future, what would
>> your opinion be on that?
>> I think it would be possible to implement this
>> without any external dependencies, so that Portage remains light. Third,
>> after a quick search, I came upon the question: is there a reason not to
>> hook into systems such as Apache Flume + Elastic Search, or logstash, or
>> Scribe?
>>
>> Lastly, I would like to know up-front whether you would be OK with me only
>> being able to fully focus on the project from the 7th June - I have exams
>> until then and cannot devote my time to this. That is 10 calendar days lost
>> from the "Students get to know mentors, read documentation, get up to speed
>> to begin working on their projects." phase.
>>
>> I'm happy to provide more background about myself and hope to hear from you!
>>
>> Regards,
>> Antanas Uršulis
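
As a rough illustration of the collector & analyser step outlined in the proposal at the top of this message (match each line against a regexp, keep the logs for one package together, organise by hostname), here is a minimal Python sketch. It is not the existing tinderbox code: the two patterns, the function names `scan_log` and `group_by_host`, the `(hostname, package, lines)` submission shape and the sample data are all assumptions made for the example, and it uses only the standard library.

```python
import re
from collections import defaultdict

# Hypothetical failure patterns; the real tinderbox regexps are not
# given in this thread.
PATTERNS = [
    re.compile(r"\berror:"),
    re.compile(r"undefined reference to"),
]


def scan_log(lines):
    """Return (line_number, line) pairs for lines matching any pattern."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in PATTERNS):
            matches.append((number, line.rstrip("\n")))
    return matches


def group_by_host(submissions):
    """Group scanned logs by hostname, then by package.

    `submissions` is an iterable of (hostname, package, lines) tuples,
    an assumed shape for what a Portage-side client might send.
    Several log files for the same package end up in one list; line
    numbers are relative to each individual file.
    """
    grouped = defaultdict(dict)
    for hostname, package, lines in submissions:
        grouped[hostname].setdefault(package, []).extend(scan_log(lines))
    return dict(grouped)


if __name__ == "__main__":
    # Made-up host and package names, for illustration only.
    sample = [
        ("tbox1", "cat/pkg-1.0",
         ["checking for gcc... yes\n", "foo.c:12: error: 'x' undeclared\n"]),
        ("tbox1", "cat/pkg-1.0",
         ["ld: undefined reference to `bar'\n"]),
    ]
    for host, packages in group_by_host(sample).items():
        for package, hits in packages.items():
            print(host, package, hits)
```

Keeping the patterns in a plain list is the simplest possible choice; if something more extensible is needed, the list could be loaded from a configuration file without changing the two functions.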