* [gentoo-soc] GSoC 2013: Log collector
From: Antanas Uršulis @ 2013-04-22 1:19 UTC
To: gentoo-soc

Hello,

I'm a final-year Computer Science undergraduate at the University of Cambridge (UK). My main languages are C and C++, but my bachelor's project involves working with and modifying the CIEL task-parallel execution system, which is written in Python, and I have also been using Python for my scripting needs. I have been a Gentoo user since 2009 (switched from *buntu), if I remember correctly.

I am interested in the "Log collector/analyzer for tinderbox" GSoC project, not only because I would like to help the developers of my favourite Linux distribution, but also because I could see myself using an extension of the tool to oversee my own systems (for example, the computers at the Lithuanian National Olympiad in Informatics, where I am a member of the technical staff).

Of course, there are some things I would like to discuss. First, are there already any thoughts on a replacement for Amazon's SimpleDB storage? I see that as a major part of the project. Second, if I were to think of this as a more general tool, with the possibility of supporting various log providers (portage, apache, etc.) and analyzers in the future, what would your opinion be on that? I think it would be possible to implement this without any external dependencies, so that Portage remains light. Third, after a quick search, I came upon the question: is there a reason not to hook into systems such as Apache Flume + Elasticsearch, or logstash, or Scribe?

Lastly, I would like to know up front whether you would be OK with me only being able to fully focus on the project from 7 June; I have exams until then and cannot devote my time to this. That means losing 10 calendar days from the "Students get to know mentors, read documentation, get up to speed to begin working on their projects." phase.

I'm happy to provide more background about myself and hope to hear from you!

Regards,
Antanas Uršulis
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Diego Elio Pettenò @ 2013-04-22 15:55 UTC
To: gentoo-soc

Hi Antanas,

First of all, thanks for showing your interest in this project. I'll be the assigned mentor for the project if you end up working on it.

To answer your concerns (in quite an unsorted fashion, I apologize), I would start by saying that what we're looking for with this project is not just an average log collector, but one that is explicitly integrated with components such as Portage and Bugzilla; the target of this project, for me, is to be able to use it for the tinderboxes I've been running (which are currently on hold because I'm too busy with training at a new job).

While I'm not discounting *any* particular technology right away, we're looking for something that works quickly and relatively easily... sometimes, even when there is already technology out there that could work, it doesn't suit the workflow as it is right now. At the same time, as far as I'm concerned, you can start by keeping the use of Amazon's AWS services, if it helps you. The end goal is to avoid using it, but there is no hurry with that, as the costs associated with it are very marginal right now. The only important part is that, once the final collector is in use, the data is not stored persistently on AWS.

I don't have any problem with "wasting" the 10 days, but this does not give you a "get out of trouble free" card for the evaluation, so keep in mind that you might have to work harder in the first few days. That does not say much, though: some people actually perform better under pressure, so it's up to you whether you want to go for it or not :)

If there is anything else you want to know, feel free to ask on the mailing list and I'll answer ASAP.
Thanks,
Diego

Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/
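As a rough illustration of the standalone storage the first message asks about (a replacement for SimpleDB), the metadata side could live in a local SQLite database while the log bodies stay as files on disk. This is a minimal sketch assuming SQLite from Python's standard library; the table and column names (`log_group`, `log_file`) and the helper functions are hypothetical and not part of any existing tinderbox code.

```python
import sqlite3

# Hypothetical schema: one row per log group (one failed build), plus one
# row per log file in that group. All names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS log_group (
    id        INTEGER PRIMARY KEY,
    hostname  TEXT NOT NULL,
    package   TEXT NOT NULL,
    submitted TEXT NOT NULL  -- ISO 8601 timestamp
);
CREATE TABLE IF NOT EXISTS log_file (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES log_group(id),
    path     TEXT NOT NULL   -- where the log body is stored on disk
);
"""


def open_store(db_path=":memory:"):
    """Open (or create) the metadata database and ensure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_group(conn, hostname, package, submitted, paths):
    """Record one log group and its files; returns the new group id."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            "INSERT INTO log_group (hostname, package, submitted) VALUES (?, ?, ?)",
            (hostname, package, submitted),
        )
        group_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO log_file (group_id, path) VALUES (?, ?)",
            [(group_id, p) for p in paths],
        )
    return group_id


def groups_for_package(conn, package):
    """List (group id, hostname, timestamp) for one package, newest first."""
    return conn.execute(
        "SELECT id, hostname, submitted FROM log_group "
        "WHERE package = ? ORDER BY submitted DESC",
        (package,),
    ).fetchall()
```

Keeping only paths in the database, rather than the log contents, is a deliberate choice in this sketch: individual build logs can be very large, and a filesystem (or object store) handles big blobs better than a metadata table.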
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Antanas Uršulis @ 2013-04-29 1:48 UTC
To: gentoo-soc

Hi Diego,

Thanks for the quick reply, and sorry it took me this long to get back to you; a couple of other responsibilities have been piling up lately.

I've tried to assess what this project would involve; maybe this could be a starting point for the proposal. A lot of this describes how the current solution works, so changes might be necessary:

- conceptually, the system should have 3 components: a log collector & analyser, a storage backend and a frontend

- it would be integrated with portage:
--- portage would implement a client which can submit logs to the collector, possibly providing information on why the package failed
--- this connection between portage and the collector should rely on as little as possible (because any packages providing that functionality might break)
--- it should support IPv6 because that's what is used between the container on the tinderbox and the box itself (here's a question though: is there any technical reason why IPv6 was used? I admit I didn't look into this too deeply)

- the collector & analyser:
--- receives logs over some protocol
--- should be able to group logs (receive several log files for a failing package and keep them together) (this, depending on the implementation, might be part of the portage integration)
--- matches each line against a regexp; we can look into something more extensible if required
--- organises the files by hostname and submits them to the storage backend

- the storage backend:
--- I could start with Amazon's AWS and then move to something standalone (how much data is there to store, actually? 1/10/100 GB? And how large can a single log file become?)
--- keeps the logs and also a simple database that would hold information about the log groups (package, date, links to log files, etc.)

- the frontend:
--- displays a list of packages that have matches
--- should be integrated with bugzilla; one can see open bugs for a selected package, and also file a new bug
--- should be password protected

Comments/additions greatly appreciated. Now, regarding the Gentoo application template: a long time ago I actually submitted a one-line workaround patch[1] for openoffice, but that probably doesn't qualify. Could you point me (a general direction is OK) towards something I could fix for my application?

Cheers,
Antanas

[1] https://bugs.gentoo.org/show_bug.cgi?id=306211
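The collector steps in the outline above (match each line against a regexp, keep the several logs of one failing package together) could be sketched in Python roughly as follows. This is a minimal sketch, not the existing tinderbox code: the `PATTERNS` list and the helpers `scan_log` and `group_logs` are hypothetical, and the three regexps are only examples of the kind of lines a tinderbox might flag. Matching works on any iterable of lines, so an open file is streamed one line at a time rather than loaded whole, which matters given that the thread later mentions logs over 1GB.

```python
import re
from collections import defaultdict

# Hypothetical example patterns; a real deployment would maintain its own
# list. Each entry pairs a short label with a compiled regexp.
PATTERNS = [
    ("qa-notice", re.compile(r"^ \* QA Notice:")),
    ("implicit-decl", re.compile(r"warning: implicit declaration of function")),
    ("ebuild-error", re.compile(r"^ \* ERROR: ")),
]


def scan_log(lines, patterns=PATTERNS):
    """Yield (line_number, label, line) for every line matching a pattern.

    `lines` can be any iterable of strings (e.g. an open file object), so
    large logs are processed incrementally instead of read into memory.
    """
    for lineno, line in enumerate(lines, start=1):
        for label, regex in patterns:
            if regex.search(line):
                yield lineno, label, line.rstrip("\n")


def group_logs(submissions):
    """Group (hostname, package, filename) tuples by (hostname, package)."""
    groups = defaultdict(list)
    for hostname, package, filename in submissions:
        groups[(hostname, package)].append(filename)
    return dict(groups)
```

Usage would look like `for lineno, label, text in scan_log(open("build.log", errors="replace")): ...`; swapping the regexp list for something more extensible, as the outline suggests, only requires changing what `patterns` contains.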
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Rich Freeman @ 2013-04-29 2:08 UTC
To: gentoo-soc

On Sun, Apr 28, 2013 at 9:48 PM, Antanas Uršulis <antanas.ursulis@gmail.com> wrote:
> --- portage would implement a client which can submit logs to the
> collector, possibly providing information why the package failed

While this isn't exactly necessary for tinderbox purposes, having portage be able to dump consolidated logs (tarball, etc.) might be really useful for bug reporting purposes in general. Right now we ask users to submit a laundry list of useful data that requires several commands to consolidate. If a simple command dumped all the useful info relevant to the last failed build into a tarball suitable for attaching to a bug, that might be very useful all-around.

This could also be leveraged by a bug-reporting tool. The tarball might include an XML or text file containing the relevant metadata for the failed build, suitable for parsing by the log server.

Just a random thought - not my project...

Rich
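Rich's tarball idea (logs plus a small metadata file a log server can parse) can be illustrated with Python's standard `tarfile` and `xml.etree.ElementTree` modules. This is a minimal sketch under stated assumptions: the function names, the `metadata.xml` member name and the `<build>` element layout are hypothetical choices, not an existing Portage format, and metadata keys are assumed to be valid XML tag names.

```python
import io
import tarfile
import time
import xml.etree.ElementTree as ET


def build_metadata_xml(metadata):
    """Serialise a flat dict of build metadata into a small XML document.

    Keys become element names, so they must be valid XML tag names.
    """
    root = ET.Element("build")
    for key, value in sorted(metadata.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")  # bytes, with XML declaration


def _add_bytes(tar, name, data):
    """Add an in-memory bytes object to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))


def bundle_failure(metadata, logs):
    """Return a gzip-compressed tar archive (as bytes) for one failed build.

    The archive holds metadata.xml first, then every entry of `logs`, which
    maps archive member names to raw log contents (bytes).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        _add_bytes(tar, "metadata.xml", build_metadata_xml(metadata))
        for name, data in logs.items():
            _add_bytes(tar, name, data)
    return buf.getvalue()
```

The same archive could serve both purposes raised in the thread: a user attaches it to a bug by hand, while a collector opens it, reads `metadata.xml` first, and files the remaining members as one log group.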
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Antanas Uršulis @ 2013-05-02 4:22 UTC
To: gentoo-soc

> While this isn't exactly necessary for tinderbox purposes, having
> portage be able to dump consolidated logs (tarball, etc) might be
> really useful for bug reporting purposes in general.

Great idea, thank you! Since I would be writing code that generates a list of logs/bits of information to send anyway, this feature will be easy to add. I've incorporated it into my proposal.

Antanas
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Brian Dolbec @ 2013-04-29 3:03 UTC
To: gentoo-soc

On Mon, 2013-04-29 at 02:48 +0100, Antanas Uršulis wrote:
> - the collector & analyser:
> --- receives logs over some protocol
> --- should be able to group logs (receive several log files for a
> failing package and keep them together) (this, depending on the
> implementation, might be part of the portage integration)
> --- matches each line against a regexp, we can look into something
> more extensible if required
> --- organises the files by hostname and submits them to the storage backend
>
> - the storage backend:
> --- I could start with Amazon's AWS and then move to something
> standalone (how much data is there to store, actually? 1/10/100 GB?
> and how large can a single log file become?)
> --- keeps the logs and also a simple database that would hold
> information about the log groups (package, date, links to log files,
> etc.)
>
> - the frontend:
> --- displays a list of packages that have matches
> --- should be integrated with bugzilla; one can see open bugs for a
> selected package, and also file a new bug
> --- should be password protected

If you would like a gtk frontend for parsing logs, porthole's terminal is pretty much a ready-to-use app which already parses portage output and separates error and warning messages into separate views. The separated warnings, etc. make it easy to find the relevant errors; double-clicking on one brings you to the place in the main log where the error occurred. From there you can move through the log and find the cause, if it is shown. It has been my intention for years to separate it from the package browser. If you are interested in all or part of the code, it is here:
http://sourceforge.net/p/porthole/code/ci/master/tree/porthole/terminal/

P.S. This portion of the terminal code was written long before portage had its current logging capability.

--
Brian Dolbec <dolsen@gentoo.org>
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Diego Elio Pettenò @ 2013-04-29 7:06 UTC
To: gentoo-soc

On 29/04/2013 04:03, Brian Dolbec wrote:
> If you would like a gtk frontend for parsing logs, porthole's terminal
> is pretty much a ready to use app which already parses portage output,
> separates out error and warning messages to separate views.

Can we please not derail the project? Gtk cannot really have anything to do with this at this point. If you want something Gtk-based, make your own project.

--
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Diego Elio Pettenò @ 2013-04-29 7:28 UTC
To: gentoo-soc

Hi Antanas,

On 29/04/2013 02:48, Antanas Uršulis wrote:
> I've tried to assess what this project would involve, maybe this could
> be a starting point for the proposal. A lot of this describes how the
> current solution works, so changes might be necessary:

Most likely, yes.

> - conceptually the system should have 3 components: a log
> collector&analyser, a storage backend and a frontend

Correct.

> - it would be integrated with portage:
> --- portage would implement a client which can submit logs to the
> collector, possibly providing information why the package failed
> --- this connection between portage and the collector should rely on
> as little as possible (because any packages providing that
> functionality might break)

Also correct.

> --- it should support IPv6 because that's what is used between the
> container on the tinderbox and the box itself (here's a question
> though: any technical reason why IPv6 was used? I admit I didn't look
> into this too deeply)

Yeah, the technical reason is actually twofold:

- the host for the tinderboxes only has one IPv4 address, so it was either NAT or IPv6; given that the tinderbox runs isolated networking through a proxy to stop packages from using the network at build time, NAT was not a great idea; using IPv6 means that I can still jump onto the hosts either from another IPv6-enabled system or, as in my previous and current office, straight from my IPv6-enabled workstation;
- by using IPv6, the name of the tinderbox is found simply by doing a reverse lookup of the address, as all the tinderboxes have proper records.
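The reverse-lookup scheme described above means a collector never has to trust a hostname sent by the client: it can derive the tinderbox name from the connecting address. Below is a minimal sketch using only Python's standard library; the helper names `ptr_name` and `tinderbox_name` are hypothetical, and falling back to the literal address when no PTR record exists is an assumed policy, not something stated in the thread. On an `AF_INET6` socket, `getpeername()` returns a 4-tuple whose first element is the address string passed in here.

```python
import ipaddress
import socket


def ptr_name(address):
    """Return the reverse-DNS (PTR) query name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


def tinderbox_name(peer_address, resolve=socket.gethostbyaddr):
    """Map a submitting peer's address to a hostname via reverse DNS.

    `resolve` defaults to socket.gethostbyaddr (which returns a
    (hostname, aliases, addresses) tuple) and can be swapped out for
    testing. If no PTR record exists, the literal address is returned.
    """
    try:
        hostname, _aliases, _addresses = resolve(peer_address)
    except (socket.herror, socket.gaierror):
        return peer_address
    return hostname
```

For an IPv6 address the PTR name is the 32 hex nibbles of the address in reverse order under `ip6.arpa`, which is why every container needs its own record for this naming to work.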
> - the collector & analyser:
> --- receives logs over some protocol
> --- should be able to group logs (receive several log files for a
> failing package and keep them together) (this, depending on the
> implementation, might be part of the portage integration)
> --- matches each line against a regexp, we can look into something
> more extensible if required
> --- organises the files by hostname and submits them to the storage backend

Also correct.

> - the storage backend:
> --- I could start with Amazon's AWS and then move to something
> standalone (how much data is there to store, actually? 1/10/100 GB?
> and how large can a single log file become?)

I've seen log files grow over 1GB (yes, I know it's crazy), but that's relatively rare. Unfortunately, I don't have a quick assessment of the total storage over the past year.

> --- keeps the logs and also a simple database that would hold
> information about the log groups (package, date, links to log files,
> etc.)

Correct.

> - the frontend:
> --- displays a list of packages that have matches
> --- should be integrated with bugzilla; one can see open bugs for a
> selected package, and also file a new bug
> --- should be password protected

Also correct. Do note that one of the things the frontend has to do is _attach_ the data rather than just link to it (which is what I've been doing myself up to now).

> Comments/additions greatly appreciated. Now, regarding the Gentoo
> application template, I have actually a long time ago submitted a
> one-line workaround patch[1] for openoffice, but that probably doesn't
> qualify. Could you point me (general direction is ok) towards
> something I could fix for my application?

I'm not sure it makes much sense to sweat over fixing something, given that we're talking about writing a series of webapps and systems. It might be better for you to point at any similar work you have done.
--
Diego Elio Pettenò — Flameeyes
flameeyes@flameeyes.eu — http://blog.flameeyes.eu/
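Diego's requirement that the frontend _attach_ data to a bug, rather than link to it, could look roughly like the request below. This is a sketch under explicit assumptions: it targets the REST endpoint `POST /rest/bug/<id>/attachment` of newer Bugzilla releases (5.0 and later; older installations offered a comparable `Bug.add_attachment` XML-RPC method), it authenticates with an API key in the `X-BUGZILLA-API-KEY` header, and the helper name `attachment_request` is hypothetical. It only builds the request; sending it would be `urllib.request.urlopen(req)`. Bugzilla enforces a configurable maximum attachment size, so very large logs would likely need compressing or trimming first.

```python
import base64
import json
import urllib.request


def attachment_request(base_url, bug_id, api_key, file_name, data,
                       summary, content_type="text/plain"):
    """Build (but do not send) a request attaching `data` to a bug.

    Assumes Bugzilla's REST endpoint POST /rest/bug/<id>/attachment, which
    expects the file contents base64-encoded inside a JSON body.
    """
    payload = {
        "ids": [bug_id],
        "data": base64.b64encode(data).decode("ascii"),
        "file_name": file_name,
        "summary": summary,
        "content_type": content_type,
    }
    return urllib.request.Request(
        url="%s/rest/bug/%d/attachment" % (base_url.rstrip("/"), bug_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed authentication mechanism: API key passed as a header.
            "X-BUGZILLA-API-KEY": api_key,
        },
        method="POST",
    )
```

Separating request construction from sending keeps the sketch testable offline and lets a password-protected frontend decide when, and with whose credentials, the upload actually happens.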
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Antanas Uršulis @ 2013-04-30 10:13 UTC
To: gentoo-soc

Thanks for the input; I'll write up a draft application and mail it here later on.

In the meantime: I haven't done directly related work, but I could point to my bachelor's project [1]. It's very much a work in progress. The work I did in Cambridge last summer is not public yet (really tiny bits of it are here [2]).

[1] https://github.com/aursulis/ciel/tree/shm_blockstore
[2] https://github.com/awm22/NetFPGA-P33/pull/1
* Re: [gentoo-soc] GSoC 2013: Log collector
From: Antanas Uršulis @ 2013-05-02 4:17 UTC
To: gentoo-soc

As promised, I posted a proposal at https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/aursulis/1

Comments appreciated, as always. For whatever reason, Melange got the formatting horribly wrong. I'll try to fix that, as well as write the biography section, in the morning.