From mboxrd@z Thu Jan 1 00:00:00 1970
List-Id: Gentoo Linux mail
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
In-Reply-To: <200902222115.31620.shrdlu@unlimitedmail.org>
References: <5bdc1c8b0902221106h71a8783y698aa209ace59a6@mail.gmail.com> <200902222115.31620.shrdlu@unlimitedmail.org>
Date: Sun, 22 Feb 2009 14:28:43 -0800
Message-ID: <5bdc1c8b0902221428i4f8fd44ev5c0bd249bb157c72@mail.gmail.com>
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
From: Mark Knecht
To: gentoo-user@lists.gentoo.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On Sun, Feb 22, 2009 at 12:15 PM, Etaoin Shrdlu wrote:
> On Sunday 22 February 2009, 20:06, Mark Knecht wrote:
>> Hi,
>> Very off topic other than I'd do this on my Gentoo box prior to
>> using R on my Gentoo box. Please ignore if not of interest.
>>
>> I've got a really big data file in essentially a *.csv format.
>> (comma delimited) I need to scan this file and create a new output
>> file. I'm wondering if there is a reasonably easy command line way of
>> doing this using something like sed or awk which I know nothing about.
>> Thanks in advance.
>>
>> The basic idea goes something like this:
>>
>> 1) The input file might look like the following where some of it is
>> attributes (shown as letters) and other parts are results. (shown as
>> numbers)
>>
>> A,B,C,D,1
>> E,F,G,H,2
>> I,J,K,L,3
>> M,N,O,P,4
>> Q,R,S,T,5
>> U,V,W,X,6
>
> Are the results always in the last field, and only a single field?
> Is the total number of fields per line always fixed?

I don't know that for certain yet but I think the results will not
always be in the last field.

The total number of fields per line is always fixed in a given file
but might change from file to file. If it does I'm willing to do minor
edits (heck - I'll do major edits if I have to!!) to get it working.

>
>> 2) From the above data input file I want to take the attributes from a
>> few preceding lines (say 3 in this example) and write them to the
>> output file along with the result on the last of the 3 lines.
>> The output file might look like this:
>>
>> A,B,C,D,E,F,G,H,I,J,K,L,3
>> E,F,G,H,I,J,K,L,M,N,O,P,4
>> I,J,K,L,M,N,O,P,Q,R,S,T,5
>> M,N,O,P,Q,R,S,T,U,V,W,X,6
>
> Is the number of lines you pick for the operation always 3 or can it
> vary? And, once you choose a number n of lines, should the whole file be
> processed concatenating n lines at a time, and the resulting single line
> be ended with the result of the nth line? In other words, does the
> following hold for the output format:
>
>
>
>
>

The above diagram is correct when the number of lines chosen is 3. I
suspect that I might choose 10 or 15 lines once I get real data and do
some testing but that was harder to show in this email. A good design
for me would be a single variable I could set.

Once a value is chosen I want to process every line in the input file
the same way. I don't use 5 lines sometimes and 10 lines other times.
In a given file it's always the same number of lines.

> ...
>
> With answers to the above questions, it's probably possible to hack
> together a solution.

Thanks!

- Mark
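
P.S. For reference, the windowing described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not a solution anyone in the thread proposed: the function name window_rows and the parameters n (the single variable for the number of lines) and result_col (which field holds the result, last by default) are made up for this example.

```python
import csv
import io
from collections import deque


def window_rows(rows, n=3, result_col=-1):
    """Yield one output row for every n consecutive input rows.

    Each output row holds the attribute fields of the n rows, in order,
    followed by the result field of the last row in the window.
    n and result_col are hypothetical knobs, not taken from the thread.
    """
    window = deque(maxlen=n)  # keeps only the most recent n rows
    for row in rows:
        if not row:  # skip blank lines
            continue
        window.append(row)
        if len(window) == n:
            out = []
            for r in window:
                idx = result_col % len(r)  # normalise a negative index
                out.extend(r[:idx] + r[idx + 1:])  # attributes only
            out.append(window[-1][result_col])  # result of the nth line
            yield out


# Sample input copied from the example earlier in this message.
sample = "A,B,C,D,1\nE,F,G,H,2\nI,J,K,L,3\nM,N,O,P,4\nQ,R,S,T,5\nU,V,W,X,6\n"

for out in window_rows(csv.reader(io.StringIO(sample)), n=3):
    print(",".join(out))
```

With n=3 and the sample above, this prints the four output lines shown earlier, starting with A,B,C,D,E,F,G,H,I,J,K,L,3. For a real file, the StringIO object would be replaced by an opened file, and result_col adjusted if the result turns out not to be in the last field.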