From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([69.77.167.62] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-user+bounces-91371-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1LbMoX-0000ql-1F
	for garchives@archives.gentoo.org; Sun, 22 Feb 2009 22:28:45 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id AFE2EE02D8;
	Sun, 22 Feb 2009 22:28:43 +0000 (UTC)
Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.239])
	by pigeon.gentoo.org (Postfix) with ESMTP id 76382E02D8
	for <gentoo-user@lists.gentoo.org>; Sun, 22 Feb 2009 22:28:43 +0000 (UTC)
Received: by rv-out-0506.google.com with SMTP id g9so1550797rvb.2
        for <gentoo-user@lists.gentoo.org>; Sun, 22 Feb 2009 14:28:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:mime-version:received:in-reply-to:references
         :date:message-id:subject:from:to:content-type
         :content-transfer-encoding;
        bh=lF3unT++TWTPq6eVA+bK58nkDJEq0atQMh0SPC0Q3dg=;
        b=dn8LHuAFU9MXzIcGZiSIX/I5civtqlOYfOfjyO9E4hAzKF9dytgSW5PT+zaXWQ81D3
         6YXOWEVVWt8BDRT7/7QAqpMQQi/ZAc8aj7XosX1AOiJt6GrIFYhb7QbaU/jAVuUW7yd+
         EbKi5Qs2jvAuf8ZBr/GEw52TipJANrlY2Il+o=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :content-type:content-transfer-encoding;
        b=rCEgh+4VilSW8cxKeoykz7IstpmiPmjkQ58sg2E11g1eiLOnaEYSROdVH2So+rN+ey
         0Hgjg7y59JPxWD+xf8jPyk9hsezOP/W5WZNc0pYSh+s6niP8AUPc0EDCHvuU99HJwbIL
         TeLi/f2lVoMq+6XzMBDj0OnvasURv1Oe+HjJo=
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Received: by 10.142.238.4 with SMTP id l4mr1632941wfh.339.1235341723051; Sun, 
	22 Feb 2009 14:28:43 -0800 (PST)
In-Reply-To: <200902222115.31620.shrdlu@unlimitedmail.org>
References: <5bdc1c8b0902221106h71a8783y698aa209ace59a6@mail.gmail.com>
	 <200902222115.31620.shrdlu@unlimitedmail.org>
Date: Sun, 22 Feb 2009 14:28:43 -0800
Message-ID: <5bdc1c8b0902221428i4f8fd44ev5c0bd249bb157c72@mail.gmail.com>
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Archives-Salt: 6a80a3c2-46e9-42be-914d-8151843f376a
X-Archives-Hash: 6931647b6eee1dc96f6ccbdeca6d7f59

On Sun, Feb 22, 2009 at 12:15 PM, Etaoin Shrdlu
<shrdlu@unlimitedmail.org> wrote:
> On Sunday 22 February 2009, 20:06, Mark Knecht wrote:
>> Hi,
>>    Very off topic other than I'd do this on my Gentoo box prior to
>> using R on my Gentoo box. Please ignore if not of interest.
>>
>>    I've got a really big data file in essentially a *.csv format.
>> (comma delimited) I need to scan this file and create a new output
>> file. I'm wondering if there is a reasonably easy command line way of
>> doing this using something like sed or awk which I know nothing about.
>> Thanks in advance.
>>
>>    The basic idea goes something like this:
>>
>> 1) The input file might look like the following, where some of it is
>> attributes (shown as letters) and other parts are results. (shown as
>> numbers)
>>
>> A,B,C,D,1
>> E,F,G,H,2
>> I,J,K,L,3
>> M,N,O,P,4
>> Q,R,S,T,5
>> U,V,W,X,6
>
> Are the results always in the last field, and only a single field?
> Is the total number of fields per line always fixed?

I don't know that for certain yet but I think the results will not
always be in the last field.

The total number of fields per line is always fixed in a given file
but might change from file to file. If it does I'm willing to do minor
edits (heck - I'll do major edits if I have to!!) to get it working.

>
>> 2) From the above data input file I want to take the attributes from a
>> few preceding lines (say 3 in this example) and write them to the
>> output file along with the result on the last of the 3 lines. The
>> output file might look like this:
>>
>> A,B,C,D,E,F,G,H,I,J,K,L,3
>> E,F,G,H,I,J,K,L,M,N,O,P,4
>> I,J,K,L,M,N,O,P,Q,R,S,T,5
>> M,N,O,P,Q,R,S,T,U,V,W,X,6
>
> Is the number of lines you pick for the operation always 3 or can it
> vary? And, once you choose a number n of lines, should the whole file be
> processed concatenating n lines at a time, and the resulting single line
> be ended with the result of the nth line? In other words, does the
> following hold for the output format:
>
> <concatenation of attributes of lines 1..n> <result of line n>
> <concatenation of attributes of lines 2..n+1> <result of line n+1>
> <concatenation of attributes of lines 3..n+2> <result of line n+2>
> <concatenation of attributes of lines 4..n+3> <result of line n+3>

The above diagram is correct when the number of lines chosen is 3. I
suspect that I might choose 10 or 15 lines once I get real data and do
some testing, but that was harder to show in this email. A good design for
me would be a single variable I could set. Once a value is chosen I
want to process every line in the input file the same way. I don't use
5 lines sometimes and 10 lines other times. In a given file it's
always the same number of lines.
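
The requirement described above (one setting for the number of lines,
applied the same way to every line of the file) can be sketched as a
sliding window. This is only an illustration in Python rather than the
sed or awk asked about in the thread, and the names WINDOW, RESULT_COL
and combine are made up for the example. It also assumes the result
occupies a single field whose position is the same on every line,
which has not been confirmed yet.

```python
import csv
import io
from collections import deque

WINDOW = 3       # number of consecutive lines to combine (hypothetical name)
RESULT_COL = -1  # position of the result field; -1 = last field (assumption)


def combine(rows, window=WINDOW, result_col=RESULT_COL):
    """Yield one output row for each full window of consecutive input rows.

    Each output row holds the attributes of `window` consecutive rows,
    followed by the result taken from the last row of that window.
    """
    recent = deque(maxlen=window)  # attributes of the most recent rows
    for row in rows:
        attrs = list(row)
        result = attrs.pop(result_col)  # split the result from the attributes
        recent.append(attrs)
        if len(recent) == window:  # emit nothing until the window is full
            out = [field for a in recent for field in a]
            out.append(result)
            yield out


# The sample data from this thread, read from a string instead of a file.
sample = "A,B,C,D,1\nE,F,G,H,2\nI,J,K,L,3\nM,N,O,P,4\nQ,R,S,T,5\nU,V,W,X,6\n"
for out in combine(csv.reader(io.StringIO(sample))):
    print(",".join(out))
```

With WINDOW set to 3 this prints the four output lines shown in the
example earlier in the thread. Changing WINDOW to 10 or 15 is the only
edit needed to widen the window, and RESULT_COL can be changed if the
result turns out not to be the last field.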

> ...
>
> With answers to the above questions, it's probably possible to hack
> together a solution.

Thanks!

- Mark