From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([69.77.167.62] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-user+bounces-91363-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1LbKlE-0006A4-JH
	for garchives@archives.gentoo.org; Sun, 22 Feb 2009 20:17:12 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 8906FE02A7;
	Sun, 22 Feb 2009 20:17:11 +0000 (UTC)
Received: from dcnode-02.unlimitedmail.net (smtp.unlimitedmail.net [94.127.184.242])
	by pigeon.gentoo.org (Postfix) with ESMTP id 39571E02A7
	for <gentoo-user@lists.gentoo.org>; Sun, 22 Feb 2009 20:17:11 +0000 (UTC)
Received: from ppp.zz ([137.204.208.98])
	(authenticated bits=0)
	by dcnode-02.unlimitedmail.net (8.14.3/8.14.3) with ESMTP id n1MKGv31010413
	for <gentoo-user@lists.gentoo.org>; Sun, 22 Feb 2009 21:16:58 +0100
From: Etaoin Shrdlu <shrdlu@unlimitedmail.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Sun, 22 Feb 2009 21:15:31 +0100
User-Agent: KMail/1.9.9
References: <5bdc1c8b0902221106h71a8783y698aa209ace59a6@mail.gmail.com>
In-Reply-To: <5bdc1c8b0902221106h71a8783y698aa209ace59a6@mail.gmail.com>
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200902222115.31620.shrdlu@unlimitedmail.org>
X-UnlimitedMail-MailScanner-From: shrdlu@unlimitedmail.org
X-Spam-Status: No
X-Archives-Salt: 8e720754-c693-419a-af0e-7508a9ddca6e
X-Archives-Hash: 1bd75dc616dc663d01ad12ba0ba55f22

On Sunday 22 February 2009, 20:06, Mark Knecht wrote:
> Hi,
>    Very off topic other than I'd do this on my Gentoo box prior to
> using R on my Gentoo box. Please ignore if not of interest.
>
>    I've got a really big data file in essentially a *.csv format.
> (comma delimited) I need to scan this file and create a new output
> file. I'm wondering if there is a reasonably easy command line way of
> doing this using something like sed or awk which I know nothing about.
> Thanks in advance.
>
>    The basic idea goes something like this:
>
> 1) The input file might look like the following where some of it is
> attributes (shown as letters) and other parts are results. (shown as
> numbers)
>
> A,B,C,D,1
> E,F,G,H,2
> I,J,K,L,3
> M,N,O,P,4
> Q,R,S,T,5
> U,V,W,X,6

Are the results always in the last field, and only a single field?
Is the total number of fields per line always fixed?

> 2) From the above data input file I want to take the attributes from a
> few preceding lines (say 3 in this example) and write them to the
> output file along with the result on the last of the 3 lines. The
> output file might look like this:
>
> A,B,C,D,E,F,G,H,I,J,K,L,3
> E,F,G,H,I,J,K,L,M,N,O,P,4
> I,J,K,L,M,N,O,P,Q,R,S,T,5
> M,N,O,P,Q,R,S,T,U,V,W,X,6

Is the number of lines you pick for the operation always 3, or can it
vary? And, once you choose a number n of lines, should the whole file be
processed by concatenating the attributes of n consecutive lines at a
time, moving forward one line at each step, with each resulting line
ending with the result of the last line in its group? In other words,
does the following hold for the output format:

<concatenation of attributes of lines 1..n> <result of line n>
<concatenation of attributes of lines 2..n+1> <result of line n+1>
<concatenation of attributes of lines 3..n+2> <result of line n+2>
<concatenation of attributes of lines 4..n+3> <result of line n+3>
...

With answers to the above questions, it's probably possible to hack 
together a solution.
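
In the meantime, here is a rough sketch of what such a solution might
look like, assuming the answers are "yes": the result is always the
single last field, and the window slides forward one line at a time.
It is written in Python rather than sed or awk, and the function name
sliding_concat and its parameter n are made up for illustration only.

```python
import csv
import io
from collections import deque


def sliding_concat(lines, n=3):
    """Yield one output row for each window of n consecutive input rows.

    Assumption: the result is the single last field of every row, and
    everything before it is an attribute.
    """
    window = deque(maxlen=n)  # attribute lists of the most recent n rows
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        *attrs, result = row
        window.append(attrs)
        if len(window) == n:
            # Flatten the attributes of the n rows, then append the
            # result taken from the last row in the window.
            merged = [field for a in window for field in a]
            yield merged + [result]


# The sample input from the original message.
sample = """A,B,C,D,1
E,F,G,H,2
I,J,K,L,3
M,N,O,P,4
Q,R,S,T,5
U,V,W,X,6
"""

for out in sliding_concat(io.StringIO(sample), n=3):
    print(",".join(out))
```

For a real file, an open file object can be passed in place of the
io.StringIO wrapper. Only the last n rows are kept in memory, so the
size of the input should not matter much.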