From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([69.77.167.62] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-user+bounces-91416-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1LbjAI-0002re-V8
	for garchives@archives.gentoo.org; Mon, 23 Feb 2009 22:20:43 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id E5ED2E0560;
	Mon, 23 Feb 2009 22:20:22 +0000 (UTC)
Received: from dcnode-01.unlimitedmail.net (smtp.unlimitedmail.net [94.127.184.242])
	by pigeon.gentoo.org (Postfix) with ESMTP id 94D03E0560
	for <gentoo-user@lists.gentoo.org>; Mon, 23 Feb 2009 22:20:22 +0000 (UTC)
Received: from ppp.zz ([137.204.208.98])
	(authenticated bits=0)
	by dcnode-01.unlimitedmail.net (8.14.3/8.14.3) with ESMTP id n1NMK9Wp026468
	for <gentoo-user@lists.gentoo.org>; Mon, 23 Feb 2009 23:20:09 +0100
From: Etaoin Shrdlu <shrdlu@unlimitedmail.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Mon, 23 Feb 2009 23:18:38 +0100
User-Agent: KMail/1.9.9
References: <5bdc1c8b0902221106h71a8783y698aa209ace59a6@mail.gmail.com> <200902231057.56850.shrdlu@unlimitedmail.org> <5bdc1c8b0902230805t575e97deg9c8b9fb271f4296@mail.gmail.com>
In-Reply-To: <5bdc1c8b0902230805t575e97deg9c8b9fb271f4296@mail.gmail.com>
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200902232318.38352.shrdlu@unlimitedmail.org>
X-UnlimitedMail-MailScanner-From: shrdlu@unlimitedmail.org
X-Spam-Status: No
X-Archives-Salt: 2537e8a8-e701-4c2d-bbab-acc57829702d
X-Archives-Hash: 81a74885073e8556c2cee93d8451fd4c

On Monday 23 February 2009, 17:05, Mark Knecht wrote:

> I'm attaching a small (100 line) data file out of TradeStation. Zipped
> it's about 2K. It should expand to about 10K. When I run the command
> to get 10 lines put together it works correctly and gives me a file
> with 91 lines and about 100K in size. (I.e. - 10x on my disk.)
>
> awk -v n=10 -f awkScript1.awk awkDataIn.csv >awkDataOut.csv
>
> No mangling of the first line - that must have been something earlier
> I guess. Sorry for the confusion on that front.
>
> One other item has come up as I start to play with this farther down
> the tool chain. I want to use this data in either R or RapidMiner to
> data mine for patterns. Both of those tools are easier to use if the
> first line in the file has column titles. I had originally asked
> TradeStation not to output the column titles but if I do then for the
> first line of our new file I should actually copy the first line of
> the input file N times. Something like
>
> For i=1; read line, write N times, write \n
>
> and then
>
> for i>=2 do what we're doing right now.

That is actually accomplished just by adding a bit of code:

BEGIN {FS=OFS=","}

NR==1{for(i=1;i<=n;i++){printf "%s%s", sep, $0;sep=OFS};print""} # header
NR>=2{
  r=$NF;NF--
  for(i=1;i<n;i++){
    s[i]=s[i+1]
    dt[i]=dt[i+1]
    if((NR>=n+1)&&(i==1))printf "%s%s",dt[1],OFS
    if(NR>=n+1)printf "%s%s",s[i],OFS
  }
  sep=dt[n]="";for(i=1;i<=dropcol;i++){dt[n]=dt[n] sep $i;sep=OFS}
  sub("^([^,]*,){"dropcol"}","")
  s[n]=$0
  if(NR>=n+1)printf "%s,%s\n", s[n],r
}

Note that no column is dropped from the header. If you need to do that,
just tell us how you want to do that.
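
If you want to check the header rule on its own before running the whole
script, here is a minimal sketch. The function name repeat_header and the
column titles are made up for the example (they are not from your data); the
awk body is the same NR==1 rule as above, just spread over several lines.

```shell
# Hypothetical helper: read stdin, take only the first line, and print it
# n times on a single output line, joined by commas (same as the NR==1 rule).
repeat_header() {
  awk -v n="$1" '
    BEGIN { FS = OFS = "," }
    NR == 1 {
      # sep is empty for the first copy, then a comma for the rest
      for (i = 1; i <= n; i++) { printf "%s%s", sep, $0; sep = OFS }
      print ""
    }'
}

# Made-up column titles, three copies:
printf 'Date,Open,Close\n' | repeat_header 3
# prints: Date,Open,Close,Date,Open,Close,Date,Open,Close
```

As you can see, the whole header line is copied each time, which is why the
header ends up with more columns than the data lines whenever dropcol is
greater than zero.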