From: Etaoin Shrdlu
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Mon, 23 Feb 2009 23:18:38 +0100
Message-Id: <200902232318.38352.shrdlu@unlimitedmail.org>

On Monday 23 February 2009, 17:05, Mark Knecht wrote:
> I'm attaching a small (100 line) data file out of TradeStation. Zipped
> it's about 2K. It should expand to about 10K.
> When I run the command to get 10 lines put together it works
> correctly and gives me a file with 91 lines and about 100K in size.
> (I.e. - 10x on my disk.)
>
> awk -v n=10 -f awkScript1.awk awkDataIn.csv >awkDataOut.csv
>
> No mangling of the first line - that must have been something earlier
> I guess. Sorry for the confusion on that front.
>
> One other item has come up as I start to play with this farther down
> the tool chain. I want to use this data in either R or RapidMiner to
> data mine for patterns. Both of those tools are easier to use if the
> first line in the file has column titles. I had originally asked
> TradeStation not to output the column titles but if I do then for the
> first line of our new file I should actually copy the first line of
> the input file N times. Something like
>
> For i=1; read line, write N times, write \n
>
> and then
>
> for i>=2 do what we're doing right now.

That is actually accomplished just by adding a bit of code:

BEGIN {FS=OFS=","}
NR==1{for(i=1;i<=n;i++){printf "%s%s", sep, $0;sep=OFS};print""}   # header
NR>=2{
  r=$NF;NF--                       # save and remove the last column
  for(i=1;i<n;i++){                # shift the window of previous lines
    s[i]=s[i+1]
    dt[i]=dt[i+1]
    if((NR>=n+1)&&(i==1))printf "%s%s",dt[1],OFS
    if(NR>=n+1)printf "%s%s",s[i],OFS
  }
  sep=dt[n]="";for(i=1;i<=dropcol;i++){dt[n]=dt[n] sep $i;sep=OFS}
  sub("^([^,]*,){"dropcol"}","")   # strip the leading dropcol columns
  s[n]=$0
  if(NR>=n+1)printf "%s,%s\n", s[n],r
}

Note that no column is dropped from the header. If you need to do that,
just tell us how you want to do that.
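To see what the header rule does to the column count, here is a small self-contained demo. It is a sketch under a few assumptions: the function name join_rows and the sample rows are hypothetical, and the awk body follows the script above except that it uses sub() in place of NF-- and of the {n} interval expression, assuming the demo may have to run on an awk other than gawk.

```shell
#!/bin/sh
# Sketch: the header-aware window script, run on tiny hypothetical data.
# join_rows and the sample rows are made up for illustration.
# Portability assumption: sub() replaces NF-- and the {n} interval regex,
# so this should also run on awks other than gawk.

join_rows() {   # usage: join_rows N DROPCOL < input.csv
  awk -v n="$1" -v dropcol="$2" '
    BEGIN { FS = OFS = "," }
    NR == 1 {                                # header: copy it n times
      for (i = 1; i <= n; i++) { printf "%s%s", sep, $0; sep = OFS }
      print ""
    }
    NR >= 2 {
      r = $NF                                # last column, kept once per row
      sub(/,[^,]*$/, "")                     # remove it (instead of NF--)
      for (i = 1; i < n; i++) {              # shift the window by one line
        s[i] = s[i + 1]; dt[i] = dt[i + 1]
        if (NR >= n + 1 && i == 1) printf "%s%s", dt[1], OFS
        if (NR >= n + 1) printf "%s%s", s[i], OFS
      }
      sep = dt[n] = ""                       # save the leading dropcol columns
      for (i = 1; i <= dropcol; i++) { dt[n] = dt[n] sep $i; sep = OFS }
      for (i = 1; i <= dropcol; i++) sub(/^[^,]*,/, "")
      s[n] = $0
      if (NR >= n + 1) printf "%s,%s\n", s[n], r
    }'
}

# Header plus four rows, window of 2 lines, Date column printed once per row.
printf '%s\n' 'Date,Open,Close' 'd1,1,2' 'd2,3,4' 'd3,5,6' 'd4,7,8' |
  join_rows 2 1
# Date,Open,Close,Date,Open,Close
# d1,1,3,4
# d2,3,5,6
# d3,5,7,8
```

The header comes out with 6 columns while each data row has 4 (Date once, Open from each of the 2 lines, then the Close of the newest line), which is the mismatch the note about the header refers to.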