From: Etaoin Shrdlu <shrdlu@unlimitedmail.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Mon, 23 Feb 2009 10:57:56 +0100
Message-ID: <200902231057.56850.shrdlu@unlimitedmail.org>
In-Reply-To: <5bdc1c8b0902221531v5ccd6ee4w40437f0267e623df@mail.gmail.com>
On Monday 23 February 2009, 00:31, Mark Knecht wrote:
> Yeah, that's probably almost usable as it is. I tried it with n=3 and
> n=10. Worked both times just fine. The initial issue might be (as with
> Willie's sed code) that the first line wasn't quite right and required
> some hand editing. I'd prefer not to have to hand edit anything as the
> files are large and that step will be slow. I can work on that.
But then could you paste an example of such a line, so we can see it? The
first line was not special in the sample you posted...
> As per the message to Willie it would be nice to be able to drop
> columns out but technically I suppose it's not really required. All of
> this is going into another program which must at some level understand
> what the columns are. If I have extra dates and don't use them that's
> probably workable.
Anyway, it's not difficult to add that feature:
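    BEGIN { FS = OFS = "," }
    {
        # Save the last field and remove it from the record.
        r = $NF; NF--

        # Shift the window of saved rows up by one, printing as we go
        # once n rows have been seen.
        for (i = 1; i < n; i++) {
            s[i] = s[i+1]
            dt[i] = dt[i+1]
            if ((NR >= n) && (i == 1)) printf "%s%s", dt[1], OFS
            if (NR >= n) printf "%s%s", s[i], OFS
        }

        # Keep the first dropcol columns in dt[n], then strip them from $0.
        sep = dt[n] = ""
        for (i = 1; i <= dropcol; i++) { dt[n] = dt[n] sep $i; sep = OFS }
        sub("^([^,]*,){" dropcol "}", "")

        s[n] = $0
        if (NR >= n) printf "%s,%s\n", s[n], r
    }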
There is a new variable "dropcol" which contains the number of columns to
drop. Also, for the above to work, you must add the --re-interval
command line switch to awk, because the pattern passed to sub() uses an
interval expression ({N}), e.g.
    awk --re-interval -v n=4 -v dropcol=2 -f program.awk datafile.csv
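If your awk does not accept --re-interval, one possible workaround (my
assumption, not part of the program above) is to strip the leading
columns one at a time in a loop, which needs no interval expression.
The sketch below shows only the dropcol step in isolation; the sample
row and the "prefix" variable are made up for illustration.

```shell
# Hypothetical sample row: two date/time columns followed by data.
line='2009-02-20,09:30,1.5,2.5,3.5'

out=$(printf '%s\n' "$line" | awk -v dropcol=2 '
BEGIN { FS = OFS = "," }
{
    # Collect the first dropcol fields, as the dt[n] line does.
    sep = prefix = ""
    for (i = 1; i <= dropcol; i++) { prefix = prefix sep $i; sep = OFS }

    # Strip those fields one at a time instead of using {N} in the regex.
    for (i = 1; i <= dropcol; i++) sub(/^[^,]*,/, "")

    # Show what was kept aside and what remains in the record.
    print prefix " | " $0
}')

echo "$out"
```

With the sample row above this prints "2009-02-20,09:30 | 1.5,2.5,3.5":
the part before the bar is what would go into dt[n], the part after it
is what would go into s[n].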
> The down side is the output file is 10x larger than the input file -
> roughly - and my current input files are 40-60MB so the output files
> will be 600MB. Not huge but if they grew too much more I might get
> beyond what a single file can be on ext3, right? Isn't that 2GB or so?
That is strange, the output file could be bigger but not by that
factor... If you don't mind, could you again paste a sample input file
(just a few lines, to get an idea)?