public inbox for gentoo-user@lists.gentoo.org
From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Sun, 22 Feb 2009 15:31:08 -0800
Message-ID: <5bdc1c8b0902221531v5ccd6ee4w40437f0267e623df@mail.gmail.com>
In-Reply-To: <200902222357.40242.shrdlu@unlimitedmail.org>

On Sun, Feb 22, 2009 at 2:57 PM, Etaoin Shrdlu <shrdlu@unlimitedmail.org> wrote:
> On Sunday 22 February 2009, 23:28, Mark Knecht wrote:
>
>> > <concatenation of attributes of lines 1..n> <result of line n>
>> > <concatenation of attributes of lines 2..n+1> <result of line n+1>
>> > <concatenation of attributes of lines 3..n+2> <result of line n+2>
>> > <concatenation of attributes of lines 4..n+3> <result of line n+3>
>>
>> The above diagram is correct when the number of lines chosen is 3. I suspect
>> that I might choose 10 or 15 lines once I get real data and do some
>> testing but that was harder to show in this email. A good design for
>> me would be a single variable I could set. Once a value is chosen I
>> want to process every line in the input file the same way. I don't use
>> 5 lines sometimes and 10 lines other times. In a given file it's
>> always the same number of lines.
>
> Ok, try this for a start:
>
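> BEGIN { FS=OFS="," }   # read and write comma-separated fields
>
> {
>  r=$NF;NF--             # save the result column, then drop it from $0
>  for(i=1;i<n;i++){
>    s[i]=s[i+1]          # shift the saved lines up by one slot
>    if(NR>=n)printf "%s%s",s[i],OFS
>  }
>  # store the current line; print only once n lines have been seen
>  s[n]=$0;if(NR>=n)printf "%s%s%s\n",s[n],OFS,r
> }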
>
> Save the above code in a file (eg, program.awk) and run it with
>
> awk -v n=3 -f program.awk datafile.csv
>
> where the "n=3" part is to be replaced with the actual number of lines
> you want to group (eg, n=5, n=4, etc.)
>
> With your sample input and n=3, the above awk program produces the output
> you show.
>
>

Yeah, that's probably almost usable as it is. I tried it with n=3 and
n=10, and it worked fine both times. The initial issue might be (as with
Willie's sed code) that the first line wasn't quite right and required
some hand editing. I'd prefer not to hand edit anything, as the files
are large and that step would be slow. I can work on that.
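
For anyone following along, the grouping can be checked on a tiny
made-up file. The sketch below wraps a variant of the quoted program in
a shell function; the name group_lines and the sample data are invented
for illustration. It strips the last field with sub() rather than NF--,
since some awk versions do not rebuild $0 when NF is decremented, and it
assumes no quoted fields containing commas.

```shell
# Hypothetical wrapper: group_lines N reads CSV on stdin and, for every
# window of N consecutive lines, prints the attributes of all N lines
# followed by the result (last field) of the last line in the window.
group_lines() {
  awk -v n="$1" '
    BEGIN { FS = OFS = "," }
    {
      r = $NF                      # result column of the current line
      sub(/,[^,]*$/, "")           # drop the result, keep the attributes
      for (i = 1; i < n; i++) {
        s[i] = s[i+1]              # slide the window up by one line
        if (NR >= n) printf "%s%s", s[i], OFS
      }
      s[n] = $0
      if (NR >= n) printf "%s%s%s\n", s[n], OFS, r
    }'
}

# Made-up sample: two attribute columns and one result column.
printf 'a1,b1,r1\na2,b2,r2\na3,b3,r3\na4,b4,r4\n' | group_lines 3
```

With that sample and n=3 the two output lines are:

a1,b1,a2,b2,a3,b3,r3
a2,b2,a3,b3,a4,b4,r4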

As per the message to Willie, it would be nice to be able to drop
columns, but technically I suppose it's not really required. All of
this is going into another program which must at some level understand
what the columns are. If I have extra dates and don't use them, that's
probably workable.
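
If dropping columns does turn out to be worthwhile, one possible
approach is a separate filter placed in front of the grouping step. This
is only a sketch: the function name drop_cols and its argument format (a
comma-separated list of 1-based column numbers) are invented here, and
it assumes plain CSV with no quoted fields containing commas.

```shell
# Hypothetical filter: drop_cols LIST removes the columns named in LIST
# (for example "1,3") from comma-separated input on stdin.
drop_cols() {
  awk -v drop="$1" '
    BEGIN {
      FS = OFS = ","
      k = split(drop, d, ",")      # column numbers to discard
      for (j = 1; j <= k; j++) skip[d[j]] = 1
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (i in skip) continue    # leave this column out
        out = out sep $i
        sep = OFS
      }
      print out
    }'
}
```

It could then be chained with the grouping program, for example
`drop_cols 1,3 < datafile.csv | awk -v n=3 -f program.awk`, as long as
the result column is not among the dropped ones and stays last.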

The downside is that the output file is roughly 10x larger than the
input file, and my current input files are 40-60MB, so the output files
will be 400-600MB. Not huge, but if they grew much more I might get
beyond what a single file can be on ext3, right? Isn't that 2GB or so?

Thanks very much,
Mark


