From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Sun, 22 Feb 2009 11:06:31 -0800
Message-ID: <5bdc1c8b0902221106h71a8783y698aa209ace59a6@mail.gmail.com>

Hi,
   This is very off topic, other than that I'd do it on my Gentoo box
before using R on the same box. Please ignore if not of interest.

   I've got a really big data file in essentially *.csv (comma-delimited)
format. I need to scan this file and create a new output file. I'm
wondering if there is a reasonably easy command-line way of doing this
using something like sed or awk, neither of which I know anything about.
Thanks in advance.

   The basic idea goes something like this:

1) The input file might look like the following, where some of it is
attributes (shown as letters) and other parts are results (shown as
numbers):

A,B,C,D,1
E,F,G,H,2
I,J,K,L,3
M,N,O,P,4
Q,R,S,T,5
U,V,W,X,6

2) From the above input file I want to take the attributes from a
few preceding lines (say 3 in this example) and write them to the
output file along with the result from the last of the 3 lines. The
output file might look like this:

A,B,C,D,E,F,G,H,I,J,K,L,3
E,F,G,H,I,J,K,L,M,N,O,P,4
I,J,K,L,M,N,O,P,Q,R,S,T,5
M,N,O,P,Q,R,S,T,U,V,W,X,6

3) This must be done as a read/process/write operation of some sort
because the input file may be far larger than system memory.
(Currently it isn't, but it likely will eventually be.)

4) In my example above I suggested that there is a single result, but
there may be more than one. (Don't know yet.) I showed 3 lines but
might be doing 10. I don't know. It's important to me to pick a
moderately flexible way of dealing with this, as the order of columns
and number of results will likely change over time and I'll certainly
need to adjust.
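
The read/process/write idea in points 1) to 4) can be sketched in
Python (instead of sed or awk) as a sliding window over the input rows.
This is only a minimal sketch under some assumptions not stated above:
the results are the trailing columns of each line, there is no header
row, and fields contain no tricky quoting beyond what the csv module
handles. The names window_rows, convert, window and n_results are
made up for illustration. Only the last `window` rows are held in
memory at any time, so the input can be larger than RAM.

```python
import csv
from collections import deque


def window_rows(rows, window=3, n_results=1):
    """Yield one output row for each full window of consecutive rows.

    Assumption: the last `n_results` fields of every row are results,
    and everything before them is attributes.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` rows
    for row in rows:
        if not row:
            continue  # ignore blank lines
        recent.append(row)
        if len(recent) == window:
            # Attributes from every row in the window, oldest first ...
            attrs = [f for r in recent for f in r[:len(r) - n_results]]
            # ... followed by the result field(s) of the newest row only.
            yield attrs + row[len(row) - n_results:]


def convert(in_path, out_path, window=3, n_results=1):
    """Stream in_path to out_path one row at a time."""
    with open(in_path, newline="") as src, \
         open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for out_row in window_rows(csv.reader(src), window, n_results):
            writer.writerow(out_row)
```

Calling convert("in.csv", "out.csv", window=3, n_results=1) on the
six-line sample above should give the four-line output shown in 2);
the file names here are placeholders. Changing window to 10, or
n_results to 2, covers the variations mentioned in 4) as long as the
results stay in the last columns. A different column order would need
the slicing in window_rows adjusted.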

   Thanks in advance for any pointers. Happy to buy a good book if
someone knows what I should look for.

Cheers,
Mark


