From: Mark Knecht <markknecht@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] [OT] - command line read *.csv & create new file
Date: Sun, 22 Feb 2009 14:28:43 -0800
Message-ID: <5bdc1c8b0902221428i4f8fd44ev5c0bd249bb157c72@mail.gmail.com>
In-Reply-To: <200902222115.31620.shrdlu@unlimitedmail.org>
On Sun, Feb 22, 2009 at 12:15 PM, Etaoin Shrdlu
<shrdlu@unlimitedmail.org> wrote:
> On Sunday 22 February 2009, 20:06, Mark Knecht wrote:
>> Hi,
>> Very off topic other than I'd do this on my Gentoo box prior to
>> using R on my Gentoo box. Please ignore if not of interest.
>>
>> I've got a really big data file in essentially *.csv (comma-delimited)
>> format. I need to scan this file and create a new output
>> file. I'm wondering if there is a reasonably easy command-line way of
>> doing this using something like sed or awk, which I know nothing about.
>> Thanks in advance.
>>
>> The basic idea goes something like this:
>>
>> 1) The input file might look like the following, where some of it is
>> attributes (shown as letters) and other parts are results (shown as
>> numbers).
>>
>> A,B,C,D,1
>> E,F,G,H,2
>> I,J,K,L,3
>> M,N,O,P,4
>> Q,R,S,T,5
>> U,V,W,X,6
>
> Are the results always in the last field, and only a single field?
> Is the total number of fields per line always fixed?
I don't know that for certain yet but I think the results will not
always be in the last field.
The total number of fields per line is always fixed in a given file
but might change from file to file. If it does I'm willing to do minor
edits (heck - I'll do major edits if I have to!!) to get it working.
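
Since everything below depends on the field count really being fixed
within a file, here is a minimal sketch of how I could check that before
doing anything else. It is Python rather than sed/awk, and the name
field_count is purely illustrative; it assumes plain comma-separated
lines with no header row.

```python
import csv
import io


def field_count(lines):
    """Return the field count shared by all rows; raise ValueError on a mismatch."""
    expected = None
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        if expected is None:
            expected = len(row)  # the first non-blank row sets the expectation
        elif len(row) != expected:
            raise ValueError(
                f"line {lineno}: {len(row)} fields, expected {expected}"
            )
    return expected


# Small in-memory sample taken from the example input above.
sample = io.StringIO("A,B,C,D,1\nE,F,G,H,2\nI,J,K,L,3\n")
print(field_count(sample))  # prints 5
```

For a real file, the same function could be given an open file object
(opened with newline="") in place of the StringIO sample.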
>
>> 2) From the above data input file I want to take the attributes from a
>> few preceding lines (say 3 in this example) and write them to the
>> output file along with the result on the last of the 3 lines. The
>> output file might look like this:
>>
>> A,B,C,D,E,F,G,H,I,J,K,L,3
>> E,F,G,H,I,J,K,L,M,N,O,P,4
>> I,J,K,L,M,N,O,P,Q,R,S,T,5
>> M,N,O,P,Q,R,S,T,U,V,W,X,6
>
> Is the number of lines you pick for the operation always 3 or can it
> vary? And, once you choose a number n of lines, should the whole file be
> processed concatenating n lines at a time, and the resulting single line
> be ended with the result of the nth line? In other words, does the
> following hold for the output format:
>
> <concatenation of attributes of lines 1..n> <result of line n>
> <concatenation of attributes of lines 2..n+1> <result of line n+1>
> <concatenation of attributes of lines 3..n+2> <result of line n+2>
> <concatenation of attributes of lines 4..n+3> <result of line n+3>
The above diagram is correct when the number of lines chosen is 3. I
suspect that I might choose 10 or 15 lines once I get real data and do
some testing, but that was harder to show in this email. A good design
for me would be a single variable I could set. Once a value is chosen, I
want to process every line in the input file the same way. I don't use
5 lines sometimes and 10 lines other times; in a given file it's
always the same number of lines.
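
To make the requirement concrete, here is a rough sketch of the sliding
window I have in mind, written in Python rather than sed/awk. The names
WINDOW, RESULT_COL and sliding_rows are hypothetical. It assumes every
row has the same number of fields and exactly one result field, whose
position is set by RESULT_COL (the last field by default).

```python
import csv
from collections import deque

WINDOW = 3       # the single variable: how many consecutive lines to merge
RESULT_COL = -1  # index of the result field; -1 means the last field (assumption)


def sliding_rows(rows, window=WINDOW, result_col=RESULT_COL):
    """Yield the attributes of the last `window` rows plus the newest row's result."""
    recent = deque(maxlen=window)  # the oldest row drops out automatically
    for row in rows:
        if not row:
            continue  # skip blank lines
        fields = list(row)
        result = fields.pop(result_col)  # separate the result from the attributes
        recent.append(fields)
        if len(recent) == window:
            merged = [field for attrs in recent for field in attrs]
            yield merged + [result]


# The six-line example input from my first mail.
sample = [
    "A,B,C,D,1",
    "E,F,G,H,2",
    "I,J,K,L,3",
    "M,N,O,P,4",
    "Q,R,S,T,5",
    "U,V,W,X,6",
]
for out in sliding_rows(csv.reader(sample)):
    print(",".join(out))
```

With WINDOW = 3 this prints the four output lines shown in my example,
starting with A,B,C,D,E,F,G,H,I,J,K,L,3. Going to 10 or 15 lines would
only mean changing WINDOW, and a result that is not in the last field
would only mean changing RESULT_COL.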
> ...
>
> With answers to the above questions, it's probably possible to hack
> together a solution.
Thanks!
- Mark