From: Mark Knecht
To: gentoo-user@lists.gentoo.org
Date: Sun, 22 Feb 2009 11:06:31 -0800
Subject: [gentoo-user] [OT] - command line read *.csv & create new file
Hi,

Very off topic, other than that I'd do this on my Gentoo box before using R on my Gentoo box. Please ignore if not of interest.

I've got a really big data file in essentially *.csv (comma-delimited) format. I need to scan this file and create a new output file. I'm wondering if there is a reasonably easy command-line way of doing this using something like sed or awk, which I know nothing about.

The basic idea goes something like this:

1) The input file might look like the following, where some fields are attributes (shown as letters) and others are results (shown as numbers):

A,B,C,D,1
E,F,G,H,2
I,J,K,L,3
M,N,O,P,4
Q,R,S,T,5
U,V,W,X,6

2) From the input file above I want to take the attributes from a few preceding lines (say 3 in this example) and write them to the output file, along with the result from the last of those 3 lines. The output file might look like this:

A,B,C,D,E,F,G,H,I,J,K,L,3
E,F,G,H,I,J,K,L,M,N,O,P,4
I,J,K,L,M,N,O,P,Q,R,S,T,5
M,N,O,P,Q,R,S,T,U,V,W,X,6

3) This must be done as a read/process/write operation of some sort, because the input file may be far larger than system memory. (Currently it isn't, but it likely will be eventually.)

4) In my example above I showed a single result, but there may be more than one. (Don't know yet.) I showed 3 lines but might be doing 10. I don't know. It's important to me to pick a moderately flexible way of dealing with this, as the order of columns and the number of results will likely change over time and I'll certainly need to adjust.

Thanks in advance for any pointers. Happy to buy a good book if someone knows what I should look for.

Cheers,
Mark
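For reference, the sliding-window transformation described in points 1 to 4 can be sketched in Python instead of sed/awk (a swapped-in tool, not something proposed in the original message). This is a minimal sketch, assuming the result columns are always the last `n_results` fields of each line and everything before them is an attribute; the script name `window_csv.py` and the parameter names `window` and `n_results` are hypothetical. Input is read one row at a time, and a `collections.deque` with `maxlen=window` keeps only the most recent `window` rows, so memory use does not grow with file size.

```python
import csv
import sys
from collections import deque
from typing import Deque, Iterable, Iterator, List


def window_rows(rows: Iterable[List[str]], window: int = 3,
                n_results: int = 1) -> Iterator[List[str]]:
    """Yield one output row for each input row once `window` rows are buffered.

    An output row is the attribute columns of the last `window` rows (oldest
    first) followed by the result columns of the newest row.
    ASSUMPTION: the results are the last `n_results` columns of every row.
    """
    if window < 1 or n_results < 1:
        raise ValueError("window and n_results must both be >= 1")
    # maxlen makes the deque drop its oldest row automatically, so at most
    # `window` rows are ever held in memory, however large the input is.
    buf: Deque[List[str]] = deque(maxlen=window)
    for row in rows:
        if not row:
            continue  # skip blank lines
        buf.append(row)
        if len(buf) == window:
            out: List[str] = []
            for r in buf:
                out.extend(r[:-n_results])    # attributes of each buffered row
            out.extend(buf[-1][-n_results:])  # results of the newest row only
            yield out


def main(in_path: str, out_path: str, window: int = 3, n_results: int = 1) -> None:
    # Stream input to output: csv.reader yields rows lazily and writerows
    # consumes the generator one row at a time.
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        writer.writerows(window_rows(csv.reader(fin), window, n_results))


if __name__ == "__main__":
    # Hypothetical invocation: python window_csv.py IN.csv OUT.csv [WINDOW] [N_RESULTS]
    if len(sys.argv) < 3:
        sys.stderr.write("usage: window_csv.py IN.csv OUT.csv [WINDOW] [N_RESULTS]\n")
    else:
        main(sys.argv[1], sys.argv[2], *(int(a) for a in sys.argv[3:5]))
```

With the sample input from point 1, `python window_csv.py in.csv out.csv 3 1` should write the four lines shown in point 2; the first `window - 1` input lines produce no output on their own. Going to 10 lines or to several result columns only means changing the two trailing arguments. If the column order changes so that results are no longer at the end of each line, the two `extend` lines in `window_rows` are where the slicing would need adjusting.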