public inbox for gentoo-user@lists.gentoo.org
 help / color / mirror / Atom feed
From: Walter Dnes <waltdnes@waltdnes.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?
Date: Fri, 22 Mar 2024 00:20:08 -0400	[thread overview]
Message-ID: <MTAwMDAzNC53ZG5lc2RheQ.1711081203@quikprotect> (raw)
In-Reply-To: <ZfyubWIBR8XKnKhA@tp>

On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote

> Why not make the alteration one step before -- in the CSV?  There are
> CSV abstraction tools like `q`, which gives you a SQL-like interface
> to a csv file. Or you could write a quick transformer in python,
> if you know the language a bit.

  I wrote a quick transformer in bash, because I know the language a
bit <G>.  There are six "hospital regions" in Ontario.  The CSV file
does all data for one "hospital region" April 1, 2020 to present,
followed by same for next hospital region, etc, etc, for all 6 regions.
My first step is to run dos2unix and sort by date, ending up with 6
consecutive lines per day, sorted by date.  Then a "while" loop reads
through the input CSV file.  Accumulators are added up for all 6 lines
and a line is written out for each date.  Because I'm doing bash
arithmetic, the data *MUST* be valid numbers, not ".".  So I *MUST* set
missing data to somthing like zero.  Adding "." to a number causes bash
to error out with a CSV file like so...

"2023-09-05","CENTRAL",5,2,152,6,3,1,1
"2023-09-06","CENTRAL",5,2,136,6,3,1,1
"2023-09-07","CENTRAL",7,1,158,8,2,1,1
"2023-09-08","CENTRAL",7,3,154,6,3,-1,0
"2023-09-09","CENTRAL",".",".",169,".",".",".","."
"2023-09-10","CENTRAL",".",".",169,".",".",".","."
"2023-09-11","CENTRAL",".",".",155,".",".",".","."
"2023-09-12","CENTRAL",".",".",147,".",".",".","."

  The magic incantation for CSV files is to set IFS to a comma like so...

oldifs="${IFS}"
IFS=","

  This allows parsing CSV files like so...

while read
do
   dataarray=(${REPLY})

...and, kaboom, you have a fully populated array from reading one line
of a CSV file.  The following "transformer" is my "parsehospicu" script
that summarizes the data to "hospsum.csv".  Note: I've deleted the
leading "shebang slash bin slash bash" line because the Gentoo mailing
list software doesn't seem to like "executable emails".

  I suppose I could have two different versions of the...

echo "${prevdate}...

...line inside of an if/then/else/fi construct.  It would depend on the
date being inside the "missing data range".

=======================================================================

# Strip out missing "." that screw up the script
sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv
dos2unix -n region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv
#
# tail skips headers at beginning of file
# sed deletes Row_ID, and strips out quotes
# Output goes to file /dev/shm/temp0.txt
tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > /dev/shm/temp0.txt
#
## Set up IFS for easier parsing
oldifs="${IFS}"
IFS=","
#
# Initialize previous line's date to enter loop smoothly
# expando to read first line
dataline=$( head -1 /dev/shm/temp0.txt )
dataarray=(${dataline})
prevdate="${dataarray[0]}"
#
# Zero out accumulators to enter loop smoothly
accum2=0
accum3=0
accum4=0
accum5=0
accum6=0
accum7=0
accum8=0
#
# Remove previous hospsum.csv and open a new one for writing
rm -rf hospsum.csv
exec 3>hospsum.csv
#
# Write header line to output file
echo "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3
#
# Main loop
# Read the data from one line in /dev/shm/temp0.txt
while read
do
   dataarray=(${REPLY})
   if [ "${dataarray[0]}" = "${prevdate}" ]; then
#
# If this line's date is same as previous line's date, add amounts to accumulators.
      accum2=$(( ${accum2} + ${dataarray[2]} ))
      accum3=$(( ${accum3} + ${dataarray[3]} ))
      accum4=$(( ${accum4} + ${dataarray[4]} ))
      accum5=$(( ${accum5} + ${dataarray[5]} ))
      accum6=$(( ${accum6} + ${dataarray[6]} ))
      accum7=$(( ${accum7} + ${dataarray[7]} ))
      accum8=$(( ${accum8} + ${dataarray[8]} ))
   else
#
# If this line's date has changed, output to hospsum.csv, update prevdate,
# and update accumulators.  ***IMPORTANT*** "echo" TO hospsum.csv MUST BE
# EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate***
      echo "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}" >&3
      prevdate="${dataarray[0]}"
      accum2=${dataarray[2]}
      accum3=${dataarray[3]}
      accum4=${dataarray[4]}
      accum5=${dataarray[5]}
      accum6=${dataarray[6]}
      accum7=${dataarray[7]}
      accum8=${dataarray[8]}
   fi
done</dev/shm/temp0.txt
#
# Close file hospsum.csv
exec 3<&-
IFS="${oldifs}"


-- 
Roses are red
Roses are blue
Depending on their velocity
Relative to you


  reply	other threads:[~2024-03-22  4:20 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-21 21:46 [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? Walter Dnes
2024-03-21 22:02 ` Frank Steinmetzger
2024-03-22  4:20   ` Walter Dnes [this message]
2024-03-22  6:52     ` Paul Colquhoun
2024-03-22 18:33       ` Walter Dnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MTAwMDAzNC53ZG5lc2RheQ.1711081203@quikprotect \
    --to=waltdnes@waltdnes.org \
    --cc=gentoo-user@lists.gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox