* [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? @ 2024-03-21 21:46 Walter Dnes 2024-03-21 22:02 ` Frank Steinmetzger 0 siblings, 1 reply; 5+ messages in thread From: Walter Dnes @ 2024-03-21 21:46 UTC (permalink / raw To: Gentoo Users List The province of Ontario does weekly Covid data updates which I summarize and post on the DSLReports Canchat subforum, e.g. https://www.dslreports.com/forum/r33854514-#google_vignette Note the data gap in the pink and brown lines on the 3rd and 4th graphs. That's actual missing data. In the underlying spreadsheet page those gaps initially show up as zeros. I manually blank out region B1258:C1299 (i.e. 2023/09/09 to 2023/10/20) every week when I update so that it doesn't show up as zero hospitalizations. How do I set up and execute a macro to to zap the contents of region B1258:C1299 on a page? I've recently solved another problem with the COVID data that I mentioned in the past. The data files could be downloaded by a browser but "wget" was being rejected. The data keepers seem to be filtering based on the default "wget" user agent. I've bypassed that by faking a valid browser user agent... any valid browser user agent. #!/bin/bash rm -rf conposcovidloc.csv wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0" https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/455fd63b-603d-4608-8216-7d8647f43350/download/conposcovidloc.csv #!/bin/bash rm -rf region_hospital_icu_covid_data.csv wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0" https://data.ontario.ca/dataset/8f3a449b-bde5-4631-ada6-8bd94dbc7d15/resource/e760480e-1f95-4634-a923-98161cfb02fa/download/region_hospital_icu_covid_data.csv Note that I have to first remove the previous week's file, because wget won't overwrite it, and skips the download altogether. -- Roses are red Roses are blue Depending on their velocity Relative to you ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? 2024-03-21 21:46 [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? Walter Dnes @ 2024-03-21 22:02 ` Frank Steinmetzger 2024-03-22 4:20 ` Walter Dnes 0 siblings, 1 reply; 5+ messages in thread From: Frank Steinmetzger @ 2024-03-21 22:02 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 1936 bytes --] Am Thu, Mar 21, 2024 at 05:46:31PM -0400 schrieb Walter Dnes: > The province of Ontario does weekly Covid data updates which I > summarize and post on the DSLReports Canchat subforum, e.g. > https://www.dslreports.com/forum/r33854514-#google_vignette Note the > data gap in the pink and brown lines on the 3rd and 4th graphs. That's > actual missing data. In the underlying spreadsheet page those gaps > initially show up as zeros. I manually blank out region B1258:C1299 > (i.e. 2023/09/09 to 2023/10/20) every week when I update so that it > doesn't show up as zero hospitalizations. How do I set up and execute a > macro to to zap the contents of region B1258:C1299 on a page? Why not make the alteration one step before -- in the CSV? There are CSV abstraction tools like `q`, which gives you a SQL-like interface to a csv file. Or you could write a quick transformer in python, if you know the language a bit. Pseudo code, as I haven’t worked with csv in Python in a looong time: import csv with csv.open("input file", 'r') as A: with csv.open("output file", 'w') as B: for rownum, row in enumerate(A): if rownum >= 1258 or rownum <= 1299: # write a modified row which has columns B and C blanked B.write( [row[0] + ['', ''] + row[3:] ) else: B.write(row) > Note that I have to first remove the previous week's file, because wget > won't overwrite it, and skips the download altogether. Maybe remove the -r from rm, just to peace of mind. Also, such minimalist scripts that don’t use bash features can be sh scripts instead. This increases performance, as sh loads faster than bash. ;-) -- Grüße | Greetings | Salut | Qapla’ Please do not share anything from, with or about me on any social network. It is not enough to have no opinion, you must also be unable to express it. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? 2024-03-21 22:02 ` Frank Steinmetzger @ 2024-03-22 4:20 ` Walter Dnes 2024-03-22 6:52 ` Paul Colquhoun 0 siblings, 1 reply; 5+ messages in thread From: Walter Dnes @ 2024-03-22 4:20 UTC (permalink / raw To: gentoo-user On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote > Why not make the alteration one step before -- in the CSV? There are > CSV abstraction tools like `q`, which gives you a SQL-like interface > to a csv file. Or you could write a quick transformer in python, > if you know the language a bit. I wrote a quick transformer in bash, because I know the language a bit <G>. There are six "hospital regions" in Ontario. The CSV file does all data for one "hospital region" April 1, 2020 to present, followed by same for next hospital region, etc, etc, for all 6 regions. My first step is to run dos2unix and sort by date, ending up with 6 consecutive lines per day, sorted by date. Then a "while" loop reads through the input CSV file. Accumulators are added up for all 6 lines and a line is written out for each date. Because I'm doing bash arithmetic, the data *MUST* be valid numbers, not ".". So I *MUST* set missing data to somthing like zero. Adding "." to a number causes bash to error out with a CSV file like so... "2023-09-05","CENTRAL",5,2,152,6,3,1,1 "2023-09-06","CENTRAL",5,2,136,6,3,1,1 "2023-09-07","CENTRAL",7,1,158,8,2,1,1 "2023-09-08","CENTRAL",7,3,154,6,3,-1,0 "2023-09-09","CENTRAL",".",".",169,".",".",".","." "2023-09-10","CENTRAL",".",".",169,".",".",".","." "2023-09-11","CENTRAL",".",".",155,".",".",".","." "2023-09-12","CENTRAL",".",".",147,".",".",".","." The magic incantation for CSV files is to set IFS to a comma like so... oldifs="${IFS}" IFS="," This allows parsing CSV files like so... while read do dataarray=(${REPLY}) ...and, kaboom, you have a fully populated array from reading one line of a CSV file. The following "transformer" is my "parsehospicu" script that summarizes the data to "hospsum.csv". Note: I've deleted the leading "shebang slash bin slash bash" line because the Gentoo mailing list software doesn't seem to like "executable emails". I suppose I could have two different versions of the... echo "${prevdate}... ...line inside of an if/then/else/fi construct. It would depend on the date being inside the "missing data range". ======================================================================= # Strip out missing "." that screw up the script sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv dos2unix -n region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv # # tail skips headers at beginning of file # sed deletes Row_ID, and strips out quotes # Output goes to file /dev/shm/temp0.txt tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > /dev/shm/temp0.txt # ## Set up IFS for easier parsing oldifs="${IFS}" IFS="," # # Initialize previous line's date to enter loop smoothly # expando to read first line dataline=$( head -1 /dev/shm/temp0.txt ) dataarray=(${dataline}) prevdate="${dataarray[0]}" # # Zero out accumulators to enter loop smoothly accum2=0 accum3=0 accum4=0 accum5=0 accum6=0 accum7=0 accum8=0 # # Remove previous hospsum.csv and open a new one for writing rm -rf hospsum.csv exec 3>hospsum.csv # # Write header line to output file echo "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 # # Main loop # Read the data from one line in /dev/shm/temp0.txt while read do dataarray=(${REPLY}) if [ "${dataarray[0]}" = "${prevdate}" ]; then # # If this line's date is same as previous line's date, add amounts to accumulators. accum2=$(( ${accum2} + ${dataarray[2]} )) accum3=$(( ${accum3} + ${dataarray[3]} )) accum4=$(( ${accum4} + ${dataarray[4]} )) accum5=$(( ${accum5} + ${dataarray[5]} )) accum6=$(( ${accum6} + ${dataarray[6]} )) accum7=$(( ${accum7} + ${dataarray[7]} )) accum8=$(( ${accum8} + ${dataarray[8]} )) else # # If this line's date has changed, output to hospsum.csv, update prevdate, # and update accumulators. ***IMPORTANT*** "echo" TO hospsum.csv MUST BE # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate*** echo "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}" >&3 prevdate="${dataarray[0]}" accum2=${dataarray[2]} accum3=${dataarray[3]} accum4=${dataarray[4]} accum5=${dataarray[5]} accum6=${dataarray[6]} accum7=${dataarray[7]} accum8=${dataarray[8]} fi done</dev/shm/temp0.txt # # Close file hospsum.csv exec 3<&- IFS="${oldifs}" -- Roses are red Roses are blue Depending on their velocity Relative to you ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? 2024-03-22 4:20 ` Walter Dnes @ 2024-03-22 6:52 ` Paul Colquhoun 2024-03-22 18:33 ` Walter Dnes 0 siblings, 1 reply; 5+ messages in thread From: Paul Colquhoun @ 2024-03-22 6:52 UTC (permalink / raw To: gentoo-user On Friday, March 22, 2024 3:20:08 P.M. AEDT Walter Dnes wrote: > On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote > > > Why not make the alteration one step before -- in the CSV? There are > > CSV abstraction tools like `q`, which gives you a SQL-like interface > > to a csv file. Or you could write a quick transformer in python, > > if you know the language a bit. > > I wrote a quick transformer in bash, because I know the language a > bit <G>. There are six "hospital regions" in Ontario. The CSV file > does all data for one "hospital region" April 1, 2020 to present, > followed by same for next hospital region, etc, etc, for all 6 regions. > My first step is to run dos2unix and sort by date, ending up with 6 > consecutive lines per day, sorted by date. Then a "while" loop reads > through the input CSV file. Accumulators are added up for all 6 lines > and a line is written out for each date. Because I'm doing bash > arithmetic, the data *MUST* be valid numbers, not ".". So I *MUST* set > missing data to somthing like zero. Adding "." to a number causes bash > to error out with a CSV file like so... > > "2023-09-05","CENTRAL",5,2,152,6,3,1,1 > "2023-09-06","CENTRAL",5,2,136,6,3,1,1 > "2023-09-07","CENTRAL",7,1,158,8,2,1,1 > "2023-09-08","CENTRAL",7,3,154,6,3,-1,0 > "2023-09-09","CENTRAL",".",".",169,".",".",".","." > "2023-09-10","CENTRAL",".",".",169,".",".",".","." > "2023-09-11","CENTRAL",".",".",155,".",".",".","." > "2023-09-12","CENTRAL",".",".",147,".",".",".","." > > The magic incantation for CSV files is to set IFS to a comma like so... > > oldifs="${IFS}" > IFS="," > > This allows parsing CSV files like so... > > while read > do > dataarray=(${REPLY}) > > ...and, kaboom, you have a fully populated array from reading one line > of a CSV file. The following "transformer" is my "parsehospicu" script > that summarizes the data to "hospsum.csv". Note: I've deleted the > leading "shebang slash bin slash bash" line because the Gentoo mailing > list software doesn't seem to like "executable emails". > > I suppose I could have two different versions of the... > > echo "${prevdate}... > > ...line inside of an if/then/else/fi construct. It would depend on the > date being inside the "missing data range". > > ======================================================================= > > # Strip out missing "." that screw up the script > sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > > region_hospital_icu_covid_datax.csv dos2unix -n > region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv # > # tail skips headers at beginning of file > # sed deletes Row_ID, and strips out quotes > # Output goes to file /dev/shm/temp0.txt > tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > > /dev/shm/temp0.txt # > ## Set up IFS for easier parsing > oldifs="${IFS}" > IFS="," > # > # Initialize previous line's date to enter loop smoothly > # expando to read first line > dataline=$( head -1 /dev/shm/temp0.txt ) > dataarray=(${dataline}) > prevdate="${dataarray[0]}" > # > # Zero out accumulators to enter loop smoothly > accum2=0 > accum3=0 > accum4=0 > accum5=0 > accum6=0 > accum7=0 > accum8=0 > # > # Remove previous hospsum.csv and open a new one for writing > rm -rf hospsum.csv > exec 3>hospsum.csv > # > # Write header line to output file > echo > "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_ > total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 # > # Main loop > # Read the data from one line in /dev/shm/temp0.txt > while read > do > dataarray=(${REPLY}) > if [ "${dataarray[0]}" = "${prevdate}" ]; then > # > # If this line's date is same as previous line's date, add amounts to > accumulators. accum2=$(( ${accum2} + ${dataarray[2]} )) > accum3=$(( ${accum3} + ${dataarray[3]} )) > accum4=$(( ${accum4} + ${dataarray[4]} )) > accum5=$(( ${accum5} + ${dataarray[5]} )) > accum6=$(( ${accum6} + ${dataarray[6]} )) > accum7=$(( ${accum7} + ${dataarray[7]} )) > accum8=$(( ${accum8} + ${dataarray[8]} )) > else > # > # If this line's date has changed, output to hospsum.csv, update prevdate, > # and update accumulators. ***IMPORTANT*** "echo" TO hospsum.csv MUST BE > # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate*** > echo > "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${ > accum8}" >&3 prevdate="${dataarray[0]}" > accum2=${dataarray[2]} > accum3=${dataarray[3]} > accum4=${dataarray[4]} > accum5=${dataarray[5]} > accum6=${dataarray[6]} > accum7=${dataarray[7]} > accum8=${dataarray[8]} > fi > done</dev/shm/temp0.txt > # > # Close file hospsum.csv > exec 3<&- > IFS="${oldifs}" Bash can do patern substitution in variable references. Replace accum3=$(( ${accum3} + ${dataarray[3]} )) with accum3=$(( ${accum3} + ${dataarray[3]/'.'/0} )) and similarly with the other lines and any array value of '.' will be replaced with a '0' -- Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/ Asking for technical help in newsgroups? Read this first: http://catb.org/~esr/faqs/smart-questions.html#intro ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? 2024-03-22 6:52 ` Paul Colquhoun @ 2024-03-22 18:33 ` Walter Dnes 0 siblings, 0 replies; 5+ messages in thread From: Walter Dnes @ 2024-03-22 18:33 UTC (permalink / raw To: gentoo-user On Fri, Mar 22, 2024 at 05:52:04PM +1100, Paul Colquhoun wrote > Bash can do patern substitution in variable references. > > Replace accum3=$(( ${accum3} + ${dataarray[3]} )) > with accum3=$(( ${accum3} + ${dataarray[3]/'.'/0} )) > > and similarly with the other lines and any array value of '.' will > be replaced with a '0' Replacing "." with a zero is not the problem. I already do this at the top of the script with "sed"... sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv The problem is that a certain range of valid-looking zero sums in the summary output is actually invalid and has to be zapped. I was originally hoping to wipe that range with a macro after importing it into gnumeric. "Plan B" is to tweak the way that a certain date range gets written out. Note the difference starting at 2023-09-09... 2023-09-06,28,10,568,31,12,3,2 2023-09-07,30,6,594,33,8,3,2 2023-09-08,33,7,600,31,11,-2,4 2023-09-09,0,0,628,0,0,0,0 2023-09-10,0,0,657,0,0,0,0 2023-09-11,0,0,625,0,0,0,0 2023-09-06,28,10,568,31,12,3,2 2023-09-07,30,6,594,33,8,3,2 2023-09-08,33,7,600,31,11,-2,4 2023-09-09,,,628,0,0,0,0 2023-09-10,,,657,0,0,0,0 2023-09-11,,,625,0,0,0,0 Gnumeric imports it OK with nothing in the cells that have missing data in the source. While this isn't the way I had originally intended, it works, which is what matters. In the revised script, note the filter... if [ "${prevdate}" \< "2023-09-09" ] || [ "${prevdate}" \> "2023-10-20" ] One advantage of YYYY-MM-DD date format is that I can do straight string comparisons. Here's the revised script... ========================================================================== # Strip out missing "." that screw up the script sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv dos2unix -n region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv # # tail skips headers at beginning of file # sed deletes Row_ID, and strips out quotes # Output goes to file /dev/shm/temp0.txt tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > /dev/shm/temp0.txt # ## Set up IFS for easier parsing oldifs="${IFS}" IFS="," # # Initialize previous line's date to enter loop smoothly # expando to read first line dataline=$( head -1 /dev/shm/temp0.txt ) dataarray=(${dataline}) prevdate="${dataarray[0]}" # # Zero out accumulators to enter loop smoothly accum2=0 accum3=0 accum4=0 accum5=0 accum6=0 accum7=0 accum8=0 # # Remove previous hospsum.csv and open a new one for writing rm -rf hospsum.csv exec 3>hospsum.csv # # Write header line to output file echo "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 # # Main loop # Read the data from one line in /dev/shm/temp0.txt while read do dataarray=(${REPLY}) if [ "${dataarray[0]}" = "${prevdate}" ]; then # # If this line's date is same as previous line's date, add amounts to accumulators. accum2=$(( ${accum2} + ${dataarray[2]} )) accum3=$(( ${accum3} + ${dataarray[3]} )) accum4=$(( ${accum4} + ${dataarray[4]} )) accum5=$(( ${accum5} + ${dataarray[5]} )) accum6=$(( ${accum6} + ${dataarray[6]} )) accum7=$(( ${accum7} + ${dataarray[7]} )) accum8=$(( ${accum8} + ${dataarray[8]} )) else # # If this line's date has changed, output to hospsum.csv, update prevdate, # and update accumulators. ***IMPORTANT*** "echo" TO hospsum.csv MUST BE # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate*** # # Data *NOT* in range 2023-09-09 ... 2023-10-20 is written out in full. # Data in that range gets null data for ${accum2} and ${accum3} if [ "${prevdate}" \< "2023-09-09" ] || [ "${prevdate}" \> "2023-10-20" ] then echo "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}" >&3 else echo "${prevdate},,,${accum4},${accum5},${accum6},${accum7},${accum8}" >&3 fi prevdate="${dataarray[0]}" accum2=${dataarray[2]} accum3=${dataarray[3]} accum4=${dataarray[4]} accum5=${dataarray[5]} accum6=${dataarray[6]} accum7=${dataarray[7]} accum8=${dataarray[8]} fi done</dev/shm/temp0.txt # # Close file hospsum.csv exec 3<&- IFS="${oldifs}" -- Roses are red Roses are blue Depending on their velocity Relative to you ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-03-22 18:33 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-21 21:46 [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? Walter Dnes 2024-03-21 22:02 ` Frank Steinmetzger 2024-03-22 4:20 ` Walter Dnes 2024-03-22 6:52 ` Paul Colquhoun 2024-03-22 18:33 ` Walter Dnes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox