[gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?

public inbox for gentoo-user@lists.gentoo.org
 help / color / mirror / Atom feed

* [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?
@ 2024-03-21 21:46 Walter Dnes
  2024-03-21 22:02 ` Frank Steinmetzger
  0 siblings, 1 reply; 5+ messages in thread
From: Walter Dnes @ 2024-03-21 21:46 UTC (permalink / raw
  To: Gentoo Users List

  The province of Ontario does weekly Covid data updates which I
summarize and post on the DSLReports Canchat subforum, e.g.
https://www.dslreports.com/forum/r33854514-#google_vignette  Note the
data gap in the pink and brown lines on the 3rd and 4th graphs.  That's
actual missing data.  In the underlying spreadsheet page those gaps
initially show up as zeros.  I manually blank out region B1258:C1299
(i.e. 2023/09/09 to 2023/10/20) every week when I update so that it
doesn't show up as zero hospitalizations.  How do I set up and execute a
macro to to zap the contents of region B1258:C1299 on a page?

  I've recently solved another problem with the COVID data that I
mentioned in the past.  The data files could be downloaded by a browser
but "wget" was being rejected.  The data keepers seem to be filtering
based on the default "wget" user agent.  I've bypassed that by faking
a valid browser user agent... any valid browser user agent.

#!/bin/bash
rm -rf conposcovidloc.csv
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0" https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/455fd63b-603d-4608-8216-7d8647f43350/download/conposcovidloc.csv

#!/bin/bash
rm -rf region_hospital_icu_covid_data.csv
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0" https://data.ontario.ca/dataset/8f3a449b-bde5-4631-ada6-8bd94dbc7d15/resource/e760480e-1f95-4634-a923-98161cfb02fa/download/region_hospital_icu_covid_data.csv

  Note that I have to first remove the previous week's file, because wget
won't overwrite it, and skips the download altogether.

-- 
Roses are red
Roses are blue
Depending on their velocity
Relative to you

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?
  2024-03-21 21:46 [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? Walter Dnes
@ 2024-03-21 22:02 ` Frank Steinmetzger
  2024-03-22  4:20   ` Walter Dnes
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Steinmetzger @ 2024-03-21 22:02 UTC (permalink / raw
  To: gentoo-user

[-- Attachment #1: Type: text/plain, Size: 1936 bytes --]

Am Thu, Mar 21, 2024 at 05:46:31PM -0400 schrieb Walter Dnes:

>   The province of Ontario does weekly Covid data updates which I
> summarize and post on the DSLReports Canchat subforum, e.g.
> https://www.dslreports.com/forum/r33854514-#google_vignette  Note the
> data gap in the pink and brown lines on the 3rd and 4th graphs.  That's
> actual missing data.  In the underlying spreadsheet page those gaps
> initially show up as zeros.  I manually blank out region B1258:C1299
> (i.e. 2023/09/09 to 2023/10/20) every week when I update so that it
> doesn't show up as zero hospitalizations.  How do I set up and execute a
> macro to to zap the contents of region B1258:C1299 on a page?

Why not make the alteration one step before -- in the CSV?
There are CSV abstraction tools like `q`, which gives you a SQL-like 
interface to a csv file. Or you could write a quick transformer in python, 
if you know the language a bit.

Pseudo code, as I haven’t worked with csv in Python in a looong time:

import csv
with csv.open("input file", 'r') as A:
    with csv.open("output file", 'w') as B:
        for rownum, row in enumerate(A):
            if rownum >= 1258 or rownum <= 1299:
                # write a modified row which has columns B and C blanked
                B.write( [row[0] + ['', ''] + row[3:] )
            else:
                B.write(row)

>   Note that I have to first remove the previous week's file, because wget
> won't overwrite it, and skips the download altogether.

Maybe remove the -r from rm, just to peace of mind. Also, such minimalist 
scripts that don’t use bash features can be sh scripts instead. This 
increases performance, as sh loads faster than bash. ;-)

-- 
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

It is not enough to have no opinion, you must also be unable to express it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?
  2024-03-21 22:02 ` Frank Steinmetzger
@ 2024-03-22  4:20   ` Walter Dnes
  2024-03-22  6:52     ` Paul Colquhoun
  0 siblings, 1 reply; 5+ messages in thread
From: Walter Dnes @ 2024-03-22  4:20 UTC (permalink / raw
  To: gentoo-user

On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote

> Why not make the alteration one step before -- in the CSV?  There are
> CSV abstraction tools like `q`, which gives you a SQL-like interface
> to a csv file. Or you could write a quick transformer in python,
> if you know the language a bit.

  I wrote a quick transformer in bash, because I know the language a
bit <G>.  There are six "hospital regions" in Ontario.  The CSV file
does all data for one "hospital region" April 1, 2020 to present,
followed by same for next hospital region, etc, etc, for all 6 regions.
My first step is to run dos2unix and sort by date, ending up with 6
consecutive lines per day, sorted by date.  Then a "while" loop reads
through the input CSV file.  Accumulators are added up for all 6 lines
and a line is written out for each date.  Because I'm doing bash
arithmetic, the data *MUST* be valid numbers, not ".".  So I *MUST* set
missing data to somthing like zero.  Adding "." to a number causes bash
to error out with a CSV file like so...

"2023-09-05","CENTRAL",5,2,152,6,3,1,1
"2023-09-06","CENTRAL",5,2,136,6,3,1,1
"2023-09-07","CENTRAL",7,1,158,8,2,1,1
"2023-09-08","CENTRAL",7,3,154,6,3,-1,0
"2023-09-09","CENTRAL",".",".",169,".",".",".","."
"2023-09-10","CENTRAL",".",".",169,".",".",".","."
"2023-09-11","CENTRAL",".",".",155,".",".",".","."
"2023-09-12","CENTRAL",".",".",147,".",".",".","."

  The magic incantation for CSV files is to set IFS to a comma like so...

oldifs="${IFS}"
IFS=","

  This allows parsing CSV files like so...

while read
do
   dataarray=(${REPLY})

...and, kaboom, you have a fully populated array from reading one line
of a CSV file.  The following "transformer" is my "parsehospicu" script
that summarizes the data to "hospsum.csv".  Note: I've deleted the
leading "shebang slash bin slash bash" line because the Gentoo mailing
list software doesn't seem to like "executable emails".

  I suppose I could have two different versions of the...

echo "${prevdate}...

...line inside of an if/then/else/fi construct.  It would depend on the
date being inside the "missing data range".

=======================================================================

# Strip out missing "." that screw up the script
sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv
dos2unix -n region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv
#
# tail skips headers at beginning of file
# sed deletes Row_ID, and strips out quotes
# Output goes to file /dev/shm/temp0.txt
tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > /dev/shm/temp0.txt
#
## Set up IFS for easier parsing
oldifs="${IFS}"
IFS=","
#
# Initialize previous line's date to enter loop smoothly
# expando to read first line
dataline=$( head -1 /dev/shm/temp0.txt )
dataarray=(${dataline})
prevdate="${dataarray[0]}"
#
# Zero out accumulators to enter loop smoothly
accum2=0
accum3=0
accum4=0
accum5=0
accum6=0
accum7=0
accum8=0
#
# Remove previous hospsum.csv and open a new one for writing
rm -rf hospsum.csv
exec 3>hospsum.csv
#
# Write header line to output file
echo "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3
#
# Main loop
# Read the data from one line in /dev/shm/temp0.txt
while read
do
   dataarray=(${REPLY})
   if [ "${dataarray[0]}" = "${prevdate}" ]; then
#
# If this line's date is same as previous line's date, add amounts to accumulators.
      accum2=$(( ${accum2} + ${dataarray[2]} ))
      accum3=$(( ${accum3} + ${dataarray[3]} ))
      accum4=$(( ${accum4} + ${dataarray[4]} ))
      accum5=$(( ${accum5} + ${dataarray[5]} ))
      accum6=$(( ${accum6} + ${dataarray[6]} ))
      accum7=$(( ${accum7} + ${dataarray[7]} ))
      accum8=$(( ${accum8} + ${dataarray[8]} ))
   else
#
# If this line's date has changed, output to hospsum.csv, update prevdate,
# and update accumulators.  ***IMPORTANT*** "echo" TO hospsum.csv MUST BE
# EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate***
      echo "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}" >&3
      prevdate="${dataarray[0]}"
      accum2=${dataarray[2]}
      accum3=${dataarray[3]}
      accum4=${dataarray[4]}
      accum5=${dataarray[5]}
      accum6=${dataarray[6]}
      accum7=${dataarray[7]}
      accum8=${dataarray[8]}
   fi
done</dev/shm/temp0.txt
#
# Close file hospsum.csv
exec 3<&-
IFS="${oldifs}"


-- 
Roses are red
Roses are blue
Depending on their velocity
Relative to you


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?
  2024-03-22  4:20   ` Walter Dnes
@ 2024-03-22  6:52     ` Paul Colquhoun
  2024-03-22 18:33       ` Walter Dnes
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Colquhoun @ 2024-03-22  6:52 UTC (permalink / raw
  To: gentoo-user

On Friday, March 22, 2024 3:20:08 P.M. AEDT Walter Dnes wrote:
> On Thu, Mar 21, 2024 at 11:02:21PM +0100, Frank Steinmetzger wrote
> 
> > Why not make the alteration one step before -- in the CSV?  There are
> > CSV abstraction tools like `q`, which gives you a SQL-like interface
> > to a csv file. Or you could write a quick transformer in python,
> > if you know the language a bit.
> 
>   I wrote a quick transformer in bash, because I know the language a
> bit <G>.  There are six "hospital regions" in Ontario.  The CSV file
> does all data for one "hospital region" April 1, 2020 to present,
> followed by same for next hospital region, etc, etc, for all 6 regions.
> My first step is to run dos2unix and sort by date, ending up with 6
> consecutive lines per day, sorted by date.  Then a "while" loop reads
> through the input CSV file.  Accumulators are added up for all 6 lines
> and a line is written out for each date.  Because I'm doing bash
> arithmetic, the data *MUST* be valid numbers, not ".".  So I *MUST* set
> missing data to somthing like zero.  Adding "." to a number causes bash
> to error out with a CSV file like so...
> 
> "2023-09-05","CENTRAL",5,2,152,6,3,1,1
> "2023-09-06","CENTRAL",5,2,136,6,3,1,1
> "2023-09-07","CENTRAL",7,1,158,8,2,1,1
> "2023-09-08","CENTRAL",7,3,154,6,3,-1,0
> "2023-09-09","CENTRAL",".",".",169,".",".",".","."
> "2023-09-10","CENTRAL",".",".",169,".",".",".","."
> "2023-09-11","CENTRAL",".",".",155,".",".",".","."
> "2023-09-12","CENTRAL",".",".",147,".",".",".","."
> 
>   The magic incantation for CSV files is to set IFS to a comma like so...
> 
> oldifs="${IFS}"
> IFS=","
> 
>   This allows parsing CSV files like so...
> 
> while read
> do
>    dataarray=(${REPLY})
> 
> ...and, kaboom, you have a fully populated array from reading one line
> of a CSV file.  The following "transformer" is my "parsehospicu" script
> that summarizes the data to "hospsum.csv".  Note: I've deleted the
> leading "shebang slash bin slash bash" line because the Gentoo mailing
> list software doesn't seem to like "executable emails".
> 
>   I suppose I could have two different versions of the...
> 
> echo "${prevdate}...
> 
> ...line inside of an if/then/else/fi construct.  It would depend on the
> date being inside the "missing data range".
> 
> =======================================================================
> 
> # Strip out missing "." that screw up the script
> sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv >
> region_hospital_icu_covid_datax.csv dos2unix -n
> region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv #
> # tail skips headers at beginning of file
> # sed deletes Row_ID, and strips out quotes
> # Output goes to file /dev/shm/temp0.txt
> tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort >
> /dev/shm/temp0.txt #
> ## Set up IFS for easier parsing
> oldifs="${IFS}"
> IFS=","
> #
> # Initialize previous line's date to enter loop smoothly
> # expando to read first line
> dataline=$( head -1 /dev/shm/temp0.txt )
> dataarray=(${dataline})
> prevdate="${dataarray[0]}"
> #
> # Zero out accumulators to enter loop smoothly
> accum2=0
> accum3=0
> accum4=0
> accum5=0
> accum6=0
> accum7=0
> accum8=0
> #
> # Remove previous hospsum.csv and open a new one for writing
> rm -rf hospsum.csv
> exec 3>hospsum.csv
> #
> # Write header line to output file
> echo
> "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_
> total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3 #
> # Main loop
> # Read the data from one line in /dev/shm/temp0.txt
> while read
> do
>    dataarray=(${REPLY})
>    if [ "${dataarray[0]}" = "${prevdate}" ]; then
> #
> # If this line's date is same as previous line's date, add amounts to
> accumulators. accum2=$(( ${accum2} + ${dataarray[2]} ))
>       accum3=$(( ${accum3} + ${dataarray[3]} ))
>       accum4=$(( ${accum4} + ${dataarray[4]} ))
>       accum5=$(( ${accum5} + ${dataarray[5]} ))
>       accum6=$(( ${accum6} + ${dataarray[6]} ))
>       accum7=$(( ${accum7} + ${dataarray[7]} ))
>       accum8=$(( ${accum8} + ${dataarray[8]} ))
>    else
> #
> # If this line's date has changed, output to hospsum.csv, update prevdate,
> # and update accumulators.  ***IMPORTANT*** "echo" TO hospsum.csv MUST BE
> # EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate***
>       echo
> "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${
> accum8}" >&3 prevdate="${dataarray[0]}"
>       accum2=${dataarray[2]}
>       accum3=${dataarray[3]}
>       accum4=${dataarray[4]}
>       accum5=${dataarray[5]}
>       accum6=${dataarray[6]}
>       accum7=${dataarray[7]}
>       accum8=${dataarray[8]}
>    fi
> done</dev/shm/temp0.txt
> #
> # Close file hospsum.csv
> exec 3<&-
> IFS="${oldifs}"


Bash can do patern substitution in variable references.

Replace  accum3=$(( ${accum3} + ${dataarray[3]} ))
with     accum3=$(( ${accum3} + ${dataarray[3]/'.'/0} ))

and similarly with the other lines and any array value of '.' will be replaced 
with a '0'


-- 
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page?
  2024-03-22  6:52     ` Paul Colquhoun
@ 2024-03-22 18:33       ` Walter Dnes
  0 siblings, 0 replies; 5+ messages in thread
From: Walter Dnes @ 2024-03-22 18:33 UTC (permalink / raw
  To: gentoo-user

On Fri, Mar 22, 2024 at 05:52:04PM +1100, Paul Colquhoun wrote

> Bash can do patern substitution in variable references.
> 
> Replace  accum3=$(( ${accum3} + ${dataarray[3]} ))
> with     accum3=$(( ${accum3} + ${dataarray[3]/'.'/0} ))
> 
> and similarly with the other lines and any array value of '.' will
> be replaced with a '0'

  Replacing "." with a zero is not the problem.  I already do this at
the top of the script with "sed"...

sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv

  The problem is that a certain range of valid-looking zero sums in the
summary output is actually invalid and has to be zapped.  I was
originally hoping to wipe that range with a macro after importing it
into gnumeric.  "Plan B" is to tweak the way that a certain date range
gets written out.  Note the difference starting at 2023-09-09...

2023-09-06,28,10,568,31,12,3,2
2023-09-07,30,6,594,33,8,3,2
2023-09-08,33,7,600,31,11,-2,4
2023-09-09,0,0,628,0,0,0,0
2023-09-10,0,0,657,0,0,0,0
2023-09-11,0,0,625,0,0,0,0

2023-09-06,28,10,568,31,12,3,2
2023-09-07,30,6,594,33,8,3,2
2023-09-08,33,7,600,31,11,-2,4
2023-09-09,,,628,0,0,0,0
2023-09-10,,,657,0,0,0,0
2023-09-11,,,625,0,0,0,0

  Gnumeric imports it OK with nothing in the cells that have missing data
in the source.  While this isn't the way I had originally intended, it
works, which is what matters.  In the revised script, note the filter...

 if [ "${prevdate}" \< "2023-09-09" ] || [ "${prevdate}" \> "2023-10-20" ]

  One advantage of YYYY-MM-DD date format is that I can do straight
string comparisons.  Here's the revised script...

==========================================================================

# Strip out missing "." that screw up the script
sed "s/\"\.\"/0/g" region_hospital_icu_covid_data.csv > region_hospital_icu_covid_datax.csv
dos2unix -n region_hospital_icu_covid_datax.csv region_hospital_icu_covid_datay.csv
#
# tail skips headers at beginning of file
# sed deletes Row_ID, and strips out quotes
# Output goes to file /dev/shm/temp0.txt
tail -n +2 region_hospital_icu_covid_datay.csv | sed "s/\"//g" | sort > /dev/shm/temp0.txt
#
## Set up IFS for easier parsing
oldifs="${IFS}"
IFS=","
#
# Initialize previous line's date to enter loop smoothly
# expando to read first line
dataline=$( head -1 /dev/shm/temp0.txt )
dataarray=(${dataline})
prevdate="${dataarray[0]}"
#
# Zero out accumulators to enter loop smoothly
accum2=0
accum3=0
accum4=0
accum5=0
accum6=0
accum7=0
accum8=0
#
# Remove previous hospsum.csv and open a new one for writing
rm -rf hospsum.csv
exec 3>hospsum.csv
#
# Write header line to output file
echo "date,icu_current_covid,icu_current_covid_vented,hospitalizations,icu_crci_total,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented" >&3
#
# Main loop
# Read the data from one line in /dev/shm/temp0.txt
while read
do
   dataarray=(${REPLY})
   if [ "${dataarray[0]}" = "${prevdate}" ]; then
#
# If this line's date is same as previous line's date, add amounts to accumulators.
      accum2=$(( ${accum2} + ${dataarray[2]} ))
      accum3=$(( ${accum3} + ${dataarray[3]} ))
      accum4=$(( ${accum4} + ${dataarray[4]} ))
      accum5=$(( ${accum5} + ${dataarray[5]} ))
      accum6=$(( ${accum6} + ${dataarray[6]} ))
      accum7=$(( ${accum7} + ${dataarray[7]} ))
      accum8=$(( ${accum8} + ${dataarray[8]} ))
   else
#
# If this line's date has changed, output to hospsum.csv, update prevdate,
# and update accumulators.  ***IMPORTANT*** "echo" TO hospsum.csv MUST BE
# EXECUTED BEFORE UPDATING ACCUMULATORS AND prevdate***
#
# Data *NOT* in range 2023-09-09 ... 2023-10-20 is written out in full.
# Data in that range gets null data for ${accum2} and ${accum3}
      if [ "${prevdate}" \< "2023-09-09" ] || [ "${prevdate}" \> "2023-10-20" ]
      then
        echo "${prevdate},${accum2},${accum3},${accum4},${accum5},${accum6},${accum7},${accum8}" >&3
      else
        echo "${prevdate},,,${accum4},${accum5},${accum6},${accum7},${accum8}" >&3
      fi
      prevdate="${dataarray[0]}"
      accum2=${dataarray[2]}
      accum3=${dataarray[3]}
      accum4=${dataarray[4]}
      accum5=${dataarray[5]}
      accum6=${dataarray[6]}
      accum7=${dataarray[7]}
      accum8=${dataarray[8]}
   fi
done</dev/shm/temp0.txt
#
# Close file hospsum.csv
exec 3<&-
IFS="${oldifs}"

-- 
Roses are red
Roses are blue
Depending on their velocity
Relative to you


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-03-22 18:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-21 21:46 [gentoo-user] How do I zap a specific area of a gnumeric spreadsheet page? Walter Dnes
2024-03-21 22:02 ` Frank Steinmetzger
2024-03-22  4:20   ` Walter Dnes
2024-03-22  6:52     ` Paul Colquhoun
2024-03-22 18:33       ` Walter Dnes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox