PresentationPDF Available

Designing and Administrating the COVID-19 Financial Impacts Survey

Authors:

Abstract

At the end of 2020, State Water Board staff were tasked with gathering information on the financial impacts of COVID-19 to agencies and households throughout California. The final dataset covered a statistically significant sample of agencies throughout the state, gathering both qualitative and quantitative information on agency finances and customer debt levels. This presentation details the statistical work undertaken to design and administer the survey, and to analyze the data to make statewide extrapolations.
California Water Boards
Designing and Administrating
the COVID-19 Financial
Impacts Survey
Bethany Robinson, Ph.D (Water Resource Control Engineer, DDW)
Scott Coffin, Ph.D (Research Scientist III, DDW)
Marielle Pinheiro, Ph.D (Research Data Specialist II, ORPP)
California Water Boards
Key Survey Design/Analysis Tools
Survey Design
Clearly defining data objectives
Question design
Statistical power analysis
Survey distribution and collection
Survey Analysis
Correlation and regression
Imputation
Extrapolation and uncertainty estimation (calibration and weighting)
2
California Water Boards
Background
Executive Order N-42-20 (April 2020): Shutoff moratorium
Preliminary voluntary survey in summer 2020
More expansive survey effort in winter 2020
Motivation:
Understand financial impacts to drinking water systems
Understand household bill debt
Inform development of assistance options
California Water Boards
Summer 2020: Initial Voluntary Survey
Shiny app, similar to a Google form
(but with user authentication for data
quality/PII purposes)
MongoDB backend for response storage
Publicized via listservs (Water Board, water
agency organizations)
Results not statistically significant
Skewed sample (overrepresented larger systems,
spatial gaps)
Only captured 30% of state population served
California Water Boards
Winter 2020: Targeted sample of systems
Small/medium systems
(<10,000 service connections)
Large systems (10,000+
service connections)
Number (percent) of
community water systems
2661 systems (92%) 223 systems (8%)
Number (percent) of state
population served
7.1 million (18%) 32.4 million (82%)
Sample size 510 150
Number of responses
(systems represented*)
406 (428) 131 (151)
Number (percent) of
subset population
represented
4.0 million (56% of
small/medium system
population)
24.2 million (75% of large
system population)
* Some agencies provided responses for multiple systems as they were unable to separate their billing data
California Water Boards
Survey Administration
Small/Medium Water Systems Large Water Systems
450 systems received an email on 11/9,
notifying them that staff would be
following up
150 systems received an email with the
survey on 11/9
DDW staff workshop 11/5: instructed
field staff on specifics of filling out
small/medium system survey
Public workshop 11/16: instructed
agency representatives on specifics of
filling out large system survey
DDW field office staff called systems and
filled out survey forms
Agencies completed survey forms and
returned to ORPP staff
California Water Boards
Sampling and Survey Design
7
California Water Boards
Probabilistic Sampling Approaches
Simple Random Sampling
Stratified Random Sampling
Multi-Stage Sampling
8
Which should you choose?
California Water Boards
Probabilistic Sampling Approaches
9
Groupings of #Service Connections Makes Case for STRATIFIED Random Sampling
California Water Boards
Sampling Procedure
Kept the two populations (small and large) separate
Small/medium sample: 4 bins determined using Jenk’s Natural Breaks
Large sample: random sample of large system population (stratified by
service connections and region)
Final survey results also included 5 voluntary responses from large systems
California Water Boards
Survey Design:
Binning Small/Medium Systems
Bins (Service Connection Breaks):
Bin A: 0 1,008
Bin B: 1,009 3,090
Bin C: 3,091 5,868
Bin D: 5,869 10,000
California Water Boards
Margins of Error
12
Bin Service connections Population Sample Size Margin of Error
for 90% CI
Margin of Error
for 95% CI
Bin A SC < 1,009 1944 184 1.5% 1.7%
Bin B 1,009 <= SC < 3,091 265 107 0.8% 1.0%
Bin C 3,091 <= SC < 5,869 108 70 1.2% 1.4%
Bin D 5,869 <= SC < 10,000 72 55 1.2% 1.4%
z = z-score
σ = standard deviation
P = population
n = sample size
 ∗
 −
 −1
California Water Boards
Question Design Considerations
Number and length of questions
Limits on staff time (both Water Board and local agencies)
Data availability
Some billing systems didn’t have historical records
Some systems declined to participate or provide data
Billing cycle and resultant data uniformity
PII protection
Guidance document for staff to ensure consistent responses
13
California Water Boards
Survey Questions
Small/Medium System Questions Large System Questions
System finances
(Revenues, expenses, cash reserves,
months before assistance needed)
Household debt
(Delinquent accounts, debt levels, total
water bill debt, drinking water debt,
debt per billing cycle, debt levels per zip
code)
Household debt distribution
(Number of accounts with debt, total
amount of debt, debt amount per zip
code)
Fees and repayment options
(Late fees charged, number of accounts
on repayment plans)
Questions created by ORPP/DDW working group and vetted by various
stakeholder groups
California Water Boards
Survey Distribution, Collection, and
Data Extraction
15
California Water Boards
Survey Distribution Considerations
Time constraints
no time to build and test a new site
Couldn’t use existing Shiny app for legal reasons
Platform considerations: Survey needed to be:
Usable for all parties involved (Water Board staff, agencies, general public)
Customizable
Accessible
Data quality control and management: Ensure that responses
Are the right type (i.e. numeric vs string)
Are consistent across different billing cycle types
Can be extracted programmatically and compiled into a single database
California Water Boards
Survey Distribution Solution
Excel spreadsheet with
data validation
Flexible design, but still
machine readable
No new skills needed to
create survey form
Retains data type
information
Created separate template
files for large and small
system responses
Used Python to automate
personalized survey form
generation
California Water Boards
Question Types
18
Yes/No
Multiple Choice
Comment boxes
Tables
California Water Boards
Binned Delinquent Accounts Table
For each zip code
in the service area:
How many
accounts have less
than $100 debt,
between $100-
$200, etc.
Provided ability to
join data with zip
code level ACS
data, examine
spatial trends
19
California Water Boards
File Generation and Management
1. Created separate file trees for
small/large survey responses in the
S Drive
2. Used openpyxl to fill in template
cells with
1. PWSID
2. system name
3. billing cycle length
3. Saved prefilled files as
[MAIN]/[DISTRICT]/Incomplete/
Survey_[DISTRICT]_[PWSID]_[NAME].xlsx
4. Once a survey was finished, Water
Board staff moved file to “Complete”
folder
Main
Folder
District
subfolder
Incomplete Complete
California Water Boards
Data Extraction
21
Excel file: Numeric/string
responses
Dictionary [DICT_LOC] of
{[varname]:[cell_loc]} pairs
Excel file: table with binned
number of delinquent accounts,
per zip code
extract_function([FILE],[DICT_LOC])
1. Open file using openpyxl
2. For each [varname]in [DICT_LOC]:
1. Obtain value [ans] located at
[cell_loc]
2. Store [varname]:[ans] in
dictionary [DICT_ANS]
3. Store filepath (keep track of original
records)
1. Use pandas to read table [ZIP_TAB]
2. Store [ZIP_TAB] in [DICT_ANS]
Return dictionary [DICT_ANS]
California Water Boards
Data Storage and Release
22
Keylist Table:
3-digit ID
System Name
PWSID
Public Release:
3-digit ID
Data with PII
removed
Private Record:
3-digit ID
Full, unredacted
data
References back
to original files
stored on S drive
Use pyodbc to write data to SQL tables on the Water Board server
Data errors were fixed by re-reading corrected file using extract_function()
and changing the table values in the row corresponding to the survey ID
Also added correction details in a “Notes” column
Use pandas.read_sql_query() and pandas.to_excel() to read public
release table from the server and write to Excel for publishing to website
California Water Boards
Imputation, Weighting, and
Extrapolation
23
California Water Boards
Survey Population Did Not Perfectly
Match State Population
California Water Boards
Statewide Extrapolation Workflow
25
Valliant et al. (2018). Springer.
1.Impute missing data
2.Calculate base weights
3.Adjust weights for non-response
4.Calibrate weights
California Water Boards
Statewide Extrapolation - Weighting
26
=
t = total
wi= Weight
yi= Response Valliant et al. (2018). Springer.
California Water Boards
Statewide Extrapolation Workflow
27
Valliant et al. (2018). Springer.
1.Impute missing data
2.Calculate base weights
3.Adjust weights for non-response
4.Calibrate weights
California Water Boards
Reasons for Missing Data
Billing software limitations resulted in incomplete survey responses
79 systems:
Do not keep historic data
Had difficulty sorting accounts by customer class
Had difficulty with zip code extraction
Some systems were unable to isolate drinking water portion of total debt
Time limitationssome systems unable to retrieve data before deadline
28
California Water Boards
Response Data were Incomplete
Mice package (R).
California Water Boards
Types of Missingness
University of Virgina Library
MCAR: Missing Completely at Random
The missingness of data points are at random
MAR: Missing at Random
The missingness is not completely random, but the propensity of
missingness depends on the observed data, not the missing data.
MNAR: Missing Not at Random
The missing is not random, it correlates with unobservable characteristics
unknown to a researcher
California Water Boards
Data were Missing at Random
Missingness
Fraction
Figure from pheatmap package (R)
California Water Boards
Filling in Missing Data with Multiple Imputation
University of Virgina Library
California Water Boards
Imputing Missing Data
California Water Boards
Imputation Via Random Forest Modelling
Imputed Data
Observed Data
Data imputed with Mice package (R).
California Water Boards
Statewide Extrapolation Workflow
35
Valliant et al. (2018). Springer.
1.Impute missing data
2.Calculate base weights
3.Adjust weights for non-response
4.Calibrate weights
California Water Boards
Statewide Extrapolation Base Weights
36
Valliant et al. (2018). Springer.
Base weight = probability of being sample = proportion in population
California Water Boards
Statewide Extrapolation Workflow
37
Valliant et al. (2018). Springer.
1.Impute missing data
2.Calculate base weights
3.Adjust weights for non-response
4.Calibrate weights
California Water Boards
(0 1,008) (1,009 3,090) (3,091 5,868) (5,869 10,000)
Service Connection Breaks:
Survey Bins
Survey Response Was Not Random
Figure made with ggplot2 (R)
Response Rate
California Water Boards
Predicting Non-response with Auxiliary Variables
Non-Response Weight
Median 12-Month Household Income ($USD) Figure made with ggplot2 (R)
California Water Boards
Non-Response Adjustment using Machine-Learning
Zhao et al. (2016). Contemporary Clinical Trials Decision Tree generated with rpart package (R)
California Water Boards
Statewide Extrapolation Workflow
41
Valliant et al. (2018). Springer.
1.Impute missing data
2.Calculate base weights
3.Adjust weights for non-response
4.Calibrate weights
California Water Boards
Calibration – Evaluate Correlative Factors
42
California Water Boards
Calibration –
Identify Candidate Calibration Variables
43
Psych package (R))
California Water Boards
Weighting and Calibration Decreases Uncertainties
Less weighting More weighting
California Water Boards
Data Analysis and Example Results
45
California Water Boards
Analysis Techniques
Random Forest: Variable importance
Weighted Extrapolation
46
California Water Boards
Perceived Assistance Prediction
A random forest machine
learning algorithm of 10,000 trees
determined that the most
important variables for predicting
assistance are:
1. Median household income and;
2. Service connections.
47
California Water Boards
Financial Vulnerability Metrics
Perceived Months before
assistance Days of unrestricted cash Revenue to expense ratio
No assistance >365 days >2
>12 180-365 days 1.5-2
9-12 90-180 days 1.2-1.5
6-9 60-90 days 0.95-1.2
3-6 30-60 days 0.5-0.95
<3 <30 days <0.5
48
Vulnerability scores were determined by the count of “high-risk” indicators per system (red text in table).
Systems with 0 out of 3 high-risk indicators have low financial vulnerability.
Systems with 1 out of 3 high-risk indicators have medium financial vulnerability
Systems with 2 out of 3 high-risk indicators have high financial vulnerability
Systems with 3 out of 3 high-risk indicators have extreme financial vulnerability
California Water Boards
Vulnerability Scores and Statewide
Estimates
49
Vulnerability Survey sample Survey % Statewide
estimate
Statewide %
Low 218 52% 1172 49%
Medium 116 28% 592 25%
High 38 9% 270 11%
Extreme 51% 25 1%
Used weights (explained earlier) to extrapolate from the
survey results to a statewide estimate of how many systems
were at high or extreme financial vulnerability status
California Water Boards
Total Statewide Household Debt:
March – October 2020
50
Data System size Survey Total Statewide Estimate
Total Debt
($USD)
Large $671.8 million $731.7 million +/-$1.4 million
Small/Medium $33.0 million $57.6 million +/-$2.5 million
Tota l $704.8 million $789.3 million +/-$3.9 million
California Water Boards
Demographic Correlations
Racial disparities:
Zip codes with higher percentages of Hispanic and Black
households have:
• A higher percentage of households with some level
of debt
• A higher average level of debt
• A higher percentage of households with very high levels
of debt (above $600 and above $1,000)
Strong correlation with median household income
51
California Water Boards
Key Survey Design/Analysis Tools
Survey Design
Clearly defining data objectives
Question design
Statistical power analysis
Survey distribution and collection
Survey Analysis
Correlation and regression
Imputation
Extrapolation and uncertainty estimation (calibration and weighting)
52
California Water Boards
Further Information
COVID-19 Financial Impacts Survey Website:
https://www.waterboards.ca.gov/drinking_water/certlic/drinkingw
ater/covid-19watersystemsurvey.html
Marielle: Marielle.Pinheiro@Waterboards.ca.gov
Bethany: Bethany.Robinson@Waterboards.ca.gov
@DrSCoffin ScottCoffin
53
ResearchGate has not been able to resolve any citations for this publication.
ResearchGate has not been able to resolve any references for this publication.