All content in this area was uploaded by Anuj Tiwari on Jun 25, 2021
Optimizing Sampling Strategies for
Large Rural Regions
Anuj Tiwari, Charlie Catlett (Discovery Partners Institute of the University of Illinois System); Rachel Poretsky (University of Illinois-Chicago); Aaron Packman (Northwestern University)
Wilnise Jasmin (Chicago Department of Public Health); Charles Williams, Wayne Duffus, Sarah Patrick, James Wendt, Leslie Wise (Illinois Department of Public Health)
Counties with Highest COVID-19
Vulnerability and No Sequencing
(Vulnerability: top 1/3 or 35 out of 102 counties)
Multiple sampling schemes were evaluated,
including (a) the 8 largest population centers in
each of Illinois’ 11 COVID-19 recovery
regions, and (b) all communities with
population >20k. Both approaches left vast
rural areas unexamined.
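The comparison of schemes (a) and (b) can be illustrated with a minimal sketch. The community names, counties, regions, and populations below are hypothetical toy data, not Illinois figures, and the helper names are assumptions made for illustration. The sketch uses n=2 rather than 8 so the toy data stays small.

```python
from collections import defaultdict

# Hypothetical toy data: (community, county, region, population).
COMMUNITIES = [
    ("Alpha", "County A", 1, 95_000),
    ("Bravo", "County A", 1, 30_000),
    ("Charlie", "County B", 1, 22_000),
    ("Delta", "County C", 1, 8_000),
    ("Echo", "County D", 2, 15_000),
    ("Foxtrot", "County E", 2, 4_000),
]

def top_n_per_region(communities, n):
    """Scheme (a): the n largest population centers in each region."""
    by_region = defaultdict(list)
    for row in communities:
        by_region[row[2]].append(row)
    selected = []
    for rows in by_region.values():
        rows.sort(key=lambda r: r[3], reverse=True)
        selected.extend(rows[:n])
    return selected

def above_threshold(communities, min_population):
    """Scheme (b): every community with population above the threshold."""
    return [r for r in communities if r[3] > min_population]

def uncovered_counties(communities, selected):
    """Counties that contain no selected community."""
    all_counties = {r[1] for r in communities}
    covered = {r[1] for r in selected}
    return sorted(all_counties - covered)

scheme_a = top_n_per_region(COMMUNITIES, n=2)   # the poster uses n=8
scheme_b = above_threshold(COMMUNITIES, 20_000)
print("Scheme (a) misses:", uncovered_counties(COMMUNITIES, scheme_a))
print("Scheme (b) misses:", uncovered_counties(COMMUNITIES, scheme_b))
```

In this toy example both schemes leave whole counties without a sampling site, which mirrors the gap described above.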
Current Plan: The 2021 plan is to grow from 20 WWTPs as of June 2021 to 50 in September and
150 in January 2022, representing the largest population centers in each of Illinois’ 102 counties.
Next Steps: Two experiments are being conducted in summer 2021 to validate the sampling
scheme. Each will sample additional population centers in several counties to determine the
degree to which the largest population centers adequately represent the surrounding counties.
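One simple way to quantify how well one site represents another is to correlate their time series. The sketch below is only an illustration under assumptions: the poster does not state which statistic the experiments use, the Pearson correlation is swapped in here as a plain example, and the weekly readings are hypothetical values in arbitrary units.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)

# Hypothetical weekly wastewater readings (arbitrary units) for the
# largest population center in a county and for a smaller nearby center.
largest_center = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0]
smaller_center = [4.0, 5.0, 7.0, 6.5, 8.0, 10.0]

r = pearson(largest_center, smaller_center)
print(f"correlation between sites: {r:.3f}")
```

A value near 1 would suggest the two sites rise and fall together, while a low or negative value would suggest the largest center is a poor proxy for its neighbor.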
COVID-19 Impact Assessment Algorithm
(Trend and Homogeneity Analysis)
Tiwari, Anuj, Arya V. Dadhania, Vijay Avin Balaji Ragunathrao, and Edson RA Oliveira. "Using
machine learning to develop a novel COVID-19 Vulnerability Index (C19VI)." Science of The Total
Environment 773 (2021): 145650.
Random Forest Vulnerability Modeling
COVID-19 Vulnerability Index (C19VI)
In Sep 2020, we introduced a machine-learning-based COVID-19
Vulnerability Index (C19VI) using CDC’s six themes. This model uses an
ensemble learning approach with recursive partitioning to optimally
compute non-linear relationships between input themes.
Wastewater Sampling Site Selection
CDC SVI (Linear + Overall Vulnerability)
CDC CCVI (Linear Statistical Model)
C19VI (Non-Linear ML Model)
County Level Sequencing Information
Walmart Locations & ZIPCODE level COVID-19 Cases Data
COVID-19 Trend & Homogeneity
Illinois City Population
Eleven COVID-19 “Restore Illinois” Regions
Geospatial Analysis
Comparative Evaluation
Wastewater Based Epidemiology (WBE) typically focuses on wastewater treatment plants (WWTPs) and the corresponding community. However, many public health questions require insight across many
communities such as at the county or state level. What is the optimal sampling strategy in regions such as rural counties with dozens of treatment plants and private septic systems, or in entire states with
many such counties?
The State of Illinois is a primarily rural region comprising 58k square miles with a population of roughly 12.8M, over half of whom live within 50 miles of Chicago. Outside of the Chicago area, Illinois has
roughly 140 cities with populations greater than 10k and nearly 1,000 with populations under 10k. To understand and track infectious diseases such as COVID-19 across these communities, a
comprehensive WBE sampling plan would involve over 1,000 WWTPs and countless private septic systems.
We refined the model in early 2021 with additional demographic, Walmartshed, and
sequencing information to evaluate WBE sampling strategies across Illinois counties and to
support an expansion of WBE from the current several dozen WWTPs to over 150.
In large rural regions, such as many U.S. states,
different sampling strategies are necessary for different
population densities. In large urban areas, the
sewersheds rarely align with municipal boundaries
because they are shared, and the population served
by these sewersheds is so large that in order to isolate
hotspots or resurgences it is necessary to develop a
sampling strategy that isolates and samples sub-
sewersheds. In lower population density areas, although
the sewersheds are well-defined around small to
medium-sized communities, the sheer number of such
sewer systems requires a sampling strategy that
balances population coverage with geographic
coverage.
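The trade-off between population coverage and geographic coverage can be made concrete with two simple metrics. The sketch below is a minimal illustration under assumptions: the communities and populations are hypothetical, geographic coverage is approximated as the share of counties with at least one sampled site, and the function names are invented for this example.

```python
# Hypothetical toy data: community -> (county, population served).
COMMUNITIES = {
    "Alpha": ("County A", 95_000),
    "Bravo": ("County A", 30_000),
    "Charlie": ("County B", 22_000),
    "Delta": ("County C", 8_000),
    "Echo": ("County D", 15_000),
}

def coverage(communities, sampled):
    """Return (population share, county share) covered by the sampled sites."""
    total_pop = sum(pop for _, pop in communities.values())
    all_counties = {county for county, _ in communities.values()}
    sampled_pop = sum(communities[name][1] for name in sampled)
    sampled_counties = {communities[name][0] for name in sampled}
    return sampled_pop / total_pop, len(sampled_counties) / len(all_counties)

def largest_per_county(communities):
    """Pick the most populous community in each county."""
    best = {}
    for name, (county, pop) in communities.items():
        if county not in best or pop > communities[best[county]][1]:
            best[county] = name
    return sorted(best.values())

by_size = ["Alpha", "Bravo"]                 # two largest communities overall
by_county = largest_per_county(COMMUNITIES)  # one site per county

for label, sites in [("largest overall", by_size), ("largest per county", by_county)]:
    pop_share, county_share = coverage(COMMUNITIES, sites)
    print(f"{label}: population {pop_share:.0%}, counties {county_share:.0%}")
```

In this toy data, choosing sites purely by size covers most of the population but only one county, while one site per county reaches every county at the cost of more sampling sites.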
The C19VI predictive model is a reliable and pragmatic
alternative to the SVI and CCVI for measuring COVID-19
vulnerability and wastewater site suitability at the county
level.
More obvious sampling strategies, such as the 8 largest
population centers in each of Illinois’ 11 COVID-19
recovery regions or all communities with population
>20k, left vast rural areas unexamined.
Lessons Learned