Contribution of Low-Cost Sensor Measurements to the
Prediction of PM2.5 Levels: A Case Study in Imperial County,
California, USA
Jianzhao Bi1, Jennifer Stowell1, Edmund Y. W. Seto2, Paul B. English3, Mohammad Z. Al-Hamdan4,
Patrick L. Kinney5, Frank R. Freedman*,6, Yang Liu*,1
1Department of Environmental Health, Emory University, Rollins School of Public Health, Atlanta,
Georgia 30322, United States
2Department of Environmental & Occupational Health Sciences, University of Washington,
Seattle, Washington 98195, United States
3California Department of Public Health, Richmond, California 94804, United States
4Universities Space Research Association, NASA Marshall Space Flight Center, Huntsville,
Alabama 35808, United States
5Department of Environmental Health, Boston University, School of Public Health, Boston,
Massachusetts 02118, United States
6Department of Meteorology and Climate Science, San Jose State University, San Jose,
California 95192, United States
Abstract
Regulatory monitoring networks are often too sparse to support community-scale PM2.5 exposure
assessment while emerging low-cost sensors have the potential to fill in the gaps. To date, limited
studies, if any, have been conducted to utilize low-cost sensor measurements to improve PM2.5
prediction with high spatiotemporal resolutions based on statistical models. Imperial County in
California is an exemplary region with sparse Air Quality System (AQS) monitors and a
community-operated low-cost network entitled Identifying Violations Affecting Neighborhoods
(IVAN). This study aims to evaluate the contribution of IVAN measurements to the quality of
PM2.5 prediction. We adopted the Random Forest algorithm to estimate daily PM2.5 concentrations
at 1-km spatial resolution using three different PM2.5 datasets (AQS-only, IVAN-only, and AQS/
IVAN combined). The results showed that the integration of low-cost sensor measurements is an
effective way to significantly improve the quality of PM2.5 prediction with an increase of
cross-validation (CV) R2 by ~0.2. The IVAN measurements also contributed to the increased importance
of emission source-related covariates and more reasonable spatial patterns of PM2.5. The
remaining uncertainty in the calibrated IVAN measurements could still cause apparent outliers in
the prediction model, highlighting the need for more effective calibration or integration methods to
relieve its negative impact.
*Corresponding Authors: Yang Liu, Mailing Address: Emory University, Rollins School of Public Health, 1518 Clifton Road NE,
Atlanta, GA 30322, USA. Phone: +1 (404) 727-2131, yang.liu@emory.edu; Frank R. Freedman, Mailing Address: San Jose State
University, Department of Meteorology and Climate Science, One Washington Square, San Jose, CA 95192, USA,
frank.freedman@sjsu.edu.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our
customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of
the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered
which could affect the content, and all legal disclaimers that apply to the journal pertain.
NASA Public Access author manuscript; available in PMC 2021 January 01.
Published in final edited form as: Environ Res. 2020 January; 180: 108810. doi:10.1016/j.envres.2019.108810.
Keywords
Low-Cost Sensor; Satellite AOD; Random Forest; Measurement Uncertainty
1. Introduction
Fine particulate matter with an aerodynamic diameter less than or equal to 2.5 micrometers
(PM2.5) has been contributing to a growing disease burden worldwide, causing premature
mortalities and a variety of morbidities including cardiovascular, cerebrovascular, and
respiratory diseases (Bose et al. 2015; Burnett et al. 2014; Madrigano et al. 2013; Sorek-
Hamer et al. 2016). Traditionally, ambient PM2.5 exposure assessments have mainly relied
on measurements from ground monitoring stations. However, as regulatory monitoring is
designed to support compliance with ambient air quality standards (Hall et al. 2014), it lacks
the spatial coverage needed to reflect detailed PM2.5 variations at the community level. Even in the
United States, more than 70% of counties still have no regulatory PM2.5 monitoring.
Exposure misclassification due to insufficient coverage of regulatory PM2.5 monitoring can
significantly bias the estimated health impacts of PM2.5 (Zeger et al. 2000).
Over the past decade, satellite aerosol remote sensing has emerged as a useful tool to extend
the coverage of ground PM2.5 monitoring (Bi et al. 2019; Di et al. 2016; Hu et al. 2017;
Kloog et al. 2011; Ma et al. 2016; Xiao et al. 2017). Instruments aboard polar-orbiting
satellites such as the Moderate Resolution Imaging Spectroradiometer (MODIS) and the
Multi-angle Imaging SpectroRadiometer (MISR) have been supplying Aerosol Optical
Depth (AOD) retrievals with global coverage. AOD is a measure of aerosol extinction of the
solar beam along the entire vertical atmospheric column. The relationship of AOD to
ground-level PM2.5 depends on factors such as aerosol vertical profile, water content, size
distribution, and composition (Paciorek et al. 2008; van Donkelaar et al. 2010). Since many
of these factors are not available at large spatial scales, strategies such as statistical models
(Hu et al. 2014; Paciorek et al. 2008; Xiao et al. 2017) and chemical transport model
(CTM)-based scaling approaches (Liu et al. 2004; van Donkelaar et al. 2010) have been
developed to recover the AOD-PM2.5 relationship. Statistical models have been widely used
at urban to national scales due to their excellent performance and ability to yield
high-resolution predictions (Chu et al. 2016). Recently, there has been a growing trend of using
non-parametric machine learning models such as artificial neural networks (Di et al. 2016; Zou et
al. 2015) and random forests (Bi et al. 2019; Brokamp et al. 2018; Hu et al. 2017) to better
estimate PM2.5 based on AOD and other covariates. These methods have made it possible to
generate spatiotemporally complete estimates of PM2.5 levels in areas without ground
measurements (Di et al. 2016; Just et al. 2015; Ma et al. 2016; Wang et al. 2017).
Sufficient and well-distributed ground measurements are critical to the successful
development of statistical PM2.5 models. An unevenly distributed network may limit the use
of statistical models, and model quality may decrease significantly as the number of
ground measurements declines (Geng et al. 2018b). The validation of prediction results may
become unreliable when ground measurements are sparse, and the actual quality of
predictions is simply unknown in areas without ground measurements. The requirements
for ground stations are stricter when PM2.5 varies significantly at fine scales, especially
in areas with complex terrain and many local sources (Saide et al. 2011; van Donkelaar
et al. 2010; van Donkelaar et al. 2006) such as the western United States (Geng et al. 2018a; van
Donkelaar et al. 2006). Additionally, as regulatory monitoring primarily aims to assess
compliance with air quality standards rather than to assess exposure, existing ground stations are
unlikely to represent concentrations where sensitive subpopulations reside. This issue can
further limit the utility of regulatory monitoring data in community-level exposure
assessment.
Recently emerged low-cost PM2.5 sensors have the potential to fill in the gaps of regulatory
PM2.5 monitoring and to overcome the limitations of statistical models based solely on
regulatory measurements. With the features of lower instrument cost, ease of use, and
portability (Jiao et al. 2016; Snyder et al. 2013), low-cost PM2.5 sensors can be densely
deployed by researchers, grass-roots organizations, and citizen scientists. For example, a
commercial low-cost PM monitoring network established in 2015, PurpleAir (https://
www.purpleair.com/), has more than 7,000 nodes worldwide with a growth rate of ~30 per
day (Morawska et al. 2018). The emergence of low-cost sensors has been shifting the
paradigm of air pollution monitoring from being based solely on regulatory networks to
mixed networks consisting of both regulatory and low-cost monitors (Snyder et al. 2013),
and from being conducted by government agencies to increasingly commercial/crowd-
funded projects (Morawska et al. 2018; Snyder et al. 2013). As most of the low-cost PM2.5
sensors use optical light scattering to count particles and convert them to mass
concentrations, they tend to have a lower accuracy than regulatory monitors (Xu 2001).
However, growing efforts have been made to calibrate low-cost PM2.5 measurements in both
laboratory and ambient settings (Broday 2017; Cao and Thompson 2017; Castell et al. 2017;
Holstius et al. 2014; Kelly et al. 2017; Wang et al. 2015). Given their large numbers and
rapid growth, low-cost sensors are expected to shed light on more detailed spatial
variations of PM2.5 at finer scales.
To date, limited studies have focused on using low-cost sensor measurements to improve
PM2.5 prediction with high spatiotemporal resolutions. This study aimed to evaluate the
contribution of low-cost sensor measurements to the estimation of PM2.5 levels in the areas
where sparse regulatory monitors alone cannot support reliable predictions. This case study
focused on Imperial County, California, an exemplary region with PM2.5 pollution
intermittently exceeding the U.S. air quality standards (35 μg/m3 for 24-hour PM2.5 and 12
μg/m3 for annual PM2.5), especially near the U.S.-Mexico border. The PM2.5 pollution is also
associated with critical health issues, which prompted the creation of a community-based low-cost
monitoring network designed to address public concerns about the ability of regulatory
monitors to reflect true pollution levels in local communities (English et al. 2017). Daily PM2.5
predictions with a 1-km resolution were generated by the Random Forest algorithm with
satellite AOD and relevant covariates. The reliability of PM2.5 predictions before and after
the integration of low-cost PM2.5 measurements was investigated. The limitations of low-cost
PM2.5 measurements caused by their remaining uncertainty and the future prospects
for better utilizing these measurements were also discussed.
2. Data and Methods
2.1. Study Domain
Imperial County is located in the southern part of the U.S. state of California, bordering the
Mexican state of Baja California. This county has PM2.5 levels frequently exceeding the
U.S. air quality standard with a high rate of childhood asthma-related emergency room visits
(CEHTP 2018). The desert on its west side, the dry lake bed of a saline lake (the Salton Sea)
where an exposed playa is contributing to dust levels (Parajuli and Zender 2018), and the
transboundary pollution have caused substantial variability of PM2.5 levels in different
communities of the county (English et al. 2017). However, as of 2017, there were only three U.S.
Environmental Protection Agency (EPA) Air Quality System (AQS) stations within the
county and three additional stations nearby, across a study domain of over 40,000 square
kilometers (Figure 1). To meet local communities' requests for more extensive PM2.5
measurements, a low-cost PM2.5 monitoring network, Identifying Violations Affecting
Neighborhoods (IVAN), was established by a community-engaged research project
(English et al. 2017). As of 2017, IVAN had built ~40 community PM2.5 monitoring
sites throughout the county.
In this study, the AQS and calibrated IVAN measurements in Imperial County served
as ground truth for PM2.5 prediction. Figure 1 shows the study domain with the locations of
AQS and IVAN sites. The study domain extends 50 km beyond the county border to
include nearby AQS stations and better illustrate the patterns of transboundary pollution.
Within the study domain, there were 6 AQS stations and 39 IVAN sensors. A 1-km modeling
grid covers the study domain, which totals 41,344 grid cells. The modeling period was from
September 2016 to November 2017 to be consistent with the time span of available
calibrated IVAN PM2.5 measurements.
2.2. PM2.5 Measurements
Regulatory PM2.5 measurements were provided by the U.S. EPA AQS (https://www.epa.gov/
outdoor-air-quality-data). Low-cost PM2.5 measurements were provided by the IVAN air
monitoring system (https://www.ivan-imperial.org/). The IVAN low-cost PM sensor was a
modified version of the Dylos 1700 particle counter (Dylos Corporation, Riverside, California).
Raw particle counts from Dylos sensors were calibrated and converted to hourly PM2.5 mass
concentrations using the conversion equation developed by Carvlin et al. (2017). After a
validation with additional collocated reference instruments, Carvlin et al. (2017) found that
the conversion accuracy was moderate to high with R2 values ranging from 0.35 to 0.81 with
an average of 0.59. In this study, hourly IVAN PM2.5 concentrations were further averaged
into daily means (Section 1, Supplementary Material). Negative PM2.5 measurements from
both networks caused by random errors in a clean environment (approaching 0 μg/m3) were
retained to prevent systematic biases (Paciorek et al. 2008).
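The hourly-to-daily aggregation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `daily_means`, the input layout, and the `min_hours` completeness threshold are assumptions introduced here (the actual procedure is given in Section 1 of the Supplementary Material). Negative hourly values are deliberately kept, as in the text.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def daily_means(hourly, min_hours=1):
    """Average hourly PM2.5 records into daily means per site.

    hourly: iterable of (site_id, datetime, pm25) tuples.
    min_hours: hypothetical completeness threshold (an assumption,
               not a value taken from the paper).
    Negative values are kept so that noise near 0 ug/m3 is not
    biased upward by truncation.
    """
    buckets = defaultdict(list)
    for site_id, timestamp, pm25 in hourly:
        buckets[(site_id, timestamp.date())].append(pm25)
    return {
        key: fmean(values)
        for key, values in buckets.items()
        if len(values) >= min_hours
    }


# Made-up records for illustration only.
records = [
    ("site_a", datetime(2017, 1, 1, 0), 1.0),
    ("site_a", datetime(2017, 1, 1, 1), -0.5),  # negative reading is retained
    ("site_a", datetime(2017, 1, 2, 0), 4.0),
]
result = daily_means(records)
```

Dropping or clipping the negative reading in this toy example would raise the first daily mean, which is the kind of systematic bias the retention rule is meant to avoid.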
2.3. AOD Retrievals
The Multi-Angle Implementation of Atmospheric Correction (MAIAC) is an advanced
MODIS AOD product with global coverage at a 1-km spatial resolution on a daily basis
(MCD19, https://modis-land.gsfc.nasa.gov/MAIAC.html). In order to reflect daytime
changes of AOD, Terra (descending node at 10:30 A.M. local time) and Aqua (ascending
node at 1:30 P.M. local time) AOD served as two separate variables in the prediction models.
AOD retrievals flagged as poor quality by the MAIAC quality assessment parameters
were filtered out. We followed the approach proposed by Bi et al. (2019) to fill
in missing AOD observations, in which Random Forest models with AOD-related predictors
were established at the daily level (Section 2, Supplementary Material).
2.4. Meteorological Data
Cloud fraction, the percentage of cloud cover, is an important covariate in AOD gap-filling
since most of the missing AOD data in Imperial County were caused by cloud cover.
In this study, satellite-observed cloud fractions were obtained from the MODIS
Level-2 Cloud product (MOD06_L2/MYD06_L2, https://modis.gsfc.nasa.gov/). Other
meteorological variables were obtained from the High-Resolution Rapid Refresh (HRRR)
(https://rapidrefresh.noaa.gov/hrrr/), a real-time, 3-km resolution, hourly updated atmospheric
model operated by the National Oceanic and Atmospheric Administration. The HRRR meteorological
parameters included 2-meter temperature and specific humidity, planetary boundary layer
(PBL) height, sensible heat net flux, frictional velocity, and 10-meter wind direction and
wind speed. These HRRR fields were from the initial forecast hour of operational hourly 18-
hour forecast runs. The fields were obtained from the University of Utah Center for High-
Performance Computing real-time HRRR archive (http://hrrr.chpc.utah.edu/) (Blaylock et al.
2017).
2.5. Land-Use Data
The land-use parameters included 1) the Advanced Spaceborne Thermal Emission and
Reflection Radiometer (ASTER) Global Digital Elevation Model at a 1 arc-second (~30 m)
resolution (https://asterweb.jpl.nasa.gov/gdem.asp), 2) LandScan ambient population in
2016 at a 900-m resolution (https://web.ornl.gov/sci/landscan/), 3) Normalized Difference
Vegetation Index (NDVI) from the MODIS vegetation indices (MOD13/MYD13) at a 500-m
resolution, 4) the distance to the nearest major road computed from Topologically Integrated
Geographic Encoding and Referencing (TIGER)/Line Geodatabases of the U.S. Census
Bureau and DIVA-GIS (http://www.diva-gis.org/), 5) 0 – 10 cm soil moisture from the North
American Land Data Assimilation System (NLDAS) Noah Land Surface Model at a 0.125-
degree resolution, 6) 8-day land surface temperature from the MODIS land products
(MOD11A2/MYD11A2) at a 1-km resolution, and 7) the percentages of grassland and water
body calculated from GlobCover V2.3 land cover product (European Space Agency, http://
due.esrin.esa.int/page_globcover.php).
2.6. PM2.5 Prediction Models
To evaluate the contribution of low-cost sensor measurements to the quality of PM2.5
estimates, three models with different types of dependent variables were built: 1) the AQS-
only model, 2) the IVAN-only model, and 3) the AQS/IVAN-combined model. In the first
two models, either AQS or IVAN PM2.5 measurements were used as the dependent variable.
In the third model, both AQS and IVAN PM2.5 measurements were combined. Since the
IVAN measurements had been calibrated and validated with collocated reference-grade
measurements (Carvlin et al. 2017), we treated these measurements as ground truth and
simply merged them with AQS measurements. The three models shared the same set of
independent variables shown in Table 1. The models were based on the Random Forest (RF)
algorithm. RF is an “ensemble learning” method that generates a number of decision trees and
aggregates the regression results from these trees (Breiman 2001). Other statistical models
such as the multi-stage LME-GAM (Linear Mixed Effects-Generalized Additive Model)
(Xiao et al. 2017), XGBoost (Xiao et al. 2018), and artificial neural networks (Di et al.
2016) were also tested in the pilot stage of this study, but RF was able to generate the most
stable and accurate predictions. The number of decision trees in the forest (ntree) and the
number of predictors randomly tried at each split (mtry) are two major hyperparameters of
RF. In this study, ntree was set to 1000 to guarantee the stability of predictions and mtry
was tuned with cross-validation (CV) and determined to be 6. The prediction model could
generate spatiotemporally continuous PM2.5 estimates with a 1-km resolution at the daily
level. The models were evaluated with 10-fold CV (i.e., holding out 10% of
PM2.5 observations in each fold). Evaluation metrics included CV R2 and root-mean-square error
(RMSE). The 10-fold CV consisted of overall, spatial, and temporal CVs (Xiao et al. 2017).
The 10-fold spatial/temporal CV creates validation sets according to the locations/Julian days of
measurements (i.e., holding out 10% of all locations/days of observations in each fold). Spatial and
temporal CVs demonstrate how well the model predicts at locations and times other than those of
the observations used to train it. Additionally, the RF-specific “permutation accuracy
importance” (Breiman 2001) was used to reflect the importance of covariates in the
prediction model. This importance measure is estimated from the decrease in
prediction accuracy when the values of a given covariate are randomly permuted in the
“out-of-bag” sample (Liaw and Wiener 2002).
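The difference between the overall, spatial, and temporal CV schemes lies in what is assigned to a fold: individual observations, whole sites, or whole days. The sketch below illustrates this under stated assumptions; the function name `assign_folds`, the round-robin dealing of shuffled groups, and the toy data layout are hypothetical and are not taken from the paper.

```python
import random


def assign_folds(observations, mode="overall", k=10, seed=0):
    """Return a fold index (0..k-1) for each observation.

    observations: list of (site_id, day) pairs, one per PM2.5 measurement.
    mode: "overall" splits individual observations, "spatial" splits
          sites, and "temporal" splits days, so that whole sites or
          whole days are held out together.
    """
    rng = random.Random(seed)
    if mode == "overall":
        keys = list(range(len(observations)))
    elif mode == "spatial":
        keys = [site for site, _ in observations]
    elif mode == "temporal":
        keys = [day for _, day in observations]
    else:
        raise ValueError(f"unknown mode: {mode}")
    groups = sorted(set(keys))
    rng.shuffle(groups)
    # Deal the shuffled groups into k folds round-robin (about 10% each for k=10).
    fold_of_group = {group: i % k for i, group in enumerate(groups)}
    return [fold_of_group[key] for key in keys]


# Toy layout: 20 sites observed on 30 days (illustrative only).
obs = [(f"site_{s}", day) for s in range(20) for day in range(30)]
overall = assign_folds(obs, mode="overall")
spatial = assign_folds(obs, mode="spatial")
temporal = assign_folds(obs, mode="temporal")
```

In each round, the model would be trained on the nine remaining folds and evaluated on the held-out fold. With `mode="spatial"`, no site contributes to both training and validation in the same round.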
The independent variables were determined based on the PM2.5 emission features in
Imperial County. As fugitive dust was emitted from the dry lake bed of the Salton Sea (King
et al. 2011; Parajuli and Zender 2018), we used wind speed and direction, surface soil
moisture, and land surface temperature to reflect the properties of dust emission jointly.
PM2.5/PM10 ratio, i.e., the percentage of PM2.5 in PM10, was found to be a critical predictor
with a high RF variable importance value. This predictor has rarely been considered in
previous studies related to PM2.5 prediction. As several PM2.5 emission sources in Imperial
County also emitted a large amount of PM10 (e.g., dust emissions) (Chow et al. 2000;
Parajuli and Zender 2018), this ratio could help to modify the relationship between AOD and
PM2.5. Another ancillary covariate, PM2.5 convolutional layer, was created following Hu et
al. (2017) who showed that this variable could improve the accuracy of PM2.5 prediction by
considering PM2.5 spatial autocorrelation.
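The two ancillary covariates above can be illustrated with a short sketch. It assumes that the convolutional layer is an inverse-distance-weighted mean of same-day PM2.5 at surrounding monitors; the exact weighting used by Hu et al. (2017) and in this study is not given in the text, so the exponent `power`, the function names, and the handling of co-located monitors are all hypothetical.

```python
import math


def pm_ratio(pm25, pm10):
    """Fraction of PM10 mass that is PM2.5; None when PM10 is not positive."""
    if pm10 <= 0:
        return None
    return pm25 / pm10


def convolutional_pm25(target_xy, monitors, power=2.0):
    """Inverse-distance-weighted mean of PM2.5 at surrounding monitors.

    target_xy: (x_km, y_km) of the grid cell or site being predicted.
    monitors: list of ((x_km, y_km), pm25) pairs for the same day.
    power: distance-decay exponent (assumed value, not from the paper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), pm25 in monitors:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0:
            continue  # skip a co-located monitor so the target value does not leak in
        weight = 1.0 / distance ** power
        weighted_sum += weight * pm25
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


# Made-up values: two monitors 1 km and 2 km from the target.
example_layer = convolutional_pm25((0.0, 0.0), [((1.0, 0.0), 10.0), ((2.0, 0.0), 20.0)])
example_ratio = pm_ratio(5.0, 20.0)
```

Because nearer monitors receive larger weights, the layer carries the local PM2.5 context into the model, which is how it can account for spatial autocorrelation.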
3. Results
3.1. Summary Statistics and Modeling Performance
Within the study domain, AQS PM2.5 measurements had a mean of 8.55 μg/m3 with an
interquartile range (IQR) of 5.80 μg/m3 (25% and 75% percentiles: [5.00 μg/m3, 10.80 μg/
m3]). IVAN PM2.5 measurements had a mean of 7.44 μg/m3 with an IQR of 5.28 μg/m3
(25% and 75% percentiles: [3.65 μg/m3, 8.93 μg/m3]). AQS measured slightly higher PM2.5
concentrations (~1 μg/m3) than IVAN during the study period. The performance of the three
PM2.5 prediction models (AQS-only, IVAN-only, and AQS/IVAN) is summarized in Table
2. Figure 2 shows the CV scatter plots of the models. The six AQS stations provided only
1,617 samples, and the overall 10-fold CV R2 of the AQS-only model was 0.53.
The spatial CV R2 of the model dropped to 0.24, indicating that the AQS measurements
alone could not support reliable prediction of PM2.5 spatial patterns. In contrast, 39 IVAN
sensors provided 11,965 samples and the IVAN-only model had an overall 10-fold CV R2 of
0.75. The spatial and temporal CV R2 values of this model (0.64 and 0.70, respectively)
were slightly lower than the overall R2 but still significantly higher than those of the
AQS-only model. All three models had similar RMSE values, ranging from 3.71 to 3.76 μg/m3.
These values are reasonable and consistent with Hu et al. (2017), who reported a regional RMSE of
3.32 μg/m3 in the western climate region (including California and Nevada) in their U.S.
national PM2.5 prediction model for the year 2011.
Apart from the regular CV, the AQS measurements were also used as a test set to validate
the IVAN-only model. This validation was designed to examine to what extent the IVAN-
based predictions could agree with AQS measurements and whether the IVAN
measurements alone could support a reliable PM2.5 prediction model. The validation showed
an R2 of 0.43 between the AQS measurements and the IVAN-based predictions. This R2
value is lower than the CV R2 of the AQS-only model, 0.53 (Table 2). The decreased R2
indicated that the IVAN-based predictions still deviated from actual PM2.5 levels to a certain
degree. Given the moderate to high correlation between calibrated IVAN measurements and
collocated reference observations (Carvlin et al. 2017), we infer that the deviation may be
due to the uncertainty in calibrated IVAN measurements that calibration could not remove
(hereinafter, the remaining uncertainty of IVAN). Less representative siting of IVAN monitors
could be another potential reason for the lower agreement (Geng et al. 2018b). This
validation emphasized the importance and necessity of keeping high-quality regulatory
measurements in PM2.5 prediction, although they are temporally less frequent and spatially
sparser.
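The external validation described above compares predictions from one model against measurements it never saw. A minimal sketch of the two evaluation metrics follows. The paper does not state which definition of R2 was used, so the squared Pearson correlation here is an assumption, and the numbers are made up for illustration rather than taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical daily values in ug/m3: reference measurements held out as a
# test set, and predictions from a model trained on other data.
aqs_observed = [5.0, 8.0, 12.0, 20.0]
ivan_model_predicted = [6.0, 7.0, 13.0, 18.0]
test_rmse = rmse(aqs_observed, ivan_model_predicted)
test_r2 = r_squared(aqs_observed, ivan_model_predicted)
```

Note that a squared correlation is insensitive to a constant offset or scale error, so it is worth reading R2 together with RMSE when judging agreement with reference measurements.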
After combining the AQS and IVAN observations, the modeling performance decreased
slightly relative to the IVAN-only model, with overall, spatial, and temporal CV R2 values of
0.73, 0.63, and 0.70, respectively. Again, we inferred that the decreased performance could be caused by the
remaining uncertainty of IVAN. This uncertainty could be seen in the scatter plot of the
IVAN-only model as there were apparent outliers deviating from the 1:1 line (Figure 2(b)).
The remaining uncertainty again indicated that good fitting performance of the IVAN-only
model did not necessarily mean an accurate representation of actual PM2.5 levels. We
considered the AQS/IVAN model as the optimal model because it incorporated both detailed
spatial patterns of PM2.5 provided by IVAN and additional accurate PM2.5 information
provided by AQS.
3.2. Analyses with PM2.5 Predictions
Statistical metrics such as CV R2 and RMSE could only reflect model predictability at
monitoring locations. Within our study domain, ground monitors were not evenly
distributed, leaving large areas in the southern and eastern parts uncovered (Figure 1). This
uneven distribution reduced the effectiveness of the statistical metrics. Due to the lack of
reliable references regarding PM2.5 pollution in Imperial County from other sources, we
focused more on analyzing the features of prediction results to examine the quality of PM2.5
estimates and the contribution of IVAN measurements to the prediction.
Figure 3 shows the averaged distributions of daily PM2.5 estimates during the study period.
The AQS-based distribution emphasized PM2.5 pollution near major roads by showing
spatially resolved PM2.5 concentrations on the road network (Figure 3(a)). This road-specific
feature may be related to the fact that the AQS stations were relatively close to the major roads
in the study domain. The mean distance from the 6 AQS stations to the major roads was ~600 m
and the maximum distance was ~2,000 m. In contrast, the IVAN sites had a mean
distance of ~7,600 m with a maximum greater than 10,000 m. The distance-to-road values of the
IVAN sites were also distributed more evenly within their range than those of the AQS
stations. The lack of AQS stations away from the major roads reduced the ability of AQS to
reflect off-road pollution. The PM2.5 estimates derived from the IVAN-only model (Figure
3(b)) and the AQS/IVAN model (Figure 3(c)) showed smoother PM2.5 distributions rather than
road-specific patterns. This result indicates that more extensively distributed IVAN
measurements could better reflect off-road pollution sources such as dust, transboundary,
and agricultural emissions.
Apart from more credible PM2.5 spatial patterns, the contribution of IVAN measurements
can also be reflected by the importance of pollution source-related covariates in the models.
Table 3 shows the ten most important covariates determined by the RF algorithm in the three
models. In the AQS-only model, temporally varying parameters such as meteorological
parameters (PBL height, wind speed and direction, and sensible heat net flux) dominated the
most important covariates. This feature indicates that the sparse AQS measurements captured
temporal patterns of PM2.5 well but provided limited information regarding the spatial
distribution of PM2.5, which echoes the low spatial CV R2 of the AQS-only model (Table 2).
In contrast, time-invariant and source-related parameters, especially population,
elevation, the nearest distance to road, and the percentage of grassland, had increased
importance in the IVAN-only and AQS/IVAN models. The increased importance of these
source-related covariates indicated that spatially denser IVAN measurements resolved more
spatial information of PM2.5 in different geographical environments associated with varying
pollution sources.
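The covariate rankings discussed above rest on permutation importance. The sketch below shows the idea in a model-agnostic form: shuffle one covariate at a time and measure how much the prediction error grows. It is an assumption-laden illustration, not the RF implementation cited in the text: it uses a single held-out sample in place of per-tree out-of-bag samples, and `toy_predict` is a hypothetical stand-in for a fitted model.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between paired targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Mean increase in MSE when each column of `rows` is shuffled in turn.

    predict: any callable mapping a list of feature rows to predictions.
    rows, targets: held-out data (standing in for out-of-bag samples).
    """
    rng = random.Random(seed)
    baseline = mse(targets, predict(rows))
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            increases.append(mse(targets, predict(permuted)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(rows):
    """Toy stand-in model: the prediction depends only on the first feature."""
    return [2.0 * row[0] for row in rows]


toy_rows = [[float(i), float((i * 7) % 10)] for i in range(50)]
toy_targets = [2.0 * row[0] for row in toy_rows]
scores = permutation_importance(toy_predict, toy_rows, toy_targets)
```

In this toy case the first covariate receives a positive score and the second, which the model ignores, scores zero. A covariate that a model relies on for spatial contrast would rank high in the same way.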
The PM2.5 spatial patterns derived from the AQS/IVAN model can be explained
appropriately with the emission sources in Imperial County, and these patterns were also
consistent with the coarser distributions observed in previous studies (Di et al. 2016; Hu et
al. 2017; Parajuli and Zender 2018). In the AQS/IVAN-based estimates, the highest PM2.5
levels occurred on the U.S.-Mexico border, especially in the border cities of Calexico and
Mexicali, where the annual PM2.5 level exceeded the U.S. air quality standard of 12 μg/m3
during the study period. This transboundary PM2.5 hot-spot was also shown in Hu et al.
(2017), who estimated PM2.5 levels in the contiguous U.S. at a 12-km resolution. This hot-spot
was not captured by the AQS-based predictions because few AQS stations were located in
similar geographical environments (only one station near the border). Elevated PM2.5 levels
also occurred over the desert and exposed playa along the southwest shore of the Salton Sea.
These high PM2.5 levels were likely associated with dust emissions in these areas, which
is supported by Parajuli and Zender (2018), who suggested that newly exposed playa of the
Salton Sea has contributed a large amount of dust emissions on the southwest side of the
lake. Brawley, a city south of the Salton Sea, showed a moderate PM2.5 hot-spot with
mean PM2.5 concentrations ranging from 7.1 to 7.8 μg/m3. The elevated PM2.5 might be
related to the significant cattle and feed industry in the city as the pulverized manure and
animal activity in cattle feedlots may contribute to the emissions of ammonia and nitric
oxide that subsequently lead to the formation of secondary PM2.5 (Rogge et al. 2006; Wilson
et al. 2002).
It should be noted that although the PM2.5 patterns derived from the AQS/IVAN model
(Figure 3(c)) were similar to those of the IVAN-only model (Figure 3(b)) due to the
dominance of IVAN measurements, the additional AQS measurements still led to noticeable
changes. For example, the lower-left AQS station outside the county’s border resulted in
decreased PM2.5 levels in the neighboring broad, mountainous areas covered by dense
vegetation. The lower PM2.5 levels could be explained by the reduced ventilation and
transport of pollutants affected by topography and lower residential emissions associated with
fewer people living in the region (Chow et al. 2006). This result again shows the importance
of keeping AQS measurements in the prediction model despite their smaller sample size.
Figure S1 shows the PM2.5 distributions by season. In spring and summer, PM2.5 had higher
background levels and lower peak levels due to atmospheric conditions favorable for
diffusion. In contrast, PM2.5 tended to accumulate in winter due to stagnant weather
conditions.
3.3. Impact of IVAN Remaining Uncertainty
Given the moderate to high agreement between IVAN and collocated reference
measurements after calibration (Carvlin et al. 2017), we analyzed the prediction outliers to
evaluate the influence of the remaining uncertainty of IVAN on the prediction accuracy. We
defined an outlier as a cross-validation prediction that was more than a factor of two greater
or smaller than the corresponding measurement. As we kept negative PM2.5 observations, predictions
with a reversed sign were also considered outliers. Figure 4 shows the same CV scatter plot as
Figure 2(c), with outliers highlighted in different colors. There were 1,500 outliers among the
total 12,902 predictions, of which 312 were underestimated and 1,188 were overestimated.
Compared to previous PM2.5 modeling efforts based solely on regulatory measurements in
the U.S. (Ma et al. 2016; Xiao et al. 2017), this CV scatter plot had more apparent outliers.
Figure S2 shows the frequencies of outliers in different grid cells and Figure S3 shows the
relationships between the number of outliers and the number of total observations in a grid
cell. We found that the outliers were randomly distributed without specific spatiotemporal
Bi et al. Page 9
Environ Res
. Author manuscript; available in PMC 2021 January 01.
NASA Author Manuscript NASA Author Manuscript NASA Author Manuscript
patterns and the number of outliers was positively associated with the number of total
observations in a grid cell. These results suggest that the remaining uncertainty of IVAN still
had an evident influence on the predictions and homogeneously affected the modeling
accuracy. The only collocated AQS/IVAN site (Calexico-Ethel Site, located at the Calexico
High School on East Belcher Street) in the study domain could not support comprehensive
analyses of outliers, and it remains unknown what the sources of these outliers were,
how these sources were associated with the prediction accuracy, and why overestimated
outliers dominated the prediction biases.
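As an illustration of the outlier definition used in this section, the following minimal Python sketch labels cross-validation pairs with the factor-of-two rule and the reversed-sign rule. The function names, the treatment of zero values, and the way a direction is assigned to reversed-sign pairs are assumptions made for this example; they are not taken from the study's code.

```python
from collections import Counter


def classify_prediction(observed, predicted, factor=2.0):
    """Label one CV pair as 'normal', 'overestimated', or 'underestimated'."""
    if predicted == observed:
        return "normal"
    # Direction of the error (assumption: decided by the signed difference).
    direction = "overestimated" if predicted > observed else "underestimated"
    # A reversed sign counts as an outlier. A zero on either side is also
    # treated as an outlier here, which is an assumption of this sketch.
    if observed * predicted <= 0:
        return direction
    ratio = predicted / observed
    if ratio > factor or ratio < 1.0 / factor:
        return direction
    return "normal"


def summarize(pairs, factor=2.0):
    """Count labels over an iterable of (observed, predicted) pairs."""
    return Counter(classify_prediction(o, p, factor) for o, p in pairs)


# Hypothetical (observed, predicted) pairs in ug/m3, for illustration only.
example_pairs = [(10.0, 12.0), (10.0, 25.0), (10.0, 4.0), (-1.0, 2.0), (8.0, 16.0)]
counts = summarize(example_pairs)
print(counts["normal"], counts["overestimated"], counts["underestimated"])
```

For scale, the 1,500 outliers among 12,902 predictions reported above correspond to roughly 11.6% of the CV predictions.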
4. Discussion
Imperial County is an exemplary region for studying the effectiveness of low-cost PM
measurements in the U.S. The PM (PM2.5 and PM10) pollution in this county frequently
exceeds state and national air quality standards (CARB 2017). Poor air quality, poverty, and
a high unemployment rate are associated with severe health issues such as childhood asthma
and have led local residents to voice a growing need for a comprehensive and accurate
display of air quality (English et al. 2017). During the development of the IVAN network,
community members were involved in the study design and monitor siting, and the study
community partner staff were trained in monitor assembly/troubleshooting and data transfer
and analysis (Wong et al. 2018). The IVAN network is now community operated and
maintained. The community capacity developed to run the low-cost network addresses the core
of environmental health issues in this primarily Hispanic and monolingual area by providing
neighborhood-level data on air quality and increasing local environmental health literacy
(Garzón-Galvis et al. 2019).
In this study, we evaluated the contribution of IVAN to PM2.5 prediction in the region with
complex local PM2.5 sources and a sparse regulatory network. On the one hand, our results
showed that current AQS within the county could not support reliable PM2.5 predictions as
indicated by the significantly lower spatial CV R2 of the AQS-only model compared to its
overall CV R2. The IVAN measurements, albeit noisier, were found to be able to serve as an
effective supplement to the regulatory measurements to improve the modeling performance
and prediction quality. Dense IVAN measurements also helped the predictions better resolve
the spatial details of local pollution sources. On the other hand, although AQS measurements were spatially
sparser and temporally less frequent than IVAN measurements, these “gold-standard” measurements are still
indispensable in PM2.5 prediction. The necessity of keeping AQS was reflected by a lower
validation R2 when using the IVAN-only model to predict AQS measurements compared to
the CV R2 of the AQS-only model itself. The necessity was also reflected by more
reasonable PM2.5 prediction patterns around the AQS stations when combining AQS and
IVAN. The combined AQS/IVAN predictions were in line with the coarser PM2.5 patterns
generated by the national-level models (Di et al. 2016; Hu et al. 2017), and the predicted
PM2.5 hot-spots can be appropriately explained by local PM2.5 sources such as dust,
transboundary, and agricultural pollution. Our analyses indicated that the combination of
regulatory and low-cost sensor measurements is an effective way to improve the quality of
PM2.5 modeling and enable high-resolution PM2.5 predictions where they were
previously impossible.
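The contrast between overall and spatial CV discussed above can be made concrete with a small sketch of how the folds might be assigned. It assumes that overall CV holds out individual records, that spatial CV holds out whole sites (consistent with the leave-one-site-out note in Table 2), and that temporal CV holds out whole days; the site labels, day indices, and function names are hypothetical.

```python
import random


def grouped_folds(keys, n_folds, seed=0):
    """Assign a fold index to every record so that equal keys share a fold."""
    unique_keys = sorted(set(keys))
    random.Random(seed).shuffle(unique_keys)
    fold_of_key = {key: i % n_folds for i, key in enumerate(unique_keys)}
    return [fold_of_key[key] for key in keys]


def held_out_keys(keys, folds, fold):
    """Return (test_keys, train_keys) as sets for one fold."""
    test = {k for k, f in zip(keys, folds) if f == fold}
    train = {k for k, f in zip(keys, folds) if f != fold}
    return test, train


# Hypothetical records: (site_id, day_index) for 6 sites and 10 days.
records = [(site, day) for site in "ABCDEF" for day in range(10)]
sites = [site for site, _ in records]
days = [day for _, day in records]

# Overall CV: every record is its own group, so sites mix across folds.
overall_folds = grouped_folds(list(range(len(records))), n_folds=10)
# Spatial CV: all records of a site share a fold (leave-one-site-out here).
spatial_folds = grouped_folds(sites, n_folds=6)
# Temporal CV: all records of a day share a fold.
temporal_folds = grouped_folds(days, n_folds=10)

test_sites, train_sites = held_out_keys(sites, spatial_folds, fold=0)
print("held-out site:", sorted(test_sites), "training sites:", sorted(train_sites))
```

Under spatial CV the model is evaluated only at sites it never saw during training, which is why a large gap between overall and spatial CV R2 points to weak spatial generalization.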
To date, the proposed calibration methods for low-cost PM2.5 measurements have mainly
focused on correcting systematic biases rather than reducing random errors commonly
existing in the measurements (Carvlin et al. 2017; Holstius et al. 2014). In this study, we
found that the remaining errors in the IVAN measurements, especially random errors, still
had an apparent impact on the quality of PM2.5 prediction after a calibration aiming to
reduce the systematic biases (Carvlin et al. 2017). The influence of remaining uncertainty
was reflected by obvious outliers in cross-validation scatters. As the uncertainty had a
homogeneous effect on the predictions without obvious spatiotemporal patterns, it was
difficult to pinpoint and remove inaccurate measurements. Additional studies with sufficient
collocated regulatory/low-cost monitor pairs are needed for in-depth analyses regarding the
low-cost sensor measurements’ remaining uncertainty, e.g., the sources of uncertainty and
the quantitative influence of uncertainty on the prediction quality. Calibration methods
that aim to reduce the random errors of low-cost sensor measurements, in addition to
systematic biases, are also a potential way to improve the quality of PM2.5 prediction.
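The distinction between systematic bias and random error can be illustrated with a small simulation. The sketch below assumes a hypothetical sensor whose raw readings are a linear function of the reference value plus Gaussian noise, and uses an ordinary least squares line as a stand-in calibration; the gain, offset, and noise level are invented, and this is not the calibration applied to the IVAN data.

```python
import random
import statistics


def simulate_sensor(n=2000, gain=1.6, offset=3.0, noise_sd=4.0, seed=42):
    """Return (reference, raw) lists; raw has a systematic bias plus random noise."""
    rng = random.Random(seed)
    reference = [rng.uniform(2.0, 40.0) for _ in range(n)]
    raw = [gain * r + offset + rng.gauss(0.0, noise_sd) for r in reference]
    return reference, raw


def fit_linear_calibration(raw, reference):
    """Ordinary least squares fit of reference = slope * raw + intercept."""
    mean_raw = statistics.fmean(raw)
    mean_ref = statistics.fmean(reference)
    sxx = sum((x - mean_raw) ** 2 for x in raw)
    sxy = sum((x - mean_raw) * (y - mean_ref) for x, y in zip(raw, reference))
    slope = sxy / sxx
    return slope, mean_ref - slope * mean_raw


reference, raw = simulate_sensor()
slope, intercept = fit_linear_calibration(raw, reference)
calibrated = [slope * x + intercept for x in raw]

errors_before = [x - r for x, r in zip(raw, reference)]
errors_after = [c - r for c, r in zip(calibrated, reference)]

bias_before = statistics.fmean(errors_before)   # systematic part, large
bias_after = statistics.fmean(errors_after)     # removed by the linear fit
spread_after = statistics.pstdev(errors_after)  # random part, still present
print(f"mean bias before: {bias_before:.2f}, after: {bias_after:.2f}")
print(f"random error (SD) remaining after calibration: {spread_after:.2f}")
```

In this toy setting the linear fit removes the mean bias but leaves the scatter of individual readings, which is the kind of residual error that can still produce outliers in a downstream prediction model.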
Although spatiotemporally continuous PM2.5 can be generated with CTMs, their simulations
struggle to reflect detailed PM2.5 pollution patterns at the community level. Specifically,
the relatively coarse resolution and restricted emission information of CTMs limit their
ability to characterize PM2.5 distribution at small scales (Jerrett et al. 2005). Our study
showed that the combination of dense and frequent low-cost sensor measurements,
spatiotemporally continuous satellite AOD retrievals, and accurate reference-grade
measurements is a possible solution to derive high-resolution PM2.5 distribution details.
Although the U.S. has one of the densest regulatory air quality monitoring networks in the
world, only ~2% of its counties have more active AQS PM2.5 stations than Imperial County
(with 3 active stations) (Figure 5(a)), and only ~20% of the counties have a higher AQS
station density than Imperial County (2.58 × 10−4 stations per square kilometer) (Figure
5(b)). Accordingly, low-cost sensors have enormous potential to be applied to the vast
regions in the U.S. and a large part of the world with sparse regulatory monitors to better
support small-scale PM2.5 prediction and help address PM2.5-related health issues.
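The station density quoted above is simple arithmetic, and a short sketch shows what it implies. The two input values are taken from the text; the county area is back-calculated from them rather than taken from an external source, and the variable names are illustrative.

```python
active_stations = 3        # active AQS PM2.5 stations in Imperial County (from the text)
density_per_km2 = 2.58e-4  # stations per square kilometer (from the text)

# Area implied by the two reported numbers.
implied_area_km2 = active_stations / density_per_km2
# Average area that a single regulatory station has to represent.
area_per_station_km2 = 1.0 / density_per_km2

print(f"implied county area: {implied_area_km2:,.0f} km2")
print(f"area per station: {area_per_station_km2:,.0f} km2")
```

With these numbers each regulatory station represents close to 3,900 square kilometers on average, whereas the predictions in this study are made on a 1-km grid.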
A major limitation of this study is the lack of reliable reference PM2.5 measurements in
Imperial County from other sources, which prevented quantitative assessments of our PM2.5
estimates. However, many clues regarding the prediction models such as CV performance,
variable importance, and spatial patterns of PM2.5 estimates provided evidence that the
integration of IVAN could lead to a better prediction quality in the region. Additional studies
with sufficient reference measurements are needed to further prove the findings. The scale of
the IVAN network is another potential limitation affecting the generalizability of our
findings. As a county-level low-cost network with ~40 sensors that has been well
maintained and operated by local communities, IVAN may not be representative of other low-cost
PM networks worldwide, which may be less well maintained. A more general and
extensive low-cost PM network is needed to further examine the effectiveness of our
proposed PM2.5 prediction framework and to test new methods for better utilization of
low-cost sensor measurements in PM2.5 prediction. PurpleAir (https://www.purpleair.com/),
a worldwide commercial PM monitoring network built with low-cost sensors, is a potential
candidate once it achieves sufficient coverage and density.
5. Conclusion
With an exemplary low-cost air quality monitoring network in Imperial County, IVAN, we
evaluated the contribution of low-cost sensor measurements to PM2.5 prediction when
regulatory measurements are insufficient to support reliable small-scale PM2.5 modeling.
This study showed that the integration of a large number of low-cost sensor measurements
with sparse regulatory measurements is an effective way to significantly improve the quality of
PM2.5 prediction. This study also highlighted the need for more effective calibration
or integration methods to mitigate the negative impact caused by the remaining uncertainty
in low-cost sensor measurements on the prediction quality. This is the first study to report
high-resolution PM2.5 distributions in Imperial County by virtue of dense low-cost sensor
measurements. The proposed PM2.5 prediction framework with low-cost sensor
measurements has enormous potential to be applied in vast areas worldwide with insufficient
regulatory stations to identify PM2.5 pollution details which are fundamental to PM2.5-
related health research.
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Acknowledgments
The work of J. Bi, J. Stowell, and Y. Liu was supported by the National Aeronautics and Space Administration
(NASA) Applied Sciences Program (Grant # NNX16AQ28Q and 80NSSC19K0191). The work of F. Freedman was
supported by the NASA Applied Sciences Program (Grant # NNX16AQ91G). The work of P. English was
supported by the National Institute of Environmental Health Sciences of the National Institutes of Health (NIH)
(Grant # R01ES022722). The content is solely the responsibility of the authors and does not necessarily represent
the official views of NASA or NIH.
The authors acknowledge the Imperial Valley Air Network (IVAN) team of researchers and stakeholders at Comite
Civico Del Valle, Inc. led by Luis Olmedo, Tracking California, and the University of Washington for providing
and assisting us in utilizing the IVAN air monitoring data for this study. See https://ivan-imperial.org/ for more
details about the IVAN monitoring program.
References
Bi J, Belle JH, Wang Y, Lyapustin AI, Wildani A, & Liu Y (2019). Impacts of snow and cloud covers
on satellite-derived PM2.5 levels. Remote Sensing of Environment, 221, 665–674 [PubMed:
31359889]
Blaylock BK, Horel JD, & Liston ST (2017). Cloud archiving and data mining of high-resolution rapid
refresh forecast model output. Computers & Geosciences, 109, 43–50
Bose S, Hansel N, Tonorezos E, Williams D, Bilderback A, Breysse P, Diette G, & McCormack MC
(2015). Indoor particulate matter associated with systemic inflammation in COPD. Journal of
Environmental Protection, 6, 566
Breiman L (2001). Random forests. Machine Learning, 45, 5–32
Broday DM (2017). Wireless Distributed Environmental Sensor Networks for Air Pollution
Measurement—The Promise and the Current Reality. Sensors, 17, 2263
Brokamp C, Jandarov R, Hossain M, & Ryan P (2018). Predicting Daily Urban Fine Particulate Matter
Concentrations Using a Random Forest Model. Environmental science & technology, 52, 4173–
4179 [PubMed: 29537833]
Burnett RT, Pope CA 3rd, Ezzati M, Olives C, Lim SS, Mehta S, Shin HH, Singh G, Hubbell B, Brauer
M, Anderson HR, Smith KR, Balmes JR, Bruce NG, Kan H, Laden F, Pruss-Ustun A, Turner MC,
Gapstur SM, Diver WR, & Cohen A (2014). An integrated risk function for estimating the global
burden of disease attributable to ambient fine particulate matter exposure. Environ Health Perspect,
122, 397–403 [PubMed: 24518036]
Cao T, & Thompson JE (2017). Portable, Ambient PM2.5 Sensor for Human and/or Animal Exposure
Studies. Analytical Letters, 50, 712–723
CARB (2017). Air Quality Trends Summaries. In: California Air Resources Board
Carvlin GN, Lugo H, Olmedo L, Bejarano E, Wilkie A, Meltzer D, Wong M, King G, Northcross A,
Jerrett M, English PB, Hammond D, & Seto E (2017). Development and field validation of a
community-engaged particulate matter air quality monitoring network in Imperial, California,
USA. J Air Waste Manag Assoc, 67, 1342–1352 [PubMed: 28829718]
Castell N, Dauge FR, Schneider P, Vogt M, Lerner U, Fishbain B, Broday D, & Bartonova A (2017).
Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure
estimates? Environment international, 99, 293–302 [PubMed: 28038970]
CEHTP (2018). Emergency department visits due to asthma. In: California Environmental Health
Tracking Program
Chow JC, Chen LWA, Watson JG, Lowenthal DH, Magliano KA, Turkiewicz K, & Lehrman DE
(2006). PM2.5 chemical composition and spatiotemporal variability during the California Regional
PM10/PM2.5 Air Quality Study (CRPAQS). Journal of Geophysical Research: Atmospheres, 111
Chow JC, Watson JG, Green MC, Lowenthal DH, Bates B, Oslund W, & Torres G (2000). Cross-
border transport and spatial variability of suspended particles in Mexicali and California’s Imperial
Valley. Atmospheric Environment, 34, 1833–1843
Chu Y, Liu Y, Li X, Liu Z, Lu H, Lu Y, Mao Z, Chen X, Li N, & Ren M (2016). A review on
predicting ground PM2.5 concentration using satellite aerosol optical depth. Atmosphere, 7, 129
Di Q, Kloog I, Koutrakis P, Lyapustin A, Wang Y, & Schwartz J (2016). Assessing PM2.5 Exposures
with High Spatiotemporal Resolution across the Continental United States. Environ Sci Technol,
50, 4712–4721 [PubMed: 27023334]
English PB, Olmedo L, Bejarano E, Lugo H, Murillo E, Seto E, Wong M, King G, Wilkie A, &
Meltzer D (2017). The Imperial County Community Air Monitoring Network: a model for
community-based environmental monitoring for public health action. Environmental health
perspectives, 125
Garzón-Galvis C, Wong M, Madrigal D, Olmedo L, Brown M, & English P (2019). Advancing
Environmental Health Literacy Through Community-Engaged Research and Popular Education.
In Environmental Health Literacy (pp. 97–134). Springer
Geng G, Murray NL, Tong D, Fu JS, Hu X, Lee P, Meng X, Chang HH, & Liu Y (2018a). Satellite-
Based Daily PM2.5 Estimates During Fire Seasons in Colorado. Journal of Geophysical Research:
Atmospheres, 123, 8159–8171
Geng GN, Murray NL, Chang HH, & Liu Y (2018b). The sensitivity of satellite-based PM2.5
estimates to its inputs: Implications to model development in data-poor regions. Environment
international, 121, 550–560 [PubMed: 30300813]
Hall ES, Kaushik SM, Vanderpool RW, Duvall RM, Beaver MR, Long RW, & Solomon PA (2014).
Integrating sensor monitoring technology into the current air pollution regulatory support
paradigm: Practical considerations. American Journal of Environmental Engineering, 4, 147–154
Holstius DM, Pillarisetti A, Smith K, & Seto E (2014). Field calibrations of a low-cost aerosol sensor
at a regulatory monitoring site in California. Atmospheric Measurement Techniques, 7, 1121–1131
Hu XF, Belle JH, Meng X, Wildani A, Waller LA, Strickland MJ, & Liu Y (2017). Estimating PM2.5
Concentrations in the Conterminous United States Using the Random Forest Approach.
Environmental science & technology, 51, 6936–6944 [PubMed: 28534414]
Hu XF, Waller LA, Lyapustin A, Wang YJ, Al-Hamdan MZ, Crosson WL, Estes MG, Estes SM,
Quattrochi DA, Puttaswamy SJ, & Liu Y (2014). Estimating ground-level PM2.5 concentrations in
the Southeastern United States using MAIAC AOD retrievals and a two-stage model. Remote
Sensing of Environment, 140, 220–232
Jerrett M, Arain A, Kanaroglou P, Beckerman B, Potoglou D, Sahsuvaroglu T, Morrison J, & Giovis C
(2005). A review and evaluation of intraurban air pollution exposure models. J Expo Anal Environ
Epidemiol, 15, 185 [PubMed: 15292906]
Jiao W, Hagler G, Williams R, Sharpe R, Brown R, Garver D, Judge R, Caudill M, Rickard J, & Davis
M (2016). Community Air Sensor Network (CAIRSENSE) project: evaluation of low-cost sensor
performance in a suburban environment in the southeastern United States. Atmospheric
Measurement Techniques, 9, 5281–5292
Just AC, Wright RO, Schwartz J, Coull BA, Baccarelli AA, Tellez-Rojo MM, Moody E, Wang YJ,
Lyapustin A, & Kloog I (2015). Using High-Resolution Satellite Aerosol Optical Depth To
Estimate Daily PM2.5 Geographical Distribution in Mexico City. Environmental science &
technology, 49, 8576–8584 [PubMed: 26061488]
Kelly K, Whitaker J, Petty A, Widmer C, Dybwad A, Sleeth D, Martin R, & Butterfield A (2017).
Ambient and laboratory evaluation of a low-cost particulate matter sensor. Environmental
pollution, 221, 491–500 [PubMed: 28012666]
King J, Etyemezian V, Sweeney M, Buck BJ, & Nikolich G (2011). Dust emission variability at the
Salton Sea, California, USA. Aeolian Research, 3, 67–79
Kloog I, Koutrakis P, Coull BA, Lee HJ, & Schwartz J (2011). Assessing temporally and spatially
resolved PM2.5 exposures for epidemiological studies using satellite aerosol optical depth
measurements. Atmospheric Environment, 45, 6267–6275
Liaw A, & Wiener M (2002). Classification and regression by randomForest. R news, 2, 18–22
Liu Y, Park RJ, Jacob DJ, Li Q, Kilaru V, & Sarnat JA (2004). Mapping annual mean ground-level
PM2.5 concentrations using Multiangle Imaging Spectroradiometer aerosol optical thickness over
the contiguous United States. Journal of Geophysical Research: Atmospheres, 109
Ma ZW, Hu XF, Sayer AM, Levy R, Zhang Q, Xue YG, Tong SL, Bi J, Huang L, & Liu Y (2016).
Satellite-Based Spatiotemporal Trends in PM2.5 Concentrations: China, 2004–2013.
Environmental health perspectives, 124, 184–192 [PubMed: 26220256]
Madrigano J, Kloog I, Goldberg R, Coull BA, Mittleman MA, & Schwartz J (2013). Long-term
exposure to PM2.5 and incidence of acute myocardial infarction. Environ Health Perspect, 121,
192–196 [PubMed: 23204289]
Morawska L, Thai PK, Liu X, Asumadu-Sakyi A, Ayoko G, Bartonova A, Bedini A, Chai F,
Christensen B, & Dunbabin M (2018). Applications of low-cost sensing technologies for air
quality monitoring and exposure assessment: How far have they gone? Environment international,
116, 286–299 [PubMed: 29704807]
Paciorek CJ, Liu Y, Moreno-Macias H, & Kondragunta S (2008). Spatiotemporal associations between
GOES aerosol optical depth retrievals and ground-level PM2.5. Environmental science &
technology, 42, 5800–5806 [PubMed: 18754512]
Parajuli SP, & Zender CS (2018). Projected changes in dust emissions and regional air quality due to
the shrinking Salton Sea. Aeolian Research, 33, 82–92
Rogge WF, Medeiros PM, & Simoneit BR (2006). Organic marker compounds for surface soil and
fugitive dust from open lot dairies and cattle feedlots. Atmospheric Environment, 40, 27–49
Saide PE, Carmichael GR, Spak SN, Gallardo L, Osses AE, Mena-Carrasco MA, & Pagowski M
(2011). Forecasting urban PM10 and PM2.5 pollution episodes in very stable nocturnal conditions
and complex terrain using WRF–Chem CO tracer model. Atmospheric Environment, 45, 2769–
2780
Snyder EG, Watkins TH, Solomon PA, Thoma ED, Williams RW, Hagler GSW, Shelow D, Hindin DA,
Kilaru VJ, & Preuss PW (2013). The Changing Paradigm of Air Pollution Monitoring.
Environmental science & technology, 47, 11369–11377 [PubMed: 23980922]
Sorek-Hamer M, Just AC, & Kloog I (2016). Satellite remote sensing in epidemiological studies. Curr
Opin Pediatr, 28, 228–234 [PubMed: 26859287]
van Donkelaar A, Martin RV, Brauer M, Kahn R, Levy R, Verduzco C, & Villeneuve PJ (2010). Global
estimates of ambient fine particulate matter concentrations from satellite-based aerosol optical
depth: development and application. Environ Health Perspect, 118, 847–855 [PubMed: 20519161]
van Donkelaar A, Martin RV, & Park RJ (2006). Estimating ground-level PM2.5 using aerosol optical
depth determined from satellite remote sensing. Journal of Geophysical Research: Atmospheres,
111
Wang W, Mao F, Du L, Pan Z, Gong W, & Fang S (2017). Deriving hourly PM2.5 concentrations from
Himawari-8 AODs over Beijing–Tianjin–Hebei in China. Remote Sensing, 9, 858
Wang Y, Li J, Jing H, Zhang Q, Jiang J, & Biswas P (2015). Laboratory evaluation and calibration of
three low-cost particle sensors for particulate matter measurement. Aerosol Science and
Technology, 49, 1063–1077
Wilson SC, Morrow-Tesch J, Straus DC, Cooley JD, Wong WC, Mitlohner FM, & McGlone JJ (2002).
Airborne microbial flora in a cattle feedlot. Appl Environ Microbiol, 68, 3238–3242 [PubMed:
12088999]
Wong M, Bejarano E, Carvlin G, Fellows K, King G, Lugo H, Jerrett M, Meltzer D, Northcross A, &
Olmedo L (2018). Combining community engagement and scientific approaches in next-
generation monitor siting: the case of the imperial county community air network. International
Journal of Environmental Research and Public Health, 15, 523
Xiao Q, Chang HH, Geng G, & Liu Y (2018). An Ensemble Machine-Learning Model To Predict
Historical PM2.5 Concentrations in China from Satellite Data. Environ Sci Technol, 52, 13260–
13269 [PubMed: 30354085]
Xiao QY, Wang YJ, Chang HH, Meng X, Geng GN, Lyapustin A, & Liu Y (2017). Full-coverage high-
resolution daily PM2.5 estimation using MAIAC AOD in the Yangtze River Delta of China.
Remote Sensing of Environment, 199, 437–446
Xu R (2001). Particle characterization: light scattering methods. Springer Science & Business Media
Zeger SL, Thomas D, Dominici F, Samet JM, Schwartz J, Dockery D, & Cohen A (2000). Exposure
measurement error in time-series studies of air pollution: concepts and consequences.
Environmental health perspectives, 108, 419 [PubMed: 10811568]
Zou B, Wang M, Wan N, Wilson JG, Fang X, & Tang YQ (2015). Spatial modeling of PM2.5
concentrations with a multifactoral radial basis function neural network. Environmental Science
and Pollution Research, 22, 10395–10404 [PubMed: 25813644]
Highlights
Ground-level PM2.5 was assessed with low-cost, regulatory, and satellite data
Low-cost sensor measurements contributed to improved modeling performance
Reasonable PM2.5 spatial details were revealed due to abundant low-cost data
Remaining uncertainty in calibrated low-cost data still affected modeling accuracy
Figure 1.
Study domain (latitude: [32.2°N, 33.9°N]; longitude: [116.6°W, 113.9°W]). Imperial County
is part of the Southern California border region contiguous to the Mexican state of Baja
California. The area surrounded by the dashed line is a buffer mainly used for better
reflecting transboundary pollution.
Figure 2.
10-fold CV scatter plots of three models: (a) AQS-only model, (b) IVAN-only model, and
(c) AQS/IVAN model.
Figure 3.
Mean PM2.5 distributions for the period September 2016 to November 2017 generated by
three models: (a) AQS-only model, (b) IVAN-only model, and (c) AQS/IVAN model. The
points show mean PM2.5 concentrations at the AQS and IVAN stations during the period.
Figure 4.
The 10-fold CV scatter plot of the AQS/IVAN model. The black dashed lines (with slopes of
2 and 0.5) divide the points into normal predictions and outliers. The points in red are
overestimated outliers and the points in blue are underestimated outliers.
Figure 5.
The contiguous U.S. counties in blue are those with a higher (a) number (~2% of the total
counties) or (b) density (~20% of the total counties) of AQS PM2.5 stations than Imperial
County as of 2017. The red areas are the potential regions in the U.S. where our proposed
PM2.5 prediction framework with low-cost sensor measurements can be applied to generate
PM2.5 spatial details.
Table 1
Independent variables in three PM2.5 prediction models (s - spatially varying; t - temporally varying).
MAIAC AOD
  Gap-filled Terra/Aqua AOD (s,t)
PM2.5-ancillary variables
  PM2.5 convolutional layer (s,t)
  PM2.5/PM10 ratio (t)
Land-use variables
  Elevation (s)
  Population (s)
  NDVI (s,t)
  Nearest distance to road (s)
  0 – 10 cm soil moisture (s,t)
  Land surface temperature (s,t)
  Percentage of grassland (s)
  Percentage of water body (s)
Meteorological variables
  2-meter temperature (s,t)
  2-meter specific humidity (s,t)
  Planetary boundary layer height (s,t)
  Sensible heat net flux (s,t)
  Frictional velocity (s,t)
  10-meter wind direction (s,t)
  10-meter wind speed (s,t)
Table 2
The performance of three models with overall, spatial, and temporal CV R2 and RMSEs.
Model       N        Overall CV R2   Spatial CV R2   Temporal CV R2   RMSE (μg/m3)
AQS-Only    1,617    0.53            0.24*           0.55             3.76
IVAN-Only   11,965   0.75            0.64            0.70             3.71
AQS/IVAN    12,902   0.73            0.63            0.70             3.72
* 6-fold (leave-one-site-out) spatial CV, as there were only 6 AQS stations.
Table 3
The top-10 important covariates determined by the RF algorithm in three prediction models. The bold font
highlights the time-invariant and source-related covariates with increased importance after the addition of
IVAN.
Rank   AQS-Only                    IVAN-Only                   AQS/IVAN
1      PM2.5 convolutional layer   PM2.5 convolutional layer   PM2.5 convolutional layer
2      PBL height                  PBL height                  Population
3      NDVI                        Population                  Elevation
4      0 – 10 cm soil moisture     Elevation                   PBL height
5      10-meter wind direction     PM2.5/PM10 ratio            NDVI
6      2-meter specific humidity   0 – 10 cm soil moisture     Percentage of grassland
7      10-meter wind speed         NDVI                        PM2.5/PM10 ratio
8      Sensible heat net flux      Nearest distance to road    0 – 10 cm soil moisture
9      PM2.5/PM10 ratio            Percentage of grassland     Nearest distance to road
10     Frictional velocity         2-meter specific humidity   2-meter temperature
Environ Res
. Author manuscript; available in PMC 2021 January 01.
... applications was found to enhance detailed spatial features Bi et al., 2020a) and improve performance under atypical conditions, e.g. during wildfires (Gupta et al., 2018;Bi et al., 2020b). LCS data have been combined with geostationary satellite AOD and other data to improve hourly PM2.5 reconstruction during wildfire events (Vu et al., 2022). ...
... Furthermore, inclusion of LCS data was noted to provide greater detail and enhanced spatial distribution while maintaining desirable characteristics of reconstruction generated from the RGM and AOD information alone. Satellite AOD information, meteorological data, and land use data were combined via a random forest ML approach to reconstruct daily PM2.5 at 1 km resolution for Imperial Valley, California, USA (Bi et al., 2020a). In cross-validation testing, incorporating LCS data alongside the RGM for model training resulted in improved accuracy compared to using the RGM data alone. ...
Technical Report
Full-text available
Executive Summary Low-cost air quality sensor systems (LCS) are a key emerging class of technologies for expanding policy-relevant air quality analysis, including assessing levels of pollution, identifying sources, and producing forecasts. An LCS contains one or more sensing elements together with hardware and software for control, power supply, data management, and weatherproofing, constituting a complete system capable of collecting atmospheric composition data. The “low cost” of LCS refers to their per-unit capital cost in relation to more traditional reference grade monitors (RGM). However, technical trade-offs which enable this lower cost usually also limit data quality, selectivity, sensitivity to low concentrations, robustness under high concentrations, and/or operational lifetime compared to RGM. These properties also vary across LCS technologies and measured pollutants, i.e. gases or particles. The necessary calibration and data quality control processes needed to establish confidence in LCS data, together with the infrastructure and personnel needed to support networks with multiple LCS in a region, can significantly add to their initial costs. Despite these challenges, LCS represent a key tool for filling gaps in existing global and local air quality monitoring networks and contributing information for policy-relevant air quality products. In recent years, wide-scale deployments of LCS have been made in low- and middle-income countries, where they often provide air quality information in regions lacking RGM networks, as well as in high-income countries, where they typically supplement existing RGM with more localized near real-time air quality information. The aim of the present document is to discuss the use of LCS at a network level and along with other information sources to analyse levels, variations, sources, and other aspects of air quality. 
This application perspective complements the series of World Meteorological Organization (WMO) reports on low-cost sensors for the measurement of atmospheric composition published in 2018 and 2020. These previous WMO reports focus on operating principles for the use of LCS in measuring different constituents, best practices for calibration, performance assessment, and strategies for communicating LCS data to the public.
... For study conducted by Bi et al. (Bi et al. 2020), Imperial County, California, was the site of a case study focusing continuously exceeded PM 2.5 levels of US air quality standards. For PM 2.5 measurements, a community-operated low-cost sensor network IVAN was employed. ...
Article
Full-text available
The rapid increase in urban populations has led to escalating traffic and higher levels of air pollutants, posing significant threats to urban health. In response, there is growing demand for accessible, real-time, and widespread air quality monitoring systems. This review focuses on the potential of low-cost air quality sensors to meet this demand, with emphasis on their ability to provide high-density spatiotemporal data at a lower cost. The paper critically examines current low-cost air quality sensors, including Wireless Sensor Network (WSN) and Internet of Things (IoT)-based solutions, through both field experiments and laboratory studies. A key contribution of this review is the comprehensive evaluation of calibration methods, showing how factors such as temperature and humidity influence sensor performance. The review highlights common challenges like sensor accuracy, cross-sensitivity, and data quality, offering insights into effective strategies such as calibration against reference instruments and advanced data validation techniques. Ultimately, this review underscores the potential of low-cost sensors in revolutionizing air pollution monitoring, while also addressing the practical challenges that must be resolved to fully realize their capabilities.
... Low-cost sensors are now an option for air monitoring; they help close the gaps in regulatory PM monitoring and address the limitations of statistical models that rely only on regulatory measurements [22]. With low-cost air monitoring, sampling points can cover a wider area and better explain the spatial variation of air pollutants, for example by identifying pollution hotspots and supporting pollutant emission inventories [23,24]. ...
Article
Air pollution poses significant health risks worldwide, causing diseases such as cardiovascular ailments, asthma, and premature mortality. This study investigates the influence of wind speed and direction on the performance accuracy of low-cost PM sensors, specifically the SEN1077 model. The research aims to evaluate the sensor's sensitivity and consistency under varying wind conditions, focusing on different wind directions and speeds. Indoor tests were conducted using the SEN1077 sensor connected to an Arduino processing unit. The sensor's performance was assessed across three wind speeds: 0.863 m/s, 1.791 m/s, and 2.789 m/s, and various wind directions, including 45°, 90°, 135°, 180°, 225°, 270°, 315°, and 360°. The data analysis, using ANOVA, revealed that wind speed significantly impacts PM measurements. Higher wind speeds resulted in lower PM readings, with PM2.5 values dropping from an average of 25.2 at 0.863 m/s to 16.4 at 2.789 m/s. The variance in measurements also decreased with increasing wind speeds, indicating more consistent sensor readings. The findings confirm that the SEN1077 sensor maintains consistent sensitivity despite variations in wind direction, with low variance in measurements (e.g., 0.8455556 for PM10 at 0.863 m/s). Major conclusions indicate that while wind direction has a minimal impact on sensor accuracy, wind speed significantly affects PM measurements. This study's contributions include providing insights into the robustness of low-cost PM sensors and emphasizing the need for proper calibration. Practical implications involve improving air quality monitoring systems, while social implications focus on better informing public health policies and pollution control measures.
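The ANOVA comparison described in this abstract can be illustrated with a minimal sketch. It assumes a one-way design with wind speed as the only factor; the abstract does not state the exact ANOVA design used. The readings below are hypothetical values chosen only to resemble the reported averages, and the function name `one_way_anova_f` is illustrative, not taken from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the readings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical PM2.5 readings keyed by wind speed in m/s (not study data)
readings_by_wind_speed = {
    0.863: [25.0, 26.1, 24.5, 25.2],
    1.791: [20.3, 21.0, 19.8, 20.5],
    2.789: [16.0, 16.9, 16.2, 16.5],
}
f_stat = one_way_anova_f(list(readings_by_wind_speed.values()))
print(f"F = {f_stat:.1f}")
```

A large F value means the differences between the wind-speed groups are large relative to the scatter within each group; turning it into a p-value requires the F distribution with (k − 1, N − k) degrees of freedom.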
... Successful calibration using MLR was demonstrated in [37] for a chemiluminescence NO-NO2-NOx analyser, also utilizing temperature and humidity data. Further studies focusing on regression-based sensor calibration can be found in [38][39][40]. ...
Article
Accurate tracking of harmful gas concentrations is essential to swiftly and effectively execute measures that mitigate the risks linked to air pollution, specifically in reducing its impact on living conditions, the environment, and the economy. One such prevalent pollutant in urban settings is nitrogen dioxide (NO2), generated from the combustion of fossil fuels in car engines, commercial manufacturing, and food processing. Its elevated levels have adverse effects on the human respiratory system, exacerbating asthma and potentially causing various lung diseases. However, precise monitoring of NO2 requires intricate and costly equipment, prompting the need for more affordable yet dependable alternatives. This paper introduces a new method for reliably calibrating cost-effective NO2 sensors by integrating machine learning with neural network surrogates, global data scaling, and an expanded set of correction model inputs. These inputs encompass differentials of environmental parameters (such as temperature, humidity, atmospheric pressure), as well as readings from both primary and supplementary low-cost NO2 detectors. The methodology was showcased using a purpose-built platform housing NO2 and environmental sensors, electronic control units, drivers, and a wireless communication module for data transmission. Comparative experiments utilized NO2 data acquired during a five-month measurement campaign in Gdansk, Poland, from three independent high-precision reference stations, and low-cost sensor data gathered by the portable measurement platforms at the same locations. The numerical experiments have been carried out using several calibration scenarios using various sets of calibration input, as well as enabling/disabling the use of differentials, global data scaling, and NO2 readings from the primary sensor. 
The results validate the remarkable correction quality, exhibiting a correlation coefficient exceeding 0.9 concerning reference data, with a root mean squared error below 3.2 µg/m³. This level of performance positions the calibrated sensor as a dependable and cost-effective alternative to expensive stationary equipment for NO2 monitoring.
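The two quality metrics quoted in this abstract, the correlation coefficient and the root mean squared error against reference data, can be computed as in the following minimal sketch. It assumes the Pearson correlation is meant, which the abstract does not state explicitly. The sample values and the names `pearson_r` and `rmse` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, reference):
    """Root mean squared error of predictions against reference values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical NO2 concentrations in µg/m³ (not data from the study)
reference_no2 = [12.0, 18.5, 25.1, 30.4, 22.3, 15.8]
calibrated_no2 = [13.1, 17.9, 26.0, 28.7, 23.5, 14.9]

print(f"r = {pearson_r(calibrated_no2, reference_no2):.3f}")
print(f"RMSE = {rmse(calibrated_no2, reference_no2):.2f} µg/m³")
```

The correlation coefficient measures how well the calibrated readings track the reference signal, while RMSE reports the typical size of the remaining error in concentration units, so the two are usually reported together.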
... practical tools to solve air pollution problems to a certain extent (Krishan et al., 2019; Xu and Yoneda, 2021; Jamei et al., 2023; Guo et al., 2023a, 2023b; Elbaz et al., 2023a, 2023b). Most of these practical tools are classification problems based on deep learning (DL) or machine learning (ML) techniques (Bi et al., 2020; Lai et al., 2021). Maltare and Vahora (2023) applied different ML techniques to forecast the air quality using different preprocessing methods for the Ahmedabad city of Gujarat, India. ...
Article
The health risks associated with particulate matter pollution (PM2.5) highlight the importance of comprehending key drivers of its spatial and temporal variability. While explanatory modeling is widely used for statistical inference in such applications, the role of wind variables (speed, direction) and their interplay with built environment are often overlooked. This study addresses this gap through a mobile air quality campaign in a suburban area over 10 days for each of three distinct seasons. We developed four innovative wind-based buffers to account for the wind factors and assess the impact of LiDAR-derived 3D built environment structure on PM2.5 variability at local and regional scales. Our second objective is to assess the predictive capabilities of wind-based variables. Results indicate that in our low-elevation, low-rise building study area, the built environment does not emerge as a significant factor, contrary to findings in larger, denser urban areas. Wind direction proves more effective in capturing concentration fluctuations than wind speed, and its interaction with pollution source orientation can change pollution dynamics. This study introduces new insights and methodologies for incorporating wind-related factors into the analysis of air pollution dynamics, offering opportunities for further investigation in various urban settings.
Article
Air pollution continues to be a global public health threat, and the expanding availability of small, low-cost air sensors has led to increased interest in both personal and crowd-sourced air monitoring. However, to date, few low-cost air monitoring networks have been developed with the scientific rigor or continuity needed to conduct public health surveillance and inform policy. In Imperial County, California, near the U.S./Mexico border, we used a collaborative, community-engaged process to develop a community air monitoring network that attains the scientific rigor required for research, while also achieving community priorities. By engaging community residents in the project design, monitor siting processes, data dissemination, and other key activities, the resulting air monitoring network data are relevant, trusted, understandable, and used by community residents. Integration of spatial analysis and air monitoring best practices into the network development process ensures that the data are reliable and appropriate for use in research activities. This combined approach results in a community air monitoring network that is better able to inform community residents, support research activities, guide public policy, and improve public health. Here we detail the monitor siting process and outline the advantages and challenges of this approach.
Article
The long satellite aerosol data record enables assessments of historical PM2.5 level in regions where routine PM2.5 monitoring began only recently. However, most previous models reported decreased prediction accuracy when predicting PM2.5 levels outside the model-training period. In this study, we proposed an ensemble machine learning approach that provided reliable PM2.5 hindcast capabilities. The missing satellite data were first filled by multiple imputation. Then the modeling domain, China, was divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine learning models including random forest, generalized additive model, and extreme gradient boosting were trained in each region separately. Finally, a generalized additive ensemble model was developed to combine predictions from different algorithms. The ensemble prediction characterized the spatiotemporal distribution of daily PM2.5 well with the cross-validation (CV) R2 (RMSE) of 0.79 (21 μg/m3). The cluster-based sub-region models outperformed national models and improved the CV R2 by ~0.05. Compared with previous studies, our model provided more accurate out-of-range predictions at the daily level (R2 = 0.58, RMSE = 29 μg/m3) and monthly level (R2 = 0.76, RMSE = 16 μg/m3). Our hindcast modeling system allows for the construction of unbiased historical PM2.5 levels.
Article
Exposure to fine particulate matter (PM2.5) has been associated with a wide range of negative health outcomes. The overwhelming majority of the epidemiological studies that helped establish such associations was conducted in regions with sufficient ground observations and other supporting data, i.e., the data-rich regions. However, air pollution health effects research in the data-poor regions, where pollution levels are often the highest, is still very limited due to the lack of high-quality exposure estimates. To improve our understanding of the desired input datasets for the application of satellite-based PM2.5 exposure models in data-poor areas, we applied a Bayesian ensemble model in the southeast U.S. that was selected as a representative data-rich region. We designed four groups of sensitivity tests to simulate various data-poor scenarios. The factors considered that would influence the model performance included the temporal sampling frequency of the monitors, the number of ground monitors, the accuracy of the chemical transport model simulation of PM2.5 concentrations, and different combinations of the additional predictors. While our full model achieved a 10-fold cross-validated (CV) R2 of 0.82, we found that when reducing the sampling frequency from the current 1-in-3 day to 1-in-9 day, the CV R2 decreased to 0.58, and the predictions could not capture the daily variations of PM2.5. Half of the current stations (i.e., 30 monitors) could still support a robust model with a CV R2 of 0.79. With 20 monitors, the CV R2 decreased from 0.71 to 0.55 when 100% additional random errors were added to the original CMAQ simulations. However, with a sufficient number of ground monitors (e.g., 30 monitors), our Bayesian ensemble model had the ability to tolerate CMAQ errors with only a slight decrease in CV R2 (from 0.79 to 0.75). 
With fewer than 15 monitors, our full model collapsed and failed to fit any covariates, while the models with only time-varying variables could still converge even with only five monitors left. A model without the land use parameters lacked fine spatial details in the prediction maps, but could still capture the daily variability of PM2.5 (CV R2 ≥ 0.67) and might support a study of the acute health effects of PM2.5 exposure.
Chapter
This chapter describes the process of increasing and sustaining environmental health literacy (EHL) within communities impacted by environmental hazards and associated health conditions through the comprehensive engagement of community members in environmental health research and education projects. The chapter discusses the use of popular education approaches to facilitate more effective collaboration and optimize mutual co-learning among community members and their project partners. It also explores how, by using this approach, community members can contribute their own knowledge and awareness of environmental and health conditions to advance the research and education process, thereby increasing the EHL of their academically-credentialed partners.
Article
The western United States has experienced increasing wildfire activities, which have negative effects on human health. Epidemiological studies on fine particulate matter (PM2.5) from wildfires are limited by the lack of accurate high-resolution PM2.5 exposure data over fire days. Satellite-based aerosol optical depth (AOD) data can provide additional information on ground PM2.5 concentrations and have been widely used in previous studies. However, the low background concentration, complex terrain, and large wildfire sources add to the challenge of estimating PM2.5 concentrations in the western United States. In this study, we applied a Bayesian ensemble model that combined information from the 1 km resolution AOD products derived from the Multi-angle Implementation of Atmospheric Correction (MAIAC) algorithm, Community Multiscale Air Quality (CMAQ) model simulations, and ground measurements to predict daily PM2.5 concentrations over fire seasons (April to September) in Colorado for 2011-2014. Our model had a 10-fold cross-validated R2 of 0.66 and a root-mean-squared error of 2.00 μg/m3, and it outperformed the multistage model, especially on the fire days. Elevated PM2.5 concentrations over large fire events were successfully captured. The modeling technique demonstrated in this study could support future short-term and long-term epidemiological studies of wildfire PM2.5.
Article
The short-term and acute health effects of fine particulate matter less than 2.5 μm (PM2.5) have highlighted the need for exposure assessment models with high spatiotemporal resolution. Here, we utilize satellite, meteorologic, atmospheric, and land-use data to train a random forest model capable of accurately predicting daily PM2.5 concentrations at a resolution of 1 × 1 km throughout an urban area encompassing seven counties. Unlike previous models based on aerosol optical density (AOD), we show that the missingness of AOD is an effective predictor of ground-level PM2.5 and create an ensemble model that explicitly deals with AOD missingness and is capable of predicting with complete spatial and temporal coverage of the study domain. Our model performed well with an overall cross-validated root mean squared error (RMSE) of 2.22 μg/m³ and a cross-validated R² of 0.91. We illustrate the daily changing spatial patterns of PM2.5 concentrations across our urban study area made possible by our accurate, high-resolution model. The model will facilitate high-resolution assessment of both long-term and acute PM2.5 exposures in order to quantify their associations with related health outcomes.
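Several abstracts in this list report cross-validated R² and RMSE. The sketch below shows how such figures can be obtained with k-fold cross-validation. It rests on loud assumptions: a one-predictor least-squares line stands in for the random forest (so this is not the model used in any of the studies), R² is taken as 1 − SS_res/SS_tot (the papers may define it differently, for example as a squared correlation), and the data are synthetic. All function and variable names are illustrative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cv_r2_rmse(xs, ys, k=10, seed=0):
    """Return (R², RMSE) computed from out-of-fold predictions."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Each sample is predicted by a model that never saw it
        for i in fold:
            predictions[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, (ss_res / len(ys)) ** 0.5


# Synthetic predictor (e.g. a scaled AOD value) and noisy PM2.5 response
rng = random.Random(1)
predictor = [float(i) for i in range(50)]
pm25 = [2.0 * x + 5.0 + rng.gauss(0, 5) for x in predictor]

r2_demo, rmse_demo = cv_r2_rmse(predictor, pm25, k=10)
print(f"CV R2 = {r2_demo:.2f}, CV RMSE = {rmse_demo:.2f}")
```

Because every prediction comes from a model fitted without that sample, the resulting R² and RMSE estimate out-of-sample performance; they do not measure how well the model fits its own training data.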
Book
Particle characterization is an important component in product research and development, manufacture, and quality control of particulate materials and an important tool in the frontier of sciences, such as in biotechnology and nanotechnology. This book systematically describes one major branch of modern particle characterization technology - the light scattering methods. This is the first monograph in particle science and technology covering the principles, instrumentation, data interpretation, applications, and latest experimental development in laser diffraction, optical particle counting, photon correlation spectroscopy, and electrophoretic light scattering. In addition, a summary of all major particle sizing and other characterization methods, basic statistics and sample preparation techniques used in particle characterization, as well as almost 500 latest references are provided. The book is a must for industrial users of light scattering techniques characterizing a variety of particulate systems and for undergraduate or graduate students who want to learn how to use light scattering to study particular materials, in chemical engineering, material sciences, physical chemistry and other related fields.