Using a Network of Locally Developed Low Cost Particulate Matter Sensors for Land Use Regression Modeling of PM2.5 in Urban Uganda

Eric S. Coker1*, Joel Ssematimba2, Engineer Bainomugisha2

1Department of Environmental and Global Health, College of Public Health and Health Professions, University of Florida, 1255 Center Dr., Gainesville, FL; eric.coker@phhp.ufl.edu
2AirQo, Department of Computer Science, College of Computing and Information Sciences, Makerere University, Plot 56 Pool Road, Kampala, Uganda.

*Corresponding Author

Acknowledgements

We would like to acknowledge the tireless efforts of the AirQo staff who assisted with data collection and data management.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 14 June 2020 doi:10.20944/preprints202006.0158.v1
© 2020 by the author(s). Distributed under a Creative Commons CC BY license.
ABSTRACT

Background

There are major air pollution monitoring gaps in sub-Saharan Africa. Developing capacity in the region to conduct air monitoring can help estimate exposure to air pollution for epidemiology research. The purpose of our study is to develop a land use regression (LUR) model using low-cost air quality sensors developed by a research group in Uganda (AirQo).

Methods

Using these low-cost sensors, we collected continuous measurements of fine particulate matter (PM2.5) between May 1, 2019 and February 29, 2020 at 22 monitoring sites across urban municipalities of Uganda. We compared average monthly PM2.5 concentrations from the AirQo sensors with measurements from a BAM-1020 reference monitor operated at the US Embassy in Kampala. Monthly PM2.5 concentrations were used for LUR modeling. We used eight machine learning (ML) algorithms and ensemble modeling, with 10-fold cross-validation and root mean squared error (RMSE) used to evaluate model performance.

Results

Mean monthly PM2.5 concentration was 60.2 µg/m3 (IQR: 45.4-73.0 µg/m3; median = 57.5 µg/m3). For the ML LUR models, RMSE values ranged from 5.43 µg/m3 to 15.43 µg/m3, and the models explained between 28% and 92% of monthly PM2.5 variability. Generalized additive models explained the largest amount of PM2.5 variability (R2=0.92) and produced the lowest RMSE (5.43 µg/m3) in the held-out test set. The most important predictors of monthly PM2.5 concentrations included monthly precipitation, major roadway density, population density, latitude, greenness, and percentage of households using solid fuels.

Conclusion

To our knowledge, ours is the first study to model the spatial distribution of urban air pollution in sub-Saharan Africa using air monitors developed within the region itself. Non-parametric ML for LUR modeling predicted monthly PM2.5 levels with high accuracy. Our analysis suggests that locally produced low-cost air quality sensors can help build capacity to conduct air pollution epidemiology research in the region.

KEYWORDS
land use regression, low-cost sensors, machine learning, particulate matter, Africa

1. Introduction

Data gaps in lower and middle-income countries (LMICs) related to environmental pollution are limiting environmental policy development and governance, as well as our understanding of the health impacts of pollution in LMICs. Low-cost sensors (LCS) hold great promise for bridging these environmental pollution data gaps in LMICs. (Amegah, 2018) The widespread use of LCS in LMIC settings, however, is yet to be realized. This underutilization of LCS in LMICs is due to both technical and non-technical reasons, including: (1) limitations in the quality of data collected by LCS; (2) a lack of downstream data analytics applications for LCS; and (3) a lack of consideration for sustainable operating mechanisms and the physical and socioeconomic contexts of LMICs. (Amegah, 2018; Mao et al., 2019) Despite their current limitations, low-cost air quality sensors (LCAQS) have made substantial progress in terms of acceptance for use in certain air pollution measurement and research applications. (Amegah, 2018; Clements et al., 2017; Malings et al., 2020; Masiol et al., 2019, 2018; McKercher and Vanos, 2018; Weissert et al., 2020, 2019)

Emergent LCAQS applications include enhancing air quality regulatory monitoring by improving the spatial and temporal resolution of current air monitoring programs, (Malings et al., 2020; McKercher and Vanos, 2018) and identifying particulate matter (PM) sources in complex urban environments. (Hagan et al., 2019) Recent studies conducted in the U.S. suggest that air pollution data collected using LCAQS can also help generate spatio-temporal models that reliably predict urban air pollution concentrations at fine spatial scales. (Masiol et al., 2019, 2018; Weissert et al., 2020, 2019) The present study builds on these recent advances in air pollution modeling by using LCAQS data for a spatial air pollution prediction model. Our study differs, however, in that we implement it in the LMIC context of urban Uganda.

What makes our study distinctive is that we use a spatially dense network of LCAQS that were designed and fabricated locally in Uganda. These LCAQS, developed by AirQo, are the first, to our knowledge, to originate from a sub-Saharan Africa (SSA) country. Such locally designed and produced LCAQS can plausibly address several limitations of other LCAQS, including by offering a more sustainable operating mechanism and by being designed to operate in the challenging infrastructural, socioeconomic, and environmental context of urban SSA. For instance, the LCAQS used in our study, named AirQo, are designed and optimized to work in places characterized by sporadic internet connectivity, irregular power supply, high temperatures, and dusty environments. The devices include a custom-designed filtration system to minimize clogging and dust deposition and to reduce the insect infestation common in the SSA region. This study therefore serves as a proof of concept for using locally sourced LCAQS to develop a LUR model to be employed in future ambient air pollution epidemiology research in the SSA region.

Moreover, conventional LUR air pollution modeling is implemented using multivariable linear regression and often applies K-fold cross-validation to validate the prediction model. (Brokamp et al., 2017; Eeftens et al., 2012; Mao et al., 2012; Sahsuvaroglu et al., 2006) Recent advances in LUR air pollution modeling suggest that machine learning (ML) algorithms, such as Random Forests (RF), help deal with overfitting and relax assumptions of linearity. (Araki et al., 2018; Beckerman et al., 2013; Brokamp et al., 2017; Di et al., 2019; Rahman et al., 2020; Weissert et al., 2020, 2019) Hence, our study takes an ML approach to LUR modeling, using the data generated from the LCAQS network described in this study.

2. Materials and Methods

The country of Uganda straddles the equator and is located in the East Africa region of SSA. The study area (Figure 1) encompasses six urban sites in Uganda’s central and eastern regions, located in Jinja, Kampala, Luwero, Mityana, Mukono, and Wakiso districts. Uganda’s capital city, Kampala, where nearly two-thirds (n=14) of the LCAQS in our study were placed, is located along the northern shores of Lake Victoria at an altitude of approximately 1,140 meters above sea level. (Fuhrimann et al., 2015) The districts included in the study vary widely in population size: approximately 2.0 million, 1.5 million, 0.6 million, 0.47 million, 0.46 million, and 0.33 million for Wakiso, Kampala, Mukono, Jinja, Luwero, and Mityana, respectively. (UBOS, 2014)
Figure 1. Map of the Study Area’s Six Districts and the Spatial Coverage of the LCAQS Monitoring Sites.

2.2. PM2.5 Measurements
2.2.1. Sensor Network
The AirQo devices measure particulate matter (PM), both PM2.5 and PM10, using nephelometer (light-scattering) technology. The devices also record location (latitude, longitude) and meteorological parameters, including internal and external temperature, atmospheric pressure, and humidity. The AirQo devices transmit data over a local Global System for Mobile Communications (GSM) network every 90 seconds and can run on solar or mains power. We have deployed devices at static locations and on mobile platforms (e.g., motorcycle taxis), thereby forming a network of both fixed and dynamic nodes. Currently, the sensor network includes 65 nodes, with 40 in the Kampala area and 25 in other urban areas of Uganda. In this study we use data from the fixed monitoring locations only and have restricted the data to monitors that were in operation for at least 75% of the study period (n=22 AirQo sensors). We installed devices at heights between 2.5 and 4 meters. Sensor placement is based on a number of spatial features, including population density, land use, road network, pollution sources and receptors, and economic activities, as well as practical limitations, among others. The fixed installation locations include private property, schools, buildings, and lighting poles. Depending on the installation location, we fabricated custom mountings to support and secure the air quality monitor.

To ensure data quality, at least one AirQo device is co-located near (~10 meters from) a Beta Attenuation Mass Monitor (BAM) 1020 reference monitor currently installed and operated at the U.S. Embassy in Kampala. Additionally, for internal data quality assurance, each device includes two PM sensors. This dual-sensor approach enables us to rapidly compare a given sensor against its twin in order to detect any problems with the sensor. The collected data are transmitted in near real-time to a cloud-based platform.

In addition to the AirQo sensors, we also used another LCAQS known as the Clarity Node-S (n=1). The Clarity sensor uses nephelometer technology similar to that of the AirQo device to detect PM2.5. Additionally, the Clarity sensor transmits PM monitoring data over a local GSM network in near real-time to a cloud. (Pantelic et al., 2019) The majority of the LCAQS used in this study are AirQo sensors (n=22), while only one Clarity Node-S sensor was used.
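
The dual-sensor quality check described in this section could be sketched as follows. This is a minimal illustration under stated assumptions, not AirQo's actual procedure: the function name `flag_twin_disagreement`, the 10 µg/m3 absolute tolerance, and the 20% relative tolerance are hypothetical choices made only for the example.

```python
def flag_twin_disagreement(sensor_a, sensor_b, abs_tol=10.0, rel_tol=0.20):
    """Return indices where the two PM2.5 readings in one device disagree.

    abs_tol (ug/m3) and rel_tol (fraction of the pair mean) are
    hypothetical thresholds, not values taken from the study.
    """
    flagged = []
    for i, (a, b) in enumerate(zip(sensor_a, sensor_b)):
        if a is None or b is None:
            flagged.append(i)  # a missing reading is itself a problem
            continue
        diff = abs(a - b)
        pair_mean = (a + b) / 2.0
        # Flag only when the gap exceeds both tolerances.
        if diff > abs_tol and diff > rel_tol * pair_mean:
            flagged.append(i)
    return flagged


# Made-up readings from the two PM sensors inside one device.
readings_a = [42.0, 55.5, 61.0, None, 48.0]
readings_b = [43.5, 54.0, 95.0, 50.0, 47.0]
suspect = flag_twin_disagreement(readings_a, readings_b)
print(suspect)
```

Requiring both tolerances to be exceeded keeps small absolute gaps at low concentrations, and small relative gaps at high concentrations, from being flagged; other rules would be equally defensible.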

2.3. PM2.5 Estimation
2.3.1. Predictor Variables
We assembled 18 predictor variables for LUR modeling. We group these variables into four broad categories: spatial variables, meteorological variables, land use variables, and demographic variables. Table 1 summarizes the relevant information for each predictor variable in terms of buffer sizes, spatial resolution, data format, and references.
Table 1. Predictor Variables used for LUR Modeling.

| Variable | Buffer Size/Resolution/Spatial Unit | Data Format | Reference |
| --- | --- | --- | --- |
| Meteorological and Spatial Predictors | | | |
| Precipitation | Inches (monthly averages; 2005-2015) | Tabular/Vector | (https://www.timeanddate.com/weather/uganda/entebbe/climate, n.d.) |
| Latitude | | Tabular/Vector | |
| Longitude | | Tabular/Vector | |
| Elevation | 100m | Raster values transformed into Vector for analysis | (USGS, n.d.) |
| Land Use Predictors | | | |
| Major Roadways | 250m buffer | Raster values transformed into Vector for analysis | (OpenStreetMap, n.d.) |
| Major Roadways | 500m buffer | Raster values transformed into Vector for analysis | (OpenStreetMap, n.d.) |
| Major Roadways | 750m buffer | Raster values transformed into Vector for analysis | (OpenStreetMap, n.d.) |
| Greenness (NDVI) | 250m buffer/250m | Raster values transformed into Vector for analysis | (FEWSNET, n.d.) |
| Greenness (NDVI) | 500m buffer/250m | Raster values transformed into Vector for analysis | (FEWSNET, n.d.) |
| Greenness (NDVI) | 750m buffer/250m | Raster values transformed into Vector for analysis | (FEWSNET, n.d.) |
| Demographic and Household Predictors | | | |
| Number of People | Parish-level | Tabular/Vector | (UBOS, n.d.) |
| Number of Households | Parish-level | Tabular/Vector | (UBOS, n.d.) |
| Household Density (number of households/Parish area) | Parish-level | Raster values transformed into Vector for analysis | (UBOS, n.d.) |
| Percent Households Solid Fuel Use | Parish-level | Tabular/Vector | (UBOS, n.d.) |
| Population Density | ~2km | Tabular/Vector | (HDX, n.d.) |
| Population Density | 250m buffer/~2km | Tabular/Vector | (HDX, n.d.) |
| Population Density | 500m buffer/~2km | Tabular/Vector | (HDX, n.d.) |
| Population Density | 750m buffer/~2km | Tabular/Vector | (HDX, n.d.) |

2.3.2. Statistical Analysis and Land Use Regression Modeling
We used PM2.5 concentration data from 23 LCAQS in total, including 22 sensors from the AirQo network and one Clarity sensor. We used sensor data collected between May 1, 2019 and February 29, 2020. Monthly PM2.5 concentration averages were computed (n=218 observations) and combined with covariates for the LUR modeling. We calculated summary statistics for the monthly averages for the study area overall and stratified by month and district. We then calculated Pearson correlation coefficients between monthly PM2.5 averages and land use variables. Since the distribution of monthly averaged PM2.5 measurements was highly right-skewed, we log-transformed PM2.5 concentrations for the machine learning LUR (ML-LUR) modeling described below. We used monthly PM2.5 averages for modeling because the exposure estimates are intended for predicting trimester-specific and entire-pregnancy PM2.5 exposure averages in a future birth cohort study, as has been done in previous studies. (Coker et al., 2015)
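
The aggregation and transformation steps described above can be illustrated with a short Python sketch. It is a hedged example rather than the authors' pipeline: the record layout `(site_id, ISO timestamp, pm25)`, the function name `monthly_log_means`, and the use of the natural logarithm are assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime


def monthly_log_means(records):
    """Average PM2.5 by (site, year, month), then take the natural log.

    `records` is an iterable of (site_id, ISO timestamp, pm25) tuples;
    this layout is an assumption made for the example.
    """
    buckets = defaultdict(list)
    for site, stamp, pm25 in records:
        t = datetime.fromisoformat(stamp)
        buckets[(site, t.year, t.month)].append(pm25)

    out = {}
    for key, values in buckets.items():
        mean = sum(values) / len(values)
        out[key] = {"mean": mean, "log_mean": math.log(mean)}
    return out


# Made-up readings: two in May 2019 and one in June 2019 for a single site.
sample = [
    ("site_a", "2019-05-01T00:00:00", 40.0),
    ("site_a", "2019-05-01T00:01:30", 60.0),
    ("site_a", "2019-06-01T00:00:00", 80.0),
]
monthly = monthly_log_means(sample)
print(monthly[("site_a", 2019, 5)]["mean"])
```

Each (site, month) pair then contributes one row to the modeling dataset, which is how 23 sensors observed over ten months can yield on the order of 218 observations once months with insufficient data are excluded.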

For the ML-LUR algorithms, the combined PM2.5 and LUR dataset was first split into a training set (90%) and a held-out test set (10%). Next, we performed 10-fold cross-validation on the training set (n=198 observations) only, using root mean squared error (RMSE) to guide each base learner model. Eight different ML algorithms were fit in order to compare each learner’s performance: a linear regression model (LM), Support Vector Machines with Radial Basis Function Kernel (SVM), Random Forest (RF), Quantile Random Forest (QRF), eXtreme Gradient Boosting (xgbTree), Generalized Additive Model (GAM), Lasso and Elastic-Net Regularized Generalized Linear Models (GLMNET), and Least Angle Regression (LARS). All 18 covariates described in Table 1, which included land use variables (e.g., major roadway density, greenspace), population demographic variables, and historical precipitation data, were included in the analysis. We implemented the base ML algorithms using the caret package in R, with the ‘caretList’ command used to fit all ML algorithms in parallel. In addition to the individual base learner models already mentioned, we performed ensemble modeling in order to assess whether it improves ML-LUR model performance, as seen in Di et al. (2019) and Lim et al. (2019). We implemented ensemble modeling with the caretEnsemble package in R, using approaches offered by the ‘caretEnsemble’ (CE) and ‘caretStack’ (CS) commands. For the CE approach, we applied GLM to create a linear combination of all base learner models. In the CS approach, we applied a stacked approach that combined the results from multiple component caret models. Since there were strong correlations between results from the base learner models, we used GLMNET when applying the stacked approach. Our final assessment of RMSE and R2 used the 10% held-out test set only, since this should better represent the ability of the ML-LUR to predict monthly PM2.5 concentrations at unmeasured locations in our study area.
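
The evaluation design described above (a 90/10 split, 10-fold cross-validation on the training portion only, and a single final score on the held-out set) can be sketched in Python. The study itself used the caret and caretEnsemble packages in R; the sketch below is not a port of that code. It substitutes a one-predictor least-squares line for the eight learners, uses synthetic data, and all function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def split_and_cross_validate(xs, ys, test_frac=0.10, k=10, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # k-fold cross-validation on the training portion only.
    fold_rmse = []
    for f in range(k):
        val = train_idx[f::k]
        val_set = set(val)
        fit = [i for i in train_idx if i not in val_set]
        b0, b1 = fit_line([xs[i] for i in fit], [ys[i] for i in fit])
        fold_rmse.append(rmse([ys[i] for i in val], [b0 + b1 * xs[i] for i in val]))

    # Refit on all training data, then score once on the held-out test set.
    b0, b1 = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    test_rmse = rmse([ys[i] for i in test_idx], [b0 + b1 * xs[i] for i in test_idx])
    return sum(fold_rmse) / k, test_rmse, len(train_idx), len(test_idx)


# Synthetic data: 218 rows, matching only the study's number of observations.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(218)]
y = [4.0 - 0.05 * xi + rng.gauss(0, 0.1) for xi in x]
cv_rmse, test_rmse, n_train, n_test = split_and_cross_validate(x, y)
print(n_train, n_test, round(cv_rmse, 3), round(test_rmse, 3))
```

The exact training and test counts depend on how the partition is drawn, so this sketch's split sizes will not necessarily equal the 198 and 20 observations reported in the study. The point of the design is that the test rows never influence model fitting or tuning.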
3. RESULTS
3.1. PM2.5 Monitoring Results
The average monthly PM2.5 concentration for the entire study area was 60.2 µg/m3 (IQR: 45.4-73.0 µg/m3; median = 57.5 µg/m3). According to Figure 2a, monitoring sites in Luwero and Mukono Districts exhibited the highest PM2.5 levels. As expected, according to Figure 2b, PM2.5 concentrations were lowest during the wet season and highest during the dry season.
3.1.1. Comparison of AirQo with a reference monitor
For comparison, we co-located an AirQo sensor with a BAM1020 reference monitor located at the US Embassy in Kampala. The mean monthly PM2.5 concentrations were 63.1 µg/m3 and 60.2 µg/m3 for the BAM1020 and AirQo monitors, respectively. Figure 3 plots the monthly PM2.5 averages of the BAM1020 embassy monitor versus the AirQo sensor. With an RMSE of 5.58 µg/m3, a normalized RMSE of 8.8%, and an R2 of 0.87, the AirQo sensor compares well with the BAM1020 in terms of monthly averages.
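
The three agreement statistics reported above can be computed as in the following sketch. Two choices here are assumptions, since the text does not define them: RMSE is normalized by the mean of the reference values (which is consistent with the reported figures, as 5.58 / 63.1 ≈ 0.088), and R2 is taken as the squared Pearson correlation. The input values are made up.

```python
import math


def agreement_metrics(reference, sensor):
    """RMSE, RMSE normalized by the reference mean, and squared Pearson r."""
    n = len(reference)
    rmse = math.sqrt(sum((s - r) ** 2 for r, s in zip(reference, sensor)) / n)
    ref_mean = sum(reference) / n
    sen_mean = sum(sensor) / n
    cov = sum((r - ref_mean) * (s - sen_mean) for r, s in zip(reference, sensor))
    var_r = sum((r - ref_mean) ** 2 for r in reference)
    var_s = sum((s - sen_mean) ** 2 for s in sensor)
    r2 = cov ** 2 / (var_r * var_s)
    return {"rmse": rmse, "nrmse": rmse / ref_mean, "r2": r2}


# Made-up monthly means (ug/m3), not the study's data.
bam = [50.0, 60.0, 70.0, 80.0]
airqo = [48.0, 63.0, 66.0, 79.0]
m = agreement_metrics(bam, airqo)
print(m)
```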
Figure 2. PM2.5 Concentrations (May 2019-February 2020) by (a) District and (b) Month.

Figure 3. Average Monthly PM2.5 Concentrations from AirQo and BAM1020 Monitors.

3.2. ML-LUR Results
We summarize the RMSE and R2 values for the base learner models and ensemble models in Table 2 (for log-transformed and exponentiated values). These values were computed using the held-out test set (N=20) only. The GAM resulted in the lowest RMSE as well as the highest R2 (R2=0.94) for the log-transformed values. Although the ensemble models performed quite well, as shown in Table 2, both the GAM and xgbTree models outperformed them. The exponentiated predictions exhibit a similar pattern to the log-transformed values, with the GAM showing the lowest RMSE (5.43 µg/m3) and highest R2 (0.92).
Table 2. Performance (RMSE and R2) for Models

| Model | RMSE a | RMSE b | R2 b |
| --- | --- | --- | --- |
| GAM | 0.083 | 5.43 | 0.92 |
| xgbTree | 0.111 | 6.75 | 0.85 |
| Stacked Ensemble (glmnet) | 0.117 | 7.10 | 0.84 |
| Ensemble (lm) | 0.121 | 7.35 | 0.83 |
| RF | 0.148 | 7.60 | 0.82 |
| QRF | 0.183 | 9.62 | 0.69 |
| SVM | 0.195 | 12.6 | 0.45 |
| LM | 0.198 | 13.2 | 0.42 |
| LARS | 0.210 | 13.9 | 0.39 |
| GLMNET | 0.242 | 15.4 | 0.28 |

a Log-transformed (not exponentiated)
b Exponentiated
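
The two RMSE columns in Table 2 differ only in the scale on which the error is computed. A minimal sketch, assuming natural-log transformation and using made-up values rather than the study's predictions:

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up held-out values on the natural-log scale (not the study's data).
log_observed = [math.log(v) for v in (45.0, 60.0, 75.0)]
log_predicted = [math.log(v) for v in (50.0, 57.0, 70.0)]

# Error on the log scale is unitless, like column "RMSE a".
rmse_log = rmse(log_observed, log_predicted)

# Exponentiating first gives an error in ug/m3, like column "RMSE b".
rmse_ugm3 = rmse(
    [math.exp(v) for v in log_observed],
    [math.exp(v) for v in log_predicted],
)
print(round(rmse_log, 3), round(rmse_ugm3, 2))
```

Because the log-scale error is roughly a relative error, the two columns need not rank models identically, although in Table 2 the ordering is the same on both scales.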

3.2.1. Variable Importance
As shown in Figure A1 in the Appendix, several of the LUR variables are moderately to highly correlated with one another. After extracting the variable importance values from the top-performing model (GAM), we are able to rank the ML-LUR variables in terms of their importance for predicting monthly PM2.5 concentrations. According to Figure A2, precipitation, greenness (NDVI), roadway density, latitude, and solid fuel usage are the top-ranking variables. When restricting our analysis to only the top five predictor categories (precipitation, NDVI, roadway density, latitude, and solid fuel use), the GAM explained 88% of the monthly PM2.5 concentration variability using the entire data set (data not shown).
Discussion
In our study, we leveraged air quality data from a network of mostly locally designed and produced LCAQS to predict PM2.5 concentrations in urban districts of Uganda. Moreover, we applied ML to optimize the LUR model. Importantly, we find that the AirQo sensors compare well against a BAM1020 reference monitor co-located at the U.S. Embassy. In general, when using predictors typically used in LUR modeling, the non-parametric ML algorithms predicted monthly PM2.5 concentrations more accurately than parametric models (e.g., the linear model).

Of the land use predictors considered in our study, several stood out as strong predictors. The strongest include precipitation, greenness, density of major roadways, latitude, solid fuel usage, and population density. To our knowledge, our study is the first to use population census data on solid fuel usage (at the Parish level) in a LUR model. Specifically, we find that solid fuel usage is positively correlated with monthly PM2.5 concentrations. This finding suggests that area-level solid fuel use data can help inform LUR prediction models for PM2.5 in lower-income SSA urban areas, and potentially in other regions with high levels of solid fuel usage. Previous LUR prediction models for PM2.5 in SSA have shown relatively poorer performance (Saucy et al., 2018; Tularam, 2019) compared to gaseous pollutant models for SSA or PM2.5 models developed in higher-income regions. Given our results, as well as other air pollution research conducted in urban SSA that also shows strong correlations between neighborhood-level solid fuel use and outdoor PM concentrations (Zhou et al., 2011), we suggest that future modeling efforts in this region incorporate solid fuel use data to improve PM2.5 predictions.

To our knowledge, this is the first study to use LCAQS for LUR modeling in SSA. As suggested by previous authors (Amegah, 2018), we demonstrate that LCAQS hold strong potential for providing highly spatially resolved PM2.5 measurement data that can be harnessed for exposure estimation in air pollution epidemiology research. While additional data, such as aerosol optical depth (AOD) remote sensing data, could be integrated to improve model performance, we are encouraged by our findings. Future analyses will focus on optimizing calibration approaches for the AirQo PM sensor data. Since PM2.5 measurements from light-scattering sensors can be affected by environmental parameters such as relative humidity and temperature and may be subject to drift (US EPA, n.d.), we will use a co-located reference method (e.g., BAM1020) to model the influence of relative humidity (RH) and temperature on measurement accuracy, which can in turn be used for regression-based calibration in future epidemiology research. (Wang et al., 2019)
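
One possible form of the regression-based calibration mentioned above is a linear model of the reference concentration on the raw sensor reading, RH, and temperature. The sketch below is illustrative only: the text describes calibration as future work and does not specify a model, so the linear form, the function names, and the synthetic co-location data are all assumptions.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_calibration(sensor_pm, rh, temp, reference_pm):
    """Least-squares fit of reference ~ b0 + b1*sensor + b2*RH + b3*temp."""
    rows = [[1.0, s, h, t] for s, h, t in zip(sensor_pm, rh, temp)]
    k = 4
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, reference_pm)) for i in range(k)]
    return solve(xtx, xty)


def calibrate(coefs, sensor_pm, rh, temp):
    b0, b1, b2, b3 = coefs
    return b0 + b1 * sensor_pm + b2 * rh + b3 * temp


# Synthetic co-location data generated from known coefficients.
sensor = [30.0, 45.0, 60.0, 75.0, 90.0, 50.0]
rh = [55.0, 80.0, 65.0, 90.0, 70.0, 60.0]
temp = [24.0, 21.0, 27.0, 20.0, 25.0, 23.0]
reference = [5.0 + 0.9 * s - 0.1 * h + 0.2 * t for s, h, t in zip(sensor, rh, temp)]

coefs = fit_calibration(sensor, rh, temp, reference)
print([round(c, 3) for c in coefs])
```

Coefficients fitted at the co-location site could then be applied through `calibrate` to readings from other sensors, under the further assumption that the sensor response is similar across devices and sites.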

Conclusion
Deploying LCAQS can help address the urgent and growing need to expand and improve air quality monitoring in resource-limited settings of SSA. With reasonably accurate predictions of PM2.5 from ML-LUR with 10-fold cross-validation, data from the locally developed AirQo sensors used in the present study provide evidence that they can be used for modeling exposures in a birth cohort study.

References
Amegah, A.K., 2018. Proliferation of low-cost sensors. What prospects for air pollution epidemiologic research in Sub-Saharan Africa? Environ. Pollut. 241, 1132–1137. https://doi.org/10.1016/j.envpol.2018.06.044
Araki, S., Shima, M., Yamamoto, K., 2018. Spatiotemporal land use random forest model for estimating metropolitan NO2 exposure in Japan. Sci. Total Environ. 634, 1269–1277. https://doi.org/10.1016/j.scitotenv.2018.03.324
Beckerman, B.S., Jerrett, M., Martin, R.V., van Donkelaar, A., Ross, Z., Burnett, R.T., 2013. Application of the deletion/substitution/addition algorithm to selecting land use regression models for interpolating air pollution measurements in California. Atmos. Environ. 77, 172–177. https://doi.org/10.1016/j.atmosenv.2013.04.024
Brokamp, C., Jandarov, R., Rao, M.B., LeMasters, G., Ryan, P., 2017. Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches. Atmos. Environ. 151, 1–11. https://doi.org/10.1016/j.atmosenv.2016.11.066
Clements, A.L., Griswold, W.G., Rs, A., Johnston, J.E., Herting, M.M., Thorson, J., Collier-Oxandale, A., Hannigan, M., 2017. Low-Cost Air Quality Monitoring Tools: From Research to Practice (A Workshop Summary). Sensors 17, 2478. https://doi.org/10.3390/s17112478
Coker, E., Ghosh, J., Jerrett, M., Gomez-Rubio, V., Beckerman, B., Cockburn, M., Liverani, S., Su, J., Li, A., Kile, M.L., Ritz, B., Molitor, J., 2015. Modeling spatial effects of PM2.5 on term low birth weight in Los Angeles County. Environ. Res. 142, 354–364. https://doi.org/10.1016/j.envres.2015.06.044
Di, Q., Amini, H., Shi, L., Kloog, I., Silvern, R., Kelly, J., Sabath, M.B., Choirat, C., Koutrakis, P., Lyapustin, A., Wang, Y., Mickley, L.J., Schwartz, J., 2019. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environ. Int. 130, 104909. https://doi.org/10.1016/j.envint.2019.104909
Eeftens, M., Beelen, R., de Hoogh, K., Bellander, T., Cesaroni, G., Cirach, M., Declercq, C., Dėdelė, A., Dons, E., de Nazelle, A., Dimakopoulou, K., Eriksen, K., Falq, G., Fischer, P., Galassi, C., Gražulevičienė, R., Heinrich, J., Hoffmann, B., Jerrett, M., Keidel, D., Korek, M., Lanki, T., Lindley, S., Madsen, C., Mölter, A., Nádor, G., Nieuwenhuijsen, M., Nonnemacher, M., Pedeli, X., Raaschou-Nielsen, O., Patelarou, E., Quass, U., Ranzi, A., Schindler, C., Stempfelet, M., Stephanou, E., Sugiri, D., Tsai, M.-Y., Yli-Tuomi, T., Varró, M.J., Vienneau, D., Klot, S. von, Wolf, K., Brunekreef, B., Hoek, G., 2012. Development of Land Use Regression Models for PM2.5, PM2.5 Absorbance, PM10 and PMcoarse in 20 European Study Areas; Results of the ESCAPE Project. Environ. Sci. Technol. 46, 11195–11205. https://doi.org/10.1021/es301948k
FEWSNET, n.d. Products | Early Warning and Environmental Monitoring Program [WWW Document]. URL https://earlywarning.usgs.gov/fews/product/448 (accessed 2.24.20).
Fuhrimann, S., Stalder, M., Winkler, M.S., Niwagaba, C.B., Babu, M., Masaba, G., Kabatereine, N.B., Halage, A.A., Schneeberger, P.H.H., Utzinger, J., Cissé, G., 2015. Microbial and chemical contamination of water, sediment and soil in the Nakivubo wetland area in Kampala, Uganda. Environ. Monit. Assess. 187, 475. https://doi.org/10.1007/s10661-015-4689-x
Hagan, D.H., Gani, S., Bhandari, S., Patel, K., Habib, G., Apte, J.S., Hildebrandt Ruiz, L., Kroll, J.H., 2019. Inferring Aerosol Sources from Low-Cost Air Quality Sensor Measurements: A Case Study in Delhi, India. Environ. Sci. Technol. Lett. 6, 467–472. https://doi.org/10.1021/acs.estlett.9b00393
HDX, n.d. High Resolution Population Density Maps - Humanitarian Data Exchange [WWW Document]. URL https://data.humdata.org/dataset/highresolutionpopulationdensitymaps (accessed 2.24.20).
https://www.timeanddate.com/weather/uganda/entebbe/climate, n.d. Climate & Weather Averages in Entebbe, Uganda [WWW Document]. URL https://www.timeanddate.com/weather/uganda/entebbe/climate (accessed 2.24.20).
Lim, C.C., Kim, H., Vilcassim, M.J.R., Thurston, G.D., Gordon, T., Chen, L.-C., 350
Lee, K., Heimbinder, M., Kim, S.-Y., 2019. Mapping urban air quality using 351
mobile sampling with low-cost sensors and machine learning in Seoul, South 352
Korea. Environ. Int. 131, 105022. 353
https://doi.org/10.1016/j.envint.2019.105022 354
Malings, C., Tanzer, R., Hauryliuk, A., Saha, P.K., Robinson, A.L., Presto, A.A., 355
Subramanian, R., 2020. Fine particle mass monitoring with low-cost 356
sensors: Corrections and long-term performance evaluation. Aerosol Sci. 357
Technol. 54, 160174. https://doi.org/10.1080/02786826.2019.1623863 358
Mao, F., Khamis, K., Krause, S., Clark, J., Hannah, D.M., 2019. Low-Cost
Environmental Sensor Networks: Recent Advances and Future Directions.
Front. Earth Sci. 7, 221. https://doi.org/10.3389/feart.2019.00221
Mao, L., Qiu, Y., Kusano, C., Xu, X., 2012. Predicting regional space–time
variation of PM2.5 with land-use regression model and MODIS data.
Environ. Sci. Pollut. Res. 19, 128–138.
https://doi.org/10.1007/s11356-011-0546-9
Masiol, M., Squizzato, S., Chalupa, D., Rich, D.Q., Hopke, P.K., 2019.
Spatial-temporal variations of summertime ozone concentrations across a
metropolitan area using a network of low-cost monitors to develop 24 hourly
land-use regression models. Sci. Total Environ. 654, 1167–1178.
https://doi.org/10.1016/j.scitotenv.2018.11.111
Masiol, M., Zíková, N., Chalupa, D.C., Rich, D.Q., Ferro, A.R., Hopke, P.K.,
2018. Hourly land-use regression models based on low-cost PM monitor
data. Environ. Res. 167, 7–14. https://doi.org/10.1016/j.envres.2018.06.052
McKercher, G.R., Vanos, J.K., 2018. Low-cost mobile air pollution monitoring in
urban environments: a pilot study in Lubbock, Texas. Environ. Technol. 39,
1505–1514. https://doi.org/10.1080/09593330.2017.1332106
OpenStreetMap, n.d. OpenStreetMap [WWW Document]. OpenStreetMap. URL
https://www.openstreetmap.org/ (accessed 2.24.20).
Pantelic, J., Dawe, M., Licina, D., 2019. Use of IoT sensing and occupant surveys
for determining the resilience of buildings to forest fire generated PM2.5.
PLOS ONE 14, e0223136. https://doi.org/10.1371/journal.pone.0223136
Rahman, M.M., Karunasinghe, J., Clifford, S., Knibbs, L.D., Morawska, L., 2020.
New insights into the spatial distribution of particle number concentrations
by applying non-parametric land use regression modelling. Sci. Total
Environ. 702, 134708. https://doi.org/10.1016/j.scitotenv.2019.134708
Sahsuvaroglu, T., Arain, A., Kanaroglou, P., Finkelstein, N., Newbold, B., Jerrett,
M., Beckerman, B., Brook, J., Finkelstein, M., Gilbert, N.L., 2006. A Land
Use Regression Model for Predicting Ambient Concentrations of Nitrogen
Dioxide in Hamilton, Ontario, Canada. J. Air Waste Manag. Assoc. 56,
1059–1069. https://doi.org/10.1080/10473289.2006.10464542
Saucy, A., Röösli, M., Künzli, N., Tsai, M.-Y., Sieber, C., Olaniyan, T., Baatjies,
R., Jeebhay, M., Davey, M., Flückiger, B., Naidoo, R., Dalvie, M., Badpa,
M., de Hoogh, K., 2018. Land Use Regression Modelling of Outdoor NO2
and PM2.5 Concentrations in Three Low Income Areas in the Western Cape
Province, South Africa. Int. J. Environ. Res. Public. Health 15, 1452.
https://doi.org/10.3390/ijerph15071452
Tularam, H., 2019. Land use Regression Model for Exposure Assessment in the
Mace Birth Cohort Study in Ethekwini. Environ. Epidemiol. 3, 401.
https://doi.org/10.1097/01.EE9.0000610480.85058.4a
UBOS, 2014. National Population and Housing Census. Kampala, UG.
UBOS, n.d. Visualizations - Uganda Bureau of Statistics. URL
https://www.ubos.org/data-portals-2/visualizations/ (accessed 2.24.20).
US EPA, n.d. collocation_instruction_guide.pdf [WWW Document]. URL
https://www.epa.gov/sites/production/files/2018-01/documents/collocation_instruction_guide.pdf
(accessed 5.12.20).
USGS, n.d. EarthExplorer - Home [WWW Document]. URL
https://earthexplorer.usgs.gov/ (accessed 2.24.20).
Wang, Y., Du, Y., Wang, J., Li, T., 2019. Calibration of a low-cost PM2.5 monitor
using a random forest model. Environ. Int. 133, 105161.
https://doi.org/10.1016/j.envint.2019.105161
Weissert, L., Alberti, K., Miles, E., Miskell, G., Feenstra, B., Henshaw, G.S.,
Papapostolou, V., Patel, H., Polidori, A., Salmond, J.A., Williams, D.E.,
2020. Low-cost sensor networks and land-use regression: Interpolating
nitrogen dioxide concentration at high temporal and spatial resolution in
Southern California. Atmos. Environ. 223, 117287.
https://doi.org/10.1016/j.atmosenv.2020.117287
Weissert, L.F., Alberti, K., Miskell, G., Pattinson, W., Salmond, J.A., Henshaw,
G., Williams, D.E., 2019. Low-cost sensors and microscale land use
regression: Data fusion to resolve air quality variations with high spatial and
temporal resolution. Atmos. Environ. 213, 285–295.
https://doi.org/10.1016/j.atmosenv.2019.06.019
Zhou, Z., Dionisio, K.L., Arku, R.E., Quaye, A., Hughes, A.F., Vallarino, J.,
Spengler, J.D., Hill, A., Agyei-Mensah, S., Ezzati, M., 2011. Household and
community poverty, biomass use, and air pollution in Accra, Ghana. Proc.
Natl. Acad. Sci. 108, 11028–11033.
https://doi.org/10.1073/pnas.1019183108
... In some recent studies, researchers used low-cost mobile phone-based sensors to measure PM2.5 exposure to supplement data from fixed monitoring sites. Coker et al. (2020) developed the LUR model of PM2.5 for Kampala, Uganda, using monthly data from 22 sites of low-cost air quality sensors (LCAQS) between May 2019 and February 2020. The data from LCAQS were compared with those collected from the standard, fixed monitoring site to ensure their reliability. ...
In recent years, as the level of fine particulate matter (PM2.5) concentration has become more closely monitored in Thailand and its harmful effects on health have been widely recognized by the public, the Thai government has debated various measures to improve air quality. In this paper, the Land Use Regression (LUR) technique was used to model the relationship between the daily PM2.5 concentration and various predictor variables using data from the entire year of 2019. The results confirmed strong seasonal effects on PM2.5 and substantial effects of time-variant predictors, including open biomass burning and meteorological conditions. However, time-invariant variables, including traffic, transportation, and land use characteristics were generally weaker predictors in the LUR models. The results of the model based on data for the entire year showed better statistical fit and robustness than the seasonal models. The relatively low adjusted R² of the models developed in this study compared with previous LUR studies suggests that more detailed data, especially the traffic volume on roads nearby monitoring sites, might be necessary to improve the model's performance. Finally, the large buffer size of the open biomass burning predictor implied that the measures to reduce PM2.5 by limiting open biomass burning would require international cooperation as some fires within the buffer area occurred in neighboring countries outside the borders of Thailand.
Wildfires and associated emissions of particulate matter pose significant environmental and health concerns. In this study we propose tools to evaluate building resilience to extreme episodes of outdoor particulate matter using a combination of indoor and outdoor IoT measurements, coupled with survey-based information of occupants’ perception and behaviour. We demonstrated the application of the tools on two buildings with different modes of ventilation during the Chico Camp fire event. We characterized the resilience of the buildings on different temporal and spatial scales using the well-established I/O ratio and a newly proposed E-index that evaluates indoor concentration in the context of adopted 24-hour exposure thresholds. Indoor PM2.5 concentration during the entire Chico Camp Fire event was 21 μg/m³ for 4th Street (Mechanically Ventilated) and 36 μg/m³ for Wurster Hall (Naturally Ventilated). The cumulative median I/O ratio during the fire event was 0.27 for 4th Street and 0.67 for Wurster Hall. Overall E-index for 4th Street was 0.82, suggesting that the whole building was resilient to outdoor air pollution while overall E-index was 1.69 for Wurster Hall suggesting that interventions are necessary. The survey revealed that occupant perception of workplace air quality aligns with measured PM2.5 in the two buildings. The results also highlight that a large portion of occupants wore face masks, even though the PM2.5 concentration was below WHO threshold level. The results of our study demonstrate the utility of the proposed IoT-enabled and survey tools to assess the degree of protection from air pollution of outdoor origin for a single building or across a portfolio of buildings. The proposed survey tool also provides direct links between the PM2.5 levels and occupants’ perception and behavior.
Air pollution can cause many adverse health outcomes, including cardiovascular and respiratory disorders. Land use regression (LUR) models are frequently used to describe small-scale spatial variation in air pollution levels based on measurements and geographical predictors. They are particularly suitable in resource limited settings and can help to inform communities, industries, and policy makers. Weekly measurements of NO2 and PM2.5 were performed in three informal areas of the Western Cape in the warm and cold seasons 2015–2016. Seasonal means were calculated using routinely monitored pollution data. Six LUR models were developed (four seasonal and two annual) using a supervised stepwise land-use-regression method. The models were validated using leave-one-out-cross-validation and tested for spatial autocorrelation. Annual measured mean NO2 and PM2.5 were 22.1 μg/m3 and 10.2 μg/m3, respectively. The NO2 models for the warm season, cold season, and overall year explained 62%, 77%, and 76% of the variance (R2). The PM2.5 annual models had lower explanatory power (R2 = 0.36, 0.29, and 0.29). The best predictors for NO2 were traffic related variables (major roads, bus routes). Local sources such as grills and waste burning sites appeared to be good predictors for PM2.5, together with population density. This study demonstrates that land-use-regression modelling for NO2 can be successfully applied to informal peri-urban settlements in South Africa using similar predictor variables to those performed in Europe and North America. Explanatory power for PM2.5 models is lower due to lower spatial variability and the possible impact of local transient sources. The study was able to provide NO2 and PM2.5 seasonal exposure estimates and maps for further health studies.
The development of low-cost sensors and novel calibration algorithms offer new opportunities to supplement existing regulatory networks to measure air pollutants at a high spatial resolution and at hourly and sub-hourly timescales. We use a random forest model on data from a network of low-cost sensors to describe the effect of land use features on local-scale air quality, extend this model to describe the hourly-scale variation of air quality at high spatial resolution, and show that deviations from the model can be used to identify particular conditions and locations where air quality differs from the expected land-use effect. The conditions and locations under which deviations were detected conform to expectations based on general experience.
Ambient particle number concentration (PNC) varies significantly in time and space within cities, yet complexity and cost prohibit large-scale routine monitoring; as a consequence, there is not enough data for assessment of human exposure to, or risk from the particles. The quality of assessments can be augmented by modelling; however, models are generally less capable of predicting PNC spatial variation than predicting variations in other ambient pollutants. To advance modelling of PNC, we aimed to develop and compare the performance of parametric and non-parametric machine learning land-use regression (LUR) models to predict hourly average PNC. We used data from 25 short-term stationary campaigns and five long-term sites during 2009–2012 in the Brisbane Metropolitan Area, Australia. We analysed three particle size ranges of total PNC (<30 nm, <414 nm and <3000 nm) as response variables, and over 150 independent variables, including land use, roads and traffic, population, distance, elevation, meteorology and time of day as potential predictors of PNC. The LUR models were developed separately for All Days, Nuc Days (when particle nucleation occurred), and No-nuc Days (when no particle nucleation occurred). We selected two algorithms to develop LUR models for PNC: a random forest (RF) model, and a generalised additive model (GAM) based on the least angle regression (LARS). The best LARS model for <30 nm, <414 nm and <3000 nm explained 30%, 31%, and 34%, respectively, whereas the best RF models were significantly better, explaining 73%, 64%, and 88%, respectively. Using this novel approach, we provided new insights into spatial variation in PNC and also demonstrated that the non-parametric RF model is a better choice for developing a LUR model for PNCs because of its robust predictive performance in comparison with the LARS parametric regression model.
Land-use regression (LUR) models provide location and time specific estimates of exposure to air pollution and thereby improve the sensitivity of health effects models. However, they require pollutant concentrations at multiple locations along with land-use variables. Often, monitoring is performed over short durations using mobile monitoring with research-grade instruments. Low-cost PM monitors provide an alternative approach that increases the spatial and temporal resolution of the air quality data. LUR models were developed to predict hourly PM concentrations across a metropolitan area using PM concentrations measured simultaneously at multiple locations with low-cost monitors. Monitors were placed at 23 sites during the 2015/16 heating season. Monitors were externally calibrated using co-located measurements including a reference instrument (GRIMM particle spectrometer). LUR models for each hour of the day and weekdays/weekend days were developed using the deletion/substitution/addition algorithm. Coefficients of determination for hourly PM predictions ranged from 0.66 to 0.76 (average 0.7). The hourly-resolved LUR model results will be used in epidemiological studies to examine if, and how quickly, increases in ambient PM concentrations trigger adverse health events by reducing the exposure misclassification that arises from using less time resolved exposure estimates.
The complex nature of air pollution in urban areas prevents traditional monitoring techniques from obtaining measurements representative of true human exposure. The current study assessed the capability of low-cost mobile monitors to acquire useful data in a city without a monitoring network in place (Lubbock, Texas) using a bicycle platform. The monitoring campaign resulted in 30 days of data along a 13.4 km fixed concentric route. Due to high sensitivities to airflow, the apparent wind velocity was accounted for throughout the route. The data were also normalized into percentiles in order to visualize spatial patterns. The highest estimated pollution levels were located near frequently busy intersections and roads; however, sensor issues resulted in lower confidence. Additional research is needed concerning the appropriate use of low-cost metal oxide sensors for citizen science applications, as measurements can be misleading if the user is unaware of sensors specifications. The simultaneous use of several low-cost mobile platforms, rather than a single platform, as well as the use of high-end cases, are recommended to create a more robust spatial analysis. The issues addressed from this research are important to understand for accurate and beneficial application of low-cost gaseous monitors for citizen science.