
Mapping urban air quality using mobile sampling with low-cost sensors and machine learning in Seoul, South Korea


Contents lists available at ScienceDirect
Environment International
journal homepage: www.elsevier.com/locate/envint
Chris C. Lim (a,*), Ho Kim (b), M.J. Ruzmyn Vilcassim (a), George D. Thurston (a), Terry Gordon (a), Lung-Chi Chen (a), Kiyoung Lee (b), Michael Heimbinder (c), Sun-Young Kim (d)
(a) Department of Environmental Medicine, New York University School of Medicine, New York, NY, United States of America
(b) Graduate School of Public Health, Seoul National University, Seoul, South Korea
(c) HabitatMap, Brooklyn, NY, United States of America
(d) Graduate School of Cancer Science and Policy, National Cancer Center, Gyeonggi, South Korea
ARTICLE INFO
Handling Editor: Xavier Querol
ABSTRACT
Recent studies have demonstrated that mobile sampling can improve the spatial granularity of land use regression (LUR) models. Mobile sampling campaigns deploying low-cost (< $300) air quality sensors could potentially offer an inexpensive and practical approach to measure and model air pollution concentration levels. In this study, we developed LUR models for street-level fine particulate matter (PM2.5) concentration levels in Seoul, South Korea. In total, 169 h of data were collected during an approximately three-week-long campaign across five routes by ten volunteers sharing seven AirBeams, a low-cost ($250 per unit), smartphone-based particle counter, while geospatial data were extracted from OpenStreetMap, an open-source and crowd-generated geographical dataset. We applied and compared three statistical approaches in constructing the LUR models: linear regression (LR), random forest (RF), and a stacked ensemble (SE) combining multiple machine learning algorithms. These approaches resulted in cross-validation R² values of 0.63, 0.73, and 0.80, respectively, and in the identification of several pollution 'hotspots.' The high R² values suggest that study designs employing mobile sampling in conjunction with multiple low-cost air quality monitors could be applied to characterize urban street-level air quality at high spatial resolution, and that machine learning models could further improve model performance. Given this study design's cost-effectiveness and ease of implementation, similar approaches may be especially suitable for citizen science and community-based endeavors, or for regions lacking air quality data and preexisting air monitoring networks, such as developing countries.
1. Introduction
Ambient air pollution is a major global public health concern, with the World Health Organization estimating that 4.2 million premature deaths annually are attributable to fine particulate matter (PM2.5) exposure (WHO, 2018). Government and regulatory agencies throughout the world have traditionally relied on networks of fixed-site monitors to measure air quality and establish standards. Owing to their prohibitive equipment and operational costs, these monitors tend to be sparsely located even in large metropolitan cities, or may be entirely missing in many locales. As concentrations of air pollutants can vary markedly over small distances and short time periods, the urban environment cannot be fully characterized using information from sparse, static networks of air pollution monitors (Kumar et al., 2015). To empirically model and characterize the spatial or spatiotemporal variability of PM2.5 concentrations, land use regression (LUR) models based on data from monitoring networks have been employed. Recently, LUR models based on data from mobile sampling designs, in which predetermined locations or routes are repeatedly sampled on modes of transport, have gained traction, offering improved spatial resolution at a lower cost (e.g., Hankey and Marshall, 2015; Shi et al., 2016; Deville Cavellin et al., 2016).
Recent technological advancements and the proliferation of air quality sensors offer additional avenues to refine the spatiotemporal characterization of air pollution levels (Morawska et al., 2018). Numerous instruments from commercial entities, non-profits, and startups have entered the market to date (Borghi et al., 2017; McKercher et al., 2017), although the performance of these sensors can differ substantially between models as well as between individual units, as noted by evaluations in field and laboratory settings (Jiao et al., 2016; Jerrett et al., 2017; Castell et al., 2017; Kelly et al., 2017; Feinberg et al., 2018; Levy Zamora et al., 2019). Because they can inexpensively generate a large volume of data, distributed networks of low-cost air quality sensors are beginning to be established to augment existing monitoring networks or to provide novel real-time data streams (Gao et al., 2015; Schneider et al., 2017; Zikova et al., 2017). Noteworthy examples of collaborative endeavors between government agencies, research organizations, and communities include "OpenSense" in Zurich, Switzerland (Hasenfratz et al., 2015), the "Array of Things" in Chicago, U.S. (Catlett et al., 2017), and the Imperial County Community Air Monitoring Network in California, U.S. (English et al., 2017).

https://doi.org/10.1016/j.envint.2019.105022
Received 1 March 2019; Received in revised form 26 June 2019; Accepted 15 July 2019
* Corresponding author at: Department of Environmental Medicine, New York University School of Medicine, 341 East 25th Street, New York, NY 10010, United States of America.
E-mail address: ccl414@nyu.edu (C.C. Lim).
Environment International 131 (2019) 105022
Available online 27 July 2019
0160-4120/ © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).
Thus far, very few LUR models have been based on data collected through mobile sampling with low-cost (< $300) consumer-based sensors, although such an approach could offer a highly cost-effective way to model and map air pollution concentration levels. The main aim of this study was to deploy multiple units of the smartphone-based particle counter "AirBeam" to measure and model street-level urban air quality in Seoul, South Korea, a location with few fixed regulatory monitoring sites relative to its large population and diverse urban environments. The individual AirBeam units were first collocated with a pDR-1500 in a laboratory setting to adjust for intra-instrument variability and to convert particle counts to mass equivalents, and a mobile sampling campaign was then conducted by repeatedly walking five routes over an approximately three-week period. The collected air pollution data, together with OpenStreetMap (OSM), an openly available and crowd-sourced geographical data source, were then used to construct LUR models with both linear regression and machine learning methods. This work explores the potential of mobile sampling with low-cost air quality sensors, machine learning models, and open data sources to characterize street-level air quality in urban locations at fine spatial resolution.
2. Materials and methods
2.1. Equipment description and intra-instrument variability adjustment
The internal optical particle sensor of the AirBeam (dimensions: 105 × 95 × 43.5 mm; weight: 198 g) is the PPD60PV-T2 (detectable concentration range: 0 to 400 μg/m³; detectable particle size: 0.5 to 2.5 μm) from Shinyei Technology Co. Ltd. (Kyoto, Japan). The AirBeam is connected to an Android smartphone running the AirCasting application (aircasting.org). Supplemental Fig. 1 depicts the AirBeam, its specifications, and the Android AirCasting app. This mobile system is capable of continuous measurement (at programmable intervals as short as 1 s) and mapping (by GPS and Google Maps). The platform code is open-source, and collected data can be shared and mapped via an online platform, AirCasting (www.aircasting.org/map).
To adjust for potential intra-instrument variability and to convert particle counts to PM2.5 mass equivalents, the AirBeam units were collocated with a DataRAM pDR-1500 (Thermo Scientific, Franklin, MA) within a concentrated air particle (CAP) system in Sterling Forest, New York. The system draws in and concentrates ambient air through a cyclone inlet that first removes most of the particles larger than 2.5 μm in aerodynamic diameter. The cyclone outflow is passed over a warm water bath and is then rapidly cooled in a condenser, resulting in supersaturation and particle growth (Maciejczyk et al., 2005). The pDR-1500 was first calibrated with ambient particles in the CAP chamber via its internal gravimetric filter and pump system at a flow rate of 1.5 L/min. The individual AirBeam units were then calibrated against the pDR-1500: each unit was placed in the CAP chamber together with the pDR-1500 and tested for approximately 3 to 4 h per day on 2 to 3 days, and a separate linear regression model was then fit for each unit.

Fig. 1. (a) Locations of the five sampling routes in Seoul and government-run, fixed-site monitors (blue markers). Mean PM2.5 concentration levels (μg/m³) during the sampling period at each of the 100 m segments are also depicted. The red arrows point to an underground roadway, which was not included in analyses. We also present close-up views of route C as an example to depict sampled data points, with (b) OpenStreetMap and (c) satellite backgrounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
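The per-unit calibration in Section 2.1 amounts to one ordinary least-squares fit of collocated pDR-1500 mass on AirBeam particle counts for each unit. The Python sketch below illustrates that step under stated assumptions: the inputs are already time-aligned 1-min averages, RMSE is computed with n (not n − 2) in the denominator, and all function names are hypothetical rather than taken from the study's R code.

```python
import numpy as np


def fit_unit_calibration(counts_hppcf, pdr_mass):
    """Fit mass = intercept + slope * counts for one AirBeam unit by OLS.

    counts_hppcf: 1-min AirBeam readings (hundreds of particles per cubic foot)
    pdr_mass: time-aligned 1-min pDR-1500 PM2.5 mass (ug/m3)
    Returns (intercept, slope, rmse, adj_r2).
    """
    x = np.asarray(counts_hppcf, dtype=float)
    y = np.asarray(pdr_mass, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept column + counts
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, n_params = design.shape
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    rmse = float(np.sqrt(ss_res / n))  # assumption: n in the denominator
    return float(coef[0]), float(coef[1]), rmse, adj_r2


def calibrate_units(sessions):
    """sessions: {unit_id: (counts, mass)} -> {unit_id: (intercept, slope)}."""
    return {unit: fit_unit_calibration(c, m)[:2] for unit, (c, m) in sessions.items()}


def counts_to_mass(counts_hppcf, intercept, slope):
    """Apply one unit's fitted equation to convert its field counts to PM2.5 mass."""
    return intercept + slope * np.asarray(counts_hppcf, dtype=float)
```

Each unit's (intercept, slope) pair is applied only to that unit's own field data, which is how unit-to-unit differences in response are carried through to the mobile measurements.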
2.2. Sampling location and protocol
Seoul, the capital of South Korea and the 5th most populous metropolitan area in the world, experiences some of the highest air pollution concentration levels among cities in developed countries. The city is characterized by extremely high urban density, an abundance of high-rise buildings and apartments, and mountainous terrain. This study was carried out in the southern part of Seoul, south of the Han River, in three districts: Dongjak-gu (area = 16.35 km²; population density = 24,000/km²), Seocho-gu (area = 47.14 km²; density = 8300/km²), and Gwanak-gu (area = 29.57 km²; density = 18,000/km²). The sampling campaign was conducted over an approximately three-week period (July 23rd to August 11th) in the summer of 2015, on weekdays only (12 days in total, on non-rainy days), during three different time periods: morning (8 to 10 am), evening (6 to 8 pm), and night (9 to 11 pm). Ten volunteers sharing 7 AirBeam units were instructed to repeatedly sample the five routes without predetermined beginning/ending locations and times.
The five routes (Fig. 1), four of which were based near or around government-run regulatory monitors, were designed to span various neighborhoods and to obtain spatial coverage of a wide range of geographical variables, such as major roads and highways, green spaces, and both low- and high-density residential areas. Route A is located in Sillim, a largely residential neighborhood with low-rise buildings and houses. Route B is in Sadang, which is also mainly residential, with a large park and three major roads surrounding the neighborhood. Route C is in Seocho, where the central bus transport terminal for Seoul is located, as well as the main city highway, a riverside park, and high-rise apartment buildings. Route D is located in Isu, where major highways and high-density residential areas are present. Route E is located near Seoul National University, a large university campus at the base of a mountain; the area is hilly and tree-covered, and has a relatively low volume of traffic, consisting mainly of buses used for student transport. The lengths of the routes ranged from 3.9 km to 4.9 km, and the total length of all the routes was 21.5 km.
2.3. Data source for land use predictors
Geospatial data for the city of Seoul, South Korea were downloaded from OpenStreetMap (OSM), a freely available, crowd-sourced and user-generated online mapping system. The dataset included > 60 variables, grouped into the following categories: roads (cycleway, footway, living, path, pedestrian, residential, primary, secondary, road, secondary link, service, steps, subway, tertiary, trunk, trunk link, unclassified); land use (cemetery, farm, footway, forest, garden, golf, grass, hospital, island, park, parking, pitch, place of worship, playground, residential, school, sports center, substation, university, wood); buildings (apartments, cathedral, church, commercial, hospital, hotel, house, public, residential, retail, school, university, identified/unidentified); public amenities (fire station, fuel station, hospital, library, police, school, town hall); transportation points (bus stop, motorway junction, station, subway entrance); and water areas and waterways (stream, river, riverbank, water). Several variables in different categories that describe the same land use morphology (e.g., university, which is counted as land use, buildings, and public amenities) were all initially included in the analysis. After removing the subway variable (as it describes underground paths), 67 predictor variables were available for analysis (Supplemental Table 1).
2.4. Data reduction
As data were collected at 1-second intervals, the data points were first aggregated into 1-min averages to match the pDR-1500 sampling frequency and to reduce data noise. Measurement points with obvious GPS errors (e.g., located in the middle of rivers) and sampling errors (e.g., a volunteer did not follow the sampling route properly) were removed by restricting data points to those < 50 m from the routes, and also manually after visual inspection. We then employed a snapping procedure to assign the collected data points to the nearest route segment on the basis of measured GPS coordinates, allowing measurements along the same segment to be analyzed as a group, as in previous mobile LUR studies (Hankey and Marshall, 2015). Segments were first defined by length from a starting point along a route, buffers with different radiuses were drawn around the centroids of the route segments, and geospatial data from OSM within the buffers were then extracted. Each road segment was thereby associated with land use, built, and natural environment variables, calculated as different OSM variables within buffers of different sizes. We calculated road segments at 5 different lengths (25 m, 50 m, 100 m, 150 m, 250 m) and 5 buffer radiuses (50 m, 100 m, 150 m, 350 m, 500 m) in order to build the LUR models as well as to assess how these parameters influence LUR model performance.
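The snapping and buffer steps above can be sketched with NumPy. This is a minimal illustration, not the study's GIS workflow: it assumes GPS points and the route polyline are already projected to planar coordinates in meters, that segments are cut every `seg_len` meters from the route's first vertex, and that only point features (e.g., bus stops) are counted within a buffer. Road lengths and land use areas within buffers would normally be computed with a GIS library. All function names are hypothetical.

```python
import numpy as np


def _route_pieces(route):
    """Start points, direction vectors, lengths and cumulative start distances of each polyline piece."""
    r = np.asarray(route, dtype=float)
    start, vec = r[:-1], np.diff(r, axis=0)
    length = np.hypot(vec[:, 0], vec[:, 1])
    cum_start = np.concatenate([[0.0], np.cumsum(length)[:-1]])
    return start, vec, length, cum_start


def snap_to_segments(points, route, seg_len=100.0, max_dist=50.0):
    """Index of the fixed-length route segment for each point; -1 marks points dropped as > max_dist away.

    points: (n, 2) projected x/y in meters; route: (m, 2) polyline vertices.
    """
    pts = np.asarray(points, dtype=float)
    start, vec, length, cum_start = _route_pieces(route)
    denom = np.where(length > 0, length ** 2, 1.0)  # guard against repeated vertices
    # Position (0..1) of the closest point on every piece, for every GPS point
    t = np.einsum("npk,pk->np", pts[:, None, :] - start[None], vec) / denom
    t = np.clip(t, 0.0, 1.0)
    closest = start[None] + t[..., None] * vec[None]
    dist = np.linalg.norm(pts[:, None, :] - closest, axis=2)
    best = dist.argmin(axis=1)
    rows = np.arange(len(pts))
    along = cum_start[best] + t[rows, best] * length[best]  # distance along the route
    n_seg = int(np.ceil(length.sum() / seg_len))
    seg = np.minimum(np.floor(along / seg_len).astype(int), n_seg - 1)
    return np.where(dist[rows, best] < max_dist, seg, -1)


def segment_centroids(route, seg_len=100.0):
    """Midpoint (measured along the route) of each fixed-length segment."""
    r = np.asarray(route, dtype=float)
    _, _, length, _ = _route_pieces(r)
    cum = np.concatenate([[0.0], np.cumsum(length)])
    n_seg = int(np.ceil(cum[-1] / seg_len))
    seg_start = np.arange(n_seg) * seg_len
    mid = (seg_start + np.minimum(seg_start + seg_len, cum[-1])) / 2.0
    return np.column_stack([np.interp(mid, cum, r[:, 0]), np.interp(mid, cum, r[:, 1])])


def count_within_buffer(centroids, features, radius):
    """Number of point features within `radius` meters of each segment centroid."""
    c = np.asarray(centroids, dtype=float)
    f = np.asarray(features, dtype=float)
    d = np.linalg.norm(c[:, None, :] - f[None, :, :], axis=2)
    return (d <= radius).sum(axis=1)
```

Looping `count_within_buffer` over several radii (for example the 50 m to 500 m set used here) yields one candidate predictor per feature type and buffer size for each segment.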
2.5. Adjustment for background temporal trends
Previous mobile sampling investigations adjusted for potential temporal bias through several approaches. For example, Tessum et al. (2018) adjusted for between-day temporal trends by subtracting the daily fifth percentile from all measured concentration values on a given day, and Deville Cavellin et al. (2016) included linear and quadratic terms for temperature as independent variables in the model to adjust for potential temporal variability. We modified an approach applied by multiple studies (Larson et al., 2009; Dons et al., 2012; Clougherty et al., 2013; Van den Bossche et al., 2015; Apte et al., 2017) that used background concentration levels from a nearby regulatory monitor to adjust for temporal trends and normalize measured values. Leveraging the available information on background PM2.5 concentrations from multiple fixed-site regulatory monitors near the sampling routes, we adjusted each 1-min averaged AirBeam measurement on each day by applying a multiplicative hourly factor derived from the nearby regulatory monitor, defined as the ratio of the mean concentration level during the entire sampling period to the concentration level in the hour in which that measurement was taken. For route E, which was not designed around a regulatory monitor, we used averaged values from the two nearby monitors (approx. 2 to 4 km away) located by routes A and B. This resulted in 6 factors per sampling day for each of the 5 routes. Using multiple nearby monitors, instead of a single monitor as in past studies, allowed the temporal adjustment to vary across locations. This approach minimizes the effect of day-to-day variations in background air quality on the measurements, thereby decreasing the amount of sampling data required (Van den Bossche et al., 2015).
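The multiplicative hourly adjustment can be written as adjusted = measured × (C̄ / C_h), where C̄ is the monitor's mean over the sampling period and C_h is its concentration in the hour of the measurement. The Python sketch below assumes that C̄ is the mean of the monitor values supplied for the sampled hours (the text does not state whether non-sampled hours are included) and that monitor data are keyed by (day, hour); the function names are illustrative.

```python
import numpy as np


def hourly_factors(monitor_hourly):
    """Map (day, hour) -> period mean / hourly concentration for one monitor.

    monitor_hourly: {(day, hour): PM2.5 in ug/m3} for the sampled hours.
    Assumption: the period mean is taken over the hours supplied.
    """
    period_mean = float(np.mean(list(monitor_hourly.values())))
    return {key: period_mean / conc for key, conc in monitor_hourly.items()}


def average_monitors(*monitors):
    """Hour-by-hour mean of several monitors (as described for route E)."""
    keys = monitors[0].keys()
    return {k: float(np.mean([m[k] for m in monitors])) for k in keys}


def adjust_measurement(value, day, hour, factors):
    """Scale one 1-min AirBeam value by the factor for the hour it was taken in."""
    return value * factors[(day, hour)]
```

With three 2-hour sampling windows per day (hours 8, 9, 18, 19, 21 and 22), this yields six factors per sampling day for each route, matching the count given above. Hours dirtier than the period mean get a factor below 1, and cleaner hours get a factor above 1.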
Hourly measurements from regulatory monitors in Seoul revealed considerable temporal variability during the study period, with hourly PM2.5 levels as low as 5 μg/m³ and reaching 67 μg/m³ during pollution episodes (Fig. 2).
2.6. LUR model building
We first tested the potential effects of spatial aggregation by different route segment lengths and buffer sizes by including all 67 available variables in a linear regression model. Based on their high adj-R², we selected 100 m route segments to spatially aggregate the collected data points, resulting in 215 available segments for subsequent analyses. We then applied and compared three statistical approaches for building the LUR model: linear regression (LR), random forest (RF), and stacked ensemble (SE).
In the linear regression model, candidate GIS variables for the multivariable model were screened using a distance-decay regression selection strategy (ADDRESS), which selects informative variables and their corresponding buffer sizes from all available potential variables (Su et al., 2015). We then applied a supervised forward search approach, adding variables to the LR model one at a time and keeping a variable only if it increased the R² of the model by at least 1.0% and if all predictor variables had statistically significant coefficients (p < 0.05) (Van den Bossche et al., 2018).

We also applied the random forest (RF) model after first removing highly correlated variables (absolute correlation > 0.8). Random forests, in brief, are an ensemble of decision trees in which each node is split using the best split among a randomly chosen subset of predictors. Random search, which randomly chooses a combination of hyperparameters at every iteration, was used to tune and optimize the model (Bergstra and Bengio, 2012).

Finally, we employed the stacked ensemble (SE) model, a machine learning ensemble approach that trains a learning algorithm to combine the predictions of several other learning algorithms. First, all of the other algorithms are trained using the available data; then a meta-learner algorithm (chosen from the list of algorithms) is trained to make a final prediction using the predictions of the other algorithms as inputs. We evaluated and selected a diverse group of machine learning algorithms, including random forest (rf), Bayesian generalized linear model (bayesglm), k-nearest neighbors (knn), recursive partitioning and regression trees (rpart), and partitioning using deletion, substitution, and addition moves (partDSA).
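The supervised forward search used for the LR model can be sketched as a greedy loop. This Python version is an illustration under assumptions: "increased the R² by 1.0%" is read as an absolute gain of at least 0.01, the candidate giving the largest R² is tried first at each step, the ADDRESS pre-screening is assumed to have already produced the candidate columns, and p-values come from ordinary t-tests via SciPy.

```python
import numpy as np
from scipy import stats


def ols_r2_pvalues(X, y):
    """OLS with an intercept; returns (R2, two-sided p-values of the slope terms)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - A.shape[1]
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.pinv(A.T @ A)))
    p = 2.0 * stats.t.sf(np.abs(coef / se), dof)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, p[1:]


def forward_select(X, y, names, min_gain=0.01, alpha=0.05):
    """Add one variable at a time; keep it only if R2 rises by >= min_gain
    and every coefficient in the enlarged model has p < alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, current_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        best_j, best_r2 = None, current_r2
        for j in remaining:
            r2, p = ols_r2_pvalues(X[:, selected + [j]], y)
            if r2 - current_r2 >= min_gain and np.all(p < alpha) and r2 > best_r2:
                best_j, best_r2 = j, r2
        if best_j is None:  # no candidate passes both criteria
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current_r2 = best_r2
    return [names[j] for j in selected], current_r2
```

Requiring every coefficient to remain significant, not only the newly added one, is what keeps a later variable from silently making an earlier one redundant.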
We applied 10-fold cross-validation (with 500 repeats) to calculate the mean CV-R² (cross-validation R², defined as 1 − mean squared error/variance) and the root mean square error (RMSE; a measure of the differences between the values predicted by a model and the values observed) for the three methods to quantify their accuracy. We used the packages "ggplot2" and "leaflet" for visualization and "caret" for statistical analyses in R (version 3.4.4).
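The CV-R² and RMSE definitions above can be made concrete with a small repeated k-fold routine. The sketch below is written in Python rather than with the R/caret code the study used. It assumes that out-of-fold predictions are pooled within each repeat before computing CV-R² = 1 − MSE/Var(y) and RMSE, and that the per-repeat values are then averaged. `fit_predict` is a hypothetical callback so that any of the three model types could be plugged in.

```python
import numpy as np


def repeated_kfold_cv(X, y, fit_predict, k=10, repeats=500, seed=0):
    """Mean CV-R2 (1 - MSE / Var(y)) and mean RMSE over repeated k-fold CV.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    Within each repeat, out-of-fold predictions from all k folds are pooled
    before the metrics are computed; the per-repeat values are then averaged.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    r2_values, rmse_values = [], []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)  # new random folds each repeat
        pred = np.empty(n)
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            pred[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        mse = float(np.mean((y - pred) ** 2))
        r2_values.append(1.0 - mse / float(np.var(y)))
        rmse_values.append(mse ** 0.5)
    return float(np.mean(r2_values)), float(np.mean(rmse_values))


def linear_fit_predict(X_train, y_train, X_test):
    """Example callback: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```

The defaults mirror the 10-fold, 500-repeat design described above. A much smaller `repeats` is sufficient for a quick check, because the repeat-level averages stabilize quickly.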
3. Results
3.1. Adjustment for intra-instrument variability
We fit univariate linear regression models for each deployed AirBeam unit in order to adjust for intra-unit variability and to convert particle counts to PM2.5 mass concentrations. During the collocated sessions with the DataRAM pDR-1500 in the CAP chamber, the PM2.5 concentration (as measured by the pDR-1500) ranged from 0 to 81 μg/m³. The AirBeams showed strong agreement with the pDR-1500 (adj-R² = 0.95 to 0.98) as well as noticeable differences in response between individual units (Fig. 3). The regression models' intercepts, slopes, and RMSE values varied across the units; detailed statistical summaries of the models are presented in Table 1.
3.2. Mobile sampling summary statistics
The mobile sampling campaign yielded a total of 10,871 min of data; after removing GPS and sampling errors, 10,177 min (93.6%) of data remained, equaling > 169 h of total data across the 5 sampled routes (Table 2, Supplemental Tables 2 & 3). Of these, 1992 min (33.2 h) of sampling data were collected at Route A; 2449 min (40.8 h) at Route B; 2313 min (38.6 h) at Route C; 1970 min (32.8 h) at Route D; and 1453 min (24.2 h) at Route E. Route D, which is located near major roads and highways, had the highest concentration levels (55.5 ± 27.7 μg/m³), while Route B (42.0 ± 24.2 μg/m³) and Route E (48.4 ± 31.3 μg/m³) had the lowest concentration levels. Notable differences between morning, evening, and night were also observed across the five routes, especially for Route D, which had elevated levels during the morning (70.7 ± 25.5 μg/m³) compared to the evening (46.6 ± 28.3 μg/m³) and night (54.8 ± 24.1 μg/m³). The amount of sampling data varied across the 215 segments, with a median of 44 min per segment (minimum = 5; 25th percentile = 34; 75th percentile = 55; maximum = 179). Summary statistics for minutes of sampling per 100 m segment for each of the five routes are visualized as boxplots in Fig. 4.

Fig. 2. Hourly (at 8 am, 9 am, 6 pm, 7 pm, 9 pm, 10 pm) PM2.5 concentration levels during the sampling period (7/23/15 to 8/10/15) at the four regulatory background monitors.
3.3. Model results
The LUR models were sensitive to different segment lengths and buffer radiuses, with adj-R² generally increasing with larger buffer radiuses (Fig. 5), while 100 m to 150 m segments performed best for spatial aggregation. Fitting individual equations to account for intra-instrument variability for each AirBeam unit generally improved the accuracy of the constructed LUR models, increasing CV-R² values by ~0.10 to 0.15.
In constructing the LR model, we screened and removed several point variables (e.g., fire stations) that were not frequently present across the sampling space but were clustered near the pollution hotspots, as these variables ended up having very strong influences on the models. The final LR LUR model showed high goodness-of-fit, with a CV-R² of 0.63 and an RMSE of 7.01 μg/m³, and included the following variables: wood, secondary link, residential road, cathedral, station, pitch, and apartments (Table 3). The machine learning approaches explained a greater proportion of the variance of PM2.5 concentrations than the LR model. The random forest model identified mostly different variables as important (wood, residential road, living street, school, park, apartments, residential, building, tertiary, and service) and also showed better performance metrics than the LR model, with a higher mean CV-R² (0.73) and lower RMSE (6.20 μg/m³). The stacked ensemble model with random forest as the meta-learner performed best, outperforming both the LR and RF models with a higher CV-R² (0.80) and lower RMSE (5.22 μg/m³). Individual R² values for the algorithms in the ensemble were 0.74 for random forest, 0.45 for partDSA, 0.50 for rpart, 0.70 for bayesglm, and 0.69 for knn.
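A rough Python analogue of the RF and SE steps can be assembled from scikit-learn. This is an approximation under assumptions, not the study's caret setup. The correlation filter is a simple greedy variant. BayesianRidge and DecisionTreeRegressor stand in for bayesglm and rpart, and partDSA is omitted because scikit-learn has no equivalent. The hyperparameter grid and function names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def drop_correlated(X, threshold=0.8):
    """Greedy filter: keep column j only if |r| <= threshold with every column already kept."""
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep


def tuned_random_forest(X, y, n_iter=20, folds=10, seed=0):
    """Random search over RF hyperparameters, scored by k-fold CV R2."""
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions={
            "n_estimators": [100, 300],
            "max_features": [0.33, 0.5, 1.0],  # fraction of predictors tried per split
            "min_samples_leaf": [1, 2, 5, 10],
        },
        n_iter=n_iter,
        cv=KFold(n_splits=folds, shuffle=True, random_state=seed),
        scoring="r2",
        random_state=seed,
    )
    return search.fit(X, y).best_estimator_


def stacked_ensemble(folds=10, seed=0):
    """Base learners combined by a random-forest meta-learner on out-of-fold predictions."""
    base_learners = [
        ("rf", RandomForestRegressor(n_estimators=200, random_state=seed)),
        ("bayes_lm", BayesianRidge()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=7))),
        ("tree", DecisionTreeRegressor(min_samples_leaf=5, random_state=seed)),
    ]
    return StackingRegressor(
        estimators=base_learners,
        final_estimator=RandomForestRegressor(n_estimators=200, random_state=seed),
        cv=KFold(n_splits=folds, shuffle=True, random_state=seed),
    )
```

Using a random forest as the final estimator mirrors the choice of random forest as the meta-learner reported above. StackingRegressor trains that meta-learner on cross-validated (out-of-fold) predictions of the base learners, which limits leakage from the base models' own training fit.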
Adjusting for background temporal trends changed the overall average concentration levels from 49.4 to 59.2 μg/m³ in the morning, from 46.4 to 45.7 μg/m³ in the evening, and from 51.5 to 47.3 μg/m³ at night. The changes in concentration levels after temporal adjustment during the three sampling periods differed significantly across the routes (Supplemental Table 4). This adjustment also improved the CV-R² for all three approaches; without it, the CV-R² values were lower, at 0.54, 0.65, and 0.71 for the LR, RF, and SE models, respectively.

The constructed LUR models were used to create prediction maps of street-level PM2.5 concentration levels in Seoul near the sampled locations, which revealed several "hotspots" with elevated PM2.5 levels (Fig. 6). The prediction maps showed similar spatial patterns across the three modeling approaches, which identified similar locations as hotspots, especially locations with major roads/highways and high population density. Conversely, the lowest concentrations were predicted at greenspace locations, such as parks and mountains. The three approaches resulted in relatively similar mean predicted values across the exposure surface, at 47.31, 48.86, and 49.43 μg/m³ for LR, RF, and SE, respectively. However, the LR prediction map predicted lower values than the machine learning approaches at both extremes (range: 26.36 to 68.96 μg/m³), while the maps for the RF (34.97 to 71.43 μg/m³) and especially the SE (33.50 to 83.19 μg/m³) models resulted in higher predicted values.

Fig. 3. DataRAM pDR-1500 (mass; μg/m³) vs. 1-minute averaged AirBeam (hundreds of particles per cubic foot; hppcf) measurements in the concentrated air particle (CAP) chamber.

Table 1
Linear regression equations to convert particle counts to mass for each AirBeam unit.
Unit name | Intercept (standard error) | Slope (standard error) | RMSE | Adj-R²
99B | 10.72 (0.29) | 0.002616 (1.77 × 10⁻⁵) | 23.48 | 0.95
B7E | 11.69 (0.44) | 0.001974 (1.41 × 10⁻⁵) | 14.84 | 0.98
B99 | 13.16 (0.42) | 0.002102 (1.47 × 10⁻⁵) | 15.50 | 0.98
C54 | 6.68 (0.18) | 0.001905 (1.35 × 10⁻⁵) | 8.92 | 0.96
C58 | 4.91 (0.16) | 0.002000 (1.05 × 10⁻⁵) | 9.16 | 0.98
C72 | 9.94 (0.34) | 0.002537 (3.17 × 10⁻⁵) | 10.50 | 0.95
D46 | 11.26 (0.37) | 0.002049 (1.64 × 10⁻⁵) | 11.39 | 0.98
4. Discussion
In this study, we conducted a mobile sampling campaign in Seoul, South Korea, deploying low-cost smartphone-based air quality sensors, and used the collected data to construct LUR models with three statistical approaches. The resulting R² values were comparable to those of recent, similar studies across multiple locations around the world that used more advanced equipment. Our study is unique in developing LUR models using multiple low-cost (< $300), mobile sensors; priced at $250 per unit, AirBeams are one or more orders of magnitude less expensive than commercially available portable instruments (costing thousands of dollars; the pDR-1500 used in this study cost ~$5700) and federal standard instruments (costing tens of thousands of dollars). AirBeam and its operating platform, AirCasting, are also notable for being developed primarily for citizen science, whereby users can upload their measurements to share with the public, and for being open-source, allowing developers and researchers to program and customize the instruments and the smartphone app according to their needs and requirements. Many similarly priced ($200 to $300) sensors have entered the market since the present study was conducted, underlining the public's increasing interest in the capability to measure personalized real-time exposure data (Caplin et al., 2019). Through deployment of such low-cost sensors, we were able to characterize the spatial variability of street-level PM2.5 in Seoul, the main source of which is likely to be traffic, given the near-road sampling approach applied in this study. Past source apportionment studies also identified motor vehicle emissions and road dust as the primary sources of PM2.5 in Seoul (Heo et al., 2008; Ryou et al., 2018).
Recent mobile sampling approaches for LUR model building have employed a variety of study designs and instruments. For example, Hankey and Marshall (2015) collected over 85 h of data on a bicycle-based sampling platform in Minneapolis, MN, and constructed LUR models for particle number, black carbon, and PM2.5 with modest goodness-of-fit (adj-R² of ~0.5 for particle number and ~0.4 for PM2.5). Apte et al. (2017) analyzed data collected from a Google Street View mapping vehicle equipped with air quality sensors that repeatedly sampled every street in a 30-km² area of Oakland, CA, to model and reveal urban air pollution patterns at 4 to 5 orders of magnitude greater spatial precision than possible with current central-site ambient monitoring. The "OpenSense" project in Zurich, Switzerland (Hasenfratz et al., 2015) used mobile sensor nodes installed on top of public transport trams in the city to create high-resolution pollution prediction maps for ultrafine particles and particle counts. Vehicle-based mobile measurements were also applied to create LUR models to estimate the spatial variation of street-level PM2.5 and PM10 in the downtown area of Hong Kong (Shi et al., 2016), where integrating urban/building morphology as independent variables increased the adj-R² of the LUR model, suggesting that incorporating detailed 3D characteristics of the land use can improve the predictive power of such models.
Table 2
Summary statistics for measurements across the five routes. For each period: minutes sampled; average (standard deviation) PM2.5, μg/m³; IQR, μg/m³.
Route ID | Route name | AirBeam units deployed | Total: minutes, average (SD), IQR | Morning: minutes, average (SD), IQR | Evening: minutes, average (SD), IQR | Night: minutes, average (SD), IQR
A | Sillim | B7E, B99, C54, D46 | 1992, 51.3 (32.6), 56.3 | 901, 43.3 (32.8), 40.4 | 462, 58.6 (27.9), 49.6 | 629, 57.4 (33.0), 47.4
B | Sadang | 99B, B7E, C58, D46 | 2449, 42.0 (24.2), 36.5 | 744, 40.5 (24.8), 34.2 | 858, 38.6 (21.2), 38.1 | 847, 46.7 (25.8), 30.3
C | Seocho | 99B, C58, C72 | 2313, 49.9 (31.4), 50.3 | 574, 47.9 (35.2), 69.2 | 892, 46.7 (29.4), 43.3 | 847, 54.5 (30.1), 38.1
D | Isu | 99B, C72, D46 | 1970, 55.5 (27.7), 33.9 | 477, 70.7 (25.5), 39.6 | 755, 46.6 (28.3), 43.3 | 738, 54.8 (24.1), 26.1
E | Seoul National University | B7E, B99, C54, D46 | 1453, 48.4 (31.3), 48.4 | 396, 56.8 (39.4), 74.3 | 528, 47.4 (26.5), 23.2 | 529, 43.0 (27.4), 54.0
Our study and sampling design highlight the potential advantages of mobile sampling with low-cost and portable air quality sensors in constructing LUR models. The aforementioned studies were largely based on sampling campaigns conducted on modes of transport (e.g., cars) visiting a single location at a given time, which may result in a low number of visits per location. Results from this and past studies show that mobile LUR models are highly sensitive to parameters such as the number of route segments, the buffer radiuses, and the number of measurements per segment (Minet et al., 2017). Hatzopoulou et al. (2017) evaluated the influence of the number of sampling locations and the duration of sampling on LUR model performance, noting that mobile sampling campaigns can be inefficient due to low sampling frequency at a large number of locations, and that spatial variability may be more important than the number of locations when designing sampling routes. The authors also found that the LUR models became relatively robust after 150 to 200 segments and 10 to 12 visits per segment. In the present study, walking at a slow speed, instead of sampling on mechanical modes of transportation, resulted in each route generally having a large amount of data per segment (median = 44 min). This approach also allows for assessing personal-level exposure in urban areas where more people are on the streets than in cars. The disadvantage of covering shorter distances when sampling on foot was offset by the low cost and portability of AirBeams, which allowed several units to be deployed simultaneously across multiple locations and thereby maximized spatial coverage, as opposed to the majority of past mobile sampling studies, which were carried out on a single platform.
Simultaneous measurements within a structured sampling design could decrease the amount of collected data (and manpower) required to construct robust models, whereas participatory sensing, where sampling is done "opportunistically," could lead to unstructured data that are more difficult to interpret (Van den Bossche et al., 2016). Furthermore, the AirBeam's ease of operation meant that minimal training (a few minutes at most) was required prior to field deployment, resulting in a relatively large volume of data being generated within the short sampling campaign period of this study.

Fig. 4. Boxplot demonstrating distribution of minutes of sampling per 100 m segment for each sampling route.
Fig. 5. Adjusted R² of LR LUR models (including all 67 available predictor variables) for mass, by segment radius and buffer sizes.
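The segment-level summaries discussed above (readings per 100 m segment, segment means) can be illustrated with a minimal sketch. It assumes each reading has already been reduced to a distance along the route in metres plus a PM2.5 value; real GPS tracks would first need map-matching to the route, which is skipped here. The function name and the example walk are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

SEGMENT_LENGTH_M = 100.0  # assumed segment length, matching the 100 m segments in Fig. 4


def aggregate_by_segment(measurements, segment_length=SEGMENT_LENGTH_M):
    """Group (distance_along_route_m, pm25) pairs into fixed-length route segments.

    Returns a dict mapping segment index -> (number of readings, mean PM2.5).
    """
    readings = defaultdict(list)
    for distance_m, pm25 in measurements:
        segment = int(distance_m // segment_length)
        readings[segment].append(pm25)
    return {seg: (len(vals), mean(vals)) for seg, vals in sorted(readings.items())}


# Hypothetical walk: a reading every 20 m along 300 m, repeated on two visits.
walk = [(d, 40.0 + d / 10) for d in range(0, 300, 20)] * 2
segments = aggregate_by_segment(walk)
counts = [n for n, _ in segments.values()]
print(segments)
print("median readings per segment:", median(counts))
```

Repeated visits simply add readings to the same segment, which is how walking slowly and revisiting routes raises the per-segment count.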
This study leveraged OpenStreetMap (OSM), an openly available and crowd-sourced GIS dataset, which provided a rich and comprehensive source of geospatial data for a wide range of LUR variables. OSM and other open data sources offer underexplored but valuable information for data-driven methods to predict air pollution levels (VoPham et al., 2018). Notably, the OSM GIS variables were highly developed for Seoul and provided detailed and differentiated data for the numerous types of roads and buildings, which are the land use categories that usually provide the highest predictive power for air pollution LUR models. Another advantage of crowd-sourced data is that they are continually updated; for example, using an earlier OSM download from September 2015 (versus January 2018 in this analysis), with a less developed characterization of Seoul, resulted in LUR models with lower CV-R². This suggests that in locations lacking geospatial data, crowd-sourced efforts to generate the relevant GIS variables could be carried out in concert with the air pollution sampling campaign to strengthen the predictive capability of LUR models. Despite recent endeavors by agencies and organizations throughout the world to democratize data as part of the "open data" movement, many detailed GIS files remain proprietary and thereby cost-prohibitive, and freely available data like OSM offer an alternative and important source of detailed spatial data for researchers and communities.
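Line-type OSM predictors such as "Residential Road, Line, 500 m" in Table 3 are typically the total length of mapped features falling inside a circular buffer around each segment. The dependency-free sketch below shows one way to compute that, assuming OSM geometries have already been projected to planar coordinates in metres; a GIS library would normally intersect a buffer polygon with the lines instead. Here each straight piece of a polyline is clipped to the circle analytically. All names and the example roads are hypothetical.

```python
import math


def _segment_length_inside(p0, p1, center, radius):
    """Length of the straight segment p0->p1 lying inside a circle (planar coordinates)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    a = dx * dx + dy * dy
    if a == 0:
        return 0.0  # degenerate zero-length segment
    b = 2 * (dx * fx + dy * fy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4 * a * c
    if disc <= 0:
        return 0.0  # the infinite line misses (or only touches) the circle
    root = math.sqrt(disc)
    t_enter = (-b - root) / (2 * a)
    t_exit = (-b + root) / (2 * a)
    # Clip the inside interval [t_enter, t_exit] to the segment's own range [0, 1].
    lo, hi = max(0.0, t_enter), min(1.0, t_exit)
    return (hi - lo) * math.sqrt(a) if hi > lo else 0.0


def length_within_radius(polyline, center, radius):
    """Total length of a polyline (list of (x, y) vertices in metres) inside a circular buffer."""
    return sum(
        _segment_length_inside(p0, p1, center, radius)
        for p0, p1 in zip(polyline, polyline[1:])
    )


# Hypothetical roads around a segment midpoint at the origin, 500 m buffer.
roads = [
    [(-1000, 0), (1000, 0)],      # passes through the centre: 1000 m inside
    [(-1000, 300), (1000, 300)],  # offset by 300 m: chord of 800 m inside
]
print(sum(length_within_radius(r, (0, 0), 500) for r in roads))
```

Area-type predictors (e.g., "Wood, Area, 500 m") would need polygon clipping instead, which is where a GIS library becomes the practical choice.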
Machine learning methods offered improved goodness-of-fit compared to traditional stepwise linear regression in constructing the LUR models. Prior work on machine learning applications in both national (Hu et al., 2017; Di et al., 2016) and local-level (Adams and Kanaroglou, 2016; Weichenthal et al., 2016; Brokamp et al., 2017) predictions of air pollution concentration levels highlights the advantages associated with the approach, including higher accuracy and identification of important variables. A recent example underlines additional potential benefits: a study in Los Angeles, USA used a multi-step and flexible spatial data mining approach based on machine learning to select the most important OSM geographic features and predict PM2.5 concentrations, removing the need for a priori selection of predictors for exposure modeling (Lin et al., 2017). Similarly, in our analysis, applying the traditional stepwise linear regression LUR approach to the highly correlated OSM dataset, which also contained several highly influential variables, required manual screening and removal of predictor variables prior to input and during the model building process. Notably, the stacked ensemble model combining multiple machine learning algorithms outperformed both LR and RF in this study. In recent years, ensemble machine learning methods have emerged as an important tool for modeling complex relationships and have been applied successfully in various research areas (Yang et al., 2010). Application of ensembles has generally been limited in air pollution exposure assessment and modeling efforts to date, and the results here suggest that ensemble-based approaches could further enhance the predictive performance of LUR models.
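The stacking idea can be illustrated on a toy one-predictor problem. The sketch below is not the authors' implementation: the two base learners (least-squares line and k-nearest neighbours), the round-robin 5-fold split, and the convex-blend meta-learner are assumptions chosen to keep it dependency-free. It generates out-of-fold predictions from each base learner via K-fold cross-validation, fits a single blending weight on those predictions, and refits the base learners on all data for the final model.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression on a single predictor."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


BASE_LEARNERS = {"linear": fit_linear, "knn": fit_knn}


def out_of_fold(xs, ys, n_folds=5):
    """K-fold cross-validation: each point is predicted by models that never saw it."""
    preds = {name: [0.0] * len(xs) for name in BASE_LEARNERS}
    for fold in range(n_folds):
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        for name, fit in BASE_LEARNERS.items():
            model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            for i in held_out:
                preds[name][i] = model(xs[i])
    return preds


def blend_weight(p1, p2, ys):
    """Meta-learner: weight w in [0, 1] minimising squared error of w*p1 + (1 - w)*p2."""
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return 0.5 if den == 0 else min(1.0, max(0.0, num / den))


def r_squared(ys, preds):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: one predictor with a curved relationship to PM2.5 (μg/m³).
xs = list(range(20))
ys = [30 + 0.15 * x ** 2 + (3 if x % 3 == 0 else -2) for x in xs]

oof = out_of_fold(xs, ys)
w = blend_weight(oof["linear"], oof["knn"], ys)
stacked = [w * a + (1 - w) * b for a, b in zip(oof["linear"], oof["knn"])]

# Final model: refit the base learners on all data and blend with the learned weight.
final_linear, final_knn = fit_linear(xs, ys), fit_knn(xs, ys)


def predict_stacked(x):
    return w * final_linear(x) + (1 - w) * final_knn(x)


for name, p in oof.items():
    print(f"{name}: CV R2 = {r_squared(ys, p):.2f}")
print(f"stacked (w = {w:.2f}): R2 on the same out-of-fold predictions = {r_squared(ys, stacked):.2f}")
```

Because the weight is fitted on the same out-of-fold predictions it is scored on, the stacked R² printed here is optimistic; an honest comparison with the base learners would wrap the whole procedure in an outer cross-validation loop.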
We note several potential weaknesses of this study. Because we evaluated the AirBeam units in a carefully controlled experimental chamber drawing in air from a forested, rural area (Tuxedo, New York), the particle composition and environmental conditions (e.g., humidity and temperature) encountered during that experiment are likely to differ significantly from those in the heavily urban location where this study was carried out. Although the potential impacts of these factors were not assessed in this study, previous performance evaluations of AirBeams in various laboratory and field settings offer insight. The initial manufacturer calibration was conducted in a similarly urban setting (New York City) and revealed high correlations with both gravimetric sampling and a pDR-1500 (takingspace.org, 2014). Comparisons against federal equivalent method monitors showed high agreement with GRIMM (R² ≈ 0.6–0.8) (Mukherjee et al., 2017; SCAQMD, 2017; Feinberg et al., 2018), but mixed results with BAM (R² ≈ 0.2–0.7) (Jiao et al., 2016; SCAQMD, 2017). A study of sensor responses to Arizona road dust, salt, and welding fumes (Sousan et al., 2017) demonstrated that particle type had significant impacts on AirBeam (and other low-cost sensor) measurements. Relative humidity (RH) also influenced the measurements: a laboratory evaluation observed bias when both RH (> 65%) and concentration (> 100 μg/m³) were elevated (SCAQMD, 2017), while another study (Feinberg et al., 2018) found that particle count measurements were affected by higher humidity in a field setting. Highly humid summers in Korea would likely influence the absolute measurement values, but the potential impact on prediction model performance is likely to be minimal because humidity tends to be relatively uniform across a city. Nevertheless, these findings emphasize the need to consider the potential influence of environmental factors in sensor deployments, and performance evaluations at the study location are recommended for similar studies applying low-cost sensors. In addition, the particle concentration levels encountered during sampling in Seoul were higher than the range used to construct the calibration equations for the AirBeam units, which may not capture
Table 3
Selected LUR model predictor variables in the LR and RF models and associated statistics.

Variable name | Variable type | Buffer length | LR β | LR std. error | LR p-value | RF importance (a)
Intercept | – | – | 50.02 | 1.21 | < 0.001 | –
Wood | Area | 500 m | 3.80 × 10⁻⁵ | 4.25 × 10⁻⁶ | < 0.001 | 14.45
Residential Road | Line | 500 m | 2.59 × 10⁻⁵ | 5.88 × 10⁻⁶ | < 0.001 | 13.10
Secondary Link | Line | 500 m | 6.88 × 10⁻³ | 1.00 × 10⁻³ | < 0.001 | –
Cathedral | Point | 500 m | 2.47 × 10⁻³ | 1.03 × 10⁻³ | 0.02 | –
Station | Point | 500 m | 3.75 | 1.02 | < 0.001 | –
Pitch | Area | 350 m | 1.88 × 10⁻⁴ | 4.28 × 10⁻⁵ | < 0.001 | –
Apartments | Point | 500 m | 7.70 × 10⁻⁵ | 4.02 × 10⁻⁵ | 0.05 | 10.21
School | Area | 500 m | – | – | – | 10.73
Living Street | Line | 500 m | – | – | – | 10.85
Park | Area | 500 m | – | – | – | 10.31
Residential | Area | 500 m | – | – | – | 9.93
Building (Unclassified) | Area | 500 m | – | – | – | 9.72
Tertiary | Line | 350 m | – | – | – | 9.70
Service | Line | 350 m | – | – | – | 8.97

(a) Top ten variables by variable importance are shown in the table.
the potential nonlinearity of sensor responses. We also did not check for potential sensor drift (a common issue for low-cost air quality sensors) during and after the mobile sampling, although drift is unlikely given the relatively short sampling period. These issues may have contributed to predicted values that were significantly higher than observed values from nearby fixed-site monitors, although it is also possible that such differences arise because fixed-site monitors are often located well above ground and tend to underestimate personal exposures when walking near traffic (Deville Cavellin et al., 2016). Another potential weakness is that OSM data quality and density could be uneven across locations, as some areas may be characterized in more detail than others. For example, in some of the sampled areas in this study, several houses in residential areas were not captured in the OSM file, which could have influenced model quality; however, as OSM coverage and quality continue to improve, this should become less of an issue over time.
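The calibration concerns above can be made concrete with a small sketch. It fits a linear calibration from collocated sensor/reference pairs and flags two situations raised in this section: readings outside the range used to fit the calibration (where a linear fit may miss sensor nonlinearity), and readings where both RH (> 65%) and concentration (> 100 μg/m³) are elevated, the conditions under which SCAQMD (2017) reported bias. The function names, the (raw value, RH %) tuple layout, and applying the concentration threshold to the calibrated value are assumptions; the sketch flags readings rather than correcting them.

```python
from statistics import mean


def fit_calibration(sensor, reference):
    """Least-squares line: reference ≈ intercept + slope * sensor, from collocated pairs."""
    ms, mr = mean(sensor), mean(reference)
    sxy = sum((s - ms) * (r - mr) for s, r in zip(sensor, reference))
    sxx = sum((s - ms) ** 2 for s in sensor)
    slope = sxy / sxx
    return mr - slope * ms, slope


def calibrate(readings, intercept, slope, fit_range, rh_limit=65.0, conc_limit=100.0):
    """Apply the calibration line and flag readings that deserve caution.

    Each reading is (raw_value, relative_humidity_percent). A reading is flagged if
    its raw value lies outside fit_range, or if both RH and the calibrated
    concentration exceed the given limits.
    """
    lo, hi = fit_range
    results = []
    for raw, rh in readings:
        value = intercept + slope * raw
        flags = []
        if not lo <= raw <= hi:
            flags.append("outside_calibration_range")
        if rh > rh_limit and value > conc_limit:
            flags.append("high_rh_and_concentration")
        results.append((value, flags))
    return results


# Hypothetical collocation data and three field readings as (raw value, RH %).
sensor = [10, 20, 30, 40, 50]
reference = [8, 15, 22, 29, 36]
intercept, slope = fit_calibration(sensor, reference)
field = [(25, 50.0), (80, 70.0), (200, 80.0)]
for value, flags in calibrate(field, intercept, slope, fit_range=(min(sensor), max(sensor))):
    print(round(value, 1), flags)
```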
5. Conclusions
Low-cost sensors represent an opportunity to bridge the data gap, thereby promoting public discourse, influencing air pollution regulations, and protecting public health (Amegah, 2018). This study highlights the advantages and potential of applying data collected from mobile sampling with multiple low-cost sensors to model and map street-level air pollution in urban locations, especially the ability to generate a large volume of sampling data with ease. The predictive power of the models developed here, despite deploying only a limited number of significantly less expensive, consumer-grade air quality sensors, was comparable to that of past mobile sampling LUR studies, especially after adjusting for intra-instrument variability and temporal trends. To minimize the potential influence of local particle characteristics and environmental conditions, calibration with collocated reference monitors at the sampling location is recommended for future projects using similar low-cost sensors; such calibration also allows particle counts to be converted to mass concentration, a unit of measurement that is more readily transferable to policy-relevant metrics. Initial calibrations should also carefully evaluate and adjust for the potential effects of relative humidity, which can significantly influence low-cost sensors. Overall, the findings here suggest that similar mobile sampling designs using low-cost sensors and open data sources could be applied to generate a large volume of data and construct LUR models and maps with fine spatial granularity, and that machine learning methods could further improve model performance. Our study design and approach may be especially suitable for citizen science and community-based endeavors, or in locations without preexisting air monitoring networks, such as developing countries.
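For the relative humidity adjustment recommended above, one option from the low-cost sensor literature (not used in this study) is a kappa-Köhler-style hygroscopic growth correction, in which the raw optical mass reading is divided by a growth factor that rises steeply as RH approaches 100%. The sketch assumes the simplified form f(RH) = 1 + k · a_w / (1 − a_w), with water activity a_w approximated by RH/100 and a single empirical parameter k; the default k = 0.2 is a placeholder that would have to be fitted against collocated reference data, and the function names are hypothetical.

```python
def humidity_growth_factor(rh_percent, k):
    """Hygroscopic mass growth factor f(RH) = 1 + k * a_w / (1 - a_w), with a_w = RH / 100."""
    if not 0 <= rh_percent < 100:
        # The factor diverges as RH -> 100%, so saturated conditions are rejected.
        raise ValueError("relative humidity must be in [0, 100) percent")
    a_w = rh_percent / 100.0
    return 1.0 + k * a_w / (1.0 - a_w)


def correct_for_humidity(raw_pm, rh_percent, k=0.2):
    """Divide a raw optical PM reading by the growth factor (k is a placeholder to be fitted)."""
    return raw_pm / humidity_growth_factor(rh_percent, k)


# The same raw reading is corrected more strongly as RH rises.
for rh in (30, 65, 90):
    print(rh, round(correct_for_humidity(60.0, rh), 1))
```

Because the correction is multiplicative and nearly uniform across a city at any given time, it mainly shifts absolute levels, consistent with the expectation above that humidity has limited impact on spatial model performance.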
Acknowledgement
This study was funded by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the South Korea Ministry of Education (2018R1A2B6004608), the NSF East Asia and Pacific Summer Institute (EAPSI) Fellowship, the Air & Waste Management Air Pollution Education and Research Grant (APERG), the EPA STAR Graduate Fellowship, and a grant from the National Institutes of Environmental Health Sciences Center (ES00260). This publication was developed under Assistance Agreement No. FP917825 awarded by the U.S. Environmental Protection Agency to Chris C. Lim. It has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication.
Fig. 6. PM2.5 prediction maps near sampled areas, constructed by applying (a) linear regression, (b) random forest, and (c) stacked ensemble approaches.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.envint.2019.105022.
References
Adams, M.D., Kanaroglou, P.S., 2016. Mapping real-time air pollution health risk for environmental management: combining mobile and stationary air pollution monitoring with neural network models. J. Environ. Manag. 168, 133–141.
Amegah, A.K., 2018. Proliferation of Low-Cost Sensors. What Prospects for Air Pollution Epidemiologic Research in Sub-Saharan Africa? vol. 241, pp. 1132–1137.
Apte, J.S., Messier, K.P., Gani, S., Brauer, M., Kirchstetter, T.W., Lunden, M.M., ... Hamburg, S.P., 2017. High-resolution air pollution mapping with Google street view cars: exploiting big data. Environmental Science & Technology 6999–7008.
Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13 (Feb), 281–305.
Borghi, F., Spinazzè, A., Rovelli, S., Campagnolo, D., Del Buono, L., Cattaneo, A., Cavallo, D.M., 2017. Miniaturized monitors for assessment of exposure to air pollutants: a review.
Brokamp, C., Jandarov, R., Rao, M.B., LeMasters, G., Ryan, P., 2017. Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches. Atmos. Environ. 151, 1–11.
Caplin, A., Ghandehari, M., Lim, C., Glimcher, P., Thurston, G., 2019. Advancing environmental exposure assessment science to benefit society. Nat. Commun. 10 (1), 1236.
Castell, N., Dauge, F.R., Schneider, P., Vogt, M., Lerner, U., Fishbain, B., Bartonova, A., 2017. Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates? Environ. Int. 99, 293–302.
Catlett, C.E., Beckman, P.H., Sankaran, R., Galvin, K.K., 2017. Array of things: a scientific research instrument in the public way: platform design and early lessons learned. In: Proceedings of the 2nd International Workshop on Science of Smart City Operations and Platforms Engineering. ACM, pp. 26–33, April.
Clougherty, J.E., Kheirbek, I., Eisl, H.M., Ross, Z., Pezeshki, G., Gorczynski, J.E., Johnson, S., Markowitz, S., Kass, D., Matte, T., 2013. Intra-urban spatial variability in wintertime street-level concentrations of multiple combustion-related air pollutants: the New York City Community Air Survey (NYCCAS). J. Expo. Sci. Environ. Epidemiol. 23, 232–240.
Deville Cavellin, L., Weichenthal, S., Tack, R., Ragettli, M.S., Smargiassi, A., Hatzopoulou, M., 2016. Investigating the use of portable air pollution sensors to capture the spatial variability of traffic-related air pollution. Environmental Science & Technology 50 (1), 313–320.
Di, Q., Kloog, I., Koutrakis, P., Lyapustin, A., Wang, Y., Schwartz, J., 2016. Assessing PM2.5 Exposures with High Spatiotemporal Resolution across the Continental United States. Environ. Sci. Technol. 50 (9), 4712–4721.
Dons, E., Int Panis, L., Van Poppel, M., Theunis, J., Wets, G., 2012. Personal exposure to Black Carbon in transport microenvironments. Atmos. Environ. 55, 392–398.
English, P.B., Olmedo, L., Bejarano, E., Lugo, H., Murillo, E., Seto, E., Wong, M., King, G., Wilkie, A., Meltzer, D., Carvlin, G., 2017. The Imperial County Community Air Monitoring Network: a model for community-based environmental monitoring for public health action. Environ. Health Perspect. 125 (7).
Feinberg, S., et al., 2018. Long-term evaluation of air sensor technology under ambient conditions in Denver, Colorado. Atmos. Meas. Tech. 11, 4605–4615.
Gao, M., Cao, J., Seto, E., 2015. A distributed network of low-cost continuous reading sensors to measure spatiotemporal variations of PM2.5 in Xi'an, China. Environ. Pollut. 199, 56–65.
Hankey, S., Marshall, J.D., 2015. Land Use Regression Models of On-Road Particulate Air Pollution (Particle Number, Black Carbon, PM2.5, Particle Size) Using Mobile Monitoring. Environ. Sci. Technol. 49 (15), 9194–9202.
Hasenfratz, D., et al., 2015. Deriving high-resolution urban air pollution maps using mobile sensor nodes. Pervasive Mob. Comput. 16, 268–285.
Hatzopoulou, M., Valois, M.F., Levy, I., Mihele, C., Lu, G., Bagg, S., ... Brook, J., 2017. Robustness of land-use regression models developed from mobile air pollutant measurements. Environ. Sci. Technol. 51 (7), 3938–3947.
Heo, J.B., Hopke, P.K., Yi, S.M., 2008. Source apportionment of PM2.5 in Seoul, Korea. Atmos. Chem. Phys. Discuss. 8, 20427–2046.
Hu, X., Belle, J.H., Meng, X., Wildani, A., Waller, L.A., Strickland, M.J., 2017. Estimating PM2.5 concentrations in the conterminous United States using the random forest approach.
Jerrett, M., et al., 2017. Validating novel air pollution sensors to improve exposure estimates for epidemiological analyses and citizen science. Environ. Res. 158, 286–294.
Jiao, W., et al., 2016. Community Air Sensor Network (CAIRSENSE) project: evaluation of low-cost sensor performance in a suburban environment in the southeastern United States. Atmos. Meas. Tech. Discuss. 1–24.
Kelly, K.E., et al., 2017. Ambient and laboratory evaluation of a low-cost particulate matter sensor. Environ. Pollut. 221 (Feb.), 491–500. https://doi.org/10.1016/j.envpol.2016.12.039.
Kumar, P., Morawska, L., Martani, C., Biskos, G., Neophytou, M., Di Sabatino, S., Bell, M., Norford, L., Britter, R., 2015. The rise of low-cost sensing for managing air pollution in cities. Environ. Int. 75, 199–205.
Larson, T., Henderson, S.B., Brauer, M., 2009. Mobile Monitoring of Particle Light Absorption Coefficient in an Urban Area as a Basis for Land Use Regression. Environ. Sci. Technol. 43 (13), 4672–4678.
Levy Zamora, M., et al., 2019. Field and laboratory evaluations of the low-cost Plantower particulate matter sensor. Environ. Sci. Technol. 53 (2), 838–849. American Chemical Society.
Lin, Y., Chiang, Y.-Y., Pan, F., Stripelis, D., Ambite, J.L., Eckel, S.P., Habre, R., 2017. Mining public datasets for modeling intra-city PM2.5 concentrations at a fine spatial resolution. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, Los Angeles area, CA, pp. 1–10.
Maciejczyk, P., Zhong, M., Li, Q., Xiong, J., Nadziejko, C., Chen, L.C., 2005. Effects of subchronic exposures to concentrated ambient particles (CAPs) in mice: II. The design of a CAPs exposure system for biometric telemetry monitoring. Inhal. Toxicol. 17 (4–5), 189–197.
McKercher, G.R., Salmond, J.A., Vanos, J.K., 2017. Characteristics and applications of small, portable gaseous air pollution monitors. Environ. Pollut. https://doi.org/10.1016/j.envpol.2016.12.045.
Minet, L., Gehr, R., Hatzopoulou, M., 2017. Capturing the sensitivity of land-use regression models to short-term mobile monitoring campaigns using air pollution micro-sensors. Environ. Pollut. 230, 280–290.
Morawska, L., et al., 2018. Applications of low-cost sensing technologies for air quality monitoring and exposure assessment: how far have they gone? Environ. Int. 116, 286–299.
Mukherjee, A., Stanton, L., Graham, A., Roberts, P., 2017. Assessing the utility of low-cost particulate matter sensors over a 12-week period in the Cuyama Valley of California. Sensors 17 (8), 1805.
Ryou, H., Heo, J., Kim, S.Y., 2018. Source apportionment of PM10 and PM2.5 air pollution, and possible impacts of study characteristics in South Korea. Environ. Pollut. 240, 963–972.
Schneider, P., Castell, N., Vogt, M., Dauge, F.R., Lahoz, W.A., Bartonova, A., 2017. Mapping urban air quality in near real-time using observations from low-cost sensors and model information. Environ. Int. 106 (December 2016), 234–247.
Shi, Y., Lau, K.K.L., Ng, E., 2016. Developing street-level PM2.5 and PM10 land use regression models in high-density Hong Kong with urban morphological factors. Environ. Sci. Technol. 50 (15), 8178–8187.
Sousan, S., Koehler, K., Hallett, L., Peters, T.M., 2017. Evaluation of consumer monitors to measure particulate matter. J. Aerosol Sci. 107, 123–133.
South Coast AQMD, 2017. AirBeam summary report [online]. Available at: http://www.aqmd.gov/aq-spec/sensordetail/airbeam, accessed June 2019.
Su, J.G., Hopke, P.K., Tian, Y., Baldwin, N., Thurston, S.W., Evans, K., Rich, D.Q., 2015. Modeling particulate matter concentrations measured through mobile monitoring in a deletion/substitution/addition approach. Atmos. Environ. 122, 477–483.
Takingspace.org, 2014. AirBeam Technical Specifications, Operation & Performance: Taking Space. [online] Available at: http://www.takingspace.org/airbeam-technical-specifications-operation-performance/, accessed June 2019.
Tessum, M.W., et al., 2018. Mobile and fixed-site measurements to identify spatial distributions of traffic-related pollution sources in Los Angeles. Environ. Sci. Technol. https://doi.org/10.1021/acs.est.7b04889.
Van den Bossche, J., Peters, J., Verwaeren, J., Botteldooren, D., Theunis, J., De Baets, B., 2015. Mobile monitoring for mapping spatial variation in urban air quality: development and validation of a methodology based on an extensive dataset. Atmos. Environ. 105, 148–161.
Van den Bossche, J., Theunis, J., Elen, B., Peters, J., Botteldooren, D., De Baets, B., 2016. Opportunistic mobile air pollution monitoring: a case study with city wardens in Antwerp. Atmos. Environ. 141, 408–421.
Van den Bossche, J., De Baets, B., Verwaeren, J., Botteldooren, D., Theunis, J., 2018. Development and evaluation of land use regression models for black carbon based on bicycle and pedestrian measurements in the urban environment. Environ. Model. Softw. 99, 58–69.
VoPham, T., Hart, J.E., Laden, F., Chiang, Y.-Y., 2018. Emerging trends in geospatial artificial intelligence (geoAI): potential applications for environmental epidemiology. Environmental Health: A Global Access Science Source 17 (1), 1–6.
Weichenthal, S., Van Ryswyk, K., Goldstein, A., Bagg, S., Shekkarizfard, M., Hatzopoulou, M., 2016. A land use regression model for ambient ultrafine particles in Montreal, Canada: a comparison of linear regression and a machine learning approach. Environ. Res. 146, 65–72.
World Health Organization, May 2, 2018. 9 out of 10 People Worldwide Breathe Polluted Air, but More Countries Are Taking Action. Available at: https://www.who.int/news-room/detail/02-05-2018-9-out-of-10-people-worldwide-breathe-polluted-air-but-more-countries-are-taking-action, accessed 16 September 2018.
Yang, P., Yang, Y.H., Zhou, B.B., Zomaya, A.Y., 2010. A review of ensemble methods in bioinformatics. Curr. Bioinforma. 5, 296–308.
Zikova, N., et al., 2017. Estimating hourly concentrations of PM2.5 across a metropolitan area using low-cost particle monitors. Sensors 17, 1922.
C.C. Lim, et al. Environment International 131 (2019) 105022
10
... For conducting the relationship between air pollution, human activities, and health at the urban scale in China, previous studies have shown that the fixed air pollution monitoring stations are spatially uneven with high construction operation and maintenance costs (Li et al., 2021;Zhang et al., 2019a), only represent air pollution in a small surrounding area and cannot meet the requirement of high spatial-temporal resolution studies to reflect the real exposure of residents (Dias and Tchepel, 2018). With the development of production and technology, low-cost mobile monitoring devices have been proven to improve the spatial-temporal resolution of air pollution (Lim et al., 2019). Meanwhile, the study indicates that there are spatial differences in the distribution of air pollution on weekdays and weekends (Requia et al., 2018;Wong et al., 2009). ...
... Exposure to PM may cause acute or chronic effects on the human body and cause adverse effects on economic development (Ma et al., 2011). PM monitoring based on high spatial and temporal resolution is helpful to understand the complex changes caused by the complex landscape structure in cities (Lim et al., 2019). This study indicated that as the PM diameter decreases, model accuracy increases on weekdays and decreases on weekends. ...
... Regardless of a strictly laboratory calibration approach or a combined field and laboratory calibration approach, linear regression is most commonly used to create the calibration model despite known non-linear relationships between PM 2.5 and meteorological variables [13,15]. While non-linear models have been developed for sensor calibration, these models still only produce point predictions without estimates of variance [17][18][19][20]. However, given that EPA and NIOSH both recommend probabilistic exposure and risk assessments, more accurate assessments are possible if point and/or uniform variance predictions are replaced with predictions from models that also describe variance, particularly on a per-prediction level [21][22][23]. ...
... Large-scale approaches often utilize satellite data, country scale sensor networks, land use data, topography, etc. and have been built using random forests, GBDTs, and neural nets [31][32][33][34]. On a smaller scale more analogous to SEARCH, personal monitoring device networks, mobile sampling networks, and cityscale sensor networks have also demonstrated the utility of machine learning regression techniques to optimize predictions and take into account environmental factors [17][18][19][20]. However, while prediction of PM 2.5 using sensor measurements and additional data has been conducted by numerous studies, this study fills a unique position by providing a methodology for both increasing the utility of low-cost sensor networks by creating a probabilistic output useful for exposure assessments, a state-ofthe-art model that improves on existing approaches, and also removes the need for lab-calibrated data, a time intensive process for mitigating environmental biases for PM 2.5 data. ...
Article
Full-text available
Background Low-cost sensor networks for monitoring air pollution are an effective tool for expanding spatial resolution beyond the capabilities of existing state and federal reference monitoring stations. However, low-cost sensor data commonly exhibit non-linear biases with respect to environmental conditions that cannot be captured by linear models, therefore requiring extensive lab calibration. Further, these calibration models traditionally produce point estimates or uniform variance predictions which limits their downstream in exposure assessment. Objective Build direct field-calibration models using probabilistic gradient boosted decision trees (GBDT) that eliminate the need for resource-intensive lab calibration and that can be used to conduct probabilistic exposure assessments on the neighborhood level. Methods Using data from Plantower A003 particulate matter (PM) sensors deployed in Baltimore, MD from November 2018 through November 2019, a fully probabilistic NGBoost GBDT was trained on raw data from sensors co-located with a federal reference monitoring station and compared against linear regression trained on lab calibrated sensor data. The NGBoost predictions were then used in a Monte Carlo interpolation process to generate high spatial resolution probabilistic exposure gradients across Baltimore. Results We demonstrate that direct field-calibration of the raw PM2.5 sensor data using a probabilistic GBDT has improved point and distribution accuracies compared to the linear model, particularly at reference measurements exceeding 25 μg/m³, and also on monitors not included in the training set. Significance We provide a framework for utilizing the GBDT to conduct probabilistic spatial assessments of human exposure with inverse distance weighting that predicts the probability of a given location exceeding an exposure threshold and provides percentiles of exposure. 
These probabilistic spatial exposure assessments can be scaled by time and space with minimal modifications. Here, we used the probabilistic exposure assessment methodology to create high quality spatial-temporal PM2.5 maps on the neighborhood-scale in Baltimore, MD. Impact statement We demonstrate how the use of open-source probabilistic machine learning models for in-place sensor calibration outperforms traditional linear models and does not require an initial laboratory calibration step. Further, these probabilistic models can create uniquely probabilistic spatial exposure assessments following a Monte Carlo interpolation process. Graphical abstract
... Without optimal sensing granularity that results in considerable spatial data gaps, it is impossible to accurately map the urban air quality-a challenge that can be overcome by combining the crowdsourced measurements with model data with comprehensive spatial coverage . Other deployment methods include IoT sensor installation on mobile sensing platforms such as bicycles, cars, buses, and trams that can improve spatial coverage (Devarakonda et al., 2013;Mead et al., 2013;Castell et al., 2015;Hasenfratz et al., 2015;Lim et al., 2019). While economically more appealing due to the significantly reduced number of sensors, the mobility of the platform in conjunction with a prolonged response time of typical sensors can cause large signal distortion-an issue that can be overcome through the application of active sampling that employs pumps/fans as actuators (Arfire et al., 2016). ...
Article
Full-text available
Cities today encounter significant challenges pertaining to urbanization and population growth, resource availability, and climate change. Concurrently, unparalleled datasets are generated through Internet of Things (IoT) sensing implemented at urban, building, and personal scales that serve as a potential tool for understanding and overcoming these issues. Focusing on air pollution and thermal exposure challenges in cities, we reviewed and summarized the literature on IoT environmental sensing on urban, building, and human scales, presenting the first integrated assessment of IoT solutions from the data convergence perspective on all three scales. We identified that there is a lack of guidance on what to measure, where to measure, how frequently to measure, and standards for the acceptable measurement quality on all scales of application. The current literature review identified a significant disconnect between applications on each scale. Currently, the research primarily considers urban, building, and personal scale in isolation, leading to significant data underutilization. We addressed the scientific and technological challenges and opportunities related to data convergence across scales and detailed future directions of IoT sensing along with short- and long-term research and engineering needs. IoT application on a personal scale and integration of information on all scales opens up the possibility of developing personal thermal comfort and exposure models. The development of personal models is a vital promising area that offers significant advancements in understanding the relationship between environment and people that requires significant further research.
... Previous studies have reported implementing smart sensors in mobile air pollution monitoring frameworks. A vehicular wireless sensor network architecture was implemented at the National Chiao-Tung University in Taiwan (Hu et al. 2009), and researchers in Seoul, South Korea, mapped urban air quality using mobile sampling with low-cost sensors and machine learning (Lim et al. 2019). Public buses in the city of Sharjah, United Arab Emirates, were also used to test an air pollution sensing network (Al-Ali et al. 2010), while in New Jersey and New York, United States, a fine-grained, vehicle-based mobile measurement technique using solid-state carbon monoxide (CO) sensors and optical PM analysers was used to measure on-road pollution (Devarakonda et al. 2013). ...
Article
Motor vehicle emissions are the primary source of air pollution in cities worldwide, and changes in a city's traffic flow can drastically change its overall air pollution levels. Air pollution may vary significantly from one street segment to another, and a small number of stationary ambient air pollution monitors may not capture this variation. This study aimed to evaluate air pollution before and during a new traffic plan introduced in March 2019 in the city of Kandy, Sri Lanka, using smart sensor technology. Street-level air pollution data (PM2.5 and NO2) were acquired using a mobile air quality sensor unit before and during the implementation of the new traffic plan. The sensor unit was mounted on a police traffic motorcycle that travelled through the city four times per day. Air pollution in selected road segments was compared before and during the new traffic plan, and trends at different times of day were compared using data from a stationary smart sensor. Both PM2.5 and NO2 levels were well above the World Health Organization (WHO) 24-hour guidelines throughout the monitoring period, regardless of the traffic plan. Most road segments had higher air pollution levels during the new traffic plan than before it. For every time of day (morning, midday, afternoon, evening), day of the week, and period (before or during the new traffic plan), the highest PM2.5 and NO2 concentrations were observed on the road segment from Girls High School to Kandy Railway Station. The mobile monitoring data showed that the mean PM2.5 concentration during the new traffic plan (116.7 µg m⁻³) was significantly higher than before it (92.3 µg m⁻³) (p < 0.007). Increasing spatial coverage can provide much better information on human exposure to air pollutants, which is essential for controlling traffic-related air pollution. Before implementing a new traffic plan, careful planning and improvement of road network infrastructure could reduce air pollution in urban areas.
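The abstract reports a significance test for the before/during difference in mean PM2.5 but does not say which test was used. As a minimal sketch, assuming two independent samples of readings, Welch's t statistic (which does not assume equal variances) and its Welch-Satterthwaite degrees of freedom can be computed with the Python standard library. The function name `welch_t` and the sample values are hypothetical; turning t into a p-value would additionally require a t-distribution CDF (for example from SciPy).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) and the
    Welch-Satterthwaite degrees of freedom (unequal variances allowed)."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (variance() divides by n - 1).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical PM2.5 readings (ug/m3) on one road segment.
during = [121.0, 109.5, 118.2, 125.4, 110.3]
before = [95.1, 88.7, 90.2, 97.4, 91.8]
t, df = welch_t(during, before)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Consecutive readings from a moving platform are usually autocorrelated, so treating them as independent samples is a strong assumption; averaging per trip before testing is one way to weaken it.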
... Studies employing Big Data analytics and machine learning methods for air quality prediction and management also abound. For instance, random forest and stacked ensemble models were used in a study that mapped urban air quality using mobile sampling in Seoul, South Korea (Lim et al., 2019). As Fig. 5.1 shows, some sectors, such as transport, have received more attention than others. ...
Chapter
Efforts aimed at addressing climate change have gained significant momentum in the past few years, particularly following the Paris Climate Agreement. Recent data show that global temperatures are already about 1.1°C above pre-industrial levels and that the window to limit global warming to 1.5°C or 2°C is rapidly closing. Cities account for over 70% of global CO2 emissions, indicating their significance for achieving climate stabilization targets. Recognizing this, many cities around the world are developing plans and strategies to contribute to climate change mitigation. Meanwhile, smart technologies are rapidly becoming ubiquitous in many cities, and governments and local authorities have used this as an opportunity to develop and implement smart city programs. While the impacts of smart city programs on urban CO2 emissions have not yet been fully examined, it is hoped that they can contribute to meeting climate change mitigation targets. Through text mining of bibliometric data archived in the Web of Science, this chapter provides an overview of existing research at the intersection of climate change mitigation and smart city solutions and technologies. It discusses the actual and potential contributions of smart city solutions, across various urban systems, to climate change mitigation. Based on the outputs of a bibliometric analysis (term co-occurrence) obtained with the VOSviewer software tool, issues related to urban planning, buildings, transportation, waste management, energy and water resources, the economy, urban infrastructure, and urban governance are discussed. It is argued that smart solutions and technologies have high potential to contribute to climate change mitigation and can also provide co-benefits for climate change adaptation and sustainable development. However, appropriate planning and regulatory measures are needed to avoid potential trade-offs and rebound effects, which are also discussed. Further, the need for integrated systems that accommodate different urban sectors is highlighted.
Article
China implemented strict lockdown policies to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study investigates the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimated the lockdown effects on AQI as -25.88 in Wuhan and -20.47 in Shanghai. The corresponding prediction errors were reduced by about 47% for Wuhan and 67% for Shanghai, enabling much more reliable AQI forecasts for both cities.
Preprint
As the changing climate expands the extent of arid and semi-arid lands, the number and severity of dust events, and their associated health effects, are likely to increase. However, regulatory measurements capable of capturing dust (PM10, particulate matter smaller than 10 µm in diameter) are sparse, even sparser than measurements of PM2.5 (PM smaller than 2.5 µm in diameter). Although low-cost sensors could supplement regulatory monitors, as numerous studies have shown for PM2.5 concentrations, most of these sensors do not measure PM10 effectively despite manufacturers' claims. This study focuses on the Salt Lake Valley, adjacent to the Great Salt Lake, which recently reached historic lows, exposing 1865 km² of dry lakebed. It evaluated the field performance of the Plantower PMS 5003, a common low-cost PM sensor, and the Alphasense OPC-N3, a promising candidate for low-cost PM10 measurement, against a federal equivalent method (FEM, beta attenuation) and research measurements (GRIMM aerosol spectrometer) at three locations. During a month-long field study that included five dust events in the Salt Lake Valley, with PM10 concentrations reaching 311 µg/m³, the OPC-N3 correlated strongly with FEM PM10 measurements (R² = 0.865, RMSE = 12.4 µg/m³) and GRIMM measurements (R² = 0.937, RMSE = 17.7 µg/m³). The PMS sensor showed poor to moderate correlations (R² < 0.49, RMSE = 33–45 µg/m³) with reference/research monitors and severely underestimated PM10 concentrations (slope < 0.099). We also evaluated a PM-ratio-based correction method to improve PM10 estimates from PMS sensors. After this correction, PMS PM10 concentrations correlated reasonably well with FEM measurements (R² > 0.63) and GRIMM measurements (R² > 0.76), and the RMSE decreased to 15–25 µg/m³. Our results suggest that better-resolved spatial estimates of PM10 concentration may be obtainable by combining PMS sensors (often publicly available in communities) with measurements of PM2.5 and PM10, such as those provided by FEMs, research-grade instrumentation, or the OPC-N3.
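The abstract does not spell out the PM-ratio-based correction. One plausible reading, assumed here purely for illustration, is to scale each PMS PM2.5 reading by the concurrent PM10/PM2.5 ratio measured by a reference instrument, then score the result against the reference PM10 with R² (computed here as the squared Pearson correlation, one common convention) and RMSE. The function names `ratio_correct` and `r2_rmse` are hypothetical.

```python
import math
from statistics import mean


def ratio_correct(sensor_pm25, ref_pm25, ref_pm10):
    """Hypothetical PM-ratio correction: estimate PM10 by scaling each sensor
    PM2.5 reading by the concurrent reference PM10/PM2.5 ratio."""
    return [s * (r10 / r25) for s, r25, r10 in zip(sensor_pm25, ref_pm25, ref_pm10)]


def r2_rmse(estimate, reference):
    """Squared Pearson correlation (R^2) and root-mean-square error."""
    me, mr = mean(estimate), mean(reference)
    cov = sum((e - me) * (r - mr) for e, r in zip(estimate, reference))
    var_e = sum((e - me) ** 2 for e in estimate)
    var_r = sum((r - mr) ** 2 for r in reference)
    r2 = cov ** 2 / (var_e * var_r)
    rmse = math.sqrt(mean((e - r) ** 2 for e, r in zip(estimate, reference)))
    return r2, rmse


# Hypothetical hourly values (ug/m3) during a dust event.
pms_pm25 = [12.0, 18.0, 25.0, 30.0]
fem_pm25 = [10.0, 15.0, 22.0, 28.0]
fem_pm10 = [40.0, 80.0, 150.0, 210.0]
pms_pm10_est = ratio_correct(pms_pm25, fem_pm25, fem_pm10)
print(r2_rmse(pms_pm10_est, fem_pm10))
```

Using the same reference ratios to correct and then to score, as in this toy example, overstates performance; in practice the ratio would come from a different instrument or period than the one used for evaluation.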
Article
The deployment of a mobile air quality monitoring laboratory requires advanced real-time instrument monitoring and data management software, which can be prohibitively expensive. In this work we present the PLUME Dashboard: a software package built in Python and designed specifically for mobile air quality monitoring. It aims to provide a free and open-source alternative to comparable commercial packages, thus lowering the barrier to entry for conducting such research. This paper outlines the development of the PLUME Dashboard, justifies the design choices that were made, and provides thorough documentation and explanation of how the software works. Functionality includes real-time data display, real-time peak identification, baseline subtraction, real-time air quality and self-sampling alerts (based on wind direction and vehicle speed), and post-processing tools such as peak identification and map merging with GPS data. The functionality of the PLUME Dashboard was tested using real-world data collected in Toronto and Vancouver, Canada.
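The abstract does not describe the PLUME Dashboard's own algorithms. As a generic illustration of baseline subtraction followed by peak identification, the sketch below assumes a rolling-minimum baseline and flags contiguous runs where the baseline-subtracted signal exceeds a fixed threshold. The function names and the `window` and `threshold` parameters are hypothetical, not PLUME settings.

```python
def rolling_min_baseline(signal, window):
    """Baseline as the minimum over a centred window of `window` samples."""
    half = window // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def find_peaks(signal, window, threshold):
    """Return (start, end) index pairs, end exclusive, for contiguous runs where
    the baseline-subtracted signal exceeds `threshold`."""
    baseline = rolling_min_baseline(signal, window)
    peaks, start = [], None
    for i, (s, b) in enumerate(zip(signal, baseline)):
        above = (s - b) > threshold
        if above and start is None:
            start = i  # a plume begins
        elif not above and start is not None:
            peaks.append((start, i))  # the plume has ended
            start = None
    if start is not None:  # plume still running at the end of the record
        peaks.append((start, len(signal)))
    return peaks


# Hypothetical 1 Hz PM2.5 trace (ug/m3) with one short vehicle plume.
trace = [5, 6, 5, 20, 25, 6, 5, 5, 6]
print(find_peaks(trace, window=5, threshold=5))  # prints [(3, 5)]
```

The window must be wider than the longest expected plume; otherwise the rolling minimum rises into the plume itself and the peak is partly subtracted away.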
Article
The Chengyu Metropolitan Area (CYMA), located in the Sichuan Basin, is an unevenly developed region with high PM2.5 concentrations and a population of approximately 100 million. Although exposure inequality in air pollution has received increasing attention, no related research has yet been carried out in the CYMA. In this work, we used the concentration index to assess inequality of population-weighted PM2.5 exposure in the CYMA among subgroups defined by age, education, gender, occupation, and GDP per capita of the city of residence. Our findings revealed that the non-disadvantaged subgroups (people aged 15–64, people with senior and higher education, people with high-income occupations, and residents of cities with high GDP per capita) had higher PM2.5 exposure in the CYMA, with concentration indices of −0.03 (95% CI: −0.064, −0.001), −0.14 (95% CI: −0.221, −0.059), −0.15 (95% CI: −0.238, −0.056), and −0.27 (95% CI: −0.556, 0.012), a pattern opposite to that reported in previous studies in developed countries such as the United States and France. In addition, exposure differences among cities were much larger than those among population subgroups in the CYMA. These findings may help governments identify disproportionately exposed subgroups in developing regions, and suggest that related measures should initially target cities exposed to high PM2.5 concentrations rather than populations exposed to high PM2.5 concentrations.
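For readers unfamiliar with the concentration index, it can be computed as C = (2/μ) Σ w_i (h_i − μ)(R_i − 1/2), where h_i is the exposure of unit i, w_i its population share, μ the population-weighted mean exposure, and R_i its weighted fractional rank when units are ordered by a socioeconomic variable. The sketch below assumes that standard formulation. The abstract does not state the study's ranking direction, and the sign of C flips when the ordering is reversed, so this is not a reproduction of the reported values; the function name and inputs are hypothetical.

```python
def concentration_index(exposure, rank_var, weights=None):
    """Concentration index C = (2 / mu) * sum_i w_i * (h_i - mu) * (R_i - 0.5).

    Units are ordered by rank_var (ascending); w_i is each unit's population
    share and R_i its weighted fractional rank (cumulative share up to the
    midpoint of the unit). C > 0 means exposure is concentrated among
    higher-ranked units, C < 0 among lower-ranked units.
    """
    n = len(exposure)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    mu = sum(w * h for w, h in zip(weights, exposure)) / total
    order = sorted(range(n), key=lambda i: rank_var[i])
    acc, cum_share = 0.0, 0.0
    for i in order:
        share = weights[i] / total
        frac_rank = cum_share + share / 2
        acc += share * (exposure[i] - mu) * (frac_rank - 0.5)
        cum_share += share
    return 2.0 * acc / mu


# Hypothetical groups ordered by a socioeconomic score (1 = least advantaged).
exposure = [48.0, 52.0, 55.0, 61.0]   # PM2.5, ug/m3
ses_rank = [1, 2, 3, 4]
population = [2.0, 3.0, 3.0, 2.0]     # millions
print(round(concentration_index(exposure, ses_rank, population), 3))
```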
Article
Ambient air quality varies greatly in space and time and is known to affect human life and health. Obtaining meaningful estimates of human exposure to air pollutants is key to mitigating its ill effects, and ascertaining individual exposure requires pollutant information at high resolution. Current air quality monitoring relies on static monitoring stations that are sparsely distributed across a city, which results in poor spatial resolution and allows only a gross assessment of air quality across the city. This work proposes a comprehensive framework for hyperlocal assessment of ambient air quality through mobile monitoring, using PM2.5 concentration as the example pollutant. To test the efficacy of the approach, two case studies were undertaken in the city of Chennai, India. The first locates diurnal spatio-temporal PM2.5 hotspots in an urban environment and associates them with daily anthropogenic activity. The second enables an event-centric understanding of short-lived spatio-temporal extremes in PM2.5 concentration. The results are then retrospectively associated with land use profiles and anthropogenic activities. The mobile sensing paradigm provides a unique perspective for assessing such localized events, and this micro-level assessment offers significant insights into spatio-temporal variations within a city in both a regular case and an extreme, event-centric case. Real-world applications of hyperlocal air quality assessment, such as least-exposure route estimation, are also explored at the end of this work.
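Least-exposure route estimation, mentioned at the end of the abstract, can be framed as a shortest-path problem. A minimal sketch, assuming each street segment's cost is its mean PM2.5 multiplied by its traversal time (a simple inhaled-dose proxy, not necessarily the study's definition), is Dijkstra's algorithm over that weighted graph. The graph, node names, and function name are hypothetical.

```python
import heapq


def least_exposure_route(graph, source, target):
    """Dijkstra's algorithm on exposure-weighted edges.

    graph: {node: [(neighbor, cost), ...]} with non-negative costs.
    Returns (total_cost, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical street graph: edge cost = segment PM2.5 (ug/m3) x travel time (min).
graph = {
    "home": [("main_rd", 35 * 4), ("park_path", 15 * 6)],
    "main_rd": [("office", 40 * 3)],
    "park_path": [("office", 20 * 5)],
}
print(least_exposure_route(graph, "home", "office"))
# prints (190.0, ['home', 'park_path', 'office'])
```

The longer park route wins here because its lower concentrations outweigh the extra travel time, which is the kind of trade-off hyperlocal maps make visible.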
Article
Awareness of the human health impacts of exposure to air pollution is growing rapidly. For example, it has become evident that the adverse health effects of air pollution are more pronounced in disadvantaged populations. Policymakers in many jurisdictions have responded to this evidence by enacting initiatives that lead to lower concentrations of air pollutants, such as urban traffic restrictions. In this review, we focus on the interplay between advances in environmental exposure assessment and developments in policy. We highlight recent progress in the granular measurement of air pollutants and individual-level exposures, and how this has enabled focused local policy actions. Finally, we detail an illustrative study designed to link individual-level health-relevant exposures with economic, behavioral, biological, familial, and environmental variables.
Article
Air pollution sensors are quickly proliferating for use in a wide variety of applications, with a low price point that supports use in high-density networks, citizen science, and individual consumer use. This emerging technology motivates the assessment under real-world conditions, including varying pollution levels and environmental conditions. A seven-month, systematic field evaluation of low-cost air pollution sensors was performed in Denver, Colorado, over 2015–2016; the location was chosen to evaluate the sensors in a high-altitude, cool, and dry climate. A suite of particulate matter (PM), ozone (O3), and nitrogen dioxide (NO2) sensors were deployed in triplicate and were collocated with federal equivalent method (FEM) monitors at an urban regulatory site. Sensors were evaluated for their data completeness, correlation with reference monitors, and ability to reproduce trends in pollution data, such as daily concentration values and wind-direction patterns. Most sensors showed high data completeness when data loggers were functioning properly. The sensors displayed a range of correlations with reference instruments, from poor to very high (e.g., hourly-average PM Pearson correlations with reference measurements varied from 0.01 to 0.86). Some sensors showed a change in response to laboratory audits/testing from before the sampling campaign to afterwards, such as Aeroqual, where the O3 response slope changed from about 1.2 to 0.6. Some PM sensors measured wind-direction and time-of-day trends similar to those measured by reference monitors, while others did not. This study showed different results for sensor performance than previous studies performed by the U.S. EPA and others, which could be due to different geographic location, meteorology, and aerosol properties. These results imply that continued field testing is necessary to understand emerging air sensing technology.
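The hourly-average correlations quoted above imply two steps: aggregating each instrument's readings into clock-hour means, then correlating sensor and reference means over the hours present in both records. A minimal standard-library sketch of that workflow follows; the (timestamp, value) record format, the sample data, and the function names are assumptions, not the study's code.

```python
import math
from collections import defaultdict
from datetime import datetime


def hourly_means(records):
    """Average (timestamp, value) records into clock-hour bins."""
    bins = defaultdict(list)
    for ts, value in records:
        bins[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in bins.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hourly_correlation(sensor, reference):
    """Correlate hourly means, keeping only hours covered by both instruments."""
    s, r = hourly_means(sensor), hourly_means(reference)
    common = sorted(set(s) & set(r))
    return pearson_r([s[h] for h in common], [r[h] for h in common])


# Hypothetical PM2.5 records (ug/m3) from a sensor and a collocated FEM monitor.
sensor = [(datetime(2016, 1, 5, 10, 5), 10.0), (datetime(2016, 1, 5, 10, 35), 12.0),
          (datetime(2016, 1, 5, 11, 10), 20.0), (datetime(2016, 1, 5, 12, 0), 31.0)]
reference = [(datetime(2016, 1, 5, 10, 0), 22.0), (datetime(2016, 1, 5, 11, 0), 40.0),
             (datetime(2016, 1, 5, 12, 30), 62.0)]
print(hourly_correlation(sensor, reference))
```

Restricting the comparison to hours covered by both instruments keeps data-logger gaps from biasing the correlation, which matters given that completeness depended on the loggers functioning properly.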
Article
Geospatial artificial intelligence (geoAI) is an emerging scientific discipline that combines innovations in spatial science, artificial intelligence methods in machine learning (e.g., deep learning), data mining, and high-performance computing to extract knowledge from spatial big data. In environmental epidemiology, exposure modeling is a commonly used approach to exposure assessment that determines the distribution of exposures in study populations. geoAI technologies provide important advantages for exposure modeling in environmental epidemiology, including the ability to incorporate large volumes of spatial and temporal big data in a variety of formats; computational efficiency; flexibility in algorithms and workflows to accommodate relevant characteristics of spatial (environmental) processes, including spatial nonstationarity; and scalability to model other environmental exposures across different geographic areas. The objectives of this commentary are to provide an overview of key concepts in the evolving, interdisciplinary field of geoAI, including spatial data science, machine learning, deep learning, and data mining; recent geoAI applications in research; and potential future directions for geoAI in environmental epidemiology.
Article
Because low-cost air quality sensors have developed rapidly, many available sensors have not been rigorously evaluated. We evaluated three Plantower PMS A003 sensors exposed to eight particulate matter (PM) sources: incense, oleic acid, NaCl, talcum powder, cooking emissions, and monodisperse polystyrene latex spheres under controlled laboratory conditions, as well as residential air and ambient outdoor air in Baltimore, MD. The PM2.5 sensors exhibited a high degree of precision and R² values greater than 0.86 for all sources, but their accuracy relative to reference instruments ranged from 13% to >90%. The sensors were most accurate for PM with diameters below 1 µm and measured PM in the 2.5–5 µm range poorly. Their accuracy depended on relative humidity (RH), decreasing at RH > 50%. The sensors produced meaningful data at low and high temperatures and while in motion, as would be the case in outdoor or personal monitoring applications. They were most accurate in environments with polydisperse particle sources and may not be useful in specialized environments, in experiments with narrow PM size distributions, or for aerosols with a large proportion of coarse PM.
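The abstract reports that accuracy degrades above 50% RH but does not describe a correction. One approach used in the low-cost sensor literature, swapped in here for illustration and not taken from this study, divides the raw optical reading by a single-parameter kappa-Köhler hygroscopic growth factor. The kappa value of 0.4 and the function name are placeholders; kappa would normally be fitted against a collocated reference.

```python
def humidity_corrected_pm(pm_raw, rh_percent, kappa=0.4):
    """Remove hygroscopic growth from an optical PM reading.

    Uses a single-parameter kappa-Koehler growth factor
        C = 1 + kappa * a_w / (1 - a_w),  with water activity a_w ~ RH / 100,
    and returns pm_raw / C. kappa = 0.4 is a placeholder; in practice it is
    fitted against a collocated reference instrument.
    """
    if not 0.0 <= rh_percent < 100.0:
        raise ValueError("relative humidity must be in [0, 100)")
    a_w = rh_percent / 100.0
    growth = 1.0 + kappa * a_w / (1.0 - a_w)
    return pm_raw / growth


# The same hypothetical raw reading (ug/m3) corrected at increasing humidity.
for rh in (30, 50, 70, 90):
    print(rh, round(humidity_corrected_pm(40.0, rh), 1))
```

The growth factor diverges as RH approaches 100%, which is why the function rejects that value and why corrections of this kind are least reliable in fog or near-saturated air.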
Article
Addressing the worsening urban air quality situation in Sub-Saharan Africa (SSA) is proving increasingly difficult owing to the paucity of data on air pollution levels and the lack of local evidence on the magnitude of the associated health effects. There is therefore an urgent need to expand air quality monitoring (AQM) networks in SSA to enable high-quality epidemiologic studies that can inform policies aimed at addressing air pollution and its health effects. In this commentary, I explore the prospects that the recent proliferation of low-cost sensors holds for air pollution epidemiologic research in SSA. This commentary is timely because most SSA governments do not consider investment in air pollution control, which requires assembling a network of sophisticated and prohibitively expensive AQM instrumentation, necessary for improving and protecting public health. I conclude that, in a region bereft of air pollution data, the growing influx of low-cost sensors represents an excellent opportunity to bridge the data gap and inform air pollution control policies and regulations for public health protection. However, it is essential that only the most promising sensor technologies, those that perform creditably in the harsh environmental conditions of the region, are promoted.
Article
Introduction: Source apportionment (SA) studies of particulate matter (PM) air pollution have enhanced understanding of the dominant pollution sources and quantified their contributions. Although many SA studies have been conducted in South Korea over the last two decades, few have provided an integrated, nationwide understanding of PM sources. The aim of this study was to summarize the findings of PM SA studies in South Korea and to explore their characteristics. Methods: We selected studies that estimated sources of PM10 and PM2.5 in South Korea for 2000–2017 using Positive Matrix Factorization and Chemical Mass Balance. We reclassified the original PM sources identified in each study into seven categories: motor vehicle, secondary aerosol, soil dust, biomass/field burning, combustion/industry, natural source, and others. These seven source categories were summarized by frequency and contribution across four regions (northwest, west, southeast, and southwest), separately for PM10 and PM2.5. We also computed the population-weighted mean contribution of each source category. In addition, we compared study features including sampling design, sampling and lab analysis methods, chemical components, and the inclusion of Asian dust days. Results: Of the 21 selected studies, all six PM10 studies identified motor vehicle, soil dust, and combustion/industry, while all 15 PM2.5 studies identified motor vehicle and soil dust. Although it was not identified in every study, secondary aerosol made a large contribution to both PM10 and PM2.5. Motor vehicles contributed highly to both, whereas the contribution of combustion/industry was high for PM10. The population-weighted mean contribution was highest for the motor vehicle and secondary aerosol sources for both PM10 and PM2.5. However, these results were based on different subsets of chemical speciation data, each collected at a single sampling site (commonly in a metropolitan area), over short and only partly overlapping periods, and measured by different lab analysis methods. Conclusion: We found that motor vehicles and secondary aerosol were the most common and influential PM sources in South Korea. Our study, however, suggests caution when interpreting SA findings obtained from studies with heterogeneous designs and input data.
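A population-weighted mean contribution weights each region's source contribution c_i by its population p_i, giving Σ p_i c_i / Σ p_i. The abstract does not say how regions that did not resolve a given source were handled; the sketch below assumes those regions are simply left out of that source's average. The function name, region values, and populations are hypothetical.

```python
from collections import defaultdict


def population_weighted_contributions(regions):
    """regions maps a region name to (population, {source: contribution_percent}).

    Returns {source: sum(p_i * c_i) / sum(p_i)}, where the sums run over the
    regions that report that source (an assumed convention for missing sources).
    """
    weighted = defaultdict(float)
    reporting_pop = defaultdict(float)
    for population, contributions in regions.values():
        for source, share in contributions.items():
            weighted[source] += population * share
            reporting_pop[source] += population
    return {source: weighted[source] / reporting_pop[source] for source in weighted}


# Hypothetical populations (millions) and source contributions (%).
regions = {
    "northwest": (3.0, {"motor vehicle": 20.0, "secondary aerosol": 40.0}),
    "southeast": (1.0, {"motor vehicle": 40.0}),
}
print(population_weighted_contributions(regions))
```

Treating a missing source as a zero contribution instead would lower its weighted mean; which convention is appropriate depends on whether absence reflects a true zero or an unresolved factor.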
Article
Air quality models are important for studying the impact of air pollutants on health at a fine spatiotemporal scale. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology) to build a model for each combination of study area, pollutant type, and spatiotemporal scale. In this paper, we present a data mining approach that uses publicly available OpenStreetMap (OSM) data to automatically generate an air quality model for concentrations of fine particulate matter (PM2.5, particles less than 2.5 µm in aerodynamic diameter) at various temporal scales. Our experiment shows that our (domain-)expert-free model can generate accurate PM2.5 concentration predictions, which can be used to improve air quality models that traditionally rely on expert-selected input. Our approach also quantifies the impact of a variety of geographic features on air quality (i.e., how various types of geographic features, such as parking lots and commercial buildings, affect air quality and from what distance), representing mobile, stationary, and area sources, both natural and anthropogenic. This approach is particularly important for enabling the construction of context-specific spatiotemporal models of air pollution, allowing investigations at scale of the impact of air pollution exposure on sensitive populations such as children with asthma.
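Quantifying "from what distance" a geographic feature matters usually starts from candidate predictors: counts of each feature type within several buffer radii around a monitoring site. The sketch below computes such counts from points already extracted from OSM, using the haversine great-circle distance. It only builds candidate features; it is not the paper's automated selection method, and the tags, coordinates, and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_counts(site, features, radii_m):
    """Count features of each tag within each buffer radius around a site.

    site: (lat, lon); features: iterable of (tag, lat, lon).
    Returns {(tag, radius): count}, including zero counts.
    """
    features = list(features)
    counts = {(tag, r): 0 for tag, _, _ in features for r in radii_m}
    for tag, lat, lon in features:
        d = haversine_m(site[0], site[1], lat, lon)
        for r in radii_m:
            if d <= r:
                counts[(tag, r)] += 1
    return counts


# Hypothetical OSM-derived features as (tag, lat, lon).
features = [("parking", 37.5670, 126.9780), ("parking", 37.5750, 126.9780),
            ("commercial", 37.5680, 126.9790)]
print(buffer_counts((37.5665, 126.9780), features, radii_m=(100, 500, 1000)))
```

Each (tag, radius) pair becomes one column in a site-by-feature matrix, so a downstream model can reveal which feature types matter and at which distance.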
Article
Mobile monitoring and passive sampling device (PSD) monitoring are popular air pollutant measurement techniques with complementary strengths and weaknesses. This study investigates the utility of combining data from concurrent two-week mobile monitoring and PSD campaigns in Los Angeles in summer and early spring to identify sources of traffic-related air pollutants (TRAP) and their spatial distributions. Mobile and PSD measurements of both NO2 and NOx were moderately to strongly correlated in summer and spring (Pearson's r between 0.43 and 0.79), suggesting that the two datasets can be reliably combined for source apportionment. Principal component analysis (PCA) identified the major TRAP sources as light-duty vehicle emissions, diesel exhaust, crankcase vent emissions, and an independent source of combustion-derived ultrafine particle emissions. The component scores of these four sources at each site were significantly correlated across the two seasons (Pearson's r between 0.58 and 0.79). Spatial maps of absolute principal component scores showed all sources to be most prominent near major roadways and the central business district, with the ultrafine particle source also prominent near the airport. Mobile monitoring combined with fixed-site PSD sampling can provide high-spatial-resolution estimates of TRAP and can reveal underlying sources of exposure variability.