
Mapping urban air quality using mobile sampling with low-cost sensors and machine learning in Seoul, South Korea


Abstract

Recent studies have demonstrated that mobile sampling can improve the spatial granularity of land use regression (LUR) models. Mobile sampling campaigns deploying low-cost (<$300) air quality sensors could potentially offer an inexpensive and practical approach to measure and model air pollution concentration levels. In this study, we developed LUR models for street-level fine particulate matter (PM2.5) concentration levels in Seoul, South Korea. A total of 169 h of data were collected during an approximately three-week campaign across five routes by ten volunteers sharing seven AirBeams, a low-cost ($250 per unit), smartphone-based particle counter, while geospatial data were extracted from OpenStreetMap, an open-source and crowd-generated geographical dataset. We applied and compared three statistical approaches in constructing the LUR models - linear regression (LR), random forest (RF), and stacked ensemble (SE) combining multiple machine learning algorithms - which resulted in cross-validation R² values of 0.63, 0.73, and 0.80, respectively, and identification of several pollution 'hotspots.' The high R² values suggest that study designs employing mobile sampling in conjunction with multiple low-cost air quality monitors could be applied to characterize urban street-level air quality with high spatial resolution, and that machine learning models could further improve model performance. Given this study design's cost-effectiveness and ease of implementation, similar approaches may be especially suitable for citizen science and community-based endeavors, or in regions bereft of air quality data and preexisting air monitoring networks, such as developing countries.
Environment International
Chris C. Lim, Ho Kim, M.J. Ruzmyn Vilcassim, George D. Thurston, Terry Gordon, Lung-Chi Chen, Kiyoung Lee, Michael Heimbinder, Sun-Young Kim
Department of Environmental Medicine, New York University School of Medicine, New York, NY, United States of America
Graduate School of Public Health, Seoul National University, Seoul, South Korea
HabitatMap, Brooklyn, NY, United States of America
Graduate School of Cancer Science and Policy, National Cancer Center, Gyeonggi, South Korea
Handling Editor: Xavier Querol
1. Introduction
Ambient air pollution is a major global public health concern, with the World Health Organization estimating that 4.2 million premature deaths annually are attributable to fine particulate matter (PM2.5) exposure (WHO, 2018). Government and regulatory agencies throughout the world have traditionally relied on networks of fixed-site monitors to measure air quality and establish standards. Owing to their prohibitive equipment and operational costs, these monitors tend to be sparsely located even in large metropolitan cities, or may be entirely missing in many locales. However, as concentrations of air pollutants can vary markedly over small distances and short time periods, the urban environment cannot be fully characterized using information from sparse, static networks of air pollution monitors (Kumar et al., 2015). To empirically model and characterize the spatial or spatiotemporal variability of PM2.5 concentrations, land use regression (LUR) models based on data from monitoring networks have been employed. Recently, LUR models based on data collected from mobile sampling designs, where predetermined locations or routes are repeatedly sampled on modes of transport, have gained traction, offering improved spatial resolution at a lower cost (e.g., Hankey and Marshall, 2015; Shi et al., 2016; Deville Cavellin et al., 2016).
Recent technological advancements and proliferation of air quality sensors offer additional avenues to refine the spatiotemporal characterization of air pollution levels (Morawska et al., 2018). Numerous instruments from commercial entities, non-profits, and startups have entered the market to date (Borghi et al., 2017; McKercher et al., 2017), although the performance of these sensors can differ substantially between the different models as well as between individual units, as noted by evaluations in field and laboratory settings (Jiao et al., 2016; Jerrett et al., 2017; Castell et al., 2017; Kelly et al., 2017; Feinberg et al., 2018; Levy Zamora et al., 2019). Offering the capability to inexpensively generate a large volume of data, distributed networks of low-cost air quality sensors are beginning to be established to augment existing monitoring networks or provide novel real-time data streams (Gao et al., 2015; Schneider et al., 2017; Zikova et al., 2017). Noteworthy examples of collaborative endeavors between government agencies, research organizations, and communities include 'OpenSense' in Zurich, Switzerland (Hasenfratz et al., 2015), 'Array of Things' in Chicago, U.S. (Catlett et al., 2017), and the Imperial County Community Air Monitoring Network (English et al., 2017) in California.
Received 1 March 2019; Received in revised form 26 June 2019; Accepted 15 July 2019
Corresponding author at: Department of Environmental Medicine, New York University School of Medicine, 341 East 25th Street, New York, NY 10010, United States of America.
E-mail address: (C.C. Lim).
Environment International 131 (2019) 105022
Available online 27 July 2019
0160-4120/ © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
LUR models based on data collected from mobile sampling with low-cost (< $300) consumer-based sensors, which could potentially offer a highly cost-effective approach to model and map air pollution concentration levels, are very limited thus far. The main aim of this study was to deploy multiple units of the smartphone-based particle counter 'AirBeam' to measure and model street-level urban air quality in Seoul, South Korea, a location with limited fixed regulatory monitoring sites relative to its high population and diverse urban environments. The individual AirBeam units were first collocated with a pDR-1500 within a laboratory setting to adjust for intra-instrument variability and equate particle counts to mass equivalents, and a mobile sampling campaign was conducted by repeatedly walking across five routes during an approximately three-week period. The collected air pollution data, together with an openly available and crowd-sourced geographical data source, OpenStreetMap (OSM), were then used to construct LUR models with both linear regression and machine learning methods. This work explores the potential of mobile sampling with low-cost air quality sensors, machine learning models, and open data sources to characterize street-level air quality in urban locations with fine spatial resolution.
2. Materials and methods
2.1. Equipment description and intra-instrument variability adjustment
The internal optical particle sensor of the AirBeam (dimensions: 105 × 95 × 43.5 mm; weight: 198 g) is the PPD60PV-T2 (detectable particle range: 0 to 400 μg/m³; detectable particle size: 0.5–2.5 μm) from Shinyei Technology Co. LTD. (Kyoto, Japan), connected to an Android OS smartphone running the AirCasting application. Supplemental Fig. 1 depicts the AirBeam, its specifications, and the Android AirCasting app. This mobile system is capable of continuous measurement (programmable intervals as short as 1 s) and mapping (by GPS and Google Maps). The platform code is open-source, and collected data can be shared and mapped via an online platform.
Fig. 1. (a) Locations of the five sampling routes in Seoul and government-run, fixed-site monitors (blue markers). Mean PM2.5 concentration levels (μg/m³) during the sampling period at each of the 100 m segments are also depicted. The red arrows point to an underground roadway, which was not included in analyses. We also present close-up views of route C as an example to depict sampled data points, with (b) OpenStreetMap and (c) satellite backgrounds. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
To adjust for potential intra-instrument variability and to convert particle counts to PM2.5 mass equivalents, the AirBeam units were collocated with a DataRAM pDR-1500 (Thermo Scientific, Franklin, MA) within a concentrated air particle (CAP) system in Sterling Forest, New York. The system draws in and concentrates ambient air through a cyclone inlet that first removes most of the particles larger than 2.5 μm in aerodynamic diameter. The cyclone outflow is passed over a warm water bath and is then rapidly cooled in the condenser, resulting in supersaturation and particle growth (Maciejczyk et al., 2005). The pDR-1500 was initially calibrated with ambient particles via its internal gravimetric filter and pump system at a flow rate of 1.5 L/min in the CAP chamber. The individual AirBeam units were then calibrated against the pDR-1500: each unit was placed within the CAP chamber together with the pDR-1500 and tested for approximately 3 to 4 h per day on 2 to 3 days, and a separate linear regression model was then fit for each unit.
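The per-unit calibration described in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code (the study reports using R): `fit_line` and `counts_to_mass` are hypothetical helper names, the collocation readings are synthetic and noise-free, and the intercept and slope used to generate them are the values reported for unit C58 in Table 1.

```python
def fit_line(counts, mass):
    """Ordinary least squares fit of mass = intercept + slope * counts."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(mass) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, mass))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def counts_to_mass(count, intercept, slope):
    """Convert one AirBeam particle count (hppcf) to a PM2.5 mass equivalent (ug/m3)."""
    return intercept + slope * count


# Synthetic collocation data generated from the Table 1 line for unit C58
# (assumed here purely for illustration; real readings would be noisy).
airbeam_counts = [2000.0, 8000.0, 15000.0, 30000.0]
pdr_mass = [4.91 + 0.002 * c for c in airbeam_counts]

intercept, slope = fit_line(airbeam_counts, pdr_mass)
calibrated = counts_to_mass(10000.0, intercept, slope)
```

Fitting one line per unit, instead of a single pooled line, is what lets the differing intercepts and slopes of the individual sensors be absorbed before the field data are combined.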
2.2. Sampling location and protocol
Seoul, the capital of South Korea and the 5th most populous metropolitan area in the world, experiences some of the highest air pollution concentration levels among cities in developed countries. The city is characterized by extremely high urban density, an abundance of high-rise buildings and apartments, and mountainous terrain. This study was carried out in the southern part of Seoul, south of the Han River, in three districts: Dongjak-gu (area = 16.35 km²; population density = 24,000/km²), Seocho-gu (area = 47.14 km²; density = 8300/km²), and Gwanak-gu (area = 29.57 km²; density = 18,000/km²). The sampling campaign was conducted during an approximately three-week period (July 23rd to August 11th) in the summer of 2015, on weekdays only (12 days total, on non-rainy days) during three different time periods: morning (8–10 am), evening (6–8 pm), and night (9–11 pm). Ten volunteers sharing 7 AirBeam units were instructed to repeatedly sample the five routes without predetermined beginning/ending locations and times.
The five routes (Fig. 1), four of which were based near or around government-run regulatory monitors, were designed to span various neighborhoods and to obtain spatial coverage of a wide range of types of geographical variables, such as major roads and highways, green spaces, and both low and high density residential areas. Route A is located in Sillim; the neighborhood is largely residential with low-rise buildings and houses. Route B is in Sadang, which is also mainly residential with a large park and three major roads that surround the neighborhood. Route C is in Seocho, where the central bus transport terminal for Seoul is located, as well as the main city highway, a riverside park, and high-rise apartment buildings. Route D is located at Isu, where major highways and high-density residential areas are present. Route E is located near Seoul National University, a large university campus located at the base of a mountain; the area is hilly and tree-covered, and has a relatively low volume of traffic, mainly consisting of buses used for student transport. The lengths of the routes ranged from 3.9 km to 4.9 km, and the total length of all the routes was 21.5 km.
2.3. Data source for land use predictors
Geospatial data for the city of Seoul, South Korea were downloaded from OpenStreetMap (OSM), a freely available, crowd-sourced and user-generated online mapping system. The dataset included > 60 variables, grouped by the following categories: roads (cycleway, footway, living, path, pedestrian, residential, primary, secondary, road, secondary link, service, steps, subway, tertiary, trunk, trunk link, unclassified); land use (cemetery, farm, footway, forest, garden, golf, grass, hospital, island, park, parking, pitch, place of worship, playground, residential, school, sports center, substation, university, wood); buildings (apartments, cathedral, church, commercial, hospital, hotel, house, public, residential, retail, school, university, identified/unidentified); public amenities (fire station, fuel station, hospital, library, police, school, town hall); transportation points (bus stop, motorway junction, station, subway entrance); and water areas and waterways (stream, river, riverbank, water). Several variables in different categories that repeatedly describe the same land use morphology (e.g. university, which is counted as land use, buildings, and public amenities) were all initially included in the analysis. After removing the subway variable (as it describes underground paths), there were 67 predictor variables available for analysis (Supplemental Table 1).
2.4. Data reduction
As data were collected at 1-second intervals, the data points were first aggregated into 1-min averages to match the pDR-1500 sampling frequency and to reduce data noise. Measurement points with obvious GPS errors (e.g. located in the middle of rivers) and sampling errors (e.g. volunteer did not follow the sampling route properly) were removed by restricting data points to < 50 m from the routes and by manual removal after visual inspection. We then employed a 'snapping' procedure to assign the collected data points to the nearest route segment on the basis of measured GPS coordinates, allowing measurements along the same segment to be analyzed as a group, as per previous mobile LUR studies (Hankey and Marshall, 2015). Segments were first defined by length from a starting point along a route, and buffers with different radiuses were drawn around the centroids of the route segments, with geospatial data from OSM within the buffers then extracted. Each road segment was thereby associated with land use, built, and natural environment variables, calculated as different OSM variables within the buffers of different sizes. We calculated road segments at 5 different lengths (25 m, 50 m, 100 m, 150 m, 250 m) and 5 buffer radiuses (50 m, 100 m, 150 m, 350 m, 500 m) in order to build the LUR models as well as to assess how these parameters influence the LUR model performance.
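The distance filter and snapping step can be illustrated with a short Python sketch. It is not the authors' GIS workflow: it assumes that coordinates have already been projected to planar metres, that each route is a simple polyline, and that segments are numbered from 0 by distance along the route. The names `project_onto_route` and `assign_segment` are hypothetical; the 100 m segment length and 50 m cutoff are taken from the text.

```python
import math


def project_onto_route(point, route):
    """Return (offset, along): distance from the route and distance along it, in metres."""
    px, py = point
    best_offset, best_along = math.inf, 0.0
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip duplicated vertices
        # Fraction along this leg of the closest point, clamped to the leg.
        t = ((px - x1) * dx + (py - y1) * dy) / (length ** 2)
        t = max(0.0, min(1.0, t))
        offset = math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))
        if offset < best_offset:
            best_offset, best_along = offset, travelled + t * length
        travelled += length
    return best_offset, best_along


def assign_segment(point, route, segment_length=100.0, max_offset=50.0):
    """Snap a point to a segment index, or return None if it lies too far from the route."""
    offset, along = project_onto_route(point, route)
    if offset >= max_offset:
        return None
    return int(along // segment_length)


# Hypothetical L-shaped route: 100 m east, then 200 m north.
route = [(0.0, 0.0), (100.0, 0.0), (100.0, 200.0)]
on_route = assign_segment((110.0, 150.0), route)   # 10 m off the route, 250 m along it
off_route = assign_segment((200.0, 150.0), route)  # 100 m off the route
```

Points that return `None` correspond to the measurements the text describes as removed for being 50 m or more from a route; the remaining points can then be grouped by segment index for averaging.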
2.5. Adjustment for background temporal trends
Previous mobile sampling investigations adjusted for potential temporal bias through several approaches; for example, Tessum et al. (2018) adjusted for between-day temporal trends by subtracting the daily fifth percentile from all measured concentration values on a given day. Deville Cavellin et al. (2016) used linear and quadratic terms for temperature as independent variables in the model as adjustment for potential temporal variability. We modified an approach applied by multiple studies (Larson et al., 2009; Dons et al., 2012; Clougherty et al., 2013; Van den Bossche et al., 2015; Apte et al., 2017) that used background concentration levels from a nearby regulatory monitor to adjust for temporal trends and normalize measured values. Leveraging the available information on background PM2.5 concentrations from multiple fixed-site regulatory monitors near the sampling routes, we adjusted each 1-min averaged AirBeam measurement for each day by applying a multiplicative hourly factor (defined as the ratio of the mean concentration level during the entire sampling period to the concentration level in the corresponding hour in which that measurement was taken) derived from the nearby regulatory monitor. For route E, which was not designed around a regulatory monitor, we used averaged values from the two nearby monitors (approx. 2–4 km away) located by routes A and B. This resulted in 6 factors per sampling day for each of the 5 routes. Using multiple nearby monitors, instead of a single monitor as done in past studies, allowed for variable temporal adjustments across several locations. This approach minimizes the effect of day-to-day variations in background air quality on the measurements, thereby decreasing the amount of required sampling data (Van den Bossche et al., 2015).
Hourly measurements from regulatory monitors in Seoul revealed considerable temporal variability during the study period, with hourly levels as low as 5 μg/m³ and reaching 67 μg/m³ during pollution episodes (Fig. 2).
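As a worked illustration of the multiplicative hourly factor in Section 2.5, the Python sketch below scales 1-min AirBeam averages by the ratio of a monitor's campaign-wide mean to its reading for the hour of measurement. The function names and all numbers are hypothetical, and the sketch assumes that the period mean has already been computed for the monitor paired with the route.

```python
def hourly_factor(period_mean, hour_value):
    """Ratio of the monitor's mean over the whole campaign to its reading for one hour."""
    return period_mean / hour_value


def adjust_measurements(minute_values, monitor_by_hour, period_mean):
    """Scale each (date, hour, value) 1-min average by the factor for its date and hour."""
    adjusted = []
    for date, hour, value in minute_values:
        factor = hourly_factor(period_mean, monitor_by_hour[(date, hour)])
        adjusted.append((date, hour, value * factor))
    return adjusted


# Hypothetical monitor readings (ug/m3) for two sampling hours on one day.
monitor_by_hour = {("2015-07-23", 8): 60.0, ("2015-07-23", 18): 20.0}
period_mean = 30.0  # hypothetical campaign-wide mean at the same monitor

minute_values = [("2015-07-23", 8, 80.0), ("2015-07-23", 18, 30.0)]
adjusted = adjust_measurements(minute_values, monitor_by_hour, period_mean)
```

In this example the measurement taken during a high-background hour is scaled down (80 becomes 40) and the one taken during a low-background hour is scaled up (30 becomes 45), which is the normalizing effect the adjustment is meant to have.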
2.6. LUR model building
We first tested the potential effects of spatial aggregation by different route segment lengths and buffer sizes by including all 67 available variables in a linear regression model, and we selected 100 m route segments to spatially aggregate the collected data points based on the high adj-R², resulting in 215 available segments for subsequent analyses. We then applied and compared three statistical approaches for building the LUR model: linear regression (LR), random forest (RF), and stacked ensemble (SE).
In the linear regression model, the GIS variables were retained for multivariable models based on a distance-decay regression selection strategy (ADDRESS) to screen and select informative candidate variables and corresponding buffer sizes from all of the available potential variables (Su et al., 2015). We then applied a supervised forward search approach, adding the variables one at a time to the LR model and keeping a variable only if it increased the R² of the model by at least 1.0% and if all predictor variables had statistically significant coefficients (p < 0.05) (Van den Bossche et al., 2018). We also applied the random forest (RF) model after first removing highly correlated variables (absolute correlation > 0.8). Random forests, in brief, are an ensemble of decision trees in which each tree is constructed using the best split for each node among a randomly chosen subset of predictors. Random search, which randomly chooses a combination of hyperparameters at every iteration, was used to tune and optimize the model (Bergstra and Bengio, 2012). Finally, we employed the stacked ensemble (SE) model, a machine learning ensemble approach that involves training a learning algorithm to combine the predictions of several other learning algorithms: first, all of the other algorithms are trained using the available data; then a 'meta-classifier' algorithm (chosen from the list of algorithms) is trained to make a final prediction using all the predictions of the other algorithms as additional inputs. We evaluated and selected a diverse group of machine learning algorithms, including random forest (rf), Bayesian generalized linear model (bayesglm), k-nearest neighbors (knn), recursive partitioning and regression trees (rpart), and partitioning using deletion, substitution, and addition moves (partDSA).
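The correlation screen applied before fitting the random forest (absolute correlation > 0.8) can be sketched as follows. The text does not say which member of a correlated pair was dropped, so the greedy rule here (keep a variable only if it is not highly correlated with any variable already kept) is an assumption, as are the function names and the toy predictor values.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.8):
    """Greedily keep variables whose |r| with every kept variable is at most the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy predictor values for five segments (hypothetical numbers, OSM-style variable names).
columns = {
    "wood": [1.0, 2.0, 3.0, 4.0, 5.0],
    "forest": [2.0, 4.0, 6.0, 8.0, 10.5],      # nearly collinear with "wood"
    "apartments": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(columns)
```

Because the rule is order-dependent, a different variable ordering can retain a different member of each correlated pair; any screening of this kind should state its tie-breaking rule.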
We applied 10-fold cross validation (with 500 repeats) to calculate the mean CV-R² (cross-validation R²; 1 − (mean square error/variance)) and root mean square error (RMSE; a measure of the differences between values predicted by a model and the values observed) for the three methods to quantify their accuracy. We used the packages 'ggplot2' and 'leaflet' for visualization and 'caret' for statistical analyses in R.
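The two accuracy metrics can be written out directly from the definitions given above. The Python sketch below is illustrative only: the function names are hypothetical, the use of the population variance of the observed values is an assumption (the text does not specify the variance estimator), and the observed and predicted values are made up. A simple fold splitter is included to show how 215 segments might be divided into 10 folds.

```python
import math
import random


def cv_r2(observed, predicted):
    """CV-R2 = 1 - (mean square error / variance of the observed values)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mean_obs = sum(observed) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n  # population variance (assumed)
    return 1.0 - mse / variance


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def k_fold_indices(n_items, k, seed):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Made-up held-out observations and predictions (ug/m3).
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 63.0, 67.0]
r2_value = cv_r2(observed, predicted)
rmse_value = rmse(observed, predicted)

# One repeat of a 10-fold split over the 215 route segments.
folds = k_fold_indices(215, 10, seed=1)
```

In a repeated cross-validation, the split would be redrawn with a new seed for each repeat, each fold would be predicted by a model trained on the other nine, and the metrics would be averaged over repeats.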
3. Results
3.1. Adjustment for intra-instrument variability
We fit univariate linear regression models for each deployed AirBeam unit in order to adjust for intra-unit variability and to convert particle counts to PM2.5 mass concentrations. During the collocated sessions with the DataRAM pDR-1500 in the CAP chamber, the PM2.5 concentration (as measured by the pDR-1500) ranged from 0 to 81 μg/m³. The AirBeams showed strong agreement with the pDR-1500 (adj-R² = 0.95–0.98) and noticeable differences in responses between the individual units (Fig. 3). The regression models' intercepts, slopes, and RMSE values varied across the units; detailed statistical summaries of the models are presented in Table 1.
3.2. Mobile sampling summary statistics
Fig. 2. Hourly (at 8 am, 9 am, 6 pm, 7 pm, 9 pm, 10 pm) PM2.5 concentration levels during the sampling period (7/23/15 to 8/10/15) at the four regulatory background monitors.
The mobile sampling campaign yielded a total of 10,871 min of data, of which 10,177 min (93.6%) remained after removing GPS and sampling errors, equaling > 169 h of total data across the 5 sampled routes (Table 2, Supplemental Tables 2 & 3). Of these, 1992 min (33.2 h) of sampling data were collected at Route A; 2449 min (40.8 h) at Route B; 2313 min (38.6 h) at Route C; 1970 min (32.8 h) at Route D; and 1453 min (24.2 h) at Route E. Route D, which is located near major roads and highways, had the highest concentration levels (55.5 ± 27.7 μg/m³), while Route B (42.0 ± 24.2 μg/m³) and Route E (48.4 ± 31.3 μg/m³) had the lowest concentration levels. Notable differences between morning, evening, and night were also observed across the five routes, especially for Route D, which had elevated levels during morning (70.7 ± 25.5 μg/m³) compared to evening (46.6 ± 28.3 μg/m³) and night (54.8 ± 24.1 μg/m³). The amount of sampling data varied across the 215 segments, with a median of 44 min per segment (minimum = 5; 25th percentile = 34; 75th percentile = 55; maximum = 179). Summary statistics for minutes of sampling per 100 m segment for each of the five routes are visualized as boxplots in Fig. 4.
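The sampling totals reported in this subsection are internally consistent, which a few lines of Python can confirm. The per-route minutes and the 10,871 min total are taken from the text; the variable names are illustrative.

```python
# Minutes of retained data per route, as reported in the text.
route_minutes = {"A": 1992, "B": 2449, "C": 2313, "D": 1970, "E": 1453}
total_collected = 10871  # minutes before removing GPS and sampling errors

retained_minutes = sum(route_minutes.values())
retained_share = retained_minutes / total_collected  # fraction kept after cleaning
retained_hours = retained_minutes / 60               # total hours of usable data
```

The route totals sum to the 10,177 min stated in the text, which is about 93.6% of the collected data and a little over 169 h.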
3.3. Model results
The LUR models were sensitive to different segment lengths and buffer radiuses, with adj-R² generally increasing with larger buffer radiuses (Fig. 5), while 100 m to 150 m segments for spatial aggregation performed the best. Fitting individual equations to account for intra-instrument variability for each AirBeam unit generally improved the accuracy of the constructed LUR models, with an increase in CV-R² values of ~0.10–0.15.
In constructing the LR model, we screened and removed several point variables (e.g. fire stations) that were not frequently present across the sampling space but clustered near the pollution hotspots, as these variables ended up having very strong influences on the models. The final LR LUR model showed high goodness-of-fit with a CV-R² of 0.63 and RMSE of 7.01, and the following variables were included in the model: wood, secondary link, residential road, cathedral, station, pitch, and apartments (Table 3). The machine learning approaches explained a greater proportion of the variance of PM2.5 than the LR model. The random forest model identified mostly different variables as important (wood, residential road, living street, school, park, apartments, residential, building, tertiary, and service) and also revealed better performance metrics compared to the LR model, with higher mean CV-R² (0.73) and lower RMSE (6.20). The stacked ensemble model with random forest as the meta-predictor algorithm performed the best, and the SE model outperformed both LR and RF models, with higher CV-R² (0.80) and lower RMSE (5.22). Individual R² values for the algorithms in the ensemble were 0.74 for random forest, 0.45 for partDSA, 0.50 for rpart, 0.70 for bayesglm, and 0.69 for knn.
Adjusting for background temporal trends changed the overall morning average concentration levels from 49.4 to 59.2 μg/m³; evening from 46.4 to 45.7 μg/m³; and night from 51.5 to 47.3 μg/m³. The changes in concentration levels after temporal adjustment during the three sampling periods differed significantly across the routes (Supplemental Table 4). This adjustment also improved the CV-R² of the three approaches, as not doing so resulted in lower CV-R² values of 0.54, 0.65, and 0.71 for the LR, RF, and SE models, respectively. The constructed LUR models were used to create prediction maps of street-level PM2.5 concentration levels in Seoul near the sampled locations, which revealed several 'hotspots' with elevated PM2.5 levels (Fig. 6). The prediction maps revealed similar spatial patterns between the three modeling approaches with emphasis on similar locations as hotspots, especially at locations with major roads/highways and high population density. Conversely, the lowest concentrations were predicted at greenspace locations, such as parks and mountains. The three approaches resulted in relatively similar mean predicted values across the exposure surface, at 47.31, 48.86, and 49.43 μg/m³, for LR, RF, and SE, respectively. However, the LR prediction map predicted lower values than machine learning approaches at the extremes (range: 26.36–68.96 μg/m³), while maps for RF (34.97–71.43 μg/m³) and especially SE (33.50–83.19 μg/m³) models resulted in higher predicted values.
Fig. 3. DataRAM pDR-1500 (mass; μg/m³) vs. 1-minute averaged AirBeam (hundreds of particles per cubic foot; hppcf) measurements in the concentrated air particle (CAP) chamber.
Table 1
Linear regression equations to convert particle counts to mass for each AirBeam unit.
Unit name   Intercept (standard error)   Slope (standard error)      RMSE    Adj-R²
99B         10.72 (0.29)                 0.002616 (1.77 × 10⁻⁵)      23.48   0.95
B7E         11.69 (0.44)                 0.001974 (1.41 × 10⁻⁵)      14.84   0.98
B99         13.16 (0.42)                 0.002102 (1.47 × 10⁻⁵)      15.50   0.98
C54         6.68 (0.18)                  0.001905 (1.35 × 10⁻⁵)      8.92    0.96
C58         4.91 (0.16)                  0.002000 (1.05 × 10⁻⁵)      9.16    0.98
C72         9.94 (0.34)                  0.002537 (3.17 × 10⁻⁵)      10.50   0.95
D46         11.26 (0.37)                 0.002049 (1.64 × 10⁻⁵)      11.39   0.98
4. Discussion
In this study, we conducted a mobile sampling campaign in Seoul, South Korea deploying low-cost smartphone-based air quality sensors and utilized the collected data to construct LUR models employing three statistical approaches. The strengths of the resulting R² values were comparable to recent, similar studies across multiple locations around the world that utilized more advanced equipment. Our study is unique for developing LUR models using multiple low-cost (< $300), mobile sensors; priced at $250 per unit, AirBeams are order(s) of magnitude less expensive than the commercially available portable (in the thousands; the pDR-1500 used in this study cost ~$5700) and federal standard (in the tens of thousands) instruments. AirBeam and its operating platform, AirCasting, are also notable for being primarily developed for citizen science, whereby users can upload their measurements to share with the public, as well as for being open-sourced, allowing developers and researchers to program and customize the instruments and the smartphone app according to their needs and requirements. Many similarly priced ($200–$300) sensors have entered the market since the present study was conducted, underlining the public's increasing interest in the capability to measure personalized real-time exposure data (Caplin et al., 2019). Through deployment of such low-cost sensors, we were able to characterize the spatial variability of street-level PM2.5 in Seoul, the main source of which is likely to be traffic given the near-road sampling approach applied in this study. Past source apportionment studies also identified the primary sources of PM2.5 in Seoul as motor vehicle emissions and road dust (Heo et al., 2008; Ryou et al., 2018).
Recent mobile sampling approaches for LUR model building have employed a variety of study designs and instruments. For example, Hankey and Marshall (2015) collected over 85 h of data on a bicycle-based sampling platform in Minneapolis, MN and constructed LUR models for particle size, black carbon, and PM2.5 with modest goodness-of-fit (adj-R² of ~0.5 for particle number and ~0.4 for PM2.5). Apte et al. (2017) analyzed data collected from a Google Street View mapping vehicle equipped with air quality sensors that repeatedly sampled every street in a 30-km² area of Oakland, CA, to model and reveal urban air pollution patterns at 4–5 orders of magnitude greater spatial precision than possible with current central-site ambient monitoring. The 'OpenSense' project in Zurich, Switzerland (Hasenfratz et al., 2015) utilized mobile sensor nodes installed on top of public transport tram vehicles in the city to create high-resolution pollution prediction maps for ultrafine particles and particle counts. Vehicle-based mobile measurements were also applied to create LUR models to estimate the spatial variation of street-level PM2.5 and PM10 in the downtown area of Hong Kong (Shi et al., 2016), and integration of urban/building morphology as independent variables increased the adj-R² of the LUR model, suggesting that incorporating detailed 3D characteristics of the land use can improve the predictive power of such models.
Table 2
Summary statistics for measurements across the five routes. Average (Std. Dev) and IQR are PM2.5 concentrations in μg/m³.
Route ID   Route name                  AirBeam units     Period    Minutes   Average (Std. Dev)   IQR
A          Sillim                      B7E, B99, C54,    Total     1992      51.3 (32.6)          56.3
                                                         Morning   901       43.3 (32.8)          40.4
                                                         Evening   462       58.6 (27.9)          49.6
                                                         Night     629       57.4 (33.0)          47.4
B          Sadang                      99B, B7E, C58,    Total     2449      42.0 (24.2)          36.5
                                                         Morning   744       40.5 (24.8)          34.2
                                                         Evening   858       38.6 (21.2)          38.1
                                                         Night     847       46.7 (25.8)          30.3
C          Seocho                      99B, C58, C72     Total     2313      49.9 (31.4)          50.3
                                                         Morning   574       47.9 (35.2)          69.2
                                                         Evening   892       46.7 (29.4)          43.3
                                                         Night     847       54.5 (30.1)          38.1
D          Isu                         99B, C72, D46     Total     1970      55.5 (27.7)          33.9
                                                         Morning   477       70.7 (25.5)          39.6
                                                         Evening   755       46.6 (28.3)          43.3
                                                         Night     738       54.8 (24.1)          26.1
E          Seoul National University   B7E, B99, C54,    Total     1453      48.4 (31.3)          48.4
                                                         Morning   396       56.8 (39.4)          74.3
                                                         Evening   528       47.4 (26.5)          23.2
                                                         Night     529       43.0 (27.4)          54.0
Fig. 4. Boxplot demonstrating distribution of minutes of sampling per 100 m segment for each sampling route.
Fig. 5. Adjusted R2 of LR LUR models (including all available 67 predictor variables) for mass, by segment radius and buffer sizes.
Our study and sampling design highlight the potential advantages of
mobile sampling with low-cost and portable air quality sensors in
constructing LUR models. The aforementioned studies were largely
based on sampling campaigns conducted on modes of transport (e.g.
cars) visiting a single location at a given time, which may result in a
low number of visits per location. This and past studies found that
mobile LUR models are highly sensitive to parameters such as the number
of route segments, radii of buffers, and number of measurements per
segment (Minet et al., 2017). Hatzopoulou et al. (2017) evaluated the
influence of the number of sampling locations and durations of sampling
on LUR model performance, noting that mobile sampling campaigns can be
inefficient due to low sampling frequency at a large number of
locations, and that spatial variability may be more important than the
number of locations when designing the sampling routes. The authors
also found that the LUR models became relatively robust after 150–200
segments and 10–12 visits per segment. In the present study, walking at
a slow speed, instead of sampling on mechanical modes of transportation,
resulted in each route generally having a high number of data points
(median = 44) per segment. This approach also allows for assessing
personal-level exposure in urban areas where there are more people on
the streets than in cars. The disadvantage of shorter distances being
covered when sampling on foot was offset by the low cost and portability
of AirBeams, which allowed several units to be deployed simultaneously
across multiple locations and thereby maximize spatial coverage, as
opposed to the majority of past mobile sampling studies, which were
carried out on a single platform. Simultaneous measurements within a
structured sampling design could decrease the amount of collected data
(and manpower) required to construct robust models, whereas
participatory sensing, where sampling is done opportunistically, could
lead to unstructured data that are more difficult to interpret (Van den
Bossche et al., 2016). Furthermore, AirBeam's ease of operation meant
that minimal training (a few minutes at most) was required prior to
field deployment, resulting in a relatively large volume of data being
generated within the short sampling campaign period of this study.
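The per-segment coverage discussed above (data points per segment and their median) can be checked with a few lines of code. The following is a minimal sketch, not the procedure used in this study: the segment labels, the readings, and the `min_points` cut-off are hypothetical, and the cut-off is only an illustrative threshold for flagging thinly sampled segments (note that data points and visits are not the same quantity).

```python
from collections import Counter
from statistics import median


def summarize_coverage(segment_ids, min_points=10):
    """Summarize how many measurements fall in each route segment.

    segment_ids: one segment label per measurement.
    min_points: illustrative cut-off for flagging thinly sampled segments.
    """
    counts = Counter(segment_ids)
    sparse = sorted(seg for seg, n in counts.items() if n < min_points)
    return {
        "n_segments": len(counts),
        "median_points": median(counts.values()),
        "sparse_segments": sparse,
    }


# Hypothetical measurements: one label per reading, three segments.
readings = ["A-001"] * 44 + ["A-002"] * 50 + ["A-003"] * 6
summary = summarize_coverage(readings, min_points=10)
print(summary)
```

A summary of this kind could be produced per route before model fitting, so that segments with too few measurements are revisited or merged.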
This study leveraged OpenStreetMap (OSM), an openly available
and crowd-sourced GIS dataset, which provided a rich and comprehensive
source of geospatial data for a wide range of LUR variables.
OSM and other open data sources offer underexplored but valuable
information for data-driven methods to predict air pollution levels
(VoPham et al., 2018). Notably, the OSM GIS variables were highly
developed for Seoul and provided detailed and differentiated data for
the numerous types of roads and buildings, which are the land use
categories that usually provide the highest predictive power for air
pollution LUR models. Another advantageous aspect of crowd-sourced
data is that it is continually updated; for example, using an earlier
download of OSM from September 2015 (versus January 2018 in this
analysis) with less developed characterization of Seoul resulted in LUR
models with lower CV-R2, suggesting that in locations lacking
geospatial data, crowd-sourced efforts to generate the relevant GIS
variables could be carried out in concert with the air pollution sampling
campaign to strengthen the predictive capability of LUR models. Despite
recent endeavors to democratize data by agencies and organizations
throughout the world as part of the open data movement, many
detailed GIS files remain proprietary and thereby cost-prohibitive, and
freely available data like OSM offer an alternative and important source
of detailed spatial data for researchers and communities.
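As an illustration of how a point-type buffer variable (for example, the number of mapped features within 500 m of a segment) might be derived from OSM coordinates, the sketch below counts points inside a circular buffer using the haversine great-circle distance. It is an assumption-laden example rather than the GIS workflow used here: the coordinates are hypothetical, and line and area variables (road length, land-use area) would need proper geometry handling that this sketch does not attempt.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_in_buffer(center, points, radius_m):
    """Count point features lying within radius_m of the buffer centre."""
    lat0, lon0 = center
    return sum(
        1 for lat, lon in points
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    )


# Hypothetical segment midpoint and point features (lat, lon in degrees).
segment_mid = (37.4800, 126.9500)
features = [
    (37.4800, 126.9500),  # at the midpoint itself
    (37.4810, 126.9500),  # roughly 111 m to the north
    (37.4900, 126.9500),  # roughly 1.1 km to the north
]
n_500 = count_in_buffer(segment_mid, features, 500)
print(n_500)
```

Repeating the count for several radii gives one candidate predictor per feature type and buffer size.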
Machine learning methods offered improved goodness-of-fit compared
to traditional stepwise linear regression in constructing the LUR
models. Prior work on machine learning applications in both national
(Hu et al., 2017; Di et al., 2016) and local-level (Adams and
Kanaroglou, 2016; Weichenthal et al., 2016; Brokamp et al., 2017)
predictions of air pollution concentration levels highlights the
advantages associated with the approach, including higher accuracy and
identification of important variables. A recent example further
underlines additional potential benefits; a study in Los Angeles, USA
used a multi-step and flexible spatial data mining approach using machine
learning to select the most important OSM geographic features and
predict PM2.5 concentrations, removing the need for a priori selection of
predictors for exposure modeling (Lin et al., 2017). Similarly in our
analysis, applying the traditional stepwise linear regression LUR
approach with the highly correlated OSM dataset, which also contained
several highly influential variables, required manual screening and
removal of predictor variables prior to input and during the model
building process. Notably, the stacked ensemble model combining
multiple machine learning algorithms outperformed both LR and RF in
this study. In recent years, ensemble machine learning methods have
emerged as an important tool for modeling complex relationships and
have been applied successfully in various research areas (Yang et al.,
2010). Application of ensembles has been generally limited in air
pollution exposure assessment and modeling efforts to date, and the
results here suggest that ensemble-based approaches could further
enhance the predictive performance of LUR models.
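To make the idea of stacking concrete, the sketch below combines two simple base learners (a least-squares line and a k-nearest-neighbour mean, standing in for the linear and nonlinear learners) through a meta-learner fitted on out-of-fold predictions. It is a toy illustration under stated assumptions, not the ensemble configuration used in this study: the data are synthetic, there is a single predictor, and all function names are hypothetical.

```python
import math
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour mean; a stand-in for a nonlinear learner."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean([y for _, y in nearest])

    return predict


def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict each point with a model trained on the other folds."""
    preds = [0.0] * len(xs)
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, len(xs), n_folds):
            preds[i] = model(xs[i])
    return preds


def fit_meta(p1, p2, ys):
    """Least-squares weights for y ~ w1*p1 + w2*p2 (no intercept)."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, ys))
    t2 = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


def sse(pred, ys):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys))


# Synthetic data: a linear trend plus a smooth nonlinear component.
xs = [0.5 * i for i in range(30)]
ys = [40 + 3 * x + 10 * math.sin(x) for x in xs]

p_lr = out_of_fold(fit_line, xs, ys)
p_knn = out_of_fold(fit_knn, xs, ys)
w_lr, w_knn = fit_meta(p_lr, p_knn, ys)
stacked = [w_lr * a + w_knn * b for a, b in zip(p_lr, p_knn)]
print(w_lr, w_knn, sse(p_lr, ys), sse(p_knn, ys), sse(stacked, ys))
```

Because the meta-learner is fitted by least squares on the out-of-fold predictions, its error on those predictions cannot exceed that of either base learner alone. This comparison is optimistic, however, and an honest performance estimate would need a further held-out set or nested cross-validation.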
Table 3
Selected LUR model predictor variables in the LR and RF models and associated statistics.
Variable name           Variable type  Buffer length  Linear regression                 Random forest
                                                      β           Std. error  P-value   Importance
Intercept                                             50.02       1.21        < 0.001
Wood                    Area           500 m          3.80 × 10   4.25 × 10   < 0.001   14.45
Residential Road        Line           500 m          2.59 × 10   5.88 × 10   < 0.001   13.10
Secondary Link          Line           500 m          6.88 × 10   1.00 × 10   < 0.001
Cathedral               Point          500 m          2.47 × 10   1.03 × 10
Station                 Point          500 m          3.75        1.02        < 0.001
Pitch                   Area           350 m          1.88 × 10   4.28 × 10   < 0.001
Apartments              Point          500 m          7.70 × 10   4.02 × 10   0.05      10.21
School                  Area           500 m                                            10.73
Living Street           Line           500 m                                            10.85
Park                    Area           500 m                                            10.31
Residential             Area           500 m                                            9.93
Building (Unclassified) Area           500 m                                            9.72
Tertiary                Line           350 m                                            9.70
Service                 Line           350 m                                            8.97
Top ten variables by variable importance are shown in the table.
We note several potential weaknesses that are present in this study.
As we evaluated the AirBeam units in a carefully controlled experimental
chamber drawing in air from a forested and rural area (Tuxedo,
New York), the particle composition and the environmental conditions
(e.g., humidity and temperature) encountered during the experiment
are likely to be significantly different from the heavily urban location
where this study was carried out. Although the potential impacts of
these factors were not assessed in this study, previous performance
evaluations of AirBeams in various laboratory and field settings offer
insight. The initial manufacturer calibration was conducted in a
similarly urban setting (New York City), which revealed high correlations
with both gravimetric sampling and pDR-1500 (, 2014).
Comparison against federal equivalent method monitors showed high
agreement with GRIMM (R2 ~0.6–0.8) (Mukherjee et al., 2017;
SCAQMD, 2017; Feinberg et al., 2018), but mixed results were observed
with BAM (R2 ~0.2–0.7) (Jiao et al., 2016; SCAQMD, 2017). A study of
sensor responses to Arizona road dust, salt, and welding fumes (Sousan
et al., 2017) demonstrated that particle types had significant impacts on
AirBeam (and other low-cost sensor) measurements. Relative humidity
(RH) levels also influenced the measurements; a laboratory evaluation
found that bias was observed when both RH levels (> 65%) and
concentration levels (> 100 μg/m3) were elevated (SCAQMD, 2017), while
another study (Feinberg et al., 2018) found that the particle count
measurements were affected by higher humidity levels in a field setting.
Highly humid summers in Korea would likely influence the absolute
measurement values, but the potential impact on prediction model
performance is likely to be minimal as humidity levels are relatively
uniform across a city. Nevertheless, these findings emphasize the need
to consider the potential influence of environmental factors in sensor
deployments, and performance evaluations at the study location are
suggested for similar studies applying low-cost sensors. In addition,
the particle concentration levels encountered during sampling in Seoul
were higher than the range used for constructing calibration equations
for the AirBeam units, which may ignore the potential nonlinearity of
sensor responses. We also did not check for potential sensor drift, a
common issue for low-cost air quality sensors, during and after the
mobile sampling, although drift is unlikely given the relatively short
sampling period. These issues may have contributed to predicted values
that were significantly higher than observed values from nearby
fixed-site monitors, although it is also possible that such differences
are due to the fact that fixed-site monitors are often located well
above ground and tend to underestimate personal exposures when walking
near traffic (Deville Cavellin et al., 2016). Another potential weakness
is that the OSM data quality and density could be uneven across
locations, as some areas could be characterized in more detail than
others. For example, in some of the sampled areas in this study, several
of the houses in residential areas were not captured in the OSM file and
thereby could have influenced model quality; however, as OSM data
coverage and quality continue to improve, this should become less of an
issue over time.
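The concern about readings that exceed the range of the calibration data can be made operational by recording that range when the calibration line is fitted and flagging any reading that falls outside it. The sketch below assumes a simple linear calibration against a collocated reference; the numbers are hypothetical and the function names are illustrative, not taken from the AirBeam software or from this study's calibration procedure.

```python
from statistics import fmean


def fit_calibration(sensor, reference):
    """Least-squares line: reference = intercept + slope * sensor.

    Also returns the (min, max) of the sensor values used for fitting,
    i.e. the range over which the line is supported by data.
    """
    mx, my = fmean(sensor), fmean(reference)
    sxx = sum((x - mx) ** 2 for x in sensor)
    slope = sum((x - mx) * (y - my) for x, y in zip(sensor, reference)) / sxx
    intercept = my - slope * mx
    return slope, intercept, (min(sensor), max(sensor))


def apply_calibration(readings, slope, intercept, valid_range):
    """Return (calibrated value, within_range flag) for each raw reading."""
    lo, hi = valid_range
    return [(intercept + slope * r, lo <= r <= hi) for r in readings]


# Hypothetical collocation data: raw sensor values vs. a reference monitor.
sensor = [5.0, 10.0, 20.0, 30.0]
reference = [4.0, 7.0, 13.0, 19.0]
slope, intercept, valid_range = fit_calibration(sensor, reference)

# 15 lies inside the calibrated range; 60 requires extrapolation.
results = apply_calibration([15.0, 60.0], slope, intercept, valid_range)
for value, in_range in results:
    print(round(value, 2), "ok" if in_range else "extrapolated")
```

Flagged values are not necessarily wrong, but they rely on the linear response holding beyond the data used to fit it, which is the nonlinearity concern raised above.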
5. Conclusions
Low-cost sensors represent an opportunity to bridge the data gap,
thereby promoting public discourse, influencing air pollution
regulations, and protecting public health (Amegah, 2018). This study
highlights the advantages and potential of applying data collected from
mobile sampling with multiple low-cost sensors to model and map
street-level air pollution levels in urban locations, especially the
capability to generate a large volume of sampling data with ease. The
predictive power of the models developed here, despite deploying only a
limited number of significantly less expensive, consumer-based air
quality sensors, was comparable to that of past mobile sampling LUR
studies, especially after adjusting for intra-instrument variability and
temporal trends. To minimize the potential influence of local particle
characteristics and environmental conditions, calibration with
collocated reference monitors at the sampling location is suggested for
future projects using similar low-cost sensors, as well as to convert
particle counts to mass concentration, a unit of measurement that is more
readily transferable to policy-relevant metrics. Initial calibrations
should also carefully evaluate and adjust for the potential effects of
relative humidity levels, which can have significant influences on the
low-cost sensors. Overall, the findings here suggest that similar mobile
sampling designs using low-cost sensors and open data sources could
be applied to generate a large volume of data and construct LUR models
and maps with fine spatial granularity, and that machine learning
methods could further improve model performance. Our study design
and approach may be especially suitable for citizen science and
community-based endeavors, or in locations without preexisting air
monitoring networks, such as developing countries.
Acknowledgements
This study was funded by the Basic Science Research Program
through the National Research Foundation of Korea (NRF) funded by
the South Korea Ministry of Education (2018R1A2B6004608), the NSF
East Asia and Pacific Summer Institute (EAPSI) Fellowship, the Air &
Waste Management Air Pollution Education and Research Grant
(APERG), the EPA STAR Graduate Fellowship, and by a grant from the
National Institutes of Environmental Health Sciences Center (ES00260).
This publication was developed under Assistance Agreement No.
FP917825 awarded by the U.S. Environmental Protection Agency to
Chris C. Lim. It has not been formally reviewed by EPA. The views
expressed in this document are solely those of the authors and do not
necessarily reflect those of the Agency. EPA does not endorse any
products or commercial services mentioned in this publication.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://
Fig. 6. PM2.5 prediction maps nearby sampled areas constructed applying (a)
linear regression, (b) random forest, and (c) stacked ensemble approaches.
References
Adams, M.D., Kanaroglou, P.S., 2016. Mapping real-time air pollution health risk for
environmental management: combining mobile and stationary air pollution
monitoring with neural network models. J. Environ. Manag. 168, 133–141.
Amegah, A.K., 2018. Proliferation of Low-Cost Sensors. What Prospects for Air Pollution
Epidemiologic Research in Sub-Saharan Africa? vol. 241. pp. 1132–1137.
Apte, J.S., Messier, K.P., Gani, S., Brauer, M., Kirchstetter, T.W., Lunden, M.M., ...
Hamburg, S.P., 2017. High-resolution air pollution mapping with Google street view
cars: exploiting big data. Environ. Sci. Technol. 6999–7008.
Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization.
J. Mach. Learn. Res. 13 (Feb), 281–305.
Borghi, F., Spinazzè, A., Rovelli, S., Campagnolo, D., Buono, L. Del, Cattaneo, A.,
Cavallo, D.M., 2017. Miniaturized monitors for assessment of exposure to air
pollutants: a review.
Brokamp, C., Jandarov, R., Rao, M.B., LeMasters, G., Ryan, P., 2017. Exposure assessment
models for elemental components of particulate matter in an urban environment: a
comparison of regression and random forest approaches. Atmos. Environ. 151, 1–11.
Caplin, A., Ghandehari, M., Lim, C., Glimcher, P., Thurston, G., 2019. Advancing
environmental exposure assessment science to benefit society. Nat. Commun. 10 (1),
Castell, N., Dauge, F.R., Schneider, P., Vogt, M., Lerner, U., Fishbain, B., Bartonova, A.,
2017. Can commercial low-cost sensor platforms contribute to air quality monitoring
and exposure estimates? Environ. Int. 99, 293–302.
Catlett, C.E., Beckman, P.H., Sankaran, R., Galvin, K.K., 2017. Array of things: a scientific
research instrument in the public way: Platform design and early lessons learned. In:
Proceedings of the 2nd International Workshop on Science of Smart City Operations
and Platforms Engineering. ACM, pp. 26–33 April.
Clougherty, J.E., Kheirbek, I., Eisl, H.M., Ross, Z., Pezeshki, G., Gorczynski, J.E., Johnson,
S., Markowitz, S., Kass, D., Matte, T., 2013. Intra-urban spatial variability in
wintertime street-level concentrations of multiple combustion-related air pollutants: the
New York City Community Air Survey (NYCCAS). J. Expo. Sci. Environ. Epidemiol.
23, 232–240.
Deville Cavellin, L., Weichenthal, S., Tack, R., Ragettli, M.S., Smargiassi, A., Hatzopoulou,
M., 2016. Investigating the use of portable air pollution sensors to capture the spatial
variability of traffic-related air pollution. Environ. Sci. Technol. 50 (1), 313–320.
Di, Q., Kloog, I., Koutrakis, P., Lyapustin, A., Wang, Y., Schwartz, J., 2016. Assessing
PM2.5 Exposures with High Spatiotemporal Resolution across the Continental United
States. Environ. Sci. Technol. 50 (9), 214712.
Dons, E., Int Panis, L., Van Poppel, M., Theunis, J., Wets, G., 2012. Personal exposure to
Black Carbon in transport microenvironments. Atmos. Environ. 55, 392–398.
English, P.B., Olmedo, L., Bejarano, E., Lugo, H., Murillo, E., Seto, E., Wong, M., King, G.,
Wilkie, A., Meltzer, D., Carvlin, G., 2017. The Imperial County Community Air
Monitoring Network: a model for community-based environmental monitoring for
public health action. Environ. Health Perspect. 125 (7).
Feinberg, S., et al., 2018. Long-term evaluation of air sensor technology under ambient
conditions in Denver, Colorado. Atmos. Meas. Tech. 11, 4605–4615.
Gao, M., Cao, J., Seto, E., 2015. A distributed network of low-cost continuous reading
sensors to measure spatiotemporal variations of PM2.5 in Xi'an, China. Environ.
Pollut. 199, 56–65.
Hankey, S., Marshall, J.D., 2015. Land Use Regression Models of On-Road Particulate Air
Pollution (Particle Number, Black Carbon, PM2.5, Particle Size) Using Mobile
Monitoring. Environ. Sci. Technol. 49 (15), 2029194.
Hasenfratz, D., et al., 2015. Deriving high-resolution urban air pollution maps using
mobile sensor nodes. Pervasive Mob. Comput. 16, 268–285.
Hatzopoulou, M., Valois, M.F., Levy, I., Mihele, C., Lu, G., Bagg, S., ... Brook, J., 2017.
Robustness of land-use regression models developed from mobile air pollutant
measurements. Environ. Sci. Technol. 51 (7), 3938–3947.
Heo, J.B., Hopke, P.K., Yi, S.M., 2008. Source apportionment of PM2.5 in Seoul,
Korea. Atmos. Chem. Phys. Discuss. 8, 20427e2046.
Hu, X., Belle, J.H., Meng, X., Wildani, A., Waller, L.A., Strickland, M.J., 2017.
Estimating PM2.5 concentrations in the conterminous United States using the random
forest approach.
Jerrett, M., et al., 2017. Validating novel air pollution sensors to improve exposure
estimates for epidemiological analyses and citizen science. Environ. Res. 158, 286–294.
Jiao, W., et al., 2016. Community Air Sensor Network (CAIRSENSE) project: evaluation of
low-cost sensor performance in a suburban environment in the southeastern United
States. Atmos. Meas. Tech. Discuss. 1–24.
Kelly, K.E., et al., 2017. Ambient and laboratory evaluation of a low-cost particulate
matter sensor. Environ. Pollut. 221, 491–500.
Kumar, P., Morawska, L., Martani, C., Biskos, G., Neophytou, M., Di Sabatino, S., Bell, M.,
Norford, L., Britter, R., 2015. The rise of low-cost sensing for managing air pollution
in cities. Environ. Int. 75, 199–205.
Larson, T., Henderson, S.B., Brauer, M., 2009. Mobile Monitoring of Particle Light
Absorption Coefficient in an Urban Area as a Basis for Land Use Regression. Environ.
Sci. Technol. 43 (13), 84672.
Levy Zamora, M., et al., 2019. Field and laboratory evaluations of the low-cost
plantower particulate matter sensor. Environ. Sci. Technol. 53 (2), 838–849 American
Chemical Society.
Lin, Y., Chiang, Y.-Y., Pan, F., Stripelis, D., Ambite, J.L., Eckel, S.P., Habre, R., 2017.
Mining public datasets for modeling intra-city PM2.5 concentrations at a fine spatial
resolution. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on
Advances in Geographic Information Systems. ACM, Los Angeles area, CA, pp. 1–10.
Maciejczyk, P., Zhong, M., Li, Q., Xiong, J., Nadziejko, C., Chen, L.C., 2005. Effects of
subchronic exposures to concentrated ambient particles (CAPs) in mice: II. The design
of a CAPs exposure system for biometric telemetry monitoring. Inhal. Toxicol. 17
(4–5), 189–197.
McKercher, G.R., Salmond, J.A., Vanos, J.K., 2017. Characteristics and applications of
small, portable gaseous air pollution monitors. Environ. Pollut.
Minet, L., Gehr, R., Hatzopoulou, M., 2017. Capturing the sensitivity of land-use
regression models to short-term mobile monitoring campaigns using air pollution
micro-sensors. Environ. Pollut. 230, 280–290.
Morawska, L., et al., 2018. Applications of low-cost sensing technologies for air quality
monitoring and exposure assessment: how far have they gone? Environ. Int. 116,
Mukherjee, A., Stanton, L., Graham, A., Roberts, P., 2017.
Assessing the utility of low-cost particulate matter sensors over a 12-week period in
the Cuyama Valley of California. Sensors 17 (8), 1805.
Ryou, H., Heo, J., Kim, S.Y., 2018. Source apportionment of PM10 and PM2.5 air
pollution, and possible impacts of study characteristics in South Korea. Environ. Pollut.
240, 963–972.
Schneider, P., Castell, N., Vogt, M., Dauge, F.R., Lahoz, W.A., Bartonova, A., 2017.
Mapping urban air quality in near real-time using observations from low-cost sensors
and model information. Environ. Int. 106 (December 2016), 234–247.
Shi, Y., Lau, K.K.L., Ng, E., 2016. Developing street-level PM2.5 and PM10 land use
regression models in high-density Hong Kong with urban morphological factors.
Environ. Sci. Technol. 50 (15), 8178–8187.
Sousan, S., Koehler, K., Hallett, L., Peters, T.M., 2017. Evaluation of consumer monitors to
measure particulate matter. J. Aerosol Sci. 107, 123–133.
South Coast AQMD, 2017. AirBeam summary report [online]. Available at. http://www., Accessed date: June 2019.
Su, J.G., Hopke, P.K., Tian, Y., Baldwin, N., Thurston, S.W., Evans, K., Rich, D.Q., 2015.
Modeling particulate matter concentrations measured through mobile monitoring in
a deletion/substitution/addition approach. Atmos. Environ. 122, 477–483.
, 2014. AirBeam Technical Specifications, Operation & Performance:
Taking Space. [online] Available at.
technical-specications-operation-performance/, Accessed date: June 2019.
Tessum, M.W., et al., 2018. Mobile and fixed-site measurements to identify spatial
distributions of traffic-related pollution sources in Los Angeles. Environ. Sci. Technol. acs.est.7b04889.
Van den Bossche, J., Peters, J., Verwaeren, J., Botteldooren, D., Theunis, J., De Baets, B.,
2015. Mobile monitoring for mapping spatial variation in urban air quality:
development and validation of a methodology based on an extensive dataset. Atmos.
Environ. 105, 148–161.
Van den Bossche, J., Theunis, J., Elen, B., Peters, J., Botteldooren, D., De
Baets, B., 2016. Opportunistic mobile air pollution monitoring: a case study
with city wardens in Antwerp. Atmos. Environ. 141, 408–421.
Van den Bossche, J., De Baets, B., Verwaeren, J., Botteldooren, D., Theunis, J., 2018.
Development and evaluation of land use regression models for black carbon based on
bicycle and pedestrian measurements in the urban environment. Environ. Model.
Softw. 99, 58–69.
VoPham, T., Hart, J.E., Laden, F., Chiang, Y.-Y., 2018. Emerging trends in geospatial
artificial intelligence (geoAI): potential applications for environmental epidemiology.
Environmental Health: A Global Access Science Source 17 (1), 16.
Weichenthal, S., Ryswyk, K. Van, Goldstein, A., Bagg, S., Shekkarizfard, M., Hatzopoulou,
M., 2016. A land use regression model for ambient ultrafine particles in Montreal,
Canada: a comparison of linear regression and a machine learning approach. Environ.
Res. 146, 65–72.
World Health Organization, May 2, 2018. 9 out of 10 People Worldwide Breathe Polluted
Air, but More Countries Are Taking Action. Available at.
but-more-countries-are-taking-action, Accessed date: 16 September 2018.
Yang, P., Yang, Y.H., Zhou, B.B., Zomaya, A.Y., 2010. A review of ensemble methods in
bioinformatics. Curr. Bioinforma. 5, 296–308.
Zikova, N., et al., 2017. Estimating hourly concentrations of PM2.5 across a metropolitan
area using low-cost particle monitors. Sensors 17, 1922.
C.C. Lim, et al. Environment International 131 (2019) 105022
... Low-cost monitors (LCMs) are widely used in various types of urban areas to monitor air quality in real time. To improve the accuracy and applicability of LCMs, various environmental parameters have been proposed, showing promising improvements in sensing performance [7][8][9]. ...
... Studies have shown that LCMs are reliable and exhibit high accuracy during laboratory testing as well as when calibrated in the eld [16][17][18][19]. Conversely, other studies have concluded that LCMs are sensitive to climate parameters such as air temperature and relative humidity (RH); however, long-term measurements combined with post-processing or ML can correct the bias of the sensors [7,20]. ...
... By comparing two regions with different climate characteristics, located in the middle latitudes and equator respectively, we revealed the LCM requirements for calibration according to the local climate parameters for more reliable ambient air monitoring. LCMs still have limitations in terms of accuracy and reliability, but their potential is invaluable, when combined with advances in postprocessing methodologies [7,8,20]. ...
Full-text available
Low-cost particulate matter (PM) sensors have been widely used following recent sensor-technology advancements; however, inherent limitations of low-cost monitors (LCMs), which operate based on light scattering without an air-conditioning function, still restrict their applicability. We propose a regional calibration of LCMs using a multivariate Tobit model with historical weather and air quality data to improve the accuracy of ambient air monitoring, which is highly dependent on meteorological conditions, local climate, and regional PM properties. Weather observations and PM 2.5 (fine inhalable particles with diameters ≤ 2.5 µm) concentrations from two regions in Korea, Incheon and Jeju, and one in Singapore were used as training data to build a visibility-based calibration model. To validate the model, field measurements were conducted by an LCM in Jeju and Singapore, where R ² and the error after applying the model in Jeju improved (from 0.85 to 0.88) and reduced by 44% (from 8.4 to 4.7 µg m ⁻³ ), respectively. The results demonstrated that regional calibration involving air temperature, relative humidity, and other local climate parameters can efficiently correct bias of the sensor. Our findings suggest that the proposed post-processing using the Tobit model with regional weather and air quality data enhances the applicability of LCMs.
... Previous studies [1,4,[20][21][22] have suggested numerous methods for mapping PM2.5 in Ulaanbaatar. The scale of the resulting dispersion maps, however, is too great, and some of the maps merely show the city's core, without expanding outward. ...
... The ger area's fine particulate matter pollution is around two times higher than the city center's fine particulate matter pollution [25]. Based on other researchers' experience [20][21][22]26], the "DUST TRAK II Aerosol Monitor 8532" mobile device was borrowed from the Department of Environment and Forest Engineering, National University of Mongolia, and was utilized for the field study. This portable device has a "Certificate of Calibration and Testing" (certificate serial number: 8532134301) and can measure aerosol concentrations for PM1, PM2.5, respirable, or PM10 size fractions with a corresponding impactor kit. ...
... Fixed stations and mobile device measurements' descriptive statistics are shown in Table 1. Based on other researchers' experience [20][21][22]26], the "DUST TRAK II Aerosol Monitor 8532" mobile device was borrowed from the Department of Environment and Forest Engineering, National University of Mongolia, and was utilized for the field study. This portable device has a "Certificate of Calibration and Testing" (certificate serial number: 8532134301) and can measure aerosol concentrations for PM 1 , PM 2.5 , respirable, or PM 10 size fractions with a corresponding impactor kit. ...
Full-text available
In recent decades, air pollution in Ulaanbaatar has become a challenge regarding the health of the citizens of Ulaanbaatar, due to coal combustion in the ger area. Households burn fuel for cooking and to warm their houses in the morning and evening. This creates a difference between daytime and nighttime air pollution levels. The accurate mapping of air pollution and assessment of exposure to air pollution have thus become important study objects for researchers. The city center is where most air quality monitoring stations are located, but they are unable to monitor every residential region, particularly the ger area, which is where most particulate matter pollution originates. Due to this circumstance, it is difficult to construct an LUR model for the entire capital city’s residential region. This study aims to map peak PM2.5 dispersion during the day using the Linear and Nonlinear Land Use Regression (LUR) model (Multi-Linear Regression Model (MLRM) and Generalized Additive Model (GAM)) for Ulaanbaatar, with monitoring station measurements and mobile device (DUST TRUK II) measurements. LUR models are frequently used to map small scale spatial variations in element levels for various types of air pollution, based on measurements and geographical predictors. PM2.5 measurement data were collected and analyzed in the R statistical software and ArcGIS. The results showed the dispersion map MLRM R2 = 0.84, adjusted R2 = 0.83, RMSE=53.25µg/m3 and GAM R2 = 0.89, and adjusted R2 = 0.87, RMSE = 44µg/m3. In order to validate the models, the LOOCV technique was run on both the MLRM and GAM. Their performance was also high, with LOOCV R2 = 0.83, MSE=55.6 µg/m3, MAE=38.7 µg/m3, and GAM LOOCV R2 = 0.77, RMSE=65.5 µg/m3, MAE=47.7 µg/m3. From these results, the LUR model’s performance is high, especially the GAM model, which works better than MRLM.
... For conducting the relationship between air pollution, human activities, and health at the urban scale in China, previous studies have shown that the fixed air pollution monitoring stations are spatially uneven with high construction operation and maintenance costs (Li et al., 2021;Zhang et al., 2019a), only represent air pollution in a small surrounding area and cannot meet the requirement of high spatial-temporal resolution studies to reflect the real exposure of residents (Dias and Tchepel, 2018). With the development of production and technology, low-cost mobile monitoring devices have been proven to improve the spatial-temporal resolution of air pollution (Lim et al., 2019). Meanwhile, the study indicates that there are spatial differences in the distribution of air pollution on weekdays and weekends (Requia et al., 2018;Wong et al., 2009). ...
... Exposure to PM may cause acute or chronic effects on the human body and cause adverse effects on economic development (Ma et al., 2011). PM monitoring based on high spatial and temporal resolution is helpful to understand the complex changes caused by the complex landscape structure in cities (Lim et al., 2019). This study indicated that as the PM diameter decreases, model accuracy increases on weekdays and decreases on weekends. ...
... Several parameters have been studied, including the annual temperature and the amount of precipitation and according to the Environmental Performance Index, South Korea ranks 173rd out of 180 nations in terms of quality in the area, where more than 50% of the population is exposed to heavy inhalation of fine dust. We in fact know that South Korea has numerous environmental laws, with different restrictions on both green belts and emissions, however, it is actually one of the most polluted countries in the world in terms of air quality (Lim et al. 2019) and for these reasons, many studies involved experts to prevent future disasters. ...
Full-text available
In the Future Studies context, the scenario development process is an established method for the identification of future projections, useful to avoid future threats and take different actions in the present. The development of future scenarios is often combined with different participatory approaches, one among many is the Delphi method, widely adopted for its systematic and interactive nature. In this context, the recent climate challenges lead society to an exponential growth of uncertainty about the future where Delphi-based scenarios (DBS) could be helpful to identify interesting mid and long-term projections. For the purpose of conducting a systematic review of Delphi-based future scenarios applied to climate change context, we used a quantitative bibliometric analysis aimed at investigating the scientific literature path, implementing it with a multiple correspondence analysis and a semantic network analysis. We illustrate the results of the case studies focusing on the combination of methods, rounds of the process, panellists‘ sampling, time horizon, and techniques used, to establish new guidelines for future Delphi-based climate research projects.
... In 2013, PM 2.5 was declared a Group 1 carcinogen by the World Health Organization (WHO). In recent years, studies related to the "risk to human health from PM 2.5 " have increased significantly, attracting the attention of researchers globally (Lim et al. 2019;Chen et al. 2018;Choudhuri and Singh 2022). Besides its health effects, it is a major threat to climatic change, socio-economics, and biodiversity. ...
Fine particulate matter (PM2.5) has become a prominent pollutant due to rapid economic development, urbanization, industrialization, and transport activities, with serious adverse effects on human health and the environment. Many studies have employed traditional statistical models and remote-sensing technologies to estimate PM2.5 concentrations. However, statistical models have shown inconsistency in PM2.5 concentration predictions, while machine learning algorithms have excellent predictive capacity, but little research has been done on the complementary advantages of diverse approaches. The present study proposed the best subset regression model and machine learning approaches, including random tree, additive regression, reduced error pruning tree, and random subspace, to estimate the ground-level PM2.5 concentrations over Dhaka. This study used advanced machine learning algorithms to measure the effects of meteorological factors and air pollutants (NOx, SO2, CO, and O3) on the dynamics of PM2.5 in Dhaka from 2012 to 2020. Results showed that the best subset regression model performed well in forecasting PM2.5 concentrations for all sites based on the integration of precipitation, relative humidity, temperature, wind speed, SO2, NOx, and O3. Precipitation, relative humidity, and temperature have negative correlations with PM2.5. The concentration levels of pollutants are much higher at the beginning and end of the year. Random subspace is the optimal model for estimating PM2.5 because it has the lowest statistical error metrics compared to other models. This study suggests ensemble learning models to estimate PM2.5 concentrations. This study will help quantify ground-level PM2.5 concentration exposure and recommend regional government actions to prevent and regulate PM2.5 air pollution.
... The involvement of citizen scientists is beneficial because it allows additional data to be gathered and the number of measurements to be increased significantly with relatively low effort. In recent years, several citizen science projects have been carried out on air pollution in general, e.g., in urban areas in California and Colorado in the United States, in the Republic of Korea, and in Kenya [11][12][13][14], and specifically on urban NO2 concentrations, e.g., in Italy [15]. An impressive example of a large citizen science project on air pollution is a NO2 distribution survey over Flanders (Belgium) in 2018. ...
Nitrogen dioxide (NO2) is a major air pollutant with diverse impacts on human health and the environment. In urban areas, road traffic is the main emission source for NO2. In Berlin, Germany, a network of measurement stations is operated by the state, fulfilling the monitoring requirements set by the European Union. To get a more detailed overview of the spatial distribution of NO2 concentrations in Berlin, a citizen science project allowed for collection of additional data and an increase in the number of sampling sites. Passive samplers (modified Palmes tubes) were distributed to participants to collect NO2 at a site of their choice. When returned, the samplers were analyzed based on the Griess–Ilosvay reaction and spectrophotometric detection. The results confirmed a seasonal trend of higher NO2 concentrations in winter and lower concentrations during the summer period. Furthermore, the spatially and monthly averaged NO2 concentrations observed in the study period from March 2019 to October 2020 were in good agreement with the average urban background concentration. At small spatial scales, a tendency of decreasing NO2 concentrations with increasing distance from roads was observed. Overall, this study shows the added benefit of extensive low-cost measurements of NO2 concentrations across urban environments in a citizen science project to complement stationary air pollution monitoring networks.
Aerosol pollution in urban areas is highly variable due to numerous single emission sources such as automobiles, industrial and commercial activities as well as domestic heating, but also due to complex building structures redirecting air mass flows, producing leeward and windward turbulences and resuspension effects. In this publication, it is shown that one or even a few aerosol monitoring sites are not able to reflect these complex patterns. In summer 2019, aerosol pollution was recorded in high spatial resolution during six night and daytime tours with a mobile sensor platform on a trailer pulled by a bicycle. Particle mass loadings showed a high variability with PM10 values ranging from 1.3 to 221 μg m−3 and PM2.5 values from 0.7 to 69.0 μg m−3. Geostatistics were used to calculate respective models of the spatial distributions of PM2.5 and PM10. The resulting maps depict the variability of aerosol concentrations within the urban space. These spatial distribution models delineate the distributions without cutting out the built-up structures. Otherwise, the overall spatial patterns do not become visible because they are sharply interrupted by those cutouts in the resulting maps. Thus, the spatial maps make it possible to identify the most affected urban areas and are not restricted to the street space. Furthermore, this method provides insight into potentially affected areas, and thus can be used to develop countermeasures. It is evident that the spatial aerosol patterns cannot be directly derived from the main wind direction, but result far more from an interplay between main wind direction, built-up patterns and distribution of pollution sources. Not all pollution sources are directly obvious and more research has to be carried out to explain the micro-scale variations of spatial aerosol distribution patterns.
In addition, since the aerosol load in the atmosphere is a severe issue for the health and wellbeing of city residents, more attention has to be paid to these local inhomogeneities.
Background: Both exposure monitoring and exposure prediction have played key roles in assessing individual-level long-term exposure to air pollutants and their associations with human health. While there have been notable advances in exposure prediction methods, improvements in monitoring designs are also necessary, particularly given new monitoring paradigms leveraging low-cost sensors and mobile platforms. Objectives: We aim to provide a conceptual summary of novel monitoring designs for air pollution cohort studies that leverage new paradigms and technologies, to investigate their characteristics in real-world examples, and to offer practical guidance to future studies. Methods: We propose a conceptual summary that focuses on two overarching types of monitoring designs, mobile and non-mobile, as well as their subtypes. We define mobile designs as monitoring from a moving platform, and non-mobile designs as stationary monitoring from permanent or temporary locations. We only consider non-mobile studies with cost-effective sampling devices. Then we discuss similarities and differences across previous studies with respect to spatial and temporal representation, data comparability between design classes, and the data leveraged for model development. Finally, we provide specific suggestions for future monitoring designs. Results: Most mobile and non-mobile monitoring studies selected monitoring sites based on land use instead of residential locations, and deployed monitors over limited time periods. Some studies applied multiple design and/or sub-design classes to the same area, time period, or instrumentation, to allow comparison. Even fewer studies leveraged monitoring data from different designs to improve exposure assessment by capitalizing on different strengths. 
In order to maximize the benefit of new monitoring technologies, future studies should adopt monitoring designs that prioritize residence-based site selection with comprehensive temporal coverage and leverage data from different designs for model development in the presence of good data compatibility. Discussion: Our conceptual overview provides practical guidance on novel exposure assessment monitoring for epidemiological applications.
China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study aims to investigate the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimated the lockdown effects as -25.88 in Wuhan and -20.47 in Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and by 67% for Shanghai, which enables much more reliable AQI forecasts for both cities.
As the changing climate expands the extent of arid and semi-arid lands, the number and severity of dust events, and the health effects associated with them, are likely to increase. However, regulatory measurements capable of capturing dust (PM10, particulate matter smaller than 10 µm in diameter) are sparse, sparser than measurements of PM2.5 (PM smaller than 2.5 µm in diameter). Although low-cost sensors could supplement regulatory monitors, as numerous studies have shown for PM2.5 concentration, most of these sensors are not effective at measuring PM10 despite claims by sensor manufacturers. This study focuses on the Salt Lake Valley, adjacent to the Great Salt Lake, which recently reached historic lows exposing 1865 km2 of dry lakebed. It evaluated the field performance of the Plantower PMS 5003, a common low-cost PM sensor, and the Alphasense OPC-N3, a promising candidate for low-cost measurement of PM10, against a federal equivalent method (FEM, beta attenuation) and research measurements (GRIMM aerosol spectrophotometer) at three different locations. During a month-long field study that included five dust events in the Salt Lake Valley with PM10 concentrations reaching 311 µg/m3, the OPC-N3 exhibited strong correlation with FEM PM10 measurements (R2 = 0.865, RMSE = 12.4 µg/m3) and GRIMM (R2 = 0.937, RMSE = 17.7 µg/m3). The PMS sensor exhibited poor to moderate correlations (R2 < 0.49, RMSE = 33–45 µg/m3) with reference/research monitors and severely underestimated the PM10 concentrations (slope < 0.099). We also evaluated a PM-ratio-based correction method to improve the estimated PM10 concentration from PMS sensors. After applying this method, PMS PM10 concentrations correlated reasonably well with FEM measurements (R2 > 0.63) and GRIMM measurements (R2 > 0.76), and the RMSE decreased to 15–25 µg/m3.
Our results suggest that it may be possible to obtain better resolved spatial estimates of PM10 concentration using a combination of PMS sensors (often publicly available in communities) and measurements of PM2.5 and PM10, such as those provided by FEMs, research-grade instrumentation, or the OPC-N3.
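The evaluation metrics quoted in the abstract above (R2, RMSE, and regression slope against a reference monitor) can be illustrated with a minimal sketch. The function name, the choice to compute RMSE on uncorrected readings, and the paired values are assumptions made for illustration; they are not the study's code or data, and the abstract does not describe how its PM-ratio-based correction works, so none is shown here.

```python
import math

def evaluate_against_reference(sensor, reference):
    """Return (r_squared, rmse, slope) for paired sensor/reference readings.

    Hypothetical helper: r_squared is the squared Pearson correlation,
    rmse is computed on the raw (uncorrected) sensor readings, and slope
    is the least-squares slope of sensor regressed on reference.
    """
    n = len(reference)
    mean_s = sum(sensor) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sensor, reference))
    var_s = sum((s - mean_s) ** 2 for s in sensor)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    r_squared = cov ** 2 / (var_s * var_r)
    # A slope well below 1 means the sensor underestimates the reference.
    slope = cov / var_r
    rmse = math.sqrt(sum((s - r) ** 2 for s, r in zip(sensor, reference)) / n)
    return r_squared, rmse, slope

# Hypothetical paired hourly PM10 readings in µg/m3 (not data from the study).
reference_pm10 = [10.0, 50.0, 120.0, 200.0, 310.0]
sensor_pm10 = [2.0, 6.0, 13.0, 21.0, 32.0]

r2, rmse, slope = evaluate_against_reference(sensor_pm10, reference_pm10)
print(f"R2={r2:.3f} RMSE={rmse:.1f} slope={slope:.3f}")
```

In this made-up series the sensor tracks the reference perfectly in relative terms yet reads roughly a tenth of its value, which shows why a high R2 alone does not rule out severe underestimation.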
Awareness of the human health impacts of exposure to air pollution is growing rapidly. For example, it has become evident that the adverse health effects of air pollution are more pronounced in disadvantaged populations. Policymakers in many jurisdictions have responded to this evidence by enacting initiatives that lead to lower concentrations of air pollutants, such as urban traffic restrictions. In this review, we focus on the interplay between advances in environmental exposure assessment and developments in policy. We highlight recent progress in the granular measurement of air pollutants and individual-level exposures, and how this has enabled focused local policy actions. Finally, we detail an illustrative study designed to link individual-level health-relevant exposures with economic, behavioral, biological, familial, and environmental variables.
Air pollution sensors are quickly proliferating for use in a wide variety of applications, with a low price point that supports use in high-density networks, citizen science, and individual consumer use. This emerging technology motivates the assessment under real-world conditions, including varying pollution levels and environmental conditions. A seven-month, systematic field evaluation of low-cost air pollution sensors was performed in Denver, Colorado, over 2015–2016; the location was chosen to evaluate the sensors in a high-altitude, cool, and dry climate. A suite of particulate matter (PM), ozone (O3), and nitrogen dioxide (NO2) sensors were deployed in triplicate and were collocated with federal equivalent method (FEM) monitors at an urban regulatory site. Sensors were evaluated for their data completeness, correlation with reference monitors, and ability to reproduce trends in pollution data, such as daily concentration values and wind-direction patterns. Most sensors showed high data completeness when data loggers were functioning properly. The sensors displayed a range of correlations with reference instruments, from poor to very high (e.g., hourly-average PM Pearson correlations with reference measurements varied from 0.01 to 0.86). Some sensors showed a change in response to laboratory audits/testing from before the sampling campaign to afterwards, such as Aeroqual, where the O3 response slope changed from about 1.2 to 0.6. Some PM sensors measured wind-direction and time-of-day trends similar to those measured by reference monitors, while others did not. This study showed different results for sensor performance than previous studies performed by the U.S. EPA and others, which could be due to different geographic location, meteorology, and aerosol properties. These results imply that continued field testing is necessary to understand emerging air sensing technology.
Geospatial artificial intelligence (geoAI) is an emerging scientific discipline that combines innovations in spatial science, artificial intelligence methods in machine learning (e.g., deep learning), data mining, and high-performance computing to extract knowledge from spatial big data. In environmental epidemiology, exposure modeling is a commonly used approach to conduct exposure assessment to determine the distribution of exposures in study populations. geoAI technologies provide important advantages for exposure modeling in environmental epidemiology, including the ability to incorporate large amounts of big spatial and temporal data in a variety of formats; computational efficiency; flexibility in algorithms and workflows to accommodate relevant characteristics of spatial (environmental) processes including spatial nonstationarity; and scalability to model other environmental exposures across different geographic areas. The objectives of this commentary are to provide an overview of key concepts surrounding the evolving and interdisciplinary field of geoAI including spatial data science, machine learning, deep learning, and data mining; recent geoAI applications in research; and potential future directions for geoAI in environmental epidemiology.
Due to the rapid development of low-cost air quality sensors, a rigorous scientific evaluation has not been conducted for many available sensors. We evaluated three Plantower PMS A003 sensors when exposed to 8 particulate matter (PM) sources: incense, oleic acid, NaCl, talcum powder, cooking emissions, and monodispersed polystyrene latex spheres under controlled laboratory conditions, as well as residential air and ambient outdoor air in Baltimore, MD. The PM2.5 sensors exhibited a high degree of precision and R2 values greater than 0.86 for all sources, but the accuracy ranged from 13 to >90% compared to reference instruments. The sensors were most accurate for PM with diameters below 1 µm, and they poorly measured PM in the 2.5-5 µm range. The accuracy of the sensors was dependent on relative humidity (RH), with decreases in accuracy at RH >50%. The sensors were able to produce meaningful data at low and high temperatures and when in motion, as they would be if utilized for outdoor or personal monitoring applications. They were most accurate in environments with polydispersed particle sources and may not be useful in specialized environments or experiments with narrow distributions of PM or aerosols with a large proportion of coarse PM.
Addressing the worsening urban air quality situation in Sub-Saharan Africa (SSA) is proving increasingly difficult owing to the paucity of data on air pollution levels and the lack of local evidence on the magnitude of the associated health effects. There is therefore an urgent need to expand air quality monitoring (AQM) networks in SSA to enable high-quality epidemiologic studies that help inform policies aimed at addressing air pollution and the associated health effects. In this commentary, I explore the prospects that the recent proliferation of low-cost sensors holds for air pollution epidemiologic research in SSA. This commentary is timely because most SSA governments do not see investments in air pollution control that requires assembling a network of sophisticated and prohibitively expensive instrumentation for AQM as necessary for improving and protecting public health. I conclude that, in a region that is bereft of air pollution data, the growing influx of low-cost sensors represents an excellent opportunity for bridging the data gap to inform air pollution control policies and regulations for public health protection. However, it is essential that only the most promising sensor technologies that perform creditably well in the harsh environmental conditions of the region are promoted.
Introduction: Studies of source apportionment (SA) for particulate matter (PM) air pollution have enhanced understanding of dominant pollution sources and quantification of their contribution. Although there have been many SA studies in South Korea over the last two decades, few studies provided an integrated understanding of PM sources nationwide. The aim of this study was to summarize findings of PM SA studies of South Korea and to explore study characteristics. Methods: We selected studies that estimated sources of PM10 and PM2.5 performed for 2000-2017 in South Korea using Positive Matrix Factorization and Chemical Mass Balance. We reclassified the original PM sources identified in each study into seven categories: motor vehicle, secondary aerosol, soil dust, biomass/field burning, combustion/industry, natural source, and others. These seven source categories were summarized by using frequency and contribution across four regions, defined by northwest, west, southeast, and southwest regions, by PM10 and PM2.5. We also computed the population-weighted mean contribution of each source category. In addition, we compared study features including sampling design, sampling and lab analysis methods, chemical components, and the inclusion of Asian dust days. Results: In the 21 selected studies, all six PM10 studies identified motor vehicle, soil dust, and combustion/industry, while all 15 PM2.5 studies identified motor vehicle and soil dust. Different from the frequency, secondary aerosol produced a large contribution to both PM10 and PM2.5. Motor vehicle contributed highly to both, whereas the contribution of combustion/industry was high for PM10. The population-weighted mean contribution was the highest for the motor vehicle and secondary aerosol sources for both PM10 and PM2.5. 
However, these results were based on different subsets of chemical speciation data collected at a single sampling site, commonly in metropolitan areas, with short overlap and measured by different lab analysis methods. Conclusion: We found that motor vehicle and secondary aerosol were the most common and influential sources of PM in South Korea. Our study, however, suggests caution in interpreting SA findings, given the heterogeneity of study designs and input data.
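The population-weighted mean contribution mentioned in the abstract above can be sketched as a weighted average in which each region's source contribution is weighted by its population. The function name and all numbers below are hypothetical placeholders chosen for illustration; only the four region names come from the abstract.

```python
def population_weighted_mean(contributions, populations):
    """Average the contributions, weighting each by its region's population."""
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(contributions, populations))
    return weighted_sum / total_population

# Region names follow the abstract; every number is a made-up placeholder.
regions = ["northwest", "west", "southeast", "southwest"]
motor_vehicle_share = [30.0, 20.0, 25.0, 15.0]   # percent of PM2.5, hypothetical
population_millions = [25.0, 5.0, 13.0, 5.0]     # hypothetical

weighted = population_weighted_mean(motor_vehicle_share, population_millions)
simple = sum(motor_vehicle_share) / len(motor_vehicle_share)
print(f"population-weighted: {weighted:.2f}%  unweighted: {simple:.2f}%")
```

With these placeholder values the weighted mean is pulled toward the most populous region, which is the reason for weighting when the goal is to describe the exposure of a typical resident instead of a typical region.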
Air quality models are important for studying the impact of air pollutants on health conditions at a fine spatiotemporal scale. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology) for building the model for each combination of study areas, pollutant types, and spatiotemporal scales. In this paper, we present a data mining approach that utilizes publicly available OpenStreetMap (OSM) data to automatically generate an air quality model for the concentrations of fine particulate matter less than 2.5 μm in aerodynamic diameter at various temporal scales. Our experiment shows that our (domain-) expert-free model could generate accurate PM2.5 concentration predictions, which can be used to improve air quality models that traditionally rely on expert-selected input. Our approach also quantifies the impact on air quality from a variety of geographic features (i.e., how various types of geographic features such as parking lots and commercial buildings affect air quality and from what distance) representing mobile, stationary and area natural and anthropogenic air pollution sources. This approach is particularly important for enabling the construction of context-specific spatiotemporal models of air pollution, allowing investigations of the impact of air pollution exposures on sensitive populations such as children with asthma at scale.
Mobile monitoring and passive sampling device (PSD) monitoring are popular air pollutant measurement techniques with complementary strengths and weaknesses. This study investigates the utility of combining data from concurrent two-week mobile monitoring and PSD campaigns in Los Angeles in summer and early spring to identify sources of traffic-related air pollutants (TRAP) and their spatial distributions. There were strong to moderate correlations between mobile and PSD measurements of both NO2 and NOx in summer and spring (Pearson's r between 0.43 and 0.79), suggesting that the two datasets can be reliably combined for source apportionment. PCA identified the major TRAP sources as light duty vehicle emissions, diesel exhaust, crankcase vent emissions, and an independent source of combustion derived ultra-fine particle emissions. The component scores of those four sources at each site were significantly correlated across the two seasons (Pearson's r between 0.58 and 0.79). Spatial maps of absolute principal component scores showed all sources to be most prominent near major roadways and the central business district, and the ultrafine particle source being, in addition, more prominent near the airport. Mobile monitoring combined with fixed-site PSD sampling can provide high spatial resolution estimates of TRAP and can reveal underlying sources of exposure variability.