
Machine Learning Emulation of Spatial Deposition from a Multi-Physics Ensemble of Weather and Atmospheric Transport Models


Abstract

In the event of an accidental or intentional hazardous material release in the atmosphere, researchers often run physics-based atmospheric transport and dispersion models to predict the extent and variation of the contaminant spread. These predictions are imperfect due to propagated uncertainty from atmospheric model physics (or parameterizations) and weather data initial conditions. Ensembles of simulations can be used to estimate uncertainty, but running large ensembles is often very time consuming and resource intensive, even using large supercomputers. In this paper, we present a machine-learning-based method which can be used to quickly emulate spatial deposition patterns from a multi-physics ensemble of dispersion simulations. We use a hybrid linear and logistic regression method that can predict deposition in more than 100,000 grid cells with as few as fifty training examples. Logistic regression provides probabilistic predictions of the presence or absence of hazardous materials, while linear regression predicts the quantity of hazardous materials. The coefficients of the linear regressions also open avenues of exploration regarding interpretability—the presented model can be used to find which physics schemes are most important over different spatial areas. A single regression prediction is on the order of 10,000 times faster than running a weather and dispersion simulation. However, considering the number of weather and dispersion simulations needed to train the regressions, the speed-up achieved when considering the whole ensemble is about 24 times. Ultimately, this work will allow atmospheric researchers to produce potential contamination scenarios with uncertainty estimates faster than previously possible, aiding public servants and first responders.
Nipun Gunawardena 1,*, Giuliana Pallotta 1, Matthew Simpson 2 and Donald D. Lucas 1,*
Citation: Gunawardena, N.; Pallotta, G.; Simpson, M.; Lucas, D.D. Machine Learning Emulation of Spatial Deposition from a Multi-Physics Ensemble of Weather and Atmospheric Transport Models. Atmosphere 2021, 12, 953. https://
Academic Editor: Patrick Armand
Received: 15 June 2021
Accepted: 21 July 2021
Published: 24 July 2021
Copyright: © 2021 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
1 Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
2 Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093, USA
*Correspondence: (N.G.); (D.D.L.)
Keywords: deposition; machine learning; hazardous release; WRF; FLEXPART; prediction
1. Introduction
From localized air-pollution caused by fireworks [1], to seasonal changes in pollution
caused by cars [2], to planetary-scale dust transport from earth’s deserts [3], particulate and
gaseous hazardous matter can be dispersed throughout the environment from numerous
natural and anthropogenic processes. One event which is important to public health and
national security is the release of hazardous materials from nuclear weapons explosions,
nuclear reactor breaches (such as Chernobyl or Fukushima), chemical spills, industrial
accidents, and other toxic releases. These types of incidents happen suddenly and without
warning, creating a plume of toxic material in the earth’s atmosphere or ocean which can
threaten the well-being of living organisms and environments.
In such situations, it is crucial that politicians, policy makers, and first responders
have adequate knowledge about how the toxic plume will disperse and deposit throughout
the environment. This can be used to determine evacuation zones and how resources are
deployed to minimize the loss of public health. For example, during the 2011 Fukushima
Daiichi disaster, the United States Department of Energy, the United States Environmental
Protection Agency, and other United States national agencies worked together to determine
the effect of the radioactive release on international aviation routes, global food supply,
and other crucial aspects of society [4].
To predict how a toxic plume disperses and deposits throughout the environment,
scientists typically run computer simulations. These dispersion simulations solve physical
and chemical equations to produce evolving concentration and deposition fields, but many
of the processes represented in the models are uncertain or not resolved at the scales of
interest. These processes are represented by empirical or semi-empirical parameterizations,
and no single set of parameterizations always performs best for every scenario. Picking
and choosing different sets of parameterizations provides an estimate of uncertainty and is
a necessary component of the prediction process. In addition, many detailed transport and
dispersion models that are currently in use are very computationally expensive, sometimes
requiring several hours to complete a single simulation. Since time is of the essence during
emergencies, these long run-times can be detrimental to first-responder efforts.
Therefore, in the event of a toxic environmental release, the scientists making pre-
dictions with limited computing resources often face a dilemma: using a detailed model,
they can make a small number of predictions quickly but have poor knowledge of the
uncertainty of those predictions, or they can make a large number of predictions slowly
but have better knowledge of the uncertainty of the predictions.
A machine learning or statistical method that emulates a transport and dispersion
model provides the opportunity to minimize this uncertainty versus time dilemma. To do
this, the scientists would vary the inputs to the traditional weather/dispersion model to
create a small number of predictions. They would then train a statistical model to produce
dispersion predictions given the original input values. Finally, the statistical model could
be used to create predictions for the set of inputs that were not originally run with the
traditional dispersion model. That is to say, the statistical model is an emulator of the
original dispersion model.
In this paper, we introduce a statistical method that rapidly predicts spatial deposition
of radioactive materials over a wide area. The deposition predictions we are emulating
were originally produced using material releases in the FLEXible PARTicle dispersion
model (FLEXPART) and meteorological fields generated from the Weather Research
and Forecasting (WRF) model. We created two FLEXPART-WRF ensembles for training
and testing—one simulating a continuous surface release from a hypothetical industrial
accident and one simulating an instantaneous elevated cloud from a hypothetical nuclear
detonation. Each ensemble contained 1196 members with different weather conditions.
(Each ensemble initially contained 1200 runs, but four runs did not complete due to
numerical stability issues.) To create the ensembles, WRF physics parameterizations were
varied (i.e., a multi-physics ensemble) and used as inputs for our statistical model. Multi-
physics WRF ensembles are often used to estimate weather model uncertainty, and our
statistical method is able to capture this uncertainty very efficiently without having to run
the full ensemble. We use a hybrid statistical model consisting of a two-dimensional grid
of linear and logistic regression models for predicting spatial deposition.
The paper is organized as follows: Section 2 reviews the literature and the tools used.
Section 3 describes the details of the dataset. Section 4 describes the statistical algorithm
that is used as the emulator, and Section 5 presents the performance of the algorithm.
Finally, Sections 6 and 7 discuss future work and summarize the current work, respectively.
2. Background
There are many different methods that can be used to predict how an airborne con-
taminant disperses throughout the atmosphere, ranging from simple box models and
Gaussian plumes to more sophisticated Lagrangian and Eulerian transport models.
Gaussian plume models are the simplest and the fastest to run but are often limited to
very specific, idealized scenarios. Lagrangian and Eulerian models are slower but contain
representations of physical and chemical processes typically needed to simulate real-world
releases. One key distinction between Gaussian plume models and Lagrangian/Eulerian
models is that Gaussian plume models do not incorporate spatially and temporally varying
meteorological fields.
We use the FLEXPART Lagrangian dispersion model for calculating the dispersion of
airborne contaminants and estimate the effects of weather uncertainty in the dispersion cal-
culations. To transport materials through the atmosphere and deposit them on the surface,
FLEXPART requires spatially and temporally varying wind and precipitation data, which
can come from archived meteorological forecast/analysis/re-analysis fields or weather
models. For the work presented here, we use a specific version of FLEXPART designed to
work directly with WRF output (FLEXPART-WRF version 3.3). Although FLEXPART
also has several internal physics options, we did not vary these for this work. A detailed
description of our FLEXPART setup is provided in Section 3.
Several researchers have investigated the uncertainty of atmospheric dispersion models
without incorporating machine learning. Leadbetter et al. classify dispersion model
error into three categories: source term uncertainty, meteorological uncertainty (which we
study here), and intrinsic dispersion model uncertainty. They proceed to rank the uncertain-
ties and find that wind direction and wind speed are important. Korsakissok et al. [9] ran
multiple dispersion ensembles, some hypothetical and some realistic (e.g., the Fukushima
Release) and analyzed the uncertainty. Finally, Sørensen et al. simulated a nuclear
power plant atmospheric release (similar to our surface release scenario) and presented a
methodology to quantitatively estimate the variability of the ensemble. All studies cited
the need for ensemble dispersion modeling. We focus specifically on uncertainty due to
meteorological modeling.
To calculate winds and estimate weather uncertainty, we use the Weather Research
and Forecasting model (WRF), a tool which is used to predict weather phenomena on scales
of hundreds of meters to thousands of kilometers. A detailed description of WRF is found
in Skamarock et al. WRF contains several physics options known as parameterizations
for simulating processes such as cumulus convection, boundary layer turbulence, and land
surface interactions. In our application, we estimate weather uncertainty by using a
multi-physics approach that varies these parameterizations and uses the output to drive
FLEXPART. A detailed description of the WRF setup is in Section 3. We specifically use
WRF code version 3.9 with the advanced research dynamical numerical core.
Several other researchers have investigated WRF multi-physics ensembles. For example,
researchers produced WRF multi-physics ensembles to investigate precipitation, heatwaves,
and climate. In prior work, we examined WRF multi-physics uncertainty for the
release from a nuclear power plant [16]. The important
thing to note is that many of these ensembles have sizes of a few dozen members to a
few hundred members. Our ensemble, having 1200 members, is feasible but significantly
larger than average for a WRF multi-physics ensemble.
Machine learning and statistical methods have frequently been used to emulate com-
plicated atmospheric models. Much of the prior emulation work focused on the potential
speed-up offered, with applications to uncertainty studies, though some discussed ways
to improve sub-grid scale parameterizations. Jensen et al. and Lucas et al. used
machine learning algorithms to accelerate probabilistic inverse modeling studies of
atmospheric sources. Watson demonstrated the use of machine learning to improve
long term climate statistics. Calbó et al., Mayer et al., and Beddows et al.
used polynomial chaos expansions and Gaussian processes to emulate air quality models.
Wang et al. used a neural network to emulate the planetary boundary layer
parameterization of WRF. Krasnopolsky et al. and Pal et al. demonstrated the use of
machine learning to emulate radiation parameterizations for global atmospheric models.
Lucas and Prinn, Kelp et al., and Ivatt and Evans used statistical and machine
learning approaches to emulate atmospheric chemistry and transport models. To our
best knowledge, this paper describes the first time a machine learning method is used to
emulate full FLEXPART-WRF spatial deposition maps.
2.2. Linear and Logistic Regression
The two main statistical methods we used were linear regression and logistic regres-
sion. Since these simple methods are fast, easy to train, and readily interpretable, we used
them over other more complex methods that we also investigated. Since linear regression
and logistic regression are basic statistical tools, we will only present a brief overview
here. More information about both methods can be found in many statistics and machine
learning textbooks, such as Murphy or Gelman and Hill. Linear regression is
a type of regression used to fit data that have a linear relationship. It can be written as
$y = \boldsymbol{\beta}^{T}\mathbf{x}$, where $y$ is the scalar output of the regression, $\boldsymbol{\beta}$ is an $n$-dimensional coefficient
vector, $T$ indicates the transpose operation, and $\mathbf{x}$ is the $n$-dimensional predictor vector.
The first or last element in $\mathbf{x}$ is typically just set to 1 and is there so the fitted hyperplane
has an intercept instead of being forced to pass through the origin. The “linear” in linear
regression only applies to the coefficient vector—the elements of the input vector can be
transformed as desired. Finally, linear regression without regularization is trained in a
non-iterative fashion by minimizing the squared residuals. This contrasts with many other
machine learning algorithms which are trained in an iterative fashion.
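The non-iterative fit described above can be sketched with NumPy's least-squares solver. This is a minimal illustration on synthetic data rather than the authors' code; the helper name `fit_linear` and the toy data are assumptions introduced here.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ [X, 1] @ beta (intercept stored last)."""
    # Append a column of ones so the fitted hyperplane has an intercept.
    X1 = np.column_stack([X, np.ones(len(X))])
    # lstsq minimizes the squared residuals in a single, non-iterative solve.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Hypothetical noise-free toy data: y = 2*x0 - 3*x1 + 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5
beta = fit_linear(X, y)
```

Because the toy data are noise-free, the recovered coefficients should match the generating values up to floating-point error.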
Logistic regression is a simple classification method that can be used to classify binary
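variables. It can be written as $p = \left(1 + e^{-\boldsymbol{\alpha}^{T}\mathbf{x}}\right)^{-1}$. Here, $p$ is the probability that the target class has
a value of 1, $\boldsymbol{\alpha}$ is an $n$-dimensional coefficient vector, $T$ indicates the transpose operation, and
$\mathbf{x}$ is the $n$-dimensional predictor vector. The function $\sigma(z) = 1/(1 + e^{-z})$ is also known as the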
logistic function or sigmoid function. As with linear regression, logistic regression can have
an intercept term. Unlike linear regression, logistic regression must be trained iteratively,
even if regularization is not used.
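To illustrate why logistic regression needs an iterative fit, the sketch below trains one with plain gradient descent on a hypothetical one-dimensional dataset. The learning rate, iteration count, function names, and data are all assumptions chosen for the illustration; production solvers use more sophisticated optimizers.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, t, lr=0.5, n_iter=2000):
    """Gradient-descent fit of p = sigmoid([X, 1] @ alpha)."""
    X1 = np.column_stack([X, np.ones(len(X))])
    alpha = np.zeros(X1.shape[1])
    for _ in range(n_iter):  # iterative: there is no closed-form solution
        p = sigmoid(X1 @ alpha)
        alpha -= lr * X1.T @ (p - t) / len(t)  # mean log-loss gradient
    return alpha

# Hypothetical data: class 1 when x > 0, with two labels flipped so the
# classes overlap and the maximum-likelihood coefficients stay finite.
x = np.linspace(-3, 3, 41)
t = (x > 0).astype(float)
t[18], t[22] = 1.0, 0.0
alpha = fit_logistic(x[:, None], t)
p_hat = sigmoid(alpha[0] * x + alpha[1])
```

The fitted slope `alpha[0]` is positive, so `p_hat` rises from near 0 to near 1 as `x` increases.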
3. Dataset
We trained our statistical model on two sets of FLEXPART dispersion simulations.
Both sets release the radioactive particulate material cesium-137 (Cs-137), which has a
half-life of 30.17 years, is highly soluble, and is subject to removal from the atmosphere
by rainout and wet and dry deposition. Both sets of FLEXPART simulations use 1196
different weather conditions generated by a WRF multi-physics ensemble, as described
below. The first set contains the results for a hypothetical continuous release of Cs-137
from the surface of the earth at an industrial facility representing a large-scale radiological
accident. This set of simulations is referred to as the “surface release” case or the “surface”
case. The second set contains simulations of a hypothetical instantaneous release of Cs-137
in the form of a mushroom cloud similar to how contaminants are created from a nuclear
detonation. This set of simulations is referred to here as the “elevated release” case or
“elevated” case. Any mathematical notation from this point forward can be generalized to
either case unless otherwise specified.
Within a case, each ensemble member $k$ consists of a $1 \times 16$ input vector $\mathbf{x}_k$ and an
$M \times N$ target deposition map $\mathbf{Y}_k$, where $M$ and $N$ are the number of grid boxes in the latitudinal
and longitudinal directions, respectively. (The dimensionality of the input vector $\mathbf{x}_k$ will be
explained later in this section.) The vector $\mathbf{x}_k$ contains the physics parameterizations used
by WRF and is the input to our statistical model. The deposition map $\mathbf{Y}_k$ is the output of
FLEXPART-WRF and is used as the target data for training our statistical model.
The input vectors are identical between the surface release case and the elevated release
case because they are based on the same WRF ensemble, i.e., $\mathbf{x}_k^{\text{Surface}} = \mathbf{x}_k^{\text{Elevated}}$.
The FLEXPART settings remain constant for every ensemble member within a given
case. Consequently, they are not included as inputs to our statistical model. Each FLEX-
PART simulation was set to begin at 12:00Z on 24 April 2018 and end 48 h later. An adaptive
timestep was used for the sampling rate of the output, but the nominal value was 180 s.
Subgrid terrain effects and turbulence were included, and land-use data were taken from
WRF. Two million Lagrangian particles were released, and the total mass for the surface
and elevated cases was 1 kg and 0.28 g, respectively. We used the default precipitation
scavenging coefficients for Cs-137. Table 1 shows the Cs-137 particle size distributions and
masses as a function of altitude for the elevated release case, as further described in Nor-
. Further information about the release scenarios can be found in
Lucas et al. [31]
and Lucas et al. [32].
While the FLEXPART settings of each ensemble member remain constant within the
case, the set of physics options in WRF is different for every ensemble member. We vary the
following five categories of physics parameterizations within WRF: planetary boundary
layer physics (PBL), land surface model (LSM), cumulus physics (CU), microphysics (MP),
and radiation (RA). Any remaining parameterizations or options remain fixed. To run WRF,
one parameterization must be chosen from each physics category. While each category has
several different parameterization options available, yielding well over 100,000 possible
combinations of parameterizations, we selected a subset of 1200 possibilities expected to
simulate the weather, as determined by expert judgment. The ensemble members were
roughly chosen to maximize diversity in physics parameterizations.
In a real-world scenario, these 1200 possibilities would be forecasts, i.e., plausible
scenarios for the time evolution of the weather and plumes over a two-day period given
initial weather conditions that are known at the beginning of the forecast. Therefore, we
assume that each ensemble member is equally likely and do not attempt to “prune” the
ensemble while it is running because it is a short-term forecast. The 1200-member ensemble
therefore provides an estimate of weather model uncertainty in forecasting the deposition
from the hypothetical Cs-137 release events. Because we used data from 2018, we were
able to verify the meteorological forecasts. In work not presented here, we ran simulations
using data assimilation to produce analysis-observational fields. The ensemble simulations
provide a reasonable spread around the nudged fields, which gives us confidence
that our machine learning model can perform in realistic scenarios. Furthermore, for our
short-term forecasts of two days, the WRF parameterization uncertainty is expected to
dominate the variability. Very short term forecasts (e.g., 1 h) would not have a lot of
variability, while longer forecasts (e.g., 7 days) have errors dominated by initial conditions,
and the machine learning task would be much more difficult.
Ultimately, we selected five parameterizations for PBL, four for LSM, five for CU,
four for MP, and three for RA. The specific parameterizations are shown in Table 2. This
results in $5 \times 4 \times 5 \times 4 \times 3 = 1200$ different combinations of the WRF parameterizations.
However, 4 of the 1200 combinations caused numerical issues in WRF, which failed to
run to completion, so there are only 1196 members in the final multi-physics weather
dataset. The 1196 input vectors $\mathbf{x}_k$ are vertically concatenated to create a $1196 \times 16$ input
matrix $\mathbf{X}$. The 1196 output matrices $\mathbf{Y}_k$ are concatenated in the third dimension to make the
$M \times N \times 1196$ output matrix $\mathbf{Y}$. The ordering of the parameterization combinations in the
ensemble is shown in Figure 1.
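The size of the full factorial design can be checked by enumerating it. The sketch below uses the option numbers listed in Table 2; the iteration order is simply that of `itertools.product` and is an assumption, so it is not claimed to reproduce the member numbering of Figure 1.

```python
from itertools import product

# Option numbers per physics category, as listed in Table 2.
OPTIONS = {
    "PBL": [1, 2, 4, 5, 7],
    "LSM": [1, 2, 3, 7],
    "CU": [1, 2, 3, 5, 10],
    "MP": [2, 3, 4, 5],
    "RA": [1, 3, 4],
}

# One option is chosen from every category for each ensemble member.
combos = list(product(*OPTIONS.values()))
n_total = len(combos)       # 5 * 4 * 5 * 4 * 3
n_completed = n_total - 4   # four WRF runs failed to complete
```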
The individual physics parameterizations are nominal categorical variables repre-
sented as numbers in WRF. In other words, the parameterizations are not ordinal—PBL
parameterization 2, which represents the MYJ scheme, is not greater than PBL parameteri-
zation 1, which represents the YSU scheme. To prevent our statistical model from treating
a higher numbered parameterization differently than a lower numbered parameterization,
we transformed the input WRF parameterization vector using one-hot encoding. This
turns the five categorical variables for the parameterizations into sixteen boolean variables,
which is why $\mathbf{x}_k$ has shape $1 \times 16$. For example, the LSM parameterization has four options:
LSM 1, LSM 2, LSM 3, and LSM 7. When one-hot encoding, LSM 1 is represented by the
vector $[0, 0, 0]$, LSM 2 is represented by the vector $[1, 0, 0]$, LSM 3 is represented by the vector
$[0, 1, 0]$, and LSM 7 is represented by the vector $[0, 0, 1]$. Because the first option in each category
is encoded as all zeros, the $5 + 4 + 5 + 4 + 3 = 21$ options require only $21 - 5 = 16$ variables.
The vectors for each parameterization
are concatenated together. (For example, the ensemble member run with PBL 2, LSM 1, CU
5, MP 4, and RA 4 has a one-hot encoded input vector [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1].)
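The encoding can be sketched in a few lines of Python. The sketch assumes, following the LSM example, that the first option listed for each category in Table 2 is the all-zeros reference level and that categories are concatenated in the order PBL, LSM, CU, MP, RA; the function name `encode` and the dictionary layout are hypothetical.

```python
# Option numbers per physics category, as listed in Table 2.
OPTIONS = {
    "PBL": [1, 2, 4, 5, 7],
    "LSM": [1, 2, 3, 7],
    "CU": [1, 2, 3, 5, 10],
    "MP": [2, 3, 4, 5],
    "RA": [1, 3, 4],
}

def encode(member):
    """Encode a dict such as {"PBL": 2, ...} as a list of 16 zeros and ones."""
    bits = []
    for category, options in OPTIONS.items():
        # The first option is the all-zeros reference level, so each
        # category contributes len(options) - 1 indicator variables.
        bits.extend(int(member[category] == opt) for opt in options[1:])
    return bits

# Hypothetical member using the reference option everywhere except LSM 7.
x = encode({"PBL": 1, "LSM": 7, "CU": 1, "MP": 2, "RA": 1})
lsm_bits = x[4:7]   # positions after the four PBL indicators
```

For this member the only non-zero entry is the last LSM indicator, matching the LSM 7 vector given in the text.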
The output matrix $\mathbf{Y}$ consists of 1196 simulations produced by FLEXPART-WRF. Each
ensemble member $\mathbf{Y}_k$ is an $M \times N$ map of the surface deposition of Cs-137 from either the
surface release or the elevated release. For the surface release case, each map contains a total
of 160,000 grid cells, with 400 cells in the latitudinal direction and 400 cells in longitudinal
direction using a spatial resolution of about 1.7 km per cell. For the elevated release case,
each map contains 600 grid cells by 600 grid cells with a resolution of about 1.2 km. Both
deposition domains range from
in the latitudinal direction and
in the longitudinal direction. The height of the surface release domain was 3000 m
resolved using 11 vertical layers, and the height of the elevated release domain was 4500 m
resolved using 14 layers. The latitude and longitude of the location of the surface release
, respectively. The latitude and longitude of the location of
the elevated release were
, respectively. This domain is centered
on the southwest corner of the US state North Carolina and has many different land types,
including the Appalachian Mountains and the Atlantic Ocean.
The surface deposition output of FLEXPART-WRF accounts for both wet and dry
removal of Cs-137 from the atmosphere and is reported in units of Bq/m$^2$ using a specific
activity for Cs-137 of 3.215 Bq/nanogram. We also filtered out data less than 0.01 Bq/m$^2$ in
our analysis, as deposition values below this level are comparable to background levels
from global fallout [33] and do not pose much risk to public health.
All FLEXPART-WRF runs were completed on Lawrence Livermore National Laboratory’s
Quartz supercomputer, which has 3018 nodes, each with 36 compute cores and 128 GB of RAM.
A single WRF run costs about 150 core-hours, and a single FLEXPART run costs about
20 core-hours. The total ensemble cost was about 180,000 core-hours.
The speedup between the full ensemble and the machine learning training set cost depends
on the training size, which is discussed in Section 5. For a training size of 50, the total
cost would be 7500 core-hours, which is a speedup of 24 times (or a savings of 172,500
core-hours). Figures 2 and 3 show selected examples of ensemble members from the
surface case and elevated case, respectively. The members were chosen to highlight the
diversity of the ensemble. The examples in the figures used PBL, LSM, CU, MP, and RA
parameterization combinations (1, 1, 2, 3, 4) for member 25, (2, 1, 1, 3, 4) for member 245, (2,
3, 5, 3, 4) for member 413, and (7, 7, 10, 3, 4) for member 1157.
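The quoted speedup and savings follow from simple arithmetic, sketched below. The per-member cost is the one implied by the quoted total (180,000 core-hours over 1200 members), which is an assumption of this sketch; the cost of training and evaluating the regressions themselves is neglected.

```python
TOTAL_CORE_HOURS = 180_000   # full ensemble cost quoted in the text
N_MEMBERS = 1200             # members in the full design
N_TRAIN = 50                 # training size considered in the text

per_member = TOTAL_CORE_HOURS / N_MEMBERS   # implied cost per member
train_cost = N_TRAIN * per_member           # cost of the training runs only
speedup = TOTAL_CORE_HOURS / train_cost
savings = TOTAL_CORE_HOURS - train_cost
```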
Figure 1. WRF parameterizations were varied as illustrated to create the multi-physics ensemble by
iterating through the schemes in the order PBL, LSM, CU, MP, and RA.
Table 1. Profile for elevated release.
Altitude (m) | Mean Diameter (µm) | Geometric Standard Deviation | Cs-137 Mass (mg)
0 | 910 | 1.9 | 20.7
250 | 400 | 1.2 | 2.45
500 | 350 | 1.2 | 3.24
750 | 300 | 1.2 | 4.05
1000 | 270 | 1.2 | 5.17
1250 | 220 | 1.3 | 6.25
1500 | 170 | 1.4 | 9.67
1750 | 110 | 1.6 | 17.7
2000 | 52 | 2.1 | 51.5
2250 | 54 | 2.5 | 36.9
2500 | 48 | 2.4 | 36.6
2750 | 40 | 2.3 | 34.4
3000 | 32 | 2.0 | 26.1
3250 | 19 | 2.0 | 22.6
Table 2. WRF parameterizations used to create the dataset, referred to here by their standard option number (in
parentheses), name, and corresponding citation.
PBL | LSM | CU | MP | RA
(1) YSU [34] | (1) Thermal Diffusion [35] | (1) Kain-Fritsch [36] | (2) Lin (Purdue) [37] | (1) RRTM [38]
(2) MYJ [39] | (2) Noah [40] | (2) Betts-Miller-Janjic [39] | (3) WSM3 [41] | (3) CAM [42]
(4) QNSE [43] | (3) RUC [44] | (3) Grell-Devenyi [45] | (4) WSM5 [41] | (4) RRTMG [46]
(5) MYNN2 [47] | (7) Pleim-Xu [48] | (5) Grell-3 [49] | (5) Eta (Ferrier) [50] |
(7) ACM2 [51] | | (10) CuP [52] | |
Figure 2. Examples of different deposition maps produced by FLEXPART-WRF for the surface release
case. All values below 0.01 Bq/m$^2$ were removed. The WRF parameterizations used to create each
subplot can be found in Section 3.
Figure 3. Examples of different deposition maps produced by FLEXPART-WRF for the elevated
release case. All values below 0.01 Bq/m$^2$ were removed. The WRF parameterizations used to create
each subplot can be found in Section 3.
4. Spatial Prediction Algorithm
The algorithm we use to emulate physics package changes in WRF is straightforward.
A conceptual schematic can be seen in Figure 4. We start by creating an $M \times N$ grid $\hat{\mathbf{Y}}_k$
to represent the prediction of a given FLEXPART-WRF map $\mathbf{Y}_k$. Each grid cell $\hat{Y}_{i,j,k}$ is
the combined output of an independent linear regression and logistic regression model.
The inputs to every linear and logistic regression model in the grid are the same: a $1 \times 16$ vector $\mathbf{x}_k$
of one-hot-encoded WRF physics categorical variables, as described in Section 3.
For each grid cell, the associated logistic regression model determines the probability
that the location will experience surface contamination from the hypothetical release
event. If the probability at that location is greater than a pre-determined threshold value,
the corresponding linear regression model determines the magnitude of the deposition.
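Mathematically, the value of a given grid cell $\hat{Y}_{i,j,k}$ is given by Equations (1) and (2). The
$\boldsymbol{\alpha}_{i,j}$ and $\boldsymbol{\beta}_{i,j}$ terms represent the vector of regression coefficients for the logistic and linear
regression models, respectively. The coefficients in Equation (2) are exponentiated because
the linear regression is trained on the logarithm of the deposition. This linearizes the
deposition values which allows the regression model to be fit; however, the logarithm is
also useful for analyzing the data in general since the deposition values span many orders
of magnitude.
$$P_{i,j} = \frac{1}{1 + \exp\left(-\boldsymbol{\alpha}_{i,j}^{T}\mathbf{x}_k\right)} \qquad (1)$$
$$\hat{Y}_{i,j,k} = \begin{cases} 0 & \text{if } P_{i,j} \le p_{\text{threshold}} \\ \exp\left(\boldsymbol{\beta}_{i,j}^{T}\mathbf{x}_k\right) & \text{if } P_{i,j} > p_{\text{threshold}} \end{cases} \qquad (2)$$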
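The full $M \times N \times 1196$ dataset $\mathbf{Y}$ can be split into an $M \times N \times n_{\text{train}}$ training set $\mathbf{Y}_{\text{train}}$ and an
$M \times N \times (1196 - n_{\text{train}})$ testing set $\mathbf{Y}_{\text{test}}$. The linear regression models are trained on the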
logarithm of the deposition values, while the logistic regression models are trained on a
binary indicator determining whether a grid cell has deposition or not.
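The per-cell procedure can be sketched for a single grid cell as follows. This is a simplified illustration under several assumptions: it substitutes plain gradient descent and `numpy.linalg.lstsq` for the Scikit-Learn estimators, uses a hypothetical decision threshold of 0.5, treats values below 0.01 as "no deposition", returns zero for cells that are never contaminated in training, and runs on small synthetic data. All function names are introduced here for illustration.

```python
from itertools import product
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X1, t, l2=1.0, lr=0.5, n_iter=500):
    """L2-regularized logistic regression fitted by gradient descent."""
    a = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        a -= lr * (X1.T @ (sigmoid(X1 @ a) - t) + l2 * a) / len(t)
    return a

def fit_cell(X1, y, floor=0.01):
    """Fit one grid cell; returns (alpha, beta) or None if never contaminated."""
    present = y >= floor
    if not present.any():
        return None  # cell excluded from prediction
    alpha = fit_logistic(X1, present.astype(float))
    # Linear regression on log-deposition, using contaminated members only.
    beta, *_ = np.linalg.lstsq(X1[present], np.log(y[present]), rcond=None)
    return alpha, beta

def predict_cell(model, x1, p_threshold=0.5):
    """Zero at or below the threshold, exp(beta^T x) above it."""
    if model is None:
        return 0.0
    alpha, beta = model
    if sigmoid(x1 @ alpha) > p_threshold:
        return float(np.exp(x1 @ beta))
    return 0.0

# Synthetic cell: deposition occurs only when feature 1 is on, and its
# logarithm is 1 + 2 * (feature 0). The last column is the intercept.
X = np.tile(np.array(list(product([0.0, 1.0], repeat=3))), (6, 1))
X1 = np.column_stack([X, np.ones(len(X))])
y = np.where(X[:, 1] == 1, np.exp(1.0 + 2.0 * X[:, 0]), 0.0)

model = fit_cell(X1, y)
wet = predict_cell(model, np.array([1.0, 1.0, 0.0, 1.0]))
dry = predict_cell(model, np.array([1.0, 0.0, 0.0, 1.0]))
```

Repeating `fit_cell` independently for every grid cell gives the full grid of regression pairs; fitting the linear model only on contaminated members is a design choice of this sketch.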
We implemented our model in Python 3 using Numpy. We used the linear
regression and the logistic regression implementations from Scikit-Learn. The logistic
regression implementation in Scikit-Learn was run with the “liblinear” solver and L2
regularization with $C = 1.0$. L2 regularization is necessary to obtain accurate results and
ensure convergence. With 50 training examples, training regression models for every grid
cell in the domain took approximately 1–1.5 min on a modern desktop computer. Making
predictions for 1146 full maps took 5–6 min on the same computer, but that was achieved
by re-implementing the Scikit-Learn “predict” functions using the Python just-in-time
compiler Numba. At approximately 315 milliseconds per prediction on one core,
the machine learning model offers an approximately two million times speedup for a single
run. Some researchers have found similar speedups using ML on scientific codes.
Large scale experiments where the training and testing cycles had to occur thousands of
times (e.g., determining training size convergence curves) were completed on Lawrence
Livermore National Laboratory’s Quartz Supercomputer and could take up to a few hours.
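Once the coefficients for every cell are available, prediction reduces to array operations. The sketch below evaluates Equations (1) and (2) for all cells and members at once using plain NumPy rather than Numba; the array layout, the function name `predict_maps`, the threshold of 0.5, and the random toy coefficients are assumptions for illustration only.

```python
import numpy as np

def predict_maps(X, A, B, p_threshold=0.5):
    """Predict deposition maps for all members at once.

    X: (K, F) input vectors; A, B: (M, N, F) logistic and linear
    coefficients. Returns an (M, N, K) array of deposition values.
    """
    # Contract the feature axis for every grid cell and every member.
    logit = np.einsum("mnf,kf->mnk", A, X)
    log_dep = np.einsum("mnf,kf->mnk", B, X)
    P = 1.0 / (1.0 + np.exp(-logit))
    # Deposition is exp(beta^T x) where P exceeds the threshold, else zero.
    return np.where(P > p_threshold, np.exp(log_dep), 0.0)

# Hypothetical toy sizes: a 2 x 3 grid, 4 features, 5 members.
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3, 4))
B = rng.normal(size=(2, 3, 4))
X = rng.integers(0, 2, size=(5, 4)).astype(float)
Y_hat = predict_maps(X, A, B)
```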
Figure 4. Conceptual diagram of the training set and model. Each deposition map produced by
FLEXPART-WRF is a grid of size $M \times N$. There are $n_{\text{train}}$ maps in the training set. A single grid cell of
a single deposition map is represented by $Y_{i,j,k}$ and can be approximated by our machine learning
model output, $\hat{Y}_{i,j,k}$. $\hat{Y}_{i,j,k}$ is produced by the output of a single linear regression model and a single
logistic regression model which is trained on the $n_{\text{train}}$ data values for grid cell $(i, j)$. Further details of the
model can be found in Section 4.
5. Results and Analysis
To test the effectiveness of our statistical model, we ran a suite of tests and derived
performance statistics from the results. For these tests, we trained and evaluated our
statistical model for eight different training sizes, with 100 runs with varying random
seeds for each training size. The eight different training sizes we used were 25, 50, 75, 100, 250, 500, 750, and
1000 ensemble members.
This corresponds to 2.09%, 4.18%, 6.27%, 8.36%, 20.90%, 41.81%, 62.71%, and 83.61% of
our 1196-member ensemble dataset, respectively. Varying the random seed allowed each
of the 100 runs for a given training size to have different members in the training set,
which allowed us to see how much performance varied by training set member selection.
The members of the test set for a given training size and random seed can be used in the
training set for a different random seed. In other words, for a given training size and
random seed, we had a training set and a testing set, but looking at all the random seeds
for a given training size together was similar to k-fold cross validation. Since we used all
1196 members for this process, we did not have any truly held out test set that was not part
of the 1196-member ensemble.
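The resampling scheme described above can be sketched as follows. This is an illustrative outline under stated assumptions: the helper `split_members` is hypothetical, the training sizes are the ones implied by the quoted percentages of the 1196-member ensemble, and drawing the first `n_train` entries of a seeded permutation is just one way to obtain a different training set per seed.

```python
import numpy as np

N_MEMBERS = 1196
TRAIN_SIZES = [25, 50, 75, 100, 250, 500, 750, 1000]
N_SEEDS = 100

def split_members(n_train, seed, n_members=N_MEMBERS):
    """Return (train_idx, test_idx) for one training size and random seed."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_members)
    return order[:n_train], order[n_train:]

# Share of the ensemble used for training at each size, in percent.
percentages = [round(100 * n / N_MEMBERS, 2) for n in TRAIN_SIZES]

# Count how often each member lands in the test set across seeds for n = 50.
test_counts = np.zeros(N_MEMBERS, dtype=int)
for seed in range(N_SEEDS):
    _, test_idx = split_members(50, seed)
    test_counts[test_idx] += 1
```

With this construction, a fixed seed also yields nested training sets (the first 50 members of a permutation are contained in its first 75), and a member that is tested under one seed can be trained on under another.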
Figures that do not show training size variability (Figures 5–7) show the results from
a 50-member training set with the same fixed random seed. The number 50 is somewhat
arbitrary but shows roughly the minimum number of training examples that produces accurate
Atmosphere 2021,12, 953 10 of 24
predictions. At 50 training examples, the predictions are qualitatively good, and one starts
to see significant overlap between the training and testing performance metric distributions.
Figures 8–10 all show results from the cross-validation tests.
The following subsections summarize the statistical and numerical performance of the
algorithm. Some subsections present summary statistics, while some subsections present
individual member predictions. In subsections where individual predictions are present,
the training size is also presented.
5.1. Decision Threshold
Before showing summary statistics, it is important to understand how the output of
our model is a probabilistic prediction. Figures 5 and 6 both have six subplots. The top left
plot shows the true output by FLEXPART-WRF for a selected ensemble member. The top
middle plot shows the probability map produced by the grid of logistic regression models.
The color at each pixel represents the probability that the pixel has a non-zero deposition
value. The areas of this subplot that are not colored are excluded from prediction because
the corresponding grid cells in the training data contain no deposition. The remaining
areas use the combination of logistic and linear regressions for making predictions.
The output of the logistic regression models is used in conjunction with a user-defined
decision threshold value to produce deposition predictions. As determined from the
training data, grid cells with probabilities greater than the threshold are predicted to
have deposition, while those less than it are not. If conservative estimates are desired,
a low threshold value can be used to include low-probability, but still possible, areas of
contamination in the prediction. The top-right and entire bottom row of Figures 5 and
6 show the predictions at different decision thresholds. The decision threshold can also
be thought of as a probability cutoff value. The term “decision threshold” is synonymous
with “decision boundary”, which is referred to in the literature when classifying positive
and negative outcomes [28].
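The effect of the decision threshold on the predicted plume footprint can be illustrated with a small sketch. The function name `presence_mask`, the toy probability values, and the `trainable` mask (cells where at least one training map has deposition) are assumptions made for illustration only.

```python
import numpy as np

def presence_mask(prob, p_threshold, trainable):
    """Cells predicted to have deposition at a given decision threshold.

    prob      : (M, N) probabilities from the logistic regressions
    trainable : (M, N) bool, False where no training map had deposition
    """
    return (prob > p_threshold) & trainable

prob = np.array([[0.05, 0.30, 0.55],
                 [0.45, 0.80, 0.95]])
trainable = np.array([[False, True, True],
                      [True,  True, True]])

# Lower thresholds give larger (more conservative) predicted plumes.
masks = {t: presence_mask(prob, t, trainable) for t in (0.2, 0.5, 0.8)}
areas = {t: int(m.sum()) for t, m in masks.items()}
```

In this toy case the predicted plume covers 5, 3, and 1 cells at thresholds of 0.2, 0.5, and 0.8, respectively, so each higher-threshold plume is a subset of the lower-threshold one.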
Figure 5. True FLEXPART-WRF output vs. predicted output at several decision threshold values
for the surface release ensemble member 0 with $n = 50$. The WRF parameterization choices for this
ensemble member were PBL 1, LSM 1, CU 1, MP 2, and RA 1. The top middle plot shows the original
decision threshold map.
Figure 6. True FLEXPART-WRF output vs. predicted output at several decision threshold values for
the elevated release ensemble member 0 with $n = 50$. The WRF parameterization choices for this
ensemble member were PBL 1, LSM 1, CU 1, MP 2, and RA 1. The top middle plot shows the original
decision threshold map.
Through a qualitative assessment, we determined that a decision threshold of 0.5 appears
to be optimal. With values larger than 0.5, the plume shape starts becoming distorted
and leaves important sections out. With values less than 0.5, noisy values at the edges of
the plume are included, which are typically not accurate. These noisy values occur in grid
cells where there are not many examples of deposition in the training data, and they are
eliminated as more examples are included when the training size increases (see
Section 5.2). These values can be seen in the bottom left subplot of Figure 5 on the northern edge of
the plume. Anomalously large prediction values skew the performance statistics and are
removed from the metrics if they exceed the maximum deposition value present in the
training examples.
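The outlier rule in the last sentence can be sketched as a masking step. This is a hypothetical illustration: the name `mask_outliers` is invented here, and it assumes that the maximum is taken over all cells of all training maps rather than cell by cell, which the text does not specify.

```python
import numpy as np

def mask_outliers(y_pred, y_train):
    """Mask predictions larger than any deposition value seen in training.

    y_pred  : (M, N) predicted deposition map
    y_train : (n, M, N) training deposition maps
    Returns a masked array; masked cells are skipped by NumPy reductions.
    """
    train_max = y_train.max()   # assumed: one global maximum
    return np.ma.masked_greater(y_pred, train_max)

y_train = np.array([[[0.0, 1.0], [2.0, 3.0]],
                    [[0.5, 4.0], [1.0, 2.0]]])
y_pred = np.array([[0.2, 9.0],
                   [3.5, 4.0]])
masked = mask_outliers(y_pred, y_train)
```

Here only the value 9.0 is masked, because it exceeds the training maximum of 4.0; statistics computed on `masked` then ignore that cell.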
5.2. Training Size Variability
As with all statistical methods, the size of the training set affects the model performance.
Figure 7 shows the plume prediction for ensemble member 296 as the training
size increases. The members of the training set at a given size are also all included in the
training set at the next largest size (i.e., the $n = 50$ training set is a proper subset of the
$n = 75$ training set). The decision threshold is set to 0.5 for each training size. It is evident
from the figure that as the training size increases, the deposition values and the plume
boundary become less noisy. A quantitative assessment of how the predictions change
with increasing training size is shown in Figures 8 and 9 for the surface case and elevated
case, respectively.
These two figures show different statistical measures for predicting the members of
training and testing sets as a function of training size. Because the selection of members
is random and can affect the prediction performance, the experiment is repeated 100
times using different random seeds. Therefore, each “violin” in the plots displays the
statistical variation stemming from member selection differences. For a given training size
$n$, the orange training distributions are estimated from $n \times 100$ predictions, while the blue
test distributions are derived from $(1196 - n) \times 100$ predictions.
The following error metrics are used to assess the predictive performance of the
regression system. Two of the metrics target the logistic regressions (figure of merit in
space and accuracy), three are for the linear regressions (fraction within a factor of 5,
R, and fractional bias), and an aggregated metric (rank) is used to gauge the overall
performance. Many other metrics are available to judge regression and classification
performance (e.g., mean squared error, F1), but we wanted to use metrics that were
commonly used in the atmospheric science community [57,58].
Figure of Merit in Space (FMS): A spatial error metric which is defined as the inter-
section of the area of the predicted and actual plumes divided by the union of area
of the predicted and actual plumes. Outside of atmospheric science this is also
known as the Jaccard index. This metric depends only on the absence or presence of
deposition, not the magnitude, and so directly assesses the logistic regressions. This
metric varies between 0 and 1, and values of 0.8 and above are generally considered
good for atmospheric models.
Fraction within a Factor 5 (FAC5): The fraction of the predicted values within a factor
of 5 of the actual values is an effective metric for assessing the linear regressions. Generalized,
this is defined as $\mathrm{FAC}X$ = fraction of data that satisfy
$\frac{1}{X} \le \frac{\text{Predicted Value}}{\text{Actual Value}} \le X$.
We present the FAC5 value for the intersection of the predicted and actual
plume locations. This metric can range from 0 to 1, with values above 0.85 generally
being considered good values for atmospheric models.
Pearson’s R: Pearson’s correlation coefficient $R$ measures the linear relationship between
the predicted and actual magnitudes of deposition in a log space. We present
R calculated on the natural log of the deposition values in the intersection of the predicted
and actual plume locations. This metric can range from −1 to 1, with values further away from 0 being
good. (A Pearson’s R value of −1 implies perfect anticorrelation, which is still useful.)
Accuracy: This is the standard classification accuracy as explained in Swets. It is
defined as the ratio of the sum of true positives and true negatives to all classifications
considered (i.e., $\frac{TP + TN}{TP + TN + FP + FN}$). In a logistic regression case such as ours, a grid cell is
classified as positive or negative by whether it has deposition above or below a certain
threshold value. As described in Section 3, a deposition threshold of 0.01 Bq/m² is
used. This metric can range from 0 to 1, with values closer to 1 being considered good.
Fractional Bias (FB): Fractional bias is defined as $\mathrm{FB} = \frac{\bar{O} - \bar{P}}{0.5\,(\bar{O} + \bar{P})}$, where
$\bar{O}$ and $\bar{P}$ are the mean observed values and mean predicted values, respectively. It is a
normalized indicator of model bias and essentially describes the difference in means
in terms of the average mean. This metric ranges from −2 to +2 and has an ideal
value of 0. In Figures 8 and 9, the fractional bias plot has a different shape from all the
others. One reason for this is the fractional bias range is different, and the ideal value
is in the center of the range. However, even if the absolute value of the fractional
bias was taken, the shape would still be different. In this case, as the training set
size increases, the fractional bias statistic converges to the inherent bias that exists
between our statistical model and FLEXPART-WRF, just as the others do. However,
in this case, it is shown that the training size that produces the least fractional bias
is $n = 50$. This does not mean that $n = 50$ is the best sample size overall, as other
metrics improve with increasing sample size. Like the other metrics, the fractional
bias training and test curves converge with increasing training size, though they seem
to converge much faster than the others.
Rank: Described in Maurer et al., the rank score is a metric that combines several
statistics into a single number used to assess the overall performance of an atmospheric
model. It is defined as
$\mathrm{Rank} = R^2 + \left(1 - \left|\frac{\mathrm{FB}}{2}\right|\right) + \mathrm{FAC5} + \mathrm{Accuracy}$, where $R^2$ is the
coefficient of determination, and FB is the fractional bias. Each term in the equation
ranges from 0 to 1, with 1 being best, which means the rank score ranges from 0 to 4,
with 4 being best. The models studied in Maurer et al. had a mean rank score of
2.06, which means our model looks very good in comparison. However, the models
studied were time series models applied to individual stations, so they cannot be
directly compared to our model.
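The six metrics listed above can be computed for a single predicted map with a short function. This is a sketch under assumptions rather than the authors' code: the name `emulator_metrics` is hypothetical, a cell counts as part of a plume when its value exceeds the 0.01 Bq/m² deposition threshold, FAC5, R, and FB are all evaluated on the intersection of the two plumes, and FB uses the observed-minus-predicted sign convention.

```python
import numpy as np

def emulator_metrics(y_true, y_pred, dep_threshold=0.01):
    """Compute FMS, accuracy, FAC5, Pearson's R, FB, and rank for one map."""
    obs = y_true > dep_threshold      # actual plume
    pred = y_pred > dep_threshold     # predicted plume
    both = obs & pred                 # intersection of the two plumes

    fms = both.sum() / (obs | pred).sum()
    accuracy = (obs == pred).mean()   # (TP + TN) / all cells

    o, p = y_true[both], y_pred[both]
    ratio = p / o
    fac5 = ((ratio >= 1 / 5) & (ratio <= 5)).mean()
    r = np.corrcoef(np.log(o), np.log(p))[0, 1]   # correlation in log space
    fb = (o.mean() - p.mean()) / (0.5 * (o.mean() + p.mean()))

    rank = r**2 + (1 - abs(fb / 2)) + fac5 + accuracy
    return dict(FMS=fms, Accuracy=accuracy, FAC5=fac5, R=r, FB=fb, Rank=rank)

y_true = np.array([[0.0, 1.0, 2.0],
                   [0.0, 4.0, 8.0]])
y_pred = np.array([[0.0, 1.0, 2.0],
                   [0.5, 4.0, 0.0]])
metrics = emulator_metrics(y_true, y_pred)
```

In this toy case the plume shapes disagree in two of six cells (FMS = 3/5, accuracy = 4/6), while the magnitudes agree exactly wherever both plumes overlap, so FAC5 = 1, R = 1, and FB = 0.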
In both the surface and elevated release cases, increasing the training size leads to,
on average, an increase in performance on the test set and a decrease in performance on
the training set. Nevertheless, as expected, the training set performance is better than the
testing set performance. There is no immediately distinguishable difference in performance
between the surface case and the elevated case; on some metrics the surface case performs
better and on others the elevated case performs better. However, the distribution of error
metrics for the elevated case is often bimodal, whereas the surface case is more unimodal.
This makes intuitive sense since the elevated case often has two separate deposition patterns
with different shapes, while the surface case typically only has one large pattern.
Figures 7 and 8 highlight one of the most important conclusions from this work. Very
few training samples are needed to make reasonable predictions. Even a prediction using
50 training samples, or $50/1196 \approx 4.18\%$ of the total dataset, is capable of accurately
predicting deposition values in over 100,000 grid cells. Because there is significant overlap
between the training and test distributions in Figure 8, these predictions are also robust to
the 50 training samples selected from the full set.
Figure 7.
Spatial prediction for ensemble member 296 as the training set size increases. The true
FLEXPART-WRF output is the top left subplot. The samples in the training set are randomly selected
and the $p_{\text{threshold}}$ value is 0.5. Ensemble member 296 was in the test set for all training sizes. The WRF
parameterization choices for this ensemble member were PBL 2, LSM 1, CU 5, MP 4, and RA 4.
Figure 8.
Spread of error metrics for all members for several different training sizes and 100 different
random seeds for the surface release case. Training and test distributions are in blue and orange,
respectively. A description of the metrics is provided in Section 5.2. Within the distributions,
the dashed lines indicate the quartiles, the solid line is the mean, and the corresponding vertical bars
are the standard deviations.
Figure 9. Same as Figure 8, except for the elevated release case.
5.3. Predictability of Individual Ensemble Members
The previous subsection described how training size affected the statistical model
performance for the entire ensemble. In this section, we show how the predictions vary
with training size for selected individual ensemble members. The purpose of this test is to
show that some FLEXPART-WRF members are easier to predict than others, regardless of
the number of training members. Figure 10 shows the mean Pearson’s R score by training
size and member number for the surface release case for selected members of the test set.
The members are selected by their decile average performance. We only show the members
that are closest to the decile average performance because showing all 1196 members
results in a visualization that is difficult to read.
For example, take the square marked by “60% (132)” on the x-axis and “250” on
the y-axis. This square represents the mean Pearson’s R score for member 132 calculated
from every statistical model (out of 100) where member 132 was contained in the test set.
Member 132 is the member that is closest to the 60th percentile mean Pearson’s R score
averaged over all training sizes.
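The member selection described above can be sketched as a nearest-to-percentile lookup. The function name `decile_members` and the synthetic scores are assumptions for illustration; the input is taken to be each member's mean Pearson's R, averaged over all training sizes and over the models in which that member was in the test set.

```python
import numpy as np

def decile_members(member_scores, deciles=range(10, 100, 10)):
    """Pick, for each decile, the member whose mean score is closest to it.

    member_scores : (n_members,) mean Pearson's R per member
    Returns {percentile: member_index}.
    """
    targets = np.percentile(member_scores, list(deciles))
    return {d: int(np.argmin(np.abs(member_scores - t)))
            for d, t in zip(deciles, targets)}

# Synthetic scores chosen so that member i sits at the i-th percentile.
scores = np.linspace(0.0, 1.0, 101)
selected = decile_members(scores)
```

With real scores, the selected indices play the role of labels such as "60% (132)" on the x-axis of Figure 10.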
As already demonstrated, the general performance of the model increases as the
training set size increases; however, the relative individual performance does not generally
change. Part of this can be explained statistically. Our statistical model essentially fits a
hyperplane in the WRF-parameter/deposition space. A hyperplane is one of the simplest
possible models, and there is noise in the dataset. Some data points will be far away from
the hyperplane, and increasing the training size does not move the hyperplane enough to
successfully fit those points. This highlights the importance of physics-based
modeling: machine learning is not able to capture all of the variation present in the dataset,
even with very large training sizes. While we analyzed the WRF inputs associated with
well and poorly performing members, we found no consistent pattern linking
poor predictions to WRF parameterizations. Hypothetically, if there were a relationship
between WRF inputs and poorly performing members, the information could be used by
WRF developers to improve accuracy for certain parameterizations. This figure also shows
that even small amounts of training data produce accurate predictions. A similar analysis
can be done for the elevated case but is not included here.
Figure 10.
Mean Pearson R by training size and selected ensemble member. Some members are easier
than others to predict regardless of the training size. Only instances where the ensemble member
was included in the test set are used for calculations. The members were selected to be closest to the
overall decile performance.
5.4. Ensemble Probability of Exceedance
One of the main goals of emulating Cs-137 spatial deposition is to account for the variability
in the ensemble from weather uncertainty, so we use probability of exceedance plots
to compare the variability of the predicted and true ensemble in Figure 11. The topmost
and center subplots of Figure 11 show the percentage of members in the ensemble that
have deposition values that exceed a threshold of 0.01 Bq/m² at every location. For example,
if 598 ensemble members have deposition above 0.01 Bq/m² at grid cell (200, 200),
the percentage for that cell is $598/1196 = 50\%$. Yellow colors indicate areas where many,
if not all, ensemble members report above-threshold deposition values. Dark purple colors
indicate areas where very few ensemble members report above-threshold deposition
values. Generally, the probability of exceedance drops as one moves further away from
the release location. The predictions are based on 50 training samples, and both ensembles
used for this plot contain all 1196 members, meaning the training and testing predictions
are included for the predicted percentages.
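The probability of exceedance calculation can be sketched directly from its definition. The function name `exceedance_percentage` and the tiny four-member ensembles below are illustrative assumptions; only the 0.01 Bq/m² threshold comes from the text.

```python
import numpy as np

def exceedance_percentage(ensemble, threshold=0.01):
    """Percentage of members exceeding `threshold` in each grid cell.

    ensemble : (n_members, M, N) deposition maps in Bq/m^2
    """
    return 100.0 * (ensemble > threshold).mean(axis=0)

# Toy ensembles: 4 members on a 1 x 3 grid.
true_ens = np.array([[[0.5, 0.02, 0.0]],
                     [[0.4, 0.00, 0.0]],
                     [[0.9, 0.03, 0.0]],
                     [[0.7, 0.00, 0.2]]])
pred_ens = np.array([[[0.6, 0.02, 0.0]],
                     [[0.5, 0.02, 0.0]],
                     [[0.8, 0.05, 0.0]],
                     [[0.6, 0.00, 0.0]]])

true_pct = exceedance_percentage(true_ens)
pred_pct = exceedance_percentage(pred_ens)
difference = true_pct - pred_pct   # positive: true ensemble is higher
```

In this toy case the third cell is reached only by an outlier member of the true ensemble, so the difference there is positive, while the second cell is overpredicted and the difference is negative.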
Figure 11. Percentage of members in the true (top) and predicted (center) ensembles that have
deposition that exceeds 0.01 Bq/m² at each location. The bottom-most plot shows the difference
between the two.
The topmost subplot shows the probability of exceedance of the true ensemble. As expected,
the outside of the shape is made up of low percentage grid cells, as only outlier
plumes make up those locations. The center subplot shows the probability of exceedance
of the predicted ensemble. The predicted probability of exceedance takes up less area than
the true ensemble because the outliers around the edge are challenging for the regressions
to predict.
Despite the vast differences in computational resources needed to produce them,
the probability of exceedance in the true and predicted ensembles appears similar. To highlight
the differences, we created the bottom-most subplot of Figure 11, which shows the
difference between the true ensemble percentages and the predicted ensemble percentages.
Positive values, in teal, show areas where the population of members in the true ensemble
is higher than the predicted ensemble. Negative values, in brown, show areas with higher
predicted population than true population. Comparing the plot to Figure 12, one notices
that the boundary between brown and teal happens approximately where the number of
samples per pixel drops below 17, which is where the linear regression becomes underdetermined.
The conclusion we have drawn is that the regressions tend to overpredict values
where there are sufficient samples (with some exceptions, such as in the center right of the
plot) and underpredict where there are not sufficient samples.
5.5. Spatial Coefficient Analysis
One advantage our regression method holds over other machine learning models
is the potential for interpretability. In this subsection we highlight one aspect of this
interpretability. Our predictions are made using thousands of individual regression models,
each of which has coefficients that transform the WRF parameterization input variables
into a deposition value. In traditional regression approaches with non-categorical inputs,
the units of all the input variables can be standardized so that the magnitude of a coefficient
is related to the effect of its corresponding variable. That is, the larger the value of a
coefficient, the more important the corresponding predictor is to the output. However, our
WRF variables are one-hot-encoded as binary inputs, so determining their importance is
not as straightforward as standard regression. Each of the regression models in our method
has seventeen input terms: one for the intercept and sixteen binary encoded variables that
represent five different WRF physics parameterizations. Out of these sixteen non-intercept
coefficients, the first four represent the five PBL schemes, the next three represent the four
LSM schemes, the next four represent the five CU schemes, the next three represent the
four MP schemes, and the final two coefficients represent the three RA schemes. Taking the
mean of the absolute value of a WRF physics package’s coefficients gives an estimate of the
importance of that variable. In other words, $\frac{1}{4}\sum_{i=1}^{4}|\beta_i|$ represents the importance of PBL,
$\frac{1}{3}\sum_{i=5}^{7}|\beta_i|$ represents the importance of LSM, and so on.
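The seventeen-term input layout described above can be sketched as a small encoder. This is an illustration under assumptions: the function `encode_member` is hypothetical, schemes are identified by a plain index 0..k-1 rather than by WRF option numbers, and index 0 is assumed to be the reference scheme that receives no binary variable.

```python
import numpy as np

# Number of schemes per WRF physics group, in coefficient order.
N_SCHEMES = {"PBL": 5, "LSM": 4, "CU": 5, "MP": 4, "RA": 3}

def encode_member(choice):
    """Build the 17-term predictor vector for one ensemble member.

    choice : dict mapping group name to a scheme index 0..k-1, where index 0
             is the (assumed) reference scheme with no binary variable.
    """
    x = [1.0]                                  # intercept term
    for group, k in N_SCHEMES.items():
        dummies = [0.0] * (k - 1)              # k schemes need k - 1 binaries
        if choice[group] > 0:
            dummies[choice[group] - 1] = 1.0
        x.extend(dummies)
    return np.array(x)

x = encode_member({"PBL": 0, "LSM": 2, "CU": 4, "MP": 1, "RA": 0})
```

The group sizes give 4 + 3 + 4 + 3 + 2 = 16 binary inputs plus the intercept, which matches the seventeen terms per regression model.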
Once the mean coefficient magnitudes are calculated, the argmax is used to find the
WRF parameterization which is most important at a given grid cell. These results can
be plotted to see which parameterizations are most important for a given area, as seen
in Figure 12 for the surface release case. Figure 12 was created using models trained on
50 ensemble members and only includes grid cells that have greater than 17 samples.
The intercept is not considered when determining importance. It is important to remember
that with our process, the “most important variable” is not the same as the “only important
variable.” Combinations of WRF parameterization changes can be important, resulting
in many parameterizations having similar mean coefficient magnitudes. In other words, the second
most important WRF parameterization can still be very important because its mean
coefficient magnitude may be only slightly smaller than that of the most important WRF parameterization.
Regardless, this analysis is an interesting consequence of using regression models
to interpret WRF physics.
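The importance map can be sketched as a grouped mean of absolute coefficients followed by an argmax. The function name `most_important_group`, the array layout, and the toy coefficients are assumptions; the index ranges follow the coefficient ordering given in the text, with the intercept at index 0.

```python
import numpy as np

# Coefficient index ranges per group (index 0 is the intercept, skipped).
GROUP_SLICES = {"PBL": slice(1, 5), "LSM": slice(5, 8), "CU": slice(8, 12),
                "MP": slice(12, 15), "RA": slice(15, 17)}

def most_important_group(beta, n_samples, min_samples=17):
    """Name the WRF physics group with the largest mean |coefficient|.

    beta      : (M, N, 17) linear-regression coefficients per grid cell
    n_samples : (M, N) number of training samples with deposition per cell
    Returns an (M, N) array of group names, '' where a cell is excluded.
    """
    names = np.array(list(GROUP_SLICES))
    importance = np.stack([np.abs(beta[..., s]).mean(axis=-1)
                           for s in GROUP_SLICES.values()], axis=-1)
    winner = names[importance.argmax(axis=-1)]
    # Only cells with more than `min_samples` samples are reported.
    return np.where(n_samples > min_samples, winner, "")

beta = np.zeros((1, 2, 17))
beta[0, 0, 9] = 2.0      # a cumulus coefficient dominates cell (0, 0)
beta[0, 1, 1] = 1.0      # a PBL coefficient dominates cell (0, 1)
n_samples = np.array([[50, 10]])
importance_map = most_important_group(beta, n_samples)
```

The second cell is left blank because it has too few samples, mirroring the exclusion rule used for Figure 12.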
Figure 12 shows that PBL variations tend to dominate other WRF parameterizations,
as captured by the large areas in beige. This result is not surprising, as changing the PBL
scheme in WRF is known to greatly influence atmospheric turbulence and mixing near the
surface. The variable importance map also shows other interesting features, including the
red areas highlighting the relatively elevated importance of cumulus convection variations
over coastal and mountainous areas where precipitation occurs during the release events.
Similarly, magenta areas where microphysics is important occur near areas where cumulus
convection is also important, which is consistent with the correlation of these physical
processes in the model. The overall spatial complexity in Figure 12 illustrates one final
critical point. No single WRF parameterization is most important everywhere, so multi-physics
WRF ensembles that vary a range of physical parameterizations are needed to
capture weather model uncertainty.
Figure 12.
Primary WRF parameterization changes associated with Cs-137 deposition are color-coded
and shown for every grid cell. Areas with 17 or fewer samples are excluded. The total training
dataset included 50 ensemble members. In the legend, PBL stands for the planetary boundary layer
physics parameterization (tan), LSM stands for land surface model (green), CU stands for cumulus
physics (red), MP stands for microphysics (magenta), and RA stands for radiation (cyan).
6. Future Work
The regression prediction method we have described has some drawbacks and unknowns,
which means there are several avenues for further exploration. The most significant
drawback is that it does not exploit spatial correlations of nearby locations in the
domain. Since each grid cell is treated as completely independent from the other grid cells,
spatial correlations are not used to improve the overall prediction. This means that any
predicted plume is limited to the envelope of the union of all of the training plumes, as our
model cannot predict in areas that do not have any training data. However, this trait can be
viewed as a positive feature of our algorithm; it will not poorly extrapolate in areas where
there are no training data. To overcome this limitation, spatial data can be incorporated into
the overall model. Including spatial correlation information in our model may lead to a
more parsimonious model or one that produces improved predictions. Including spatial
correlations can also potentially be done using dimensionality reduction techniques such as
PCA or autoencoders. For example, the model we describe could be used to produce an
initial map, and then an alternate model based on radial basis functions, multitask learning,
or even linear regression can be used to refine it.
Another drawback is the subjective nature of picking a decision threshold $p_{\text{threshold}}$
in the logistic regression component. We used a value of 0.5 for all the calculations
presented here, which is a reasonable value to use, but that is the result of qualitative
analysis. Implementing an optimization routine to determine the best $p_{\text{threshold}}$ to use
would increase the objectivity and may improve the performance of our model. The tuned
threshold could also be applied at a grid-cell level, which may increase performance in the
boundary regions.
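One possible form of such an optimization routine is a simple grid search that maximizes the mean figure of merit in space on held-out maps. This is a hypothetical sketch, not something the paper implements: the function names, the candidate grid, and the choice of FMS as the objective are all assumptions.

```python
import numpy as np

def fms(obs, pred):
    """Figure of merit in space: intersection over union of two plumes."""
    union = (obs | pred).sum()
    return (obs & pred).sum() / union if union else 1.0

def tune_threshold(prob_maps, true_masks, candidates=np.linspace(0.1, 0.9, 9)):
    """Return the candidate threshold with the highest mean FMS.

    prob_maps  : (K, M, N) logistic-regression probabilities for K members
    true_masks : (K, M, N) bool, True where the reference run has deposition
    """
    scores = [np.mean([fms(t, p > c) for p, t in zip(prob_maps, true_masks)])
              for c in candidates]
    return float(candidates[int(np.argmax(scores))]), scores

prob_maps = np.array([[[0.95, 0.65, 0.35, 0.05]]])
true_masks = np.array([[[True, True, False, False]]])
best, scores = tune_threshold(prob_maps, true_masks)
```

In this toy case every threshold between 0.4 and 0.6 reproduces the true plume exactly, and the search returns the first of them. A per-cell variant would repeat the search for each grid cell, at the cost of needing enough held-out maps per cell.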
As mentioned in Section 5.1, we remove outlier deposition values which are predicted
to be larger than any deposition value present in the training set. This is a simple way to
remove outliers and is easily implemented operationally. However, it is a naive outlier
removal method. A more complex outlier removal method may be beneficial to help
differentiate false extreme values from true extreme values, the latter of which can pose a
large risk to public health.
When we create training sets for our method we sample randomly from the entire
population of predictions. By using methods from adaptive sampling, it may be possible to
dynamically produce a training set that is more representative of the population than a
random sample, leading to higher performance for the trained model with fewer expensive
computer simulations. In an emergency situation, this would be very useful.
The individual models that predict hazardous deposition in each grid cell do not
necessarily have to be linear or logistic regression models. They can be produced by other
regression and classification models such as random forests or artificial neural networks.
The biggest hurdle in implementing more complex grid cell-level models is the training
time. During our testing on a desktop computer, the training time for a single grid cell
took between 1 and 10 milliseconds, and training a full spatial map was on the order of
minutes. Changing to a more complicated model could potentially increase training time
by an order of magnitude.
Finally, this regression method should be tested with more FLEXPART-WRF simulations.
It should be tested with different hazardous releases in different locations from
FLEXPART-WRF, but it could also be tested on completely different physical models. More
terms could also be added to the regression model to account for larger initial condition
errors present in longer forecast simulations. There is nothing about our method that is inherently
specific to FLEXPART-WRF, and we think this method could work for simulations
that are unrelated to deposition.
7. Conclusions
In this paper, we presented a statistical method that can be used to quickly emulate
complex, spatially varying radiological deposition patterns produced by the meteorological
and dispersion tools WRF and FLEXPART. FLEXPART-WRF is slow to run, and a
single simulation from it may have significant uncertainty due to model imperfections.
To estimate uncertainty, researchers can run FLEXPART-WRF hundreds of times by varying
representations of physical processes in the models, but that can take crucial hours.
Instead of running FLEXPART-WRF hundreds of times, researchers can run it dozens of
times, use the results to train our emulator, and then use the emulator to produce the
remaining results.
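As a rough worked estimate of the whole-ensemble saving, assume each FLEXPART-WRF run costs the same time $t_{\text{sim}}$ and that the emulator's training and prediction time $t_{\text{emu}}$ (minutes on a desktop) is negligible next to the 50 training simulations. Both symbols are introduced here only for illustration.

```latex
\[
\text{speed-up}
  \approx \frac{1196\, t_{\text{sim}}}{50\, t_{\text{sim}} + t_{\text{emu}}}
  \approx \frac{1196}{50}
  \approx 24
\]
```

This is consistent with the roughly 24-fold whole-ensemble speed-up quoted in the abstract, and it shows that the saving is set almost entirely by the ratio of ensemble size to training size.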
Our emulator is represented by an $M \times N$ grid where the value at each grid cell is
determined by the output of independent linear regression and logistic regression models.
The logistic regression determines whether hazardous deposition is present at that location,
and the linear regression determines the magnitude of the deposition. Since all the grid cells
are independent from one another, our model can accurately predict subsets of locations.
We used two datasets for training, testing, and predicting. One was a simulated
continuous surface contaminant release representing a large-scale industrial accident, and
the other was a simulated instantaneous elevated contaminant release from a hypothetical
nuclear explosion. For each of the two cases, there were 1196 different simulations, all
representing variations in the WRF parameterizations. The WRF parameterizations were
treated as categorical variables that were binary encoded and used as the inputs to the
linear and logistic regression models used in our emulator.
We conducted several tests to evaluate the performance of our emulator. We found that
the emulator performs well, even with only 50 samples out of the 1196-member population.
While the deposition patterns have variance, they are not drastically different shapes,
which is why 50 samples is sufficient to make reasonable predictions. This is promising
since in an emergency situation, the number of computationally expensive runs should be
minimized. As with many machine learning models, the prediction performance on the
test set increases with increasing training size. We also found that for each case there are
some members that perform better than others, regardless of the training size.
In general, we think that the emulator that we have presented here is successful in
predicting complex spatial patterns produced by FLEXPART-WRF with relatively few
training samples. We think there are several areas that can be explored to improve our
emulator, and we hope to complete some of them in the future.
Author Contributions:
N.G. contributed to statistical model preparation and analysis, visualization,
and draft writing and editing. G.P. contributed to data analysis and validation. M.S. contributed
to conceptualization, methodology, data creation, funding acquisition, and validation. D.D.L. contributed
to statistical model analysis, data creation, draft writing and editing, validation, and project
administration. All authors have read and agreed to the published version of the manuscript.
This work was performed under the auspices of the U.S. Department of Energy by
Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. It was funded
by LDRD 17-ERD-045. Released under LLNL-JRNL-808577.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement:
The surface release data presented in this study are openly available
at, accessed on 22 July 2021. The elevated release data are
available upon request.
Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript;
or in the decision to publish the results.
The following is a table of the notation used in the document. All notation is generalizable to
the surface release case and the elevated release case.
Symbol                     Meaning
$x$                        General single instance of predictor vector
$x_k$                      Specific single instance of predictor vector
$X$                        Complete set of input vectors
$X_{\text{Train}}$         Training set of input vectors
$X_{\text{Test}}$          Testing set of input vectors
$Y$                        Complete set of target matrices
$Y_k$                      Specific single instance of target matrix
$Y_{i,j,k}$                Specific location of single instance of target matrix
$Y_{\text{Train}}$         Training set of target matrices
$Y_{\text{Test}}$          Testing set of target matrices
$\hat{Y}$                  Complete set of estimation matrices
$\hat{Y}_k$                Specific single instance of estimation matrix
$\hat{Y}_{i,j,k}$          Specific location of single instance of estimation matrix
$M$                        Number of rows in $Y$ or $\hat{Y}$
$N$                        Number of columns in $Y$ or $\hat{Y}$
$n$                        Size of training set
$\alpha_{i,j}$             Coefficients for the logistic regression model at a location
$\beta_{i,j}$              Coefficients for the linear regression model at a location
$P_{i,j}$                  Probability of non-zero deposition at a location
$p_{\text{threshold}}$     Decision threshold to be applied to prediction
Moreno, T.; Querol, X.; Alastuey, A.; Minguillón, M.C.; Pey, J.; Rodriguez, S.; Miró, J.V.; Felis, C.; Gibbons, W. Recreational
atmospheric pollution episodes: Inhalable metalliferous particles from firework displays. Atmos. Environ. 2007,41, 913–922.
Styer, P.; McMillan, N.; Gao, F.; Davis, J.; Sacks, J. Effect of outdoor airborne particulate matter on daily death counts. Environ.
Health Perspect. 1995,103, 490–497.
Griffin, D.W.; Kellogg, C.A. Dust storms and their impact on ocean and human health: Dust in Earth’s atmosphere. EcoHealth
2004,1, 284–295.
Bader, J.A. Dealing with Multiple Disasters in Japan. In Obama and China’s Rise: An Insider’s Account of America’s Asia Strategy;
Brookings Institution Press: Washington, DC, USA, 2012; pp. 130–139.
Brioude, J.; Arnold, D.; Stohl, A.; Cassiani, M.; Morton, D.; Seibert, P.; Angevine, W.; Evan, S.; Dingwell, A.; Fast, J.D.; et al. The
Lagrangian particle dispersion model FLEXPART-WRF version 3.1. Geosci. Model Dev. 2013,6, 1889–1904.
Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Barker, D.M.; Wang, W.; Powers, J.G. A Description of the Advanced Research
WRF Version 3; NCAR Technical note-475+ STR; University Corporation for Atmospheric Research: Boulder, CO, USA, 2008.
Hutchinson, M.; Oh, H.; Chen, W.H. A review of source term estimation methods for atmospheric dispersion events using static
or mobile sensors. Inf. Fusion 2017,36, 130–148.
Leadbetter, S.; Andronopoulos, S.; Bedwell, P.; Chevalier-Jabet, K.; Geertsema, G.; Gering, F.; Hamburger, T.; Jones, A.; Klein, H.;
Korsakissok, I.; et al. Ranking uncertainties in atmospheric dispersion modelling following the accidental release of radioactive
material. Radioprotection 2020,55, S51–S55, doi:10.1051/radiopro/2020012.
Korsakissok, I.; Périllat, R.; Andronopoulos, S.; Bedwell, P.; Berge, E.; Charnock, T.; Geertsema, G.; Gering, F.; Hamburger, T.;
Klein, H.; et al. Uncertainty propagation in atmospheric dispersion models for radiological emergencies in the pre- and early
release phase: Summary of case studies. Radioprotection 2020,55, S57–S68, doi:10.1051/radiopro/2020013.
Sørensen, J.H.; Bartnicki, J.; Buhr, A.; Feddersen, H.; Hoe, S.; Israelson, C.; Klein, H.; Lauritzen, B.; Lindgren, J.; Schönfeldt,
F.; et al. Uncertainties in atmospheric dispersion modelling during nuclear accidents. J. Environ. Radioact.
,222, 106356,
Kirthiga, S.; Narasimhan, B.; Balaji, C. A multi-physics ensemble approach for short-term precipitation forecasts at convective
permitting scales based on sensitivity experiments over southern parts of peninsular India. J. Earth Syst. Sci. 2021,130, 1–29.
Imran, H.M.; Kala, J.; Ng, A.; Muthukumaran, S. An evaluation of the performance of a WRF multi-physics ensemble for
heatwave events over the city of Melbourne in southeast Australia. Clim. Dyn. 2018,50, 2553–2586.
Stegehuis, A.I.; Vautard, R.; Ciais, P.; Teuling, A.J.; Miralles, D.G.; Wild, M. An observation-constrained multi-physics WRF
ensemble for simulating European mega heat waves. Geosci. Model Dev. 2015,8, 2285–2298.
Katragkou, E.; García-Díez, M.; Vautard, R.; Sobolowski, S.; Zanis, P.; Alexandri, G.; Cardoso, R.M.; Colette, A.; Fernandez, J.;
Gobiet, A.; et al. Regional climate hindcast simulations within EURO-CORDEX: Evaluation of a WRF multi-physics ensemble.
Geosci. Model Dev. 2015,8, 603–618.
Lavin-Gullon, A.; Fernandez, J.; Bastin, S.; Cardoso, R.M.; Fita, L.; Giannaros, T.M.; Goergen, K.; Gutiérrez, J.M.; Kartsios, S.;
Katragkou, E.; et al. Internal variability versus multi-physics uncertainty in a regional climate model. Int. J. Climatol.
41, E656–E671.
Lucas, D.D.; Simpson, M.; Cameron-Smith, P.; Baskett, R.L. Bayesian inverse modeling of the atmospheric transport and emissions
of a controlled tracer release from a nuclear power plant. Atmos. Chem. Phys. 2017,17, 13521–13543.
Jensen, D.D.; Lucas, D.D.; Lundquist, K.A.; Glascoe, L.G. Sensitivity of a Bayesian source-term estimation model to spatiotemporal
sensor resolution. Atmos. Environ. X 2019,3, 100045.
Watson, P.A. Applying machine learning to improve simulations of a chaotic dynamical system using empirical error correction.
J. Adv. Model. Earth Syst. 2019,11, 1402–1417.
Calbó, J.; Pan, W.; Webster, M.; Prinn, R.G.; McRae, G.J. Parameterization of urban subgrid scale processes in global atmospheric
chemistry models. J. Geophys. Res. Atmos. 1998,103, 3437–3451.
Mayer, M.; Wang, C.; Webster, M.; Prinn, R.G. Linking local air pollution to global chemistry and climate. J. Geophys. Res. Atmos.
2000,105, 22869–22896.
Beddows, A.V.; Kitwiroon, N.; Williams, M.L.; Beevers, S.D. Emulation and Sensitivity Analysis of the Community Multiscale Air
Quality Model for a UK Ozone Pollution Episode. Environ. Sci. Technol. 2017,51, 6229–6236, doi:10.1021/acs.est.6b05873.
Wang, J.; Balaprakash, P.; Kotamarthi, R. Fast domain-aware neural network emulation of a planetary boundary layer parameteri-
zation in a numerical weather forecast model. Geosci. Model Dev. 2019,12, 4261–4274.
Atmosphere 2021,12, 953 23 of 24
Krasnopolsky, V.M.; Fox-Rabinovitz, M.S.; Chalikov, D.V. New approach to calculation of atmospheric model physics: Accurate
and fast neural network emulation of longwave radiation in a climate model. Mon. Weather Rev. 2005,133, 1370–1383.
Pal, A.; Mahajan, S.; Norman, M.R. Using deep neural networks as cost-effective surrogate models for super-parameterized
E3SM radiative transfer. Geophys. Res. Lett. 2019,46, 6069–6079.
Lucas, D.; Prinn, R. Parametric sensitivity and uncertainty analysis of dimethylsulfide oxidation in the clear-sky remote marine
boundary layer. Atmos. Chem. Phys. 2005,5, 1505–1525.
Kelp, M.M.; Tessum, C.W.; Marshall, J.D. Orders-of-magnitude speedup in atmospheric chemistry modeling through neural
network-based emulation. arXiv 2018, arXiv:1808.03874.
Ivatt, P.D.; Evans, M.J. Improving the prediction of an atmospheric chemistry transport model using gradient-boosted regression
trees. Atmos. Chem. Phys. 2020,20, 8063–8082.
28. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models; Cambridge University Press: Cambridge, UK,
Norment, H.G. DELFIC: Department of Defense Fallout Prediction System. Volume I-Fundamentals; Final Report 16 Jan–31 Dec 79;
Atmospheric Science Associates: Bedford, MA, USA, 1979.
Lucas, D.D.; Pallotta, G.; Simpson, M.D. Using Machine Learning to Intelligently Select Members of Large Atmospheric Model
Ensembles. In Proceedings of the AGU Fall Meeting Abstracts, Washington, DC, USA, 10–14 December 2018; Volume 2018,
p. GC43J-1663.
Lucas, D.D.; Simpson, M.; Pallotta, G. Probabilistic Predictions and Uncertainty Estimation Using Adaptively Designed Ensembles
for Radiological Plume Modeling. In Proceedings of the CTBT Science and Technology 2019 Conference, Vienna, Austria, 24–28
June 2019.
Aoyama, M. Long-range transport of radiocaesium derived from global fallout and the Fukushima accident in the Pacific Ocean
since 1953 through 2017—Part I: Source term and surface transport. J. Radioanal. Nucl. Chem. 2018,318, 1519–1542.
Hong, S.Y.; Noh, Y.; Dudhia, J. A new vertical diffusion package with an explicit treatment of entrainment processes. Mon.
Weather Rev. 2006,134, 2318–2341.
Dudhia, J. A multi-layer soil temperature model for MM5. In Proceedings of the Sixth PSU/NCAR Mesoscale Model Users’
Workshop, Boulder, CO, USA, 22–24 July 1996; pp. 22–24;
36. Kain, J.S. The Kain–Fritsch convective parameterization: An update. J. Appl. Meteorol. 2004,43, 170–181.
Chen, S.H.; Sun, W.Y. A One-dimensional Time Dependent Cloud Model. J. Meteorol. Soc. Jpn. Ser. II
,80, 99–118,
Dudhia, J. Numerical study of convection observed during the winter monsoon experiment using a mesoscale two-dimensional
model. J. Atmos. Sci. 1989,46, 3077–3107.
Janji´c, Z.I. The step-mountain eta coordinate model: Further developments of the convection, viscous sublayer, and turbulence
closure schemes. Mon. Weather Rev. 1994,122, 927–945.
Tewari, M.; Chen, F.; Wang, W.; Dudhia, J.; LeMone, M.; Mitchell, K.; Ek, M.; Gayno, G.; Wegiel, J.; Cuenca, R. Implementation and
verification of the unified NOAH land surface model in the WRF model. In 20th Conference on Weather Analysis and Forecasting/16th
Conference on Numerical Weather Prediction; American Meteorological Society: Seattle, WA, USA, 2004; Volume 1115, pp. 2165–2170.
Hong, S.Y.; Dudhia, J.; Chen, S.H. A revised approach to ice microphysical processes for the bulk parameterization of clouds and
precipitation. Mon. Weather Rev. 2004,132, 103–120.
Collins, W.D.; Rasch, P.J.; Boville, B.A.; Hack, J.J.; McCaa, J.R.; Williamson, D.L.; Kiehl, J.T.; Briegleb, B.; Bitz, C.; Lin, S.J.; et al.
Description of the NCAR community atmosphere model (CAM 3.0). NCAR Tech. Note NCAR/TN-464+ STR
,226, 1326–1334.
Sukoriansky, S.; Galperin, B.; Perov, V. Application of a new spectral theory of stably stratified turbulence to the atmospheric
boundary layer over sea ice. Bound.-Layer Meteorol. 2005,117, 231–257.
Benjamin, S.G.; Grell, G.A.; Brown, J.M.; Smirnova, T.G.; Bleck, R. Mesoscale weather prediction with the RUC hybrid isentropic–
terrain-following coordinate model. Mon. Weather Rev. 2004,132, 473–494.
Grell, G.A.; Freitas, S.R.; others. A scale and aerosol aware stochastic convective parameterization for weather and air quality
modeling. Atmos. Chem. Phys. 2014,14, 5233–5250.
Iacono, M.J.; Delamere, J.S.; Mlawer, E.J.; Shephard, M.W.; Clough, S.A.; Collins, W.D. Radiative forcing by long-lived greenhouse
gases: Calculations with the AER radiative transfer models. J. Geophys. Res. Atmos. 2008,113, doi:10.1029/2008JD009944.
Nakanishi, M.; Niino, H. An improved Mellor–Yamada level-3 model: Its numerical stability and application to a regional
prediction of advection fog. Bound.-Layer Meteorol. 2006,119, 397–407.
Gilliam, R.C.; Pleim, J.E. Performance assessment of new land surface and planetary boundary layer physics in the WRF-ARW. J.
Appl. Meteorol. Climatol. 2010,49, 760–774.
Grell, G.A.; Dévényi, D. A generalized approach to parameterizing convection combining ensemble and data assimilation
techniques. Geophys. Res. Lett. 2002,29, 38-1–38-4.
Rogers, E.; Black, T.; Ferrier, B.; Lin, Y.; Parrish, D.; DiMego, G. National Oceanic and Atmospheric Administration Changes
to the NCEP Meso Eta Analysis and Forecast System: Increase in resolution, new cloud microphysics, modified precipitation
assimilation, modified 3DVAR analysis. NWS Tech. Proced. Bull. 2001,488, 15.
Atmosphere 2021,12, 953 24 of 24
Pleim, J.E. A combined local and nonlocal closure model for the atmospheric boundary layer. Part I: Model description and
testing. J. Appl. Meteorol. Climatol. 2007,46, 1383–1395.
Berg, L.K.; Gustafson Jr, W.I.; Kassianov, E.I.; Deng, L. Evaluation of a modified scheme for shallow convection: Implementation
of CuP and case studies. Mon. Weather Rev. 2013,141, 134–147.
Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.;
Bright, J.; et al. SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. arXiv 2019, arXiv:1907.10121.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg,
V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011,12, 2825–2830.
Lam, S.K.; Pitrou, A.; Seibert, S. Numba: A LLVM-based Python JIT Compiler. In Proceedings of the Second Workshop on the LLVM
Compiler Infrastructure in HPC; ACM: New York, NY, USA, 2015; pp. 7:1–7:6, doi:10.1145/2833157.2833162.
Kasim, M.; Watson-Parris, D.; Deaconu, L.; Oliver, S.; Hatfield, P.; Froula, D.; Gregori, G.; Jarvis, M.; Khatiwala, S.; Kore-
naga, J.; et al. Building high accuracy emulators for scientific simulations with deep neural architecture search. arXiv
57. Chang, J.C.; Hanna, S.R. Air quality model performance evaluation. Meteorol. Atmos. Phys. 2004,87, 167–196.
Maurer, C.; Baré, J.; Kusmierczyk-Michulec, J.; Crawford, A.; Eslinger, P.W.; Seibert, P.; Orr, B.; Philipp, A.; Ross, O.; Generoso, S.;
et al. International challenge to model the long-range transport of radioxenon released from medical isotope production to six
Comprehensive Nuclear-Test-Ban Treaty monitoring stations. J. Environ. Radioact. 2018,192, 667–686.
59. Swets, J.A. Measuring the accuracy of diagnostic systems. Science 1988,240, 1285–1293.