A mixed integer genetic algorithm used in biological and chemical defense applications.
Sue Ellen Haupt · Randy L. Haupt · George S. Young
© Springer-Verlag 2009
Abstract
There are many problems in security and defense that require a robust optimization technique, including those that involve the release of a chemical or biological contaminant. Our problem, in particular, is computing the parameters to be used in modeling atmospheric transport and dispersion given field sensor measurements of contaminant concentration. This paper discusses using a genetic algorithm for addressing this problem. An example is given of how a mixed integer genetic algorithm can be used in conjunction with field sensor data to invert a forward model to obtain the meteorological data and source information necessary for prediction of the subsequent concentration field. A new mixed integer genetic algorithm is described that is a state-of-the-art tool capable of optimizing a wide range of objective functions. Such an algorithm is used here for optimizing atmospheric stability, wind speed, wind direction, rainout, and source location. We demonstrate that the algorithm is successful at reconstructing these meteorological and source parameters despite moderate correlations between their effects on the sensor data.
Keywords
Source characterization · Atmospheric dispersion · Genetic algorithm · Mixed integer
1 Motivation
In the case of an accidental or intentional release of a toxic
chemical, biological, radiological, or nuclear (CBRN)
contaminant, responsible agencies must decide which areas
to evacuate, how to mitigate the release, and how to exe-
cute emergency response. Potentially life-or-death deci-
sions should be based on forecasts of transport and
dispersion of the contaminant. In a real situation, however,
it is unlikely that the exact information regarding the
source parameters (location, time, strength of the release)
or the meteorological data (wind speed and direction,
atmospheric stability) would be available. One way to
mitigate this problem is to assimilate data from field sen-
sors into the atmospheric transport and dispersion model,
for example, meteorological data, concentration data, or
both. In assimilating these data into the model, however,
one must consider difficulties including: (1) monitored
concentration data contains errors; (2) inherent uncertain-
ties apply to modeling chaotic processes such as turbulent
dispersion; and (3) transport and dispersion models com-
pute the ensemble average of many realizations of an event
while the goal is to reproduce a specific single realization
of the event in real time. The process of assimilating the
data into the modeling framework should thus be formu-
lated as one in optimization, specifically configured to
address these issues as well as the fit of model to data.
Dispersion modeling is therefore augmented by assimilating ground truth from a network of field sensors to back-calculate the source and meteorological parameters required by the model to predict the subsequent transport and dispersion of the contaminant.
S. E. Haupt (✉) · R. L. Haupt
Applied Research Laboratory, The Pennsylvania State
University, P.O. Box 30, State College, PA 16804, USA
e-mail: haupts2@asme.org
S. E. Haupt · G. S. Young
Meteorology Department, The Pennsylvania State University,
Walker Building, University Park, PA 16804, USA
Soft Comput
DOI 10.1007/s00500-009-0516-z
Our previous work demonstrated that coupling inverse
models with transport and dispersion models using
a genetic algorithm (GA) is an effective approach for
attributing concentration measured at a receptor to each of
a specified number of sources. A GA is an optimization
technique that integrates genetic recombination with nat-
ural selection to evolve better solutions to an optimization
problem (Goldberg 1989; Haupt and Haupt 2004; Holland
1975). This technique was tested using a basic Gaussian
plume dispersion model on synthetic data for both con-
trived source configurations and also with the actual source
configuration for Logan, Utah (Haupt 2005). The meth-
odology was then statistically analyzed using Monte Carlo
techniques to determine the confidence intervals, including
in the presence of both additive and multiplicative white
noise (Haupt et al. 2006). We found that even when the
noise was the same magnitude as the signal, the GA-cou-
pled model could apportion the pollutant to the correct
source. The next step replaced the Gaussian plume dis-
persion model with an operational Second order Closure
Integrated PUFF model, SCIPUFF (Allen et al. 2007a).
The GA-coupled model performed as well with SCIPUFF
computing the dispersion as with the Gaussian plume
model. That enhanced coupled model was then tested on
field test data (Allen et al. 2007a). Within the limitations of
the data, the coupled model still performed admirably. The
cases where performance was disappointing were traced to
difficulties during the field test that would be expected to
impact data quality. The problem was then reformulated
to additionally compute the wind speed and direction
(Allen et al. 2007b). A subsequent study with that refor-
mulated model additionally included the wind speed, time
of release, and effective plume height as parameters to
optimize (Long et al. 2009). That work also analyzed the
amount of data required to back-calculate six parameters in
the presence of noise and formulated a measure of how
much information is necessary to compute a sufficiently
good solution.
The inverse problems in these prior studies were all
solved using a genetic algorithm (GA). The parameters to
be optimized by the GA are the input values for the dis-
persion model. Thus, for each potential solution, the results
of the dispersion model with those estimated parameters
are compared to the monitored concentration pattern. That
series of efforts progressed from simply tuning the source
strength through identifying up to seven relevant parame-
ters: two dimensional location (x, y), effective plume
height, time of release, strength of release, wind direction,
and wind speed. The general process is depicted in Fig. 1.
The concentrations are assumed to be measured at a set of
sensor locations. The concentrations predicted by the dis-
persion model using ‘‘guesses’’ at the modeling variables
are compared to those measured. The GA incorporates the
resulting trial solution performance information to con-
struct better guesses to the model variables using the
operators. After enough generations, the modeling vari-
ables converge to an accuracy sufficient for predicting the
future concentration field.
The GA used in our prior efforts was a continuous GA,
i.e. all of the variables being optimized were real-valued.
For this current effort, however, we need to additionally
invert for both an integer and a binary variable: atmo-
spheric stability class and a rainout switch. The stability of
the atmosphere determines the dispersion coefficients that
govern the plume spread with distance and is typically
divided into six discrete categories. We also account for the
fact that rain causes a portion of the contaminant to ‘‘wash
out’’ from the atmosphere, lowering surface concentration.
This expanded effort requires a mixed integer genetic
algorithm (MIGA), which allows the simultaneous optimization
of real, integer, and binary variables.
Fig. 1 Schematic of source and meteorological data optimization for security
The formulation of the problem is described in Sect. 2.
Section 3 details the MIGA. Results for three numerical
experiments are described in Sect. 4. Section 5 summarizes
the work and discusses the utility of the algorithm in the
context of similar problems.
2 Problem formulation
2.1 Model formulation
The goal of the current study is to develop and test an
algorithm for combining an atmospheric transport and
dispersion model with field sensor data on contaminant
concentration so as to back-calculate (i.e. invert for) the
meteorological data and source characteristics necessary
for subsequent transport and dispersion modeling. The
predicted concentration values are compared to those
measured using the cost function:
$$\mathrm{cost} = \frac{\sqrt{\sum_{r=1}^{TR} \left[ \ln(aC_r + \epsilon) - \ln(aR_r + \epsilon) \right]^2}}{\sqrt{\sum_{r=1}^{TR} \left[ \ln(aR_r + \epsilon) \right]^2}} \tag{1}$$
where $C_r$ is the forecast concentration as predicted by the transport and dispersion model at receptor r, $R_r$ the observed concentration retrieved from receptor r, TR the total number of receptors, and a and $\epsilon$ are constants used to avoid taking the logarithm of zero (a = 1, $\epsilon = 1 \times 10^{-13}$ here).
The model agreement with sensor data is tested in the context of an identical twin experiment; that is, the transport and dispersion model used to optimize agreement with the sensor data is the same model that produces the synthetic sensor data. The identical twin approach is convenient for formulating and testing problems because it removes several of the sources of error from consideration: we no longer expect inherent fluctuations due to turbulence and we are assured that the sensor data is not contaminated with noise. Since it allows us to compare to data that we know are exact, it allows evaluation of the inversion algorithm alone rather than the combination of model and data. The disadvantage, of course, is that a level of realism is lost in this approach. Thus, the approach is best suited for algorithm analysis rather than estimating absolute performance of an algorithm in the real-world setting.
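As an illustration only, the cost function (1) can be sketched in a few lines of Python with NumPy. The function name `log_cost` and its argument names are hypothetical; the defaults a = 1 and ε = 1e-13 are the values quoted in the text.

```python
import numpy as np

def log_cost(predicted, observed, a=1.0, eps=1e-13):
    """Normalized log-space misfit between model and sensor data, as in (1)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    log_pred = np.log(a * predicted + eps)   # eps keeps log finite at zero concentration
    log_obs = np.log(a * observed + eps)
    numerator = np.sqrt(np.sum((log_pred - log_obs) ** 2))
    denominator = np.sqrt(np.sum(log_obs ** 2))
    return numerator / denominator
```

A perfect match between predicted and observed concentrations gives a cost of zero, which is the value the optimizer drives toward in the identical twin setting.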
2.2 Concentration prediction
A Gaussian plume transport and dispersion model is used
to forecast the contaminant concentration. This model is
used because it is an exact solution of the ensemble
averaged diffusion equation. The problem is formulated in
a Cartesian domain. Wind is assumed to blow in the positive x-direction (i.e. the domain is rotated so that the x axis aligns with the direction of the wind). This model is formulated as:
$$C_r = \frac{Q}{2\pi u \sigma_z \sigma_y} \exp\left( \frac{-y^2}{2\sigma_y^2} \right) \left\{ \exp\left[ \frac{-(z - H_e)^2}{2\sigma_z^2} \right] + \exp\left[ \frac{-(z + H_e)^2}{2\sigma_z^2} \right] \right\} \tag{2}$$
where $C_r$ is the concentration of emissions from source n over time period m at a receptor location, (x, y, z) the Cartesian coordinates of the receptor in the downwind direction from the source, Q the emission rate from source n over time period m, u the wind speed, $H_e$ the effective height of the plume centerline above ground, and $\sigma_y$, $\sigma_z$ are the dispersion coefficients in the y- and z-directions, respectively.
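A minimal Python sketch of the Gaussian plume expression (2) follows. The function name `gaussian_plume` and its argument names are hypothetical, and the dispersion coefficients are assumed to have been computed already for the receptor's downwind distance.

```python
import numpy as np

def gaussian_plume(y, z, Q, u, sigma_y, sigma_z, H_e):
    """Concentration from (2) at crosswind offset y and height z."""
    crosswind = np.exp(-y**2 / (2.0 * sigma_y**2))
    direct = np.exp(-(z - H_e)**2 / (2.0 * sigma_z**2))      # plume centerline at height H_e
    reflected = np.exp(-(z + H_e)**2 / (2.0 * sigma_z**2))   # image source below the ground
    return Q / (2.0 * np.pi * u * sigma_y * sigma_z) * crosswind * (direct + reflected)
```

For a ground-level source and receptor (y = 0, z = 0, H_e = 0) the two vertical terms are both 1, so the expression reduces to Q / (π u σ_y σ_z).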
The transport is in the x-direction at wind speed u, and the contaminant is dispersed in the y- and z-directions with standard deviation of the spread given by the dispersion coefficients. The dispersion coefficients are computed from (Beychok 1994):
$$\sigma = \exp\left\{ I + J \ln(x) + K \left[ \ln(x) \right]^2 \right\} \tag{3}$$
where x is the downwind distance (in km) and I, J, and K are empirical coefficients dependent on the Pasquill Stability Class (Pasquill 1961), which characterizes the atmospheric turbulence scales. In a highly unstable atmosphere (stability class A), large eddies mix the contaminant over a relatively large physical extent; thus, the dispersion coefficients are large for this case. In contrast, for stable conditions such as form on a calm night, few eddies exist and mixing is slight (stability classes E and F), resulting in smaller dispersion coefficients. For the slightly unstable (classes B and C) and neutral (class D) cases, the coefficient values fall between those extremes. The dispersion coefficients can be looked up in tables to produce $\sigma_y$ and $\sigma_z$ (Beychok 1994).
The predicted concentrations $C_r$ from (2) and the monitored data $R_r$ are the concentration values that are compared in the cost function (1).
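The form of (3) can be sketched as a small Python helper. The function name is hypothetical, and the coefficient values in the example are placeholders chosen only to exercise the formula; they are not the tabulated values from Beychok (1994), which would have to be supplied per stability class and per direction.

```python
import math

def dispersion_coefficient(x_km, I, J, K):
    """Evaluate sigma = exp(I + J ln x + K (ln x)^2) from (3), with x in km."""
    if x_km <= 0:
        raise ValueError("downwind distance must be positive")
    ln_x = math.log(x_km)
    return math.exp(I + J * ln_x + K * ln_x * ln_x)

# Placeholder coefficients for illustration only (NOT the tabulated values).
I_demo, J_demo, K_demo = 4.0, 1.0, -0.01
sigma_at_1km = dispersion_coefficient(1.0, I_demo, J_demo, K_demo)
```

Because ln(1) = 0, the value at x = 1 km is simply exp(I), which gives a quick sanity check when coefficients are entered from a table.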
2.3 Impact of atmospheric stability
In our previous work (Allen et al. 2007a; Haupt 2005,
2007; Haupt et al. 2006; Long et al. 2009), we assumed a
neutrally stable atmosphere and used values of I, J, and K
appropriate for that assumption. The shape of a plume
varies greatly, however, with differing stability classes.
This sensitivity occurs because, in the atmosphere,
turbulent dispersion is much larger than molecular disper-
sion; therefore, the turbulence scales of the atmosphere
determine the plume spread. Pasquill (1961)
defined six atmospheric stability classes, each with its own
characteristic scales. The most unstable class (stability A or
1) results in large-scale turbulent motion so the contaminant
can be carried to higher elevations. In contrast, the most
stable classes (such as stability F or 6) are characterized
by smaller scale atmospheric eddies so the contaminant
does not spread as much in either the crosswind or vertical
directions. Figure 2 demonstrates that point by comparing
concentration isosurfaces for stable, neutral, and unstable
atmospheric conditions.
The six stability classes are discrete and thus are most
readily coded as integer values. Thus, the GA must be able
to optimize both real and integer parameters.
2.4 Impact of rainout
Additionally a binary variable is added to the cost function
to indicate whether or not it is raining: B = 0 indicates a
‘‘no rain’’ condition and B = 1 means ‘‘rain.’’ Steady rain
decreases the concentration exponentially with distance
because it eliminates a fixed fraction of the contaminant in
each time interval, so the concentration equation in the case
of rain becomes:
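$$C_r = e^{-\alpha x} \frac{Q}{2\pi u \sigma_z \sigma_y} \exp\left( \frac{-y^2}{2\sigma_y^2} \right) \left\{ \exp\left[ \frac{-(z - H_e)^2}{2\sigma_z^2} \right] + \exp\left[ \frac{-(z + H_e)^2}{2\sigma_z^2} \right] \right\} \tag{4}$$
where $\alpha$ is a constant that determines the rainout rate. Here x is the downwind distance in km, $\alpha = 0$ for “no rain,” and $\alpha = 0.75\ \mathrm{km}^{-1}$ for “rain” situations. Figure 3 compares a plume for stability 2 with and without rainout.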
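The only difference between (4) and (2) is the exponential prefactor, which can be sketched as a separate multiplier. The function and argument names below are hypothetical; the rate 0.75 per km is the value quoted in the text.

```python
import math

def rainout_factor(x_km, raining, alpha_rain=0.75):
    """Multiplier exp(-alpha * x) from (4); equals 1 when it is not raining."""
    alpha = alpha_rain if raining else 0.0
    return math.exp(-alpha * x_km)
```

With this rate, a receptor 1 km downwind sees exp(-0.75), or about 47% of the dry concentration, and the fraction keeps falling exponentially with distance.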
2.5 Solution domain and variable settings
The transport and dispersion model calculations presented
here use a 32 × 32 grid of receptors (TR = 32 × 32 = 1,024) with the source located in the center of the grid as seen in Fig. 3. In prior work (Long et al. 2009) we showed that a grid of this size is quite robust for such back-calculations of source and meteorological parameters, even in the presence of noise. The target solution uses u = 5 m s⁻¹, θ = 180° (wind from the south in the meteorological convention of wind direction), and a stability class of 4.¹
We conduct three sets of numerical experiments to test
the algorithm. First, we back-calculate the basic meteoro-
logical variables: wind direction, wind speed, and stability
classification. Then we compute the location (x, y) and
strength of the source in addition to wind direction and
stability class. The third experiment includes the meteo-
rological variables from the first experiment plus a binary
variable to indicate whether or not it is raining.
Fig. 2 Comparison of concentration isosurfaces for stabilities 1 (a), 3 (b), 4 (c), and 6 (d). Note the different vertical coordinate
¹ The algorithm was tested for sensitivity to changes in these parameters and found to be insensitive. Therefore, although this single case is shown here, we expect that the same results are attainable for other values of the variables.
2.6 Solution technique
The algorithm used to optimize the agreement between
monitored data and predicted concentrations is a MIGA.
Figure 4 is a flow diagram of a genetic algorithm applied to
the source characterization problem solved here. The
algorithm begins with a population of chromosomes, which
comprise first guesses of the variables to be optimized. The
GA works with many such guesses at once, so a matrix of
trial solutions is formed with chromosomes as the rows.
Initially, all of the chromosomes in the population matrix
contain random values, in this case, between 0 and 1. This
matrix is passed to the cost function and a column vector of
costs is returned. In a process known as natural selection,
the best chromosomes are retained in the population while
the ‘‘unfit’’ ones are discarded. The remaining population
then undergoes two operators, mating and mutation, that
generate new potential solutions. The operation of mating
combines the variable values from the best trial solutions
(the mating pool or parents) to produce a new population of
improved variable estimates, the offspring. The mutation
operator then modifies a set of those chromosomes of
parents and offspring that now form the population in order
to maintain an adequate sampling of the variable space,
thus preventing premature convergence to a suboptimal set
of variable values. The process repeats until an adequate
set of variables is identified. At that point, the source and
meteorology variables have been tuned to produce the best
agreement between predicted concentrations and those
observed. The GA is quite robust at solving difficult nonlinear
coupled optimization problems with a multitude of
local minima that are difficult for traditional techniques
(Goldberg 1989; Haupt and Haupt 2004; Holland 1975).
More details of the GA technique are described in Haupt
and Haupt (2004).
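The generational loop described above can be sketched as follows. This is a simplified stand-in under stated assumptions, not the authors' implementation: parents are drawn uniformly from the survivors, mutation is a plain random reset, and the best chromosome is shielded from mutation (elitism), none of which is specified in the text. All names are hypothetical.

```python
import numpy as np

def run_ga(cost_fn, n_var, n_pop=12, n_gen=200, mut_rate=0.2, seed=0):
    """Toy GA loop: evaluate, keep the best half, mate, mutate, repeat."""
    rng = np.random.default_rng(seed)
    pop = rng.random((n_pop, n_var))           # rows are chromosomes in [0, 1)
    n_keep = n_pop // 2                        # survivors of natural selection
    for _ in range(n_gen):
        costs = np.array([cost_fn(c) for c in pop])
        pop = pop[np.argsort(costs)]           # lowest cost first
        for k in range(n_keep, n_pop):         # replace the discarded half
            p1, p2 = pop[rng.integers(0, n_keep, size=2)]
            mask = rng.random(n_var) < 0.5
            pop[k] = np.where(mask, p1, p2)    # offspring mixes parent values
        mutate = rng.random((n_pop, n_var)) < mut_rate
        mutate[0] = False                      # assumption: spare the current best
        pop[mutate] = rng.random(np.count_nonzero(mutate))
    costs = np.array([cost_fn(c) for c in pop])
    best = int(np.argmin(costs))
    return pop[best], float(costs[best])
```

In the source characterization problem, `cost_fn` would decode the chromosome, run the dispersion model, and return the cost (1); any function of a vector in [0, 1) can be substituted for testing.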
The MIGA used here minimizes cost functions that are
comprised of real number continuous variables, integer
variables, and binary variables. We configure the MIGA to
minimize the cost defined in (1). Including the integer and
binary variables in the search space necessitates a method
to simultaneously optimize integer, binary, and real values.
3 The mixed integer genetic algorithm
We describe several unique features of the MIGA used
here and first introduced in Haupt (2007), including:
• all variables are represented with values between zero and one,
• the uniform crossover mating operation is used,
• mutations occur on an entire chromosome rather than on an individual variable, and
• all scaling and mapping of the variables occurs in the cost function.
This MIGA is versatile because the same algorithm can
be used for optimizing combinations of binary, integer, and
real variables.
3.1 Variable coding and chromosomes
In order to make the MIGA as flexible as possible, all
variables are mapped to continuous values between 0.0 and
1.0. The term continuous, as used here, specifies continu-
ous over the finite range of 0.0–1.0. A simple transforma-
tion in the cost function maps it to the appropriate range for
each variable. If a variable has an integer or binary value,
then the cost function will map it to a discrete value within
that range. The benefit of this approach is that all the scaling, quantizing, and rounding happen in the cost function, so that the MIGA operates independent of the variable type. There is no need for a binary GA or a real GA, because the operators work with any combination of variable types. A chromosome can have any mix of real, integer, and binary variables.
Fig. 3 Surface concentration of plume at stability 2 with (a) and without (b) rainout with α = 0.75 in (4)
Because of this mapping the initial population matrix is an $N_{pop} \times N_{var}$ matrix of uniform random numbers in the range of 0.0–1.0, where $N_{var}$ is the number of variables to be optimized and $N_{pop}$ is the population size.
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N_{var}} \\ a_{21} & a_{22} & & \vdots \\ \vdots & & \ddots & \vdots \\ a_{N_{pop}1} & \cdots & \cdots & a_{N_{pop}N_{var}} \end{bmatrix} \tag{5}$$
Each row is a chromosome that serves as an input to the cost function (1). Inside the cost function, the domain-limited real variables of each chromosome are converted to the problem’s true variable types. Real variables are mapped by
$$x_n = (x_{\max} - x_{\min})\, a_{mn} + x_{\min} \tag{6}$$
The integer value mapping is
$$x_n = \mathrm{roundup}\{ (x_{\max} - x_{\min})\, a_{mn} \} + x_{\min} \tag{7}$$
where ‘roundup’ rounds to the next highest integer and $x_n$ values are integers. Where necessary, the variable is converted to binary by either rounding its value:
$$x_n = \mathrm{round}\{ a_{mn} \} \tag{8}$$
or by quantizing its value:
$$x_n = \mathrm{quantize}\{ a_{mn} \} \tag{9}$$
Let us consider a case where we wish to optimize wind speed, wind direction, stability, and rainout. For wind speed, we choose $x_{WS,\min} = 0$ and $x_{WS,\max} = 20$. Wind direction uses $x_{WD,\min} = 0$ and $x_{WD,\max} = 360$. There are six discrete stability classes that are decoded by assigning each an integer value of 1 through 6 so that $x_{stab,\min} = 1$ and $x_{stab,\max} = 6$. The binary value that determines whether or not it is raining is determined via (8). If the rainout value is 1, the concentration is determined by (4). If it is 0, then (2) is used instead.
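The decoding step for this four-variable case can be sketched as below, applying (6), (7), and (8) literally. The function name `decode` is hypothetical, and sending a value of exactly 0.5 to 1 in the binary rounding is an assumption, since the text does not specify tie-breaking. One observation: taken literally, (7) yields the lowest class only when the chromosome value is exactly zero.

```python
import math

def decode(chrom):
    """Map four values in [0, 1] to wind speed, wind direction, stability, rain."""
    a_ws, a_wd, a_stab, a_rain = chrom
    wind_speed = (20.0 - 0.0) * a_ws + 0.0        # real mapping (6)
    wind_dir = (360.0 - 0.0) * a_wd + 0.0         # real mapping (6)
    stability = math.ceil((6 - 1) * a_stab) + 1   # integer mapping (7), classes 1..6
    rain = 1 if a_rain >= 0.5 else 0              # binary rounding (8); tie rule assumed
    return wind_speed, wind_dir, stability, rain

example = decode((0.25, 0.5, 0.5, 0.9))
```

Here `example` decodes to a wind speed of 5.0, a direction of 180.0, stability class 4, and the rain switch on. The GA itself never sees these physical values; it only manipulates the numbers in [0, 1].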
The next step in the algorithm is natural selection. This
process occurs via the cost function: in this case, the variables
in each chromosome are used to compute C using (2) or
(4). C is then fed into the cost function (1). Chromosomes
with low costs survive, while chromosomes with high costs
are discarded. This step either keeps a certain percentage of
the population or discards members with costs that exceed
a certain level. Surviving chromosomes become the mating
pool. Discarded chromosomes from the population are
replaced by new offspring chromosomes. In order to create
the offspring, parents must be selected. This current
application replaces 50% of the population at each gener-
ation. We use tournament selection (Haupt and Haupt
2004). In general, two parents produce two offspring that
replace two discarded chromosomes.
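Natural selection and parent choice as described above might be sketched like this. The function names are hypothetical, and the tournament size of two is an assumption; the text states only that tournament selection is used and that 50% of the population is replaced.

```python
import numpy as np

def natural_selection(pop, costs, keep_fraction=0.5):
    """Return surviving chromosomes (lowest cost first) and their costs."""
    order = np.argsort(costs)
    n_keep = max(2, int(round(keep_fraction * len(pop))))
    keep = order[:n_keep]
    return pop[keep], costs[keep]

def tournament_pick(pool_costs, rng, size=2):
    """Draw `size` distinct pool members at random; return the index of the cheapest."""
    entrants = rng.choice(len(pool_costs), size=size, replace=False)
    return int(entrants[np.argmin(pool_costs[entrants])])
```

Calling `tournament_pick` twice yields the two parents for one mating; the surviving pool is left untouched, and only the discarded rows are overwritten by offspring.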
3.2 Mating
Mating between two selected chromosomes uses uniform
crossover, which is preferable for a MIGA because uniform
crossover provides a larger exploration of the cost surface
than other approaches to crossover (Haupt and Haupt
2004). First, a random binary mask is created consisting of
ones and zeros with the same length as the chromosome. A one in the mask column means that the offspring receives the value of that variable from parent#1, a zero that it receives it from parent#2. As an example:
$$\begin{aligned} \text{parent\#1} &= [\, a_{m1} \;\; a_{m2} \;\; a_{m3} \;\; a_{m4} \,] \\ \text{parent\#2} &= [\, a_{n1} \;\; a_{n2} \;\; a_{n3} \;\; a_{n4} \,] \\ \text{mask} &= [\, 1 \;\; 0 \;\; 1 \;\; 1 \,] \\ \text{offspring} &= [\, a_{m1} \;\; a_{n2} \;\; a_{m3} \;\; a_{m4} \,] \end{aligned} \tag{10}$$
If the matrix elements represent binary variables, then this type of crossover results in a diversity of values. If the elements represent continuous or integer variables then this operation merely interchanges the values between chromosomes. Consequently, mutation is the primary incubator of diversity within the population for continuous and integer values in this algorithm.
Fig. 4 Diagram indicating flow of logic for a mixed integer genetic algorithm applied to the source characterization problem
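Uniform crossover with a binary mask, as in (10), can be sketched as follows. The function name is hypothetical, and producing the second offspring from the complementary mask is an assumption; the text says only that two parents generally produce two offspring.

```python
import numpy as np

def uniform_crossover(parent1, parent2, rng=None, mask=None):
    """Mix two chromosomes element-wise according to a binary mask."""
    if mask is None:
        mask = rng.integers(0, 2, size=parent1.shape[0])
    mask = np.asarray(mask, dtype=bool)
    child1 = np.where(mask, parent1, parent2)   # 1 -> take from parent 1
    child2 = np.where(mask, parent2, parent1)   # assumption: complementary mask
    return child1, child2

p1 = np.array([0.1, 0.2, 0.3, 0.4])
p2 = np.array([0.5, 0.6, 0.7, 0.8])
c1, c2 = uniform_crossover(p1, p2, mask=[1, 0, 1, 1])
```

With the mask from (10), the first offspring takes its second element from parent 2 and the rest from parent 1. No new values are created, which is why mutation carries the burden of introducing diversity.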
3.3 Mutation
The simplest approach to mutation is to randomly select
variables in the population and replace them with uniform
random values. Indeed, this is the first step of the mutation
operation. If the third element of a chromosome is selected
for mutation, the mutated chromosome (chrom′) is derived from the selected chromosome (chrom) by
$$\mathrm{chrom}' = [\, a_{r1} \;\; a_{r2} \;\; a'_{r3} \;\; a_{r4} \,] \tag{11}$$
where the primed values are new uniform random numbers.
The second step of the mutation operator used here adds a random adjustment factor to the chromosome selected for mutation. The correction factor comes from multiplying each element within the chromosome by a random number ($-1 \le \beta_{rm} \le 1$) and multiplying the resulting chromosome by a mutation factor ($0 \le \alpha_r \le 1$) so that
$$\mathrm{chrom}_c = \alpha_r [\, \beta_{r1} a_{r1} \;\; \beta_{r2} a_{r2} \;\; \beta_{r3} a_{r3} \;\; \beta_{r4} a_{r4} \,] \tag{12}$$
Finally, the mutated chromosome is given by
$$\mathrm{chrom}' = \mathrm{rem}\{ \mathrm{chrom} + \mathrm{chrom}_c \} \tag{13}$$
where rem is the remainder function (digits to the left of the decimal point are ignored).
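The two-step mutation of (11) to (13) can be sketched as below. Several choices are assumptions not fixed by the text: the mutation factor is passed as a fixed parameter (`factor`, default 0.5), the second step is applied to the chromosome after the first-step reset, and the remainder is taken with `np.mod`. Because each element becomes a(1 + αβ) with αβ between -1 and 1, the sum is never negative, so the modulo is the same as dropping the integer part.

```python
import numpy as np

def mutate_chromosome(chrom, rng, factor=0.5, reset_index=None):
    """Two-step mutation: optional random reset (11), then scaled perturbation (12)-(13)."""
    out = np.array(chrom, dtype=float)
    if reset_index is not None:
        out[reset_index] = rng.random()             # step 1: new uniform value
    beta = rng.uniform(-1.0, 1.0, size=out.shape)   # one random multiplier per element
    correction = factor * beta * out                # chrom_c in (12)
    return np.mod(out + correction, 1.0)            # wrap back into [0, 1), as in (13)
```

The wrap-around keeps every mutated value a valid input to the decoding step, so no clipping or rejection is needed after mutation.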
4 Results
4.1 Three-variable problem
The first numerical experiment optimizes the base meteo-
rological parameters used for computing dispersion: wind
speed, wind direction, and Pasquill–Gifford stability class
(integer). Note that the stability class determines the dis-
persion coefficients and thus the spread of the plume. The
MIGA was run with $N_{pop} = 12$ and a mutation rate for the
first step of mutation set to 0.2.
First a single run was accomplished using 5,000 gener-
ations to assess convergence properties. A plot of the con-
vergence appears in Fig. 5. For this case, the best solution
converges in about 1,200 generations. Based on this result, a
margin of error is added and the remaining three-variable
runs use 2,000 generations. Note that the high mutation rate
forces the algorithm to continually try new solutions; thus,
the mean solution does not change much.
The results of the first series of optimizations appear in
Table 1. That table reports the statistics of ten independent
runs of 2,000 generations each for optimizing the three
meteorological parameters. It is apparent that the GA is
quite reliable for this back-calculation. Not only is the
mean value of each variable quite close to the known exact
solution, but the standard deviations are quite small. In
fact, the integer denoting the stability category is consis-
tently diagnosed correctly. In additional runs (not shown),
we were assured that this calculation is not sensitive to
stability category.
Fig. 5 Convergence of first numerical experiment on a 32 × 32 grid for stability 4
Table 1 Results of ten MIGA optimizations of meteorological parameters
          Wind speed (m/s)   Wind direction (°)   Stability class
Actual    5.000              180.000              6
Mean      4.990              180.026              6
Median    4.987              180.028              6
SD        0.226              0.014                0
4.2 Five-variable problem
Now, the problem is reconfigured to include the source location variables in the inversion. Each of these has $x_{\min}$ = -8,000 m and $x_{\max}$ = 8,000 m. The actual source is located at (0, 0). We back-calculate the source strength using $x_{Q,\min} = 0$ and $x_{Q,\max} = 10$ (these are scaling factors on a non-dimensional emission rate, Q). Note that source strength, Q, and wind speed, u, appear as a ratio in (2), so their effects on the concentration values, C, and thus the cost function value have a high (albeit negative) correlation; therefore trying to distinguish between the two produces an ill-posed problem. So here we choose to assume that wind speed is known for this calculation.²
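The degeneracy between Q and u can be seen in a small numerical check. The sketch below evaluates (2) at a ground-level receptor on the plume centerline for a ground-level source (y = 0, z = 0, H_e = 0, chosen for simplicity); the function name and the dispersion coefficient values are arbitrary placeholders.

```python
import math

def surface_centerline(Q, u, sigma_y, sigma_z):
    """Expression (2) at y = 0, z = 0 with H_e = 0: Q / (pi u sigma_y sigma_z)."""
    return Q / (math.pi * u * sigma_y * sigma_z)

# Doubling both source strength and wind speed leaves the concentration unchanged.
base = surface_centerline(Q=1.0, u=5.0, sigma_y=30.0, sigma_z=15.0)
scaled = surface_centerline(Q=2.0, u=10.0, sigma_y=30.0, sigma_z=15.0)
print(math.isclose(base, scaled))
```

Since only the ratio Q/u enters, every receptor reports the same value for both parameter pairs, and no amount of steady-state sensor data can separate them.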
Table 2 reports statistics of ten independent runs, each
run for 10,000 generations,³ when optimizing wind direction,
stability, the (x, y) location and strength of the source.
Generally, the MIGA is successful at finding the correct
values of the five parameters. The value of source strength
(a factor that multiplies the emission rate) is not quite as
close to actual, but still has a reasonable percent error
(taken as the difference from the actual divided by the
range). Although the magnitude of the source location error
appears large, it is small on the scale of the search domain,
-8,000 to 8,000 m. The final row of Table 2 lists the
difference between the mean and the actual as a percentage
skill score (based on the search range). Based on these
scores, the source location has been pin-pointed rather
accurately.
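As a worked check of the percent error defined above, the score can be written as follows; taking the absolute value of the difference is an assumption.

```latex
\[
\%\,\mathrm{error} = 100 \times \frac{\left| \bar{x} - x_{\mathrm{actual}} \right|}{x_{\max} - x_{\min}}
\]
\[
\text{source strength: } 100 \times \frac{|1.38 - 1.00|}{10 - 0} = 3.8,
\qquad
y \text{ location: } 100 \times \frac{|24.6 - 0.0|}{8000 - (-8000)} \approx 0.15
\]
```

Both values agree with the corresponding entries in the final row of Table 2.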
4.3 Four-variable problem
The third numerical experiment seeks to optimize four
variables: wind speed, wind direction, stability class, and
rainout switch (a binary number). This experiment thus
tests the MIGA’s ability to optimize real, integer, and
binary variables simultaneously. Each of the ten indepen-
dent runs used 300 generations. The results appear in
Table 3. The real valued meteorological variables, wind
speed and direction are consistently retrieved to a high
degree of accuracy. Both the integer and binary variables
were computed correctly in each of the ten runs with
standard deviations of 0. Thus we conclude that the MIGA
is an effective and reliable tool for such problems.
5 Discussion
The MIGA is a useful advance of computing technology
that allows joint optimization of integer, binary, and con-
tinuous variables. Its utility is demonstrated in a CBRN
security application—back-calculating source and meteo-
rological parameters for subsequent dispersion modeling.
Specifically, it allows adding the computation of stability
category and rainout switch in a way that would be difficult
for more traditional techniques. This work could aid
decision-makers by giving better estimates of contaminant
dispersion.
Note that there are several limitations of the current
work that affect its applicability. First, the demonstration
application has been done in the context of an identical
twin environment. Although this environment is quite
useful for algorithm testing as it allows verification against
a known solution, it does not, however, permit assessment
of the algorithm’s ability to cope with the unknown
vagaries of actual field monitored data. The second limi-
tation is that this work has been done in a noiseless envi-
ronment. In a real situation, there would be noise due to
error in the measurements, the inability of the ensemble
average model to match a specific realization, and the
inherently chaotic fluctuations due to atmospheric turbulence.
Incorporating synthetic noise, as was done in prior
work (Haupt et al. 2006; Long et al. 2009), could help
simulate these effects and thus provide a first estimate of
the algorithm’s sensitivity to them. In addition, sensors
have detection and saturation limits. Rodriguez et al.
(2009) showed that it is important to build such limits into
the models. Finally, the rainout simulation is not physically
realistic. Although some meteorological forethought went
into formulating (4), the basic form is ad-hoc and the
rainout rate (a) is not based on any observations and lacks a
dependence on rain rate. That part of the calculation was solely for demonstrating that a binary variable could be encoded and used in the inversion.
Table 2 Results of ten MIGA optimizations of meteorological parameters plus source location and strength
          Source strength   Wind direction (°)   Stability class   x (m)    y (m)
Actual    1.00              180.00               4                 0.0      0.0
Mean      1.38              180.41               4                 -9.9     24.6
Median    1.20              180.30               4                 -22.0    46.7
SD        0.67              0.32                 0                 25.6     72.3
% Error   3.8               0.01                 0.00              0.00     0.15
Table 3 Results of ten MIGA optimizations of meteorological parameters including the binary rainout switch
          Wind speed (m/s)   Wind direction (°)   Stability class   Rainout switch
Actual    5.000              180.000              2                 1
Mean      5.015              180.055              2                 1
Median    4.998              180.055              2                 1
SD        0.055              0.040                0                 0
² One way to address the coupling of Q and u is to employ a Gaussian puff model that provides a time-varying concentration field. In that case the additional information allows computing both parameters (Long et al. 2009). This approach, however, is not appropriate for the continuous release considered here.
³ More generations are required to assure convergence when including these additional variables.
6 Conclusions
This work has introduced a new tool, the MIGA, that is
capable of optimizing integer, binary, and continuous
variables simultaneously. This algorithm was demonstrated
to be effective at back-calculating source and meteorology
information necessary to model dispersion of a CBRN
release if field sensor data is available. The biggest
advantage of this algorithm is its ability to find the global
optimum of such a highly nonlinear problem with a com-
plex cost surface with many local optima. The biggest
disadvantage of this algorithm is that it is not fast—a run of
the four-variable problem takes about 17 min on a desktop
PC. Although this sounds slower than standard steepest
descent algorithms, those algorithms are not successful in
solving this problem. In fact, when a Nelder–Mead
downhill simplex algorithm (Nelder and Mead 1965) is
applied to this problem, it cannot find the solution.
Therefore, the robustness of the MIGA is the important
factor that makes it a clear winner for this application.
Future work will proceed on two fronts. First, we will
concentrate on further tuning and testing the MIGA. There
are a myriad of ways to tweak the GA operators that will be
explored. Such algorithm modifications should be tested in
the context of test functions to examine algorithm behavior
in a carefully constructed environment. The second front will
be in applying the MIGA to broader problems, including
extensions of the current demonstration problem. For
example, we will examine the robustness of the results in
the face of noise in the data. We will also optimize addi-
tional variables, such as simultaneously optimizing source
strength and wind direction as well as including effective
source height and the height of the atmospheric boundary
layer. For these additional variables, we will use an
instantaneous source model and a puff dispersion model.
We could study these variables for various receptor con-
figurations and compute the amount of information nec-
essary to complete an inversion, both without noise and in
the presence of noise. Note that even further complexity
could be added by incorporating a more refined dispersion
model and using a more refined model or real data to relax
the identical twin assumption. Finally, it would be inter-
esting to use the MIGA to optimize the number of receptors
and their location.
In summary, the MIGA described here has proven to be
a powerful optimization tool that could be applied to many
problems, including those in the CBRN defense arena.
Acknowledgments
The authors would like to thank Kerrie Long for making Fig. 2. Many helpful discussions with Christopher Allen, Kerrie Long, Anke Beyer-Lout, Andrew Annunzio, Yuki Kuroki, Lili Lei, and Luna Rodriguez helped inspire this work. The third author also expresses his eternal gratitude to Francis de Sales and John Bosco for support in manuscript preparation.
References
Allen CT, Haupt SE, Young GS (2007a) Source characterization with
a genetic algorithm-coupled receptor/dispersion model incorpo-
rating SCIPUFF. J Appl Meteorol 46(3):273–287
Allen CT, Young GS, Haupt SE (2007b) Improving pollutant source
characterization by optimizing meteorological data with a
genetic algorithm. Atmos Environ 41:2283–2289
Beychok MR (1994) Fundamentals of stack gas dispersion, 3rd edn.
Milton Beychok, Pub., Irvine, CA, 193 pp
Goldberg DE (1989) Genetic algorithms in search, optimization, and
machine learning. Addison-Wesley, New York
Haupt SE (2005) A demonstration of coupled receptor/dispersion
modeling with a genetic algorithm. Atmos Environ 39:7181–
7189
Haupt RL (2007) Antenna design with a mixed integer genetic
algorithm. IEEE AP-S Trans. 55(3):577–582
Haupt RL, Haupt SE (2004) Practical genetic algorithms, 2nd edn
with CD. John Wiley & Sons, New York, NY
Haupt SE, Young GS, Allen CT (2006) Validation of a receptor/
dispersion model coupled with a genetic algorithm using
synthetic data. J Appl Meteorol 45:476–490
Holland JH (1975) Adaptation in natural and artificial systems. The
University of Michigan Press, Ann Arbor
Long KJ, Haupt SE, Young GS (2009) Assessing sensitivity of source
term estimation. Atmos Environ (submitted)
Nelder JA, Mead R (1965) A simplex method for function minimi-
zation. Comput J 7:308–313
Pasquill F (1961) The estimation of the dispersion of windborne
material. Meteorol Mag 90:33–49
Rodriguez LM, Haupt SE, Young GS (2009) Impact of sensor
characteristics on source characterization for dispersion model-
ing. Measurement (in revision)