An Interaction-Based Approach to Computational Epidemiology.
Christopher L. Barrett, Stephen Eubank and Madhav V. Marathe∗
Epidemiology is the study of patterns of health in a popula-
tion and the factors that contribute to these patterns. Compu-
tational Epidemiology is the development and use of com-
puter models to understand the spatio-temporal diffusion
of disease through populations. An important factor that
greatly influences an outbreak of an infectious disease is the
structure of the interaction network across which it spreads.
Aggregate or collective computational epidemiology models,
which have been studied in the literature for over a century,
often assume that a population is partitioned into a few
subpopulations (e.g. by age) with a regular interaction structure
within and between subpopulations. Although useful for
obtaining analytical expressions for a number of interesting
parameters such as the numbers of sick, infected and recovered
individuals in a population, this approach does not capture the
complexity of human interactions that serves as a mechanism for
disease transmission. In other words, the aggregate approach
does not take the structure of the underlying social network into
account. Additionally, the number of different subpopulation
types considered is small, and parameters such as mixing
rate and reproductive number are either unknown or hard
to observe.
Here we describe Simdemics: an interaction-based multi-
agent approach to support epidemic planning for large urban
regions. Simdemics is an example of a disaggregated modeling
approach in which interactions between every pair of
individuals are represented. It is based on the idea that a better
understanding of the characteristics of the social contact
network can give better insights into disease dynamics and
intervention strategies for epidemic planning.
Simdemics details the demographic and geographic dis-
tributions of disease and provides decision makers with in-
formation about (1) the consequences of a biological attack
or natural outbreak, (2) the resulting demand for health ser-
vices, and (3) the feasibility and effectiveness of response
options. A unique feature of Simdemics is the size and scale
of urban regions that can be analyzed using it.
Copyright © 2008, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Simdemics uses a number of concepts studied in traditional
and contemporary AI literature. These include
multi-agent systems, social network analysis, Markov decision
processes and large n-way games [4, 8, 11]. However, the
practical use of this tool prompted the investigation of several
new basic and applied research questions. For example,
we had to develop new, efficient HPC-oriented algorithmic
techniques to generate and analyze dynamic social networks
and simulate the diffusion of diseases on these dynamic social
networks [13, 20]. These algorithms were implemented so
that they can scale to 100 million node networks and can
be mapped onto 100-1000 processor shared-memory
multiprocessor architectures [8, 13, 20]. Similarly, scalable data
mining methods are being developed to analyze the vast data
sets that are produced by Simdemics. These scalable
simulations and mining algorithms form the basis of practical
and usable decision support systems that we have built
and are continually enhancing [1, 3].
The overall approach consists of composing four distinct
models: Step 1. Model for creating a set of synthetic
individuals, Step 2. Model for generating (time-varying)
interaction networks, Step 3. Model for simulating the
epidemic process, and Step 4. Model for representing and
evaluating interventions and public policies. The overall
mathematical model consists of two parts: (i) a discrete
dynamical system framework that captures the co-evolution
of disease dynamics, social network and individual behavior
(first three steps) and (ii) a partially observable Markov
decision process that captures various control and optimization
problems formulated on the phase space of this dynamical
system. See [13, 20] for more details.
Step 1 creates a synthetic urban population by integrating
a variety of databases from commercial and public sources
into a common architecture for data exchange. The process
preserves the confidentiality of the original data sets, yet
produces realistic attributes and demographics for the
synthetic individuals. The synthetic population is a set of
synthetic people and households, located geographically, each
associated with demographic variables drawn from any of
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)
distributions can be reconstructed from the marginal
distributions available in typical census data using an iterative
proportional fitting (IPF) technique. Each synthetic
individual is placed in a household with other synthetic people and
each household is located geographically in such a way that
a census of our synthetic population yields results that are
statistically indistinguishable from the original census data,
if they are both aggregated to the block group level.
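The iterative proportional fitting step mentioned above can be illustrated with a minimal sketch. It assumes a two-way table (say, age group by household size) and made-up marginal totals; the function name `ipf`, the seed table and the target totals are illustrative assumptions, not the data or code used in Simdemics.

```python
def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Rescale a 2-D seed table until its margins match the targets.

    Rows and columns are scaled alternately. The two target vectors
    are assumed to have the same grand total.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Column step: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly, so only the rows need checking.
        row_error = max(abs(sum(table[i]) - target)
                        for i, target in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical inputs: a small sample cross-tabulation and census margins.
seed = [[1.0, 2.0], [3.0, 4.0]]
age_totals = [40.0, 60.0]    # assumed row margins
size_totals = [30.0, 70.0]   # assumed column margins
fitted = ipf(seed, age_totals, size_totals)
```

The fitted table reproduces both sets of margins while keeping the association pattern of the seed table, which is why marginal census tables plus a small sample can stand in for an unavailable joint distribution.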
In Step 2, a set of activity templates for households is
determined, based on several thousand responses to an activity
or time-use survey. These activity templates include the sort
of activities each household member performs and the time
of day they are performed. Thus, for a city, demographic
information for each person and location and a
minute-by-minute schedule of each person’s activities and the locations
where these activities take place are generated by a
combination of simulation and data fusion techniques. This yields
a dynamic social contact network represented by a (vertex
and edge) labeled bipartite graph G_PL, where P is the set of
people and L is the set of locations. If a person p ∈ P visits
a location ℓ ∈ L, there is an edge (p, ℓ, label) ∈ E(G_PL)
between them, where label is a record of the type of activity
of the visit and its start and end points. It is impossible
to build such a network by simply collecting field data; the
use of generative models to build such networks is a unique
feature of this work.
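As a rough illustration of the labeled bipartite graph described above, the sketch below stores each edge as (person, location, label), with the label holding the activity type and the start and end minute of the visit. Deriving person-to-person contact durations from overlapping visits to the same location is an assumption made here for illustration; the names, the toy schedule and the helper functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Edges (p, l, label) of a toy people-location graph.
# label = (activity type, start minute, end minute); all values are made up.
edges = [
    ("alice", "home_1", ("home", 0, 480)),
    ("bob", "home_1", ("home", 0, 450)),
    ("alice", "school_7", ("school", 510, 900)),
    ("carol", "school_7", ("school", 540, 930)),
    ("bob", "office_3", ("work", 500, 1000)),
]


def visits_by_location(edges):
    """Group visits by location so co-located people can be compared."""
    index = defaultdict(list)
    for person, location, (activity, start, end) in edges:
        index[location].append((person, start, end))
    return index


def contact_minutes(edges):
    """Total minutes each pair of people spends at the same location."""
    totals = defaultdict(int)
    for location, visits in visits_by_location(edges).items():
        for (p, s1, e1), (q, s2, e2) in combinations(visits, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if p != q and overlap > 0:
                totals[tuple(sorted((p, q)))] += overlap
    return dict(totals)


contacts = contact_minutes(edges)
```

In this toy schedule only two pairs ever share a location, so the derived contact network has two weighted edges. A city-scale version would need the parallel algorithms the text refers to rather than this pairwise loop.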
Step 3 consists of developing a computational model for
representing the disease within individuals and its transmis-
probabilistic timed finite state machine. Each individual is
associated with a timed probabilistic finite state machine.
Furthermore, the automata are connected to other automata
– this coupling is derived from the social contact network.
The state transition is probabilistic and is timed (i.e.
depends on the duration of contact). It may also depend on
the attributes of the people involved (age, profession, health
status, etc.) as well as the type of contact (intimate,
casual, etc.). states of individual automata as they update their
states in response to changes in internal state and state of
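The coupled timed probabilistic state machines described in Step 3 can be sketched as follows. Everything specific here is an assumption for illustration: three states (S, I, R), a daily time step, a fixed infectious period, and an infection probability of 1 - (1 - r)^t for a contact lasting t minutes with per-minute rate r. The source does not state the disease model in this form.

```python
import random


def infection_probability(rate_per_minute, minutes):
    """Chance of transmission over a contact of given length (assumed form)."""
    return 1.0 - (1.0 - rate_per_minute) ** minutes


def step(states, days_in_state, contacts, rate_per_minute, infectious_days, rng):
    """Advance every automaton by one day, updating states synchronously."""
    newly_infected = set()
    # Probabilistic transition S -> I, driven by neighbors in the contact network.
    for (p, q), minutes in contacts.items():
        for source, target in ((p, q), (q, p)):
            if states[source] == "I" and states[target] == "S":
                if rng.random() < infection_probability(rate_per_minute, minutes):
                    newly_infected.add(target)
    # Timed transition I -> R after a fixed infectious period.
    for person in states:
        days_in_state[person] += 1
        if states[person] == "I" and days_in_state[person] >= infectious_days:
            states[person] = "R"
            days_in_state[person] = 0
    for person in newly_infected:
        states[person] = "I"
        days_in_state[person] = 0


# Hypothetical three-person network with daily contact minutes.
states = {"alice": "I", "bob": "S", "carol": "S"}
days_in_state = {name: 0 for name in states}
contacts = {("alice", "bob"): 450, ("alice", "carol"): 360}
rng = random.Random(42)
for day in range(10):
    step(states, days_in_state, contacts,
         rate_per_minute=0.001, infectious_days=3, rng=rng)
```

Attributes such as age or contact type could enter through a per-pair rate instead of the single `rate_per_minute` used here.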
Step 4 consists of representing and analyzing various public
policies and interventions using a combination of partially
observable Markov decision processes (POMDPs) and n-way
games. This allows us to capture the sequential decision-making
process involved in studying the efficacy of various
interventions and the behaviors of individual agents in response to
their perception of disease spread. The POMDP is exponentially
larger than the problem specification and is intractable
to solve optimally in general. We thus resort to efficient
simulations. A key concept is that of implementable policies:
policies or interventions that are implementable in the real
world.
From Theory to Practice and Back to Theory
Simdemics has been under continuous development for the last 12
years. It has been used in a number of user-defined studies,
including recent pandemic planning studies undertaken for
DHS, DoD and DHHS. The studies have guided the continued
evolution of Simdemics. Equally important, these
studies helped us identify new research questions at the
interface of multi-agent modeling, data mining, network science
and high performance computing. For example, at the recent
request of federal agencies involved in preparing for
an influenza pandemic, Simdemics was used, as part of the
NIH-funded MIDAS group, to analyze combinations of strategies for
responding to influenza. Results of the MIDAS analysis were
reviewed in a Letter Report by the Institute of Medicine,
Modeling Community Containment for Pandemic Influenza
[23]. We discuss this below.
The MIDAS study considered both pharmaceutical and
non-pharmaceutical interventions (NPI) targeted at those
parts of the population where they might most effectively
control the spread of disease. NPIs aim to alter human so-
cial behaviors so as to mitigate an outbreak. They include
interventions such as closing schools or reducing contacts
at work and in the community [1, 23]. In the course of
this study, we found our overall methodology to be suitable
for estimating normal social contact patterns as well
as changes in patterns resulting from NPIs. Indeed,
it is difficult to generalize observations about transmission
in observed outbreaks to hypothetical circumstances with-
out such a generative, structurally calibrated model of so-
cial networks. For example, one can estimate from historical
outbreaks that roughly 35% of influenza transmission occurs
within a household and 65% occurs in the community (e.g.
at work, school, etc.). However, if the proportion of trans-
mission occurring in different contexts is a parameter of the
model, it becomes impossible to say how it might change
as people’s behaviors change. Moreover, since it is diffi-
cult to find out what spontaneous changes in behavior were
happening during the historical outbreaks, it may be that the
effects of NPIs on overall transmission patterns are already
included in transmission parameter estimates. Our methods
infer the proportions given a social network from assump-
tions about relative transmission rates between people with
different demographics, in effect separating the problem of
estimating the social network from the problem of estimating
transmission over that network. The MIDAS and other
recent studies have raised a number of new research questions
(and sometimes new twists on old ones) of interest to the AI
community. We give two examples of this below. The
examples are discussed in the context of public health
epidemiology, but can often be generalized to other socio-technical
systems.
(a) Dynamic Graphical Models: Multi-agent models to
study co-evolution of large social networks, individual be-
havior, disease dynamics and public policies. Over the last
several years, researchers in AI have been studying the inter-
action between individual actions and the multi-agent net-
works that they constitute. Graphical models of Bayesian
inference and games have been proposed and studied in
AI to capture the network structure inherent in certain
applications. Epidemics on social networks provide a useful
and realistic application for further studying graphical
games and inference problems [17, 21]. The inclusion of
public policies and disease dynamics as a part of this in-
teraction process motivates several new questions, ranging
from representation of realistic agent behaviors in crises
[1, 3, 10, 19] to design of heuristic methods for solving op-
timization problems modeled as a combination of POMDP
and n-way games to data mining methods for inferring spa-
tial and temporal patterns of disease spread to assist inter-
ventions [11, 18]. A single case study involved addressing
variants of all of the above questions. The size of the sys-
tems (10-300 million agents) makes the problems even more
challenging. The recent work on computational methods for
analysis of graphical models has shown how tree-like struc-
Nash equilibria [17, 22, 26]. Unfortunately, social networks
of urban regions do not have the tree-like property. They
are not even small-world networks or scale free networks as
defined in the current literature; see [8, 12]. Understanding
the structure of these networks and exploiting this structure
for designing efficient computational solutions is an impor-
tant research question. Another interesting direction for fur-
ther research is to extend the notion of graphical games and
inference problems to dynamic graphical models — games
and inference problems in which the underlying network is
changing due to the decisions taken by individual agents.
(b) Intelligent query processing systems and computational
steering of simulation-based experiments. This topic is
related to classical problems in AI and is being revisited in the
context of the semantic web and knowledge-based systems [6].
cerned with many of the same questions. As we started using
our models to address user-defined questions, it became
progressively clear to us that easy-to-use web-based systems
would provide an appropriate mechanism for delivering the
results obtained by executing our models. Our goal is to allow
an analyst who is not a computing expert to use HPC-based
models routinely and with ease. Given that the underlying
complex network, individual behavior and dynamics
of a particular process over the network (e.g. an epidemic)
co-evolve, we require an adaptive computational steering
mechanism. This requires methods for coordinating resource
discovery of computing and data assets; AI-based techniques
for translating user-level requests into efficient workflows;
reusing data sets whenever possible; and spawning computer
models with the required initial parameters and coordinating
resources among various users. Consider a hypothetical yet
illustrative query by an analyst: Compare the effects of
vaccinating 10% of the school children or closing schools for two
days to control a potential flu epidemic in the city of Portland and
its surrounding areas. In order to answer this simple query,
the system will spawn a series of sub-queries and
computational tasks, e.g. do such data already exist in our
database? It might be that the data exist but with 12%
of school children being vaccinated in the study. The sys-
tem then needs to determine if the results are adequate. If
the answer is no, then we might potentially have to create
the Portland social network, construct an appropriate exper-
imental design that consists of choosing various subsets of
school children (note that the query does not specify this
precisely and hence one might consider a random or the op-
timal subset), and so on. Statistical analysis of the results
will then be performed and the final results presented to the
analyst as a combination of charts, spatial spread movies us-
ing Google Earth and so on. We have only taken the first
steps in addressing both these research areas, see  for ad-
We thank our external collaborators
and members of the Network Dynamics and Simulation
Science Laboratory (NDSSL); the work reported
here is a joint effort of all the team members. This
work has been partially supported by NSF Nets Grant
CNS-062694, HSD Grant SES-0729441, CDC Center of
Excellence in Public Health Informatics Grant 2506055-01,
NIH-NIGMS MIDAS project GM070694-06, DTRA
CNIMS Grant HDTRA1-07-C-0113.
[1] K. Atkins et al. Simulated Pandemic Influenza Outbreaks
in Chicago VT TR-NDSSL-07-004, 2004.
[2] K. Atkins, C. Barrett, R. Beckman, K. Bisset, J. Chen,
S. Eubank, A. Feng, Z. Feng, S. Harris, B. Lewis, V.
Anil Kumar, M. Marathe, A. Marathe, H. Mortveit, P.
Stretz, An Interaction Based Composable Architecture
for Building Scalable Models of Large Social, Biological,
Information and Technical Systems,
Quarterly, Volume 4, Number 1, March 2008.
[3] K. Atkins, et al. An analysis of layered public health
interventions at Ft. Lewis and Ft. Hood during a pandemic
influenza event VT TR-NDSSL-07-019, 2007.
[4] C. Barrett, S. Eubank, V. Anil Kumar, M. Marathe,
Understanding Large Scale Social and Infrastructure Networks:
A Simulation Based Approach, SIAM news:
The Mathematics of Networks, 2004.
[5] C. L. Barrett, K. Bisset, S. Eubank, V. S. A. Kumar,
M. V. Marathe and H. S. Mortveit, Modeling and Simulation
of Large Biological, Information and Socio-Technical
Systems: An Interaction-Based Approach,
Proc. Symposia in Applied Mathematics, Short Course
on Modeling and Simulation of Biological Networks,
AMS Lecture Notes, Series, (PSAPM), 64, pp. 101-147,
[6] T. Berners-Lee, J. Hendler, O. Lassila, The Semantic
Web, Scientific American, May (2001).
[7] C. Barrett, K. Bisset, J. Chen, B. Lewis, S. Eubank, V.S.
Anil Kumar, M. Marathe, H. Mortveit, Effect of Public
Policies and Individual Behavior on the Co-evolution
of Social Networks and Infectious Disease Dynamics,
Proc. DIMACS/DyDAn Workshop on Computational
Methods for Dynamic Interaction Networks, Septem-
[8] C. Barrett, S. Eubank and M. Marathe
& Simulation of Large Biological, Information and
Socio-Technical Systems: An Interaction Based Approach,
Interactive Computing: A new Paradigm,
Ed. D. Goldin, S. Smolka and P. Wegner, pp. 353-394,
Springer Verlag, 2006.
[9] C. Barrett, S. Eubank, J. Smith, If smallpox strikes
Portland ... Scientific American, 292, 2005.
[10] C.T. Bauch and D.J. Earn, Vaccination and the theory
of games, Proc. Natl. Acad. Sci., 101(36), pp. 13391-
[11] C. Bailey-Kellogg, N. Ramakrishnan, M. Marathe, Spatial
data mining to support pandemic preparedness.
SIGKDD Explorations 8: 80-82, 2006.
[12] S. Eubank, V.S. Anil Kumar, M. Marathe, A. Srinivasan
and N. Wang, Structure of Social Contact Networks
and Their Impact on Epidemics, AMS-DIMACS
Special Issue on Epidemiology, 70, pp. 181-213, 2006.
[13] S. Eubank, et al. H. Guclu, V.S. Anil Kumar, M.
Marathe, A. Srinivasan, Z. Toroczkai and N. Wang.
Modeling Disease Outbreaks in Realistic Urban Social
Networks. Nature, 429, (2004).
[14] N. Ferguson, D. A. T. Cummings, C. Fraser, J. C. Cajka,
P. C. Cooley, D. S. Burke Strategies for mitigating
an influenza pandemic, Nature, April, 2006.
[15] N. L. Ferguson, D. A. T. Cummings, S. Cauchemez, C.
Fraser, S. Riley, A. Meeyai, S. Lamsirithaworn, D. S.
Burke Strategies for containing an emerging influenza
pandemic in Southeast Asia, Nature, vol 437, Septem-
[16] T.C. Germann, K. Kadau, I. M. Longini Jr., C. A.
Macken Mitigation strategies for pandemic influenza
in the United States, Proc. of National Academy of
Sciences (PNAS), April 11, vol 103, no. 15, pp. 5935-
[17] M. Kearns Graphical Games, Algorithmic Game Theory,
Ed. N. Nisan, T. Roughgarden, E. Tardos and V.
Vazirani, Cambridge University Press, pp. 159-180,
[18] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing
the Spread of Influence in a Social Network, Proc.
[19] J. Epstein, J. Parker, D. Cummings. Coupled Contagion
Dynamics of Fear and Disease: A Behavioral Basis
for the 1918 Epidemic Waves: Mathematical and
Computational Explorations Technical report, Brookings
Institute. Presentation made at the MIDAS meeting,
June 2006.
[20] S. Eubank, et al. V.S. Anil Kumar, M. Marathe, A.
Srinivasan and N. Wang. Structural and Algorithmic
Aspects of Large Social Networks, Proc. 15th ACM-SIAM
Symposium on Discrete Algorithms (SODA), pp.
[21] S. Lauritzen, Graphical Models Oxford University
[22] S. Lauritzen and D. Spiegelhalter, Local Computation
with probabilities on graphical structures and their
application to expert systems, J. Royal Statistical Society
B 50(2), pp. 157-224, 1988.
[23] Letter Report (2007) National Academies Press
[24] M. Mundhenk, J. Goldsmith, C. Lusena and E. Allender,
Complexity of finite-horizon Markov decision
process problems. J. ACM (JACM) 47(4), 2000, pp.
[25] M. E. J. Newman. The structure and function of complex
networks. SIAM Review 45, 167–256 (2003).
[26] J. Pearl, Probabilistic Reasoning in Intelligent Systems,
Morgan Kaufmann, 1988.
[27] D. Vickery and D. Koller Multi-agent algorithms for
solving graphical games, Proc. 18th International
Conference on Artificial Intelligence (AAAI), pp. 345-
Madhav Marathe is a Professor of Computer Science and
Deputy Director of the Network Dynamics and Simulation
Science Laboratory (NDSSL), Virginia Bio-Informatics
Institute at Virginia Polytechnic Institute and State University.
He obtained his B.Tech degree in 1989 in Computer Science
and Engineering from the Indian Institute of Technology
(IIT) Madras, and his Ph.D. in 1994 in Computer Science
from the University at Albany under the supervision of
Professors Harry B. Hunt III and Richard E. Stearns. Before
coming to Virginia Tech, he was a Team Leader in the
Basic and Applied Simulation Science group (CCS-5) in the
Computer and Computational Sciences division at the Los
Alamos National Laboratory (LANL), where he led the
theoretical program to support simulation-based design and
analysis of extremely large socio-technical and critical
infrastructure systems. At Los Alamos, he played a lead role in
the routing module of the Transportation Simulation and Analysis
System (TRANSIMS) and in AdHopNet, a modeling tool for
integrated advanced communication networks, and was a team
leader for the Urban Infrastructure Suite (UIS), funded as
part of the DHS National Infrastructure Simulation and
Analysis Center (NISAC). Since joining Virginia Tech, he
and NDSSL team members have been actively developing
an interaction-based composable architecture for building
scalable models of large socio-technical systems; see [2].
He serves as a Co-PI/PI/Senior Investigator on a number
of projects related to computational epidemiology. This
includes: the NIH-MIDAS project for developing agent-based
models to represent and analyze the spread of infectious diseases
(Co-PI); the CDC Center of Excellence in Public Health
Informatics (University of Utah Medical School is the lead),
aiming to develop innovative methods and informatics
infrastructure to support epidemic preparedness (Co-PI); and the NSF
HSD program to study the effect of individual behaviors on
the evolution of epidemics (PI).