Proceedings of the 2018 Winter Simulation Conference
M. Rabe, A. A. Juan, N. Mustafee, A. Skoogh, S. Jain, and B. Johansson, eds.
Martin Bicher
Institute for Analysis and Scientific Computing
TU Wien
Wiedner Hauptstraße 8-10
1040 Vienna, AUSTRIA
Christoph Urach
Niki Popper
dwh Simulation Services
dwh GmbH
Neustiftgasse 57-59
1070 Vienna, AUSTRIA
Since 2015 researchers in the Austrian health-care research project DEXHELPP (Decision Support for Health
Policy and Planning) have benefited from access to a validated generic agent-based population model
(GEPOC ABM) of Austria’s population. This simulation model delivers a valid virtual image of Austria’s
population and is also able to make feasible prognoses. Over the last few years the model has been extended,
remodeled and applied to several use-cases. We were able to add aspects like vaccination strategies,
treatment pathways or the spread of infectious diseases, which underlines the flexibility of the implementation.
Yet a number of challenges have been identified, which form the basis of our contribution to the general discussion of
population models. We will discuss emerging challenges regarding performance and present a newly
implemented time-update approach. Thereafter we will discuss different parametrization concepts for
adding a disease model. Finally we will present how we integrated GIS information based on Delaunay
triangulation.
With about 8.7 million inhabitants, 190 thousand emigrations and deaths and 260 thousand immigrants and
births, Austria’s total population fluctuated by about 2.2 percent in the course of 2016 (Statistik Austria
2016). This percentage is neither statistically high nor low in comparison with other years or other countries,
but it gives an idea about the total volume of population fluctuation and its potential impact on deducible
numbers. It makes clear that any decision-support for policy making and planning can only be valid if it
considers a model accounting for the underlying population dynamics.
The Austrian research project DEXHELPP (Decision Support for Health Policy and Planning) provides a
platform for collaboration between health-care stakeholders, medical experts, modeling and simulation experts,
statisticians, data scientists and visualization experts. By combining their skills they perform innovative,
joint and data-based research on all levels of the health system. With a wide range of integrated technologies
they provide interactive tools for prognosis and decision support for policy making. In order to create a
valid common foundation for these decision-support tools, population modeling and simulation
is one of the most important research areas of this project.
GEPOC, short for Generic Population Concept, has been a vital research part of DEXHELPP since 2014.
It is founded on the idea that a number of related, valid population models can be used as a basis for
many different applied decision support models. In the first stage of the project, two structurally different
population models were developed and validated: GEPOC SD and GEPOC ABM. The first one
was developed using the method of system dynamics (SD) and is (mathematically speaking) an ordinary
differential equation model with several hundred coupled equations. The second model is a stochastic
agent-based model (ABM). Both models have been validated using data from the Austrian Bureau of
978-1-5386-6572-5/18/$31.00 ©2018 IEEE
Bicher, Urach, and Popper
Statistics (for details, see Bicher et al. 2015). Finally, in fall 2016, a third population model was
added to the collection in the form of a partial differential equation (PDE) model (Bicher and Popper 2016).
1.1 Introduction to GEPOC ABM
All mentioned population models have been sufficiently validated and tested to produce equivalent
results. In the following sections, we will focus on the agent-based approach GEPOC ABM, as this model
has become the center of population-based health-care research in DEXHELPP and has grown into a powerful
and versatile simulation tool for any kind of population-based research problem in Austria. The
coincidence of two important factors was responsible for this success:
Intensive collaboration with health-care stakeholders provided the possibility for application of
GEPOC ABM as a base model for many diverse health-care related research problems.
Continued research on population modeling and continuous improvement of GEPOC ABM in
collaboration with modeling and simulation experts from different institutions.
In this work we present an overall view of this versatile population model in detail for the
first time. Besides giving a formal model definition we will emphasize valuable lessons learned from
iteratively applying and improving the model. We will present interesting technical as well as
model-theoretic challenges related to the model and its implementation and state our approaches to overcome
them.
As mentioned, GEPOC ABM is an agent-based simulation model. It has been validated, firstly, to depict
the status quo of Austria’s population between 1991 and 2017 and, secondly, to make feasible prognoses
matching the forecasts of the Austrian Bureau of Statistics (on the aggregate level). GEPOC ABM is
defined via its initialization and its time-dynamics:
Initial Setup: Given a certain start date of the simulation, an agent-based model with $N+1$ agents is
initialized. The first $N$ of them represent the inhabitants of Austria and will be denoted as
person-agents henceforth. Each person-agent is given a certain birth-date and (biological) sex. We will
refer to them as female and male agents with a certain age. The remaining $(N+1)$-st agent plays the
role of the government and will be denoted as government-agent.
Time Dynamics: The model is updated in not-necessarily equidistant time-steps which are defined
a-priori. Each time-step consists of two parts:
In the first part all person-agents are iterated in random order. For each addressed agent, the model
decides about death, emigration and birth using an event-based strategy. First of all, random
numbers decide whether the addressed agent is scheduled to emigrate, die and/or (for female agents)
have offspring in the regarded time-step. For each action scheduled this way, a uniformly distributed
random number samples a date for the scheduled action, which is added to an event-list. After all possible
events have been regarded, the event-list is sorted and processed in chronological order. Death and emigration
events lead to a removal of the agent (skipping all further planned events), while the birth event leads to
a newborn agent with the corresponding birth-date being added to the model. This strategy is sketched in Figure
1. In the second part, after all person-agents have been iterated, the government-agent generates a certain number of new
person-agents (representing immigrants) and adds them to the model. This concludes one model time-step.
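The time-step described above can be sketched in a few lines of Python. This is a minimal illustration and not the GEPOC ABM code: the function name `simulate_step`, the dictionary-based agents, the constant per-step probabilities in `probs` and the way immigrants are generated are all assumptions made for this example (in the actual model the probabilities depend on age and sex).

```python
import random

def simulate_step(agents, step_start, step_days, probs, n_immigrants, rng):
    """One model time-step; agents are dicts with 'sex' and 'birth' (hypothetical layout)."""
    survivors, newborns = [], []
    order = agents[:]
    rng.shuffle(order)                       # iterate person-agents in random order
    for agent in order:
        events = []
        # Random numbers decide which actions are scheduled in this step; each
        # scheduled action gets a uniformly sampled date inside the step.
        for kind in ("death", "emigration", "birth"):
            if kind == "birth" and agent["sex"] != "f":
                continue
            if rng.random() < probs[kind]:
                events.append((step_start + rng.uniform(0.0, step_days), kind))
        events.sort()                        # process the event-list in date order
        alive = True
        for date, kind in events:
            if kind == "birth":
                newborns.append({"sex": rng.choice("fm"), "birth": date})
            else:                            # death or emigration removes the agent
                alive = False
                break                        # skip all further planned events
        if alive:
            survivors.append(agent)
    # Second part: the government-agent adds immigrants (ages are placeholders).
    immigrants = [{"sex": rng.choice("fm"),
                   "birth": step_start - rng.uniform(0.0, 80 * 365.0)}
                  for _ in range(n_immigrants)]
    return survivors + newborns + immigrants
```

Because the event dates are sampled inside the step, a birth scheduled before a death of the same agent still takes place, while a birth scheduled after it is skipped.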
This model definition differs from the original definition of GEPOC ABM (Bicher et al. 2015)
in two respects. Firstly, the original model was updated in equidistant time-steps. Allowing non-equidistant
steps became necessary to execute the model in monthly steps (which may take between 28 and
31 days). Secondly, the mechanism for agent-updates switched from a classic probability-based (Markovian)
to an event-based approach. We will discuss the benefits of this strategy in Section 2.2 and take a look at
the implementation first.
(Flowchart of one simulation time-step: the simulation time is updated; person-agents are looped in random order; random death, emigration and birth dates are created; planned actions are sorted by date and looped over; new agents immigrate.)
Figure 1: Discrete-event motivated strategy encapsulated in a basically time-discrete update of the person-agents in GEPOC ABM.
For our application we found it more useful to implement the model from scratch than to use
existing ABM frameworks like NetLogo (Tisue and Wilensky 2004), AnyLogic (Grigoryev 2012), Mesa
(Masad and Kazil 2015), JADE (Bellifemine et al. 1999) or MASON (Luke et al. 2004). None of
these was capable of 1) dealing with the high total number of required agents, 2) loading and processing
all necessary parametrization data (with reasonable preprocessing time) and 3) providing sufficient flexibility
for all potential model extensions. Moreover, as we are dealing with very sensitive health-care data and
research questions, we wanted to stay in full control of all parts of the simulation and did not want to rely
on often loosely documented third-party frameworks that work nicely for scientific applications but reveal
shortcomings and bugs when it comes to real-world applications.
We decided to implement the model using the (primarily) object-oriented programming language
Python 3. Firstly, most Python interpreters can be used free of charge and are platform independent,
which makes the model easily transferable. Secondly, Python requires the use of proper
indentation, making the code easily readable. Thirdly, a large number of freely available Python packages provide
high-performance algorithms and interfaces to almost any known data format.
2.1 Code Performance
Although packages like NumPy and SciPy provide highly efficient and vectorized algorithms to speed
up computation, Python (like other dynamically typed, interpreted languages) is known to execute
comparatively slowly. Therefore, execution of the simulation model with the full population of Austria (i.e.
running the model with 8-9 million agents) is very time and memory consuming. To give a quick example,
the execution of a 365-day time-step with 79000 agents takes an Intel® Core™ i5-5200U processor about
2.02 sec without making use of multithreading. This number scales linearly with the number of agents and
The easiest and most obvious solution to this problem is running the model with a reduced number of
agents (i.e. one tenth or one hundredth of Austria’s original population) instead. Afterwards the simulation
results can easily be rescaled to the original size. This strategy was quickly confirmed to be valid from the
modeling perspective: It is a direct consequence of the Law of Large Numbers that the aggregated simulation
results with full population match the rescaled aggregated simulation results with reduced population. The
only difference is the size of the stochastic fluctuations, which is proven to be larger when running the model
with a reduced number of agents. (Note that this result is not only valid for models without interaction, as in
this case, but also for a broad range of models with interaction. For more information see Bicher 2017;
Bicher and Popper 2015.) To compensate for the higher fluctuations with a downscaled population, the
simulation can be evaluated more often in Monte Carlo experiments, which increases computation time
to a smaller extent.
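The rescaling argument can be illustrated with a small numerical experiment. The sketch below is not part of GEPOC ABM: it reduces the model to a single yearly death count drawn from a binomial distribution, and the death probability of 0.01, the number of runs and the function name `yearly_deaths` are assumptions chosen only for illustration.

```python
import numpy as np

def yearly_deaths(n_agents, death_prob, scale, runs, rng):
    """Simulate death counts for n_agents and rescale them by the factor `scale`."""
    return rng.binomial(n_agents, death_prob, size=runs) * scale

rng = np.random.default_rng(0)
full = yearly_deaths(8_700_000, 0.01, 1, 200, rng)    # full population
tenth = yearly_deaths(870_000, 0.01, 10, 200, rng)    # one tenth, rescaled by 10

print(full.mean(), tenth.mean())   # the means agree closely
print(full.std(), tenth.std())     # the reduced population fluctuates more
```

For this simple binomial case the standard deviation of the rescaled result grows with the square root of the downscaling factor, so averaging over more Monte Carlo runs recovers the precision of the full-population run.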
Surprisingly, the described strategy met strong opposition from decision-makers, and the credibility of the model
suffered. When discussing the model’s internal logic, it is easier to communicate that an agent stands as a
statistical representative of one real person instead of 10 or 100. Hence, we had to make the model executable with
the full population in reasonable time.
Besides standard means of code optimization, two interesting technical measures have been implemented
that finally improved the performance of the code.
The generation of new person-agents has a massive impact on the computation time due to the sampling of
multivariate random numbers with user-defined distribution functions. As this is needed very
often when generating the initial model population, a Markov-Chain Monte-Carlo (MCMC) sampling
algorithm was applied for this purpose. We made use of the performant implementation of this
algorithm in the PyMC package for Python 3 (Patil et al. 2010).
As many applications of GEPOC ABM did not make use of agent-agent contacts or only
required very local contacts (see Section 3), we used Python’s native subprocess package to make
the simulation model capable of multi-threading. Hereby, the initial population is split into a
predefined number of parts which can be distributed among an arbitrary number of processor
cores. Hence, as long as it is sufficient that person-agents have a very limited range of contact
partners, GEPOC ABM can be executed fully parallelized.
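The partitioning idea behind the second measure can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the paper uses the subprocess package, whereas this sketch only shows how a population without agent-agent contacts can be split into independent parts. The names `split_population`, `simulate_part` and `run_partitioned` are hypothetical, and the "simulation" of a part is reduced to a single survival draw per agent.

```python
import random

def split_population(agents, n_parts):
    """Split the initial population into n_parts chunks of nearly equal size."""
    return [agents[i::n_parts] for i in range(n_parts)]

def simulate_part(task):
    """Simulate one chunk independently (placeholder: one survival draw per agent)."""
    seed, chunk, death_prob = task
    rng = random.Random(seed)               # independent random stream per part
    return [a for a in chunk if rng.random() >= death_prob]

def run_partitioned(agents, n_parts, death_prob, mapper=map):
    """Run all parts and merge the results; `mapper` decides how parts are executed."""
    tasks = [(seed, chunk, death_prob)
             for seed, chunk in enumerate(split_population(agents, n_parts))]
    survivors = []
    for part in mapper(simulate_part, tasks):
        survivors.extend(part)
    return survivors
```

With the default `mapper=map` the parts run sequentially; since the parts share no state, a process-based mapper such as `multiprocessing.Pool(...).map` could be passed instead to distribute them among several cores.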
Our current work in this area focuses on improving the parallelization capabilities of GEPOC ABM
to allow limited contacts between person-agents in different threads, comparable to (Collier et al. 2015).
Summarizing, we learned the lesson that performance is still an issue in population models. Strategies to
cope with this have to take into account not only methods to increase performance but also stakeholder interests.
2.2 Time-Update Strategy
To be fully versatile as a generic framework, GEPOC ABM has to be capable of dealing with processes
on different time scales. While e.g. infectious diseases like influenza spread within a few days or weeks, it
usually requires many years or decades to observe the impact of demographic changes on the health-care
system.
The currently most prominent concept to overcome this problem is simulating the model in continuous
time, i.e. using a discrete-event strategy (Buss and Al Rowaei 2010). Hereby, agents emigrate,
immigrate, die and are born at corresponding event dates, which additionally schedule new future
events. After each event the simulation instantaneously skips to the next scheduled event and the
model-time is advanced. For the multi time-scale problem in GEPOC ABM this strategy would clearly
be preferable to a classic time-discrete update, as the mechanism is independent of the observed time-scale
and scope. Yet we found two arguments why this type of update is not optimal for our applications (or at
least requires further research).
Finding the next event to occur is always related to a sorting problem. With $N$ denoting the initial
number of agents in the model, the computational effort of the ABM consists of iteratively executing
the occurring events (resulting in a problem of $O(N)$) and correctly inserting the newly scheduled
events into the event list (e.g. using a standard divide-and-conquer algorithm with $O(\log(N))$).
Therefore, the total computational effort of the model amounts to $O(N\log(N))$, which is asymptotically
larger than the $O(N)$ effort of a time-discrete strategy. Although there has been progress in
reducing the computational effort of continuous-time population models by using internal model
logic (Reinhardt and Uhrmacher 2017; Warnke et al. 2016), the effort can never depend linearly on the
number of agents. Hence, this kind of update strategy is significantly slower (at least as long as
the model does not use agent-agent contacts).
Discrete-event update is known to cause difficulties if there exists a global interaction level. We
explain this problem with a short example: Suppose GEPOC ABM is used to investigate the effects
of overpopulation. To this end, the population density of the country is assumed to have a negative
impact on the death rate. As the population density changes with every occurring event, it is
impossible for a person-agent to correctly define its own death date in advance. The only solution
to this problem would be to re-sample the death dates of all agents whenever the population density
changes. This leads to a massive overhead.
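The first argument can be made concrete with a minimal discrete-event loop. The sketch is an illustration only: it uses a binary heap from Python's `heapq` module as event list (a common choice with logarithmic insertion cost, not necessarily the structure meant in the text), and the function name `run_event_queue`, the exponentially distributed waiting times and the single generic event type are assumptions.

```python
import heapq
import random

def run_event_queue(n_agents, horizon, rate, rng):
    """Process generic agent events in chronological order up to `horizon`."""
    # Every agent schedules its first event; building the heap costs O(N).
    queue = [(rng.expovariate(rate), agent_id) for agent_id in range(n_agents)]
    heapq.heapify(queue)
    processed = []
    while queue and queue[0][0] <= horizon:
        time, agent_id = heapq.heappop(queue)          # O(log N) per event
        processed.append((time, agent_id))
        # Executing an event schedules the next one: O(log N) insertion.
        heapq.heappush(queue, (time + rng.expovariate(rate), agent_id))
    return processed
```

Each executed event pays a logarithmic cost for keeping the event list ordered, which is where the additional $\log(N)$ factor compared to a plain loop over all agents comes from.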
The second option to update ABMs is applying discrete time-steps: Instead of deciding when a specific
event happens, the model iterates through time asking whether a specific event occurred in the regarded
time-interval. Hereby so-called transition probabilities are used. For the multi-scale problem in GEPOC ABM
the simulation needs to be executable (and valid) with time-steps of arbitrary lengths. Hereby, two problems
arise:
Firstly, it is mathematically impossible to correctly transform transition probabilities from one
time-step length to a different one without changing the (expected) simulation outcome. This is exhaustively
discussed in (Bicher 2017) and is best illustrated by a simple gedankenexperiment: Say a female
agent has a probability $p_t$ to give birth to a child during a time-interval of length $t$. Now assume
that the time-step length should be halved to $t/2$. Hence, we are looking for a rescaled probability
$p_{t/2}$ so that two steps of the rescaled model lead to the same results as one step of the original
one. As is easily seen, this task is impossible to solve, as (independent of the choice of $p_{t/2}$) the rescaled
model makes it possible that two children are born within the regarded time-interval.
Secondly, the occurrence of two or more events in one model time-step leads to causality problems.
Especially in the case of population models it makes a crucial difference whether an agent dies before it
reproduces, emigrates before it dies, reproduces before it emigrates or vice versa. Hence, using discrete
time-steps always requires additional model logic.
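The gedankenexperiment in the first point can be written out explicitly. The following lines are a worked illustration, assuming that the two half-steps are independent and writing $P(k)$ for the probability of exactly $k$ births in the interval of length $t$ (this notation is introduced here only for the example):

```latex
\begin{align*}
\text{one step of length } t: \quad
  & P(0) = 1 - p_t, \quad P(1) = p_t, \quad P(2) = 0, \\
\text{two steps of length } t/2: \quad
  & P(0) = (1 - p_{t/2})^2, \quad P(1) = 2\,p_{t/2}\,(1 - p_{t/2}), \quad P(2) = p_{t/2}^2 .
\end{align*}
```

Matching $P(0)$ requires $p_{t/2} = 1 - \sqrt{1 - p_t}$, but then $P(2) = p_{t/2}^2 > 0$ whenever $p_t > 0$. Matching the expected number of births instead requires $p_{t/2} = p_t / 2$, which changes $P(0)$. No choice of $p_{t/2}$ reproduces all three values.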
Consequently, neither of the two time-update strategies is optimally suited for a generic population model.
The proposed solution presented in the model definition and in Figure 1 can be interpreted as an event-based
strategy embedded in a time-discrete update. On the global level, there is a time-step that manages the
update of the time variable. For most transition probabilities we applied the approximation formula (1)
to scale transition probabilities from one time-step length to a different one ($t \to t'$). This formula is motivated
by the geometric distribution.
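A rescaling that is commonly derived from the geometric distribution keeps the probability that nothing happens consistent across step lengths, i.e. $p_{t'} = 1 - (1 - p_t)^{t'/t}$. The sketch below assumes this form for illustration; whether it coincides exactly with the formula used in GEPOC ABM is an assumption, and the function name `rescale_probability` is hypothetical.

```python
def rescale_probability(p, t, t_new):
    """Rescale a transition probability p for step length t to step length t_new.

    Assumed form: the probability of 'no event' over t equals the product of
    the 'no event' probabilities over the shorter (or longer) steps.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if t <= 0 or t_new <= 0:
        raise ValueError("step lengths must be positive")
    return 1.0 - (1.0 - p) ** (t_new / t)

# Halving a yearly step: two half-steps reproduce the yearly 'no event' probability.
p_half = rescale_probability(0.19, 1.0, 0.5)
print(p_half, 1.0 - (1.0 - p_half) ** 2)
```

The rescaled model still differs from the original one in the probability of multiple events per interval, which is why such a formula can only be an approximation.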
On the agent-level, the boolean statement that something happens is linked to an event with the occurrence
time at which it happens. Hereby, the ordering of events is clear from the start and illogical event sequences
are excluded. It is possible, e.g., to hospitalize, treat and release an agent in just one model time-step,
automatically generating plausible hospitalization and release dates. Hence, as an additional benefit, it is
not always necessary to use extremely small time-steps to investigate small time-scopes. Summarizing,
we learned the lesson that there is no optimal time-update strategy for a generic population model.
Event-oriented concepts appear promising, but require further research.
GEPOC ABM has already proven its flexibility as a base model for population-based research in various
areas. Since its validation in 2015, GEPOC ABM has been used for several health-care related applications,
of which we want to explain the three largest in detail.
Vaccination Rates: Eradication of measles and polio is one of many goals the World Health Organization
(WHO) is trying to achieve by the year 2020. Besides other factors, high vaccination
coverage among the population plays a key role. If a high percentage (about 95% is estimated) of all
inhabitants is vaccinated, so-called herd-immunity effects will prevent potential epidemics from breaking
out, which, in the long run, leads to the full eradication of the disease. To keep track of the progress,
every country is obliged to report yearly the percentage of vaccinated infants among their age-cohort (we
will henceforth refer to this number as “vaccination rate”) to the WHO.
Though the numbers of sold vaccination doses as well as the ages of their recipients are (quite) well known in
Austria, calculation of these rates for reporting purposes is not as simple as it seems. Due to fluctuations
in the population, primarily caused by high immigrant/refugee numbers, a dynamic simulation model
was used to correctly determine the vaccination rates and improve the formerly used calculation method.
We extended GEPOC ABM to obtain a picture of the current MMR (measles, mumps, rubella) and
polio vaccination rates in Austria. According to the availability of doses (obtained from data on actually sold
doses) and the vaccination regimen, each person-agent is assigned vaccinations. With specifically calculated
vaccination rates for regular immigrants and refugees, the model fully considers the effects of a fluctuating
population. The simulated numbers were reported by the Austrian Ministry of Health and Women’s Affairs
and can be accessed via the web-page of the WHO or in two short reports about the current situation
in Austria (Bundesministerium für Gesundheit und Frauen 2017; Bundesministerium für Gesundheit und
Frauen 2016). Besides giving access to a more precise calculation method, GEPOC ABM additionally
provides deeper insights into the danger of measles outbreaks. E.g., using accredited estimates for the
chance that a vaccination successfully immunizes the recipient and for people who were immunized by past
illnesses, we are additionally able to give information about the percentage and distribution of immune
Re-hospitalization of Psychiatric Patients: Re-hospitalization rates of psychiatric patients are considered
a metric of quality of care. Yet, risk factors which lead to high percentages of re-hospitalized
patients are still not fully understood and are a heavily researched area. In order to test the plausibility of
several risk factors commonly believed by domain experts, and to compare different types of health service
interventions in terms of differences in re-hospitalization outcomes, a simulation model was implemented.
GEPOC ABM was extended by several functionalities. First, person-agents were given a probability
to visit mental hospitals and have a stay of several days during which they are diagnosed. Afterwards,
every person-agent has a certain chance to become re-hospitalized, dependent on diagnosis, sex,
age and other risk factors which were key objects of the investigation. Assuming that the chance depends
on the mean distance to the nearest hospital, person-agents were assigned a residence (NUTS3 region).
Hereby, the impact of infrastructural changes could be tested. Moreover, assuming that the chance depends
on co-morbidities, diabetes mellitus was implemented as a background disease. This way the influence
of our aging society was also analyzed. More information about this model is found in (Zauner et al. 2017;
Bicher et al. 2017).
Number, Severity and Diagnosis of Stroke Incidences: The implementation of stroke units in hospitals
is a heavily discussed topic (Wilbacher 2005). On the one hand, these units are known to significantly
decrease the risk of mortality and consequential damage in case of a stroke incident compared to regular
hospital units (Barnett 2000). On the other hand, operation of these specialized units is expensive, especially
when not in use. Therefore, DEXHELPP started a rigorous analysis of the need for stroke treatment
using a dynamic simulation model.
Person-agents in GEPOC ABM were extended by a chance to suffer from a stroke with a certain
severity and a specific type (diagnosis). This chance is implemented to depend on the person-agent’s
age, sex and residence district as well as on having had a previous stroke incident. Hereby, we were able to
observe stroke-related parameters which (in Austria) cannot be obtained from data, like the average number
of stroke incidences per person or the total number of stroke-caused deaths. The model is not yet fully
validated, but will contribute to improving the services provided for stroke treatment by giving a very detailed
picture of the need.
Motivated by these three applications a couple of toolboxes have been developed that can optionally be
used to extend GEPOC ABM if needed. Hereby, certain parts that have been required for the case-studies
and were deemed to have potential use in future applications were made reusable in a more generic form.
We will present the two most interesting here.
3.1 Parametrization of Diseases via Incidence and Prevalence
Taking a closer look at the three applications presented above, the experienced modeler will quickly observe
that none of them relies on any contacts between person-agents. (Note that the first mentioned application
modeled measles vaccinations and not measles infections.) GEPOC ABM offers the possibility to implement
contacts, e.g. between persons/patients/hospitals/physicians, but the research problems defined by our
collaborating decision makers (e.g. Austrian Ministry of Health, Main Association of Social Insurances,
Gesundheit Österreich GmbH) have hardly required this functionality so far. Although we made use of contacts
in smaller and more academic studies (between patients and doctors in (Nowotny, K. 2018)), the three important
applications presented earlier taught us that simulation-based research in Health Technology Assessment,
Health System Research and Health Services Research does not necessarily rely on contacts or contact
networks. On the one hand, this can be considered good news, as GEPOC ABM can make full use of
parallelization. On the other hand, the dynamics of the resulting models are scientifically less interesting.
We can only speculate about the causes of the lack of need for contact-based models in health-care applications.
One possible reason might be that the impact of non-transmittable diseases (e.g. cardiovascular diseases,
neurological diseases, chronic progressive diseases) on the health-care system is massive, even compared
to infectious diseases.
For this reason we decided to implement a toolbox that makes it possible to quickly extend GEPOC
ABM with a non-transmittable disease. We united the mechanism used for diabetes mellitus in the
re-hospitalization module and the mechanism for stroke incidences in the last application to form one generically
applicable model add-on. As diabetes is parametrized using prevalence data and stroke is parametrized
using incidence data, the generic module is capable of using data for both of these epidemiological key figures.
It is important to mention that the strategy only considers new cases and does not regard recovery
from the medical condition.
Incidence, or to be precise the incidence rate, is defined as a measure for the probability of at least one
occurrence of a certain medical condition in the observed time-interval. An incidence rate of $I$ per year
implies that a person who did not show the regarded medical condition before has a probability of $I$ to
show the medical condition after one year. Incidence rates are often given as the average number of persons
showing the condition per 1000 or 10000, as this is easier to interpret.
Incidence rates can be used to extend GEPOC ABM in a very natural way. Every healthy person-agent
schedules the “medical condition”-event in the course of the regarded time-step with a probability directly
calculated from the incidence rate. In case GEPOC ABM is run with yearly steps, the incidence rate can
be taken directly; otherwise it is rescaled using formula (1). Although incidences are sufficient to simulate
new cases, it is necessary to know the prevalence at least for the initial setup of the person-agents.
Hence, incidence rates alone are usually not sufficient to parametrize the model.
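An incidence-based update of this kind might look as follows. The sketch is an illustration under assumptions: agents are plain dictionaries with an `onset` entry, the function name `apply_incidence` is hypothetical, the yearly incidence is rescaled to the step length with the geometric-distribution form $1-(1-I)^{d/365}$ (assumed here, not taken from the text), and a single incidence rate is used for all agents.

```python
import random

def apply_incidence(agents, incidence_per_year, step_start, step_days, rng):
    """Schedule the 'medical condition' event for healthy agents in one time-step."""
    # Per-step probability; for a 365-day step this reduces to the incidence rate.
    p_step = 1.0 - (1.0 - incidence_per_year) ** (step_days / 365.0)
    for agent in agents:
        if agent["onset"] is None and rng.random() < p_step:
            # The event date is sampled uniformly inside the time-step.
            agent["onset"] = step_start + rng.uniform(0.0, step_days)
    return agents
```

Agents that already show the condition are skipped, which reflects that the strategy only creates new cases and does not model recovery.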
Prevalence is a measure for the total number of persons suffering from a specific medical condition
and is usually given as a fraction of the total population. As for the incidence rate, we often find this number
described as the number of cases per 1000, 10000 or 100000 persons to make it easier to interpret.
In contrast to incidence rates, the extension of GEPOC ABM using prevalences is not that natural.
We found it most convenient to follow a two-phase strategy. First, the model time-step is executed as defined
in Section 2 (including immigration). Hereby, the total population $P$ and the fraction $F'$ of person-agents
suffering from the medical condition are counted directly after execution of all agent-events. Thereafter, the
known prevalence $F$ of the medical condition is compared with $F'$. If data and model are valid, $F' < F$ should
result, as the number of cases is only reduced in the first phase (deaths, emigrations, recoveries). Hence,
$(F - F')P$ describes the total number of person-agents that should suffer from the medical condition
according to the data, but do not show this behavior in the model so far. Therefore, in phase two, $(F - F')P$
healthy person-agents are randomly picked from the agent population to start suffering from the medical
condition. As is easily seen, this strategy becomes more accurate the smaller the used time-step and the more
prevalence data points are given. If the step-width of the model time-steps is chosen smaller than the
time-resolution of the data, it is useful to linearly interpolate the data points to avoid abrupt jumps of the
prevalence in the model.
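Phase two of this strategy and the interpolation of the prevalence data can be sketched as follows. The code is a simplified illustration, not the toolbox itself: agents are dictionaries with a boolean `ill` flag, the function names `apply_prevalence` and `prevalence_at` are hypothetical, and a single prevalence value is used for the whole population rather than age- or sex-specific values.

```python
import random
import numpy as np

def apply_prevalence(agents, target_prevalence, rng):
    """Phase two: pick healthy agents so that the known prevalence F is matched."""
    population = len(agents)                                      # P
    healthy = [a for a in agents if not a["ill"]]
    current = (population - len(healthy)) / population            # F'
    missing = round((target_prevalence - current) * population)   # (F - F') P
    # Nothing is done if the model already shows at least as many cases as the data.
    for agent in rng.sample(healthy, max(0, min(missing, len(healthy)))):
        agent["ill"] = True
    return agents

def prevalence_at(time, data_times, data_values):
    """Linearly interpolate prevalence data points to the current model time."""
    return float(np.interp(time, data_times, data_values))
```

A typical call would first evaluate `prevalence_at` for the end of the current time-step and then pass the result to `apply_prevalence` after all agent-events of that step have been executed.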
Clearly, in case of direct conflict the incidence strategy would be preferred, as it is the more natural
way of parametrizing a disease in an ABM. Yet, incidence data for diseases is usually harder to get. The
strategy for parametrization via prevalence might seem unusual for an agent-based model, but gives perfect
control over the total number of cases and has proven to be well suited for the simulation of chronic
diseases like diabetes mellitus. Summarizing, we learned the lesson that a lot of problems do not require
agent-agent contacts. It is important to have the possibility, but it is equally important to be able to omit it if it is not
needed or applicable.
3.2 Giving Agents a Place to Live
As seen in the Stroke and the Re-Hospitalization applications of GEPOC ABM, it is often necessary to
extend the person-agents’ properties by a residence. One could consider this a necessary
feature of population models in general, but it turns out to be a massive overhead if not needed. We decided
to generalize the findings of the two case studies that required agent residences in a generic Geography
toolbox that samples residences for person-agents.
In the course of this development a couple of problems soon occurred. Firstly, the administrative
landscape is permanently changing: Each year a couple of districts and municipalities are dissolved, joined
or reassembled. A very prominent example of this is the former district “Wien Umgebung”, which was
split up among four neighboring districts in 2016. Secondly, different partitions of Austria are not always
compatible. It happens quite often that smaller units are not uniquely contained in larger units. For example
one quickly finds ZIP regions that belong to two or more different political districts. The administrative
regions for health-care service (“Versorgungsregionen”) even overlap with the Austrian federal states.
In order to develop a generic solution that works independently of the investigated partition of Austria,
we decided to sample residences in the form of GIS coordinates. This method is advantageous compared to sampling
regions, as a coordinate is always linked to one unique region per investigated partition. This region may
change with time if units are joined or separated, but can always be found as long as the GPS outline of
the partition is known.
We implemented the following algorithm to sample a random GPS coordinate with respect to a given
partition of Austria (equivalent to the one presented in Section 3.3.1 in (Gallagher et al. 2018)):
1. Sample a random region the person-agent is planned to live in according to a given distribution.
2. Sample a uniformly distributed point inside the region according to its GPS outline.
We put particular effort into improving the performance of the second step. Standard algorithms to sample a
uniformly distributed coordinate in a given region are based on rejection sampling, i.e. a uniformly
distributed point inside the bounding box of the polygon (or, to be precise, multi-polygon) is sampled and
accepted if it lies inside the regarded region. This strategy requires checking at least once whether the sampled point lies inside
the polygon, and each check requires as many scalar multiplications as there are corner points on the outline.
It is particularly inefficient if shapes are not 0-connected (as the district of “Amstetten” seen in Figure 2),
not 1-connected (as the district of “St. Pölten Land” seen in Figure 2), or elongated and diagonally oriented.
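The rejection approach described above can be sketched as follows. This is a minimal illustration, not the GEPOC implementation: the function names, the even-odd ray-casting test and the example strip are assumptions made for this sketch, and only a single outline without holes is handled.

```python
import random


def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test; ring is a list of (x, y) corner points."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_by_rejection(ring, rng=random):
    """Draw uniform points in the bounding box until one lies inside ring.

    Returns the accepted point and the number of candidates that were tried.
    """
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    tries = 0
    while True:
        tries += 1
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if point_in_polygon(x, y, ring):
            return (x, y), tries


# Hypothetical elongated, diagonally oriented region inside the unit square.
DIAGONAL_STRIP = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.9),
                  (1.0, 1.0), (0.9, 1.0), (0.0, 0.1)]

if __name__ == "__main__":
    rng = random.Random(0)
    total = sum(sample_by_rejection(DIAGONAL_STRIP, rng)[1] for _ in range(1000))
    print("average candidates per accepted point:", total / 1000)
```

The example strip covers only about a fifth of its bounding box, so most candidates are rejected, and every candidate costs one pass over all corner points. This is the inefficiency for elongated, diagonally oriented shapes mentioned above.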
Hence, we decided to use a different strategy based on the idea that there exists an explicit formula
to calculate a uniformly distributed point inside a triangle. Given two independent uniformly distributed
random numbers $r_1$ and $r_2$ between 0 and 1 and three points $A, B, C \in \mathbb{R}^2$ forming a triangle, then
$$x := A(1 - \sqrt{r_1}) + B(1 - r_2)\sqrt{r_1} + C r_2 \sqrt{r_1} \tag{2}$$
is a uniformly distributed point inside $\triangle ABC$ (Osada et al. 2002). As we could not find a full proof for
this statement in literature we added it to the Appendix section.
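Formula (2), in the form given by Osada et al. (2002), weights the three corners by $1 - \sqrt{r_1}$, $(1 - r_2)\sqrt{r_1}$ and $r_2\sqrt{r_1}$. A minimal sketch of it follows; the function name and the example triangle are assumptions made for illustration.

```python
import math
import random


def sample_in_triangle(a, b, c, rng=random):
    """Return a uniformly distributed point inside the triangle a, b, c.

    Implements x = A(1 - sqrt(r1)) + B(1 - r2)sqrt(r1) + C r2 sqrt(r1)
    with r1, r2 independent and uniform on [0, 1).
    """
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    # Barycentric weights: non-negative and summing to one.
    wa, wb, wc = 1.0 - s, (1.0 - r2) * s, r2 * s
    return (wa * a[0] + wb * b[0] + wc * c[0],
            wa * a[1] + wb * b[1] + wc * c[1])


if __name__ == "__main__":
    rng = random.Random(42)
    A, B, C = (0.0, 0.0), (4.0, 0.0), (0.0, 3.0)
    pts = [sample_in_triangle(A, B, C, rng) for _ in range(20000)]
    # For a uniform distribution the sample mean approaches the centroid (4/3, 1).
    print(sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
```

Every draw is accepted and costs one square root and a few multiplications, regardless of the shape of the triangle.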
Using this formula, our strategy reads as follows.
2.a Perform a Constrained Delaunay Triangulation (CDT) of the shape and calculate the areas of all
resulting triangles. Note that this has to be done only once for each region and can be reused.
2.b Pick one random triangle from the list of triangles, weighted by their area.
2.c Pick a uniformly distributed point inside the triangle according to formula (2).
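Steps 2.b and 2.c can be sketched as follows, assuming the triangulation from step 2.a is already available as a list of triangles. The class name `RegionSampler` and the hand-written triangulation of an L-shaped region are hypothetical; the latter only stands in for the output of a real CDT, which is not computed here.

```python
import math
import random
from bisect import bisect
from itertools import accumulate


def triangle_area(a, b, c):
    """Area of the triangle a, b, c via the cross product of two edges."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


class RegionSampler:
    """Samples uniform points in a region given as a list of triangles."""

    def __init__(self, triangles):
        # Step 2.a is assumed done elsewhere; only the areas are computed once.
        self.triangles = list(triangles)
        self.cum_areas = list(accumulate(triangle_area(*t) for t in self.triangles))

    def sample(self, rng=random):
        # Step 2.b: choose a triangle with probability proportional to its area.
        u = rng.random() * self.cum_areas[-1]
        i = min(bisect(self.cum_areas, u), len(self.triangles) - 1)
        a, b, c = self.triangles[i]
        # Step 2.c: uniform point inside the chosen triangle, formula (2).
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)
        wa, wb, wc = 1.0 - s, (1.0 - r2) * s, r2 * s
        return (wa * a[0] + wb * b[0] + wc * c[0],
                wa * a[1] + wb * b[1] + wc * c[1])


# Hypothetical L-shaped region: the square [0,2]x[0,2] without [1,2]x[1,2].
L_SHAPE_TRIANGLES = [
    ((0.0, 0.0), (2.0, 0.0), (2.0, 1.0)),
    ((0.0, 0.0), (2.0, 1.0), (0.0, 1.0)),
    ((0.0, 1.0), (1.0, 1.0), (1.0, 2.0)),
    ((0.0, 1.0), (1.0, 2.0), (0.0, 2.0)),
]

if __name__ == "__main__":
    sampler = RegionSampler(L_SHAPE_TRIANGLES)
    print(sampler.sample(random.Random(7)))
```

Storing cumulative areas makes the weighted pick a binary search, so the cost per sample grows only logarithmically with the number of triangles and no candidate is ever rejected.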
The concept of the CDT is visualized on the two aforementioned districts in Figure 2. Experiments showed
that this version of the method is about ten times more efficient than the rejection algorithm. Figure 3 shows
100,000 sampled residences according to a given distribution on municipality level (Austria is partitioned
into about 2700 municipalities). Highly populated areas, especially the large cities Vienna, Graz, Linz, Salzburg
and Innsbruck, are clearly visible. The influence of the Alps, which extend from the south-west almost
to Vienna in the north-east, is also apparent.
Although the sampling algorithm works well, the Geography module of GEPOC ABM cannot yet
be considered a validated generic model extension, mainly due to a lack of parametrization data.
First of all, joining and splitting of regions cause problems with standardized data storage and acquisition
for parametrization of the module. Secondly, data availability for parametrization of internal migration
of person-agents is unfortunately insufficient. We currently plan to include settlement information from
the Global Human Settlement Project (Florczyk et al. 2016) to make the population distribution even more
realistic. In summary, we learned that sampling solely residential regions (Federal States,
NUTS3 Regions, Political Districts, ...) is not sustainable. We require sampled coordinates.
As seen in the three case studies, GEPOC ABM has already proven its worth as a generic population base
module for different health-care related research problems. Due to our close collaboration with decision
makers, we are able to continuously improve and extend the model to make it easier to apply and more
flexible. In the process, we learned valuable lessons about population modeling and modularity of simulation
models, which we shared in this work.
Still, there are many open questions which require further research. The parametrization of spatial
aspects, and especially of internal migration, involves data difficulties which we plan to solve in the
next years. We also plan to use a large computation cluster to reduce calculation times in the near
future. Finally, we aim to apply the model for research problems apart from health-care to get additional
Figure 2: Constrained Delaunay Triangulation of districts “St. Pölten Land” (left) and “Amstetten” (right)
for GIS-coordinate sampling (status Jan 1st 2013). The colors of the triangles indicate their area.
Figure 3: Sampled residences for 100,000 agents according to distribution for municipalities (Jan 1st 2013).
Barnett, H. J. M. 2000. “The Imperative to Develop Dedicated Stroke Centers”. Journal of the American
Medical Association 283(23):3125.
Bellifemine, F., A. Poggi, and G. Rimassa. 1999. “JADE–A FIPA-compliant agent framework”. In
Proceedings of the Practical Applications of Intelligent Agents, 97–108.
Bicher, M. 2017. Classification of Microscopic Models with Respect to Aggregated System Behaviour.
Dissertation, Institute for Analysis and Scientific Computing, TU Wien, Vienna, Austria.
Bicher, M., B. Glock, F. Miksch, G. Schneckenreither, and N. Popper. 2015. “Definition, Validation and
Comparison of Two Population Models for Austria”. In Proceedings of 4th UBT Annual International
Conference on Business, Technology and Innovation, edited by E. Hajrizi, 174–179. Durres, Albania:
UBT - Higher Education Institution.
Bicher, M., and N. Popper. 2015. “Spatial Effects in Stochastic Microscopic Models - Case Study and
Analysis”. IFAC-PapersOnLine 48(1):153–158.
Bicher, M., and N. Popper. 2016. “Mean-Field Approximation of a Microscopic Population Model for
Austria”. In Proceedings of the 9th EUROSIM Congress on Modelling and Simulation, 544–545. Oulu,
Bicher, M., C. Urach, G. Zauner, C. Rippinger, and N. Popper. 2017. “Calibration of a Stochastic
Agent-Based Model for Re-Hospitalization Numbers of Psychiatric Patients”. In Proceedings of the 2017
Winter Simulation Conference, edited by W.K.V. Chan et al., 12. Piscataway, New Jersey: IEEE.
Bundesministerium für Gesundheit und Frauen 2016. “Kurzbericht: Evaluierung der Masern -
Durchimpfungsraten”. Technical report, BMGF, Vienna, Austria.
Bundesministerium für Gesundheit und Frauen 2017. “Kurzbericht: Evaluierung der
Polio-Durchimpfungsraten”. Technical report, BMGF, Vienna, Austria.
Buss, A., and A. Al Rowaei. 2010. “A comparison of the accuracy of discrete event and discrete time”.
In Proceedings of the 2010 Winter Simulation Conference, edited by B. Johansson et al., 1468–1477:
Collier, N., J. Ozik, and C. M. Macal. 2015. “Large-scale agent-based modeling with repast hpc: A case
study in parallelizing an agent-based model”. In European Conference on Parallel Processing, 454–465.
Florczyk, A. J., S. Ferri, V. Syrris, T. Kemper, M. Halkia, P. Soille, and M. Pesaresi. 2016. “A new
European settlement map from optical remotely sensed data”. Journal of Selected Topics in Applied
Earth Observations and Remote Sensing 9(5):1978–1992.
Gallagher, S., L. F. Richardson, S. L. Ventura, and W. F. Eddy. 2018. “SPEW: Synthetic Populations and
Ecosystems of the World”. Journal of Computational and Graphical Statistics 0(0):1–12.
Grigoryev, I. 2012. AnyLogic 6 in three days: a quick course in simulation modeling. Hampton, NJ:
AnyLogic North America.
Luke, S., C. Cioffi-Revilla, L. Panait, and K. Sullivan. 2004. “Mason: A new multi-agent simulation
toolkit”. In Proceedings of the 2004 swarmfest workshop, Volume 8, 316–327. Michigan, USA.
Masad, D., and J. Kazil. 2015. “MESA: an agent-based modeling framework”. In 14th PYTHON in Science
Conference, edited by K. Huff et al., 53–60.
Nowotny, K. 2018, June. “ECO - Landärzte gesucht: Immer mehr Orte ohne Ordination”. TV documentary.
Osada, R., T. Funkhouser, B. Chazelle, and D. Dobkin. 2002. “Shape distributions”. ACM Transactions on
Graphics (TOG) 21(4):807–832.
Patil, A., D. Huard, and C. J. Fonnesbeck. 2010. “PyMC: Bayesian stochastic modelling in Python”. Journal
of statistical software 35(4):1.
Reinhardt, O., and A. M. Uhrmacher. 2017, April. “An Efficient Simulation Algorithm for
Continuous-Time Agent-Based Linked Lives Models”. In ANSS 2017 Spring Simulation Multi-Conference. Virginia
Beach, Virginia.
Statistik Austria 2016. Statistisches Jahrbuch Österreich 2016. Verlag Österreich GmbH.
Tisue, S., and U. Wilensky. 2004. “NetLogo: A simple environment for modelling complexity”. In
International Conference on Complex Systems, Volume 21, 16–21. Boston, Massachusetts.
Warnke, T., O. Reinhardt, and A. M. Uhrmacher. 2016. “Population-based CTMCS and agent-based models”.
In 2016 Winter Simulation Conference (WSC), 1253–1264. Piscataway, New Jersey: IEEE.
Wilbacher, I. 2005. “Stroke Units - Österreich im Internationalen Vergleich”. Technical report, HVB EBM.
Zauner, G., C. Urach, M. Bicher, N. Popper, and F. Endel. 2017. “Spatial psychiatric hospitalization
modelling in an international setting - an agent based approach”. In Proceedings of the International
Workshop on Innovative Simulation for Health Care 2017. Barcelona, Spain. To Appear.
Proof of statement (2).
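Proof. Based on two independent uniform random numbers $r_1, r_2$ with common density
$$f_X : \mathbb{R}^2 \to \mathbb{R}^+ : (r_1, r_2)^T \mapsto \mathbf{1}_{[0,1]^2}(r_1, r_2)$$
we define the transformation
$$\varphi_{A,B,C} : \mathbb{R}^2 \to \mathbb{R}^2 : (r_1, r_2)^T \mapsto A(1 - \sqrt{r_1}) + B(1 - r_2)\sqrt{r_1} + C r_2 \sqrt{r_1}$$
$$= B + (A - B)(1 - \sqrt{r_1}) + (C - B) r_2 \sqrt{r_1}$$
and aim to show that $\varphi_{A,B,C}$ uniformly maps the unit square $[0,1]^2$ onto the triangle $\triangle ABC$. Firstly, we
write $\varphi_{A,B,C}$ as the composition of two separate mappings. With
$$\varphi_0 : \mathbb{R}^2 \to \mathbb{R}^2 : (r_1, r_2)^T \mapsto \begin{pmatrix} 1 - \sqrt{r_1} \\ r_2 \sqrt{r_1} \end{pmatrix}$$
we get
$$\varphi_{A,B,C}(r_1, r_2) = B + \big((A - B), (C - B)\big)\, \varphi_0(r_1, r_2).$$
Hereby an affine transformation is applied on the image of $\varphi_0$. As affine transformations (a) map triangles
onto triangles and (b) conserve the uniformity of a distribution, it is sufficient to show that $\varphi_0$ maps $(r_1, r_2)$
onto the triangle $\triangle (1,0)(0,0)(0,1)$ and that this mapping conserves the uniformity.
The first statement is trivially fulfilled. To show the second, we apply the transformation formula for
probability densities
$$f_{\varphi_0}(y_1, y_2) = f_X\big(\varphi_0^{-1}(y_1, y_2)\big)\, \big|\det J_{\varphi_0^{-1}}(y_1, y_2)\big|.$$
We calculate
$$\varphi_0^{-1}(y_1, y_2) = \begin{pmatrix} (1 - y_1)^2 \\ \dfrac{y_2}{1 - y_1} \end{pmatrix}, \quad \text{and} \quad J_{\varphi_0^{-1}}(y_1, y_2) = \begin{pmatrix} -2(1 - y_1) & 0 \\ \dfrac{y_2}{(1 - y_1)^2} & \dfrac{1}{1 - y_1} \end{pmatrix},$$
so that $|\det J_{\varphi_0^{-1}}(y_1, y_2)| = 2$, which
shows that the transformed density is (as well) constant. Therefore, the image of $\varphi_0$ and also the image of
$\varphi_{A,B,C}$ is uniformly distributed on the stated triangle, proving (2).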
MARTIN BICHER is research associate at the TU Wien and scientific employee at dwh Simulation
Services GmbH. He finished his PhD in Technical Mathematics at TU Wien in Winter 2017. His doctoral
thesis was about mean-field behaviour of microscopic models. Email address:
CHRISTOPH URACH studied Technical Mathematics at TU Wien and specialised in Mathematical
Modelling and Simulation in the field of HTA (Health Technology Assessment). He currently works at
dwh simulation services in the department of health economics where he is developing applicable model
structures for evaluation of health care interventions. He is also working on a PhD thesis supervised by
Prof. Dr. Felix Breitenecker. Email address:
NIKI POPPER is CEO of dwh - Simulation Services GmbH and research associate at TU Wien. He
is the responsible key researcher of the K-Project DEXHELPP and head of the corresponding association. His
research focus lies on the comparison of different modeling techniques.