MaaSim: A Liveability Simulation for Improving the Quality of Life in Cities


Abstract

Urbanism is no longer planned on paper thanks to powerful models and 3D simulation platforms. However, current work is not open to the public and lacks an optimisation agent that could help in decision making. This paper describes the creation of an open-source simulation based on an existing Dutch liveability score with a built-in AI module. Features are selected using feature engineering and Random Forests. Then, a modified scoring function is built based on the former liveability classes. The score is predicted using Random Forest for regression and achieved a recall of 0.83 with 10-fold cross-validation. Afterwards, Exploratory Factor Analysis is applied to select the actions present in the model. The resulting indicators are divided into 5 groups, and 12 actions are generated. The performance of four optimisation algorithms is compared, namely NSGA-II, PAES, SPEA2 and \(\epsilon \)-MOEA, on three established criteria of quality: cardinality, the spread of the solutions, spacing, and the resulting score and number of turns. Although all four algorithms show different strengths, \(\epsilon \)-MOEA is selected to be the most suitable for this problem. Ultimately, the simulation incorporates the model and the selected AI module in a GUI written in the Kivy framework for Python. Tests performed on users show positive responses and encourage further initiatives towards joining technology and public applications.
Dominika Woszczyk and Gerasimos Spanakis
Department of Data Science and Knowledge Engineering
Maastricht University
Keywords: liveability, simulation, feature selection, multi-objective optimisation, Kivy
1 Introduction
Liveability, wellbeing, quality of life: these concepts have been at the centre of growing interest from industries and governments in recent years. Multiple scores and rankings have been created in an attempt to capture the features that describe a “good quality of life” [3,1,15]. The focus nowadays is on building and improving cities and creating the best environment to live in, but also on identifying critical zones that demand changes [3]. Liveability now serves as an evaluation metric for policies.
arXiv:1810.07791v1 [cs.CY] 13 Oct 2018
Furthermore, with the advances in technology, it is now possible to visualise the impact of policies through models and simulations. This allows for cheap analysis and fast insights without physical consequences. Additionally, stakeholders are given a platform where concrete plans can be represented, which facilitates the communication and exchange of ideas. What is more, it is not rare to see civilians willing to take matters into their own hands and run projects for their neighbourhood. Nevertheless, simulators and models for urbanisation are built by companies or collaborating universities. They are intended for the private sector, or a subscription fee must be paid. Moreover, the companies do not share their models, and a user-friendly interface is not their primary concern. On the other hand, serious games for urban planning, while appealing and entertaining, do not carry real value in terms of practical insights.
Finding sets of beneficial actions that will improve a liveability score is an optimisation task. Hence, a step further for these simulations would be to introduce a decision maker or a decision helper. Real-life applications often involve multiple objectives that must be optimised yet often contradict each other. Implementing an AI module for such a simulation therefore requires an algorithm capable of solving multi-objective optimisation problems. Evolutionary algorithms have been widely and successfully applied to optimisation problems and have been shown to be efficient in solving those with a higher number of functions [14].
This paper aims to build a simulation model based on a real liveability score and open-source geographical data. Moreover, it proposes an optimisation algorithm that computes an optimal set of actions for achieving the best possible liveability score for a given neighbourhood or municipality. Finally, it combines the model, the AI module and a graphical interface into a serious game targeted at citizens and policy-makers. The paper joins those points together and applies them to the new demand for urbanisation and wellbeing, in a form made available to the public. It reports the steps and literature behind the creation of a serious game based on real data from Dutch neighbourhoods in the region of Limburg. Our contribution is two-fold: (a) we show how a minimalistic city-builder-like simulation can be built on real data and implemented as a serious game (MaaSim) available to policy makers, and (b) we show how AI algorithms can be used to find the optimal actions to improve a neighbourhood within the simulation.
2 Related Work
2.1 Urban Planners and Serious Games
Urban planners and policy simulators are already present in multiple forms in the private sector. They share the common ground of presenting a 3D visualisation of a city or a region and the effect of actions on different indicators, such as liveability, traffic or density. However, they do not provide a decision maker and are paid services. Some of the current projects are UrbanSim, MUtopia [1], Tygron and SimSmartMobility.
A serious game is a game with an informative or educational purpose. It can
be built on a high level or be very complex. Moreover, the platform is not limited
to computer-based programs but can as well be in the form of board games. In
this case, serious games aim at urban project stakeholders or civilians. The goal
is to encourage a thought process and teach about some neglected or taken for
granted aspects of urban planning.
Some current games built for the purpose of urban planning were made by means of 3D simulations, like the “B3: Design your marketplace” project [19] or games made by the Tygron company [9]. Others were made in the format of a board game [5] or card games. Nevertheless, those games are targeted at universities and business applications. Moreover, access is restricted to specific events and test groups [9].
More simplistic serious games that are not based on real models aim to educate about the difficulties of managing the different aspects of liveability and environment, as well as to raise awareness.
2.2 Paper Application and Structure
This paper aims at filling the gap between serious games and urban planners by
presenting a report on the construction of a 2D liveability simulation.
First of all, the creation of the simulation follows scientific methods and uses a real-life dataset and score, as described in Sections 3 and 4. In that respect, the paper situates itself among the urban planners mentioned in the previous section. On the other hand, the open-source nature of the final product and the attention paid to a user-friendly and entertaining interface place it among other serious games. Finally, the addition of an optimisation algorithm (described in Section 5) as a decision helper is an unusual asset that is present in none of the urban planners or serious games mentioned above. This AI module shows the user the best possible choices by giving sets of optimal solutions. In that manner, users can select their favourite solution based on their own criteria. They are offered a choice instead of having one imposed on them.
The end product of this paper will benefit policymakers by giving them general insights on the development of neighbourhoods, for Dutch and non-Dutch citizens. Most importantly, the end program can bring awareness to the inhabitants and provide a visualisation platform that shows what can be improved, through a playful serious game. In that manner, civilians can take action and propose projects to municipalities. The paper is an invitation to bring data-based visualisation and decision-making tools to the public.
3 The Dataset
The dataset was based on an existing liveability score for Dutch neighbourhoods. The Leefbaarometer 2.0 is a score built by Rigo, a Dutch statistical company, and Atlas, at the request of the Ministry of Housing, Spatial Planning and the Environment [15]. It is a low-level score based on 116 environmental variables grouped into 5 categories. Those variables were collected from data provided by municipalities and from surveys, corrected for different factors such as age and general background. Since the variable values were not available to the public, a different approach had to be taken: instead of using the exact indicators and scoring function, all available indicators were collected and computed using the same method, that is, using the 200 m area around the centre and distances to facilities. The new scoring function was then computed using the liveability classes made publicly available by Rigo. The dataset was therefore built from open data for the whole of Limburg, retrieved from CBS, BAG, Geofabrik and Culturelerfolg. Fortunately, multiple original indicators could be discarded because they remain constant for that region. Unfortunately, some were not available, e.g. the average distance to an ATM or crime and nuisance records.
The indicators were retrieved and computed following the indicator descriptions [15], using shapefiles. A shapefile is a file format that combines information with its geometry (Point, Polygon, Line) and its coordinates [6]. In this case, the shapefiles contained buildings with their function (habitation, industry) and construction date, neighbourhoods, roads, tracks, waters and land use. This format allows geographic computations such as distances, centres and areas, as needed for computing the indicators. Moreover, shapefiles allow for easy visualisation and integration within a simulation, needed for the later steps of the paper. An adjustment had to be made in the manner the indicators were computed. For many calculations, the centre of the neighbourhood is used, but Rigo did not state how it was computed. Therefore, both the geometrical centre and the mean coordinates of all buildings were compared. The mean-coordinates method was retained as it was better at reflecting the “centre” of a neighbourhood; the geometrical variant was often in a non-occupied area.
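The difference between the two candidate centres can be illustrated with a minimal sketch. The boundary polygon and building coordinates below are hypothetical, and the function names are chosen for this example only; the paper does not specify how the geometrical centre was computed, so the polygon centroid (shoelace formula) is assumed here.

```python
from statistics import fmean

def polygon_centroid(vertices):
    """Geometric centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

def mean_building_centre(buildings):
    """Mean coordinates of all building locations."""
    xs, ys = zip(*buildings)
    return fmean(xs), fmean(ys)

# Hypothetical neighbourhood: a 10 x 10 square whose buildings
# are all clustered in one corner.
boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
buildings = [(1, 1), (2, 1), (1, 2), (2, 2)]

geometric = polygon_centroid(boundary)      # middle of the square
density = mean_building_centre(buildings)   # inside the built-up corner
```

In this toy case the geometric centroid falls in the empty middle of the square, while the mean of the building coordinates stays within the occupied corner, which mirrors the observation made above.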
The resulting generated dataset consisted of 997 neighbourhoods and 54 indicators.
4 Model Selection
The simulation was built on real-life indicators to create in-game indexes, showing the user the state of the selected neighbourhood and its liveability. Furthermore, actions affecting those indexes had to be generated in a relatively accurate manner. Finally, a liveability score based on the indicators had to be created.
As not all retrieved indicators were initially significant, the data had to go through a process of selection and combination of features. This section describes the process of the analysis used to reach the final indicators and actions suitable for the simulation and to create a liveability score.
4.1 Techniques
Dimensionality Reduction The aim of this section is to simplify the model by creating understandable actions and indicators. For the purpose of grouping and deriving actions, dimensionality reduction techniques are used. Feature selection can be achieved by different methods, depending on the desired output. Several dimensionality reduction and feature selection techniques are available [21].
However, Exploratory Factor Analysis (EFA) [12] was better suited to the problem and was thus retained. EFA is a technique that finds the underlying latent factors that cannot be measured by a single variable and that cause the changes in the observed variables. The model can be interpreted as a set of regression equations.
Indicators and Tests Multiple indicators can be used as support for selecting features and improving the quality of the dataset. The following paragraphs describe the tools used and the tests performed in the next section.
The correlation matrix shows the pairwise correlation value for the variables in a dataset. It can be used to discard variables in cases of redundancy, when the correlation is too high, and/or when the correlation is too low. It can also be used as a tool to combine variables.
Communalities show the extent to which a variable correlates with all other variables. They indicate the variance that a given feature shares with the common factors. Items with low communalities can be discarded to improve the analysis.
The Kaiser-Meyer-Olkin (KMO) test is a measure of sampling adequacy. It indicates the proportion of variance in the variables that might be caused by underlying factors. Thus, it can measure how suitable the dataset is for Factor Analysis.
Feature engineering is also necessary for combining variables and re-expressing them as boolean or more explicit indicators.
4.2 Methodology
The developed approach is described in detail with the following subsections.
Feature Selection and Score Creation The scoring function used by Rigo no longer applied to the collected indicators. Therefore, a new scoring function had to be found. Since the liveability classification was available, a regression could be applied.
The dataset was first examined for constant indicators and outliers. One variable remained constant and was consequently removed. The neighbourhoods have been classified into 6 ordinal classes, the first class having the worst liveability.
As one can see in Figure 1, the class with the lowest score had 3 data points in total. It was decided to discard that class, as there were too few data points to guarantee a good prediction.
Fig. 1. Liveability classes distribution of the dataset
In the next step, variables with an absolute correlation of 1 were discarded, based on the indicator description and correlation with other variables.
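As an illustration of this filtering step, the following minimal sketch drops one variable from every perfectly correlated pair. The column names, the tolerance `tol` and the keep-the-first rule are assumptions made for the example; in the paper, the choice of which variable to discard was guided by the indicator descriptions. Constant columns are assumed to have been removed beforehand, since their correlation is undefined.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_perfectly_correlated(columns, tol=1e-9):
    """Keep a column only if |r| differs from 1 for every column kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(abs(pearson(values, columns[k])) - 1.0) > tol for k in kept):
            kept.append(name)
    return kept

# Hypothetical indicators: the second column is the first one in other units.
columns = {
    "shops_per_km2": [1.0, 2.0, 3.0, 4.0],
    "shops_per_ha": [0.01, 0.02, 0.03, 0.04],
    "parks": [3.0, 1.0, 4.0, 1.0],
}
kept = drop_perfectly_correlated(columns)
```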
In Figure 1, one can observe that the dataset was not evenly spread among classes. Imbalance in a dataset can negatively impact the results of a prediction. Therefore, the Synthetic Minority Over-sampling Technique (SMOTE), introduced in 2002 [2], was applied to correct the imbalance by over-sampling the minority classes.
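The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration and not the implementation used in the paper; the function name, the value of `k`, the seed and the sample points are all assumptions.

```python
import math
import random

def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority class with three two-dimensional samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
synthetic = smote_samples(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing samples, the new points stay inside the region already covered by the minority class.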
In order to find a model for the score, regression was used to perform classification. The algorithm predicted the numerical value of a data point's class (1 to 6), and the actual predicted class was obtained by truncating that number to an integer. This technique was used in order to later have a scoring function that outputs numerical values instead of classes; to validate the model, however, the classification recall was necessary. Different regression algorithms were compared, namely multinomial regression, K-nearest neighbours (KNN), Random Forest (RF), a Decision Tree and a perceptron. In Table 1, one can see that SMOTE improved the recall of all algorithms. The Random Forest (RF) achieved the best recall and was thus retained.
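The truncation step and the recall computation can be sketched as below. The clipping to the range 1 to 6 and the macro-averaging of the per-class recall are assumptions of this example, since the paper does not state how the recall was averaged; the numbers are made up for illustration.

```python
def to_class(prediction, lowest=1, highest=6):
    """Truncate a regression output to an integer class, clipped to the
    valid class range (the clipping is an assumption of this sketch)."""
    return max(lowest, min(highest, int(prediction)))

def macro_recall(y_true, y_pred):
    """Average of the per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        predicted_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(p == c for p in predicted_for_c) / len(predicted_for_c))
    return sum(recalls) / len(recalls)

# Hypothetical true classes and raw regression outputs.
y_true = [2, 3, 3, 4]
raw = [2.7, 3.1, 2.9, 4.5]
y_pred = [to_class(r) for r in raw]
recall = macro_recall(y_true, y_pred)
```

Note that truncation maps 2.9 to class 2 rather than 3, so the choice between truncating and rounding directly affects the measured recall.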
Table 1. Multinomial classification algorithms comparison based on a 10-fold cross-validation
Additionally, to reduce over-fitting, the generated RF was used for feature
selection using mean decrease impurity as a feature importance measure. This
measure indicates how much each indicator decreases the impurity of a tree, the
variance in the case of a regression. The value is computed for each tree and for
each feature. For the RF, the importance of a feature is the average of those
values [16]. Indicators with too little impact were discarded.
Fig. 2. Importance of the features as given by the RF
Based on Figure 2, it was decided to discard the last three variables, as the importance dropped noticeably at those elements and their importance value is below 0.01. Once the variables were cleaned, a 10-fold cross-validation resulted in an average recall of 0.83. To interpret the resulting RF, one can decompose the Decision Trees into a prediction function. The function can be described as the sum of the contribution of each feature and the value of the root node.
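The value for a decision tree is computed as follows:
\[ f(x) = \mathit{bias} + \sum_{k=1}^{K} \mathit{contrib}(x, k) \qquad (1) \]
where K is the number of features and bias is the value at the root of the tree. For a Random Forest, the prediction value is the mean of all the values of all Decision Trees [17]:
\[ F(x) = \frac{1}{J} \sum_{j=1}^{J} \mathit{bias}_j + \sum_{k=1}^{K} \left( \frac{1}{J} \sum_{j=1}^{J} \mathit{contrib}_j(x, k) \right) \qquad (2) \]
where J is the number of trees in the forest.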
The model was saved and used as an evaluation function. The number of
indicators after the process was 44.
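The decomposition in equations (1) and (2) can be made concrete with a small sketch. The two hand-written trees, their node values and the nested-dictionary layout are hypothetical; they only serve to show that the bias plus the per-feature contributions adds up to the prediction.

```python
def decompose(tree, x, n_features):
    """Return (bias, contributions) such that bias + sum(contributions)
    equals the value of the leaf reached by x."""
    node = tree
    bias = node["value"]  # value at the root of the tree
    contrib = [0.0] * n_features
    while "feature" in node:  # internal node
        k = node["feature"]
        child = node["left"] if x[k] <= node["threshold"] else node["right"]
        # The change in node value is attributed to the splitting feature.
        contrib[k] += child["value"] - node["value"]
        node = child
    return bias, contrib

def forest_decompose(trees, x, n_features):
    """Average the bias and the contributions over all trees."""
    parts = [decompose(t, x, n_features) for t in trees]
    bias = sum(b for b, _ in parts) / len(parts)
    contrib = [sum(c[k] for _, c in parts) / len(parts) for k in range(n_features)]
    return bias, contrib

# Two hypothetical regression trees over two features.
tree_a = {"value": 4.0, "feature": 0, "threshold": 0.5,
          "left": {"value": 3.0},
          "right": {"value": 5.0, "feature": 1, "threshold": 2.0,
                    "left": {"value": 4.5}, "right": {"value": 5.5}}}
tree_b = {"value": 4.2, "feature": 1, "threshold": 1.0,
          "left": {"value": 3.2}, "right": {"value": 5.2}}

x = [1.0, 3.0]
bias, contrib = forest_decompose([tree_a, tree_b], x, n_features=2)
prediction = bias + sum(contrib)
```

For this input the two trees predict 5.5 and 5.2, and the averaged bias and contributions sum to their mean.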
Actions and Groups Exploratory Factor Analysis (EFA) was performed in order to find actions affecting the indicators, as actions can be described as underlying variables that affect the observable variables. After dropping variables with low communalities, the KMO value of the dataset was 0.7, which was acceptable. The indicators were divided into two groups: direct and indirect. The indirect indicators were the ones that cannot be directly controlled by an action, for example the percentage of single households, non-westerners and family households. The direct indicators were the ones that can be directly “added”, such as grocery stores, parks, swimming pools, and others. An action such as “Add a library” increases or decreases the indicator by a pre-computed value. For each “Add an item” action, the value increases by 10% of its original value, and similarly for a decreasing action. Another way to compute the effect of an action was investigated, namely computing the increase obtained by adding one element of that specific type at the density centre of the neighbourhood. However, as the density centre agglomerates most of the habitations, the increases in the liveability score were too small to be considered. An increase or decrease of 10% was big enough to show significant changes but small enough not to compromise the feasibility of the actions. It was decided to keep 1 indirect action and 11 direct ones.
Finally, to reduce the number of indicators shown in the simulation, variables were grouped into categories. Through the process, 5 groups were formed. The only purpose of this grouping was to lighten the interface. Consequently, the author took the liberty of grouping indicators that represent similar features, i.e. services, environment, housing, healthcare and leisure. Nevertheless, the newly formed indicators had to be meaningful to the user. Therefore, the normalised contributions of each feature, as computed by the RF model and described in equation (2), were used to weight the indicators.
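One possible reading of this weighting is sketched below: each group indicator is a weighted average of its member indicators, with weights proportional to the feature contributions. The use of absolute contributions, the indicator names and the values are assumptions of this example and are not taken from the paper.

```python
def group_indicator(values, contributions):
    """Weighted average of the indicators in one group, with weights given
    by the normalised absolute contributions of the features."""
    total = sum(abs(c) for c in contributions.values())
    weights = {name: abs(c) / total for name, c in contributions.items()}
    return sum(weights[name] * values[name] for name in values)

# Hypothetical "environment" group with two scaled indicators.
values = {"parks": 0.8, "water": 0.4}
contributions = {"parks": 0.3, "water": 0.1}
environment = group_indicator(values, contributions)
```

With these numbers the weights are 0.75 and 0.25, so the group indicator leans towards the feature that contributes most to the score.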
5 Optimization Algorithms
This section presents and describes multi-objective optimisation algorithms (MOOA). Four algorithms were compared on several metrics, and the one judged the most adequate was implemented in the simulation.
5.1 Problem Definition
The optimisation problem of this paper can be defined as:
\[ \begin{aligned} \text{Maximise} \quad & f(x) = s \\ \text{Minimise} \quad & f(y) = t \\ \text{subject to} \quad & t > 0 \end{aligned} \]
where s is the liveability score and t the number of turns (or actions).
The problem stated above is a Multi-Objective Optimisation Problem (MOOP) and, as opposed to a single-objective optimisation problem, contains more than one variable that needs to be optimised. It can be homogeneous if all variables need to be maximised or minimised, or mixed when it consists of both minimisation and maximisation functions, also called minmax [22]. The above-stated problem both minimises and maximises its variables, thus it is mixed.
The problem was translated into a string of binary values, one for each possible action, for each neighbourhood. A value of 1 indicates that the corresponding action is used, and the total number of 1’s gives the number of turns. Then, each indicator affected by an activated action is updated. Finally, the new liveability score of an individual is computed using the model described in Section 4.2.
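The encoding can be sketched as follows. The three actions, the indicator names and the toy scoring function are hypothetical stand-ins (the paper uses 12 actions and the Random Forest model as the scoring function); only the 10% rule and the bit-string representation follow the text.

```python
# Hypothetical action table: (action name, affected indicator, relative change).
ACTIONS = [
    ("add_library", "libraries", +0.10),
    ("add_park", "parks", +0.10),
    ("remove_industry", "industry", -0.10),
]

def apply_actions(indicators, bits):
    """Return the indicators after applying every action whose bit is 1."""
    updated = dict(indicators)
    for (_, indicator, change), bit in zip(ACTIONS, bits):
        if bit:
            # The change is relative to the original value (10% rule).
            updated[indicator] += change * indicators[indicator]
    return updated

def evaluate(indicators, bits, score_fn):
    """Objectives of one individual: (liveability score, number of turns)."""
    return score_fn(apply_actions(indicators, bits)), sum(bits)

def toy_score(ind):
    """Placeholder for the trained scoring model (assumption)."""
    return ind["libraries"] + ind["parks"] - ind["industry"]

indicators = {"libraries": 2.0, "parks": 10.0, "industry": 5.0}
score, turns = evaluate(indicators, [1, 0, 1], toy_score)
```

An optimiser then searches over the bit strings, maximising the first objective and minimising the second.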
5.2 Algorithms
Multi-objective optimisation algorithms have been a popular research topic, with a growing interest. Many research papers compare those algorithms and tailor them to specific application problems [18,23]. Hundreds of variations have been proposed, some specific to a particular type of application. They can be Pareto-dominance based, indicator based, probability based, preference based or swarm based [18]. A comprehensive survey can be found in [20]. As described in Section 5.1, the problem of this paper is discrete. Four well-known optimisation algorithms for static and discrete multi-objective optimisation problems were compared and are briefly described here.
Non-Dominated Sorting Genetic Algorithm II (NSGA-II) Published in 2002 by Deb et al. [4] as an improved version of the original NSGA [10], it is a non-dominated Pareto optimisation algorithm. It is the most used and best-known algorithm due to its simplicity, diversity-preserving mechanism, and better convergence near the true Pareto-optimal set.
Pareto Archived Evolution Strategy (PAES) Published in 2000 by Knowles and Corne [13], it is a simpler method, as it does not have a population but keeps non-dominated solutions in an external archive. At each iteration, a parent solution p is mutated, and a solution c is created. The parent p is then evaluated and compared to the mutated solution c. If c dominates p, then c becomes the new parent and is added to the external archive.
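The acceptance rule described above can be sketched for the two objectives of this paper (score maximised, turns minimised). This is only the step described in the text, not the full PAES procedure, which also handles the case where neither solution dominates the other; the mutation rate, the toy objective function and the seed are assumptions.

```python
import random

def dominates(a, b):
    """True if objectives a = (score, turns) Pareto-dominate b.
    The score is maximised and the number of turns is minimised."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def mutate(bits, rng, rate=0.2):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def paes_step(parent, archive, evaluate, rng):
    """One iteration: mutate the parent and accept the child if it dominates."""
    child = mutate(parent, rng)
    if dominates(evaluate(child), evaluate(parent)):
        archive.append(child)
        return child
    return parent

def toy_evaluate(bits):
    """Toy objectives: only the first action improves the score."""
    return (5.0 * bits[0], sum(bits))

rng = random.Random(0)
start = [0, 1, 1]
parent, archive = list(start), []
for _ in range(200):
    parent = paes_step(parent, archive, toy_evaluate, rng)
```

Since a child is accepted only when it dominates its parent, every archived solution dominates the starting point by transitivity.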
Strength Pareto Evolutionary Algorithm 2 (SPEA2) Introduced in 2001 by Zitzler et al. [25] as an improved version of SPEA, proposed earlier by Zitzler and Thiele [26]. SPEA2 also uses an external archive, but applies a truncation procedure that eliminates non-dominated solutions while preserving the best solutions when the size of the archive is exceeded.
\(\epsilon\)-Multi-objective Evolutionary Algorithm (\(\epsilon\)-MOEA) Presented in 2003 by Deb et al. [7], \(\epsilon\)-MOEA is a steady-state algorithm, that is, it updates the population one solution at a time and coevolves a population of individuals with an external archive in which the best non-dominated solutions are stored. The external archive applies \(\epsilon\)-dominance to prevent the deterioration of the population and to ensure both convergence and diversity of the solutions.
5.3 Metrics
Genetic algorithms for multi-objective optimisation problems (MOOP) cannot be easily compared using the traditional measures for single-objective problems. The quality of a set of solutions given by a multi-objective optimisation algorithm is defined by its distance from the true Pareto Front (PF), its coverage and the diversity of the solutions. Therefore, well-known measures for MOOP were used [24].
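Spread The spread, also called “extent”, is a diversity and coverage measure that shows the range of values covered by the set of solutions [11]:
\[ \Delta = \frac{d_f + d_l + \sum_{i=1}^{N-1} |d_i - \bar{d}|}{d_f + d_l + (N-1)\,\bar{d}} \]
where \(d_i\) is the Euclidean distance between two consecutive solutions in PF, \(\bar{d}\) is the average of these distances, and \(d_f\) and \(d_l\) are the distances between the extreme solutions of the true Pareto Front and the boundary solutions of the obtained set. For a perfect distribution, \(\Delta = 0\), which means that \(d_i\) is constant for all i.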
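A minimal sketch of the spread computation for a two-objective front is given below. The example fronts and the extreme points are hypothetical, and the solutions are assumed to be ordered by sorting on the first objective; \(d_f\) and \(d_l\) are taken as the distances from the given extreme points to the first and last solution.

```python
import math

def spread(front, extreme_first, extreme_last):
    """Spread of a two-objective front; 0 means a perfectly even distribution
    that also reaches both extreme points."""
    pts = sorted(front)
    # Distances between consecutive solutions (there are N - 1 of them).
    d = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    d_mean = sum(d) / len(d)
    d_f = math.dist(extreme_first, pts[0])
    d_l = math.dist(extreme_last, pts[-1])
    numerator = d_f + d_l + sum(abs(di - d_mean) for di in d)
    denominator = d_f + d_l + len(d) * d_mean
    return numerator / denominator

even = [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]
uneven = [(0.0, 3.0), (0.1, 2.9), (2.0, 1.0), (3.0, 0.0)]
delta_even = spread(even, (0.0, 3.0), (3.0, 0.0))
delta_uneven = spread(uneven, (0.0, 3.0), (3.0, 0.0))
```

The evenly spaced front yields a value of (numerically) zero, while the front with two clustered solutions yields a clearly positive value.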
Spacing The Spacing indicator is a distance-based measure of the uniformity of a solution set [11]. For a set of solutions S, the spacing is defined as
\[ SP(S) = \sqrt{\frac{1}{|S| - 1} \sum_{i=1}^{|S|} (\bar{d} - d_i)^2} \]
where \(\bar{d}\) is the average of the \(d_i\) and \(d_i\) is the Euclidean distance between a solution \(s_i\) and the nearest member in the true Pareto Front.
A value of 0 indicates that the solutions are evenly spread along the Pareto Front.
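The spacing computation can be sketched as follows. The formula is separated from the choice of distances: the helper below takes each distance to the nearest other solution in the same set, which is a common convention and an assumption of this example; a different list of distances can be passed to `spacing` to follow another definition. The example fronts are hypothetical.

```python
import math

def nearest_neighbour_distances(solutions):
    """Euclidean distance from each solution to its nearest other solution."""
    return [
        min(math.dist(s, t) for j, t in enumerate(solutions) if j != i)
        for i, s in enumerate(solutions)
    ]

def spacing(distances):
    """Sample standard deviation of the distances around their mean."""
    mean = sum(distances) / len(distances)
    return math.sqrt(sum((mean - d) ** 2 for d in distances) / (len(distances) - 1))

even = [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]
uneven = [(0.0, 3.0), (0.1, 2.9), (2.0, 1.0), (3.0, 0.0)]
sp_even = spacing(nearest_neighbour_distances(even))
sp_uneven = spacing(nearest_neighbour_distances(uneven))
```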
Cardinality is a measure of the number of non-dominated solutions given by an algorithm [11]. It is especially relevant to the problem of this paper, as an algorithm offering multiple scenarios is preferred.
Additionally, the maximum score and the minimal number of turns from a
set of solutions are measured.
5.4 Experiments
System Specification The Platypus library for Python was used for the implementation of the optimisation algorithms. All experiments were performed on an Intel Core i7-7700HQ processor at 2.80 GHz, in Python 3.6.
Algorithm            Parameters
NSGA-II              pop size = 100
PAES                 pop size = 100, archive size = 100
SPEA2                pop size = 100, archive size = 100
\(\epsilon\)-MOEA    pop size = 100, archive size = 100, \(\epsilon\) = 0.01
Table 2. Parameters setup
Experiment Set Up The algorithms were compared on 10000 and 20000 function evaluations (FE). Below 10000, the results were not as interesting in terms of fitness and competition among algorithms, while more than 20000 would require greater computational power.
Fig. 3. Results on 10 runs for 10000 FE
Results The Wilcoxon-Mann-Whitney test was performed on the 20000 FE runs to check whether an algorithm is better than another one [8]. The test was performed for each metric. In Figure 5, a cross is present whenever the algorithm in the row was significantly better than the algorithm in the column.
Fig. 4. Results on 10 runs for 20000 FE
Fig. 5. Results of the Wilcoxon-Mann-Whitney’s test
5.5 Discussion
One can notice that the cardinality of the SPEA2 algorithm was always equal to 100. This can be explained by the fact that the external archive limits the total number of solutions SPEA2 retains. If the archive size were increased, it could result in an even higher cardinality. Nevertheless, it is surprisingly high compared
to other algorithms, considering that only unique non-dominated solutions were counted.
One can observe in Figure 5 that the \(\epsilon\)-MOEA algorithm was outperformed in almost all metrics and by nearly all the other algorithms. Indeed, \(\epsilon\)-MOEA favoured faster convergence at the expense of diversity and uniform spread. However, considering that it reaches a fairly similar score in a significantly smaller number of turns, it would not be correct to classify it as “poorly performing”.
Both NSGA-II and PAES performed well for different measures, sometimes even outperforming each other. While PAES had better results for the spread and spacing, NSGA-II gave higher scores in fewer turns.
On the other hand, SPEA2 was the best of the four algorithms when comparing spacing and cardinality. However, the number of turns needed was significantly higher, while achieving a score similar to the other algorithms in Figure 3 and a lower one in Figure 4.
Ultimately, a relatively high score in fewer turns was of more interest for the application of this paper than diversity and uniform coverage of the Pareto Front, particularly since each turn represents a lot of money and work. Therefore, the choice of the best optimisation algorithm was between NSGA-II and \(\epsilon\)-MOEA.
Finally, it was decided that \(\epsilon\)-MOEA would be the most suited to the problem of this paper. The characteristic faster convergence of the algorithm best fits the simulation environment, as users do not like to wait.
6 Simulation
The final product, the simulation, is the integration of the previous sections. In Figure 6, the scientific methods described earlier are represented in darker shades.
Fig. 6. Pipeline of the simulation
The interface (GUI) was implemented in a minimalist and intuitive way, in Kivy for Python. An AI option has been added to the simulation menu. Once started, the algorithm outputs the set of solutions of optimal actions. The dataset and code for the application will be made available upon paper acceptance.
6.1 Experiments
In order to assess the effect of the simulation, experiments with users were performed in an interview fashion. Users were first asked to play around with the simulation. Then, they were assigned the task of reaching the highest score possible within 10 turns. After the test, each user graded the game in terms of entertainment, education, aesthetics and the ergonomics of the controls. Grades were given on a scale from 1 to 10, 10 being the highest grade.
       Entertainment  Education  Aesthetics  Ergonomics  Score
Mean   7.1            7.4        8.1         6.9         36.5
SD     1.28           1.17       1.19        0.8         18.5
Table 3. Results from 10 users tests
6.2 Results
From the results in Table 3, one can observe that the mechanics have room for improvement but that, overall, the different aspects are graded positively. There were comments regarding the playability, the difficulty due to too many choices, and the symbols, which were not always understood. Nevertheless, users were overall pleased with the design, the controls and the purpose of the simulation.
7 Discussion
Of course, some limitations and critiques can be raised concerning the process of creating the simulation. A first possible limitation of the model described in Section 4 is that it was built on a dataset of the region of Limburg. The dataset could have been extended to the whole of the Netherlands. However, due to computational difficulties, a larger dataset was not possible.
The liveability model undoubtedly has room for improvement. For example, more open-source data could have been collected and used in the regression model, and the resulting actions could perhaps be even more meaningful. Fortunately, the resulting framework can be easily modified to incorporate a new scoring function for future changes.
Moreover, one can argue that the interface could be improved for an even
more appealing serious game of a higher educational value. However, as it was
not the primary focus of this paper, the GUI was kept simple.
8 Conclusion and Future Work
As liveability becomes a key factor in everyday life and data science techniques
advance, it is only natural to combine them in order to improve decisions for the
good of society. This paper reports the methodology followed to build a serious
game for urban planning and on incorporating an AI module as a decision helper.
First, the steps followed to create a model for a simulation were described, based on real data and scientific techniques. To create a new score with open-source data, Exploratory Factor Analysis (EFA) and Random Forests (RF) were both used to prune non-significant variables. Then, RF for regression was applied and achieved a recall of 0.83 with 10-fold cross-validation, with a reduction from the original 115 to 44 indicators. The indicators were divided into direct and indirect action categories: the indicators that are directly related to an action and the ones that are the indirect result of an action. The latter were created by looking at underlying factors using EFA. This process resulted in 12 actions, 11 direct and 1 indirect. Moreover, the model was built in a manner suitable for the simulation. For visibility purposes, the indicators were gathered manually into 5 groups: housing, environment, services, healthcare and leisure.
Later on, the optimisation problem resulting from the model was formulated
as a multi-objective optimisation problem. The algorithms NSGA-II, SPEA2,
PAES and \(\epsilon \)-MOEA were compared. The algorithm best suited to the
simulation was \(\epsilon \)-MOEA, due to its fast convergence and better results
in terms of turns and score. Research on high-dimensional multi-objective
optimisation problems (many-objective optimisation problems) could be
investigated in the future, since an increase in dimensionality might lead to a
more complex optimisation problem. Furthermore, more complex state-of-the-art
algorithms, such as the Binary Bat Multi-Objective Algorithm or the
Multi-Objective Ant Colony Algorithm, could be compared with the algorithms in
this paper.
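To make the comparison above more concrete, the following is a minimal Python sketch of the epsilon-dominance archive idea that underlies \(\epsilon \)-MOEA. It is an illustration under stated assumptions, not the implementation used in the simulation: all objectives are assumed to be minimised (for instance, number of turns and negated score), the function names `dominates`, `box`, `update_archive` and `build_archive` are hypothetical, and the tie-break between two solutions in the same box is simplified compared with the published algorithm.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (every objective is minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def box(point: Sequence[float], eps: float) -> Tuple[int, ...]:
    """Index of the eps-sized grid cell (box) that contains the point."""
    return tuple(math.floor(x / eps) for x in point)


def update_archive(archive: List[Point], candidate: Point, eps: float) -> List[Point]:
    """Try to insert candidate; return the (possibly unchanged) archive.

    Assumption: when two solutions share a box, the candidate only replaces
    the archived one if it Pareto-dominates it (a simplified tie-break).
    """
    cand_box = box(candidate, eps)
    kept: List[Point] = []
    for member in archive:
        member_box = box(member, eps)
        if dominates(member_box, cand_box):
            return archive  # candidate lies in a dominated box: reject it
        if member_box == cand_box:
            if not dominates(candidate, member):
                return archive  # archived solution keeps its box
            continue  # candidate replaces the archived solution
        if dominates(cand_box, member_box):
            continue  # archived solution is now in a dominated box: drop it
        kept.append(member)
    kept.append(candidate)
    return kept


def build_archive(points: Iterable[Point], eps: float) -> List[Point]:
    """Feed points one by one into an initially empty epsilon-archive."""
    archive: List[Point] = []
    for p in points:
        archive = update_archive(archive, p, eps)
    return archive


if __name__ == "__main__":
    # Hypothetical objective vectors, both objectives minimised.
    candidates = [(4.2, 3.1), (4.4, 3.3), (2.5, 5.0),
                  (5.0, 5.0), (1.0, 9.0), (4.1, 3.0)]
    print(sorted(build_archive(candidates, eps=1.0)))
```

Because at most one solution is kept per box, the archive size is bounded by the grid resolution `eps`. A larger `eps` gives a smaller, coarser archive and a smaller one gives a finer approximation of the Pareto front.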
Finally, the model and the optimisation algorithm were incorporated into an
interface. The feedback received from testers was generally positive. Neverthe-
less, further improvements in the interface can only increase the quality of the
serious game.
The simulation resulting from building the model, the AI module and the
interface answered the questions of how to incorporate an AI module into a
simulation and how to build a serious game based on real data.
In light of these results, one can conclude that data science techniques can be
successfully applied to building an urban planner for both entertainment and
educational purposes. Arguably, the product of this paper is one step forward in
bridging the gap between private and public applications.
References
1. Bishop, I., Rajabifard, A., Saydi, M.: Mutopia: a collaborative tool for engineering
sustainable systems. U21 Graduate Research Conference (2009)
2. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research 16,
321–357 (2002)
3. Chu, W., Low, H., Rix, S.: TSNS 2020 Neighbourhood Equity Index Methodological
Documentation. Tech. rep., Social Policy Analysis and Research, Toronto (2014)
4. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective
genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2),
182–197 (2002)
5. Ekim Tan: Play-the-city workshop. Building the shared city: how can we engage
citizens? Amsterdam Seminar (2012)
6. ESRI: ESRI Shapefile Technical Description (July 1998)
7. Fan, Z., Li, W., Cai, X., Huang, H., Fang, Y., You, Y., Mo, J., Wei, C., Goodman,
E.: An Improved Epsilon Constraint-handling Method in MOEA/D for CMOPs
with Large Infeasible Regions. ArXiv e-prints (jul 2017)
8. Garcia, S., Molina, D., Lozano, M., Herrera, F.: A study on the use of non-
parametric tests for analyzing the evolutionary algorithms behaviour: a case study
on the CEC'2005 special session on real parameter optimization. Journal of Heuristics
15(6), 617 (2009)
9. Gonzalez, C.: Student Usability in Educational Software and Games: Improving
Experiences. IGI Global (2012)
10. Guria, C., Bhattacharya, P.K., Gupta, S.K.: Multi-objective optimization of reverse
osmosis desalination units using different adaptations of the non-dominated sorting
genetic algorithm (NSGA). Computers & Chemical Engineering 29(9), 1977–1995
11. Hamdy, M., Nguyen, A.T., Hensen, J.L.: A performance comparison of multi-
objective optimization algorithms for solving nearly-zero-energy-building design
problems. Energy and Buildings 121, 57–71 (2016)
12. Hurley, A.E., Scandura, T.A., Schriesheim, C.A., Brannick, M.T., Seers, A., Van-
denberg, R.J., Williams, L.J.: Exploratory and confirmatory factor analysis: Guide-
lines, issues, and alternatives. Journal of Organizational Behavior 18(6), 667–683
13. Knowles, J., Corne, D.: The pareto archived evolution strategy: A new baseline
algorithm for pareto multiobjective optimisation. In: Evolutionary Computation,
1999. CEC 99. Proceedings of the 1999 Congress on. vol. 1, pp. 98–105. IEEE
14. Konak, A., Coit, D.W., Smith, A.E.: Multi-objective optimization using genetic
algorithms: A tutorial. Reliability Engineering & System Safety 91(9), 992–1007
15. Leidelmeijer, K., Marlet, G., Ponds, R., Schulenberg, R., Van Woerkens, C.:
Leefbaarometer 2.0: Instrumentontwikkeling. Research en Advies (2014)
16. Louppe, G., Wehenkel, L., Sutera, A., Geurts, P.: Understanding variable impor-
tances in forests of randomized trees. In: Advances in neural information processing
systems. pp. 431–439 (2013)
17. Palczewska, A., Palczewski, J., Robinson, R.M., Neagu, D.: Interpreting random
forest models using a feature contribution method pp. 112–119 (2013)
18. Pindoriya, N., Singh, S., Lee, K.Y.: A comprehensive survey on multi-objective
evolutionary optimization in power system applications. In: Power and Energy
Society General Meeting, 2010 IEEE. pp. 1–8. IEEE (2010)
19. Poplin, A.: Digital serious game for urban planning: "B3 - Design your Marketplace!".
Environment and Planning B: Planning and Design 41(3), 493–511 (2014)
20. Qu, B., Zhu, Y., Jiao, Y., Wu, M., Suganthan, P., Liang, J.: A survey on multi-
objective evolutionary algorithms for the solution of the environmental/economic
dispatch problems. Swarm and Evolutionary Computation 38, 1–11 (2018)
21. Sorzano, C.O.S., Vargas, J., Montano, A.P.: A survey of dimensionality reduction
techniques. arXiv preprint arXiv:1403.2877 (2014)
22. Srinivas, N., Deb, K.: Multiobjective optimization using nondominated sorting in
genetic algorithms. Evolutionary computation 2(3), 221–248 (1994)
23. Yapo, P.O., Gupta, H.V., Sorooshian, S.: Multi-objective global optimization for
hydrologic models. Journal of hydrology 204(1-4), 83–97 (1998)
24. Yen, G.G., He, Z.: Performance metric ensemble for multiobjective evolutionary al-
gorithms. IEEE Transactions on Evolutionary Computation 18(1), 131–144 (2014)
25. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto
evolutionary algorithm. TIK-report 103 (2001)
26. Zitzler, E., Thiele, L.: An evolutionary algorithm for multiobjective optimization:
The strength pareto approach. TIK-report 43 (1998)