Accepted Manuscript
Towards Unravelling the Relationship between On-Body,
Environmental and Emotion Data using Sensor Information Fusion
Approach
Eiman Kanjo, Eman M.G. Younis, Nasser Sherkat
PII: S1566-2535(17)30343-3
DOI: 10.1016/j.inffus.2017.05.005
Reference: INFFUS 876
To appear in: Information Fusion
Received date: 18 January 2017
Revised date: 13 April 2017
Accepted date: 28 May 2017
Please cite this article as: Eiman Kanjo, Eman M.G. Younis, Nasser Sherkat, Towards Unravelling
the Relationship between On-Body, Environmental and Emotion Data using Sensor Information Fusion
Approach, Information Fusion (2017), doi: 10.1016/j.inffus.2017.05.005
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights
- Collect heterogeneous and synchronised data from on-body and environmental sensors.
- Utilise a sensor-data-driven approach to study the relationship between the environment and the human body.
- Model the short-term impact of the ambient environment on the human body.
- Predict emotions based on on-body sensors and environmental data.
- Employ information fusion techniques at the data, feature and decision levels.
Towards Unravelling the Relationship between On-Body, Environmental and
Emotion Data using Sensor Information Fusion Approach
Eiman Kanjo
eiman.kanjo@ntu.ac.uk
Computing and Technology
Nottingham Trent University
Eman M.G. Younis
eman.younas@mu.edu.eg
Faculty of Computers and Information
Minia University, Egypt
Nasser Sherkat
Nasser.Sherkat@ntu.ac.uk
Computing and Technology
Nottingham Trent University
Abstract
Over the past few years, there has been a noticeable advancement in environmental models and information
fusion systems taking advantage of the recent developments in sensor and mobile technologies. However, little
attention has been paid so far to quantifying the relationship between environment changes and their impact on
our bodies in real-life settings.
In this paper, we propose a data-driven approach based on direct and continuous sensor data to assess the impact
of the surrounding environment on physiological changes and emotion.
We investigate the potential of fusing on-body physiological signals, environmental sensory data and online
self-reported emotion measures in order to achieve the following objectives: 1) model the short-term impact of
the ambient environment on the human body, and 2) predict emotions based on on-body sensors and
environmental data.
To achieve this, we conducted a real-world study ‘in the wild’ with on-body and mobile sensors. Data were
collected from participants walking around Nottingham city centre in order to develop analytical and predictive
models.
Multiple regression, after allowing for possible confounders, showed a noticeable correlation between noise
exposure and heart rate. Similarly, UV and environmental noise were shown to have a noticeable effect on
changes in electrodermal activity (EDA). Air pressure made the greatest contribution to the detected changes in
body temperature and motion. A significant correlation was also found between air pressure and heart rate.
Finally, decision fusion of the classification results from different modalities is performed. To the best of our
knowledge, this work presents the first attempt at fusing and modelling data from environmental and
physiological sources collected from sensors in a real-world setting.
Keywords: Multi-sensor data fusion; Regression analysis; Sensor data; Multivariable regression;
Affective computing; Physiological signals; Machine learning.
1 Introduction
Repeated exposure to environmental stressors (such as pollution, noise and crowded areas) causes
physical illnesses (e.g., headaches, fatigue, sleep disorders and heart disease) and behavioural
issues (e.g., stress, attention deficit, anger and depression) [1, 2, 3].
The effect of these stressors on health has been a focal point of health research. Models have been
widely used as indispensable tools to assess the effects of environmental factors on human health, in
particular to model the level of exposure to environmental pollutants [4], [5].
A decade-long study of 6.6 million people, recently published in The Lancet, found that one in 10
dementia-related deaths in people living within 50 metres of a busy road was attributable to fumes and
noise. There was a linear decline in deaths the further people lived from heavy traffic [6].
Additionally, Chen's group [6] noted that because air pollution exposure was estimated at the postal-code
level, it may not accurately reflect each individual's exposure. The study suggested that more research
is needed to understand this link, particularly into the effects of different aspects of traffic, such as
air pollutants and noise, at a finer level of granularity.
In general, epidemiological and statistical analyses are usually based on observed environmental
data, which have traditionally been obtained from governmental sources or from a number of
sporadically distributed sensing nodes. In both cases, these studies are evaluated against relatively
few directly measured data points [7].
In contrast, cheaper, more sensitive and more sophisticated sensors for gases, particulates, water
quality, noise and other environmental measurements have become more capable and more widely available,
enabling researchers to collect data in unprecedented spatial, temporal and contextual detail [7][8].
These sensors range from bespoke devices designed for specific applications to those found in
mainstream personal devices, such as smartphones. In some cases, people may act as environmental
sensors by reporting what they see, hear and feel, participating in citizen science of environmental
conditions [9]. By leveraging widely available wearable devices, communication and sensor
technologies, many new sensor systems are relatively low-cost compared with the technologies used in
established monitoring stations [10, 11].
Advances in data science and fusion techniques are critical to enable researchers to make the best use of
the vast amounts of additional, heterogeneous sensor data sources.
Despite the popularity of wearable sensors for emotion recognition, the problem of quantifying the
relationship between environmental variables, physiological body reactions and emotions has been
overlooked. In addition, the relationship between emotions and the other environmental and body
factors has been studied only qualitatively.
In this paper, we adopt a sensor-data-driven approach to understand the relationship of various
environmental measures with wellbeing and emotion. By unobtrusively collecting data from on-body
and environmental sensors, we can gain a better understanding of the association between
environmental factors and human health, including psychological changes, and of possible causal links.
This leads us to investigate the following research questions:
1. How can we model and fuse the relationship between on-body and environmental variables?
2. Can integrating multiple heterogeneous sensors improve our understanding of these
associations and of the environmental impact on human health?
3. How can information fusion best make use of on-body and environmental sensor data
to infer emotion?
Our approach to answering these questions is based on a two-phase information fusion framework,
which uses newly available heterogeneous sensors of multiple modalities as mobile interfaces. The
first phase studies the relationship between these data sources in a spatio-temporal context; the
second studies their relationship with emotion based on decision fusion.
To follow this approach, we collected data from forty subjects using on-body sensors ‘in the
wild’ around Nottingham city centre. The data collected include on-body data, such as body
movement, heart rate (HR), electrodermal activity (EDA) and body temperature, and environmental
data, including noise level (Env-Noise), air pressure and ambient light levels (UV), as
shown in Figure 1.
Figure 1. The relationship between different modalities, the environment, human body, Motion and
emotions data.
In addition, GPS data were collected to record user locations during data gathering. The different data
channels are collected, cleaned, aggregated and smoothed for each user, and user emotion labels are
collected through self-report input based on the 5-step SAM scale for valence adopted from [12].
The selection of sensors and data analysis techniques was designed from the ground up with emotion
inference in outdoor environments in mind.
We have adopted an information fusion approach to analyse and model the data, since this method
offers an effective solution to many of the issues found in analysing data from individual sensors.
Information fusion allows the integration of independent features and prior knowledge, provides a
better means of identifying specific aspects of the target application domain, and improves robustness
against interference from individual data sources [13].
For example, physiological data such as heart rate reveal the physical effort of an activity, but they
may be influenced by external factors such as environmental conditions or social interaction. Each of
these sources provides only partial information about the individual’s actual activity.
In this work, we use multi-sensor fusion to demonstrate the feasibility of capturing diverse,
multi-modal features in order to identify relationships, associations and causality, and to
formalize models describing people’s reactions and emotions.
Our data fusion approach is threefold: 1) data fusion, by collecting data from multiple sources
including HR, EDA, body temperature, movement and activity, environmental noise, location, air
pressure and UV; 2) feature fusion, by examining the relationship between our environmental
and physiological variables based on exploratory statistics and multivariate regression modelling,
and by looking at variable importance and variation; and 3) decision fusion, by combining multiple
classifiers from different modalities for emotion prediction.
The rest of the paper is structured as follows. Section 2 discusses related work, focusing on previous
efforts to quantify environmental health impacts, along with a brief review of on-body sensors and
related information fusion techniques. Section 3 covers the methodology, including the user study,
the system architecture of the proposed method, initial data processing and descriptive statistics.
Section 4 introduces multivariate regression and its mathematical formulation. Section 5 reports the
results of the multimodal analysis and emotion prediction based on decision fusion. These are
followed by the discussion and conclusion sections.
2 Related Work
2.1 Quantitative assessment of environmental health impacts
Human exposure to environmental pollutants such as noise, air pollution, traffic or even crowded
areas can cause severe health problems, ranging from headaches and sleep disturbance to heart
disease [1, 2].
The relationship between the human body and environmental factors has been extensively studied in the
social and environmental sciences, psychology and environmental health literature [3, 14]. The World Health Organization (WHO)
defines the “Environmental Burden of Disease” [15] as one methodology for quantitatively assessing
environmental health impacts at the population level in terms of deaths, Disability-Adjusted Life
Years (DALYs) or, occasionally, the number of cases. Other, indirect measures, such as the number of
hospital admissions, can also be used to estimate health impacts.
According to the WHO, quantitative assessments of health impacts are based on combining exposure data
with exposure-response information. Such assessments require (i) the compilation of exposure data,
(ii) a systematic review of evidence from epidemiology and other scientific disciplines concerning the
association between environmental factors and human health, and (iii) the combination of exposure
data with exposure-response information.
For example, a recent large population-based cohort study found that living close to heavy traffic
was associated with a higher incidence of dementia, but not of Parkinson's disease or multiple
sclerosis [6].
In addition, the negative effects of noise on human health are discussed extensively in the literature,
including sleep disorders, heart problems, vision problems and many more [14].
Similarly, many previous medical studies have confirmed that changes in temperature and humidity, and
weather events such as storms, can trigger asthma attacks [18].
A criticism of statistical epidemiological models is their focus on identifying associations, while
causality remains difficult to assess, even though many information-theoretic and physics-based
models have recently been developed to dissect spatio-temporal correlations in time series more
deeply than traditional statistical models can [7].
For example, the average environmental exposure across regions rarely reveals the specific health
problems people face in any given location. Most people live in urban areas, where they walk along
city streets and get around by car, train or bus. Therefore, in order to learn more about the impact
of their immediate environment, there is a need to monitor people while they carry out their daily
activities.
For example, cyclists riding behind buses might be exposed to more pollution in half an hour than
other people are in an entire month. There is a need to monitor and assess people’s exposure and its
health impact in the short term and at a fine spatial scale.
Most related traditional statistical models do not take advantage of the availability and
affordability of modern on-body and environmental sensors, which make it possible to collect
accurate environmental and health data for analytics and modelling.
The increasing pervasiveness of wearable and sensor devices has created new opportunities for
sensing people’s activities in physical spaces. These new, fine-grained data sources enable more
accurate estimation of human exposure to environmental conditions and quantification of
health-related responses that may be associated with such exposure.
Some attempts have been made to use data-driven approaches to characterise the impact of the
environment on health. For example, mobile phone data have been used to parameterize population
movement networks in order to model the spread of malaria [16] [17].
Recently, combining data from personal monitoring devices with air pollution models has improved the
characterization of air pollution exposure [19, 20, 21], and other studies have employed energy
expenditure sensors to improve exposure prediction [22]. Besides health impacts, emotions and
physiological changes have also started to attract attention as direct influences on wellbeing. Kööts et
al. [23] studied the relationship between positive and negative emotions and environmental
changes such as temperature. Park and Farr [24] studied the relationship between lighting and
emotions in a retail business environment. Gravina and Fortino have developed a novel algorithm
designed to detect the Cardiac Defense Response (CDR) by analyzing the electrocardiogram (ECG) signal [13].
This approach helps in detecting preceding negative emotional states, including fear, chronic worry and
panic.
In addition, many research projects have studied emotion and its relationship with health and
physiological changes [25-36]; however, none of them has considered integrating physiological and
health sensors with environmental sensors in order to model and predict emotions.
In this work, we present an emotion analytical model that combines environmental and physiological
measures. We also study the relationship between environmental and health variables based on sensor
data collected from forty participants walking along the same urban route in Nottingham.
Environmental and physiological data are collected simultaneously, along with spatial and temporal
information, in order to understand at a fine scale the relationship between these parameters and
emotion.
2.2 On-Body Sensors
Measuring physiological signals requires body-worn sensors. In the past, wearable sensors were
intrusive and uncomfortable to use in real-world experiments. Nowadays, with advances in wearable
sensors and mobile technologies, these sensors have become non-invasive and comfortable for users,
and wristbands equipped with many built-in sensors are widely available. Table 1 presents a list of
on-body sensors that have been used for emotion detection:
Table 1: List of physiological sensors and signals widely used for emotion detection.
| Sensor | Signals and characteristics |
|---|---|
| Heart Rate | The signal shows changes in heartbeat over time. The interval between two consecutive pulse peaks is called the RR interval. Heart rate has been widely used in emotion recognition studies such as [37-42] to measure health and emotions. |
| Body Temperature | Although the temperature signal is very simple, it can be used as an indicator of a person’s emotions and mood changes [37, 33]. [40] showed that nervous system activity can be detected through changes in skin temperature, known as Temperature Variability (TV). |
| Breathing | Widely used to measure breathing rate and breathing patterns; it has been shown to be correlated with heart rate and with a person’s emotions [28, 30, 36]. |
| Motion | Modern accelerometers include tri-axial micro-electro-mechanical systems (MEMS) for three-dimensional acceleration measurement with sub-second time resolution. For analysis, these measurements are usually converted into a uni-axial representation measuring cumulative activity over a certain period. For simplicity, motion can be represented by the magnitude of the three components, $\sqrt{X^2 + Y^2 + Z^2}$. Accelerometers are now embedded in almost all mobile phones and have recently been used for emotion recognition [35]. |
| Electrodermal Activity (EDA) | Also known as Galvanic Skin Resistance (GSR); it has shown high correlation with emotions and is used for stress detection [29, 30, 33, 34, 37, 39, 42]. |
| EEG Headsets | EEG devices are normally used to measure the electrical activity of the brain. They have been used to measure emotions and attention [25, 41, 42]. |
| Muscle contraction (EMG) | EMG measures the electrical pulses resulting from muscle contraction. It has proved effective in detecting arousal [30, 34, 36, 38, 39, 41, 42]. |
| Blood Volume Pulse (BVP) | BVP has been used for emotion recognition, always combined with one or more of the sensors above [30, 39, 36]. |
In addition to the above sensors, many wristbands and wearable devices now offer a wide range of
sensors that are not restricted to health or body statistics. For example, pollution sensors, weather
stations and other environmental sensors, such as light and colour sensors, are widely available in
different shapes and styles [8][9][19].
Many researchers have started to look at different ways of programming and managing these sensors,
and of fusing their data using computational methods such as information fusion [13].
2.3 Information fusion
Information fusion is the merging of information from heterogeneous sources with differing
conceptual, contextual and typographical representations. It can be performed at three levels:
First, “Data Level” fusion aims at collecting different data elements from different sensors that
complement each other. It can be performed during data collection, for example to fuse external
data sources such as users’ self-reported emotions [13,42].
Second, “Feature Level” fusion is performed during data analysis to find the best set of
features for classification. Feature-level fusion was used in [36] to find the best combination
of features from EMG, RSP, SC and ECG signals for emotion recognition.
Third, “Decision Level” fusion aims to combine the results of multiple techniques to improve
decision making. A recent review of data fusion techniques and applications in body sensor
networks can be found in [13]. Granero et al. [42] used feature-level fusion to classify emotions
and showed that the ECG and EDA signals are the most significant signals in emotion
classification.
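The decision level can be made concrete with a small example. The paper's own pipeline combines per-modality classifiers with a stacking model; the sketch below instead uses weighted soft voting, a simpler decision-level fusion rule swapped in purely for illustration. The modality names, probabilities and weights are hypothetical, and the five classes are assumed to correspond to the 5-step SAM valence labels.

```python
from typing import Dict, List, Optional


def fuse_decisions(prob_by_modality: Dict[str, List[float]],
                   weights: Optional[Dict[str, float]] = None) -> int:
    """Decision-level fusion by weighted soft voting (illustrative, not the paper's stacking).

    Each entry maps a hypothetical modality name to the class-probability vector
    produced by that modality's own classifier for the five valence levels.
    Returns the fused valence label in the range 1..5.
    """
    if not prob_by_modality:
        raise ValueError("need at least one modality")
    if weights is None:
        weights = {m: 1.0 for m in prob_by_modality}  # equal votes
    n_classes = len(next(iter(prob_by_modality.values())))
    total = sum(weights[m] for m in prob_by_modality)
    fused = [0.0] * n_classes
    for modality, probs in prob_by_modality.items():
        if len(probs) != n_classes:
            raise ValueError(f"{modality}: expected {n_classes} probabilities")
        for k, p in enumerate(probs):
            fused[k] += weights[modality] * p / total  # weighted average of probabilities
    best = max(range(n_classes), key=lambda k: fused[k])
    return best + 1  # 0-based class index -> valence label 1..5


# Hypothetical outputs of three per-modality classifiers for one time window.
example = {
    "on_body":       [0.05, 0.10, 0.20, 0.40, 0.25],
    "environmental": [0.10, 0.30, 0.30, 0.20, 0.10],
    "motion":        [0.05, 0.15, 0.25, 0.35, 0.20],
}
print(fuse_decisions(example))  # 4
print(fuse_decisions(example, {"on_body": 1.0, "environmental": 4.0, "motion": 1.0}))  # 3
```

Soft voting needs no extra training; stacking replaces the fixed weights with a meta-classifier learned from the base classifiers' outputs, which is why it requires training examples of its own.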
3 Methodology
3.1 Data Collection
The data collection setup is depicted in Figure 2. In this process, we gather various sensor and
self-report data using a smartphone application named “EnvBodySens” and a Microsoft Band 2 wristband.
Collected data are then logged and stamped with the time and date. The application also records the
data shown in Table 2.
Table 2. List of the collected data.
| Microsoft Band 2 | Android Phone 6 |
|---|---|
| Heart Rate (HR) | Environmental Noise (Env-Noise) |
| Electrodermal Activity (EDA) | GPS Location |
| Body Temperature (Body-Temp) | Self-Report of Emotion (1 to 5) |
| Hand Acceleration (Motion) | Air Pressure |
| Light (UV) | |
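Table 1 describes collapsing tri-axial acceleration into a single motion value, and Table 2 lists hand acceleration as one Motion channel. The following is a minimal sketch, assuming the Euclidean magnitude $\sqrt{X^2 + Y^2 + Z^2}$ per sample and a hypothetical fixed window length for cumulative activity; the paper does not state the exact aggregation it used.

```python
import math
from typing import Iterable, List, Tuple


def motion_magnitude(samples: Iterable[Tuple[float, float, float]]) -> List[float]:
    """Collapse tri-axial accelerometer samples (X, Y, Z) into one magnitude each."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def cumulative_activity(magnitudes: List[float], window: int) -> List[float]:
    """Sum magnitudes over consecutive, non-overlapping windows of `window` samples.

    The last window is shorter if the series length is not a multiple of `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(magnitudes[i:i + window]) for i in range(0, len(magnitudes), window)]


readings = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0), (1.0, 2.0, 2.0)]  # hypothetical X, Y, Z samples
mags = motion_magnitude(readings)
print(mags)                          # [1.0, 5.0, 3.0]
print(cumulative_activity(mags, 2))  # [6.0, 3.0]
```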
In the EnvBodySens application, an interface is implemented for continuous and quick labelling of
user emotions while walking and collecting data. When the user launches the application, a mobile
interface appears with five iconic facial expressions ranging from very negative to very positive. A
screen capture in Figure 2 (left) shows the five emotion buttons.
Users were asked to continuously select one of the affective categories (in the form of buttons) as they
walked around the city centre. We disabled the screen auto-sleep mode on our mobile devices, so the
screen was kept on during the data collection process. We adopted the 5-step SAM scale for valence
taken from [10] to simplify the continuous labelling process.
The valence dimension fits well into our experimental setup, since it describes the positive or negative
feeling caused by a situation, an object or an event.
Figure 2. (left) Screenshot of EnvBodySens application, (right) Data Collecting process.
The study was launched in July 2016. A call for participation in the study was advertised on various
mailing lists. Forty participants, all female with an average age of 28, took part in the study. The study
was approved by Nottingham Trent University’s Ethics Committee. We chose to recruit only female
participants in order to rule out other factors (i.e. confounders) related to gender.
Participants were scheduled to take part in the study in order to collect data around Nottingham city
centre. They were asked to meet a trained researcher in a low-stress environment (a café), where they
were given light refreshments while the experiment was set up. The researcher provided them with
details of the study protocol, obtained informed consent, and equipped them with the study equipment.
The study was carried out over a number of days in order to find convenient times for the participants
and to allow for the limited number of devices available. Information and study details were sent to
the participants ahead of the data collection session. The participants were asked to spend no more
than 45 minutes collecting data.
The reasons for limiting the journeys to 45 minutes are as follows: 1) the shopping route is relatively
narrow and can be walked within this time frame; 2) users in previous experiments found it hard to
walk for longer. Based on previous experience, we have found it difficult to motivate participants to
walk longer [53,54]. Additionally, we plan to carry out further analysis on the data that requires a
fixed route with a predefined time frame.
Data were collected in similar weather conditions (average 20 °C), at around 11 am. During the data
collection process, 550,432 data lines were collected, as well as 5,345 self-report responses.
3.2 System Architecture
Our approach consists of real-time collection and offline analysis of the sensor data using
information fusion techniques at all three fusion levels. The architecture is composed of a number
of processing blocks, as depicted in Figure 3. First, the data are collected using on-body sensors and
then fused with other contextual user data, such as location, noise and emotional state (self-report);
this is the data-level fusion. Second, the collected data are cleaned and pre-processed. Third,
statistical correlation, covariance and multiple regression analyses are performed. Fourth, the emotion
predictive model is created by extracting features from the sensor data and using feature selection for
decision fusion. Finally, the stacking model is trained on training examples and then tested on unseen
data for evaluation.
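Data-level fusion requires putting the wristband and phone streams on a common time base. The paper does not describe how the streams were aligned; the sketch below assumes nearest-timestamp matching within a tolerance, with timestamps in seconds. The stream contents, sampling rates and the 0.5 s tolerance are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def align_nearest(base: List[Sample], other: List[Sample],
                  tolerance: float) -> List[Tuple[float, float, Optional[float]]]:
    """Attach to each base sample the `other` value with the nearest timestamp.

    Both streams must be sorted by timestamp. If no `other` sample lies within
    `tolerance` seconds of a base sample, None is attached instead.
    """
    other_times = [t for t, _ in other]
    fused = []
    for t, value in base:
        i = bisect.bisect_left(other_times, t)
        best = None  # (gap, value) of the closest acceptable candidate
        # Only the neighbours just before and at/after t can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(other):
                gap = abs(other_times[j] - t)
                if gap <= tolerance and (best is None or gap < best[0]):
                    best = (gap, other[j][1])
        fused.append((t, value, None if best is None else best[1]))
    return fused


hr = [(0.0, 78.0), (1.0, 79.0), (2.0, 81.0), (10.0, 80.0)]  # hypothetical band HR stream
noise = [(0.2, 53.0), (1.9, 55.0)]                          # hypothetical phone noise stream
print(align_nearest(hr, noise, tolerance=0.5))
# [(0.0, 78.0, 53.0), (1.0, 79.0, None), (2.0, 81.0, 55.0), (10.0, 80.0, None)]
```

Leaving unmatched samples as None, rather than interpolating, keeps gaps visible for the later cleaning step; interpolation would be an equally plausible design choice.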
Figure 3. Overall block scheme of the proposed Information Fusion system (two-phase framework).
[Figure blocks: On-Body Data Acquisition; Data Acquisition from the Smart Phone (Self-report labels, Location, Environmental Noise); Data Level Fusion; Data Cleaning, Pre-processing and Segmentation; Statistical Correlation and Covariance Analysis; Multivariable Data Analysis (Multiple Regression Analysis, Variable Importance and PCA); Predictive Model (Feature Extraction and Selection, Stacking Model Training, Model Testing); Model Performance Evaluation.]
3.3 Data Pre-Processing
After data acquisition, the signals are pre-processed and cleaned. The first and the last 30 seconds
of each user’s dataset were then removed, because participants needed a few seconds to get fully into
the movement at the start and a few seconds to stop the data collection at the end of the experiment.
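The trimming step can be sketched as follows, assuming each user's samples are (timestamp in seconds, value) pairs sorted by time; the sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def trim_edges(samples: List[Sample], seconds: float = 30.0) -> List[Sample]:
    """Drop samples recorded in the first and last `seconds` of one user's session.

    Assumes `samples` is sorted by timestamp. A session shorter than
    2 * `seconds` yields an empty list.
    """
    if not samples:
        return []
    start, end = samples[0][0], samples[-1][0]
    return [(t, v) for t, v in samples if start + seconds <= t <= end - seconds]


session = [(float(t), 70.0 + t / 10) for t in range(0, 101, 10)]  # hypothetical HR every 10 s
print([t for t, _ in trim_edges(session)])  # [30.0, 40.0, 50.0, 60.0, 70.0]
```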
Data from six users were excluded due to logging problems; for example, one user was not able to
collect data due to a battery problem with the mobile phone, and another switched the application off
accidentally.
We produced lagged Poincaré plots for each individual data subset in order to remove those with
abnormal heart rate patterns. The Poincaré plot is a visual tool that uses the ratio between the standard
descriptors of short-term (SD1) and long-term (SD2) variability of RR intervals to assess the health of
the heart. It has been found that the characteristic shape of the RR-interval plot is not an artefact or a
mere placement of points, but reflects a specific temporal correlation between successive RR intervals,
and hence relates closely to the natural rhythm of the heart as a response to the many complex
closed-loop systems controlling it [43]. Given a time series of the form $x_t, x_{t+1}, x_{t+2}, \ldots$, a return
map in its simplest form first plots $(x_t, x_{t+1})$, then $(x_{t+1}, x_{t+2})$, then $(x_{t+2}, x_{t+3})$, and so on.
For a healthy person, the RR-interval distribution shows an elliptical pattern and the SD1/SD2 ratio
should be higher. Conversely, for a subject with an impaired heart or reduced HRV, the distribution is
non-elliptical and the ratio is much lower [67]. Typical cases of impaired and normal subjects are
shown in Figure 4.
Figure 4. Abnormal (left) and normal (right) Poincaré plots.
HR data from our users have been checked using Poincaré plots. All our participants have normal patterns
similar to Figure 4 (right).
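The return map and the SD1/SD2 descriptors can be computed directly from a sequence of RR intervals. This is a minimal sketch using the standard geometric definitions (spread of the lag-1 points perpendicular to and along the line of identity) with population standard deviations; the paper does not say which variant or software it used, and the RR values below are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def return_map(rr: Sequence[float]) -> List[Tuple[float, float]]:
    """Lag-1 Poincaré (return map) points (x_t, x_{t+1}) for consecutive RR intervals."""
    return list(zip(rr[:-1], rr[1:]))


def poincare_sd(rr: Sequence[float]) -> Tuple[float, float]:
    """Return (SD1, SD2) of the lag-1 return map.

    SD1: spread perpendicular to the identity line (short-term variability).
    SD2: spread along the identity line (long-term variability).
    """
    points = return_map(rr)
    if len(points) < 2:
        raise ValueError("need at least three RR intervals")
    diffs = [b - a for a, b in points]  # sqrt(2) * distance from the identity line
    sums = [a + b for a, b in points]   # sqrt(2) * position along the identity line
    sd1 = math.sqrt(statistics.pvariance(diffs) / 2)
    sd2 = math.sqrt(statistics.pvariance(sums) / 2)
    return sd1, sd2


rr_ms = [800, 810, 790, 805, 795]  # hypothetical RR intervals in milliseconds
sd1, sd2 = poincare_sd(rr_ms)
print(sd1, sd2, sd1 / sd2)
```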
3.4 Statistical Analysis
Various statistical methods, including descriptive statistics, covariance and correlation matrices, and
Principal Component Analysis (PCA) maps, have been used to identify the variables to be included in the
multiple regression analyses. Table 3 shows descriptive statistics of the data for all subjects.
This includes the mean (μ), standard deviation (std), median, minimum (min), 1st quartile (25%), 2nd
quartile (50%), 3rd quartile (75%), maximum (100%), and the skewness and kurtosis of the various
body and environment sensor signals, where N = 472,904 samples (after data cleaning).
Table 3. Descriptive summary statistics for the collected signals.
| Signal | μ | std | Median | Min | 1st Qu. | 2nd Qu. | 3rd Qu. | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| Air-Pressure | 1014 | 5.41 | 1014 | 1002 | 1012 | 1014.0 | 1019 | 1020 | -0.882 | 2.81 |
| EDA | 1454 | 348.9 | 343 | 15 | 185 | 347 | 952 | 2903 | 9.44 | 90.8 |
| Env-Noise | 54.20 | 3.786 | 53 | 20 | 52 | 53 | 55 | 95 | -0.33 | 14.4 |
| HR | 80.24 | 11.75 | 77.00 | 49 | 73 | 77 | 83 | 189 | 2.91 | 18.02 |
| UV | 795.9 | 2646.06 | 131.0 | 0.0 | 47 | 131 | 418 | 62359 | 9.9 | 137.8 |
| X | -0.15 | 0.662 | -0.081 | -4.27 | -0.82 | -0.08 | 0.28 | 2.27 | 0.09 | 2.16 |
| Y | -0.01 | 0.625 | 0.018 | -2.77 | -0.61 | 0.018 | 0.515 | 3.93 | -0.03 | 1.84 |
| Z | 0.01 | 0.47 | 0.12744 | -1.92 | -0.39 | 0.127 | 0.348 | 3.86 | -0.27 | 2.26 |
| Body-Temp | 28.93 | 1.62 | 28.93 | 24.67 | 27.7 | 28.8 | 29.89 | 33.8 | 0.49 | 3.46 |
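The columns of Table 3 can be reproduced per signal with Python's standard library. The paper does not state its conventions, so this sketch assumes a sample standard deviation, the default quartile method of `statistics.quantiles`, moment-based skewness, and Pearson (non-excess) kurtosis, under which a normal distribution scores 3. Other tools (for example, those reporting excess kurtosis) would give different kurtosis values for the same data.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one signal, in the column order of Table 3."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    n = len(values)
    mean = statistics.fmean(values)
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    m2 = sum((v - mean) ** 2 for v in values) / n   # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    return {
        "mean": mean,
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "median": statistics.median(values),
        "min": min(values),
        "q1": q1, "q2": q2, "q3": q3,
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,         # Pearson kurtosis (normal -> 3)
    }


print(describe([1, 2, 3, 4, 5]))
```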
The correlation matrix in Figure 5 shows a low level of correlation between the independent variables,
which suggests that our model will not be affected by multicollinearity; the absence of multicollinearity
is a basic precondition for applying multiple linear regression analysis.
Figure 5. Correlation Matrix of the independent variables.
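Pairwise correlations such as those in Figure 5 are a first check, but they cannot detect a predictor that is close to a linear combination of several others. A common complementary check, not one reported in the paper, is the variance inflation factor (VIF). The sketch assumes a NumPy array with one column per independent variable and uses synthetic data in place of the study's measurements.

```python
import numpy as np
from typing import Dict, Sequence


def variance_inflation(X: np.ndarray, names: Sequence[str]) -> Dict[str, float]:
    """Variance inflation factor (VIF) for each column of X (rows = samples).

    Uses the identity VIF_j = [R^-1]_jj, where R is the Pearson correlation
    matrix of the columns. VIF_j = 1 means column j is linearly unrelated to
    the other columns; large values flag multicollinearity.
    """
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return dict(zip(names, vif.tolist()))


rng = np.random.default_rng(0)
env = rng.normal(size=(1000, 3))  # synthetic stand-ins for three environmental predictors
print(variance_inflation(env, ["Env-Noise", "Air-Pressure", "UV"]))
```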
Figure 6 shows the PCA map for all the variables. The first principal component has positive
coefficients for all the on-body measurements (HR, EDA, Motion and Body-Temp); the HR, EDA and
Body-Temp vectors point into the top-right quadrant of the plot, while the environmental measurements
(Env-Noise, Air-Pressure and UV), together with Motion, lie in the lower half of the plot.
This motivates further investigation of the relationship between the two groups and of their relation
to human emotions.
Figure 6. PCA plot for all the on-body and environmental measurements.
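A PCA map such as Figure 6 plots each variable's coefficients (loadings) on the first two principal components. A minimal NumPy sketch, assuming the variables are standardised before PCA (the paper does not say whether they were). Because the sign of each component is arbitrary, statements such as "positive coefficients on the first component" depend on an orientation convention; the one used here is an assumption. The data are synthetic.

```python
import numpy as np


def pca_loadings(X: np.ndarray, n_components: int = 2):
    """PCA of standardised columns via SVD.

    Returns (loadings, explained_ratio): loadings[i, k] is the coefficient of
    variable i on component k. SVD signs are arbitrary, so each component is
    flipped to make its largest-magnitude coefficient positive.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    comps = vt[:n_components]
    idx = np.argmax(np.abs(comps), axis=1)
    signs = np.sign(comps[np.arange(len(comps)), idx])
    comps = comps * signs[:, None]
    explained = s ** 2 / np.sum(s ** 2)  # share of total variance per component
    return comps.T, explained[:n_components]


rng = np.random.default_rng(42)
a = rng.normal(size=1000)
# Synthetic data: two strongly related variables and one unrelated variable.
X = np.column_stack([a, a + 0.1 * rng.normal(size=1000), rng.normal(size=1000)])
loadings, ratio = pca_loadings(X)
print(np.round(loadings, 2))
print(np.round(ratio, 2))
```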
Based on the covariance matrix, a positive covariance means that the two variables tend to increase
together. Conversely, a negative covariance means that one variable tends to decrease as the other
increases. A covariance of zero means that there is no linear relationship between the two
variables. Table 4 shows that air pressure has a negative relation with all the other factors,
whereas EDA is negatively related with UV and Body-Temp and positively related with
Env-Noise, HR and Motion. In addition, HR is negatively related with UV and Body-Temp.
Body-Temp is negatively related to EDA and positively related with Env-Noise, HR, UV and
Motion. Finally, Motion is negatively related with air pressure and positively related with all the
other variables.
Table 4. The covariance matrix for all the environmental and body measurements

| | Air Pressure | EDA | Env-Noise | HR | UV | Motion | Body-Temp |
|---|---|---|---|---|---|---|---|
| Air Pressure | 28.784 | -2.674 | -4.073 | -2.300 | -133.653 | -0.0165 | -4.073 |
| EDA | -267.481 | 8.239 | 319.497 | 3635.712 | -1952.3 | 0.4607 | -77.08 |
| Env-Noise | -4.073 | 3.194 | 14.386 | 3.166 | 574.733 | 0.0214 | 0.869 |
| HR | -2.300 | 3.635 | 3.166 | 138.587 | 414.358 | 0.0303 | 0.764 |
| UV | -133.65 | -1.952 | 574.73 | -133.65 | 7070.60 | 0.0303 | 38.769 |
| Motion | -0.0165 | 4.607 | 0.0214 | -0.0165 | 37.143 | 37.143 | 0.0025 |
| Body-Temp | -4.073 | -7.708 | 0.869 | -4.073 | 38.769 | 0.002 | 2.51 |
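Because covariances carry the units of both variables, their magnitudes in Table 4 are not comparable across pairs (UV, for example, has a far larger variance than Motion); only their signs can be read directly. As a minimal illustration, assuming the measurements are held in a NumPy array with one column per variable (the synthetic data and the helper name `cov_to_corr` are hypothetical), a covariance matrix can be rescaled into a unitless correlation matrix of the kind shown in Figure 5:

```python
import numpy as np


def cov_to_corr(cov: np.ndarray) -> np.ndarray:
    """Rescale a covariance matrix into a correlation matrix.

    r_ij = c_ij / (s_i * s_j), where s_i is the standard deviation of variable i.
    Dividing by positive standard deviations never changes the sign of an entry.
    """
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical synthetic data (not the study's measurements):
# column 1 rises with column 0 on a much larger scale, column 2 falls with it.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
data = np.column_stack([
    x,
    100.0 * x + rng.normal(size=1000),
    -x + rng.normal(size=1000),
])

cov = np.cov(data, rowvar=False)  # rows are observations, columns are variables
corr = cov_to_corr(cov)
```

With `rowvar=False`, `np.cov` treats each column as a variable; `np.corrcoef(data, rowvar=False)` returns the same correlation matrix directly.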
Based on the above analysis, we included all the independent variables for the Multivariate
Regression analysis in the next section.
EDA and HR have the highest positive correlations with the affect labels, as shown in Figure 7.
Figure 7. 3D scatter plot shows how both EDA and HR correlate positively with the label (affect state).
4 Results
4.1 Multivariate Regression Analysis and Variable Importance
Having examined our variables closely, and in order to provide an analytical model, we employ multivariate (and multivariable) analysis to study the dependency between the variables of two different modalities using multivariate regression and Principal Component Analysis (PCA). Statistically speaking, multivariate analysis refers to statistical models that have two or more dependent (outcome) variables [44], whereas multivariable analysis refers to statistical models in which there are multiple independent (predictor) variables.
The analysis is performed in two main steps. First, we study the relationship between each dependent variable individually (i.e. the body responses) and all the independent variables (i.e. the environmental variables). Second, we determine the relative importance of the independent variables (i.e. regressors) on each of the dependent variables and apply PCA to the residuals of each dependent variable's regression model [45].
Multiple linear regression analysis was conducted separately for each dependent variable representing the body measurements, in relation to all the independent variables, which are the environmental measurements. The aim of this step is to determine which body variable can best be predicted using the environmental measurements as independent variables.
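Let $z_1, z_2, z_3$ be a set of $r = 3$ independent variables (Env-Noise, Air Pressure and UV) believed to be related to a dependent variable $Y$.
The linear regression model for the $j$th sample unit has the form:
$$Y_j = \beta_0 + \beta_1 z_{j1} + \beta_2 z_{j2} + \beta_3 z_{j3} + \varepsilon_j \qquad (1)$$
where $\varepsilon_j$ is a random error and $\beta_i$, $i = 0, 1, 2, 3$, are unknown (and fixed) regression coefficients. $Y$ stands in turn for each of the four dependent variables (HR, EDA, Body-Temp and Motion), with a separate model fitted for each. $\beta_0$ is the intercept, and we sometimes write $\beta_0 z_{j0}$, where $z_{j0} = 1$ for all $j$.
We assume that:
$$E(\varepsilon_j) = 0, \quad \operatorname{Var}(\varepsilon_j) = \sigma^2, \quad \operatorname{Cov}(\varepsilon_j, \varepsilon_k) = 0, \; j \neq k. \qquad (2)$$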
We then calculate the residual $e$ of the model, which is the difference between the observed value of the dependent variable $y$ and the estimated value $\hat{y}$. Each data point has one residual, expressed as follows:
$$e = y - \hat{y} \qquad (3)$$
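The coefficient estimates, standard errors, t-values and p-values reported in the regression tables below, as well as the residuals of equation (3), follow from ordinary least squares under model (1). The NumPy/SciPy sketch below shows how these quantities can be computed; it is illustrative rather than the authors' implementation, and the function name `fit_ols` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_ols(Z: np.ndarray, y: np.ndarray):
    """Least-squares fit of model (1): y_j = b0 + b1*z_j1 + ... + br*z_jr + e_j.

    Returns coefficients, standard errors, t-values, two-sided p-values,
    residuals e = y - y_hat (equation (3)) and R^2.
    """
    n, r = Z.shape
    X = np.column_stack([np.ones(n), Z])        # z_j0 = 1 carries the intercept b0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    e = y - y_hat
    dof = n - (r + 1)
    sigma2 = (e @ e) / dof                      # unbiased estimate of sigma^2 in (2)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2.0 * stats.t.sf(np.abs(t), dof)        # two-sided test of b_i = 0
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return beta, se, t, p, e, r2


# Hypothetical example: three synthetic "environmental" predictors and one response.
rng = np.random.default_rng(42)
Z = rng.normal(size=(500, 3))
y = 2.0 + 1.5 * Z[:, 0] - 0.5 * Z[:, 1] + 0.1 * rng.normal(size=500)
beta, se, t, p, e, r2 = fit_ols(Z, y)
```

Because the design matrix includes the constant column, the residuals sum to zero up to rounding, and `r2` is the usual coefficient of determination.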
Multiple Regression Model for Heart Rate
The following discussion presents the multiple linear regression model of HR using the environmental factors (Env-Noise, Air Pressure and UV) as independent variables. Table 5 shows the multiple regression results for HR:
Table 5. Multiple regression analysis between HR (dependent variable) and relevant independent variables.

| Independent Variable | Regression Coefficient (β) | Std. Error | t-value | P-value |
|---|---|---|---|---|
| Intercept | 211 | 3.86 | 54.7 | <2e-16*** |
| Air-Pressure | -0.0115 | 0.0037 | -30.9 | <2e-16*** |
| Env-Noise | -0.255 | 0.0053 | -48.26 | <2e-16*** |
| UV | 0.000077 | 0.0007 | 10.4 | <2e-16*** |
The multiple linear regression model for the heart rate is then formulated using the following equation:
$$\text{HR} = 211 - 0.0115\,\text{Air-Pressure} - 0.255\,\text{Env-Noise} + 0.000077\,\text{UV} \qquad (4)$$
The model above was evaluated using the diagnostic regression plots in Figure 8, which show the fitted values against the model residuals (i.e. goodness of fit) and the normal Q-Q plot of the residuals. The model is statistically significant (p < 0.001).
Figure 8. HR Diagnostic regression curves: (Left) represents residuals curve, and (Right) represents the Q-Q curve.
The Q-Q plot on the right of Figure 8 shows that the data exhibit a pronounced bimodal distribution, which can also be seen clearly in the residual plot on the left. Normal Q-Q plots constructed from bimodal data typically exhibit a 'twist' like the one seen in this plot, which explains why the upper part of the plot deviates from the baseline. The lower portion of the Q-Q plot is almost linear, suggesting an approximately normal distribution corresponding to one mode of the data distribution. Similarly, the upper portion of the Q-Q plot is again roughly linear, but with a very different intercept that corresponds to the larger mean of the data distribution (i.e. the duration of the small peaks in environmental changes). To connect these two roughly linear local segments, the curve must exhibit a rapid transition region between them (i.e. the duration of the large peaks in environmental changes). By the same reasoning, more general multimodal distributions will exhibit more than one such 'twist' in their Q-Q plots.
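The 'twist' can be reproduced with synthetic data. The sketch below assumes a two-component normal mixture as a stand-in for bimodal residuals (not the study's data) and compares the straightness of the normal Q-Q plot and its local slope in the centre and in the lower tail; the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def qq_linearity(x: np.ndarray) -> float:
    """Correlation r of the normal Q-Q plot (1.0 means perfectly straight)."""
    (_osm, _osr), (_slope, _intercept, r) = stats.probplot(x, dist="norm")
    return r


def qq_local_slope(x: np.ndarray, p_lo: float, p_hi: float) -> float:
    """Slope of the Q-Q curve between two probability levels."""
    sample_q = np.quantile(x, [p_lo, p_hi])
    normal_q = stats.norm.ppf([p_lo, p_hi])
    return (sample_q[1] - sample_q[0]) / (normal_q[1] - normal_q[0])


# Hypothetical residuals: one normal sample and a two-component normal mixture.
rng = np.random.default_rng(1)
unimodal = rng.normal(0.0, 1.0, 2000)
bimodal = np.concatenate([rng.normal(-3.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])

# For the mixture, the Q-Q curve is shallow within each mode and steep in the
# transition region between the modes, which produces the 'twist'.
centre = qq_local_slope(bimodal, 0.4, 0.6)
tail = qq_local_slope(bimodal, 0.05, 0.2)
```

A steep central segment flanked by two shallower, roughly linear segments is the numerical signature of the transition region described above.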
Variable importance for HR
Variable importance measures produce a predictor ranking (also known as variable importance) based on the contribution that predictors make to the construction and variability of the model [46]. Table 6 shows the four HR variable importance metrics for each of the independent variables.
Table 6. Variable importance for HR

| Regressor/Metric | lmg | last | first | pratt |
|---|---|---|---|---|
| Air Pressure | 0.2496 | 0.2818 | 0.0867178 | 0.2031 |
| Env-Noise | 0.72066 | 0.6861 | 0.7700 | 0.73090 |
| UV | 0.0489202 | 0.0526879 | 0.0444603 | 0.0479666 |
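The metric names lmg, last, first and pratt match those of the R package relaimpo. As a rough guide: first is the R² obtained from a regressor on its own; last is the increase in R² when the regressor is added last; lmg averages the increase in R² over all orderings in which the regressors can enter the model; and pratt multiplies the standardised coefficient by the regressor's marginal correlation with the response. The Python sketch below implements these standard definitions, normalised so that each metric sums to one; the normalisation, the function names and the synthetic data are assumptions, not a description of how the variable importance tables were computed.

```python
import itertools
import math

import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray, cols: tuple) -> float:
    """R^2 of an OLS fit (with intercept) of y on the columns of X listed in cols."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def relative_importance(X: np.ndarray, y: np.ndarray) -> dict:
    """Return lmg, last, first and pratt shares (each normalised to sum to 1)."""
    p = X.shape[1]
    every = tuple(range(p))
    full = r_squared(X, y, every)

    first = np.array([r_squared(X, y, (k,)) for k in every])
    last = np.array([full - r_squared(X, y, tuple(c for c in every if c != k)) for k in every])

    # lmg: average R^2 increment of regressor k over all orderings of the regressors.
    lmg = np.zeros(p)
    for k in every:
        others = [c for c in every if c != k]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in itertools.combinations(others, size):
                lmg[k] += weight * (r_squared(X, y, subset + (k,)) - r_squared(X, y, subset))

    # pratt: standardised coefficient times marginal correlation (these sum to R^2).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    beta_std, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), Xs]), ys, rcond=None)
    marginal_r = np.array([np.corrcoef(X[:, k], y)[0, 1] for k in every])
    pratt = beta_std[1:] * marginal_r

    raw = {"lmg": lmg, "last": last, "first": first, "pratt": pratt}
    return {name: values / values.sum() for name, values in raw.items()}


# Hypothetical example: regressor 0 matters most, regressor 2 not at all.
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=400)
shares = relative_importance(X, y)
```

The lmg loop visits every subset of the other regressors, so its cost grows exponentially with the number of regressors; with the three environmental variables used here that is negligible.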
These metrics indicate that Air-Pressure and Env-Noise affect HR, whereas UV has less effect. Furthermore, adding Motion as an independent variable to the HR model does not make any visible change to the model: the ANOVA test [45] applied to the two models indicates no significant difference between them in terms of their residuals.
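Comparing the HR model with and without Motion is a comparison of two nested linear models. A standard way to make this comparison is the partial F-test on residual sums of squares, which is what R's `anova()` reports for two nested `lm` fits. The sketch below (function names and synthetic data are hypothetical) shows the computation; a large p-value means that the extra regressor does not significantly reduce the residual error.

```python
import numpy as np
from scipy import stats


def residual_sum_of_squares(X: np.ndarray, y: np.ndarray) -> tuple[float, int]:
    """RSS of an OLS fit with intercept, and the number of fitted parameters."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid), A.shape[1]


def nested_f_test(X_reduced: np.ndarray, X_full: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Partial F-test: does X_full (which contains X_reduced) reduce the RSS significantly?"""
    rss0, p0 = residual_sum_of_squares(X_reduced, y)
    rss1, p1 = residual_sum_of_squares(X_full, y)
    df_num, df_den = p1 - p0, len(y) - p1
    f_stat = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    return f_stat, float(stats.f.sf(f_stat, df_num, df_den))


# Hypothetical example: three "environmental" columns plus one extra "motion" column.
rng = np.random.default_rng(3)
env = rng.normal(size=(300, 3))
motion = rng.normal(size=(300, 1))
y = env @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=300)
f_irrelevant, p_irrelevant = nested_f_test(env, np.hstack([env, motion]), y)

# If the response really depends on the extra column, the test detects it.
y_dep = y + 2.0 * motion[:, 0]
f_relevant, p_relevant = nested_f_test(env, np.hstack([env, motion]), y_dep)
```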
To conclude, heart rate (HR) is affected by the Env-Noise level (the most important variable in our model), with Air-Pressure in second place. UV and Motion, however, appear to have no substantial effect on heart rate.
These initial findings are in agreement with studies showing that exposure to noise during everyday life influences heart rate variability (HRV). Many previous works suggest a direct impact of high, irregular noise levels on the regular rhythm of the heart. For example, recent studies have found that noise levels ranging from below 55 dB to above 75 dB are linked to heart-related diseases such as coronary heart disease [55]. Another study showed that HRV was affected in association with 5 dB increases in noise exposure at both the higher and the lower noise level ranges; it concluded that not only do higher noise levels have a stressful effect and harm health, but lower noise levels can cause adverse health effects too [56].
However, the impact of air pressure is still subject to debate. For example, a study published by the American Heart Association showed that atmospheric pressure was associated with an increased risk of heart attack [58], whereas another study, which examined the links between atmospheric conditions (temperature and air pressure) and the occurrence of various cardiovascular events, did not find enough evidence to suggest a direct impact of air pressure on cardiovascular events [57].
Electrodermal Activity (EDA)
Table 7 shows multiple regression results for EDA:
Table 7. Multiple regression analysis between EDA (dependent variable) and relevant independent variables.

| Independent Variable | Regression Coefficient (β) | Std. Error | t-value | P-value |
|---|---|---|---|---|
| Intercept | 6771.45 | 944.77 | 7.16 | 7.66e-13*** |
| Air-Pressure | -6.37 | 0.91 | -6.96 | 3.34e-12*** |
| Env-Noise | 21.58 | 1.29 | 16.63 | <2e-16*** |
| UV | -0.029 | -0.001 | -16.26 | <2e-16*** |
The multiple linear regression model for the EDA is then formulated using the following equation:
$$\text{EDA} = 6771.45 - 6.37\,\text{Air-Pressure} + 21.58\,\text{Env-Noise} - 0.029\,\text{UV} \qquad (5)$$
The model above was evaluated using the diagnostic plots in Figure 9, which show the residuals and the Q-Q plot of the final linear equation. The model is statistically significant (p < 0.001).
Figure 9. EDA Diagnostic regression curves: (Left) represents residuals curve, and (Right) represents the Q-Q curve.
As with the HR Q-Q plot, the EDA Q-Q plot on the right of Figure 9 shows that the data exhibit a pronounced bimodal distribution, which can also be seen clearly in the residual plot on the left. The lower portion of the Q-Q plot is almost linear, suggesting an approximately normal distribution corresponding to one mode of the data. Similarly, the upper portion of the Q-Q plot is again roughly linear, but with a very different intercept that corresponds to the second mode of the data distribution.
Variable importance for EDA
The variable importance metrics in Table 8 show that Env-Noise level and UV have a similar effect on EDA, while Air-Pressure has a less significant effect. Moreover, adding Motion as an independent variable to the previous model is statistically significant (p < 2.2e-16).
Table 8. Variable importance for EDA

| Regressor/Metric | lmg | last | first | pratt |
|---|---|---|---|---|
| Air Pressure | 0.1265742 | 0.082174 | 0.1659 | 0.11874 |
| Env-Noise | 0.4712 | 0.4691 | 0.47382 | 0.48014 |
| UV | 0.40220 | 0.448646 | 0.36019 | 0.4011 |
Body Temperature
Table 9 shows the results for the multiple regression for Body-Temp as dependent variable.
Table 9. Multiple regression analysis between Body-Temp (dependent variable) and relevant independent variables.

| Independent Variable | Regression Coefficient (β) | Std. Error | t-value | P-value |
|---|---|---|---|---|
| Intercept | 168 | 0.04577 | 367.3 | <2e-16*** |
| Air-Pressure | -0.014 | 0.0004 | -312.2 | <2e-16*** |
| Env-Noise | 0.0211 | 0.0006 | 33.6 | <2e-16*** |
| UV | 0.000001 | 0.00008 | 1.30 | 0.19207 |
The multiple linear regression model for the skin temperature is then formulated using the following equation:
$$\text{Body-Temp} = 168 - 0.014\,\text{Air-Pressure} + 0.0211\,\text{Env-Noise} + 0.000001\,\text{UV} \qquad (6)$$
The model is statistically significant (p < 0.001). Figure 10 shows the Q-Q plot of the residuals and the residuals against the fitted values of the final linear equation.
Figure 10. Body-Temp Diagnostic regression curve: (Left) represents residual curve, and (Right) represents the Q-Q curve.
The Q-Q plot is close to linear and follows the baseline, which indicates that the residuals are approximately normally distributed, although they tend to be somewhat larger in magnitude than a normal distribution would predict. Body temperature achieved a much higher R² goodness of fit (0.35), whereas HR and EDA were more difficult to predict using the environmental factors alone.
Variable importance for Body-Temp
Table 10. Variable importance for Body-Temp.

| Regressor/Metric | lmg | last | first | pratt |
|---|---|---|---|---|
| Air Pressure | 0.94 | 0.98 | 0.91 | 0.96 |
| Env-Noise | 0.05 | 0.011 | 0.083 | 0.031 |
| UV | 0.0001 | 0.0000017 | 0.00033 | 0.00007 |
The variable importance for Body-Temp suggests that air pressure and Env-Noise are the most effective measures in the model, with air pressure dominating. These values also reveal that the UV variable is not effective in this model, so it can be removed.
Body Motion
The fourth dependent variable examined here is motion. The following discussion presents the multiple linear regression model of motion with the independent variables (i.e. Env-Noise, Air Pressure and UV).
Table 11. Multiple regression analysis between Motion (dependent variable) and relevant independent variables.

| Independent Variable | Regression Coefficient (β) | Std. Error | t-value | P-value |
|---|---|---|---|---|
| Intercept | 1.35 | 0.0534 | 25.2 | <2e-16*** |
| Air Pressure | -0.000038 | 0.000005 | -7.4 | <2e-16*** |
| Env-Noise | 0.0012 | 0.0000007 | 16.07 | <2e-16*** |
| UV | 0.0000005 | 0.0000001 | 50.2 | <2e-16*** |
The multiple linear regression model for the motion is then formulated using the following equation:
$$\text{Motion} = 1.35 - 0.000038\,\text{Air-Pressure} + 0.0012\,\text{Env-Noise} + 0.0000005\,\text{UV} \qquad (7)$$
The model above was evaluated using the diagnostic curves. Figure 11 shows the Q-Q plot of the residuals and the residuals versus the fitted values of the final linear equation.
Figure 11. Motion Diagnostic regression curves: (Left) represents residual curve, and (Right) represents the Q-Q curve.
The Q-Q plot deviates from the baseline on the right side, while on the left side the data points are clearly linear, which suggests multimodality in the data. In other words, the upper part of the Q-Q plot is again roughly linear, but with a very different intercept that corresponds to the larger mean of the second peak in the distribution.
Variable importance for Motion
The variable importance for body Motion in Table 12 suggests that Air Pressure and Env-Noise are the most effective measures in the Motion model, with Air Pressure dominating. These values also reveal that the UV variable is not effective in this model.
Table 12. Variable importance for Motion

| Regressor/Metric | lmg | last | first | pratt |
|---|---|---|---|---|
| Air Pressure | 0.95 | 0.98 | 0.92 | 0.97 |
| Env-Noise | 0.05 | 0.011 | 0.08 | 0.031 |
| UV | 0.0001 | 0.000018 | 0.00033 | 0.00007 |
PCA Analysis
The second step in the multivariable data analysis is to compute the PCA of the residuals of all the dependent variables, to see whether there are any additional inter-relationships. Our dependent variables are HR, EDA, Motion and Body-Temp. The purpose of the PCA is to discover the inter-relationships between the residuals of the models created previously for these variables. Table 13 shows the principal components of the dependent variables' residuals.
Table 13. Principal component loadings of the dependent variables' residuals

| Variable | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| HR residuals | 0.70725 | -0.218 | 0.0603 | 0.6698 |
| EDA residuals | 0.69719 | 0.256 | 0.1012 | -0.6621 |
| Body-Temp residuals | 0.00918 | -0.932 | 0.1539 | -0.3268 |
| Motion residuals | 0.11682 | -0.133 | -0.9810 | -0.0784 |
The first component represents almost 80% of the data variability and is strongly correlated with both HR and EDA, which indicates that HR and EDA alone could represent much of the variability in the dataset. The second component is strongly correlated with, and dominated by, body temperature (loading magnitude above 0.9). The third component is dominated by Motion, and the fourth component contrasts HR with EDA, again highlighting these two variables. Figure 12 shows the relationship between the first and the second principal components.
Figure 12. Biplot of the first two principal components of the PCA.
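The residual PCA step can be sketched as follows, assuming one OLS model per body measurement on the same environmental predictors and a correlation-based PCA (residuals standardised before the decomposition; the text does not state which variant was used). The loadings are the right singular vectors of the standardised residual matrix and their signs are arbitrary; the function names and data are hypothetical.

```python
import numpy as np


def ols_residuals(Z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Residuals e = y - y_hat of an OLS fit of y on Z with an intercept."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta


def residual_pca(Z: np.ndarray, Y: np.ndarray):
    """PCA of the residuals of one regression per column of Y.

    Z: (n, r) environmental predictors; Y: (n, m) body measurements.
    Returns (loadings, explained): loadings[:, j] holds the weights of PC j,
    explained[j] its share of the total variance.
    """
    R = np.column_stack([ols_residuals(Z, Y[:, k]) for k in range(Y.shape[1])])
    R = (R - R.mean(axis=0)) / R.std(axis=0, ddof=1)   # assumption: standardised residuals
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt.T, explained


# Hypothetical data: "HR" and "EDA" residuals share a latent factor; the others do not.
rng = np.random.default_rng(5)
n = 1000
Z = rng.normal(size=(n, 3))
latent = rng.normal(size=n)
Y = np.column_stack([
    Z @ np.array([0.5, 0.2, 0.1]) + latent + 0.3 * rng.normal(size=n),   # "HR"
    Z @ np.array([0.1, 0.4, 0.3]) + latent + 0.3 * rng.normal(size=n),   # "EDA"
    Z @ np.array([0.8, 0.0, 0.0]) + rng.normal(size=n),                  # "Body-Temp"
    rng.normal(size=n),                                                   # "Motion"
])
loadings, explained = residual_pca(Z, Y)
```

Without standardisation, a residual series with a much larger variance (EDA, for instance) would dominate the first component regardless of any shared structure, which is why the scaling choice matters.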
4.2 Emotion Predictive Model
In this section, we present our approach to modelling emotion from the collected data, along with its performance evaluation.
Feature Extraction
After thoroughly reviewing the related literature on feature extraction from physiological signals, we extracted physiological features and, in addition, statistical features from the environmental sensors. In total, we extracted 87 features, as follows:
1. For the HR, EDA and Body-Temp signals, common statistical features were computed: mean, median, maximum, minimum, range (max-min), standard deviation and quartiles [33, 40].
2. Additionally, for the HR:
Standard HRV analysis refers to the extraction of parameters defined in the time and frequency domains [34, 36]. In total, we extracted 17 HRV features.
Concerning the time-domain analysis, we calculated the maximum and minimum heart rate; the square root of the mean of the squared differences between successive NN intervals (RMSSD); pNN50 and pNN30 (the percentage of consecutive NN intervals that differ by more than 50 and 30 milliseconds, respectively); and the standard deviation of the NN intervals (SDNN). RMSSD reflects short-term variability, whereas SDNN and the HRV triangular index characterise the entire HRV [43].
We also derived frequency-domain features, which are indicative of sympathetic and parasympathetic neural activity: the total spectral power of the successive differences of NN intervals in the bands up to 0.04 Hz, between 0.04 and 0.15 Hz, and between 0.15 and 0.50 Hz, and the ratio of low-frequency (LF) to high-frequency (HF) power.
3. For the EDA:
Ten features were extracted from the EDA signal, including the number of responses, the power of responses, the number of significant responses (responses whose value exceeds a threshold), the power of significant responses, and the slope and intercept of the signal.
4. For Motion:
We reduced the motion representation to a single component, computed as in [37]:
$$\text{Motion} = X^2 + Y^2 + Z^2 \qquad (8)$$
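The time-domain HRV features listed above follow standard definitions and can be sketched from a series of NN intervals in milliseconds. For the frequency-domain features, the sketch assumes the common approach of interpolating the NN series onto an evenly sampled 4 Hz grid and estimating its power spectrum with Welch's method; this differs slightly from the successive-difference spectrum described in the text, and the sampling rate, segment length and function names are assumptions.

```python
import numpy as np
from scipy.signal import welch


def hrv_time_domain(nn_ms) -> dict:
    """Time-domain HRV features from NN intervals given in milliseconds."""
    nn = np.asarray(nn_ms, dtype=float)
    diffs = np.diff(nn)                                 # successive NN differences
    hr = 60000.0 / nn                                   # instantaneous heart rate (bpm)
    return {
        "hr_max": float(hr.max()),
        "hr_min": float(hr.min()),
        "rmssd": float(np.sqrt(np.mean(diffs ** 2))),
        "sdnn": float(np.std(nn, ddof=1)),              # sample standard deviation
        "pnn50": float(100.0 * np.mean(np.abs(diffs) > 50.0)),
        "pnn30": float(100.0 * np.mean(np.abs(diffs) > 30.0)),
    }


def hrv_band_powers(nn_ms, fs: float = 4.0) -> dict:
    """VLF/LF/HF power of the evenly resampled NN series (assumed 4 Hz grid)."""
    nn = np.asarray(nn_ms, dtype=float)
    beat_times = np.cumsum(nn) / 1000.0                 # seconds
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    nn_even = np.interp(grid, beat_times, nn)
    freqs, psd = welch(nn_even - nn_even.mean(), fs=fs, nperseg=min(256, len(grid)))
    df = freqs[1] - freqs[0]

    def band(lo: float, hi: float) -> float:
        mask = (freqs >= lo) & (freqs < hi)
        return float(psd[mask].sum() * df)

    vlf, lf, hf = band(0.0, 0.04), band(0.04, 0.15), band(0.15, 0.50)
    return {"vlf": vlf, "lf": lf, "hf": hf, "lf_hf": lf / hf if hf > 0 else float("nan")}
```

For example, `hrv_time_domain([800, 810, 790, 850, 800])` gives successive differences of 10, -20, 60 and -50 ms, so pNN50 is 25% (one of four differences exceeds 50 ms) and pNN30 is 50%.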
Feature Fusion Level
Our system uses feature-level fusion, in which feature sets from different modalities are concatenated to form two feature spaces: the environmental and the on-body modalities. As explained in the previous subsection, 84 features were extracted. However, many of these features do not have an important explanatory effect on the emotional outcome. In addition, many of the features extracted from the same signal are correlated with each other and can therefore be removed, simplifying the model so that it includes only the most significant features necessary to explain the emotional response.
We developed a predictive model to test whether it is possible to accurately predict an individual's affect state from a combination of physiological and environmental features.
Our labelled data have five classes, ranging from very negative (class 1) to very positive (class 5), with 355,089 instances.
To build the model, we tested the significance of the features in relation to the affect labels and checked for interdependency between the variables using the correlation matrix.
We computed the pairwise correlations between the features and the label on the whole dataset. Based on this feature evaluation, we selected 21 features that have strong correlations with the label to build the prediction model (i.e. the feature selection step).
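The selection step can be sketched as ranking features by their absolute Pearson correlation with the affect label and greedily skipping any feature that is highly correlated with one already chosen, which also removes the redundant same-signal features mentioned above. The redundancy threshold of 0.9, the use of Pearson correlation, the function name and the synthetic data are assumptions; only the target of 21 features comes from the text.

```python
import numpy as np
import pandas as pd


def select_features(X: pd.DataFrame, y: pd.Series, k: int = 21, redundancy: float = 0.9) -> list:
    """Greedy correlation-based feature selection.

    Features are ranked by |Pearson r| with the label; a feature is kept only if its
    |r| with every already-selected feature is below `redundancy`.
    """
    relevance = X.corrwith(y).abs().dropna().sort_values(ascending=False)
    pairwise = X.corr().abs()
    selected = []
    for name in relevance.index:
        if len(selected) == k:
            break
        if all(pairwise.loc[name, s] < redundancy for s in selected):
            selected.append(name)
    return selected


# Hypothetical example with four synthetic features.
rng = np.random.default_rng(11)
n = 500
label = pd.Series(rng.integers(1, 6, size=n).astype(float))   # affect classes 1..5
f1 = label + 0.1 * rng.normal(size=n)
X = pd.DataFrame({
    "f1": f1,
    "f2": f1 + 0.01 * rng.normal(size=n),       # near-duplicate of f1
    "f3": rng.normal(size=n),                    # unrelated noise
    "f4": -label + rng.normal(size=n),           # weaker, negatively related
})
chosen = select_features(X, label, k=2)
```

Only one of the two near-duplicates survives, and the weaker but non-redundant feature is kept instead. In a sketch like this, the selection can also be run on the training portion only, which keeps held-out data from influencing which features are retained.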
Predictive model for emotion recognition based on multimodal fusion
We opted for a multi-learner approach based on an ensemble algorithm called Stacking [47]. There are several reasons for preferring a multi-classifier system to a single classifier. The main aim is to improve the accuracy and efficiency of the classification system; moreover, the volume of data to be analysed is too large to be handled by a single classifier, and training one classifier on such a large amount of data is usually not practical. It is also difficult to find a single classifier that works well for all test data. Instead, multiple classifiers can be combined to give a better output than a single classifier. The combination may not necessarily outperform the single best classifier, but its accuracy will, on average, be better than that of the individual classifiers.
Stacking model: stacked generalization (or stacking) is an ensemble learner that combines multiple models. Unlike bagging [48] and boosting [49], stacking is used to combine models of different types. Rather than choosing among the trained models, stacking uses their performance on held-out data to combine them, thereby typically achieving better performance than any single one of the trained models. It has been successfully used in both supervised learning tasks (e.g. regression) and unsupervised learning tasks (e.g. density estimation) [47].
Due to the multimodal nature of our features, we follow the stacking approach in [51], in which each modality is processed independently by the corresponding classifiers and their outputs are combined to yield the final result, instead of concatenating all the features into a composite feature vector that is input to a single classifier.
Our procedure is as follows. Let $D_1$ and $D_2$ be two different datasets: the environmental dataset $D_1$ (Env-Noise, Air Pressure and UV) and the physiological dataset $D_2$ (HR, EDA, Body-Temp and Motion). Each dataset is split into three parts, $D_{i0}$ to $D_{i2}$: the level-0 training set, the level-0 evaluation set and the level-1 evaluation set. The classifiers $e_i \in E$, with $|E| = N$, are trained on $D_{i0}$ and evaluated on $D_{i1}$ to produce $D_i'$, the level-1 training-set parts, which are combined to form $D'$, the full level-1 training set on which a level-1 classifier is then trained. The whole stack is then evaluated on the $D_{i2}$ datasets.
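The per-modality stacking procedure can be sketched with scikit-learn, assuming the two modalities are row-aligned arrays describing the same instances, so that one shared index split yields the three parts D_i0, D_i1 and D_i2 for both. Class-probability outputs of the SVM, RF and KNN base learners on D_i1 form the level-1 training set for a Gaussian Naive Bayes meta-learner. The 50/25/25 split, the hyperparameters, the function names and the synthetic data are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def make_base_learners():
    """Fresh level-0 learners; all of them expose predict_proba."""
    return [
        SVC(probability=True, random_state=0),
        RandomForestClassifier(n_estimators=100, random_state=0),
        KNeighborsClassifier(n_neighbors=5),
    ]


def stack_modalities(modalities, y, seed=0):
    """Train level-0 learners per modality, then a Naive Bayes level-1 learner.

    modalities: list of row-aligned feature arrays (e.g. environmental, physiological).
    Returns the accuracy of the whole stack on the level-1 evaluation part (D_i2).
    """
    idx = np.arange(len(y))
    idx0, rest = train_test_split(idx, test_size=0.5, random_state=seed, stratify=y)
    idx1, idx2 = train_test_split(rest, test_size=0.5, random_state=seed, stratify=y[rest])

    level1_train, level1_test = [], []
    for X in modalities:
        for clf in make_base_learners():
            clf.fit(X[idx0], y[idx0])                        # trained on D_i0
            level1_train.append(clf.predict_proba(X[idx1]))  # evaluated on D_i1 -> D_i'
            level1_test.append(clf.predict_proba(X[idx2]))   # same learners applied to D_i2

    meta = GaussianNB().fit(np.hstack(level1_train), y[idx1])  # level-1 learner trained on D'
    return meta.score(np.hstack(level1_test), y[idx2])


# Hypothetical five-class data split column-wise into two "modalities".
X, y = make_classification(n_samples=600, n_features=8, n_informative=6, n_redundant=0,
                           n_classes=5, n_clusters_per_class=1, class_sep=2.0, random_state=0)
accuracy = stack_modalities([X[:, :3], X[:, 3:]], y)
```

Stratified splitting keeps all five classes present in every part, so each base learner's `predict_proba` output has the same column order (`classes_`) and the level-1 features line up across modalities.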
To train and model our labelled dataset, we stacked a combination of three base classifiers, Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbour (KNN), with Naive Bayes (NB) as the stacking model learner. We chose these classifiers because they have proven effective in classifying emotions based on on-body sensors [25-36], and all of them can output a confidence rating for each label. Our class attribute is nominal, ranging from 1 to 5. Figure 13 shows the accuracy and F-measure of all the base learner models on the two modalities and of the overall Stacking model. The Stacking model with five classifiers yields the best results and outperforms the individual classifiers, with an F-measure of 0.84 and an accuracy of 86%. It is difficult to compare our results precisely with previous work in the literature, since no other work has included environmental sensor data in the emotion model.
Figure 13. Accuracy and F-Measure levels of the base learners and the Stacking learner.
The results show that combining decision fusion and feature fusion with the Stacking learner improves the classification accuracy of emotion prediction. The confusion matrix for the five labels is illustrated in Figure 14.
Figure 14. Visualisation of the Confusion Matrix for the Stacking Learner.
Although our system was developed using on-body sensor data as well as environmental signals obtained from multiple sensors, its recognition rate was comparable to that of previous systems [27-33].
To learn more about the influence of each single modality on the overall prediction performance, we show the prediction accuracy of each modality for each user in Figure 15.
Figure 15. Prediction Accuracy per user and modality and the fusion approach.
The line charts indicate high variability in both modalities, with the most extreme differences occurring in the environmental data. The physiological modality displays better accuracy for most of the participants. There is also no suggestion of a correlation between the two modalities; for example, high accuracy on the environmental data does not imply correspondingly high accuracy on the physiological data. The accuracy levels also vary across users with no apparent pattern.
5 Discussion
We developed five generalized multiple regression models to analyse the relative impact of environmental factors on body dynamics. The results quantitatively indicate a possible influence of ambient environmental factors on body and emotion variables. Individual variables are not significant on their own, but they have a significant impact when combined with other independent variables.
The multiple regression results suggest that the HR data exhibit a pronounced bimodal distribution. They also show that Air-Pressure and Env-Noise account for a large percentage of the variation in HR, with Env-Noise being the most important variable and explaining the majority of changes, whereas UV has much less influence on HR. In addition, adding Motion as an independent variable to the HR model does not noticeably change the model.
These findings are in agreement with previous epidemiological research concluding that noise exposure can contribute to the prevalence of cardiovascular disease [50].
Similarly, the EDA regression model exhibits a bimodal distribution, and the variable importance metrics show that Env-Noise level and UV have a similar effect on EDA, whereas Air-Pressure and Motion have a less significant effect.
In addition, the multiple regression model of Body-Temp produced a close to linear Q-Q plot that matches the baseline. The variable importance for Body-Temp suggests that Air-Pressure and Env-Noise are the most effective measures in the model, and that UV does not have a significant effect.
Furthermore, analysis of the multiple regression model of Motion suggests a multimodal distribution in the data, with Air-Pressure as the main relevant variable.
Body-Temp achieved a much higher R² goodness of fit (0.35), whereas HR and EDA were more challenging to predict using the environmental factors alone.
The PCA analysis suggests that all the variables used contribute to describing the variability of the data. The PCA ordination map shows EDA, HR and Body-Temp grouped and oriented in one direction, while the environmental variables are oriented in another direction. Motion lies between the two groups, pointing in a direction different from both modalities.
Although we can conclude that various environmental factors may contribute to physiological changes, including heart rate variability and body temperature, the evidence for this relationship is still inconclusive because of limitations in the number of factors measured and in the exposure characterization.
On the other hand, despite the quality of the emotion prediction results, it is still difficult to single out the impact of individual environmental factors compared with individual physiological measurements. We also cannot rule out the hidden impact of other environmental confounders, such as gas pollution or crowd size in the street.
Since this is the first set of experiments of its kind, it is hard to compare our results with other studies based on sensor data feeds. As mentioned previously, quantifying and modelling the relationships between all these variables has not been studied before.
Future preventive health strategies should involve environmental and urban interventions. Decision
makers have the responsibility to develop, implement, evaluate, and improve guidelines and standards
to protect public health around urban spaces; new tools and strategies based on local conditions will
have to be developed.
Conclusions and Future Work
In this paper, we have described our information fusion approach for on-body and environmental sensing, which offers new opportunities for data-intensive modelling, particularly the quantification of some aspects of physiological and movement changes in relation to variation in environmental factors measured continuously in the same spatio-temporal context.
To achieve this, we conducted a real-world study 'in the wild' with on-body and mobile sensors. Data were collected from forty participants walking around Nottingham city centre.
Multivariate linear regression models for the on-body sensor data were developed. We found that the spatial variability in on-body sensor data was associated with environmental changes.
Emotion prediction resulted in an encouraging accuracy level, comparable to that of previous systems. In addition, decision fusion of emotion recognition based on the two modalities yielded an increase in performance over each single modality, indicating at least some complementarity between the modalities.
These results suggest that user-independent emotion recognition based on the integration of physiological and environmental signals is feasible.
Since we can only collect a limited number of signals, the constraints imposed by the on-body instrumentation heavily influence the design of the algorithm. Future work will look at adding more sensor modalities to further increase our understanding of the hidden connections between the environment and health. In future work, we will also model these parameters in relation to changes in physical places by aggregating the data into different spatial segments.
Acknowledgement
This work was supported by the B11 Research unit at NTU, Ref 3452. Also, the authors would like to
thank Dr. Caroline Langensiepen for her help with the Ethical Approval Application and Dr. Ahmed
Aldabbagh for his help in collecting the data.
References
[1] Dockery, D. W., & Pope, C. A. (1994). Acute respiratory effects of particulate air pollution. Annual review of public
health, 15(1), 107-132.
[2] Briggs, D. (2003). Environmental pollution and the global burden of disease. British Medical Bulletin, 68(1), 1-24.
[3] Lim, Y. H., Kim, H., Kim, J. H., Bae, S., Park, H. Y., & Hong, Y. C. (2012). Air pollution and symptoms of depression
in elderly adults. Environmental health perspectives, 120(7), 1023.
[4] Lim, S. S., Vos, T., Flaxman, A. D., Danaei, G., Shibuya, K., Adair-Rohani, H., ... & Aryee, M. (2013). A comparative
risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions,
1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The lancet, 380(9859), 2224-2260.
[5] Schlink, U., Strebel, K., Loos, M., Tuchscherer, R., Richter, M., Lange, T., & Ragas, A. (2010). Evaluation of human
mobility models, for exposure to air pollutants. Science of the total environment, 408(18), 3918-3930.
[6] Chen, H., Kwong, J. C., Copes, R., et al. (2016). Living near major roads and the incidence of dementia, Parkinson's disease, and multiple sclerosis: a population-based cohort study. Lancet, published online Jan 4. http://dx.doi.org/10.1016/S0140-6736(16)32399-6
[7] Reis, S., Seto, E., Northcross, A., Quinn, N. W., Convertino, M., Jones, R. L., & Wimberly, M. C. (2015). Integrating
modelling and smart sensors for environmental and human health. Environmental Modelling & Software, 74, 238-
246.
[8] Kanjo, E., Benford, S., Paxton, M., Chamberlain, A., Fraser, D. S., Woodgate, D. & Woolard, A. (2008). MobGeSen:
facilitating personal geosensor data collection and visualization using mobile phones. Personal and Ubiquitous
Computing, 12(8), 599-607.
[9] Kanjo, E., Bacon, J., Roberts, D., & Landshoff, P. (2009). MobSens: Making smart phones smarter. IEEE Pervasive
Computing, 8(4), 50-57.
[10] Banzhaf, E., de la Barrera, F., Kindler, A., Reyes-Paecke, S., Schlink, U., Welz, J., & Kabisch, S. (2014). A conceptual
framework for integrated analysis of environmental quality and quality of life. Ecological Indicators, 45, 664-668.
[11] Galelli, S., Humphrey, G. B., Maier, H. R., Castelletti, A., Dandy, G. C., & Gibbs, M. S. (2014). An evaluation
framework for input variable selection algorithms for environmental data-driven models. Environmental Modelling &
Software, 62, 33-51.
[12] Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: the self-assessment manikin and the semantic
differential. Journal of behavior therapy and experimental psychiatry, 25(1), 49-59.
[13] Gravina, R., Alinia, P., Ghasemzadeh, H., & Fortino, G. (2017). Multi-sensor fusion in body sensor networks: State-of-the-art and research challenges. Information Fusion, 35, 68-80.
[14] Stansfeld, S., Haines, M., & Brown, B. (2000). Noise and health in the urban environment. Reviews on environmental
health, 15(1-2), 43-82.
[15] World Health Organisation (WHO), (2017), http://www.who.int/heli/tools/quantassess/en/ [accessed 09/01/2017]
[16] Barabasi, A. L. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435(7039), 207-211.
[17] Wesolowski, A., Eagle, N., Tatem, A. J., Smith, D. L., Noor, A. M., Snow, R. W., & Buckee, C. O. (2012). Quantifying
the impact of human mobility on malaria. Science, 338(6104), 267-270.
[18] M. Kampa and E. Castanas, Human health effects of air pollution, 4th International Workshop on Biomonitoring of
Atmospheric Pollution (With Emphasis on Trace Elements), pp. 362–367, 2008.
[19] S. Steinle, S. Reis, C. Sabel, Quantifying human exposure to air pollution – moving from static monitoring to spatio-
temporally resolved personal exposure assessment, Sci. Total Environ., 443 (2013), pp. 184–193.
[20] J. Engel-Cox, T.K.O. Nguyen, A. van Donkelaar, R.V. Martin, E. Zell, Toward the next generation of air quality
monitoring: particulate matter, Atmos. Environ., 80 (2013), pp. 584–590.
[21] S. Steinle, S. Reis, C. Sabel, S. Semple, M.M. Twigg, C.F. Braban, A.E. Leeson, M.R. Heal, D. Harrison, C. Lin, H.
Wu, Application of a low-cost method to quantify human exposure to ambient particulate matter concentrations across
a wide range of microenvironments, Sci. Total Environ., 508 (2015), pp. 383–394.
[22] A. de Nazelle, E. Seto, D. Donaire-Gonzalez, M. Mendez, J. Matamala, M.J. Nieuwenhuijsen, M. Jerrett, Improving
estimates of air pollution exposure through ubiquitous sensing technologies, Environ. Pollut., 176 (2013), pp. 92–99.
[23] Kööts, L., Realo, A., & Allik, J. (2011). The influence of the weather on affective experience. Journal of individual
differences.
[24] Park, N. K., & Farr, C. A. (2007). The Effects of Lighting on Consumers' Emotions and Behavioral Intentions in a
Retail Environment: A Cross-Cultural Comparison. Journal of Interior Design, 33(1), 17-32.
[25] Knapp, R. B., Kim, J., & André, E. (2011). Physiological signals and their use in augmenting emotion recognition for
human–machine interaction. In Emotion-oriented systems (pp. 133-159). Springer Berlin Heidelberg.
[26] Kreibig, S. D. (2010). Autonomic nervous system activity in emotion: A review. Biological psychology, 84(3),
394-421.
[27] Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective
physiological state. IEEE transactions on pattern analysis and machine intelligence, 23(10), 1175-1191.
[28] Younis, E. M. (2015). Sentiment Analysis and Text Mining for Social Media Microblogs using Open Source Tools: An
Empirical Study. International Journal of Computer Applications, 112(5).
[29] Valstar, M., Gratch, J., Schuller, B., Ringeval, F., Lalanne, D., Torres, M. T., ... & Pantic, M. (2016). AVEC 2016 -
Depression, Mood, and Emotion Recognition Workshop and Challenge. arXiv preprint arXiv:1605.01600.
[30] Kim, Jonghwa, and Elisabeth André. "Emotion recognition based on physiological changes in music listening." Pattern
Analysis and Machine Intelligence, IEEE Transactions on 30.12 (2008): 2067-2083.
[31] Guthier, B., Dörner, R., & Martinez, H. P. (2016). Affective Computing in Games. In Entertainment Computing and
Serious Games (pp. 402-441). Springer International Publishing.
[32] Datcu, D., & Rothkrantz, L. (2009). Multimodal recognition of emotions in car environments. DCI&I 2009.
[33] Lisetti, C. L., & Nasoz, F. (2004). Using noninvasive wearable computers to recognize human emotions from
physiological signals. EURASIP Journal on Advances in Signal Processing, 2004(11), 1-16.
[34] Takahashi, K. (2004, September). Remarks on SVM-based emotion recognition from multi-modal bio-potential signals.
In Robot and Human Interactive Communication, 2004. ROMAN 2004. 13th IEEE International Workshop on (pp.
95-100). IEEE.
[35] Irrgang, M., & Egermann, H. (2016). From Motion to Emotion: Accelerometer Data Predict Subjective Experience of
Music. PloS One, 11(7), e0154360.
[36] Guendil, Z., Lachiri, Z., Maaoui, C., & Pruski, A. (2016, March). Multiresolution framework for emotion sensing in
physiological signals. In Advanced Technologies for Signal and Image Processing (ATSIP), 2016 2nd International
Conference on (pp. 793-797). IEEE.
[37] Mohammad Adibuzzaman, Niharika Jain, Nicholas Steinhafel, Munir Haque, Ferdaus Ahmed, Sheikh Ahamed, and
Richard Love. 2013. In situ affect detection in mobile devices: a multimodal approach for advertisement using social
network. SIGAPP Appl. Comput. Rev. 13,4 (December 2013), 67-77. DOI=10.1145/2577554.2577562.
[38] Wan-Hui, Wen, Qiu Yu-Hui, and Liu Guang-Yuan. "Electrocardiography recording, feature extraction and
classification for emotion recognition." Computer Science and Information Engineering, 2009 WRI World Congress
on. Vol. 4. IEEE, 2009.
[39] Kim, Kyung Hwan, S. W. Bang, and S. R. Kim. "Emotion recognition system using short-term monitoring of
physiological signals." Medical and biological engineering and computing 42.3 (2004): 419-427.
[40] Chung, W. Y., Bhardwaj, S., Punvar, A., Lee, D. S., & Myllylae, R. (2007, August). A fusion health monitoring using
ECG and accelerometer sensors for elderly persons at home. In 2007 29th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (pp. 3818-3821). IEEE.
[41] Ramzan, N., Palke, S., Cuntz, T., Gibson, R., & Amira, A. (2016). Emotion Recognition by Physiological Signals.
Electronic Imaging, 2016(16), 1-6.
[42] Granero, A. C., Fuentes-Hurtado, F., Ornedo, V. N., Provinciale, J. G., Ausín, J. M., & Raya, M. A. (2016). A
Comparison of Physiological Signal Analysis Techniques and Classifiers for Automatic Emotional Evaluation of
Audiovisual Contents. Frontiers in Computational Neuroscience, 10.
[43] Khandoker, A. H., Karmakar, C., Brennan, M., Palaniswami, M., & Voss, A. (2013). Poincaré plot methods for heart
rate variability analysis. New York: Springer.
[44] Hidalgo, B., & Goodman, M. (2013). Multivariate or multivariable regression? American journal of public health,
103(1), 39-40.
[45] Härdle, W., & Simar, L. (2007). Applied multivariate statistical analysis. Berlin: Springer.
[46] Grömping, U. (2006). Relative importance for linear regression in R: the package relaimpo. Journal of statistical
software, 17(1), 1-27.
[47] Polikar, R. (2012). Ensemble learning. In Ensemble machine learning (pp. 1-34). Springer US.
[48] Van Poucke, S., Zhang, Z., Schmitz, M., Vukicevic, M., Vander Laenen, M., Celi, L. A., & De Deyne, C. (2016).
Scalable predictive analysis in critically ill patients using a visual open data analysis platform. PloS one, 11(1),
e0145791.
[49] Freund, Y., Schapire, R., & Abe, N. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial
Intelligence, 14, 771-780.
[50] van Kempen EE, Kruize H, Boshuizen HC, Ameling CB, Staatsen BAM, de Hollander AEM. The association between noise
exposure and blood pressure and ischemic heart disease: a meta-analysis. Environmental Health Perspectives.
2002;110(3):307-317.
[51] Marton, Z., Seidel, F., Balint-Benczedi, F., & Beetz, M. (2013). Ensembles of strong learners for multi-cue
classification. Pattern Recogn. Lett. 34, 7 (May 2013), 754-761. DOI=http://dx.doi.org/10.1016/j.patrec.2012.07.011.
[52] Alan Chamberlain, Mark Paxton, Kevin Glover, Martin Flintham, Dominic Price, Chris Greenhalgh, Steve Benford,
Peter Tolmie, Eiman Kanjo, Amanda Gower, Andy Gower, Dawn Woodgate, Danaë Emma Beckford Stanton Fraser:
Understanding mass participatory pervasive computing systems for environmental campaigns. Personal and
Ubiquitous Computing 18(7): 1775-1792 (2014)
[53] Nouf Alajmi, Eiman Kanjo, Nour El Mawass, Alan Chamberlain: Shopmobia: An Emotion-Based Shop Rating System.
ACII 2013: 745-750.
[54] Eiman Kanjo, Lulwah Al-barrak, Eman M.G. Younis, “NeuroPlace: Categorizing Urban Places According to Mental
States”, PLoS ONE, 2017, in press.
[55] Münzel T, Gori T, Babisch W, Basner M. Cardiovascular effects of environmental noise exposure. European Heart
Journal. 2014;35(13):829-836. doi:10.1093/eurheartj/ehu030.
[56] Ute Kraus, Alexandra Schneider, Susanne Breitner, Regina Hampel, Regina Rückerl, Mike Pitz, Uta Geruschkat, Petra
Belcredi, Katja Radon, Annette Peters. Individual Day-Time Noise Exposure During Routine Activities and Heart
Rate Variability in Adults: A Repeated Measures Study. Environmental Health Perspectives, 2013; DOI:
http://dx.doi.org/10.1289/ehp.1205606.
[57] Verberkmoes NJ, Soliman Hamad MA, ter Woorst JF, Tan MESH, Peels CH, van Straten AHM. Impact of temperature
and atmospheric pressure on the incidence of major acute cardiovascular events. Netherlands Heart Journal.
2012;20(5):193-196. doi:10.1007/s12471-012-0258-x.
[58] Sandrine Danet, Florence Richard, Michèle Montaye, Stephanette Beauchant, Brigitte Lemaire, Catherine Graux,
Dominique Cottel, Nadine Marécaux, Philippe Amouyel, “Unhealthy Effects of Atmospheric Temperature and
Pressure on the Occurrence of Myocardial Infarction and Coronary Deaths”, https://doi.org/10.1161/01.CIR.100.1.e1.
[59] Eiman Kanjo, Luluah Al-Husain, Alan Chamberlain: Emotions in context: examining pervasive affective sensing
systems, applications, and analyses. Personal and Ubiquitous Computing 19(7): 1197-1212 (2015).
[60] Lulwah Al-Barrak, Eiman Kanjo: NeuroPlace: making sense of a place. AH 2013: 186-189.
[61] Nour El Mawass, Eiman Kanjo: A supermarket stress map. UbiComp (Adjunct Publication) 2013: 1043-1046.
[62] Lulwah Al-Barrak, Eiman Kanjo: NeuroPlace: making sense of a place. AH 2013: 186-189.
[63] Luluah Al-Husain, Eiman Kanjo, Alan Chamberlain: Sense of space: mapping physiological emotion response in urban
space. UbiComp (Adjunct Publication) 2013: 1321-1324.
[64] R. Gravina, G. Fortino, Automatic methods for the detection of accelerative cardiac defense response, IEEE Transactions
on Affective Computing, 7(3), pp. 286-298, 2016.
[65] Min Chen, Sergio González-Valenzuela, Athanasios V. Vasilakos, Huasong Cao, Victor C. M. Leung: Body Area
Networks: A Survey. MONET 16(2): 171-193 (2011).
[66] Giancarlo Fortino, Roberta Giannantonio, Raffaele Gravina, Philip Kuryloski, Roozbeh Jafari: Enabling Effective
Programming and Flexible Management of Efficient Body Sensor Network Applications. IEEE Trans. Human-
Machine Systems 43(1): 115-133 (2013).
[67] Bojan Makivic, Pascal Bauer, “Heart Rate Variability Analysis in Sport”, The Aspetar Sports
Medicine Journal, 4(2), pp. 326-331, September 2015.