
Analysing the Quality of Experience of Multisensory Media from Measurements of Physiological Responses



Jacob Donley, Christian Ritz, Muawiyath Shujau
School of Electrical Computer and Telecommunications Engineering, University of Wollongong,
Wollongong, NSW, Australia, 2522
This paper investigates the Quality of Experience (QoE) of multisensory media by analysing biosignals collected by electroencephalography (EEG) and eye gaze sensors and comparing them with subjective ratings. Also investigated is the impact on QoE of various levels of synchronicity between the sensory effect and the target video scene. Results confirm findings from previous research showing that sensory effects added to videos increase the QoE rating. While no statistically significant difference was observed between the QoE ratings for different levels of sensory effect synchronicity, an analysis of raw EEG data showed 25% more activity in the temporal lobe during asynchronous effects and 20-25% more activity in the occipital lobe during synchronous effects. The eye gaze data showed more deviation for a video with synchronous effects, and the EEG showed correlating occipital lobe activity for this instance. These differences in physiological responses indicate that sensory effect synchronicity may affect QoE despite subjective ratings appearing similar.
Index Terms: quality of experience (QoE), multisensory media, gaze tracking (GT), electroencephalography (EEG)
Multisensory media systems provide an enhanced user Quality of Experience (QoE), which in this context is also known as Sensory Experience [1], by stimulating human senses with vibration, blowing air and ambient lighting effects at time-points that are linked to an audio/visual scene in a multimedia presentation [2, 3]. Previous research investigating the use of a multisensory media system to provide these sensory effects for video demonstrated an enhanced user experience compared with the same videos without the sensory effects [3]. This was based on subjective testing, in which a participant selected their vote from an international standard scale for subjective quality assessment of multimedia [2-4]. In [2, 5], research further investigated the enhancement of specific emotional responses provided by sensory effects added to video sequences. This existing research is based on QoE or emotional response ratings provided by each user at the end of the test stimulus. In contrast, this paper investigates the temporal variation of QoE during the presentation of the sensory-enhanced test stimulus. This is achieved by collecting and analysing physiological responses using biosensors while a subject participates in a QoE evaluation of the test sequences. Presented here is a system that integrates an electroencephalography (EEG) headset and an eye gaze tracker within a multisensory media presentation and QoE evaluation system [2, 3].
EEG is a method of reading the electrical activity of the brain by measuring the potential difference between two electrodes positioned on the surface of the scalp. Analysis of EEG recordings can identify a person's emotional state, both by mapping responses to the six primary emotions elicited by viewing images [6] and by distinguishing between like and dislike when viewing video advertisements [7]. Recent research [8] has investigated the use of EEG data for understanding the time-varying QoE of multimedia, and the research in [9] compares EEG responses with user ‘tags’ added at user-chosen time-points during the presentation of multisensory media. Eye gaze tracking data collected while users watch a media presentation can be analysed to detect when a user glances or stares at objects of interest within an audio/visual scene. Gaze is directly related to what the brain is processing when engaged in an activity [10]. Jointly analysing gaze and EEG activity can therefore be used to investigate how a person feels about what they are observing onscreen at a particular time. This work explores how such analysis relates to QoE for multisensory media applications.
Previous research [4] has concluded that different sensory effects have varying impacts on the overall QoE and emotion as judged by a participant. Results have also shown that the level of QoE enhancement is related to the genre of the video. Such research provides useful information for modelling sensory effects to ensure that QoE is maximised. Poor synchronisation of audio and visual content is known to degrade the user experience [11, 12]. Recent work [1] has found that the synchronicity of olfactory sensory effects with the target video scene may also influence QoE. This paper provides a new investigation into the impact of synchronicity between a video scene and other sensory effects (wind, vibration and lighting) on the resulting QoE. This is based on analysing subjective QoE ratings as well as biosensor signals (EEG and eye gaze) collected for sequences containing no effects, effects synchronised to the desired time-location and effects asynchronous with the desired time-location.
Section 2 of this paper reviews techniques for EEG and eye
gaze monitoring and describes the methods adopted in this research.
Section 3 describes the biosensor-based QoE evaluation system
while Section 4 describes the multisensory media QoE evaluation
incorporating biosensors. Results from subjective QoE testing and from the analysis of EEG and eye gaze data for a selection of sensory-enhanced videos with different types and synchronisation of effects are presented in Section 5, with conclusions provided in Section 6.
This section reviews existing approaches to measuring emotional
response from EEG and eye gaze signals and describes the
techniques adopted in this work.
2.1. Electroencephalography (EEG)
Accurately and systematically testing for emotions and user experience with EEG is difficult because most test equipment for these applications is uncomfortable and cumbersome. For example, the standard 10-10 EEG system requires 81 separate wires and corresponding electrodes [13], usually held on a net or cap that is strapped over the head. More recently, less obtrusive EEG headsets have become available, such as the Emotiv EEG [14] used in this work. This new style of EEG headset fits neatly and comfortably on a person's head and includes a built-in accelerometer to record head movement.
Using multiple sensors (14 channels in the case of the EEG headset used here) allows signals to be recorded from different parts of the brain. For example, the frontal lobe of the cerebral cortex is associated with behavioural and emotional responses; it also functions in close relationship with other regions of the brain in aspects of memory and learning, attention and motivation [15]. These complex relationships make it possible to infer emotional behaviour from electrode recordings. As well as recording raw signals, the Emotiv EEG system includes software for deriving parameters from these signals, such as ‘frustration’ values, that can be linked to emotion. Because the algorithm is proprietary, little information is available about how these ‘frustration’ values are determined (further explained in Section 3). Hence, this paper also analyses the raw EEG signals using the standardised low resolution brain electromagnetic tomography (sLORETA) method [16] for spatial analysis. This method was chosen because it yields images of standardised current density with zero localisation error [16], and it is an alternative to analysing P300 responses or sub-band frequency powers [17], which are both temporal analysis methods. For an in-depth explanation of sLORETA, the reader is referred to [16]. The advantage of spatial analysis is that it can distinguish which brain regions produce the activity observed over time. sLORETA helps quantify these active regions of the brain to assist in deducing emotions, with the added benefit of visualising brain activity.
2.2. Eye gaze tracking
In addition to EEG, eye gaze tracking data can be analysed to identify audio/visual content linked to emotional responses (see Section 5.6). Like older EEG headsets, eye gaze trackers have traditionally required a large amount of head-mounted equipment, such as a camera, glasses, reflective plates and camera mounts [18, 19]. Recent improvements to eye gaze software have made it possible to track eye gaze from video without calibration [18] and from standard low-resolution web cameras [20, 21]. Compared with more sophisticated systems involving special glasses or other hardware, these newer systems minimise visual distractions that may otherwise influence the test results.
To increase the accuracy of the eye gaze tracker, the system used in this paper includes a PS3Eye camera [22], a varifocal lens and an infrared (IR) light source pointing towards the eyes of the subject. The varifocal lens was added to the PS3Eye to improve focus. The PS3Eye, manufactured by Sony, was chosen because it was designed in collaboration with the sensor chip manufacturer OmniVision Technologies to perform well in variable lighting [23]. Together, the lens and camera improve the resolution and quality of the captured eye image by accounting for the distance to the subject and for varying lighting conditions. The high frame rate of this camera (60 frames per second) makes it possible to sample faster and to average multiple image frames together to improve the accuracy of the tracker. The IR light source causes only the cornea to reflect the light back at the camera. This creates a distinct pupil in the image, which can then be segmented using software algorithms that recognise the pupil and the edges of the eye [20]. The work in this paper uses the ITU Gaze Tracker (GT) [24], which has been shown to provide accurate performance [21], to collect eye gaze data as a user views videos.
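The pupil segmentation and frame averaging described above can be illustrated with a deliberately simplified sketch. It assumes each frame is a greyscale eye image stored as rows of 0-255 intensities. It treats the pupil as all pixels darker than a hypothetical fixed threshold. The segmentation in the ITU Gaze Tracker [24] and in [20] is more sophisticated. The names `pupil_centroid` and `averaged_centroid` are illustrative and not part of any gaze-tracking API.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # greyscale rows, 0 = black, 255 = white
Point = Tuple[float, float]


def pupil_centroid(image: Image, threshold: int = 50) -> Optional[Point]:
    """Centroid (x, y) of all pixels darker than `threshold`, or None.

    Under IR illumination the pupil is the darkest region of the eye image,
    so a fixed threshold (a hypothetical value) gives a crude segmentation.
    None is returned when nothing is dark enough, e.g. during a blink.
    """
    sum_x = sum_y = 0.0
    count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value < threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def averaged_centroid(frames: Sequence[Image], threshold: int = 50) -> Optional[Point]:
    """Average the pupil centroid over consecutive frames, skipping blinks."""
    points = [p for p in (pupil_centroid(f, threshold) for f in frames) if p is not None]
    if not points:
        return None
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    return (mean_x, mean_y)
```

At 60 frames per second, averaging four frames still gives 15 position estimates per second while smoothing frame-to-frame noise. The number of averaged frames is a design choice and is not a value taken from this work.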
To record all of this information, a software package was written that integrates the ITU GT, the Emotiv EEG headset and the Philips amBX multisensory media system. The ITU GT was incorporated into the package so that its original interface could be controlled to perform the calibration and recordings. The integrated system is shown in Figure 1.
The Emotiv EEG headset comes with a Software
Development Kit (SDK) which allows recording of raw EEG and
filtered signals. The software records values from the Expressiv™
and Affectiv™ suites, saving information on facial expressions and
real-time changes in subjective emotions, respectively. The
experiments in this paper analyse the ‘frustration’ values recorded
from the Affectiv™ suite as well as the raw EEG signals. Emotiv
have informally mentioned [25] that Affectiv™ values are calculated by algorithms written to correlate with subjective studies, as explained in [26]. A live display was added to show the values
from the Affectiv™ suite, Expressiv™ suite, gyroscope and
contact quality of the sensors (see Figure 1).
The amBX multisensory media system is packaged with an SDK that allows the software package to control the equipment precisely. A media player was coded into the user interface to display audio and video content. The program is based on the MPEG-V standard [27] so that it can read vibration, wind and lighting effects from compatible sensory effects metadata files. An automatic extraction feature was written for the lighting effects based on previous work [28]. The multisensory media section of the software package records the times at which events occur in the video. These times are saved along with the EEG and GT logs relative to a common timestamp.
Figure 1: Layout of evaluation system
Figure 2: Hardware configuration and test setup
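The effect events and the EEG and GT samples share a common timestamp, so each effect onset can be mapped onto the biosensor samples recorded around it. The following sketch assumes each log has been reduced to a sorted list of timestamps in seconds. This is a hypothetical format, because the package's log layout is not described here. It uses binary search (`bisect`) to find the sample nearest each event.

```python
import bisect
from typing import List, Sequence


def nearest_sample_index(sample_times: Sequence[float], event_time: float) -> int:
    """Index of the sample whose timestamp is closest to `event_time`.

    `sample_times` must be sorted in ascending order and non-empty.
    """
    if not sample_times:
        raise ValueError("sample_times must not be empty")
    i = bisect.bisect_left(sample_times, event_time)
    if i == 0:
        return 0
    if i == len(sample_times):
        return len(sample_times) - 1
    before, after = sample_times[i - 1], sample_times[i]
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if after - event_time < event_time - before else i - 1


def align_events(sample_times: Sequence[float], event_times: Sequence[float]) -> List[int]:
    """Map every effect event (hypothetical list of onsets) onto its nearest sample."""
    return [nearest_sample_index(sample_times, t) for t in event_times]
```

Once an effect is mapped to a sample index, a window of EEG or gaze samples following that index can be extracted for analysis.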
The testing methodology was designed to be consistent with the research in [2, 3, 29]. The design was based on a single stimulus testing method, in which subjects were introduced to the equipment using a training phase [4]. Videos and their effects were chosen based on [2, 4]. The asynchronous effects were designed so that all effects (wind, vibration and light) preceded the audio and video by 500 ms. This value was chosen because it is the average of the positive and negative skews considered perceptually noticeable for media synchronisation [11, 12]. This section describes the procedure used to evaluate the subjects with biosensors, which relates to previous discrete results [2, 3] by incorporating a QoE voting stage for the videos. The equipment setup and test configuration are depicted in Figure 2.
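The following minimal sketch derives an asynchronous effect track from the synchronous one by moving every effect onset 500 ms earlier. The `Effect` record is a hypothetical stand-in for effects parsed from MPEG-V metadata and is not the MPEG-V schema itself. Clamping onsets at 0 s is an assumption for effects near the start of a video.

```python
from dataclasses import dataclass, replace
from typing import List

ASYNC_LEAD_S = 0.5  # effects lead the audio/video by 500 ms


@dataclass(frozen=True)
class Effect:
    """Hypothetical in-memory form of one sensory effect (not the MPEG-V schema)."""
    kind: str          # "wind", "vibration" or "light"
    start_s: float     # onset relative to the start of the video, in seconds
    duration_s: float
    intensity: float   # 0.0 to 1.0


def make_asynchronous(effects: List[Effect], lead_s: float = ASYNC_LEAD_S) -> List[Effect]:
    """Copy `effects` with every onset moved `lead_s` seconds earlier.

    Onsets that would fall before the start of the video are clamped to 0,
    so effects in the first `lead_s` seconds get a shorter lead.
    """
    return [replace(e, start_s=max(0.0, e.start_s - lead_s)) for e in effects]
```

Because `Effect` is frozen, the synchronous track is left untouched. Both versions can be kept side by side for the three test conditions.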
4.1. Introduction and pre-questionnaire
The evaluation begins by describing the procedure, set of
tasks and rating method to the subject. The subject is told the
working definition of QoE as stated in [30] and that they will be
asked to base their votes on their QoE of the presented
multisensory media. Following their consent, the evaluation
begins. A pre-questionnaire establishes basic demographic data (name, age, gender), which was later used to determine that there were no significant differences in test results due to age or gender.
4.2. Biosensor setup
The biosensors are set up before the training phase to ensure the
participants become familiar with the equipment prior to the main
evaluation. The EEG headset is set up first, using an appropriate saline solution to ensure that the sensor contacts provide the highest signal quality. The GT calibration takes place next, with the subject positioned approximately 1 metre from the screen [4]. The camera is optically zoomed so that any black background visible in the image is minimised and the eyes are located near the centre of the image. The GT calibration background is set to light grey. In the dim environment this induces miosis (pupil constriction), which gives a clearer distinction between the sclera, iris and pupil. Individual calibration points are re-calibrated if the GT cannot determine which quadrant of the screen the subject is gazing at.
4.3. Training phase
A training phase takes place after the biosensor setup, as
recommended in [4]. This is designed to eliminate the surprise
effect [4] by helping the participants to become familiar with the
stimulus presentation and style of QoE voting. The design was
adapted from the training phases used in [4, 29]. The results of the
training phase are not included in the final analysis. The training
phase for all evaluations used a shortened version of the publicly
available ‘Titanic (1997)’ trailer (2012 3D release). The trailer's genre is drama, and it is presented with 18 wind effects and 13 vibration effects. The training video was shown three times consecutively, each time with a different effect condition in random order: no effects; asynchronous effects; and synchronous effects. After each video, the subject was asked to rate their QoE on an eleven-grade numerical quasi-continuous quality scale from 0 to 100 [3, 4, 31].
4.4. Main evaluation
Once the subjects are familiar with the equipment, effects
and the voting stages, the EEG and GT logging commences for the
main evaluation. The 15 videos seen in Figure 3 (dataset available
at [32]) are randomly presented three times: without effects, with
asynchronous effects and with synchronous effects. No two videos
with the same title are shown consecutively [29]. At the end of
each video the subject is asked to record their QoE. The voting
takes no more than 10 seconds as suggested in [31]. The QoE votes
are stored together with logs for the biosensors. The single
stimulus assessment method is adopted from [4]. The videos have
an average duration of 30 seconds [32] and the entire evaluation
takes approximately 22.5 minutes.
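The presentation order for the main evaluation can be generated with a simple rejection-sampling shuffle. The sketch below labels the three conditions `none`, `async` and `sync`; these labels are hypothetical. It reshuffles all 45 (title, condition) pairs until no title appears twice in a row. With 15 titles, roughly one random shuffle in seven satisfies the constraint, so only a few attempts are usually needed.

```python
import random
from typing import List, Optional, Sequence, Tuple

CONDITIONS = ("none", "async", "sync")  # hypothetical condition labels


def presentation_order(titles: Sequence[str],
                       rng: Optional[random.Random] = None,
                       max_attempts: int = 10_000) -> List[Tuple[str, str]]:
    """Shuffle every (title, condition) pair so no title appears twice in a row.

    Uses rejection sampling: reshuffle until the constraint holds.
    """
    rng = rng if rng is not None else random.Random()
    trials = [(t, c) for t in titles for c in CONDITIONS]
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if all(a[0] != b[0] for a, b in zip(trials, trials[1:])):
            return list(trials)
    raise RuntimeError("no valid order found; increase max_attempts")
```

With 15 titles, the function returns 45 presentations. At an average of 30 s per video, 45 × 30 s = 1350 s = 22.5 minutes, which equals the quoted evaluation time.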
4.5. Post-questionnaire
At the end of the main evaluation the EEG headset is
removed and the subject is asked to complete a post-questionnaire
[2, 4]. The post-questionnaire is given to gain feedback on the evaluation task and gives the subject an opportunity to recommend improvements to the procedure.
For the subjective testing, 10 subjects (6 males and 4 females) were chosen from an initial set of 15 who were invited to participate in the experiment. Five subjects were excluded because of EEG artefacts. Physiological tests can be highly complex, and we found reliability to be a key issue. The mean age of the participants is 28.8 years, ranging from 19 to 59, with a sample standard deviation of 12.2 years. In the post-questionnaire, some subjects commented on the vibration effect, with various recommendations on placement, realism and timing. Some subjects also stated that they thought there was a difference between the videos with synchronous and asynchronous effects, but they were unsure exactly what the difference was.
Figure 3: QoE Votes and 95% Confidence Interval (y-axis: QoE vote (%); x-axis: shortened video titles; series: without effects, with asynchronous effects, with synchronous effects)
5.1. Discrete QoE voting
Each subject voted on their QoE for each video as described
in Sections 4.1 and 4.4. The votes were analysed for each video
and the Mean Opinion Score (MOS) was plotted for each video
and effect type as recommended in [4]. The results for this
evaluation can be seen in Figure 3 and are presented with a 95%
confidence interval. Full video titles can be found at [32]. The videos are ordered from left to right by decreasing mean vote.
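The MOS and 95% confidence interval plotted in Figure 3 can be computed per video and effect type as follows. The paper does not state how the interval was computed. This sketch assumes the normal-approximation form used in ITU-R BT.500, δ = 1.96·s/√N, where s is the sample standard deviation. With only N = 10 subjects, a Student's t multiplier (2.262 for 9 degrees of freedom) would give a somewhat wider interval.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mos_with_ci(votes: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of the 95% confidence interval).

    Uses the normal approximation delta = z * s / sqrt(N), where s is the
    sample standard deviation of the N votes (an assumed CI method).
    """
    if len(votes) < 2:
        raise ValueError("need at least two votes")
    mos = mean(votes)
    half_width = z * stdev(votes) / math.sqrt(len(votes))
    return mos, half_width
```

Passing `z=2.262` gives the t-based interval for ten votes instead.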
A single-factor Analysis of Variance (ANOVA) with α = 0.05 was applied to the votes for each video to determine whether there was a discernible difference between the effect types. The p-values of the ANOVA showed a discernible difference between effect types for 80% of the videos, so Student's t-tests were then applied to refine the differences.
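For a single video, the ANOVA compares three groups of votes: no effects, asynchronous effects and synchronous effects. The sketch below computes the F statistic directly. Three groups give d1 = 2 numerator degrees of freedom, and for that case the p-value has the exact closed form P(F > f) = (1 + 2f/d2)^(-d2/2), so no statistics library is needed. With 10 subjects per condition, d2 = 30 - 3 = 27. The function is deliberately limited to three groups. `scipy.stats.f_oneway` would be the usual general-purpose choice.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Single-factor ANOVA for exactly three groups; returns (F, p).

    With k = 3 groups the numerator has d1 = 2 degrees of freedom, for which
    the F distribution's upper tail is P(F > f) = (1 + 2 f / d2) ** (-d2 / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    all_votes = [v for g in groups for v in g]
    n_total, k = len(all_votes), len(groups)
    grand_mean = mean(all_votes)
    means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    d1, d2 = k - 1, n_total - k
    f_stat = (ss_between / d1) / (ss_within / d2)
    p_value = (1 + 2 * f_stat / d2) ** (-d2 / 2)
    return f_stat, p_value
```

The video's vote groups would be passed in the order no effects, asynchronous, synchronous. A p-value below 0.05 would then trigger the pairwise t-tests.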
Figure 4 shows the probabilities calculated using a Student's t-test analysis of the QoE votes [2, 4], using a one-tailed, two-sample, equal-variance test. The test was conducted three times, under the following null hypotheses:
$$H_0^{(1)}: \mu_{\mathrm{NE}} = \mu_{\mathrm{AE}}; \qquad H_0^{(2)}: \mu_{\mathrm{NE}} = \mu_{\mathrm{SE}}; \qquad H_0^{(3)}: \mu_{\mathrm{AE}} = \mu_{\mathrm{SE}} \tag{1}$$
where $\mu_{\mathrm{NE}}$, $\mu_{\mathrm{AE}}$ and $\mu_{\mathrm{SE}}$ denote the mean of the QoE votes without effects, with asynchronous effects and with synchronous effects, respectively. The alternative hypotheses for these tests are:
$$H_1^{(1)}: \mu_{\mathrm{NE}} < \mu_{\mathrm{AE}}; \qquad H_1^{(2)}: \mu_{\mathrm{NE}} < \mu_{\mathrm{SE}}; \qquad H_1^{(3)}: \mu_{\mathrm{AE}} < \mu_{\mathrm{SE}} \tag{2}$$
With a critical value of 0.05 (5%), Figure 4 shows that $H_0^{(1)}$ and $H_0^{(2)}$ are rejected for 80% of the videos, whereas $H_0^{(3)}$ cannot be rejected for any video (100% of cases). Hence, videos with effects have a significantly larger mean QoE than videos without effects, in agreement with [2-4, 28], whereas there is no significant difference in mean QoE between asynchronous and synchronous effects.
This lack of a discernible difference may be due to the asynchronicity being on one side of the perceptual threshold for some effects and on the other side for other effects. It is also possible that the large contrast between no effects and effects reduced the apparent difference between asynchronous and synchronous effects.
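Each pairwise comparison is a one-tailed, two-sample, equal-variance Student's t-test, for example without effects against asynchronous effects. This sketch computes the pooled t statistic. It obtains the one-tailed p-value by numerically integrating the t density with Simpson's rule. That integration is a swapped-in technique that keeps the example free of dependencies. The paper does not state which software produced its probabilities.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t): 0.5 minus the integral of the density from 0 to t (Simpson's rule).

    `steps` must be even; a negative t simply gives a negative step size.
    """
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def one_tailed_pooled_t_test(lower: Sequence[float],
                             higher: Sequence[float]) -> Tuple[float, float]:
    """Test H0: mu_lower = mu_higher against H1: mu_lower < mu_higher.

    Two-sample, equal-variance (pooled) Student's t-test; returns (t, p).
    """
    n1, n2 = len(lower), len(higher)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(lower) + (n2 - 1) * variance(higher)) / df
    t_stat = (mean(higher) - mean(lower)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, t_upper_tail(t_stat, df)
```

With SciPy available, `scipy.stats.ttest_ind(higher, lower, equal_var=True, alternative="greater")` performs the same test. For ten votes per condition the test has 18 degrees of freedom.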
5.2. Temporal physiological responses
The biosensor analysis was carried out on the two videos with the highest mean QoE votes: ‘Tron’ (highest) and ‘Berrecloth’ (second highest). The EEG analysis using ‘frustration’ values was carried out on both videos, whereas the eye gaze and raw EEG analyses were performed on the ‘Tron’ video only. The data was analysed to show responses directly after major effects. Because [4] found vibration to be the dominant effect compared with light and wind, the first vibration at 100% strength was examined.
5.3. Electroencephalography (EEG)
For this analysis, all 10 sets of data were used to find the ‘frustration’ values throughout the ‘Tron’ and ‘Berrecloth’ videos. Frustration was chosen because it is synonymous with annoyance, which is mentioned in the working definition of QoE [30]. Descriptions of the synchronous effects presented to the subjects for the videos ‘Tron’ and ‘Berrecloth’ are available at [32]. The EEG data for all subjects was linearly interpolated to a common sampling rate of 10 Hz and then filtered using a moving average filter with a window size of 10 samples. The filtered data was normalised so that all subjects had the same amplitude range, and the gradients of the normalised data were calculated. The frustration gradients can then be related to the times at which the effects occurred using the effect metadata.
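The pre-processing chain applied to the ‘frustration’ values can be sketched in four steps. These are linear interpolation onto a 10 Hz grid, a 10-sample moving average, normalisation and a gradient. The paper does not specify the normalisation or the gradient scheme. This sketch therefore assumes min-max scaling to [0, 1], a trailing moving average and central differences. Inputs are assumed to be (time in seconds, value) samples for one subject.

```python
from typing import List, Sequence


def resample_linear(times: Sequence[float], values: Sequence[float],
                    rate_hz: float = 10.0) -> List[float]:
    """Linearly interpolate (time, value) samples onto a uniform `rate_hz` grid.

    `times` must be strictly increasing and contain at least two samples.
    """
    step = 1.0 / rate_hz
    n_out = int((times[-1] - times[0]) * rate_hz + 1e-9) + 1
    out: List[float] = []
    j = 0
    for i in range(n_out):
        t = times[0] + i * step
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


def moving_average(x: Sequence[float], window: int = 10) -> List[float]:
    """Trailing moving average; the first samples average over what is available."""
    return [sum(x[max(0, i - window + 1):i + 1]) / min(i + 1, window)
            for i in range(len(x))]


def min_max_normalise(x: Sequence[float]) -> List[float]:
    """Scale to [0, 1] so that all subjects share the same amplitude range (assumed)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def gradient(x: Sequence[float], rate_hz: float = 10.0) -> List[float]:
    """Central differences inside, one-sided differences at the ends (per second)."""
    n = len(x)
    if n < 2:
        return [0.0] * n
    dt = 1.0 / rate_hz
    inner = [(x[i + 1] - x[i - 1]) / (2 * dt) for i in range(1, n - 1)]
    return [(x[1] - x[0]) / dt] + inner + [(x[-1] - x[-2]) / dt]


def frustration_gradients(times: Sequence[float], values: Sequence[float],
                          rate_hz: float = 10.0, window: int = 10) -> List[float]:
    """Resample, smooth, normalise and differentiate one subject's values."""
    resampled = resample_linear(times, values, rate_hz)
    smoothed = moving_average(resampled, window)
    return gradient(min_max_normalise(smoothed), rate_hz)
```

Each gradient sample can then be matched to effect onsets through the common timestamp.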
To determine whether frustration increased or decreased significantly, a statistical test is required. The method described in Section 5.1 is adopted; however, the test is now two-tailed and is applied to the frustration gradients between the start of the first full-strength vibration and the start of the second full-strength vibration. The test examined whether the frustration gradients were statistically greater than zero. For ‘Tron’, the results of the t-tests show that 30% of subjects had a significant increase in frustration for no effects and for asynchronous effects, and 50% for synchronous effects.
The analysis used for ‘Tron’ was then applied to ‘Berrecloth’. The period is again the gap between the first and second full-strength vibrations. The results of the t-tests show that 10% of subjects had a significant increase in frustration for no effects and for synchronous effects, and 30% for asynchronous effects. Frustration in these cases may be caused by unanticipated and/or unfavourable effects. It is difficult to draw conclusions because the ‘frustration’ algorithm is proprietary, so this paper also explores the recorded EEG potentials.
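The per-subject test on the frustration gradients can be sketched as a one-sample Student's t-test of the gradient samples within the analysis interval against zero. This sketch counts a subject as showing a significant increase when the two-tailed p-value is below 0.05 and the mean gradient is positive. That reading is an interpretation of the text and is not a detail stated in the paper. The t tail probability is again obtained by Simpson's rule. The function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) via Simpson's rule on the density between 0 and t (`steps` even)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def significant_increase(gradients: Sequence[float], alpha: float = 0.05) -> bool:
    """True if the mean gradient is positive and differs from zero (two-tailed)."""
    n = len(gradients)
    if n < 2:
        raise ValueError("need at least two gradient samples")
    m, s = mean(gradients), stdev(gradients)
    if s == 0:
        return m > 0
    t_stat = m / (s / math.sqrt(n))
    p_two_tailed = 2 * t_upper_tail(abs(t_stat), n - 1)
    return m > 0 and p_two_tailed < alpha


def percent_with_increase(per_subject: Sequence[Sequence[float]],
                          alpha: float = 0.05) -> float:
    """Percentage of subjects whose frustration gradient increases significantly."""
    flags = [significant_increase(g, alpha) for g in per_subject]
    return 100.0 * sum(flags) / len(flags)
```

Applying `percent_with_increase` to the ten subjects' gradients for one effect condition gives a percentage of the kind reported above.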
Figure 4: T-Test showing the probability that mean QoE would be observed the same (alpha=0.05) (x-axis: shortened video titles; series: without effects & async, without effects & sync, async & sync, and the critical value)
5.4. Spatial EEG analysis
Because the Affectiv™ algorithms are proprietary, the sLORETA method was used to analyse the raw EEG data in order to determine where the propagating electrical potentials originated. Data from the ‘Tron’ video was used for this analysis because of its stronger first vibration and higher QoE vote. The sLORETA method can provide images of standardised current density and can quantify activity in brain lobes and Brodmann areas. The data was analysed using a moving window with a period of 0.5 s, overlapping for the duration of the effect and after it. A 10 s average was used to remove the DC bias from the EEG signal, and a Fast Fourier Transform (FFT) was applied to obtain the Delta, Theta, Alpha and Beta bands. The images in Figure 5 and Figure 6 show a model brain at 5% opacity viewed from directly above, with the frontal lobe at the top. The coloured areas show the most active regions of the brain, using a threshold of 25%. These images are provided in two dimensions (2D) for print; however, three-dimensional (3D) visualisations are possible.
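sLORETA itself solves an inverse problem over a head model and is beyond a short example, but the windowing and band-power step that accompanies it can be sketched. The sketch makes three assumptions. The first is a 128 Hz sampling rate, the rate commonly quoted for the Emotiv EPOC. The second is 50% overlap between 0.5 s windows. The third is conventional band edges, since the paper does not list them. A direct DFT is used for clarity where `numpy.fft.rfft` would be the usual choice. With 64-sample windows the frequency resolution is 2 Hz, so the delta band contains a single bin.

```python
import cmath
import math
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Conventional EEG band edges in Hz (assumed; not stated in the paper).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def sliding_windows(signal: Sequence[float], fs: float,
                    length_s: float = 0.5, overlap: float = 0.5) -> List[List[float]]:
    """Split `signal` into windows of `length_s` seconds with fractional `overlap`."""
    size = int(round(length_s * fs))
    hop = max(1, int(round(size * (1 - overlap))))
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, hop)]


def band_powers(window: Sequence[float], fs: float,
                dc: Optional[float] = None) -> Dict[str, float]:
    """Power in each EEG band for one window, using a direct DFT.

    `dc` is the DC bias to subtract (e.g. the mean of a longer 10 s segment);
    by default the window's own mean is used.
    """
    n = len(window)
    offset = mean(window) if dc is None else dc
    centred = [x - offset for x in window]
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):  # positive-frequency bins, skipping DC
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centred))
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += abs(coeff) ** 2 / n
    return powers
```

For example, a pure 10 Hz test signal at 128 Hz yields three half-overlapping windows per second, each dominated by alpha-band power.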
5.5. Synchronicity
The temporal lobe of the brain is associated with memory of temporal events and of the senses they stimulate, which can be related to the synchronicity of the sensory effects. Figure 5 shows a particular subject's brain activity for the three versions of the video ‘Tron’, where the temporal lobe is the most active for asynchronous effects. Furthermore, Figure 7 shows a 25% increase in temporal lobe activity for asynchronous effects across all subjects, whereas no activity was observed in this lobe for no effects or synchronous effects. There is also an increase of the order of 20-25% in occipital lobe activity during synchronous effects; this is discussed in more detail in Section 5.6.
5.6. Eye gaze tracking
A method, with preliminary results, for correlating brain lobe activity with gaze deviation is presented in this section. Eye gaze analysis was performed on two individual subjects, one female and one male, because subjects commonly drifted out of the camera's view during the videos, which distorted the eye gaze logs. This could be circumvented by keeping subjects still for the length of the test, reducing the length of the test and/or eye tracking with a wider view. The two individuals chosen had the least distorted eye gaze logs. Blinks produced null values, which were replaced using linear interpolation. The data was then upsampled and filtered using the same methods as for the EEG data in Section 5.3. It should be noted that these two subjects had different opinions of the sensory effects, as was apparent from their QoE votes. Subjects one and two are shown on the left and right, respectively, for each effect type in Figure 8. Subject one had average QoE votes of 47, 69 and 68, and subject two had average QoE votes of 54, 44 and 45, for no effects, async effects and sync effects, respectively. The videos were presented in the order async effects, no effects, sync effects for subject one, and async effects, sync effects, no effects for subject two.
Subject one shows a much larger gaze standard deviation for synchronous effects, whereas subject two shows only a slightly larger one. At this time and for this effect, subject one also has more activity in the occipital lobe (Figure 6), which may be correlated with the increased gaze deviation. The occipital lobe contains the primary visual cortex, whose activity is closely related to increased gaze deviation. Subject two lacks activity in this lobe, and this is reflected in the uniform gaze deviation across effect types.
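Gaze deviation per effect type can be summarised as the standard deviation of the X and Y gaze coordinates. The sketch assumes that blinks appear as `None` entries in a per-sample coordinate list. It fills them by linear interpolation as described above, and at the edges it copies the nearest valid value, which is an assumption. It then returns the sample standard deviations. The function names are illustrative.

```python
import bisect
from statistics import stdev
from typing import List, Optional, Sequence, Tuple


def fill_blinks(samples: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries (blinks) by linear interpolation between valid neighbours.

    Leading or trailing gaps are filled with the nearest valid value.
    """
    valid = [i for i, v in enumerate(samples) if v is not None]
    if not valid:
        raise ValueError("no valid gaze samples")
    out: List[float] = []
    for i, v in enumerate(samples):
        if v is not None:
            out.append(float(v))
            continue
        pos = bisect.bisect_left(valid, i)
        if pos == 0:                 # leading gap: copy first valid sample
            out.append(float(samples[valid[0]]))
        elif pos == len(valid):      # trailing gap: copy last valid sample
            out.append(float(samples[valid[-1]]))
        else:
            prev, nxt = valid[pos - 1], valid[pos]
            frac = (i - prev) / (nxt - prev)
            out.append(samples[prev] + frac * (samples[nxt] - samples[prev]))
    return out


def gaze_deviation(xs: Sequence[Optional[float]],
                   ys: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Sample standard deviation of the blink-corrected X and Y gaze coordinates."""
    return stdev(fill_blinks(xs)), stdev(fill_blinks(ys))
```

Calling `gaze_deviation` once per effect type (no effects, async, sync) for each subject yields a per-coordinate comparison of the kind shown in Figure 8.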
This paper presents results for QoE assessment of multisensory media using an EEG neuroheadset and an eye gaze tracker. The response to sensory effects complementing audio/visual content under asynchronous and synchronous conditions is compared with that for content without effects. The results show that sensory effects enhance QoE; however, there was no statistically significant difference between the synchronicities of effects. Furthermore, the EEG results show correlating brain activity, with a 20-25% decrease in frontal lobe activity for both asynchronous and synchronous effects. The EEG results also show a 25% increase in temporal lobe activity for asynchronous effects and a 20% increase in occipital lobe activity for synchronous effects. The preliminary gaze tracker results may support this by showing that gaze deviation and occipital lobe activity increase together.
Future work includes identifying the influence of a wider range of synchronicities on QoE by conducting further user studies, and data mining to find alternative correlations between biosensors and QoE, thus providing better knowledge for future user studies in the area.
Figure 5: Subject with increased brain activity in the temporal lobe. No effects (A), async effects (B) and sync effects (C)
Figure 6: Subject with increased brain activity in the occipital lobe. No effects (A), async effects (B) and sync effects (C)
Figure 7: Most active brain lobes for different effects (y-axis: subjects (%); x-axis: type of effects: no effects, async, sync)
Figure 8: Gaze coordinates for Subject 1 and Subject 2 (X and Y) by type of effects (no effects, async, sync)
The authors would like to thank all participants for volunteering
their time, the Commonwealth Scientific and Industrial Research
Organisation (CSIRO) for their help with EEG analysis and
Christian Timmerer and Benjamin Rainer for their insightful and
useful preliminary discussions related to this work.
[1] C. Timmerer, M. Waltl, B. Rainer, and N. Murray, "Sensory
Experience: Quality of Experience Beyond Audio-Visual," in
Quality of Experience, ed: Springer, 2014, pp. 351-365.
[2] B. Rainer, M. Waltl, E. Cheng, M. Shujau, C. Timmerer, S.
Davis, et al., "Investigating the impact of sensory effects on
the quality of experience and emotional response in web
videos," in Quality of Multimedia Experience (QoMEX),
2012 Fourth International Workshop on, 2012, pp. 278-283.
[3] M. Waltl, C. Timmerer, and H. Hellwagner, "Improving the
quality of multimedia experience through sensory effects," in
Quality of Multimedia Experience (QoMEX), 2010 Second
International Workshop on, 2010, pp. 124-129.
[4] C. Timmerer, B. Rainer, and M. Waltl, "A utility model for
sensory experience," in Quality of Multimedia Experience
(QoMEX), 2013 Fifth International Workshop on, 2013, pp.
[5] C. Timmerer, M. Waltl, B. Rainer, and H. Hellwagner,
"Assessing the quality of sensory experience for multimedia
presentations," Signal Processing: Image Communication,
vol. 27, pp. 909-916, 2012.
[6] P. C. Petrantonakis and L. J. Hadjileontiadis, "Emotion
recognition from brain signals using hybrid adaptive filtering
and higher order crossings analysis," Affective Computing,
IEEE Transactions on, vol. 1, pp. 81-97, 2010.
[7] R. B. Silberstein and G. E. Nield, "Measuring Emotion in
Advertising Research: Prefrontal Brain Activity," Pulse,
IEEE, vol. 3, pp. 24-27, 2012.
[8] J.-N. Antons, S. Arndt, R. Schleicher, and S. Möller, "Brain
Activity Correlates of Quality of Experience," in Quality of
Experience, ed: Springer, 2014, pp. 109-119.
[9] E. Cheng, S. Davis, I. Burnett, and C. Ritz, "An ambient
multimedia user experience feedback framework based on
user tagging and EEG biosignals," 4th International
Workshop on Semantic Ambient Media Experience (SAME
2011), pp. 1-5, 2011.
[10] J. G. Carrier, "Mind, Gaze and Engagement Understanding
the Environment," Journal of Material Culture, vol. 8, pp. 5-
23, 2003.
[11] R. Steinmetz, "Human perception of jitter and media
synchronization," Selected Areas in Communications, IEEE
Journal on, vol. 14, pp. 61-72, 1996.
[12] W. Yaodu, X. Xiang, K. Jingming, and H. Xinlu, "A speech-
video synchrony quality metric using CoIA," in Packet Video
Workshop (PV), 2010 18th International, 2010, pp. 173-177.
[13] V. Jurcak, D. Tsuzuki, and I. Dan, "10/20, 10/10, and 10/5
systems revisited: their validity as relative head-surface-based
positioning systems," Neuroimage, vol. 34, pp. 1600-1611,
[14] Emotiv. (2014, May). EEG Features [Online]. Available:
[15] M. S. Buchsbaum, "Frontal cortex function," American
Journal of Psychiatry, vol. 161, pp. 2178-2178, 2004.
[16] R. Pascual-Marqui, "Standardized low-resolution brain
electromagnetic tomography (sLORETA): technical details,"
Methods Find Exp Clin Pharmacol, vol. 24, pp. 5-12, 2002.
[17] R. Gupta, S. Arndt, J.-N. Antons, R. Schleicher, S. Moller,
and T. H. Falk, "Neurophysiological experimental facility for
Quality of Experience (QoE) assessment," in Integrated
Network Management (IM 2013), 2013 IFIP/IEEE
International Symposium on, 2013, pp. 1300-1305.
[18] S. Kohlbecher, S. Bardins, K. Bartl, E. Schneider, T.
Poitschke, and M. Ablassmeier, "Calibration-free eye tracking
by reconstruction of the pupil ellipse in 3D space," in
Proceedings of the 2008 symposium on Eye tracking research
& applications, 2008, pp. 135-138.
[19] C. W. Huang, Z. S. Jiang, W. F. Kao, and Y. L. Huang,
"Building a Low-Cost Eye-Tracking System," Applied
Mechanics and Materials, vol. 263, pp. 2399-2402, 2013.
[20] R. Valenti, J. Staiano, N. Sebe, and T. Gevers, "Webcam-
based visual gaze estimation," Image Analysis and
Processing - ICIAP 2009, pp. 662-671, 2009.
[21] J. San Agustin, H. Skovsgaard, E. Mollenbach, M. Barret, M.
Tall, D. W. Hansen, et al., "Evaluation of a low-cost open-
source gaze tracker," in Proceedings of the 2010 Symposium
on Eye-Tracking Research & Applications, 2010, pp. 77-80.
[22] Sony. (2014, April 24). PS3 EYE CAMERA [Online].
[23] N. G. Croal. (2007, May). Geek Out: The Playstation Eye is
Nearly Upon Us. Dr. Richard Marks Takes Us Behind the
Scenes of its Birth. [Online]. Available:
[24] ITUGazeGroup. (2014, May). ITU Gaze Tracker [Online].
[25] Emotiv. (2014, July). Scientific background for the emotions
in the affectiv suite, what is it? [Online]. Available:
[26] K. M. Gilleade and A. Dix, "Using frustration in the design of
adaptive videogames," in Proceedings of the 2004 ACM
SIGCHI International Conference on Advances in computer
entertainment technology, 2004, pp. 228-232.
[27] C. Timmerer, S. Hasegawa, and S. Kim, "Working Draft of
ISO/IEC 23005 Sensory Information," ISO/IEC JTC 1/SC
29/WG 11 N10618, 2009.
[28] M. Waltl, C. Timmerer, and H. Hellwagner, "Increasing the
user experience of multimedia presentations with sensory
effects," in Image Analysis for Multimedia Interactive
Services (WIAMIS), 2010 11th International Workshop on,
2010, pp. 1-4.
[29] F. De Simone, M. Naccari, M. Tagliasacchi, F. Dufaux, S.
Tubaro, and T. Ebrahimi, "Subjective assessment of H.264/AVC
video sequences transmitted over a noisy channel,"
in Quality of Multimedia Experience, 2009. QoMEx 2009.
International Workshop on, 2009, pp. 204-209.
[30] P. Le Callet, S. Möller, and A. Perkis, "Qualinet White Paper
on Definitions of Quality of Experience," European Network
on Quality of Experience in Multimedia Systems and Services
(COST Action IC 1003), Lausanne, Switzerland, Tech. Rep.,
vol. 1.2, p. 6, 2013.
[31] ITU-T Recommendation P.910, "Subjective video quality
assessment methods for multimedia applications," 1999.
[32] (2014, April). Sensory Experience Lab (SELab). Available: