All content in this area was uploaded by Antonin Fourcade on Dec 19, 2024
AFFECTTRACKER: CONTINUOUS AFFECTIVE RATINGS IN IMMERSIVE VIRTUAL REALITY
AffectTracker: Real-time continuous rating of affective
experience in immersive Virtual Reality.
Fourcade, A* [1,2,3,4], Malandrone, F* [5], Roellecke, L [2], Ciston, A [2,3], de Mooij, J. [2], Villringer, A
[1,2,3,4], Carletto, S.§[5] and Gaebler, M.§[2,3]
*These authors share first authorship.
§These authors share senior authorship.
(1) Max Planck School of Cognition, Stephanstrasse 1a, Leipzig, Germany
(2) Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig,
Germany
(3) Max Planck Dahlem Campus of Cognition, Max Planck Society, Berlin, Germany
(4) Charité - Universitätsmedizin Berlin, Germany
(5) Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
Corresponding authors
francesca.malandrone@unito.it;antonin.fourcade@maxplanckschools.de
Keywords
affective states, emotion, virtual reality, dynamics, moment-to-moment, continuous affect analysis
Availability of data and materials
Data as well as videos depicting the Grid and Flubber feedback options, the experimental set-up, and the
custom-made stimuli used in Study 2: Evaluation are available at https://doi.org/10.17617/3.QPNSJA
Code availability
The tool (including manual) is available at https://github.com/afourcade/AffectTracker.
All code used for the analyses and plots is publicly available on GitHub at
https://github.com/afourcade/AffectiveVR.
Funding
This research was supported by the Max Planck Dahlem Campus of Cognition (MPDCC) and funded by the
Max Planck Society - Fraunhofer-Gesellschaft cooperation (project “NEUROHUM”) and the German Federal
Ministry of Education and Research (BMBF grant 13GW0488). This project was also funded by the Grant for
Internationalization of the Department of Clinical and Biological Sciences, University of Turin, Italy
(CARS_GFI_22_01_F and CARS_RILO_22_03_F).
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
The research protocol was approved by the ethics committee of the Psychology Department at the
Humboldt-Universität zu Berlin (2023-21) and by the University Bioethics Committee of the University of
Turin (prot. number 0218914). All procedures contributing to this work comply with the ethical standards of the
relevant national and institutional committees on human experimentation and with the Helsinki Declaration of
1975, as revised in 2008. Written informed consent was obtained from participants to participate in the study.
Authors' contributions
A. Fourcade: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project
administration; Software; Visualization; Writing – original draft; Writing – review & editing
F. Malandrone: Conceptualization; Investigation; Methodology; Project administration; Writing – original draft;
Writing – review & editing
L. Roellecke: Formal analysis; Validation; Visualization; Writing – review & editing
A. Ciston: Methodology; Software; Writing – review & editing
J. de Mooij: Methodology; Software; Writing – review & editing
A. Villringer: Funding acquisition; Supervision; Writing – review & editing
S. Carletto: Conceptualization; Funding acquisition; Methodology; Project administration; Resources;
Supervision; Writing – original draft; Writing – review & editing
M. Gaebler: Conceptualization; Funding acquisition; Methodology; Project administration; Resources;
Supervision; Writing – original draft; Writing – review & editing
Abstract
Subjective experience is key to understanding affective states, characterized by valence and arousal. Traditional
experiments using post-stimulus summary ratings do not resemble natural behavior. Fluctuations of affective
states can be explored with dynamic stimuli, such as videos. Continuous ratings can capture
moment-to-moment affective experience; however, the rating itself or its feedback can interfere with the experience.
We designed, empirically evaluated, and openly share AffectTracker, a tool to collect continuous ratings of
two-dimensional affective experience (valence and arousal) during dynamic stimulation, such as 360-degree
videos in immersive virtual reality. AffectTracker comprises three customizable feedback options: a simplified
affect grid (Grid), an abstract pulsating variant (Flubber), and no visual feedback.
Two studies with healthy adults were conducted, each at two sites (Berlin, Germany, and Torino, Italy). In
Study 1 (Selection: n=51), both Grid and Flubber demonstrated high user experience and low interference in
repeated 1-min 360-degree videos. Study 2 (Evaluation: n=83) confirmed these findings for Flubber with a
longer (23-min), more varied immersive experience, maintaining high user experience and low interference.
Continuous ratings collected with AffectTracker effectively captured valence and arousal variability. For
shorter, less eventful stimuli, their correlation with post-stimulus summary ratings demonstrated the tool’s
validity; for longer, more eventful stimuli, it showed the tool’s benefits of capturing additional variance.
Our findings suggest that AffectTracker provides a reliable, minimally interfering method to gather
moment-to-moment affective experience also in immersive environments, offering new research opportunities
to link affective states and physiological dynamics.
Introduction
What are affect and (subjective) affective experiences?
Affective states, such as emotions, are adaptive responses to external stimuli, such as challenging situations or
pleasant social interactions, or to self-generated mental states, such as memories. The most direct way to assess
the affective state of humans is to ask them how they feel, that is, to have them report or rate their subjective
affective experience (i.e., feelings; Damasio et al., 2000).
Affective states are an essential part of our experience of the world (James, 1884; James, 1890; Seth, 2013) and
they are crucial for our physical and mental health (Gross and Muñoz, 1995). Dimensional approaches consider
affective experience along the two axes of valence, ranging from negative (or displeasure) to positive (or
pleasure), and arousal, ranging from low to high intensity or activation (Duffy, 1957; Kuppens et al., 2013;
Russell, 1980; Russell and Barrett, 1999; Wundt and Judd, 1897). The rationale is that this latent space better
captures the structure of affective states than individual, discrete emotion categories (Russell, 1980; Russell &
Barrett, 1999), which have also been difficult to consistently associate with specific response patterns in the
autonomic nervous system (Kragel and Labar, 2013; Kreibig, 2010; Siegel et al., 2018) or in distinct brain
regions (Lindquist et al., 2012; but see Saarimäki et al., 2016).
How can affective experience be measured?
Common ways to quantify the arousal and valence of a feeling state are through separate (e.g., 5-, 7- or
9-point) Likert-type scales with numbers or pictures (like the self-assessment manikin, SAM; Bradley & Lang,
1994) as labels, or continuous “visual analogue scales” (VAS; e.g., Kron et al., 2013). Both dimensions can also
be combined in a two-dimensional (Cartesian) coordinate system (“affect grid”; Russell, 1989), which allows
individuals to self-report their affective states by marking a point on a grid, where the perpendicular axes
represent the continuum of valence (typically on the x-axis and ranging from “negative” to “positive”) and
arousal (typically on the y-axis and ranging from “low” to “high”), with or without a visual aid (e.g., a 5x5-,
7x7-, 9x9-grid).
Assessing both dimensions simultaneously can be considered more comprehensive and nuanced because it (1)
captures joint variance (e.g., arousal and valence sometimes show a U-shaped relationship in that more positive
or more negative valence is more intense and rated higher along the dimension of arousal; Kuppens et al., 2013;
Yik et al., 2023); and (2) facilitates rapid and repeated ratings (Russell, 1989), which are essential to capture the
dynamics of affective experience.
The dynamics of affective experience
Psychophysiological experiments of affective states classically involve trial-based (i.e., discrete, sequential,
repetitive) designs, such as passive viewing of affective pictures with individual post-stimulus summary ratings
(e.g., with SAMs). Such tests create an artificial experience for participants and lack the complexity and context
of real-life experiences: in daily life, events do not occur suddenly but are embedded in a continuous sequence,
and natural human behavior unfolds over multiple timescales (Huk et al., 2018). It is becoming increasingly
clear that humans are in constantly fluctuating states both physiologically (see “resting-state” literature; Fox &
Raichle, 2007) and psychologically (e.g., “mind wandering”), which can be influenced by external stimuli. In
this vein, humans can be seen as constantly in states of pleasant or unpleasant arousal (‘core affect’; Russell and
Barrett, 1999; Lindquist, 2013) that temporally evolve in interactions with complex, dynamic environments.
Advances in stimulation and analysis techniques have recently enabled experiments to study the physiological
and psychological dynamics, such as the variability of affective response, for example, using movies
(Westermann et al., 1996; Hasson et al., 2004; Saarimäki, 2020) or immersive virtual reality (iVR), in which
participants are surrounded by interactive, dynamic, computer-generated environments that are often presented
in stereoscopic head-mounted displays (HMDs; Riva et al., 2007; Chirico and Gaggioli, 2019).
In summary, the waxing and waning of affective experience to dynamic stimuli may be insufficiently captured
by the typical one-time (“summary”) ratings when the stimulation (e.g., the movie or VR experience) is over.
Continuous ratings can be assumed to capture the affective experience in a more fine-grained fashion, also
because post-hoc (hindsight) ratings can be susceptible to distortions and biases (Kaplan et al., 2016; Levine
and Safer, 2002).
One possibility is to replay an audiovisual recording of the experience during the post-hoc rating to aid recall
(McCall et al., 2015; Hofmann et al., 2021) by minimizing biases related to the point of view (Berntsen and
Rubin, 2006; Marcotti and St Jacques, 2018) or timescale (e.g., Fredrickson and Kahneman, 1993). As a replay
extends (and, in the case of a 1-to-1 replay, doubles) the duration of the experiment, we here aim to collect ratings
during the experience, that is, in “real time” or “online”.
Continuous real-time ratings of affective experience have, for example, been collected for acoustic and visual
stimuli: while listening to poems (Wagner et al., 2021) or music (Nagel et al., 2007; Vuoskoski et al., 2021;
McClay et al., 2023), one-dimensional scales were used by moving the index finger (“liking”; Wagner et al.,
2021) or using the computer keyboard or mouse (Vuoskoski et al., 2021; McClay et al., 2023), as well as
two-dimensional variants of the affect grid, such as a square (EMuJoy; Nagel et al., 2007) or a circle (Emotion
Compass; McClay et al., 2023). While watching videos, one-dimensional scales were used, for example, to rate
the degree of emotional arousal with the computer mouse or a USB dial (McCall et al., 2015; Hofmann,
Klotzsche, Mariola et al., 2021) as well as two-dimensional variants of the affect grid, such as a square with
SAMs displayed in the upper right corner of the (2D screen-based) video and rated with a joystick (CASE;
Sharma et al., 2017), a square with emoticons, displayed below the video and rated with a touch-pad
(AVDOS-VR; Gnacek et al., 2024), a square with emotion words displayed next to the video and rated with a
joystick (DARMA; Girard et al., 2018), or a free-floating, colored one, rated with a joystick (Xue et al., 2020;
Xue et al., 2021).
While real-time continuous ratings promise to effectively study affective dynamics, they may also influence the
stimulus perception and the experience itself. For example, the rating (task, activity, feedback) may interfere
with participants' experience, that is, demands of “dual-tasking” may occupy limited cognitive (e.g.,
attention-based) resources (as reported for working memory; Doherty et al., 2019). In addition, explicitly rating
one’s emotions may be a form of implicit emotion regulation just as putting feeling into words can attenuate
emotional experiences (Lieberman et al., 2011). It is therefore essential to identify continuous rating methods
that minimize potential interference with the subjective experience. From a technical angle, this can also be
framed in the context of “user experience”, that is, the assessment of how effectively, efficiently, and
satisfactorily a user can interact with a product or system to achieve specific goals (Jordan, 1998), in this case,
the rating of affective experience. To maximize user experience and manage attention, design principles of
peripheral feedback have been developed, which leverage the human ability to process information in the
periphery of attention. That is, peripheral feedback aims to provide users with important information without
demanding their full attention or interrupting their primary tasks (Bakker et al., 2016).
Hence, the requirements for our rating tool were to (1) continuously collect both valence and arousal ratings
during the experience (in “real-time”) and compare them with summary ratings, while (2) minimizing the rating’s
interference with the experience itself.
Methods
We designed a tool, the AffectTracker, to collect continuous or moment-to-moment ratings of two-dimensional
affective experiences (valence and arousal) during dynamic stimuli, such as 360-degree videos, while
minimizing the rating's impact on the experience itself. We opted for 360-degree videos (fixed-sequence
audiovisual stimuli) over fully interactive content, such as games, to strike a balance between providing an
immersive experience and keeping cognitive demands low. This approach allowed participants to remain
hands-free and limited their degrees of freedom, which was essential to ensuring high-quality data for future
physiological recordings.
In Study 1 ("Selection"), we aimed to identify the AffectTracker feedback with the best compromise between
validity for continuous measurement and interference with the experience, using brief 1-minute videos. Study 2
("Evaluation") then tested the selected feedback during longer affective stimulation, extending to 23 minutes, to
assess its performance over extended periods. The materials and procedures common to both studies are
described below, while the specifics of each study are reported in dedicated sections. Both studies were
conducted at the Max Planck Institute (MPI) for Human Cognitive and Brain Sciences, Leipzig, Germany
(BER) and at the Department of Clinical and Biological Sciences, University of Turin, Italy (TUR). The research
protocol was approved by the local ethics committees.
Common Materials
Equipment
All sessions at both sites took place in dedicated rooms for immersive Virtual Reality (iVR) experiments, with
participants seated on swivel chairs. HTC Vive Pro headsets (HTC, Taiwan) with headphones were used. The
iVR headset offers stereoscopy with two 1440 × 1600-pixel OLED displays (615 PPI), a 110° field-of-view,
and a frame rate of 90 Hz. The iVR application was developed in Unity 2022.3.12 and built as an
executable that runs on a Windows 10 machine. Ratings were sampled at a frequency of 20 Hz and auto-saved
to disk every 30 seconds. Experiments at the BER site were conducted in English, while those at the
TUR site were conducted in Italian.
AffectTracker
The AffectTracker has been developed with Unity 2022.3.12 under OpenXR (khronos.org/openxr/). The tool is
designed to work with the touchpad or joystick of an iVR controller of any iVR equipment supported by
OpenXR, and the type of real-time visual feedback that users receive can be customized. We released the
AffectTracker as a Unity prefab here: https://github.com/afourcade/AffectTracker.
To continuously rate the moment-to-moment affective state, users simultaneously indicate their arousal and
valence experience using the input device of the iVR controller (e.g., in our studies: touchpad of the HTC Vive
Pro), onto which an affect grid with horizontal (valence, range [-1 1]) and vertical (arousal, range [-1 1]) axes is
mapped (see Figure 1).
The tool includes customization options with respect to:
● visual user feedback
● haptic vibrations (e.g., to remind users to rate continuously)
● adjustable sampling frequency
The AffectTracker offers several options for the visual feedback that can be switched on and off independently:
1. Grid: visualizes the valence-arousal space with a simplified version of the affect grid, with four static
quadrants, a moving dot/cursor and no text.
2. Flubber: visualizes the valence-arousal space as a moving abstract shape, called Flubber, whose
low-level visual features are mapped onto the valence-arousal space. The Flubber consists of three parts
that can be toggled on and off independently: a base, an outline and a halo. The base is the central
circular body with radiating projections, while the outline and halo have been designed to make the base
stand out against any background. Many of the visual features, as well as their mapping onto the
Cartesian (valence, arousal) and polar (distance, angle) coordinates of the affect grid, can be adjusted.
The following table describes each customizable feature and indicates its default/recommended values:
Table 1. Customizable low-level visual features for the Flubber feedback. Each feature takes a minimum and a
maximum value as inputs, corresponding to the extrema of the mapped coordinates in the valence-arousal space.
Default values and possible ranges are described.
| Feature | Min (default) | Max (default) | Range | Description |
|---|---|---|---|---|
| Projection Smoothness | 0 | 1 | 0-1 | angular, edgy, pointy, sharp (value: 0) vs. rounded, smooth, soft (value: 1) projections |
| Projection Amplitude | 0.2 | 0.4 | 0-1 | amplitude or height of the projections from the main body of the Flubber |
| Color | 0 | 1 | 0-1 | position in a predefined color gradient (Base of Flubber) |
| Saturation | 0 | 1 | 0-1 | proportional color saturation value (note: can go higher than 1) |
| Oscillation Frequency | 0.5 | 2.5 | 0-90 | frequency in Hz of projection oscillations |
| Projection Time Synchronization | 0.8 | 0 | 0-1 | degree of temporal de-synchronization of all projections (higher values are less synchronized; 0 is perfectly synchronized) |
| Projection Amplitude Difference | 0.8 | 0 | 0-1 | degree of asymmetry in the amplitudes of all projections (0 is perfectly symmetrical) |
Projection Time Synchronization and Projection Amplitude Difference jointly control the degree of regularity
in the oscillations of the Flubber (e.g., from chaotic/irregular to regular).
Each feature needs a minimum and a maximum value as inputs, which correspond to extrema of the mapped
coordinates. For example, when a feature is mapped to the x-axis, min corresponds to the left-most while max
corresponds to the right-most side of the grid. Similarly, when mapped to the y-axis, min refers to the bottom
and max to the top.
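The min/max convention described above can be illustrated with a minimal sketch. It assumes that a feature value is obtained by linear interpolation between its minimum and maximum; the text does not state the interpolation curve used by the tool, so this is an assumption, and the function name `map_feature` is hypothetical. The default values are those listed in Table 1.

```python
def map_feature(coord, feat_min, feat_max):
    """Map a grid coordinate in [-1, 1] onto a feature value.

    Assumption: the interpolation is linear; the tool may use another curve.
    """
    coord = max(-1.0, min(1.0, coord))      # clamp to the rating range
    t = (coord + 1.0) / 2.0                 # 0 at the min extremum, 1 at the max
    return feat_min + t * (feat_max - feat_min)


# Default (min, max) values taken from Table 1.
OSCILLATION_FREQUENCY = (0.5, 2.5)          # Hz
PROJECTION_TIME_SYNC = (0.8, 0.0)           # min > max: the mapping is inverted

# A feature mapped to the y-axis: bottom of the grid -> min, top -> max.
freq_bottom = map_feature(-1.0, *OSCILLATION_FREQUENCY)
freq_top = map_feature(1.0, *OSCILLATION_FREQUENCY)

# A feature mapped to the x-axis: left-most -> min, right-most -> max.
desync_left = map_feature(-1.0, *PROJECTION_TIME_SYNC)
desync_center = map_feature(0.0, *PROJECTION_TIME_SYNC)
```

Because a minimum may be larger than the corresponding maximum (as for Projection Time Synchronization), the same function covers inverted mappings without a special case.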
Both Grid and Flubber are abstract graphical representations that are language-independent (i.e.,
cross-culturally suitable): they do not require the participants to verbalize potentially complex or mixed
emotions (Toet et al., 2018), which could help minimize interference with subjective experience and
brain activity (Lieberman et al., 2011).
The goal for Flubber was to design an abstract representation of the affect grid that is as universal and intuitive
as possible in order to provide feedback without demanding the users’ full attention and interfering with their
experience (peripheral interaction; Bakker et al., 2016). For our studies, the Flubber was placed at the
center-bottom of the visual field, which allows the feedback to be in the peripheral vision while minimizing
blur caused by the lenses of the HTC Vive Pro. The affect grid's classical rating format can be challenging for
participants to understand and heavily relies on prior instructions and training (Ekkekakis, 2013). A more
intuitive and less cognitively demanding rating system, suitable for non-experts, could reduce the need for
deliberate reasoning (Evans, 2010), which is crucial for repeated or even continuous ratings. Pinilla and
colleagues (2021) highlight how visual properties like rounded lines and regular movements are consistently
associated with positive valence, while angular shapes and erratic movements are tied to negative valence.
Similarly, faster movements correspond to higher arousal, while slower ones denote lower arousal (e.g., Feng et
al., 2014). The design of Flubber was informed by these findings and inspired by Emotion-prints (Cernea et al.,
2015), a tool to visualize two-dimensional affective dimensions in the context of multi-touch systems, where
valence was mapped to the line smoothness and arousal to the color and pulsation frequency of the contour of a
touched area.
As there is accumulating evidence that valence and arousal are correlated (Yik et al., 2023; Kuppens et al.,
2013), we added the option to map the features to the polar coordinates (distance and angle) instead of the x-
and y-axes of the affect grid.
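As an illustration of the polar alternative, the following sketch converts a Cartesian rating into distance and angle. The conventions are assumptions, not taken from the tool: the angle is measured in radians, counter-clockwise from the positive valence axis, and the distance is left unnormalized (so it reaches about 1.41 in the corners of the grid). The function name `to_polar` is hypothetical.

```python
import math


def to_polar(valence, arousal):
    """Convert a rating in [-1, 1] x [-1, 1] to (distance, angle).

    Assumed conventions: the angle is in radians, counter-clockwise from the
    positive valence axis; the distance is not rescaled to [0, 1].
    """
    distance = math.hypot(valence, arousal)   # intensity of the rated state
    angle = math.atan2(arousal, valence)      # direction in the affect grid
    return distance, angle


# Example: maximal negative valence combined with maximal low arousal.
corner_distance, corner_angle = to_polar(-1.0, -1.0)
```

Under these conventions, distance can be read as the overall intensity of the rated state and angle as its quality, which is one way to accommodate correlated valence and arousal ratings.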
Of note, given its common use in other tools (e.g., McClay et al., 2023), we provide the option to have an
affect-color mapping. We chose not to map the Flubber’s color in our two studies, as color associations to
specific emotions can differ between cultures (Hupka et al., 1997; Madden et al., 2000; Soriano & Valenzuela,
2009) and contexts (Lipson-Smith et al., 2020).
A video demonstrating the Grid and Flubber (with the visual-affect mapping used in our studies) feedback
options can be found here: https://doi.org/10.17617/3.QPNSJA
Surveys
In the Selection study, LimeSurvey (LimeSurvey GmbH, n.d.) was chosen for pre- and post-experimental
digital surveys, while SoSci Survey (Leiner, 2021) was used in the Evaluation study. Before the experiment, in
both Selection and Evaluation studies, the survey included demographic questions (age, gender), a question on
prior iVR and gaming experiences and a shortened version of the Simulator Sickness Questionnaire (SSQ;
Kennedy et al., 1993). In addition to these instruments, the Evaluation study's pre-experimental survey
included a further demographic question (education), the Toronto Alexithymia Scale (TAS; Leising et al.,
2009), and the Multidimensional Assessment of Interoceptive Awareness (MAIA; Mehling et al., 2018) to
provide a more detailed and comprehensive characterization of the sample.
During the experiment, participants in both studies completed the System Usability Scale (SUS; Brooke, 1996,
2013) at the end of each experimental block to assess Usability, as well as the Kunin Scale (Kunin, 1998) to
gauge user Satisfaction. Participants also rated the Distraction of the feedback by responding to the statement,
"The feedback was distracting and/or disturbing," using a 7-point Likert scale (1 = strongly disagree, 7 =
strongly agree). Additionally, questions assessing the Emotional Representation of the feedback ("How much
was the feedback representative of your inner emotions?") and Sense of Presence ("How strongly did you
experience these videos/situations?" and "How much were you aware of the outside world?") were presented on
a 7-point Likert scale.
After the experiment, participants in the Selection study indicated their preferred feedback with the question
"Of the three feedback options you used, which did you like best?" and they were also invited to provide
general comments on their experience. Similarly, in the Evaluation study, participants were asked to provide
their general impression through a post-experimental survey. This included an additional open-ended question,
asking participants to reflect specifically on their experience with continuous rating. Both studies concluded
with participants completing a final shortened version of the SSQ to assess any symptoms of simulator
sickness.
Common Procedures
Participants
The sample size for each study was set to at least 50 participants, a number chosen arbitrarily. Participants
were recruited by providing a description of the project and listing its research objectives. Participation was
voluntary, and informed consent was collected in written form. Participants at
the BER site were compensated with €12 per hour. The inclusion criteria were being at least 18 years old
with normal or corrected-to-normal vision. The exclusion criteria included having suffered from, or currently
suffering from, any psychiatric or neurological disorder, as well as having a dependency disorder or being
engaged in substance abuse within the last 6 months. For the BER recruitment (through Castellum; Bengfort et
al., 2022), proficiency in speaking and understanding English was required. For both Selection and Evaluation
studies, participants were informed that some of the videos contained scenes depicting spiders, blood, snakes,
corpses, or heights. Therefore, individuals with severe phobias or fears related to these stimuli were
advised not to participate. Participants were also asked not to consume caffeine or nicotine within 3 hours
before the experiment.
Preprocessing
All code used for the analyses and plots is publicly available on GitHub at
https://github.com/afourcade/AffectiveVR. All data were preprocessed in Python (version 3.10) as follows:
Continuous ratings (CRs) were linearly resampled to 20 Hz to ensure uniform sampling. As participants
typically took a few seconds at the start of each trial to initiate and stabilize their ratings, the first five seconds
of each trial were discarded. To obtain a single-value representation of the CRs and facilitate comparison with the
single rating (SR), CR indices (CRi) were derived from the CRs. This involved computing various
characteristics of the CR distribution over time, including the last rating, central tendencies such as mean, median,
and mode, measures of dispersion including maximum (max), minimum (min), standard deviation (std),
coefficient of variation (cv), and interquartile range (iqr), shape of the distribution such as skewness (skew) and
kurtosis, and area under the curve (auc).
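The preprocessing steps above can be sketched as follows for one rating dimension of one trial. This is a minimal standard-library illustration and not the authors' pipeline (which is available in the linked repository). The function names are hypothetical, the sample standard deviation and the trapezoid rule for the area under the curve are assumptions, and mode, skewness and kurtosis are omitted for brevity.

```python
import statistics


def resample_linear(times, values, rate_hz=20.0):
    """Linearly interpolate (time, value) samples onto a uniform grid."""
    step = 1.0 / rate_hz
    n = int((times[-1] - times[0]) / step + 1e-9) + 1
    resampled, j = [], 0
    for i in range(n):
        t = times[0] + i * step
        # Advance to the pair of original samples that brackets t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        w = min(1.0, max(0.0, w))               # guard against rounding
        resampled.append(values[j] + w * (values[j + 1] - values[j]))
    return resampled


def cr_indices(times, values, rate_hz=20.0, discard_s=5.0):
    """Summarize one rating dimension of one trial into CR indices."""
    cr = resample_linear(times, values, rate_hz)
    cr = cr[int(round(discard_s * rate_hz)):]   # drop the first seconds
    mean = statistics.fmean(cr)
    std = statistics.stdev(cr)                  # sample SD (an assumption)
    q1, _, q3 = statistics.quantiles(cr, n=4)
    dt = 1.0 / rate_hz
    auc = sum((a + b) / 2.0 * dt for a, b in zip(cr, cr[1:]))  # trapezoid rule
    return {
        "last": cr[-1], "mean": mean, "median": statistics.median(cr),
        "max": max(cr), "min": min(cr), "std": std,
        "cv": std / mean if mean != 0 else float("nan"),
        "iqr": q3 - q1, "auc": auc,
    }


# Hypothetical 10-s trial: a rating that rises linearly from 0 to 1.
idx = cr_indices([0.0, 10.0], [0.0, 1.0])
```

Since ratings range from -1 to 1, the mean can be close to zero, in which case the coefficient of variation becomes unstable; the sketch returns NaN only when the mean is exactly zero.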
A Sense of Presence score was calculated from the two questions by reverse coding the second question and
computing the mean of both questions. A Usability score was calculated from the SUS questionnaire by
adapting the standard procedure to a shorter version of the survey and a different Likert scale. As there were 7
questions with responses within the 0-6 range, the response to each even-numbered (positive) question was
kept as is, and the response to each odd-numbered (negative) question was subtracted
from 6. We summed the resulting values and multiplied the sum by 100/(7*6) ≈ 2.38 in order to have a
total score within the 0-100 range. iVR experience scores were derived by computing the mean of the two
questions about iVR and gaming experience.
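The scoring rules above can be written out as a short sketch. The function names are hypothetical, and the 1-7 coding of the presence items is an assumption (the text only specifies a 7-point Likert scale); with a 0-6 coding, `scale_min` and `scale_max` would be changed accordingly.

```python
def usability_score(responses):
    """Adapted SUS score (0-100) from 7 responses coded 0-6, item 1 first."""
    if len(responses) != 7 or any(not 0 <= r <= 6 for r in responses):
        raise ValueError("expected 7 responses in the 0-6 range")
    total = 0
    for number, response in enumerate(responses, start=1):
        if number % 2 == 0:
            total += response          # even-numbered (positive) item: keep
        else:
            total += 6 - response      # odd-numbered (negative) item: reverse
    return total * 100 / (7 * 6)


def presence_score(q1, q2, scale_min=1, scale_max=7):
    """Mean of the first item and the reverse-coded second item.

    Assumption: both items are coded from scale_min to scale_max.
    """
    q2_reversed = scale_max + scale_min - q2
    return (q1 + q2_reversed) / 2


best_usability = usability_score([0, 6, 0, 6, 0, 6, 0])
worst_usability = usability_score([6, 0, 6, 0, 6, 0, 6])
```

The factor 100/(7*6) rescales the maximum attainable sum (7 items with up to 6 points each, i.e., 42) to 100.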
Study 1: Selection.
The Selection study aimed to design, refine and evaluate our CRs method and the different feedback options,
suitable for use in a 360° iVR environment, by assessing their comparability with SR and evaluating
Interference and User Experience characteristics, to identify the most effective feedback for the AffectTracker.
To support this aim, we selected 60-second videos from YouTube, allowing for multiple repetitions of the
experimental conditions (Table 2). These videos were specifically curated to elicit low affective variability
(over time), featuring emotion-inducing content with minimal events, and were positioned within distinct
affective quadrants of the arousal-valence space. This type of stimulus (i.e., inducing stable - constant over time
- affective experiences) was chosen to carefully extend the classical approach of a short event-related stimulus
associated with a SR, as an intermediate type of stimulus between static images and long eventful videos.
Table 2. Detailed descriptions of the four 360° VR videos used in Study 1: Selection.
| Video label | Description | Purpose | Link |
|---|---|---|---|
| Animal puppies | A video showing animal puppies in natural settings. The animals remained still. | Low arousal - positive valence (LP). To create a pleasant and soothing emotional response. | https://www.youtube.com/watch?v=FMU0jd2IUks |
| Abandoned power plant | An uneventful scene set in an abandoned power plant. | Low arousal - negative valence (LN). To inspire a somber and unsettling feeling, emphasizing the emptiness and stillness of the scene. | https://www.youtube.com/watch?v=Jl7Etw0a7ro |
| Skydiving | A video depicting skydiving experience with a wingsuit. | High arousal - positive valence (HP). To produce an exhilarating and energizing emotional experience, heightening the sense of thrill. | https://www.youtube.com/watch?v=IrE6T1rct6g |
| Haunted house | A video featuring a haunted house. The video included small jumpscare moments, producing feelings of tension and fear through a sinister atmosphere. | High arousal - negative valence (HN). To provoke fearful reactions, amplifying the sense of dread and suspense. | https://www.youtube.com/watch?v=g1WBh457UUM |
Feedback for Continuous Rating (CR)
Participants rated their affective state continuously during the video presentation and with a SR after the video,
using the touchpad on the iVR controller. The possible range for all ratings on both dimensions was [-1 1].
During the CR, the participant’s thumb trajectory on the touchpad was recorded at a sampling frequency of 20
Hz. During the SR, the participants placed their thumb on the position corresponding to their subjective
experience and validated their rating by pressing the trigger button on the iVR controller.
Three prototypes of feedback for CR were tested in a pilot study with 12 participants, which was conducted to
verify the feasibility of the Selection study and to refine the design and functionality of the different feedback
options (see Supplementary Material S1 for a detailed description of the feedback development and the pilot
study). The Selection study included the refined feedback options derived from the insights gained in the pilot
study (Figure 1).
Figure 1. Feedback options used in Study 1: Selection for CR of affective states: Grid, Flubber, and
Proprioceptive. The Grid feedback displayed valence on the x-axis and arousal on the y-axis, with participants’
ratings represented by a dot in the arousal-valence space. The Flubber feedback used a dynamic abstract shape
that changed based on participants' affective ratings, mapping arousal and valence onto its form and
oscillation. The Proprioceptive feedback relied on participants' proprioceptive input on the touchpad, with no
visual feedback, but included periodic vibrations to prompt continuous rating.
When using the Grid feedback, participants encountered a simplified version of the affect grid, featuring
valence on the x-axis (negative to positive) and arousal on the y-axis (low to high). A dot continuously
represented their momentary rating in this valence-arousal space.
Conversely, the Flubber feedback presented participants with an abstract shape called Flubber, dynamically
changing based on their affective ratings. Low-level visual properties were mapped onto the Cartesian
coordinates in the grid. Thus, y coordinates corresponded to arousal, with the projection’s amplitude (default
values: min: 0.2, max: 0.4) and oscillation frequency (default values: min: 0.5, max: 2.5) varying with
arousal levels. Of note, the frequency of the oscillations was set to [0.5 2.5] Hz after experimentation during
the pilot. Meanwhile, x coordinates represented valence, with the projection’s smoothness (default values:
min: 0, max: 1), the projection’s time synchronization (default values: min: 0.8, max: 0) and the projection’s
amplitude difference (default values: min: 0.8, max: 0) reflecting different valence levels. For example,
placing the thumb on the lower left side (e.g., coordinate [-1, -1]) of the touchpad would result in a slowly
pulsating Flubber with irregular angular projections, representing the participant’s experience of low arousal
and negative valence. Similarly, placing the thumb on the top right side (e.g., coordinate [1, 1]) of the touchpad
would result in a quickly pulsating Flubber with regular smooth projections, representing the participant’s
experience of high arousal and positive valence.
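As an illustration of this mapping, the following minimal Python sketch converts a touchpad coordinate into the Flubber's low-level visual parameters. It assumes a linear interpolation between the reported default values, with "min" read as the value at a rating of -1 and "max" as the value at a rating of +1; the interpolation actually used by the tool may differ. The names FlubberParams, flubber_params, lerp and to_unit are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


def lerp(lo: float, hi: float, t: float) -> float:
    """Linear interpolation between lo (t = 0) and hi (t = 1)."""
    return lo + (hi - lo) * t


def to_unit(rating: float) -> float:
    """Clamp a rating to [-1, 1] and rescale it to [0, 1]."""
    clamped = max(-1.0, min(1.0, rating))
    return (clamped + 1.0) / 2.0


@dataclass
class FlubberParams:
    amplitude: float             # driven by arousal, 0.2 to 0.4
    frequency_hz: float          # driven by arousal, 0.5 to 2.5 Hz
    smoothness: float            # driven by valence, 0 to 1
    time_sync: float             # driven by valence, 0.8 down to 0
    amplitude_difference: float  # driven by valence, 0.8 down to 0


def flubber_params(valence: float, arousal: float) -> FlubberParams:
    """Map a (valence, arousal) rating to visual parameters (assumed linear)."""
    a = to_unit(arousal)
    v = to_unit(valence)
    return FlubberParams(
        amplitude=lerp(0.2, 0.4, a),
        frequency_hz=lerp(0.5, 2.5, a),
        smoothness=lerp(0.0, 1.0, v),
        time_sync=lerp(0.8, 0.0, v),
        amplitude_difference=lerp(0.8, 0.0, v),
    )
```

Under these assumptions, flubber_params(-1, -1) yields the slow, irregular, angular shape described for low arousal and negative valence, and flubber_params(1, 1) the fast, regular, smooth one.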
Finally, with the Proprioceptive feedback, no visual feedback was given. Participants needed to rely solely on
their thumb's proprioceptive awareness on the touchpad, indicating their position in the valence-arousal space.
To ensure participants would not forget to continuously rate when no feedback was provided, the iVR controller
vibrated periodically every two seconds (0.5 Hz frequency, independent of the ratings) as a reminder.
Study design
The observational study employed a 4x4 within-subjects design with two factors with four levels each:
feedback (Grid, Flubber, Proprioceptive and Baseline) and videos that could each elicit affective responses
within a different quadrant of the valence-arousal space (high arousal - negative valence [HN], high arousal -
positive valence [HP], low arousal - negative valence [LN], and low arousal - positive valence [LP]). A nested
randomization of the videos within blocks and blocks across subjects was performed, using custom Python
scripts and the numpy.random package.
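A nested randomization of this kind can be sketched as follows. This is not the authors' script: it uses Python's standard random module rather than numpy.random so that it runs without dependencies, and the per-participant seed and the function name make_schedule are assumptions made for illustration.

```python
import random

FEEDBACKS = ["Grid", "Flubber", "Proprioceptive", "Baseline"]
VIDEOS = ["HN", "HP", "LN", "LP"]


def make_schedule(participant_seed: int) -> list[tuple[str, list[str]]]:
    """Return one participant's block order and the video order in each block."""
    rng = random.Random(participant_seed)  # reproducible per participant

    # Outer level: shuffle the order of the four feedback blocks.
    blocks = FEEDBACKS[:]
    rng.shuffle(blocks)

    # Inner level: shuffle the four videos independently within each block.
    schedule = []
    for feedback in blocks:
        videos = VIDEOS[:]
        rng.shuffle(videos)
        schedule.append((feedback, videos))
    return schedule


if __name__ == "__main__":
    for feedback, videos in make_schedule(participant_seed=1):
        print(feedback, videos)
```

Each participant thus sees every feedback condition once and every video once per block, with both orders randomized.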
Experimental design and measured variables
The experimental design is illustrated in Figure 2. The total duration of the experiment was approximately one
hour (see Supplementary Material S2 for the detailed session script).
Figure 2. Experimental design of Study 1: Selection. The experiment lasted approximately one hour and
consisted of four blocks, each containing one of three continuous rating (CR) feedback conditions—Grid,
Flubber, and Proprioceptive—or a Baseline condition without CR. Participants completed a training phase,
where they interacted with 2D images from the International Affective Picture System (IAPS), followed by a
trial phase with 360° iVR videos that elicited affective responses. CR and summary rating were collected in
each trial. After each block, participants evaluated the User Experience and Interference through
questionnaires about Distraction, Usability, Satisfaction, Emotional representation, and Sense of Presence.
The experiment comprised three conditions in which participants continuously rated their affective experience
using different feedback options: Grid, Flubber, and Proprioceptive. Additionally, a Baseline condition with no
CR during the videos was included. The four conditions were presented in random order across four blocks of
trials. Each block consisted of three phases: training, four trials in which CR and SR scores were collected, and
a questionnaire session with in-iVR questions.
During the training, participants were shown four 2D pictures from the International Affective Picture System
(IAPS; Bradley & Lang, 2007), each representing one quadrant of the valence-arousal space. These pictures
were randomly presented for a maximum of 100 seconds each, allowing participants to self-pace, and were
combined each time with a different feedback. After a welcome screen, participants familiarized themselves with the iVR
environment, followed by a brief introduction to the feedback or Baseline condition. A 3-second fixation cross
preceded the picture presentation, during which participants could become accustomed to the feedback. The
training concluded with a screen summarizing the instructions for reporting affective states for that specific
block. For each block, 360° videos with 4K (4096x2160 pixels) resolution were used to elicit an affective
response. The videos were edited to 60 seconds each and shown four times overall, once in each of the four
blocks, in a randomized order. Each experimental block began with a 3-second fixation cross, followed by one
of the videos combined with the feedback or Baseline condition, and concluded with a screen for SR.
Upon completion, a final screen appeared, signaling the end of the procedure and thanking the participants for
their involvement.
Statistical analysis
All data were analyzed using custom scripts in R (version 4.1.0). To examine associations between CRis and
SR, Pearson correlations were computed between each CRi and the corresponding SR across all conditions
(feedback and videos) and participants. To investigate the influence of the feedback options on these
associations, correlation coefficients between each feedback were compared. Specifically, differences in
Pearson’s r for each pair of feedback were statistically tested using the cocor package in R (Diedenhofen &
Musch, 2015), treating the correlations as two non-overlapping correlations based on dependent groups.
Comparison between feedback options - Interference and User Experience
The differences in Interference between feedback options were examined both (1) directly, using the Distraction
and Sense of Presence questionnaires, and (2) indirectly, investigating differences between SRs during CR
rating (i.e., during each feedback condition [SRfeedback]) and no rating (i.e., baseline [SRbaseline]). The differences
in User Experience were examined through the Usability, Satisfaction and Emotion Representation
questionnaires. Differences in Distraction, Sense of Presence, Usability, Satisfaction, and Emotion
Representation scores were tested using a Type 3 ANOVA with the factor feedback (four levels: Grid, Flubber,
Proprioceptive, Baseline), followed by post-hoc t-tests (with false discovery rate [FDR] correction) to explore
specific differences within feedback. For each feedback, differences between SRfeedback and SRbaseline were
assessed using paired t-tests.
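For illustration, an FDR correction over a family of post-hoc p-values can be sketched as below. The text does not name the FDR procedure; this example assumes the Benjamini-Hochberg method, and the function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.005]) adjusts four hypothetical post-hoc p-values; a comparison is then declared significant if its adjusted value falls below the chosen alpha level.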
Equivalence tests were conducted using the TOST (Two One-Sided Tests) approach (Lakens et al., 2018). For
the questionnaire comparison, equivalence tests were conducted using a SESOI of 7 points for Usability scores
on a 0-100 scale and 0.5 for the other scores. For the SR comparison, a SESOI of 0.125 points (raw effect size)
on a scale ranging from -1 to 1 for arousal and valence was chosen.
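For the SR comparison, the TOST logic can be written out as follows. This is a sketch of the standard paired-samples formulation, assuming paired differences between the feedback and Baseline conditions; the symbols d_i, SE, t_L, t_U and p_TOST are introduced here for illustration only.

```latex
d_i = SR_{\mathrm{feedback},i} - SR_{\mathrm{baseline},i}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
SE = \frac{s_d}{\sqrt{n}}

t_L = \frac{\bar{d} - (-\Delta)}{SE}, \qquad
t_U = \frac{\bar{d} - \Delta}{SE}, \qquad
\Delta = 0.125

p_L = P\left(T_{n-1} \ge t_L\right), \qquad
p_U = P\left(T_{n-1} \le t_U\right), \qquad
p_{\mathrm{TOST}} = \max(p_L, p_U)
```

Here s_d is the standard deviation of the differences and T_{n-1} follows a t distribution with n - 1 degrees of freedom. Equivalence within the bounds [-Δ, Δ] is concluded when p_TOST falls below the chosen alpha level, that is, when both one-sided tests reject.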
Study 2: Evaluation.
The Evaluation study aimed to assess the selected feedback and compare CR to SR during a longer and more
varied 360° iVR stimulus, while also re-evaluating User Experience and Interference factors.
Additionally, data of a subset of participants who performed the experiment in a standing position instead of
being seated were collected in the Berlin site only, with the aim to perform a sub-analysis by exploring the
effects of body posture on emotion ratings (Nair et al., 2015). Such effects may be subtle for self-reports but
more pronounced for cardiorespiratory signals (Widmaier et al., 2022), which we aimed to acquire in a later
phase of the project.
To curate the ideal stimulus, we followed specific requirements: the videos needed to be 360-degree, have a
resolution of 4K (4096x2160 pixels), be stereoscopic (top-bottom), and have a duration of around 20 minutes.
Importantly, the videos had to offer high affective variability (over time) with dynamic storytelling, containing
many events to contrast the more stable emotion-inducing or uneventful videos used in the Selection phase. The
stimulus needed to cover multiple quadrants of the affect grid, avoiding restriction to just one. Additionally, no
language or text should be present to prevent language barriers or emotional labeling. For the sake of
immersion and continuity, the videos had to either be fully computer-generated or fully based on real-life
captures, with no mixing of the two. As no single video met all these criteria, we created a continuous sequence
of videos as a compromise to best meet our requirements and provide the necessary immersive and affectively
rich experience for the study (Table 3).
Table 3. Detailed descriptions of the four 360° VR videos used in Study 2: Evaluation.
| Temporal sequence | Video label | Description | Purpose |
|---|---|---|---|
| 0:00 - 1:30; 5:43 - 7:13; 13:43 - 15:13; 21:34 - 23:04 | Scifi (Video 1) | Depicting the inside of a space station, a still video with no events. | Served as a baseline; shown at the start and interspersed between other videos. |
| 1:30 - 5:43 | Invasion! (Video 2) | Short animated film (Baobab Studios), where a rabbit encounters two aliens on an iced lake. | To provide a general emotional stimulus with moderate changes throughout. |
| 7:13 - 13:43 | Asteroids! (Video 3) | Short animated film (Baobab Studios), featuring two aliens navigating through a cloud of asteroids in a spaceship. | To deliver a dynamic emotional experience spanning a range of intensities and emotional tones. |
| 15:13 - 21:34 | Underwood (Video 4) | Guided walk through an underground bunker where strange biological experiments occurred. Based on the iVR environment Underwood (McCall et al., 2022). | To induce a sense of ambiguous threat, fear, or anxiety, consistent with previous research methodologies (McCall et al., 2016; Legrand & Allen, 2023). |
Video 1 and Video 4 are custom-made and accessible at https://doi.org/10.17617/3.QPNSJA. Video 2 is available at
https://youtu.be/SZ0fKW5PttM?si=G23Zm60xveVJUBWy; Video 3 is available at
https://youtu.be/jEUnBEKEKCs?si=UInxqexWmyITI816.
Feedback for Continuous Rating (CR)
The feedback used in the Evaluation study and subject of the validation was the Flubber already described in
the section related to feedback for CR of the Selection study (Figure 1). Similarly, the sampling frequency was
set to 20 Hz and the possible range for all ratings on both dimensions was [-1 1]. To enhance the user
experience, some modifications were implemented. Notably, to ensure visibility across various backgrounds,
we incorporated an outline and a halo.
Study design
The observational study involved a continuous sequence of videos carefully selected to encompass the entire
spectrum of the valence-arousal space (including high arousal - negative valence, high arousal - positive
valence, low arousal - negative valence, and low arousal - positive valence). The sequence of videos was 1384
seconds long.
Experimental design and measured variables
The experimental design is illustrated in Figure 3. The total duration of the experiment was approximately one
hour (see Supplementary Material S3 for the detailed session script).
Figure 3. Experimental design of Study 2: Evaluation. The experiment lasted approximately one hour and
included a training phase followed by a trial involving a 23-minute sequence of 360° stereoscopic videos.
Participants continuously rated their affective experience using the Flubber feedback and provided a summary
rating at the end of the video. Video 1 (Scifi), serving as a neutral baseline, was repeated between the
emotionally evocative videos. Before and after the experiment, participants completed surveys, including the
SSQ, MAIA, TAS, and questions on User Experience and Interference.
The in-iVR session comprised a training phase and a trial involving a 23-minute video, during which
participants continuously rated their affective experience using the Flubber feedback and provided a SR at the
end of the video.
During the training phase, participants were shown the LP video from the Selection study combined with the
Flubber, followed by the SR. The video presentation, allowing participants to become accustomed to the
feedback, was preceded by a 3-second fixation cross. The training concluded with a screen summarizing the
instructions for reporting affective states for the following trial session.
The experiment began with a 3-second fixation cross, followed by an uninterrupted sequence of videos where
participants continuously rated their affective states visualized by the Flubber, and concluded with a screen for
SR. Four 360° stereoscopic videos were used for the experiment. Table 3 shows the details of the selected
videos. The videos were linked together by 2s fade-in and fade-out to create a continuous sequence. Video 1,
chosen for its neutral content to serve as a baseline, was presented at the beginning and in between each of the
three more emotionally engaging videos, making it the only video repeated four times. This baseline placement
is intended to provide a consistent reference point, which could be especially valuable for future studies
involving physiological measures, such as heart rate or skin conductance, to assess the physiological impact of
emotional stimuli.
A video of an exemplary participant performing CR during an excerpt of the video sequence (Scifi and
Underwood videos) is available here: https://doi.org/10.17617/3.QPNSJA.
Statistical analysis
To analyze associations between the different CRis and SR, Pearson correlations were initially conducted
between each CRi and the corresponding SR. Subsequently, to compare correlations between the CR mean and
SR for the Flubber feedback between the Selection and Evaluation studies, the correlation coefficients from the
two studies were subjected to Fisher r-to-z transformation. This allowed the statistical assessment of the
difference between two independent correlation coefficients.
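The comparison of two independent correlation coefficients can be sketched as follows. This is a minimal standard-library illustration of the usual r-to-z test, not the authors' R code; the function name fisher_z_test and the input values in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Compare two independent Pearson correlations.

    Returns the z statistic and the two-sided p-value.
    """
    # Fisher r-to-z transformation (inverse hyperbolic tangent).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between the transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

A call such as fisher_z_test(0.9, 100, 0.2, 60) returns a large positive z, indicating that the first correlation is reliably stronger than the second; each sample needs more than three observations for the standard error to be defined.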
To compare the variability of the CRs between the Selection and Evaluation studies, the standard deviation (std)
of the CRs for each participant, each video and each affective dimension was computed and entered into
ANOVAs with factor video (5 levels: Selection HN, Selection HP, Selection LN, Selection LP, Evaluation).
Post-hoc t-tests (with Bonferroni correction) were then employed to explore specific differences between the
videos.
To compare responses between the Selection and Evaluation phases, a Type 3 ANOVA with the factor study
(two levels: Selection, Evaluation) was performed. Post-hoc t-tests were then employed to explore specific
differences within studies.
Results
Participants
Table 4 shows the baseline characteristics of the participants for both studies. In Study 1: Selection, 51
participants were included, while Study 2: Evaluation comprised 84 participants.
The subset of participants in a standing position in Study 2: Evaluation consisted of 21 subjects. Sub-analyses
investigating differences in body posture are reported in the Supplements (see Table S8).
The results presented below are from analyses performed on the seated participants only (Study 1: Selection N
= 51; Study 2: Evaluation N = 62).
Table 4. Description of the samples for Study 1: Selection and Study 2: Evaluation. Values are N (%) or mean (std).
| | Study 1: Selection, Berlin (N = 29) | Study 1: Selection, Torino (N = 22) | Study 2: Evaluation, Berlin (N = 21+21) | Study 2: Evaluation, Torino (N = 41) |
|---|---|---|---|---|
| Age | 25.8 (5.8) | 30.0 (6.0) | 28.2 (5.2) | 31.4 (7.6) |
| Gender: Male | 15 (52%) | 7 (32%) | 21 (50%) | 10 (24%) |
| Gender: Female | 14 (48%) | 15 (68%) | 20 (48%) | 31 (76%) |
| Gender: Non-binary | 0 (0%) | 0 (0%) | 1 (2%) | 0 (0%) |
| Education: Unknown | 12 (41%) | | | |
| Education: Middle school | 3 (10%) | 0 (0%) | 2 (5%) | 0 (0%) |
| Education: High school | 10 (35%) | 7 (32%) | 10 (24%) | 8 (20%) |
| Education: Bachelor | 4 (14%) | 3 (14%) | 19 (45%) | 5 (5%) |
| Education: Master | 0 (0%) | 8 (36%) | 9 (21%) | 24 (59%) |
| Education: Doctorate | 0 (0%) | 4 (18%) | 2 (5%) | 7 (17%) |
| VR experience score | 1.98 (0.71) | 1.98 (0.71) | 1.93 (0.64) | 1.8 (0.69) |
| Position: Seated | 29 (100%) | 22 (100%) | 21 (50%) | 41 (100%) |
| Position: Standing | 0 (0%) | 0 (0%) | 21 (50%) | 0 (0%) |
| MAIA: Noticing | n/a | n/a | 3.49 (0.76) | 3.4 (0.75) |
| MAIA: Not-Distracting | n/a | n/a | 2.69 (0.9) | 2.52 (0.63) |
| MAIA: Not-Worrying | n/a | n/a | 2.55 (0.48) | 2.18 (0.55) |
| MAIA: Attention Regulation | n/a | n/a | 2.84 (0.85) | 2.71 (0.79) |
| MAIA: Emotional Awareness | n/a | n/a | 3.46 (1.0) | 3.45 (0.94) |
| MAIA: Self-Regulation | n/a | n/a | 2.93 (1.13) | 2.64 (1.02) |
| MAIA: Body Listening | n/a | n/a | 2.66 (1.1) | 2.82 (0.99) |
| MAIA: Trust | n/a | n/a | 3.44 (1.29) | 3.2 (1.23) |
| TAS: Difficulty Describing Feelings | n/a | n/a | 11.9 (4.5) | 11.66 (4.14) |
| TAS: Difficulty Identifying Feeling | n/a | n/a | 16.55 (5.64) | 14.83 (4.45) |
| TAS: Externally-Oriented Thinking | n/a | n/a | 17.43 (4.04) | 14.54 (4.15) |
| TAS total score | n/a | n/a | 45.88 (10.34) | 41.02 (9.63) |
Study 1: Selection
CR and SR
CR mean (over timepoints) and SR are described in Table 5 and illustrated in Figure 4. There were no
significant differences between test sites in either CR mean (valence: t(610) = 1.27, p = .204; arousal: t(610) =
-0.25, p = .802) or SR (valence: t(814) = 1.30, p = .193; arousal: t(814) = -0.92, p = .357) for either affective
dimension. Of note, CR means and SRs for each video were consistent with the quadrants they were selected
for. Mean CR time-series across participants are shown in Figure 7.
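Table 5. Descriptive statistics of CR mean and SR for each feedback and each video in Study 1: Selection.
Valence
| Video | Feedback | CR mean: mean | CR mean: std | CR mean: min | CR mean: max | SR: mean | SR: std | SR: min | SR: max |
|---|---|---|---|---|---|---|---|---|---|
| HN | Baseline | | | | | -0.64 | 0.35 | -1 | 0.58 |
| HN | Flubber | -0.72 | 0.35 | -1 | 0.75 | -0.74 | 0.28 | -1 | 0.45 |
| HN | Grid | -0.6 | 0.28 | -0.95 | 0.2 | -0.72 | 0.29 | -1 | 0.26 |
| HN | Proprioceptive | -0.64 | 0.31 | -1 | 0.25 | -0.72 | 0.32 | -1 | 0.44 |
| HP | Baseline | | | | | 0.3 | 0.58 | -1 | 1 |
| HP | Flubber | 0.41 | 0.61 | -0.93 | 1 | 0.34 | 0.6 | -0.93 | 1 |
| HP | Grid | 0.23 | 0.53 | -0.87 | 0.97 | 0.26 | 0.59 | -1 | 1 |
| HP | Proprioceptive | 0.39 | 0.63 | -1 | 1 | 0.29 | 0.6 | -1 | 1 |
| LN | Baseline | | | | | -0.29 | 0.42 | -1 | 0.88 |
| LN | Flubber | -0.31 | 0.51 | -1 | 1 | -0.31 | 0.43 | -1 | 1 |
| LN | Grid | -0.31 | 0.33 | -0.92 | 0.77 | -0.35 | 0.38 | -1 | 0.94 |
| LN | Proprioceptive | -0.45 | 0.51 | -0.99 | 1 | -0.31 | 0.42 | -1 | 1 |
| LP | Baseline | | | | | 0.67 | 0.29 | -0.34 | 1 |
| LP | Flubber | 0.74 | 0.33 | -0.95 | 1 | 0.74 | 0.29 | -0.66 | 1 |
| LP | Grid | 0.58 | 0.25 | -0.2 | 0.98 | 0.63 | 0.34 | -0.65 | 1 |
| LP | Proprioceptive | 0.76 | 0.24 | 0.05 | 1 | 0.74 | 0.23 | -0.04 | 1 |
Arousal
| Video | Feedback | CR mean: mean | CR mean: std | CR mean: min | CR mean: max | SR: mean | SR: std | SR: min | SR: max |
|---|---|---|---|---|---|---|---|---|---|
| HN | Baseline | | | | | 0.52 | 0.45 | -1 | 1 |
| HN | Flubber | 0.31 | 0.55 | -0.98 | 1 | 0.64 | 0.31 | -0.54 | 1 |
| HN | Grid | 0.43 | 0.32 | -0.56 | 0.93 | 0.6 | 0.35 | -0.28 | 1 |
| HN | Proprioceptive | 0.33 | 0.48 | -0.86 | 0.97 | 0.57 | 0.42 | -0.62 | 1 |
| HP | Baseline | | | | | 0.58 | 0.31 | -0.48 | 1 |
| HP | Flubber | 0.55 | 0.42 | -0.43 | 1 | 0.67 | 0.26 | -0.13 | 1 |
| HP | Grid | 0.51 | 0.3 | -0.65 | 0.99 | 0.59 | 0.31 | -0.6 | 1 |
| HP | Proprioceptive | 0.5 | 0.44 | -0.89 | 1 | 0.6 | 0.31 | -0.39 | 1 |
| LN | Baseline | | | | | -0.36 | 0.43 | -1 | 0.91 |
| LN | Flubber | -0.7 | 0.45 | -1 | 0.94 | -0.46 | 0.42 | -1 | 0.76 |
| LN | Grid | -0.3 | 0.41 | -1 | 0.85 | -0.33 | 0.45 | -1 | 0.85 |
| LN | Proprioceptive | -0.65 | 0.41 | -1 | 1 | -0.4 | 0.44 | -1 | 0.85 |
| LP | Baseline | | | | | -0.21 | 0.49 | -1 | 0.79 |
| LP | Flubber | -0.63 | 0.47 | -1 | 0.9 | -0.36 | 0.5 | -1 | 0.96 |
| LP | Grid | -0.16 | 0.46 | -1 | 0.94 | -0.13 | 0.52 | -1 | 0.9 |
| LP | Proprioceptive | -0.38 | 0.52 | -1 | 0.89 | -0.2 | 0.56 | -1 | 0.87 |
Legend: CR: continuous rating; SR: summary rating; HN: high arousal - negative valence; HP: high arousal - positive valence;
LN: low arousal - negative valence; LP: low arousal - positive valence. Baseline rows contain SR only, as no CR was collected in that condition.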
Figure 4. Overview of the ratings in Study 1: Selection. A. Means across participants of the CR mean (over
timepoints) for each feedback (color coded; Grid: red, Flubber: green, Proprioceptive: blue, Baseline: purple)
and each video (shape coded). Density plots and individual dots are also shown for each feedback and each
video. B. SRs for each feedback and each video. Density plots and individual dots are also shown for each
feedback and each video. CR means and SRs were consistent with the quadrants the videos were selected for.
HN: high arousal - negative valence; HP: high arousal - positive valence; LN: low arousal - negative valence;
LP: low arousal - positive valence.
For each affective dimension (valence, arousal), we computed Pearson correlations between SR and each CRi
across all feedback options, videos, and participants. CR mean was the index most strongly correlated with the SR (valence:
r(CR_mean-SR) = .926, p < .001; arousal: r(CR_mean-SR) = .873, p < .001, see Table 6). Therefore, for the
following analyses comparing the different feedback options, the focus was on the CR mean. More details and
other CRis can be found in the Supplementary Material.
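To make the CR indices concrete, the sketch below computes a subset of them from one rating time-series and defines a plain Pearson correlation. It is an illustration only and not the authors' R code: the use of the sample standard deviation, the trapezoidal rule for the area under the curve, and the function names cr_indices and pearson_r are assumptions; the default rate reflects the 20 Hz sampling frequency reported above.

```python
import math
import statistics


def cr_indices(samples: list[float], rate_hz: float = 20.0) -> dict[str, float]:
    """Summarize one continuous-rating time-series (one affective dimension)."""
    dt = 1.0 / rate_hz
    # Trapezoidal area under the curve over the rated period.
    auc = sum((a + b) / 2.0 * dt for a, b in zip(samples, samples[1:]))
    return {
        "cr_last": samples[-1],
        "cr_mean": statistics.fmean(samples),
        "cr_median": statistics.median(samples),
        "cr_max": max(samples),
        "cr_min": min(samples),
        "cr_std": statistics.stdev(samples),  # sample std (assumed)
        "cr_range": max(samples) - min(samples),
        "cr_auc": auc,
    }


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, one index (e.g., cr_mean) would be collected per trial and correlated with the corresponding SRs via pearson_r, separately for valence and arousal.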
Feedback comparison
CR mean-SR correlation
The CR means were equivalent between Flubber and Proprioceptive (TOST approach, all p > .05 for difference;
all p < .025 for equivalence, see Table S4 in Supplementary Material). There was a significant difference
between Grid and the other two feedback options (i.e., Flubber and Proprioceptive) for the dimension of arousal
(all t > 5.3; all p < .001; see Table S4 in Supplementary Material), but not for valence. Additional feedback
comparisons of CR std, skewness and kurtosis can be found in Table S4 in Supplementary Material.
The SRs for all feedback options and during Baseline were equivalent for both affective dimensions (TOST
approach, all p > .05 for difference, all p < .010 for equivalence, see Table S5 in Supplementary Material).
Finally, we compared the correlations between CR mean and SR for each feedback and affective dimension. For
valence, there were no significant differences in CR mean-SR correlation between feedback options (all z < 1.5;
all p > .125, see Figure 5 and Table S6 in Supplementary Material). For arousal, the correlation for Grid was
significantly higher than for the other two feedback options (all z > 2.2, all p < .025, see Figure 5 and Table S6
in Supplementary Material).
Table 6. Pearson correlations between SR and each CR index, across all feedback options, videos and
participants, in Study 1: Selection.
| Pearson r(CRi-SR) | Valence: r | Valence: p | Arousal: r | Arousal: p |
|---|---|---|---|---|
| cr_last | .903 | < .001 | .852 | < .001 |
| cr_mean | .926 | < .001 | .873 | < .001 |
| cr_median | .918 | < .001 | .865 | < .001 |
| cr_mode | .831 | < .001 | .721 | < .001 |
| cr_max | .726 | < .001 | .793 | < .001 |
| cr_min | .767 | < .001 | .609 | < .001 |
| cr_std | -.089 | < .001 | .227 | < .001 |
| cr_cv | -.009 | .833 | .091 | .024 |
| cr_range | -.147 | < .001 | .315 | < .001 |
| cr_iqr | -.051 | .21 | .154 | < .001 |
| cr_skew | -.539 | < .001 | -.471 | < .001 |
| cr_kurtosis | .019 | .633 | -.138 | < .001 |
| cr_auc | .922 | < .001 | .871 | < .001 |
Legend: last: last rating; cv: coefficient of variation; iqr: interquartile range; skew: skewness; auc: area under the curve.
Figure 5. Correlation between continuous rating (CR) mean and summary rating (SR). A. Valence dimension.
SR plotted against CR mean for each feedback (color coded; Grid: red, Flubber: green, Proprioceptive: blue)
and each video (shape coded), both for the Selection and Evaluation studies. There were no significant
differences in CR mean-SR correlation between feedback options (all z < 1.5; all p > .125), in the Selection
study. B. Arousal dimension. The correlation for Grid was significantly higher than for the other two feedback
options (all z > 2.2, all p < .025), in the Selection study. HN: high arousal - negative valence; HP: high arousal
- positive valence; LN: low arousal - negative valence; LP: low arousal - positive valence.
Questionnaires
Scores to each questionnaire are described in Table 7 and illustrated in Figure 6.
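Table 7. Descriptive statistics of each questionnaire for each feedback in Study 1: Selection.
| Questionnaire | Baseline: max | Baseline: mean | Baseline: min | Baseline: std | Flubber: max | Flubber: mean | Flubber: min | Flubber: std | Grid: max | Grid: mean | Grid: min | Grid: std | Proprioceptive: max | Proprioceptive: mean | Proprioceptive: min | Proprioceptive: std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Emotion Representation | 6 | 4.27 | 1 | 1.47 | 6 | 4.78 | 2 | 0.99 | 6 | 4.47 | 1 | 1.27 | 6 | 4.16 | 1 | 1.3 |
| Distraction | 3 | 0.47 | 0 | 0.81 | 5 | 1.57 | 0 | 1.81 | 6 | 1.47 | 0 | 1.65 | 6 | 1.57 | 0 | 1.8 |
| Sense of Presence | 6 | 4.21 | 1 | 1.16 | 6 | 4.21 | 1 | 1.14 | 6 | 4.31 | 1 | 1.08 | 6 | 4.06 | 0.5 | 1.28 |
| Satisfaction | 6 | 4.67 | 2 | 1.14 | 6 | 4.92 | 1 | 1.23 | 6 | 4.94 | 2 | 1.05 | 6 | 4.22 | 0 | 1.55 |
| System Usability Scale | 99.96 | 89.32 | 52.36 | 11.23 | 99.96 | 82.18 | 33.32 | 15.76 | 99.96 | 87.59 | 49.98 | 10.83 | 99.96 | 75.13 | 42.84 | 16.05 |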
Differences in Distraction, Sense of Presence, Usability, Satisfaction, and Emotion Representation scores were
tested comparing participants’ responses to the different questionnaires between the feedback options and
including Baseline (i.e., no CR, only SR; see Figure 6 and Table S7 in Supplementary Material).
For the Distraction questionnaire, there was a significant effect of feedback (F(3) = 6.8, p < .001), where no CR
during videos (i.e., Baseline condition) was significantly less distracting than CR during videos (i.e., for all of the
feedback options; all t > 4.0; all p < .001). Importantly, there were no significant differences between the
feedback options (all t < 0.33; all p > .992). For Sense of Presence, there was no significant effect of feedback
(F(3) = 1.23, p = .301). The responses were equivalent for all feedback options as well as the Baseline
condition (TOST approach, all p > .310 for difference, all p < .020 for equivalence). For Usability, there was a
significant effect of feedback (F(3) = 17.9, p < .001). The SUS scores were the highest for Grid and Baseline
(all |t| > 2.5, all p < .020). The SUS score was also higher for Flubber than Proprioceptive (t(50) = 3.0, p =
.010). For Emotion Representation, the score was significantly higher for Flubber than for Proprioceptive (t(50)
= 3.6, p < .001) and equivalent between Grid, Proprioceptive and Baseline (TOST approach, all p > .090 for
difference, all p < .050 for equivalence). For Satisfaction, there was a significant effect of feedback (F(3) = 4.87,
p = .003). The scores were equivalent for Flubber and Grid (TOST approach, p = .920 for difference, all p <
.010 for equivalence), while scores for Proprioceptive were lower than for Flubber and Grid (all t < 3.01, all p >
.010). Finally, Flubber was ranked first as preferred feedback (62% of participants) and Grid second (27% of
participants). Figure 6 shows a radar plot illustrating the mean scores of questionnaires across participants for
each feedback.
Figure 6. Questionnaires mean scores across participants for each feedback (color coded; Grid: red, Flubber:
green, Proprioceptive: blue, Baseline: purple), in Study 1: Selection. Scores are normalized to 0-100 for
visualization purposes. Distraction and Sense of Presence were part of the Interference assessment of the
feedback options, while Usability, Satisfaction and Emotion Representation were part of the User Experience
assessment.
Study 2: Evaluation
Sample
The characteristics of the sample that participated in the Evaluation study are illustrated in Table 4. Results of
MAIA mean scores indicated that the participants generally had an overall good ability to recognize and
understand bodily sensations related to emotions. TAS-20 scores are within the non-alexithymic range.
CR and SR
CR mean (over timepoints) and SR are described in Table 8. There was no significant effect of test site
(valence: F(1) = 1.78, p = .187; arousal: F(1) = 0.40, p = .528) or gender (valence: F(2) = 1.87, p = .162;
arousal: F(2) = 1.24, p = .294) on CR mean for either affective dimension.
There was no significant effect of test site (valence: F(1) = 0.10, p = .755; arousal: F(1) = 1.93, p = .169) or
gender (valence: F(2) = 0.91, p = .406; arousal: F(2) = 0.18, p = .835) on SR for either affective dimension.
Table 8. Descriptive statistics of CR mean and SR for each test site and each body posture in Study 2:
Evaluation.
Valence
| Site | Posture | CR mean: mean | CR mean: std | CR mean: min | CR mean: max | SR: mean | SR: std | SR: min | SR: max |
|---|---|---|---|---|---|---|---|---|---|
| TOR | seated | 0.05 | 0.24 | -0.36 | 0.75 | 0.09 | 0.42 | -1 | 1 |
| BER | seated | 0.05 | 0.21 | -0.46 | 0.3 | 0.08 | 0.32 | -0.42 | 1 |
| BER | standing | -0.04 | 0.21 | -0.37 | 0.51 | -0.02 | 0.33 | -0.48 | 0.71 |
| TOR+BER | seated | 0.05 | 0.23 | -0.46 | 0.75 | 0.08 | 0.39 | -1 | 1 |
Arousal
| Site | Posture | CR mean: mean | CR mean: std | CR mean: min | CR mean: max | SR: mean | SR: std | SR: min | SR: max |
|---|---|---|---|---|---|---|---|---|---|
| TOR | seated | -0.09 | 0.4 | -0.83 | 0.55 | 0.39 | 0.37 | -0.53 | 0.99 |
| BER | seated | -0.38 | 0.38 | -0.96 | 0.52 | 0.2 | 0.34 | -0.42 | 0.7 |
| BER | standing | 0.02 | 0.28 | -0.79 | 0.47 | 0.33 | 0.3 | -0.44 | 0.68 |
| TOR+BER | seated | -0.19 | 0.41 | -0.96 | 0.55 | 0.33 | 0.37 | -0.53 | 0.99 |
Legend: CR: continuous rating; SR: summary rating; TOR: Torino; BER: Berlin
Similar to the Selection study, Pearson correlations between SR and CRis were computed for each affective
dimension (Table 9). We found that for valence the CR mean exhibited the highest correlation with the SR
(r(CR_mean-SR) = .186, p = .147), while for arousal it was the CR std (r(CR_std-SR) = .591, p < .001).
Table 9. Pearson correlations between SR and each CR index in Study 2: Evaluation.
| Pearson r(CRi-SR) | Valence: r | Valence: p | Arousal: r | Arousal: p |
|---|---|---|---|---|
| cr_last | -.012 | .928 | .143 | .266 |
| cr_mean | .186 | .147 | .559 | < .001 |
| cr_median | .052 | .688 | .499 | < .001 |
| cr_mode | .182 | .156 | .398 | .001 |
| cr_max | NA | NA | .378 | .002 |
| cr_min | -.092 | .476 | NA | NA |
| cr_std | -.006 | .961 | .591 | < .001 |
| cr_cv | -.133 | .303 | .185 | .149 |
| cr_range | .092 | .476 | .378 | .002 |
| cr_iqr | -.069 | .596 | .493 | < .001 |
| cr_skew | -.113 | .384 | -.522 | < .001 |
| cr_kurtosis | .112 | .388 | -.400 | .001 |
| cr_auc | .175 | .173 | .549 | < .001 |
Legend: last: last rating; cv: coefficient of variation; iqr: interquartile range; skew: skewness; auc: area under the curve.
Questionnaires
Scores to each questionnaire are described in Table 10.
Table 10. Descriptive statistics of each questionnaire in Study 2: Evaluation.
| Questionnaire | mean | std | min | max |
|---|---|---|---|---|
| Emotion Representation | 5.16 | 1 | 2.71 | 7 |
| Distraction | 2.48 | 1.47 | 1 | 7 |
| Sense of Presence | 4.93 | 1.24 | 1.43 | 7 |
| Satisfaction | 5.77 | 1 | 2 | 7 |
| System Usability Scale | 81.19 | 13.26 | 28.56 | 99.96 |
Comparison Selection vs. Evaluation
CR variability
Figure 7 presents the time-series of the CR averaged across participants for both affective dimensions and both
phases. As anticipated, there was minimal variability in the ratings during the Selection study, whereas higher
variability was observed during the Evaluation study.
Figure 7. Comparison of time-series of continuous ratings (CRs, possible range for both affective dimensions:
[-1 1]). A. Study 1: Selection. Average across participants for each feedback (color coded; Grid: red, Flubber:
green, Proprioceptive: blue) and each video (shape coded), for valence (top) and arousal (bottom). CRs show
low affective variability and are in line with the videos’ quadrants. B. Study 2: Evaluation, with Flubber only.
Average across participants for valence (top) and arousal (bottom). CRs show high affective variability over
time. Colored lines: individual participants; green line: mean across participants.
To compare the variability of the CRs with the Flubber feedback between the two studies, ANOVAs (type 3)
with factor video (five levels: Selection HN, Selection HP, Selection LN, Selection LP, Evaluation Sequence)
were performed for CR std, for each affective dimension. There was a significant effect of video (valence: F(4)
= 93.0, p< .001; arousal: F(4) = 79.3, p< .001) on CR std for both affective dimensions. Post-hoc t-tests
revealed higher CR variability for Evaluation Sequence compared to the four Selection videos, for both
affective dimensions (all |t| > 10.4, p< .001, see Figure 8 and Table S9 in Supplementary Material).
Figure 8. Comparison of variability of continuous ratings (CRs) between Study 1: Selection and Study 2:
Evaluation. Standard deviation (std) of the CRs for each video in the two studies, for valence (left) and arousal
(right). CRs during the Evaluation Sequence showed higher variability than during the four Selection videos
(all |t| > 10.4, p < .001, see Supplementary Material). HN: high arousal - negative valence; HP: high arousal -
positive valence; LN: low arousal - negative valence; LP: low arousal - positive valence.
CR mean-SR correlation
The highest CRi-SR correlation for Flubber during Study 1: Selection (valence and arousal: r(CR_mean-SR))
was significantly higher than the highest CRi-SR correlation during Study 2: Evaluation (valence:
r(CR_mean-SR), arousal: r(CR_std-SR)), for both affective dimensions (valence: z = 9.5, p < .001; arousal: z =
4.5, p < .001).
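The z statistics above come from a statistical comparison of two correlations; the reference list includes the cocor package (Diedenhofen & Musch, 2015), but which of its tests was applied is not stated in this section. As one possible illustration only, the sketch below implements Fisher's z-test for two correlations from independent samples. The function name `fisher_z_independent` and the sample sizes and correlation values in the usage note are hypothetical and are not taken from the studies.

```python
import math


def fisher_z_independent(r1, n1, r2, n2):
    """Fisher z-test comparing two Pearson correlations from independent samples.

    Returns the z statistic (positive if r1 > r2) and the two-sided p-value.
    """
    # Fisher r-to-z transformation
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # standard error of the difference between the transformed correlations
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For example, `fisher_z_independent(0.9, 60, 0.2, 60)` (hypothetical values) returns a positive z, indicating that the first correlation is the larger one. If the two correlations came from overlapping or dependent samples, a different test would be needed.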
Questionnaires
To compare questionnaire responses to Flubber between the two studies, an ANOVA (type 3) with the factor
study (two levels: Selection, Evaluation) was conducted. Subsequently, post-hoc t-tests were employed to
explore specific differences within studies. Notably, there was no significant effect of study on the scores for
Distraction, Sense of Presence, Usability, and Satisfaction (all F< 1.53, all p> .220, see Table S10 in
Supplementary Material). For Emotion Representation, we found a significant effect of study (F(1) = 10.95, p<
.001), with higher responses for Study 1: Selection than for Study 2: Evaluation (t(107) = 3.31, p< .001).
Discussion
Summary of main findings
The aim of this study was to develop and empirically evaluate the AffectTracker, a tool to collect continuous
valence and emotional arousal ratings (Sabat et al., 2024) during a dynamic affective experience,
simultaneously and with minimal interference (Bakker et al., 2016). It comprises three feedback options with
adjustable features: a simplified affect grid (Grid; Russell et al., 1989), an abstract pulsating visual variant (Flubber),
and a proprioceptive variant without visual feedback. We empirically evaluated the AffectTracker in two studies
with 360-degree videos in iVR as stimuli and using the HTC Vive Pro controller’s touchpad (Riva et al., 2007;
Chirico & Gaggioli, 2019).
In Study 1 (“Selection”), the three feedback options were compared to each other as well as to a control
condition without continuous ratings. Overall, we found that (1) real-time CRs capture the variance of affective
experience (2) with a good user experience, while (3) interfering only minimally with the experience itself. Based on
the quantitative results and the qualitative assessment, Flubber was selected for Study 2 (“Evaluation”), which
included a longer (23-min) and more variable iVR stimulus (stereoscopic 360° video). Overall, the results from
Study 1 were confirmed in Study 2, indicating Flubber’s applicability for both shorter and repeated as well as
longer and more variable videos. Study 2 differed from Study 1 by (1) involving only one (Flubber) instead of
four conditions (Grid, Flubber, Proprioceptive, Baseline) and by (2) stimulation with one longer experience
with higher affective variability instead of four repeated videos with less affective variability. These differences
in setup led to two notable differences in the results: first, participants in Study 2 rated the Flubber feedback
lower in terms of how well it represented their “inner emotions”. Second, the correlation between the SR and
the CR mean was lower in Study 2, while the standard deviation emerged as the most correlated CR index. This
suggests that CR could capture nuances of affective experience that a single SR may miss, particularly during
extended and complex emotional stimuli.
AffectTracker provides flexibility by allowing researchers to choose between feedback options, depending on
the specific needs of their study—whether a more engaging, dynamic representation (like the Flubber) or a
more precise, structured visualization (as the Grid) is preferred. Its main contribution is to collect CR of
affective experience - compared to the more traditional SR - with minimal interference and high user
experience.
Comparing continuous and summary ratings
Both the mean and the standard deviation of CRs were significantly correlated with the SR across participants.
This suggests that participants integrate or implicitly consider both aspects of their affective experience, that is,
the central tendency and the dispersion, when providing SRs rather than the peak or final moments
(Fredrickson, 2000; Levine & Safer, 2002; Kaplan et al., 2016).
The correlation between CR and SR was more pronounced for emotional arousal than for valence. This may
indicate that emotional arousal, being more dynamic, is easier for participants to track and summarize than
valence, which may involve more complex, context-dependent appraisals. This observation can be better
understood in light of theoretical and empirical insights into the interplay between arousal and valence. For
instance, emotional arousal has been shown to modulate valence by enabling greater variability in valence
during high-arousal states (Petrolini & Viola, 2020). This suggests that arousal not only enhances the salience
of stimuli but also shapes how valence is experienced and reported. Such a dynamic is consistent with findings
indicating that arousal facilitates attentional shifts and enhances visual processing of salient targets (Sutherland
& Mather, 2018; Petrolini & Viola, 2020). Together, these insights underscore the distinct but interrelated
roles of arousal and valence in shaping affective experiences, highlighting the importance of capturing both
dimensions and considering their unique characteristics when interpreting CR data (Kuppens et al., 2013).
For the 1-min videos with low affective variability in Study 1, the mean of the CRs was highly correlated with the
SRs, indicating that CR with AffectTracker (irrespective of the feedback options) can effectively capture
relatively stable affective experiences, that is, experiences that are rather constant over time and whose minimal
variance can also be adequately indexed by single SRs.
As compared to Study 1, the correlation between the mean of CRs and SR was lower for the 23-min experience
with high affective variability in Study 2, where the SR was more strongly associated with CR dispersion (i.e.,
std). This indicates that for longer and more complex stimuli, the average of CRs may become less
representative of the overall experience, likely due to richer and more dynamic emotional fluctuations that are
difficult to integrate into a single rating. This finding underscores the importance of capturing affective
fluctuations over time, that is, CRs during dynamic stimulation capture variance that SRs alone may miss.
Overall, our results support the value of using CRs as a complementary method to SR. While SR remains
effective and practical for simpler, brief, and less variable stimuli, like static images, the AffectTracker can
address the limitations of SR in capturing affective dynamics during complex, continuous, and immersive
experiences. CR provides a richer and more granular understanding of how affective states evolve over time,
especially in tasks that require ongoing engagement and feature varying emotional content. By integrating both
CR and SR, researchers can obtain a more comprehensive picture of affective responses, tailored to the
complexity of the stimuli being studied. Time-resolved ratings of subjective experience also
enable joint analyses with other time series, for example those collected with electrophysiological methods such
as EEG or ECG.
Interference and User Experience
One challenge of real-time CR is that it could interfere with the affective experience (Lieberman et al., 2011).
One of our goals was therefore to assess the levels of Interference and User Experience of the AffectTracker.
First, we leveraged the advantages of peripheral feedback (Bakker et al., 2016) by positioning the visual
feedback at the center-bottom of the field of view. Our empirical assessment showed that, in general,
continuous rating with the AffectTracker did not interfere with the experience, irrespective of the feedback option. While the
subjective ratings of Distraction were, as expected, higher during CR than without ratings (i.e., baseline), the
post-stimulation SRs as well as the Sense of Presence were equivalent when continuously rating compared to
no rating, suggesting that the affective experience overall was not altered. This may have been facilitated by the
excellent Usability of the Grid and Flubber (grade A, SUS score > 80.3; Sauro & Lewis, 2016). The
Proprioceptive feedback had a significantly lower Usability and Satisfaction, suggesting that rating without
feedback could be more difficult/demanding and that the Grid and Flubber feedback helped.
Overall our results suggest that the Grid and Flubber can be used interchangeably, depending on the study’s or
experimenter’s priority and the target users. While the Grid offers a more classical, precise and structured
visualization, the Flubber offers a more intuitive, engaging, and dynamic representation. The ratings on the
arousal (but not valence) dimension were significantly different for Grid than for Flubber (and Proprioceptive).
On the one hand, the ratings with Grid showed a stronger tendency towards the center (i.e., the 0,0 coordinate) of the
affect grid. While for valence, this center (middle of an unpleasant-to-pleasant gradient) represents a neutral
state, this is less clear for the dimension of arousal (medium on a low-to-high gradient). Therefore, the
arousal ratings may have been biased by the visual representation of the grid. On the other hand, Flubber
ratings showed a stronger tendency towards the extremes. As opposed to Grid, the limits of the ratings are not
visualized for Flubber, so its ratings may have been prone to ceiling effects. Additionally, Emotion Representation was
rated higher for Flubber than for Grid. This difference could be attributed to how dynamic each of
the two feedback options is. While Grid relies on the movement of a dot within a static grid to abstractly represent the
transient affective state, Flubber dynamically changes as a whole. This dynamic behavior could lead to
Flubber’s affect-visual mapping being more intuitive and easier to interpret, particularly for “naive” participants
who may lack prior experience or expertise.
Comparing the AffectTracker to other tools
The findings from both studies underscore the promise of the AffectTracker as an effective tool for capturing
continuous, real-time affective experiences with minimal interference—marking a notable advancement beyond
SR. Our design stands out from traditional affect-tracking concepts, such as Likert scales and self-assessment
tools like the SAM (Bradley & Lang, 1994), as well as continuous models using 2D affect grids (Russell et al.,
1989), by incorporating features already present in the technical (e.g., human-computer interaction) literature
(McClay et al., 2023; Xue et al., 2021; Sharma et al., 2017). AffectTracker, particularly with the Flubber
feedback, demonstrated a balance between intuitiveness, good user experience and minimal interference,
effectively capturing the intricate dynamics of valence and arousal within iVR environments. Our results
contribute to the growing body of research advocating for methods that seamlessly integrate into the user’s
experience, facilitating precise, real-time emotional assessments without interfering with the immersive nature
of VR (Riva et al., 2007; Chirico & Gaggioli, 2019).
Limitations and implications for future research
Several limitations should be acknowledged in this study. First, one of the visual features of the Flubber
feedback, the oscillation frequency used to map the arousal dimension, has a range that falls within physiologically
possible pulsation frequencies (e.g., heart rate). While this could have added to the intuitiveness of the feedback,
it could also act as a ‘fake’ biofeedback, such that the rating could influence heart rate, or vice versa. This should
be further examined, for example, by recording participants’ heart activity during the ratings and investigating
potential entrainment.
Second, while the AffectTracker tool and its Flubber visual feedback were validated using the touchpad of the
HTC Vive Pro controller, the applicability to other input devices, such as joysticks, and compatibility with
different VR headsets remain untested. Although technically feasible with various OpenXR-compatible VR
equipment, further validation across different platforms is warranted to ensure broader applicability.
Additionally, the interactivity of the stimulus was limited to head movement and full body rotation, restricting
participants to passive viewing - as compared to more interactive games or tasks. This constraint may have
influenced engagement levels and user experience perceptions. Furthermore, the generalizability of the findings
may be limited by the relatively young and healthy participants included in the study. Future research could
address these limitations by expanding validation to diverse input devices and age groups, thus enhancing the
robustness and applicability of the AffectTracker.
The data collected in these studies offer a valuable opportunity to better explore the dynamics of affective
states, with a particular focus on subjective experience. Future analyses could examine how fluctuations in CR
correspond to specific salient events within the stimuli, investigating different features. Further, exploring the
distance and angle between valence and arousal dimensions within the affective space could provide insights
into their joint dynamics and potential cross-influences. Additionally, inter-individual differences in
interoceptive and emotional awareness could be analyzed to determine whether these factors influence
sensitivity to change-points or whether the Flubber feedback remains consistently effective across varying
levels of awareness.
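As an illustration of the distance-and-angle idea mentioned above, one possible reading is to express each (valence, arousal) sample in polar coordinates within the affective space. The sketch below assumes both dimensions are rated on the [-1, 1] range used by the CRs; the function name `to_polar` and the convention of measuring the angle counter-clockwise from the positive valence axis are illustrative choices, not part of the AffectTracker.

```python
import numpy as np


def to_polar(valence, arousal):
    """Convert paired valence/arousal ratings to polar coordinates.

    Returns (distance, angle): distance from the origin (0, 0) of the affective
    space, and angle in radians, measured counter-clockwise from the positive
    valence axis.
    """
    v = np.asarray(valence, dtype=float)
    a = np.asarray(arousal, dtype=float)
    distance = np.hypot(v, a)   # sqrt(v**2 + a**2)
    angle = np.arctan2(a, v)    # in (-pi, pi]
    return distance, angle
```

Under this convention, a sample with maximal positive valence and medium arousal, (1, 0), has distance 1 and angle 0, while a sample with neutral valence and maximal arousal, (0, 1), has distance 1 and angle pi/2.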
In the future, integrating the AffectTracker with physiological recordings could provide insights into the
relationship between subjective experience and brain-body activities. This would enable the study of affective
phenomena in healthy subjects and help identify discrepancies in psychopathological contexts, such as
interoceptive challenges or difficulties in emotional regulation.
Conclusion
We developed and evaluated the AffectTracker, an open-source tool designed to continuously and
simultaneously capture valence and arousal ratings in real-time. The AffectTracker offers flexibility, enabling
researchers to select between visual feedback options that could suit their study context, whether a more
engaging, dynamic representation - such as the Flubber - or a precise, structured visualization - such as the
Grid.
The AffectTracker represents a novel approach to studying affective dynamics with minimal interference, while
effectively capturing the nuances of subjective affective experience. This tool broadens the scope for
investigating the intersection of subjective experience and physiological processes within immersive
environments, opening new avenues for research into real-time emotional and physiological interactions.
References
Allen, M. (2023). The Tell-Tale Heart: Interoceptive Precision and Ecological Fear Experiences. OSF.
https://doi.org/10.31234/osf.io/ngamx
Bakker, S., Hausen, D., & Selker, T. (Eds.). (2016). Peripheral Interaction. Springer International Publishing.
https://doi.org/10.1007/978-3-319-29523-7
Bengfort, T., Hayat, T., & Göttel, T. (2022). Castellum: A participant management tool for scientific studies.
Journal of Open Source Software, 7(79), 4600. https://doi.org/10.21105/joss.04600
Berntsen, D., & Rubin, D. C. (2006). Emotion and vantage point in autobiographical. Cognition and Emotion,
20(8), 1193–1215. https://doi.org/10.1080/02699930500371190
Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic
differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49–59.
https://doi.org/10.1016/0005-7916(94)90063-9
Bradley, M. M., & Lang, P. J. (2007). The International Affective Picture System (IAPS) in the study of
emotion and attention. In Handbook of emotion elicitation and assessment (pp. 29–46). Oxford University
Press.
Brooke, J. (1996). SUS: A quick and dirty usability scale. Usability Evaluation in Industry.
https://rickvanderzwet.nl/trac/personal/export/104/liacs/hci/docs/SUS-questionaire.pdf
Brooke, J. (2013). SUS: A retrospective. Journal of Usability Studies, 8(2).
http://uxpajournal.org/wp-content/uploads/sites/7/pdf/JUS_Brooke_February_2013.pdf
Cernea, D., Weber, C., Ebert, A., & Kerren, A. (2015). Emotion-prints: Interaction-driven emotion visualization
on multi-touch interfaces. Visualization and Data Analysis 2015, 9397, 82–96.
https://doi.org/10.1117/12.2076473
Chirico, A., & Gaggioli, A. (2019). When Virtual Feels Real: Comparing Emotional Responses and Presence in
Virtual and Natural Environments. Cyberpsychology, Behavior, and Social Networking, 22(3), 220–226.
https://doi.org/10.1089/cyber.2018.0393
Damasio, A. R., Grabowski, T. J., Bechara, A., Damasio, H., Ponto, L. L. B., Parvizi, J., & Hichwa, R. D.
(2000). Subcortical and cortical brain activity during the feeling of self-generated emotions. Nature
Neuroscience, 3(10), 1049–1056. https://doi.org/10.1038/79871
Diedenhofen, B., & Musch, J. (2015). cocor: A Comprehensive Solution for the Statistical Comparison of
Correlations. PLOS ONE, 10(4), e0121945. https://doi.org/10.1371/journal.pone.0121945
Doherty, J. M., Belletier, C., Rhodes, S., Jaroslawska, A., Barrouillet, P., Camos, V., Cowan, N.,
Naveh-Benjamin, M., & Logie, R. H. (2019). Dual-task costs in working memory: An adversarial collaboration.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(9), 1529–1551.
https://doi.org/10.1037/xlm0000668
Duffy, E. (1957). The psychological significance of the concept of “arousal” or “activation.” Psychological
Review, 64(5), 265–275. https://doi.org/10.1037/h0048837
Ekkekakis, P. (2013). The Measurement of Affect, Mood, and Emotion: A Guide for Health-Behavioral
Research. Cambridge University Press.
Evans, J. S. B. T. (2010). Intuition and Reasoning: A Dual-Process Perspective. Psychological Inquiry.
https://www.tandfonline.com/doi/abs/10.1080/1047840X.2010.521057
Feng, C., Bartram, L., & Riecke, B. E. (2014). Evaluating affective features of 3D motionscapes. Proceedings
of the ACM Symposium on Applied Perception, 23–30. https://doi.org/10.1145/2628257.2628264
Fox, M. D., & Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional
magnetic resonance imaging. Nature Reviews Neuroscience, 8(9), 700–711. https://doi.org/10.1038/nrn2201
Fredrickson, B. L. (2000). Extracting meaning from past affective experiences: The importance of peaks, ends,
and specific emotions. Cognition & Emotion, 14(4), 577–606. https://doi.org/10.1080/026999300402808
Fredrickson, B. L., & Kahneman, D. (1993). Duration neglect in retrospective evaluations of affective episodes.
Journal of Personality and Social Psychology, 65, 45–55. https://doi.org/10.1037/0022-3514.65.1.45
Girard, J. M., & Wright, A. G. C. (2018). DARMA: Software for dual axis rating and media annotation.
Behavior Research Methods, 50(3), 902–909. https://doi.org/10.3758/s13428-017-0915-5
Gnacek, M., Quintero, L., Mavridou, I., Balaguer-Ballester, E., Kostoulas, T., Nduka, C., & Seiss, E. (2024).
AVDOS-VR: Affective Video Database with Physiological Signals and Continuous Ratings Collected Remotely
in VR. Scientific Data, 11(1), 1–18. https://doi.org/10.1038/s41597-024-02953-6
Gross, J. J., & Muñoz, R. F. (1995). Emotion regulation and mental health. Clinical Psychology: Science and
Practice, 2(2), 151–164. https://doi.org/10.1111/j.1468-2850.1995.tb00036.x
Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject Synchronization of Cortical
Activity During Natural Vision. Science, 303(5664), 1634–1640. https://doi.org/10.1126/science.1089506
Hofmann, S. M., Klotzsche, F., Mariola, A., Nikulin, V., Villringer, A., & Gaebler, M. (2021). Decoding
subjective emotional arousal from EEG during an immersive virtual reality experience. 10.
https://doi.org/10.7554/eLife
Huk, A., Bonnen, K., & He, B. J. (2018). Beyond Trial-Based Paradigms: Continuous Behavior, Ongoing
Neural Activity, and Natural Stimuli. Journal of Neuroscience, 38(35), 7551–7558.
https://doi.org/10.1523/JNEUROSCI.1920-17.2018
Hupka, R. B., Zaleski, Z., Otto, J., Reidl, L., & Tarabrina, N. V. (1997). The Colors of Anger, Envy, Fear, and
Jealousy: A Cross-Cultural Study. Journal of Cross-Cultural Psychology, 28(2), 156–171.
https://doi.org/10.1177/0022022197282002
James, W. (1884). II.—WHAT IS AN EMOTION ? Mind, os-IX(34), 188–205.
https://doi.org/10.1093/mind/os-IX.34.188
James, W. (1894). Discussion: The physical basis of emotion. Psychological Review, 1(5), 516–529.
https://doi.org/10.1037/h0065078
Jordan, P. W. (1998). An Introduction To Usability. CRC Press. https://doi.org/10.1201/9781003062769
Kaplan, R. L., Levine, L. J., Lench, H. C., & Safer, M. A. (2016). Forgetting feelings: Opposite biases in
reports of the intensity of past emotion and mood. Emotion, 16, 309–319. https://doi.org/10.1037/emo0000127
Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator Sickness Questionnaire: An
Enhanced Method for Quantifying Simulator Sickness. The International Journal of Aviation Psychology, 3(3),
203–220. https://doi.org/10.1207/s15327108ijap0303_3
Kragel, P. A., & LaBar, K. S. (2013). Multivariate pattern classification reveals autonomic and experiential
representations of discrete emotions. Emotion, 13(4), 681–690. https://doi.org/10.1037/a0031820
Kreibig, S. D. (2010). Autonomic nervous system activity in emotion: A review. Biological Psychology, 84(3),
394–421. https://doi.org/10.1016/j.biopsycho.2010.03.010
Kron, A., Goldstein, A., Lee, D. H.-J., Gardhouse, K., & Anderson, A. K. (2013). How Are You Feeling?
Revisiting the Quantification of Emotional Qualia. Psychological Science, 24(8), 1503–1511.
https://doi.org/10.1177/0956797613475456
Kunin, T. (1998). The Construction of a New Type of Attitude Measure. Personnel Psychology, 51(4), 823–824.
https://doi.org/10.1111/j.1744-6570.1998.tb00739.x
Kuppens, P., Tuerlinckx, F., Russell, J. A., & Barrett, L. F. (2013). The relation between valence and arousal in
subjective experience. Psychological Bulletin, 139(4), 917–940. https://doi.org/10.1037/a0030811
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence Testing for Psychological Research: A Tutorial.
Advances in Methods and Practices in Psychological Science, 1(2), 259–269.
https://doi.org/10.1177/2515245918770963
Leiner, D. J. (2021). SoSci Survey (Version 3.2.31) [Computer software]. München: SoSci Survey GmbH.
Leising, D., Grande, T., & Faber, R. (2009). The Toronto Alexithymia Scale (TAS-20): A measure of general
psychological distress. Journal of Research in Personality, 43(4), 707–710.
https://doi.org/10.1016/j.jrp.2009.03.009
Levine, L. J., & Safer, M. A. (2002). Sources of Bias in Memory for Emotions. Current Directions in
Psychological Science, 11(5), 169–173. https://doi.org/10.1111/1467-8721.00193
Lieberman, M. D., Inagaki, T. K., Tabibnia, G., & Crockett, M. J. (2011). Subjective responses to emotional
stimuli during labeling, reappraisal, and distraction. Emotion, 11(3), 468–480. https://doi.org/10.1037/a0023503
Lindquist, K. A. (2013). Emotions Emerge from More Basic Psychological Ingredients: A Modern
Psychological Constructionist Model. Emotion Review, 5(4), 356–368.
https://doi.org/10.1177/1754073913489750
Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., & Barrett, L. F. (2012). The brain basis of emotion:
A meta-analytic review. Behavioral and Brain Sciences, 35(03), 121–143.
https://doi.org/10.1017/S0140525X11000446
Lipson-Smith, R., Bernhardt, J., Zamuner, E., Churilov, L., Busietta, N., & Moratti, D. (2021). Exploring colour
in context using Virtual Reality: Does a room change how you feel? Virtual Reality, 25(3), 631–645.
https://doi.org/10.1007/s10055-020-00479-x
Madden, T. J., Hewett, K., & Roth, M. S. (2000). Managing Images in Different Cultures: A Cross-National
Study of Color Meanings and Preferences. Journal of International Marketing, 8(4), 90–107.
https://doi.org/10.1509/jimk.8.4.90.19795
Marcotti, P., & St. Jacques, P. L. (2018). Shifting visual perspective during memory retrieval reduces the
accuracy of subsequent memories. Memory, 26(3), 330–341. https://doi.org/10.1080/09658211.2017.1329441
McCall, C., Hildebrandt, L. K., Bornemann, B., & Singer, T. (2015). Physiophenomenology in retrospect:
Memory reliably reflects physiological arousal during a prior threatening experience. Consciousness and
Cognition, 38, 60–70. https://doi.org/10.1016/j.concog.2015.09.011
McCall, C., Hildebrandt, L. K., Hartmann, R., Baczkowski, B. M., & Singer, T. (2016). Introducing the
Wunderkammer as a tool for emotion research: Unconstrained gaze and movement patterns in three
emotionally evocative virtual worlds. Computers in Human Behavior, 59, 93–107.
https://doi.org/10.1016/j.chb.2016.01.028
McCall, C., Schofield, G., Halgarth, D., Blyth, G., Laycock, A., & Palombo, D. J. (2022). The underwood
project: A virtual environment for eliciting ambiguous threat. Behavior Research Methods.
https://doi.org/10.3758/s13428-022-02002-3
McClay, M., Sachs, M. E., & Clewett, D. (2023). Dynamic emotional states shape the episodic structure of
memory. Nature Communications, 14(1), Article 1. https://doi.org/10.1038/s41467-023-42241-2
Mehling, W. E., Acree, M., Stewart, A., Silas, J., & Jones, A. (2018). The multidimensional assessment of
interoceptive awareness, version 2 (MAIA-2). PloS One, 13(12), e0208034.
Nagel, F., Kopiez, R., Grewe, O., & Altenmüller, E. (2007). EMuJoy: Software for continuous measurement of
perceived emotions in music. Behavior Research Methods, 39(2), 283–290.
https://doi.org/10.3758/BF03193159
Nair, S., Sagar, M., Sollers, J., Consedine, N., & Broadbent, E. (2015). Do slumped and upright postures affect
stress responses? A randomized trial. Health Psychology: Official Journal of the Division of Health Psychology,
American Psychological Association, 34(6), 632–641. https://doi.org/10.1037/hea0000146
Petrolini, V., & Viola, M. (2020). Core affect dynamics: Arousal as a modulator of valence. Review of
Philosophy and Psychology, 11, 783–801.
Pinilla, A., Garcia, J., Raffe, W., Voigt-Antons, J.-N., Spang, R. P., & Möller, S. (2021). Affective Visualization
in Virtual Reality: An Integrative Review. Frontiers in Virtual Reality, 2, 105.
https://doi.org/10.3389/frvir.2021.630731
Riva, G., Mantovani, F., Capideville, C. S., Preziosa, A., Morganti, F., Villani, D., Gaggioli, A., Botella, C., &
Alcañiz, M. (2007). Affective Interactions Using Virtual Reality: The Link between Presence and Emotions.
CyberPsychology & Behavior, 10(1), 45–56. https://doi.org/10.1089/cpb.2006.9993
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6),
1161–1178. https://doi.org/10.1037/h0077714
Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called
emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5), 805–819.
https://doi.org/10.1037/0022-3514.76.5.805
Russell, J. A., Weiss, A., & Mendelsohn, G. A. (1989). Affect Grid: A single-item scale of pleasure and arousal.
Journal of Personality and Social Psychology, 57(3), 493–502. https://doi.org/10.1037/0022-3514.57.3.493
Saarimäki, H. (2021). Naturalistic Stimuli in Affective Neuroimaging: A Review. Frontiers in Human
Neuroscience, 15. https://doi.org/10.3389/fnhum.2021.675068
Saarimäki, H., Gotsopoulos, A., Jääskeläinen, I. P., Lampinen, J., Vuilleumier, P., Hari, R., Sams, M., &
Nummenmaa, L. (2016). Discrete Neural Signatures of Basic Emotions. Cerebral Cortex, 26(6), 2563–2573.
https://doi.org/10.1093/cercor/bhv086
Sabat, M., Dampierre, C. de, & Tallon-Baudry, C. (2024). Evidence for domain-general arousal from semantic
and neuroimaging meta-analyses reconciles opposing views on arousal (p. 2024.05.27.594944). bioRxiv.
https://doi.org/10.1101/2024.05.27.594944
Sauro, J., & Lewis, J. R. (2016). Quantifying the user experience: Practical statistics for user research. Morgan
Kaufmann.
https://books.google.com/books?hl=fr&lr=&id=USPfCQAAQBAJ&oi=fnd&pg=PP1&dq=Sauro+%26+Lewis+
2016&ots=VzZd0_8qMm&sig=vQotURtESC3T9DjsSxULWygXzUM
Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. In Trends in Cognitive Sciences
(Vol. 17). https://doi.org/10.1016/j.tics.2013.09.007
Sharma, K., Castellini, C., Stulp, F., & van den Broek, E. L. (2017). Continuous, Real-Time Emotion
Annotation: A Novel Joystick-Based Analysis Framework. IEEE Transactions on Affective Computing, 11(1),
78–84. https://doi.org/10.1109/TAFFC.2017.2772882
Siegel, E. H., Sands, M. K., Van den Noortgate, W., Condon, P., Chang, Y., Dy, J., Quigley, K. S., & Barrett, L.
F. (2018). Emotion fingerprints or emotion populations? A meta-analytic investigation of autonomic features of
emotion categories. Psychological Bulletin, 144(4), 343. https://doi.org/10.1037/bul0000128
Soriano, C., & Valenzuela, J. (2009). Emotion and colour across languages: Implicit associations in Spanish
colour terms. Social Science Information, 48(3), 421–445. https://doi.org/10.1177/0539018409106199
Sutherland, M. R., & Mather, M. (2018). Arousal (but not valence) amplifies the impact of salience. Cognition
and Emotion, 32(3), 616–622. https://doi.org/10.1080/02699931.2017.1330189
Toet, A., Heijn, F., Brouwer, A.-M., Mioch, T., & van Erp, J. B. F. (2020). An Immersive Self-Report Tool for
the Affective Appraisal of 360° VR Videos. Frontiers in Virtual Reality, 1, 552587.
https://doi.org/10.3389/frvir.2020.552587
Vuoskoski, J. K., Zickfeld, J. H., Alluri, V., Moorthigari, V., & Seibt, B. (2022). Feeling moved by music:
Investigating continuous ratings and acoustic correlates. PLOS ONE, 17(1), e0261151.
https://doi.org/10.1371/journal.pone.0261151
Wagner, V., Scharinger, M., Knoop, C. A., & Menninghaus, W. (2021). Effects of continuous self-reporting on
aesthetic evaluation and emotional responses. Poetics, 85, 101497. https://doi.org/10.1016/j.poetic.2020.101497
Westermann, R., Spies, K., Stahl, G., & Hesse, F. W. (1996). Relative effectiveness and validity of mood
induction procedures: A meta-analysis. European Journal of Social Psychology, 26(4), 557–580.
https://doi.org/10.1002/(SICI)1099-0992(199607)26:4<557::AID-EJSP769>3.0.CO;2-4
Widmaier, E., & Raff, H. (2022). Vander’s human physiology.
https://www.mheducation.com/highered/product/Vanders-Human-Physiology-Widmaier.html
Wundt, W. (1897). Outline of psychology (pp. xviii, 342). Wilhelm Engelmann.
https://doi.org/10.1037/12908-000
Xue, T., Ali, A. E., & Zhang, T. (2021, May). RCEA-360VR: Real-time, continuous emotion annotation in 360°
VR videos for collecting precise viewport-dependent ground truth labels. Conference on Human Factors in
Computing Systems - Proceedings. https://doi.org/10.1145/3411764.3445487
Xue, T., Ghosh, S., Ding, G., El Ali, A., & Cesar, P. (2020). Designing real-time, continuous emotion
annotation techniques for 360° VR videos. Conference on Human Factors in Computing Systems -
Proceedings. https://doi.org/10.1145/3334480.3382895
Yik, M., Mues, C., Sze, I. N. L., Kuppens, P., Tuerlinckx, F., De Roover, K., Kwok, F. H. C., Schwartz, S. H.,
Abu-Hilal, M., Adebayo, D. F., Aguilar, P., Al-Bahrani, M., Anderson, M. H., Andrade, L., Bratko, D., Bushina,
E., Choi, J. W., Cieciuch, J., Dru, V., … Russell, J. A. (2023). On the relationship between valence and arousal
in samples across the globe. Emotion, 23(2), 332–344. https://doi.org/10.1037/emo0001095
Supplementary Material
S1. Detailed Description of the development of different feedback options and Pilot Study.
Pilot description
The primary goal of the pilot study was to develop different feedback options for Continuous Rating (CR) of Affective
States (AS) in VR and to explore their relationship with Summary Rating (SR). We aimed to test the feasibility of real-time
continuous rating with each of the feedback options.
Twelve participants took part in a 1-hour VR session comprising four blocks of trials: three blocks, each with a different
feedback option, and one baseline block without CR. They rated their affective state twice in each trial, using CR and SR.
The feedback options were a Visual prototype with a simplified affect grid on top of the Flubber, a Tactile prototype with
vibrations of the VR controller, and a Proprioceptive prototype, i.e., no feedback, with participants relying on thumb
sensation.
Our findings revealed a preference for the Visual feedback. SR was strongly correlated with the mean CR. We found no
significant differences in Distraction among the feedback options. However, there were notable differences between
the feedback conditions (Visual, Tactile, Proprioceptive) and the Baseline condition.
Despite the short stimuli with low affective variability, comparing CR to SR proved beneficial and helped to avoid order
effects. Our next steps involved refining the prototypes based on participant feedback and conducting a comprehensive data
collection. In response to participant suggestions, we added a vibration to the Proprioceptive feedback as a reminder to
rate. We removed the Tactile feedback because the mapping of valence and arousal to vibrations proved too coarse, owing to
technical limitations of the VR controller (fine control of the vibrations was not possible).
Finally, we separated the Grid and the Flubber into two distinct visual feedback options, because participants reported
mostly using the Grid and disregarding the Flubber.
S2. Detailed Session Script for Study 1: Selection.
1. Introduction. The aim of the study is to measure emotions while watching 360° videos in VR. Participants
will continuously rate their affective states during the videos.
2. Pre-experiment digital survey.
3. What are affective states? Explanation of how valence and arousal work, giving some examples of affective states
(showing the Arousal-Valence space illustration).
a. Arousal: from low arousal to high arousal; how intense is your feeling; y-axis of the affective grid.
b. Valence: from unpleasant to pleasant; x-axis of the affective grid.
4. To help you rate your affective state in real time, we designed a rating method with 3 feedback options: a grid, a
flubber and proprioceptive feedback. There is also a baseline condition with no continuous rating
(showing the VR controller and trackpad). The rating method uses the trackpad as if the affect grid were drawn
on it: you move your finger around the trackpad to indicate your affective state.
5. The VR experience is organised in 4 sessions. Each session will begin with a training phase to familiarise yourself with the
rating method and the feedback. Then there is the ‘real’ experiment, where you will see 4 different videos, each of
which is followed by a summary rating. In a third phase, you will answer some questions about the feedback
you have just tested. At the end of one session, the next one will begin immediately, again starting with the
training phase. Everything will be preceded by written step-by-step instructions, and we will be here every step of
the way. You can stop the experiment at any time.
6. The controller you will use during the experiment will help you signal to us moment by moment what you are
feeling as you watch the videos. Please refrain from commenting or speaking out loud but use your finger on the
touchpad to indicate what you are experiencing.
7. VR setup and familiarisation with the system:
a. You are now entering a room. You will see that VR allows you to explore the space around you by turning your
head.
b. Explanation of how the controller works (touchpad and trigger).
8. Training. The training is to get used to the affect grid and the feedback. Here, look at the pictures, explore the touchpad
with your thumb, see how the feedback reacts and get familiar with it. During the experiment itself, by contrast, you will
rate continuously (live) and then give a summary rating. Please take your time to get used to the
method; whenever you feel ready, you can skip to the next picture by pulling the trigger.
a. Grid feedback: You’ll see the affect grid on the bottom of the screen. You will be using the trackpad to
indicate your affective state.
b. Flubber feedback: You’ll see a moving shape – we call it Flubber. The Flubber will change form and
movement based on the position of your finger on the trackpad/grid.
c. Proprioceptive feedback: this time there’s no visual feedback. You won’t see any grid; you’ll just have to
move your finger on the trackpad on the controller. There will be a light but continuous vibration that will
remind you to keep on rating.
d. Baseline: in this session, you are going to simply watch the videos and describe your feelings with a
single evaluation that you will be asked to give at the end of each video. To validate your answer, pull the
trigger.
9. Post-experiment digital survey.
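The rating principle described in the script (trackpad x-axis as valence, y-axis as arousal) can be sketched as follows. This is only an illustration, not the tool's implementation: the function name, the assumed raw touchpad range of [-1, 1] on both axes, and the clamping step are assumptions that do not come from the script.

```python
def touchpad_to_affect(x, y):
    """Map a thumb position on the touchpad to (valence, arousal).

    Assumption (hypothetical): the controller reports x and y in [-1, 1],
    with x increasing to the right and y increasing upwards.
    """
    def clamp(value):
        # Keep out-of-range readings inside the affect grid
        return max(-1.0, min(1.0, value))

    valence = clamp(x)  # left = unpleasant, right = pleasant
    arousal = clamp(y)  # down = low arousal, up = high arousal
    return valence, arousal


# Example: thumb slightly right of centre, near the top edge
valence, arousal = touchpad_to_affect(0.2, 0.9)
```

Under these assumptions, the centre of the trackpad corresponds to a neutral rating on both dimensions.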
S3. Detailed Session Script for Study 2: Evaluation.
1. Introduction. The purpose of the study is to measure emotions during the viewing of 360° videos within Virtual
Reality. You will continuously assess your emotional states during a sequence of videos.
2. Pre-experiment survey.
3. What are emotional states? Explanation of arousal and valence, with some examples of emotional states (show
the illustration of the Arousal-Valence Space).
a. Arousal: From low to high; how intense the feelings experienced are; the y-axis of the affective grid.
b. Valence: From left to right; how pleasant or unpleasant the feelings experienced are; the x-axis of the
affective grid.
4. To help you assess your emotional state in real-time, we have designed a new rating method with some visual
feedback: the flubber (show the VR controller and the trackpad). You will use the trackpad as if the AVS grid
were drawn on it, moving your finger to indicate your emotional state.
5. The VR experience is organised in 2 phases:
a. A training phase to become familiar with the flubber.
b. The actual experiment, in which you will watch a series of videos and provide continuous ratings
throughout. At the end of the video session, you will provide an overall evaluation.
6. You can stop the experiment at any time.
7. The controller you will use during the experiment will help you signal to us what you are experiencing while
watching the videos. Please refrain from commenting or speaking aloud but use your finger on the touchpad to
indicate what you are feeling.
8. VR setup and familiarisation with the system:
a. You are now in a room and can notice that VR allows you to explore the space by rotating your head.
b. Explanation of how the controller works (touchpad and trigger).
9. Training. While watching the video, explore the touchpad with your thumb to understand how the Flubber reacts
and to gain confidence. During the experiment, you will need to rate continuously and in real time and, immediately
after the videos, provide a summary rating. Take the time you need to get used to the Flubber.
a. During the video: This moving shape is the Flubber. The Flubber changes shape and movement based on
the position of your finger on the trackpad. As you can see, the speed of the Flubber's movements changes
based on the level of activation you are reporting. When your finger goes up, the speed increases (higher
intensity); when it goes down, the speed decreases (lower intensity). You can also see that the shape
changes based on pleasantness. When your finger goes left, the shape becomes irregular with spikes (more
unpleasant). When it goes right, the shape becomes regular and more rounded (more pleasant).
b. During the summary rating: The summary rating will appear at the end of the video. Place the dot at the
position in the space that represents the emotional state you experienced during the video and, when you have
decided, press the trigger.
10. Post-experiment survey.
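The Flubber behaviour explained in the training step (faster movement with higher arousal, a spikier shape with more unpleasant valence) can be sketched as below. The linear mappings, the parameter names `speed` and `spikiness`, and the [-1, 1] input range are hypothetical choices for illustration; the script does not specify how the tool computes these quantities.

```python
def flubber_parameters(valence, arousal):
    """Return hypothetical (speed, spikiness) values in [0, 1].

    Assumption: valence and arousal are both in [-1, 1].
    """
    speed = (arousal + 1) / 2      # finger up -> faster movement
    spikiness = (1 - valence) / 2  # finger left -> irregular, spiky shape
    return speed, spikiness


# Example: high arousal, unpleasant state -> fast and spiky
speed, spikiness = flubber_parameters(valence=-1, arousal=1)
```

Any monotonic mapping would convey the same idea; a linear one is simply the easiest to read.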
S4. Comparison of continuous rating indices (CRi) between all feedback options in the Valence and Arousal
dimensions, in Study 1: Selection.
Selection of CRi representing different statistical moments of the CR distribution over time: central tendency (mean),
dispersion (std) and shape (skewness and kurtosis). Significant p-values are followed by an asterisk and important ones
are highlighted in bold.
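CR mean

| Dimension | Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|---|
| Valence | Grid vs. Flubber | t-test | -2.027 | 0.027 | 203 | .044* | .132 |
| Valence | Grid vs. Flubber | TOST Lower | 2.651 | 0.027 | 203 | .004* | |
| Valence | Grid vs. Flubber | TOST Upper | -6.705 | 0.027 | 203 | < .001* | |
| Valence | Grid vs. Proprioceptive | t-test | -1.479 | 0.027 | 203 | .141 | .211 |
| Valence | Grid vs. Proprioceptive | TOST Lower | 3.192 | 0.027 | 203 | .001* | |
| Valence | Grid vs. Proprioceptive | TOST Upper | -6.15 | 0.027 | 203 | < .001* | |
| Valence | Flubber vs. Proprioceptive | t-test | 0.496 | 0.029 | 203 | .621 | .621 |
| Valence | Flubber vs. Proprioceptive | TOST Lower | 4.744 | 0.029 | 203 | < .001* | |
| Valence | Flubber vs. Proprioceptive | TOST Upper | -3.753 | 0.029 | 203 | < .001* | |
| Arousal | Grid vs. Flubber | t-test | 6.93 | 0.034 | 203 | < .001* | < .001* |
| Arousal | Grid vs. Flubber | TOST Lower | 10.57 | 0.034 | 203 | < .001* | |
| Arousal | Grid vs. Flubber | TOST Upper | 3.291 | 0.034 | 203 | 0.999 | |
| Arousal | Grid vs. Proprioceptive | t-test | 5.276 | 0.033 | 203 | < .001* | < .001* |
| Arousal | Grid vs. Proprioceptive | TOST Lower | 9.074 | 0.033 | 203 | < .001* | |
| Arousal | Grid vs. Proprioceptive | TOST Upper | 1.479 | 0.033 | 203 | 0.930 | |
| Arousal | Flubber vs. Proprioceptive | t-test | -2.096 | 0.031 | 203 | 0.037* | 0.037* |
| Arousal | Flubber vs. Proprioceptive | TOST Lower | 1.974 | 0.031 | 203 | 0.025* | |
| Arousal | Flubber vs. Proprioceptive | TOST Upper | -6.167 | 0.031 | 203 | < .001* | |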
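CR std

| Dimension | Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|---|
| Valence | Grid vs. Flubber | t-test | -3.543 | 0.014 | 203 | < .001* | .001* |
| Valence | Grid vs. Flubber | TOST Lower | 5.506 | 0.014 | 203 | < .001* | |
| Valence | Grid vs. Flubber | TOST Upper | -12.592 | 0.014 | 203 | < .001* | |
| Valence | Grid vs. Proprioceptive | t-test | -5.131 | 0.015 | 203 | < .001* | < .001* |
| Valence | Grid vs. Proprioceptive | TOST Lower | 3.298 | 0.015 | 203 | .001* | |
| Valence | Grid vs. Proprioceptive | TOST Upper | -13.56 | 0.015 | 203 | < .001* | |
| Valence | Flubber vs. Proprioceptive | t-test | -1.725 | 0.016 | 203 | .086 | .086 |
| Valence | Flubber vs. Proprioceptive | TOST Lower | 6.218 | 0.016 | 203 | < .001* | |
| Valence | Flubber vs. Proprioceptive | TOST Upper | -9.668 | 0.016 | 203 | < .001* | |
| Arousal | Grid vs. Flubber | t-test | -0.846 | 0.012 | 203 | .398 | .398 |
| Arousal | Grid vs. Flubber | TOST Lower | 9.291 | 0.012 | 203 | < .001* | |
| Arousal | Grid vs. Flubber | TOST Upper | -10.983 | 0.012 | 203 | < .001* | |
| Arousal | Grid vs. Proprioceptive | t-test | -5.671 | 0.014 | 203 | < .001* | < .001* |
| Arousal | Grid vs. Proprioceptive | TOST Lower | 3.096 | 0.014 | 203 | .001* | |
| Arousal | Grid vs. Proprioceptive | TOST Upper | -14.438 | 0.014 | 203 | < .001* | |
| Arousal | Flubber vs. Proprioceptive | t-test | -5.168 | 0.014 | 203 | < .001* | < .001* |
| Arousal | Flubber vs. Proprioceptive | TOST Lower | 4.006 | 0.014 | 203 | < .001* | |
| Arousal | Flubber vs. Proprioceptive | TOST Upper | -14.341 | 0.014 | 203 | < .001* | |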
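CR skewness

| Dimension | Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|---|
| Valence | Grid vs. Flubber | t-test | 0.375 | 0.237 | 203 | .708 | .984 |
| Valence | Grid vs. Flubber | TOST Lower | 0.903 | 0.237 | 203 | .184 | |
| Valence | Grid vs. Flubber | TOST Upper | -0.152 | 0.237 | 203 | .440 | |
| Valence | Grid vs. Proprioceptive | t-test | -0.02 | 0.237 | 203 | .984 | .984 |
| Valence | Grid vs. Proprioceptive | TOST Lower | 0.507 | 0.237 | 203 | .306 | |
| Valence | Grid vs. Proprioceptive | TOST Upper | -0.548 | 0.237 | 203 | .292 | |
| Valence | Flubber vs. Proprioceptive | t-test | -0.339 | 0.277 | 203 | .735 | .984 |
| Valence | Flubber vs. Proprioceptive | TOST Lower | 0.113 | 0.277 | 203 | .455 | |
| Valence | Flubber vs. Proprioceptive | TOST Upper | -0.79 | 0.277 | 203 | .215 | |
| Arousal | Grid vs. Flubber | t-test | -3.124 | 0.286 | 203 | .002* | .006* |
| Arousal | Grid vs. Flubber | TOST Lower | -2.687 | 0.286 | 203 | .996 | |
| Arousal | Grid vs. Flubber | TOST Upper | -3.561 | 0.286 | 203 | < .001* | |
| Arousal | Grid vs. Proprioceptive | t-test | -2.056 | 0.267 | 203 | .041* | .062 |
| Arousal | Grid vs. Proprioceptive | TOST Lower | -1.588 | 0.267 | 203 | .943 | |
| Arousal | Grid vs. Proprioceptive | TOST Upper | -2.525 | 0.267 | 203 | .006* | |
| Arousal | Flubber vs. Proprioceptive | t-test | 0.979 | 0.352 | 203 | .329 | .329 |
| Arousal | Flubber vs. Proprioceptive | TOST Lower | 1.334 | 0.352 | 203 | .092 | |
| Arousal | Flubber vs. Proprioceptive | TOST Upper | 0.624 | 0.352 | 203 | .733 | |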
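CR kurtosis

| Dimension | Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|---|
| Valence | Grid vs. Flubber | t-test | -1.71 | 5.255 | 203 | .089 | .133 |
| Valence | Grid vs. Flubber | TOST Lower | -1.686 | 5.255 | 203 | .953 | |
| Valence | Grid vs. Flubber | TOST Upper | -1.734 | 5.255 | 203 | .042* | |
| Valence | Grid vs. Proprioceptive | t-test | -2.194 | 4.177 | 203 | .029* | .088 |
| Valence | Grid vs. Proprioceptive | TOST Lower | -2.164 | 4.177 | 203 | .984 | |
| Valence | Grid vs. Proprioceptive | TOST Upper | -2.224 | 4.177 | 203 | .014* | |
| Valence | Flubber vs. Proprioceptive | t-test | -0.027 | 6.599 | 203 | .978 | .978 |
| Valence | Flubber vs. Proprioceptive | TOST Lower | -0.008 | 6.599 | 203 | .503 | |
| Valence | Flubber vs. Proprioceptive | TOST Upper | -0.046 | 6.599 | 203 | .482 | |
| Arousal | Grid vs. Flubber | t-test | -2.356 | 6.351 | 203 | .019* | .058 |
| Arousal | Grid vs. Flubber | TOST Lower | -2.336 | 6.351 | 203 | .990 | |
| Arousal | Grid vs. Flubber | TOST Upper | -2.375 | 6.351 | 203 | .009* | |
| Arousal | Grid vs. Proprioceptive | t-test | -1.65 | 6.554 | 203 | .101 | .151 |
| Arousal | Grid vs. Proprioceptive | TOST Lower | -1.63 | 6.554 | 203 | .948 | |
| Arousal | Grid vs. Proprioceptive | TOST Upper | -1.669 | 6.554 | 203 | .048* | |
| Arousal | Flubber vs. Proprioceptive | t-test | 0.453 | 9.153 | 203 | .651 | .651 |
| Arousal | Flubber vs. Proprioceptive | TOST Lower | 0.467 | 9.153 | 203 | .321 | |
| Arousal | Flubber vs. Proprioceptive | TOST Upper | 0.44 | 9.153 | 203 | .670 | |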
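The four continuous rating indices compared in S4 (mean, std, skewness, kurtosis) summarise one CR time series per trial. A minimal sketch of how such indices could be computed is given below. It is not the authors' analysis code: the function name is hypothetical, and the use of population moments (dividing by n) and of excess kurtosis (normal distribution = 0) are assumptions, since the conventions are not stated here.

```python
import math


def cr_indices(samples):
    """Mean, std, skewness and excess kurtosis of one CR time series."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments, population form (divided by n): an assumption
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m3 = sum((s - mean) ** 3 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    flat = m2 == 0  # constant rating: shape indices are undefined
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": float("nan") if flat else m3 / m2 ** 1.5,
        "kurtosis": float("nan") if flat else m4 / m2 ** 2 - 3,
    }


# Hypothetical valence ratings sampled during one video
indices = cr_indices([-1.0, 0.0, 1.0])
```

A rating that never moves yields a standard deviation of zero, in which case skewness and kurtosis are returned as NaN rather than raising an error.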
S5. Comparison of summary ratings (SR) for all feedback options and during Baseline in the Valence and Arousal
dimensions, in Study 1: Selection.
The SR for all feedback options and during Baseline were equivalent for both affective dimensions (TOST approach, all p
> .05 for difference, all p ≤ .010 for equivalence). Significant p-values are followed by an asterisk and important ones are
highlighted in bold.
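| Dimension | Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|---|
| Valence | Grid vs Baseline | t-test | -2.4 | 0.02 | 203 | .022* | .051 |
| Valence | Grid vs Baseline | TOST Lower | 3.29 | 0.02 | 203 | < .001* | |
| Valence | Grid vs Baseline | TOST Upper | -8.09 | 0.02 | 203 | < .001* | |
| Valence | Flubber vs Baseline | t-test | -0.05 | 0.02 | 203 | .960 | .960 |
| Valence | Flubber vs Baseline | TOST Lower | 5.83 | 0.02 | 203 | < .001* | |
| Valence | Flubber vs Baseline | TOST Upper | -5.92 | 0.02 | 203 | < .001* | |
| Valence | Proprioceptive vs Baseline | t-test | -0.53 | 0.02 | 203 | .600 | .900 |
| Valence | Proprioceptive vs Baseline | TOST Lower | 5.11 | 0.02 | 203 | < .001* | |
| Valence | Proprioceptive vs Baseline | TOST Upper | -6.16 | 0.02 | 203 | < .001* | |
| Arousal | Grid vs Baseline | t-test | 1.55 | 0.03 | 203 | .120 | .370 |
| Arousal | Grid vs Baseline | TOST Lower | 5.62 | 0.03 | 203 | < .001* | |
| Arousal | Grid vs Baseline | TOST Upper | -2.52 | 0.03 | 203 | 0.010* | |
| Arousal | Flubber vs Baseline | t-test | -0.37 | 0.03 | 203 | .710 | .730 |
| Arousal | Flubber vs Baseline | TOST Lower | 4.18 | 0.03 | 203 | < .001* | |
| Arousal | Flubber vs Baseline | TOST Upper | -4.92 | 0.03 | 203 | < .001* | |
| Arousal | Proprioceptive vs Baseline | t-test | 0.34 | 0.03 | 203 | .730 | .730 |
| Arousal | Proprioceptive vs Baseline | TOST Lower | 4.57 | 0.03 | 203 | < .001* | |
| Arousal | Proprioceptive vs Baseline | TOST Upper | -3.88 | 0.03 | 203 | < .001* | |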
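The TOST approach used in S4, S5 and S7 declares two conditions equivalent when the observed difference is significantly above a lower bound and significantly below an upper bound. The sketch below illustrates the logic with the Python standard library only. The equivalence bound, the example numbers and all function names are hypothetical (the bounds used in the study are not stated in this section), and the t distribution is integrated numerically with Simpson's rule purely to avoid external dependencies.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF of Student's t via Simpson's rule on [0, |x|] (steps is even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def tost(diff, se, df, bound):
    """Two one-sided tests against the equivalence bounds -bound and +bound."""
    t_lower = (diff + bound) / se      # H0: diff <= -bound
    t_upper = (diff - bound) / se      # H0: diff >= +bound
    return {
        "t_lower": t_lower, "p_lower": 1 - t_cdf(t_lower, df),
        "t_upper": t_upper, "p_upper": t_cdf(t_upper, df),
    }


# Hypothetical example: small difference, bound of 0.125 rating units
result = tost(diff=-0.05, se=0.027, df=203, bound=0.125)
equivalent = max(result["p_lower"], result["p_upper"]) < 0.05
```

With this sign convention, equivalence shows up as a positive "TOST Lower" t-value and a negative "TOST Upper" t-value, both significant, which is the pattern seen in most rows of the tables above.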
S6. Comparison of CRis-SR Pearson correlations between all feedback options in the Valence and Arousal
dimensions, in Study 1: Selection.
Selection of CRi representing different statistical moments of the CR distribution over time: central tendency (mean),
dispersion (std) and shape (skewness and kurtosis). For valence, there were no significant differences in CR mean-SR
correlation between feedback options (all z ≤ 1.534; all p ≥ .125). For arousal, the correlation for Grid was significantly
higher than for the other two feedback options (all z > 2.2, all p ≤ .025). Significant p-values are followed by
an asterisk and important ones are highlighted in bold.
| | Valence: mean | Valence: std | Valence: skewness | Valence: kurtosis | Arousal: mean | Arousal: std | Arousal: skewness | Arousal: kurtosis |
|---|---|---|---|---|---|---|---|---|
| Grid | 0.95 | 0.062 | -0.581 | 0.95 | 0.938 | 0.101 | -0.507 | -0.012 |
| Flubber | 0.922 | -0.114 | -0.539 | 0.922 | 0.873 | 0.321 | -0.528 | -0.167 |
| Proprioceptive | 0.92 | -0.174 | -0.565 | 0.92 | 0.852 | 0.243 | -0.437 | -0.156 |
| Grid vs. Flubber | z = 1.459, p = .144 | z = 0.975, p = .330 | z = -0.326, p = .745 | z = -0.848, p = .397 | z = 2.237, p = 0.025* | z = -1.264, p = .206 | z = 0.151, p = .880 | z = 0.759, p = .448 |
| Grid vs. Proprioceptive | z = 1.534, p = .125 | z = 1.248, p = .212 | z = -0.124, p = .901 | z = -0.267, p = 0.79 | z = 2.679, p = .007* | z = -0.757, p = .449 | z = -0.457, p = .648 | z = 0.714, p = .475 |
| Flubber vs. Proprioceptive | z = 0.083, p = .934 | z = 0.352, p = .725 | z = 0.208, p = .836 | z = 0.594, p = .552 | z = 0.527, p = .598 | z = 0.489, p = .625 | z = -0.607, p = .544 | z = -0.052, p = .958 |
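The correlations in S6 relate a continuous rating index (for example the CR mean of each trial) to the summary rating given after that trial. The following sketch shows the computation for the CR mean with a plain Pearson correlation. The data and variable names are invented for illustration, and the z-tests used in S6 to compare correlations between feedback options are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one CR time series and one SR per trial
cr_series = [[0.1, 0.3, 0.2], [-0.5, -0.4, -0.6], [0.7, 0.9, 0.8]]
sr = [0.25, -0.45, 0.75]

cr_means = [sum(series) / len(series) for series in cr_series]
r = pearson_r(cr_means, sr)
```

The same function can be applied to any other index (std, skewness, kurtosis) by replacing `cr_means` with the corresponding per-trial values.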
S7. Comparison of responses to questionnaires between all feedback options, and including Baseline, in Study 1: Selection.
For the Distraction questionnaire, no CR during videos (i.e., Baseline condition, only SR) was significantly less invasive than CR during videos (i.e., with any of the feedback options; all t >
4.0; all p < .001). Importantly, there were no significant differences between the feedback options (all |t| ≤ 0.33; all p ≥ .920). For Sense of Presence, the responses were
equivalent for all feedback options as well as the Baseline condition (TOST approach, all p ≥ .310 for difference, all p ≤ .030 for equivalence). For Usability, the SUS scores
were the highest for Grid and Baseline (i.e., no CR, only SR; all |t| ≥ 2.49, all p ≤ .020). The SUS score was also higher for Flubber than Proprioceptive (t(50) = 3.0, p = .010).
For Satisfaction, the scores were equivalent for Flubber and Grid (TOST approach, p = .920 for difference, all p ≤ .010 for equivalence). Scores for Proprioceptive were lower
than for Flubber and Grid (all t ≥ 3.01, all p ≤ .010). For Emotion Representation, the scores were significantly higher for Flubber than for Proprioceptive (t(50) = 3.6, p <
.001) and equivalent between Grid, Proprioceptive and Baseline (TOST approach, all p > .090 for difference, all p ≤ .050 for equivalence). Significant p-values are followed by
an asterisk and important ones are highlighted in bold.
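Distraction

| Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|
| Grid vs Baseline | t-test | 4.02 | 0.25 | 50 | < .001* | < .001* |
| Grid vs Baseline | TOST Lower | 6.03 | 0.25 | 50 | < .001* | |
| Grid vs Baseline | TOST Upper | 2.01 | 0.25 | 50 | 0.98 | |
| Flubber vs Baseline | t-test | 4.3 | 0.26 | 50 | < .001* | < .001* |
| Flubber vs Baseline | TOST Lower | 6.25 | 0.26 | 50 | < .001* | |
| Flubber vs Baseline | TOST Upper | 2.34 | 0.26 | 50 | 0.99 | |
| Proprioceptive vs Baseline | t-test | 4.13 | 0.27 | 50 | < .001* | < .001* |
| Proprioceptive vs Baseline | TOST Lower | 6.01 | 0.27 | 50 | < .001* | |
| Proprioceptive vs Baseline | TOST Upper | 2.25 | 0.27 | 50 | 0.99 | |
| Grid vs Flubber | t-test | -0.33 | 0.3 | 50 | 0.75 | 0.920 |
| Grid vs Flubber | TOST Lower | 1.34 | 0.3 | 50 | 0.09 | |
| Grid vs Flubber | TOST Upper | -1.99 | 0.3 | 50 | 0.03* | |
| Grid vs Proprioceptive | t-test | -0.3 | 0.33 | 50 | 0.77 | 0.920 |
| Grid vs Proprioceptive | TOST Lower | 1.22 | 0.33 | 50 | 0.11 | |
| Grid vs Proprioceptive | TOST Upper | -1.81 | 0.33 | 50 | 0.04* | |
| Flubber vs Proprioceptive | t-test | 0 | 0.33 | 50 | 1 | 1 |
| Flubber vs Proprioceptive | TOST Lower | 1.53 | 0.33 | 50 | 0.07 | |
| Flubber vs Proprioceptive | TOST Upper | -1.53 | 0.33 | 50 | 0.07 | |

Sense of Presence

| Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|
| Grid vs Baseline | t-test | 0.94 | 0.11 | 50 | 0.350 | 0.460 |
| Grid vs Baseline | TOST Lower | 5.29 | 0.11 | 50 | < .001* | |
| Grid vs Baseline | TOST Upper | -3.41 | 0.11 | 50 | < .001* | |
| Flubber vs Baseline | t-test | 0 | 0.15 | 50 | 1 | 1 |
| Flubber vs Baseline | TOST Lower | 3.27 | 0.15 | 50 | < .001* | |
| Flubber vs Baseline | TOST Upper | -3.27 | 0.15 | 50 | < .001* | |
| Proprioceptive vs Baseline | t-test | -0.88 | 0.17 | 50 | 0.380 | 0.460 |
| Proprioceptive vs Baseline | TOST Lower | 2.12 | 0.17 | 50 | 0.020* | |
| Proprioceptive vs Baseline | TOST Upper | -3.89 | 0.17 | 50 | < .001* | |
| Grid vs Flubber | t-test | 1.05 | 0.1 | 50 | 0.300 | 0.460 |
| Grid vs Flubber | TOST Lower | 5.95 | 0.1 | 50 | < .001* | |
| Grid vs Flubber | TOST Upper | -3.84 | 0.1 | 50 | < .001* | |
| Grid vs Proprioceptive | t-test | 1.99 | 0.13 | 50 | 0.050 | 0.310 |
| Grid vs Proprioceptive | TOST Lower | 5.9 | 0.13 | 50 | < .001* | |
| Grid vs Proprioceptive | TOST Upper | -1.92 | 0.13 | 50 | 0.03* | |
| Flubber vs Proprioceptive | t-test | 1.15 | 0.13 | 50 | 0.260 | 0.460 |
| Flubber vs Proprioceptive | TOST Lower | 5.06 | 0.13 | 50 | < .001* | |
| Flubber vs Proprioceptive | TOST Upper | -2.76 | 0.13 | 50 | < .001* | |

Usability

| Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|
| Grid vs Baseline | t-test | -1.15 | 1.51 | 50 | 0.260 | 0.260 |
| Grid vs Baseline | TOST Lower | 3.5 | 1.51 | 50 | < .001* | |
| Grid vs Baseline | TOST Upper | -5.79 | 1.51 | 50 | < .001* | |
| Flubber vs Baseline | t-test | -3.18 | 2.25 | 50 | < .001* | 0.010* |
| Flubber vs Baseline | TOST Lower | -0.06 | 2.25 | 50 | 0.520 | |
| Flubber vs Baseline | TOST Upper | -6.29 | 2.25 | 50 | < .001* | |
| Proprioceptive vs Baseline | t-test | -6.53 | 2.17 | 50 | < .001* | < .001* |
| Proprioceptive vs Baseline | TOST Lower | -3.31 | 2.17 | 50 | 1 | |
| Proprioceptive vs Baseline | TOST Upper | -9.76 | 2.17 | 50 | < .001* | |
| Grid vs Flubber | t-test | 2.49 | 2.18 | 50 | 0.020* | 0.020* |
| Grid vs Flubber | TOST Lower | 5.7 | 2.18 | 50 | < .001* | |
| Grid vs Flubber | TOST Upper | -0.73 | 2.18 | 50 | 0.230 | |
| Grid vs Proprioceptive | t-test | 5.6 | 2.23 | 50 | < .001* | < .001* |
| Grid vs Proprioceptive | TOST Lower | 8.74 | 2.23 | 50 | < .001* | |
| Grid vs Proprioceptive | TOST Upper | 2.45 | 2.23 | 50 | 0.99 | |
| Flubber vs Proprioceptive | t-test | 2.96 | 2.38 | 50 | < .001* | 0.010* |
| Flubber vs Proprioceptive | TOST Lower | 5.91 | 2.38 | 50 | < .001* | |
| Flubber vs Proprioceptive | TOST Upper | 0.02 | 2.38 | 50 | 0.510 | |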
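Satisfaction

| Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|
| Grid vs Baseline | t-test | 1.63 | 0.17 | 50 | 0.110 | 0.160 |
| Grid vs Baseline | TOST Lower | 4.6 | 0.17 | 50 | < .001* | |
| Grid vs Baseline | TOST Upper | -1.34 | 0.17 | 50 | 0.090 | |
| Flubber vs Baseline | t-test | 1.22 | 0.21 | 50 | 0.230 | 0.270 |
| Flubber vs Baseline | TOST Lower | 3.61 | 0.21 | 50 | < .001* | |
| Flubber vs Baseline | TOST Upper | -1.17 | 0.21 | 50 | 0.120 | |
| Proprioceptive vs Baseline | t-test | -1.84 | 0.24 | 50 | 0.070 | 0.140 |
| Proprioceptive vs Baseline | TOST Lower | 0.2 | 0.24 | 50 | 0.420 | |
| Proprioceptive vs Baseline | TOST Upper | -3.89 | 0.24 | 50 | < .001 | |
| Grid vs Flubber | t-test | 0.1 | 0.19 | 50 | 0.920 | 0.920 |
| Grid vs Flubber | TOST Lower | 2.72 | 0.19 | 50 | < .001* | |
| Grid vs Flubber | TOST Upper | -2.52 | 0.19 | 50 | 0.010* | |
| Grid vs Proprioceptive | t-test | 3.01 | 0.24 | 50 | < .001* | 0.010* |
| Grid vs Proprioceptive | TOST Lower | 5.08 | 0.24 | 50 | < .001* | |
| Grid vs Proprioceptive | TOST Upper | 0.94 | 0.24 | 50 | 0.820 | |
| Flubber vs Proprioceptive | t-test | 3.03 | 0.23 | 50 | < .001* | 0.010* |
| Flubber vs Proprioceptive | TOST Lower | 5.17 | 0.23 | 50 | < .001* | |
| Flubber vs Proprioceptive | TOST Upper | 0.88 | 0.23 | 50 | 0.810 | |

Emotion Representation

| Comparison | Test | t | SE | df | p-value | FDR adjusted p-value |
|---|---|---|---|---|---|---|
| Grid vs Baseline | t-test | 1.07 | 0.18 | 50 | 0.290 | 0.350 |
| Grid vs Baseline | TOST Lower | 3.79 | 0.18 | 50 | < .001* | |
| Grid vs Baseline | TOST Upper | -1.65 | 0.18 | 50 | 0.05* | |
| Flubber vs Baseline | t-test | 2.25 | 0.23 | 50 | 0.030* | 0.090 |
| Flubber vs Baseline | TOST Lower | 4.46 | 0.23 | 50 | < .001* | |
| Flubber vs Baseline | TOST Upper | 0.04 | 0.23 | 50 | 0.52 | |
| Proprioceptive vs Baseline | t-test | -0.51 | 0.23 | 50 | 0.61 | 0.610 |
| Proprioceptive vs Baseline | TOST Lower | 1.65 | 0.23 | 50 | 0.050* | |
| Proprioceptive vs Baseline | TOST Upper | -2.66 | 0.23 | 50 | 0.010* | |
| Grid vs Flubber | t-test | -1.49 | 0.21 | 50 | 0.140 | 0.210 |
| Grid vs Flubber | TOST Lower | 0.88 | 0.21 | 50 | 0.190 | |
| Grid vs Flubber | TOST Upper | -3.87 | 0.21 | 50 | < .001* | |
| Grid vs Proprioceptive | t-test | 1.49 | 0.21 | 50 | 0.140 | 0.210 |
| Grid vs Proprioceptive | TOST Lower | 3.87 | 0.21 | 50 | < .001* | |
| Grid vs Proprioceptive | TOST Upper | -0.88 | 0.21 | 50 | 0.190 | |
| Flubber vs Proprioceptive | t-test | 3.59 | 0.17 | 50 | < .001* | < .001* |
| Flubber vs Proprioceptive | TOST Lower | 6.45 | 0.17 | 50 | < .001* | |
| Flubber vs Proprioceptive | TOST Upper | 0.73 | 0.17 | 50 | 0.770 | |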
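The "FDR adjusted p-value" rows correct the t-test p-values of each family of comparisons for multiple testing. The specific procedure is not named in this section; the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment, and the function name is hypothetical. Applied to the six uncorrected Sense of Presence p-values from S7, it returns values close to the reported adjusted ones (differences are at the level of rounding in the reported p-values), which is consistent with, but does not prove, that this procedure was used.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Uncorrected Sense of Presence p-values (t-test rows of S7)
sense_of_presence_p = [0.350, 1.0, 0.380, 0.300, 0.050, 0.260]
adjusted = fdr_bh(sense_of_presence_p)
```

Because the adjustment depends on the number of p-values in the family, the same uncorrected p-value can receive different adjusted values in tables with three and with six comparisons.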
S8. Comparison of body posture (standing vs. seated) in the BER sample on ratings and questionnaires, in Study 2: Evaluation.
Significant effect of body posture on CR means for arousal (F(1) = 14.95, p < .001). Post-hoc t-tests revealed higher CR mean for arousal in standing compared to seated posture
(t = 3.87, p < .001). No significant effect of body posture on SR, Distraction, Sense of Presence, Usability, Satisfaction and Emotion Representation (all F < 2.48, all p > .122).
Significant p-values are followed by an asterisk and important ones are highlighted in bold.
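ANOVA: CR mean

| Dimension | Source | sum_sq | df | F | p-value |
|---|---|---|---|---|---|
| Valence | Effect of posture | 0.08 | 1 | 1.78 | .190 |
| Valence | Residual | 1.83 | 40 | | |
| Arousal | Effect of posture | 1.66 | 1 | 14.95 | < .001* |
| Arousal | Residual | 4.44 | 40 | | |

Posthoc t-tests (CR mean, arousal)

| t | df | p-value | CI 95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|
| 3.87 | 40 | < .001* | [0.19, 0.61] | 1.19 | 67.257 | 0.96 |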
ANOVA: SR

| Dimension | Source | sum_sq | df | F | p-value |
|---|---|---|---|---|---|
| Valence | Effect of posture | 0.10 | 1 | 0.92 | .343 |
| Valence | Residual | 4.29 | 40 | | |
| Arousal | Effect of posture | 0.18 | 1 | 1.72 | .198 |
| Arousal | Residual | 4.12 | 40 | | |
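ANOVA: Questionnaires

| Questionnaire | Source | sum_sq | df | F | p-value |
|---|---|---|---|---|---|
| Distraction | Effect of posture | 6.10 | 1 | 2.48 | .122 |
| Distraction | Residual | 98.19 | 40 | | |
| Sense of Presence | Effect of posture | 0.02 | 1 | 0.01 | .913 |
| Sense of Presence | Residual | 58.11 | 40 | | |
| Usability | Effect of posture | 4.86 | 1 | 0.02 | .878 |
| Usability | Residual | 8079.05 | 40 | | |
| Satisfaction | Effect of posture | 0.59 | 1 | 0.45 | .507 |
| Satisfaction | Residual | 53.05 | 40 | | |
| Emotion Representation | Effect of posture | 0.07 | 1 | 0.07 | .793 |
| Emotion Representation | Residual | 40.44 | 40 | | |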
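The F values in the ANOVA tables of S8 follow directly from the reported sums of squares and degrees of freedom: F is the mean square of the effect divided by the mean square of the residual. The short sketch below makes this arithmetic explicit; the function name is hypothetical, and the input numbers are the arousal CR mean entries of S8.

```python
def f_statistic(ss_effect, df_effect, ss_residual, df_residual):
    """F = (SS_effect / df_effect) / (SS_residual / df_residual)."""
    ms_effect = ss_effect / df_effect
    ms_residual = ss_residual / df_residual
    return ms_effect / ms_residual


# Effect of posture on arousal CR mean: sum_sq 1.66 (df 1), residual 4.44 (df 40)
f_arousal = f_statistic(1.66, 1, 4.44, 40)  # about 14.95, as reported
```

Small deviations from the reported F values can occur for other rows because the sums of squares are themselves rounded to two decimals.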
S9. Comparison of CR variability (CR std) in Study 1: Selection vs. Study 2: Evaluation.
Significant effect of video (valence: F(4) = 93.0, p < .001; arousal: F(4) = 79.3, p < .001) on CR std for both affective dimensions. Post-hoc t-tests revealed higher CR variability
for the Evaluation sequence than for each of the Selection videos (LN, LP, HN, HP), for both affective dimensions (all |t| > 10.4, all p < .001). HN: high arousal - negative valence;
HP: high arousal - positive valence; LN: low arousal - negative valence; LP: low arousal - positive valence. Significant p-values are followed by an asterisk and important ones
are highlighted in bold.
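ANOVA: CR std

| Dimension | Source | sum_sq | df | F | p-value |
|---|---|---|---|---|---|
| Valence | Effect of video | 11.21 | 4 | 92.98 | < .001* |
| Valence | Residual | 7.87 | 261 | | |
| Arousal | Effect of video | 8.13 | 4 | 79.26 | < .001* |
| Arousal | Residual | 6.69 | 261 | | |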
Posthoc t-tests: Valence

| Comparison | t | df | p-unc | CI95% | cohen-d | BF10 | power | p-bonf |
|---|---|---|---|---|---|---|---|---|
| HP vs. LP | 1.591 | 100 | 0.115 | [-0.02, 0.15] | 0.315 | 0.641 | 0.35 | 1 |
| HP vs. LN | 0.381 | 100 | 0.704 | [-0.07, 0.1] | 0.075 | 0.223 | 0.066 | 1 |
| HP vs. HN | 1.447 | 100 | 0.151 | [-0.02, 0.13] | 0.287 | 0.529 | 0.3 | 1 |
| LP vs. LN | -1.346 | 100 | 0.181 | [-0.12, 0.02] | 0.267 | 0.467 | 0.266 | 1 |
| LP vs. HN | -0.297 | 100 | 0.767 | [-0.07, 0.06] | 0.059 | 0.217 | 0.06 | 1 |
| LN vs. HN | 1.177 | 100 | 0.242 | [-0.03, 0.11] | 0.233 | 0.387 | 0.215 | 1 |
| HP vs. sequence | -12.787 | 69.347 | < .001* | [-0.52, -0.38] | 2.568 | 1.80E+20 | 1 | < .001* |
| LP vs. sequence | -17.723 | 80.074 | < .001* | [-0.57, -0.46] | 3.5 | 4.21E+30 | 1 | < .001* |
| LN vs. sequence | -15.523 | 77.932 | < .001* | [-0.52, -0.4] | 3.076 | 1.39E+26 | 1 | < .001* |
| HN vs. sequence | -20.064 | 91.52 | < .001* | [-0.55, -0.45] | 3.896 | 1.31E+35 | 1 | < .001* |

Posthoc t-tests: Arousal

| Comparison | t | df | p-unc | CI95% | cohen-d | BF10 | power | p-bonf |
|---|---|---|---|---|---|---|---|---|
| HP vs. LP | 2.256 | 100 | 0.026* | [0.01, 0.13] | 0.447 | 1.957 | 0.608 | 0.262 |
| HP vs. LN | 3.071 | 100 | 0.003* | [0.03, 0.16] | 0.608 | 12.448 | 0.86 | 0.027* |
| HP vs. HN | -1.037 | 100 | 0.302 | [-0.1, 0.03] | 0.205 | 0.337 | 0.177 | 1 |
| LP vs. LN | 1.021 | 100 | 0.310 | [-0.03, 0.09] | 0.202 | 0.332 | 0.173 | 1 |
| LP vs. HN | -3.433 | 100 | 0.001* | [-0.16, -0.04] | 0.68 | 33.274 | 0.925 | 0.009* |
| LN vs. HN | -4.201 | 100 | < .001* | [-0.19, -0.07] | 0.832 | 358.383 | 0.986 | 0.001* |
| HP vs. sequence | -11.333 | 108.153 | < .001* | [-0.43, -0.3] | 2.135 | 1.07E+17 | 1 | < .001* |
| LP vs. sequence | -14.878 | 110.738 | < .001* | [-0.49, -0.38] | 2.747 | 6.07E+24 | 1 | < .001* |
| LN vs. sequence | -15.087 | 110.554 | < .001* | [-0.52, -0.4] | 2.815 | 1.68E+25 | 1 | < .001* |
| HN vs. sequence | -10.415 | 108.958 | < .001* | [-0.4, -0.27] | 1.957 | 9.48E+14 | 1 | < .001* |
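Two features of the post-hoc table in S9 can be illustrated with a short sketch. The non-integer degrees of freedom in the "vs. sequence" rows are what an unequal-variance (Welch) t-test produces, and the p-bonf column is consistent with multiplying each uncorrected p-value by the number of comparisons (ten) and capping the result at 1. Both readings are inferences from the reported numbers rather than statements from the text, and the function names and example inputs are hypothetical.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    var1 and var2 are sample variances (n - 1 in the denominator).
    """
    v1 = var1 / n1
    v2 = var2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def bonferroni(p_uncorrected, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_uncorrected * n_comparisons)


# Hypothetical groups with unequal variances and sizes
t, df = welch_t(mean1=0.20, var1=0.010, n1=50, mean2=0.65, var2=0.040, n2=60)
p_bonf = bonferroni(0.026, 10)
```

When both groups have equal variances and equal sizes, the Welch degrees of freedom reduce to the familiar n1 + n2 - 2.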
S10. Comparison of Questionnaires responses after using Flubber between Study 1: Selection and Study 2: Evaluation.
No significant effect of phase on the scores for Distraction, Sense of Presence, Usability, and Satisfaction (all F < 1.53, all p > .220). For Emotion Representation, there was a
significant effect of phase (F(1) = 10.95, p < .001), with higher responses for the Selection phase than the Evaluation phase (t(107) = 3.31, p < .001). Significant p-values are
followed by an asterisk and important ones are highlighted in bold.
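ANOVA

| Questionnaire | Source | sum_sq | df | F | p-value |
|---|---|---|---|---|---|
| Distraction | Effect of study | 0.20 | 1 | 0.08 | .780 |
| Distraction | Residual | 295.99 | 111 | | |
| Sense of Presence | Effect of study | 2.19 | 1 | 1.53 | .220 |
| Sense of Presence | Residual | 158.83 | 111 | | |
| Usability | Effect of study | 27.5 | 1 | 0.13 | .720 |
| Usability | Residual | 23153.39 | 111 | | |
| Satisfaction | Effect of study | 0.61 | 1 | 0.49 | .480 |
| Satisfaction | Residual | 136.52 | 111 | | |
| Emotion Representation | Effect of study | 10.86 | 1 | 10.95 | < .001* |
| Emotion Representation | Residual | 110.12 | 111 | | |

Posthoc t-test (Emotion Representation)

| t | df | p-value | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|
| 3.31 | 107.53 | < .001* | [0.25, 1] | 0.63 | 24.154 | 0.91 |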
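The cohen-d column reports a standardised mean difference between two groups. A common definition for two independent samples, using the pooled standard deviation, is sketched below. Whether the reported values were computed with exactly this variant is an assumption, and the function name and data are invented for illustration.

```python
import math


def cohens_d(group1, group2):
    """Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Sample variances (n - 1 in the denominator)
    var1 = sum((x - mean1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd


# Hypothetical questionnaire scores from two studies
d = cohens_d([2, 4, 6], [1, 3, 5])  # mean difference 1, pooled SD 2
```

By the usual rule of thumb, values around 0.5 are read as medium effects and values around 0.8 as large effects.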