Impacting the social presence of virtual
agents by scaling the fidelity of their speech
and movement
Julian Fietkau
February 19th, 2015
Master’s thesis
for the purpose of attaining the academic degree
Master of Science (M.Sc.)
First supervisor: Prof. Dr. Frank Steinicke
Second supervisor: Prof. Dr. Martin Christof Kindsmüller
Fachbereich Informatik
Universität Hamburg
Contents

1. Introduction
2. Definitions
   2.1. Virtual Reality
   2.2. Virtual Agents
   2.3. Social Presence
   2.4. Fidelity
   2.5. Idle Motion
3. Experiment
   3.1. Hypotheses
   3.2. Design
      3.2.1. Questionnaire
      3.2.2. Movement
      3.2.3. Speech
      3.2.4. Experimental Procedure
   3.3. Implementation
      3.3.1. Technical Components
      3.3.2. Assets
      3.3.3. Experimental Setup
      3.3.4. Experimental Procedure
4. Results
   4.1. Evaluation
   4.2. Discussion
5. Conclusion
References
Appendix
   A. Questionnaire
   B. Data: Questionnaire
   C. Data: Experiment
This creative work is available in accordance with the terms of the Creative Commons Attribution Share-Alike 4.0 license. This means
that, with very few restrictions, it may be freely copied, transferred and used for any purpose as long as the name of the author (Julian
Fietkau) is clearly mentioned as the original creator and derivative works are made available under the same license. More information:
http://creativecommons.org/licenses/by-sa/4.0/
Abstract
Virtual agents are constructs that fulfill human or human-like roles in virtual
environments, but are directly controlled by software instead of real humans.
They have use cases such as presenting information, demonstrating actions or
simulating a social environment. If a real person perceives them as sufficiently
human-like, they may induce social phenomena like empathy, competition or
conversational turn taking, even if the person is consciously aware that the
agent is purely virtual.
This thesis explores the influence of technical fidelity on perceived social
presence in terms of the virtual agents’ speech and movement. Both of these
two variables were assigned different implementations of varying technical
sophistication, from text-to-speech output to fully recorded voices and from
a completely rigid idle body to a high-quality relaxed idle animation based on
motion capturing data. The various combinations were tested in an experi-
ment using a head-mounted virtual reality display in order to measure their
influence on perceived social presence. This thesis describes the experiment
and its results.
Keywords: avatars, head-mounted displays, social presence, virtual agents, virtual reality
1. Introduction
For several decades now, personal computers have been capable of producing real-time
3D graphics, predominantly used in games, that – even if they are not photorealistic –
look convincingly enough like a spatial location to evoke a sense of immersion (Slater,
Usoh, & Steed, 1994). In order to populate these environments, virtual agents (perhaps
more commonly known as “non-player characters”) are commonplace. They are virtual
humanoid characters controlled by software. Depending on the quality of their imple-
mentation, they may be a terrific addition to an immersive world, or they might feel
artificial and jarring.
In the context of this thesis, we¹ are concerned with the technical aspects of such
implementations. Specifically, we investigate whether the technical quality of their voice
or their animation has a strong influence on the user’s feeling of interacting with a
person, even if they are aware that there is no real human behind the virtual agent.
Some unconscious social actions might take place even in exchanges with virtual agents
(Biocca, Harms, & Gregg, 2001).
We make a point of focusing on characteristics that are not easily communicated
through a static screenshot. Voice and animation quality are perhaps not the first things
to come to mind when we consider realism in virtual environments, but we think that
neglecting them outright could have very negative consequences for the user’s feeling of
social presence (a term we define in section 2.3).
On the other hand, if we know how that feeling interacts with the fidelity of our virtual
agents’ voice and animation, then we would be better equipped to find compromises
between it and development resources.
This, all in all, is why we decided to examine this specific area of VR research further,
and conduct an experiment to produce some reliable answers.
We start out by defining a number of important concepts in chapter 2, relying on
established knowledge wherever possible. In chapter 3 we describe our experiment in
detail, from the initial idea through the design decisions and including a summary of
the final implementation, before analyzing and interpreting the results in chapter 4.
We close with a summary and conclusion in chapter 5. Bulk data can be found in the
appendix.
¹ Of course this is a master’s thesis, so any usage of the first person plural in the manuscript refers more
or less exclusively to the author, who even has to certify that he wrote everything by himself. Still,
we stick to this pronoun not only because it is the polite thing to do, but also as a respectful nod
towards the friends, colleagues and advisors who contributed to discussions, talked about ideas or
gave valuable feedback. Thank you!
2. Definitions
In order to create a common understanding of the core concepts of this work, it is vital
that there be agreed-upon definitions. Wherever possible, we base our definitions on
previous established works to increase the viability of this work as a stepping stone for
future scientific progress.
2.1. Virtual Reality
The term virtual reality (VR) has historically often been defined in terms of the hard-
ware used to convey a particular medial experience to a human user (Krueger, 1991).
To alleviate the ties to concrete technological developments, Steuer (1992) proposes a definition based on the perception of the experience rather than the method of implementation: he defines virtual reality as “a real or simulated environment in which a perceiver experiences telepresence” (Steuer, 1992, p. 7), building upon his previously
established definition for telepresence as “the experience of presence in an environment
by means of a communication medium” (Steuer, 1992, p. 6).
Even though this definition might at first glance seem overly broad, the mandate of
achieving telepresence using a communication medium (as opposed to natural human
senses) covers a lot of past and future implementations, and Steuer makes a convincing
case for not chaining the concept of VR to classes of hardware like head-mounted displays
or data gloves, which is why we operate on the basis of his definition even though this
work happens to have a concrete technical scope in that we focus on a VR experience
using a head-mounted display (see section 3.3.1).
2.2. Virtual Agents
Even though virtual agents have been extensively studied in works such as Caridakis et
al. (2008) or Kopp, Sowa, and Wachsmuth (2003), a systematic definition of the term is often not supplied and there does not seem to be an agreed-upon understanding of the term. We therefore provide our own definition as follows.
We understand an agent (in the context of software programming) to be a software
construct that possesses agency, i.e. something that distinguishes between its own be-
havioral autonomy and the environment in which it exists. An agent may have some
perception of its environment, and its actions may have consequences within the envi-
ronment. A virtual agent is then defined to be an agent that exists in a virtual reality.
Virtual agents are not virtual avatars, because the latter represent and are controlled
by human users while the former are controlled by software (Blascovich & Bailenson,
2011). However, both belong to the overarching category of virtual actors. Some of the
results of this experiment may be applicable to avatars as well as agents, but since we
only tested agents, we do not wish to make any claims to that effect.
2.3. Social Presence
As mentioned above in section 2.1, Steuer (1992) provides a useful definition for the term
“presence” (within the context of VR). While he also touches on telepresence, he does
not talk about social presence. To find a good definition for this concept, we consult
Biocca et al. (2001), who establish what they call “three dimensions of social presence”:
Co-presence: The degree to which the observer believes he/she is not
alone and secluded, their level of peripherally or focally awareness
of the other, and their sense of the degree to which the other is
peripherally or focally aware of them.
Psychological Involvement: The degree to which the observer allo-
cates focal attention to the other, empathically senses or responds to
the emotional states of the other, and believes that he/she has insight
into the intentions, motivation, and thoughts of the other.
Behavioral engagement: The degree to which the observer believes
his/her actions are interdependent, connected to, or responsive to the
other and the perceived responsiveness of the other to the observer’s
actions.
From: Biocca et al. (2001, p. 2)
They further divide these three dimensions into various factors like awareness, attention,
understanding, and interaction. However, their high-level overview is sufficient for our
experiment.
2.4. Fidelity
The concept of fidelity (as it is understood in the context of technology) is etymologically
rooted in “faith”/“faithful” and refers to “the degree to which something matches or
copies something else” (Merriam-Webster Dictionary, 2015). As the presentation of our
virtual agents aims to emulate the real world, we interpret the fidelity of a property of
a virtual agent as something akin to a degree of closeness to the real-world counterpart.
We would further like to highlight the contrast between fidelity and realism. We understand fidelity to be an inherent property of the implementation of the virtual agent; the degree of fidelity is a design decision. Realism, on the other hand, is the (intended) result of a high degree of fidelity; it is inevitably influenced not only by the virtual agent, but also by the rest of the VR experience, and it is dependent on a human observer.
Real-world applications have to make certain trade-offs when it comes to fidelity. Even
though a higher degree of fidelity is helpful in achieving a more realistic experience, it also
tends to be more difficult (and thus costly) to achieve than lower-fidelity alternatives.
The examples given in sections 3.2.2 and 3.2.3 might illuminate the concept further.
2.5. Idle Motion
Even if a human being is not actively doing anything in particular, their body never
stops moving completely. They unconsciously perform actions that we summarize as
idle motion (Egges, Molet, & Magnenat-Thalmann, 2004), such as shifting their weight,
slightly moving their arms, or mildly moving their head while gazing around. These
actions are involuntary and require concentrated effort to be suppressed, which is why
they are crucial for a virtual agent to appear convincingly “alive” instead of appearing
to be a statue. So-called idle animations are commonplace for virtual agents in mod-
ern games (Starck, Miller, & Hilton, 2005). We hypothesize that the fidelity (or utter
absence) of idle motion may have an influence on the virtual agent’s social presence.
3. Experiment
Our research is rooted in the question of how the technical fidelity of virtual agents
influences their social presence in a VR setting. This chapter begins by formulating a
number of hypotheses about the interrelations of speech and movement fidelity with user
perception and behavior.
To evaluate the validity of our hypotheses, we conducted an experiment involving bi-
nary comparisons between pairs of virtual agents whose fidelity of speech and movement
had been set to various preconfigured levels. The sections starting from 3.2 detail the
design decisions that went into it as well as the technical execution. A summary of the
results follows in chapter 4.
3.1. Hypotheses
The possible interactions between the kinds of fidelity that we intend to manipulate
and the social presence of the virtual agents are manifold, but some ideas and hunches
are certainly more obvious than others. For example, given that higher fidelity virtual
agents tend to be more difficult to develop, and seeing that this development happens
in real-world applications anyway, it is easy to assume that high-fidelity virtual agents
are developed because they are better at producing the respective intended results (de-
pending on the use case). If that is indeed the case, then it is also reasonable to look
into whether stronger social presence may be a factor in their increased efficacy, which
leads us to our first set of hypotheses:
Hypothesis 1a: A virtual agent with a higher technical fidelity in terms of speech will
have a stronger social presence compared to one with a lower fidelity.
Hypothesis 1b: A virtual agent with a higher technical fidelity in terms of movement
will have a stronger social presence compared to one with a lower fidelity.
These hypotheses each imply a positive correlation between the virtual agent’s technical fidelity in one of the two properties, speech and movement, and its social presence. To test them, we
need to define how exactly we intend to manipulate their fidelity (which happens in
sections 3.2.2 and 3.2.3) and we have to provide a measure for their social presence,
which we outline at the beginning of the following section 3.2.
Experimental data that would substantiate the above two hypotheses would allow
us to infer further details. For example, there could be interaction effects between the
fidelity of the two properties – or for the sake of simplicity, it might make sense to assume
that they act independently until proven otherwise.
Hypothesis 2: Changes in the fidelity of speech and changes in the fidelity of movement
will independently influence the social presence of the virtual agent.
Since we have full control over the experimental software, we are at liberty to record
the time that the participants take to make their choices. The next step would then be to draw conclusions about the difficulty of the choice from the duration of the decision – it seems reasonable that someone would take more time to make a decision if the choice is extraordinarily difficult.
If we can define a metric for the difficulty of the choice between two of our virtual
agents, then we might find a correlation to the duration of the choosing phase.
Hypothesis 3: Comparing virtual agents in terms of social presence is easier (faster) if
they have a big difference in technical fidelity in terms of speech and/or movement.
For the sake of scope, it should be noted that any systematic analysis of social presence
in virtual agents will have to make abstractions from real-world use cases. For example,
our experiment cannot feature a rich and complex VR scenario with large numbers of
virtual agents interacting with different users and with one another. In order to be able
to make empirically substantiated claims, we have to reduce the interaction between the
virtual agents and the study participants to a clearly defined minimum to ensure clarity
and reproducibility.
3.2. Design
Even though we are building upon an existing definition of social presence, there are no
substantiated methods to measure it on a scale in an experimental setting. As a simple
tool to enable comparisons between the different degrees of fidelity, we construct our
experiment around singular binary comparisons. Pairs of differently configured virtual
agents are presented to the participant, who judges them in relation to one another
and points out the one with the stronger social presence (see figure 1). This process is
repeated for all pairs of configurations.
As we are testing two different axes of technical fidelity, namely speech and movement,
we design independent degrees of fidelity. Each of them gets implemented as three
different realizations, which are described in detail in sections 3.2.2 and 3.2.3.
We also decide to focus our research on a setting where the participant uses a head-
mounted display (HMD) instead of commodity display hardware. We do this in order
to increase the participant’s sense of presence, since it has been established that head-mounted displays have that effect (Pausch, Proffitt, & Williams, 1997). It stands to reason that a higher sense of presence on the user’s part could also lead to a higher sensitivity for the social presence of virtual agents, or at least it should not be detrimental – however, we are not aware of any empirical proof for this conjecture. At the very least, VR is a research field where we observe a healthy dialogue and openness to new ideas regarding the construction of virtual agents.

Figure 1: This is the original concept sketch of the virtual experimental setup. The camera is positioned in an otherwise unremarkable scene with two virtual agents who perform an act of speech one after the other, after which the participant decides which of the two has a stronger social presence.
3.2.1. Questionnaire
We created a digital questionnaire to guide the participant through the experiment
procedure. After the initial greeting, the participant sits down at the desk and finds the
questionnaire displayed on the PC monitor in a fullscreen web browser window.
The questionnaire consists of several segments, their order determined by the structure of the experiment. They are enumerated in the following.
The full questionnaire can be found in appendix A as a display variant optimized for printing.
Demographic and Biological Data
The first few questions cover standard demographic information such as age, gender
and occupation. Since the experiment deals with the participants’ reactions to acts
of speech, among other things, we also ask about their degree of familiarity with the
German language, since people with less proficiency might interpret speech (even those
that are only pseudo-German, see section 3.2.3) differently or more slowly than someone
whose mother tongue is German.
To gauge the influence of medical issues regarding vision or hearing, we also inquire
about known issues in those two areas as well as about any vision and hearing corrections
that may exist.
Furthermore, we ask participants about their experience with 3D games, 3D stereo-
scopic displays, and head-mounted displays, as each of these could have an influence on
the way that virtual agents are perceived.
Lastly, participants are asked to state their handedness (left- or right-handed, or am-
bidextrous) and their inter-pupillary distance, the latter of which is measured in the
laboratory.
Hearing Assessment
Even though participants are asked about any issues with their hearing capacity, we strive
to make doubly sure that there are no directional hearing problems, not even potentially
unknown ones, that could jeopardize our reliance on directional audio signals during the
experiment. To that end, we conduct a very brief directional hearing assessment of each
participant using the Home Audiometer software by Esser (2012–2015). It tests both
ears’ hearing capacity across the frequency spectrum typically audible to humans and
displays the results graphically.
The questionnaire makes it abundantly clear to the participants that our hearing
assessment is, for a number of reasons, not a substitute for any medical procedure.
Our audio equipment is not professionally calibrated, we are likely to have high levels of
ambient noise (e.g. due to the technical equipment in the laboratory and the relative
proximity to the Hamburg Airport), and our personnel are not trained to make any
medical diagnoses. However, the results of the hearing assessment would give us the
possibility to react to any detectable directional hearing issues that might occur.
Lateral Preference Inventory
The Lateral Preference Inventory – or, in full, the Lateral Preference Inventory for
Measurement of Handedness, Footedness, Eyedness, and Earedness, and in short, the
LPI – is a set of 16 questions developed by Coren (1993). It is intended to determine the
four abovementioned lateral preference indices (hand, foot, eye, ear). We include it in
our questionnaire to acquire some more detailed information than just the participants’
stated handedness, especially since any lateral preferences for vision and hearing might
be relevant for our results even though the participants themselves might not even be
consciously aware of them.
Simulator Sickness Questionnaire
The Simulator Sickness Questionnaire created by Kennedy, Lane, Berbaum, and Lilien-
thal (1993) is a standard tool to gauge the extent to which the participant might be
affected by simulator sickness (also known as cybersickness), a set of short-term symp-
toms that can arise if a person spends a prolonged amount of time using VR hardware.
The SSQ is split into a pre- and a post-experiment half, each consisting of identical
questions about the participant’s subjective well-being. It is designed to detect whether
any temporary health effects (such as nausea, eyestrain, or dizziness) are produced or
amplified by the experiment.
Post Questionnaire
The general post questionnaire consists of a small number of questions tailored to our
experiment and the local circumstances. Specifically, we ask the participants about any
outside distractions that might have occurred and about their opinion of the experiment,
including opportunities for free-form answers and feedback.
Slater-Usoh-Steed Questionnaire
The Slater-Usoh-Steed Questionnaire is intended to measure a VR system’s degree of im-
mersion as defined by Slater et al. (1994). In the scope of this thesis, we are not overly
concerned with the concept of immersion by itself, but the questionnaire still provides
valuable data about how the participants perceived the experience and the extent to
which they themselves had a sense of presence.
3.2.2. Movement
Since the social actions of our virtual agents are heavily based on speech, their movement might seem like a secondary concern. However, the use of suitable idle motions (see section 2.5) is a big contributor to a convincing social presence (Egges et al., 2004).
For the highest degree of fidelity that is feasible, we use idle animations based on high
resolution motion capturing data, a process that creates animations for virtual agents
based on recording the movements of real actors (Moeslund, Hilton, & Krüger, 2006).
In real-life applications, this is a costly approach compared to, for example, keyframe
animation (which entails a 3D animator creating several “key” poses and interpolating
in-between movements), but understandably provides more realistic results. For the
purposes of our experiment, we rely on commercially available high-quality animations
that surpass anything that we could produce in the local laboratory.
The obvious opposite end of the movement fidelity scale is the completely frozen
virtual agent with no idle motions at all. This is trivially easy to implement, fulfilling
our expectation that lower-fidelity approaches tend to have a smaller resource impact
during development.
For the in-between step, a keyframe-based animation would be a possible middle
ground in terms of fidelity, and the comparison between the social presence for keyframe
animations versus motion capturing animations in the general case would certainly be of
interest. However, for our specific experiment, such a comparison would be difficult to
generalize, because any difference in perceived social presence may just as well be rooted
in the specific movements that make up the two animations we would use instead of their
overall categories. In other words: We would only be comparing one specific keyframe
animation with one specific motion capturing animation. To mitigate this issue and
permit a general inference, we would have to compare a large number of examples from
each category so that we would be able to prove the presence of statistically significant
differences, but this is too far beyond the scope of our experiment to be feasible.
Instead, we base our in-between step on the full motion capturing animation, but
manipulate it in a way that reduces its fidelity. To that end, we exclude parts of the 3D
model from the idle animation, namely the hands and the legs. For the hands, we simply
ignore them altogether, leaving them non-animated. For the legs, instead of using the
motion-capturing data, we enable a feature called inverse kinematics (Tolani, Goswami,
& Badler, 2000), which describes a set of algorithms that are capable of making sure that
the virtual agent’s feet stay connected to the ground, even if the upper body moves (or
the ground becomes uneven, which is not applicable to our experiment). As a result, the
legs no longer use the prerecorded idle animation, but instead do the minimal amount
of movement that is needed to plausibly support the upper body. We believe that this
approach is a suitable compromise to reduce the movement fidelity.
3.2.3. Speech
There are many kinds of acts of speech that could be considered viable for our experi-
ment. Depending on the use case, virtual agents in different applications may be used
to ask questions, deliver instructions, perform back-and-forth conversations or fulfill any
number of communicative roles.
However, since the experiment specifically attempts to test for effects of the technical
fidelity of the speech, our aim was to provide as little distraction through the content of
the speech as possible. Since our experiment relied on direct comparisons, clearly both
sides of any comparison would need to execute the same act of speech, so that any bias
that might arise from the content of the speech would be symmetrically canceled out.
Finding Suitable Acts of Speech
Ideally, we would like to rely on being able to make comparisons even across the different
trials, which is why the differences in terms of speech content between trials should also
be minimized. One way of achieving this would be to reuse the very same sentence over
and over for every single trial. However, we suspected that this approach would lead
to increased monotony during the experiment, since a full run would encompass a large
number of trials. This could produce a more tiring experience for the participants, which
would in turn reduce the quality of the data. We also suspected that continued use of the
same sentence could lead to semantic satiation, a psychological phenomenon by which a
word or phrase seems to lose its meaning and appears increasingly alien if it is repeated
a sufficiently large number of times (Esposito & Pelton, 1971). These problems could be
mitigated by the use of a number of different sentences instead of a single one, but that
introduces variance into the process of understanding and interpreting the speech that
could also detract from our results.
This is how we arrived at the idea of using gibberish speech (speech that is more or
less phonetically and/or syntactically plausible, but does not contain any discernible
meaning) instead of real acts of speech. Ideal gibberish speech would enable us to use a
variety of different acts of (pseudo-)speech to stave off boredom and semantic satiation,
while also keeping all speech at a constant level of semantic contents, that being none
at all.
This raises the question of how to generate “high-quality” gibberish, in the sense that
it should be nonsensical enough to not contain any meaning, yet sound plausible and
familiar enough not to appear overly foreign. Fortunately, solutions to this problem have
already been developed. We used a pseudoword generator named Wuggy to create our
gibberish, which is based on existing psycholinguistic research (Keuleers & Brysbaert,
2010). It is capable of creating polysyllabic pseudowords from any given list of real
words while preserving the phonetic constraints of the source language. We used a
dataset courtesy of the Wortschatz project (Institut für Informatik, Universität Leipzig,
2001) containing the 1000 most common German words, from which we had to filter 16
abbreviations². The remaining 984 German words were fed into Wuggy to be used as
the basis for our gibberish.
The resulting list of pseudowords was then shuffled randomly to produce sentences of
12 words each. When spoken out loud, each one of them is four to five seconds long,
which we considered a reasonable length to enable the participants to judge the speech.
The eight sentences that we used in our experiment are as follows:
1. Kie Verpreils Hopitie Phraxe metes scheches krumciespiel Dimen wor klück
Mozualiin Zaß.
2. Putaun ehte pflon veßten düfflich La Fing hürte Kopp geripten Südchen Daude.
² The following abbreviations were manually removed from the word list: AG, CDU, CSU, DDR, DM,
dpa, Dr., EU, FDP, GmbH, Mio, Mrd, SPD, USA, WELT, z.B.
3. Lychte rafen Fahl toswenden lält luchsgans gorm dadee Spresten ebstbals vesses
Newage.
4. Sis fist Lab Wuderfet kühe Hamte veuten Läuen alny Bopie schäler belögte.
5. Allerlochs spöbten stekken hanuß bes Beren Rie fal rereis Piedes lanter dabbte.
6. Tonzerr for Turicht gopen Gander fürr jor nasen hührend rusband zusel Händern.
7. Vorkau hind nirgst ehka ätmehin umhächst zondern zöln giesen kolst begids Bel-
sallem.
8. Gesprals Marf hillten fiesen Rottel zockte Jen arrhen peit rafe Wuloner zührend.
It should be noted that word capitalization is essentially random, although we made
sure to manually capitalize the first word in each sentence.
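The assembly step itself is simple enough to sketch in code. The following is a minimal Python illustration under our own assumptions (the function and parameter names are hypothetical; the pseudoword list comes from Wuggy and is not reproduced here):

import random

def make_gibberish_sentences(pseudowords, n_sentences=8, words_per_sentence=12, seed=1):
    """Shuffle Wuggy-generated pseudowords into fixed-length gibberish sentences."""
    assert len(pseudowords) >= n_sentences * words_per_sentence
    words = pseudowords[:]
    random.Random(seed).shuffle(words)
    sentences = []
    for i in range(n_sentences):
        chunk = words[i * words_per_sentence:(i + 1) * words_per_sentence]
        # Word capitalization stays as generated; only the sentence-initial
        # word is capitalized manually, as described above.
        chunk[0] = chunk[0][0].upper() + chunk[0][1:]
        sentences.append(" ".join(chunk) + ".")
    return sentences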
From Written Words to Audio Signals
To go from the above pseudospeech to audio signals to be played during the experiment,
we first had to define the degrees of fidelity to serve as a basis for the experiment.
A viable approach to low-fidelity speech is text-to-speech (TTS) software. This term
describes software that is capable of taking pure text as an input and converting it
to audible speech. Detailing the various approaches to this problem in general would
be vastly beyond the scope of this work, but plenty of literature on the subject exists
(Sproat, 1997). We are largely interested in the results that the current “state of the art”
can produce, so we did a short preliminary analysis of free and commercial consumer-
grade text-to-speech systems, with the constraint that they had to support German
TTS, since our pseudowords were based on the German phonetic structure.
We evaluated the following applications:
• Google Translate TTS³
• IVONA Text to Speech⁴
• Linguatec Voice Reader Studio 15⁵
• Smart Link ImTranslator⁶

³ https://translate.google.com/
⁴ http://www.ivona.com/
⁵ http://www.linguatec.de/products/tts/voice_reader/vrs15
⁶ http://imtranslator.net/translate-and-speak/speak/german/
Figure 2: This is a visualization of the waveform of our first recorded gibberish sentence.
The top one displays the unaltered recording, while the bottom one represents
the modified recording with most of the silent parts (highlighted in green) cut
out.
After listening to some example output from each application and comparing them
in terms of vocal fluidity, phonetic plausibility and sound quality (this was a subjective
comparison without any quantified justification), we decided to use the IVONA software
as our text-to-speech solution for the experiment. However, the differences between the
various products were not glaring, and the research field of speech synthesis is bound
to make further improvements in the upcoming years. IVONA was able to read our
gibberish without issue and we got the corresponding sound files out of it.
At the opposite end of the fidelity scale, it seemed like the obvious choice to create a
fully human-voiced set of recordings. We used an adult male voice for the TTS files, so
we had a real adult male listen to them and recorded his voice as he attempted to read the
sentences at the same speed and with the same inflection. We were not able to create
an exact match, but we got as close as we could within our constraints.
To create a third stage in between the previous two, a middle ground between text-
to-speech and full voice recording, we took the recorded sound files and made some
alterations to them. We duplicated the waveform and played it at a delay of 5 millisec-
onds, which is too short to be perceived as an echo, but produces a tinny, metallic sound.
We also cut out most of the small portions of silence within the recordings (see figure 2),
which creates “jumps” in the audio recording that would be impossible to achieve by a
real human mouth, but that we observed to be reminiscent of the audible inaccuracies
found in text-to-speech sound samples. This leaves us with a set of sound files that still
sound somewhat like a real voice (at least more than the TTS output does), and yet
differentiate themselves from the full recording enough to be slightly uncanny.
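The doubling effect itself is easy to reproduce with basic signal processing. The following is a minimal Python sketch under stated assumptions (mono 16-bit PCM input; the function name and parameters are ours), not the exact tool chain used for the experiment:

import numpy as np
from scipy.io import wavfile

def tinny_double(path_in, path_out, delay_ms=5.0):
    """Mix a copy of the waveform onto itself at a short delay.

    At around 5 ms the copy is not heard as a separate echo but as a
    tinny, metallic coloration (a simple comb-filter effect).
    """
    rate, data = wavfile.read(path_in)        # assumes mono 16-bit PCM
    x = data.astype(np.float32)
    d = int(round(rate * delay_ms / 1000.0))  # delay in samples
    y = x.copy()
    y[d:] += x[:-d]                           # add the delayed copy
    y *= 0.5                                  # halve the sum to avoid clipping
    wavfile.write(path_out, rate, y.astype(np.int16))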
3.2.4. Experimental Procedure
As described above, we have chosen the two properties speech and movement as our
variable degrees of technical fidelity, which we manipulate independently in three steps
each. This means that we have 3 × 3 = 9 possible ways to combine the two properties
for each of our virtual agents. To reduce confusion, we will call them configurations (of
the virtual agent) in order to differentiate them from the pairs of configurations, which
we will call constellations.
Since we ask our participants to compare the configurations in pairs, we would ideally
want to pair every configuration with every other one (deliberately excluding constel-
lations where both configurations would be identical), which leaves us with 9 × 8 = 72
constellations. This number already includes symmetrical constellations, i.e. if we un-
derstand a constellation to be a two-tuple of configurations, and (C1, C2) is part of our
set of constellations, then so is (C2, C1). Even though this doubles the number of trials
per participant compared to the hypothetical situation where we would exclude mirrored
constellations, they are indeed a big help in reducing the impact of any (conscious or
unconscious) lateral preferences on the part of our participants.
Furthermore, we have to keep in mind that our experiment displays the two configu-
rations in each constellation sequentially. As a result, for each of the above 72 constel-
lations, we include it twice: once starting with the left configuration and following with
the right, and once starting with the right configuration followed by the left. From here
on out, we will call them left-to-right and right-to-left constellations, respectively. This
doubles the total number again, leaving us with our final number of 72 × 2 = 144 trials
per participant.
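This counting argument is small enough to verify mechanically. As a minimal Python sketch (the level labels are ours), the full trial list can be enumerated as follows:

from itertools import product

SPEECH = ["text to speech", "modified recording", "full recording"]
MOTION = ["no idle motion", "reduced idle motion", "mo-cap idle motion"]

# 3 x 3 = 9 configurations of a single virtual agent.
configurations = list(product(SPEECH, MOTION))

# Ordered pairs of distinct configurations: 9 x 8 = 72 constellations,
# deliberately including mirrored pairs such as (C1, C2) and (C2, C1).
constellations = [(a, b) for a in configurations for b in configurations if a != b]

# Each constellation is shown twice: left-to-right and right-to-left.
trials = [(pair, order) for pair in constellations for order in ("ltr", "rtl")]

assert len(configurations) == 9
assert len(constellations) == 72
assert len(trials) == 144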
With such a big number of trials, each single one has to be very short if the exper-
iment is to be completed in one sitting. With each of the two configuration displays
lasting five seconds, and the decision time expected to be between one and three seconds
approximately, we expect a total duration of about 12 seconds per trial. At 144 trials in
total, we arrive at an expected experiment length of just under 30 minutes, which seems
adequate.
3.3. Implementation
This section describes in further detail how our experiment was put together. In par-
ticular, we describe the technologies we used, the location as well as other details of the
experimental setup, and we explain some noteworthy problems and other occurrences
from the execution of the experiment.
3.3.1. Technical Components
The central hardware component of our experiment is the Oculus Rift DK2⁷ head-mounted display. It has a 1920 × 1080 pixel display covering a 100° horizontal field of view as well as various internal sensors for directional and positional head tracking (Oculus VR, LLC, 2014–2015). It is connected to a standard desktop PC which also has
mouse and keyboard for input as well as a traditional LCD monitor.
The beyerdynamic MMX 2⁸ provides the sound component of the VR experience. It is advertised as a “gaming headset” and also contains a microphone, which was not used during the experiment. It is capable of reproducing sound in the range of 18 to 22,000 Hz (beyerdynamic GmbH & Co. KG, 2012–2015).
We decided to use the Unity Game Engine⁹ (version 4.5) as the basis for our virtual
reality experience, which not only has the capability of interacting with the Oculus Rift
HMD, but also has a proven track record as a relatively easy to use basis for real-
time 3D applications in scientific contexts (Craighead, Burke, & Murphy, 2007). It
runs on modern PCs on top of Microsoft Windows and encapsulates many difficulties
of multimedia (in particular real-time 3D graphics) programming behind a graphical
interface coupled with freely available documentation. The Unity Engine handles aspects such as camera projection, lighting, and texturing so that we were able to focus
on integrating our assets and programming the experiment.
As mentioned in section 3.2.1, we use the Home Audiometer software written by Esser
(2012–2015) to perform a brief non-medical hearing assessment. For an example of what
the results of an assessment look like, see figure 3.
The questionnaire was delivered through Google Forms¹⁰ in a standard web browser.
3.3.2. Assets
We used the MakeHuman¹¹ software to create the 3D model of our virtual agent. It is
capable of producing highly detailed and textured 3D models of human bodies that can
be adjusted according to various physiological parameters. Our virtual agent is based
largely on the MakeHuman defaults with the gender set to 100% male, the race being
Caucasian, and the physique being slim/athletic. The nondescript black hair and suit
are also part of the MakeHuman default assets and proved easy to integrate. See figure 4 for a visual representation.

⁷ https://www.oculus.com/dk2/
⁸ http://www.beyerdynamic.de/shop/mmx-2.html
⁹ http://unity3d.com/
¹⁰ https://docs.google.com/forms/
¹¹ http://www.makehuman.org/

Figure 3: These diagrams show an example result from a hearing assessment done with the Home Audiometer software (Esser, 2012–2015). For both ears individually, the application tests various frequencies for their audibility at increasing volumes (the higher the line, the lower the volume, the better the hearing). The results shown here are unremarkable because they stem from a young adult with a healthy hearing capacity.
During the initial implementation of the virtual agent and the integration of the sound
recordings, it quickly became obvious that the connection between the virtual agent and
the voice recordings was not readily apparent as long as there was no mouth movement.
Naturally, a human’s mouth moves while they talk, so we decided to implement some
primitive lip-synchronization into our virtual agent. There are some lip-sync solutions available for the Unity Engine, for example FaceFX¹², but their complexity would have been prohibitive at that stage of implementation. Instead, we adopted a barebones lip-sync script written by UnityAnswers forum user Naletto (2011) – see figure 5 – which reads the audio file’s spectrum data to poll the sound amplitude over a certain time interval and uses it to manipulate (stretch, move, etc.) any Unity object.
We applied a suitable scaling to the value and used it to move the jaw bone of our
virtual agent downwards synchronized to the audio signal. The result is obviously diffi-
cult to appreciate in print, but a pair of screenshots can be seen in figure 6. Thanks to
the high quality of the 3D mesh produced by MakeHuman, the simple act of moving the
jaw bone results in relatively plausible and visually pleasing facial deformations.

¹² http://facefx.com/
Figure 4: This is what our virtual agent looks like under ideal lighting and texturing
conditions. The MakeHuman software makes it feasible to create human 3D
models like this without much knowledge about 3D modeling. Please note
that this is a high-resolution render image using idealized lighting and that
the real-time 3D representation in the Unity Engine has distinctly lower visual
fidelity.
 
function BandVol(fLow: float, fHigh: float): float
{
    fLow = Mathf.Clamp(fLow, 20, fMax);     // limit low...
    fHigh = Mathf.Clamp(fHigh, fLow, fMax); // ...and high frequencies
    // get spectrum: freqData[n] = vol of frequency n * fMax / nSamples
    audio.GetSpectrumData(freqData, 0, FFTWindow.BlackmanHarris);
    var n1: int = Mathf.Floor(fLow * nSamples / fMax);
    var n2: int = Mathf.Floor(fHigh * nSamples / fMax);
    var sum: float = 0;
    // average the volumes of frequencies fLow to fHigh
    for (var i = n1; i <= n2; i++) {
        sum += freqData[i];
    }
    return sum / (n2 - n1 + 1);
}

var mouth: GameObject;
var volume = 40;
var frqLow = 200;
var frqHigh = 800;
private var y0: float;

// Declarations left implicit in the forum post, reconstructed here
// (the sample count and maximum frequency are assumptions):
var sounds: AudioClip[];          // the recorded gibberish clips
private var freqData: float[];
private var nSamples: int = 1024; // number of spectrum samples
private var fMax: float = 24000;  // highest analyzed frequency in Hz

function Start()
{
    y0 = mouth.transform.position.y;
    freqData = new float[nSamples];
    audio.Play();
}

function Update()
{
    mouth.transform.position.y = y0 + BandVol(frqLow, frqHigh) * volume;
}

// A function to play sound N:
function PlaySoundN(N: int)
{
    audio.clip = sounds[N];
    audio.Play();
}
 
Figure 5: This is the code supplied by Naletto (2011) on the UnityAnswers forum, shown here lightly cleaned up and with a few variable declarations that the forum post left implicit reconstructed (marked as such in the comments). While an audio file is being played, this script analyzes the spectrum data and manipulates the y position of a predetermined game object accordingly.
Figure 6: This pair of images shows the impact of the lip-sync script on our virtual
agent. The idea of simply moving the jaw bone downwards in proportion to
the volume of the sound file is crude, but works surprisingly well.
Even though it would likely not fool a face-to-face observer, it is convincing enough for use with our HMD and VR scene, where there is a constant distance between the participant
and the virtual agents that renders small inaccuracies invisible.
3.3.3. Experimental Setup
We set up our experiment in a room within the main HCI laboratory (Fachbereich
Informatik, Universität Hamburg). While the laboratory itself was partially in use during
the experiment, our room was separated from it by a wall and a door.
Every part of the experiment took place on or around a table that we placed in the
middle of the room (see figure 7), with one chair for the participant positioned as if
the table were a normal desk, and one chair off to the side for the researcher. The PC
was positioned under the table towards the left, with keyboard, mouse and monitor on
the tabletop. Participants completed the questionnaire facing the monitor, while for the
hearing assessment it was turned to face the researcher and to make it impossible for
the participant to read the results of the assessment while it was in progress.
Participants only wore the headphones and the HMD whenever each was needed for
the experiment. For the rest of the time, they were kept on the left side of the table.
The software setup made it feasible to have both the monitor and the HMD connected
and running at the same time without interfering with each other.
Water and snacks were available to participants during break times, but were stored
on a shelf behind the researcher while the experiment was in progress.
Figure 7: This photo shows one participant sitting at the table, wearing the HMD and
the headphones during the experiment. The keyboard, mouse and monitor are
also visible, as is the researcher’s laptop. The screen shows the Unity Engine
running the experimental VR scene. In the background of the photo, the
mostly empty experimental room is visible, with the rest of the HCI laboratory
behind the glass windows.
3.3.4. Experimental Procedure
Volunteer participants were recruited from the students at the Fachbereich Informatik as
well as the research staff. There was no material compensation for participation, either
financial or otherwise.
After being greeted and going over the experiment consent form, the participant would
start by filling out the questionnaire page by page, with the measurement of the inter-
pupillary distance, the hearing assessment, and the HMD phase in between.
The hearing assessment involved the participant pressing the Ctrl key on the keyboard
whenever they heard a noise. The audiometer software would adjust the volume and the
frequency and switch between the left and right ear. The results of the assessment were
stored with the rest of the experimental data.
The HMD section of the experiment involved 144 trials per participant, as explained
above. It started with an explanation of how to choose between the two virtual agents
with the arrow keys and how to advance using the spacebar (see figure 8). Participants
were shown a short summary of the social presence definition by Biocca et al. (2001)
(see section 2.3) so that they knew how to make the comparisons, and were given the opportunity to ask questions beforehand. The 144 trials were broken up into 12 blocks of 12 trials
each, with opportunities to take a break between blocks.
As explained in section 3.2.4, we expected a length of about 30 minutes for the HMD
phase, which turned out to be rather accurate. In addition to that, the hearing assess-
ment took 10 minutes and the questionnaire about 20 minutes per participant, adding
up to an hour in total, which was also within our expectations. A few participants
took longer breaks than others, which led to a total time of up to 80 minutes in some
instances.
There were no significant problems or distractions throughout the experiment. On a
few occasions, the hearing assessment was momentarily disrupted by passing planes (the laboratory is geographically close to an airport), but this proved not to be a problem.
Figure 8: This pair of screenshots shows the experimental VR scene. In the top image, the two virtual agents are displayed and the one on the right is currently talking – the scene puts an additional highlight on the talking agent as an added visual focus cue. In the bottom image, both of them have finished talking and the program is waiting for user input. The participant has to press either the “←” or the “→” key. The instructional message is displayed in German if the participant’s mother tongue is German.
4. Results
In this chapter, we examine the results of our experiment and interpret the data we
gathered in such a way as to evaluate the hypotheses from section 3.1.
To start off, it bears mentioning that we had n = 15 participants aged between 19 and 45 years (M = 26.65, SD = 6.76), which should be enough to infer some statistically
significant results. However, some of the answers we received make it clear that any
results gathered from this experiment are not certain to be applicable to the population at
large. For example, all of our participants had a computer science background (10 with
an HCI specialization, 5 without), all of them were native speakers of German, none of
them suffered from any notable disorders in vision or hearing, and all participants were
right-handed. Any conclusions we draw from the experimental data should only be relied
on with these caveats in mind until the experiment can be repeated with participants of
a more varied background.
4.1. Evaluation
As we decided early on that our trials would be binary comparisons between different
virtual agent configurations, we can now look at the “winner” of each trial (the con-
figuration that was chosen). If we look at how often each value for speech was in the
winning configuration (cf. table 1, figures 9 & 10), we observe mean counts for the
text to speech condition of M = 33.53 (SD = 12.92), for the modified recording condition of M = 48.47 (SD = 14.40), and for the full recording condition of M = 61.40 (SD = 9.49). Analogously, for the different idle motion values (cf. table 2), we observe mean counts for the “no idle motion” condition of M = 34.20 (SD = 13.52), for the reduced idle motion condition of M = 52.33 (SD = 8.27), and for the motion capturing idle motion condition of M = 56.87 (SD = 8.06).
All of the value counts are normally distributed across subjects according to a Shapiro-
Wilk test at the p < 0.05 level.
Using the Kruskal-Wallis rank sum test, we cannot confirm at the p < 0.05 level
that the winning counts for the different degrees of fidelity are based on underlying
distributions with distinct location parameters.
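For illustration, both of these checks reduce to standard library calls. The following Python sketch applies them to the per-participant winner counts for speech from table 1 (the variable names are ours):

import numpy as np
from scipy import stats

# Winner counts per participant for the three speech conditions (table 1).
tts      = np.array([16, 23, 44, 25, 35, 49, 27, 38, 30, 37, 16, 17, 58, 46, 42])
modified = np.array([66, 44, 44, 63, 49, 50, 55, 34, 41, 55, 63, 74, 33, 23, 33])
full     = np.array([62, 77, 56, 56, 60, 45, 62, 72, 67, 51, 65, 53, 51, 75, 69])

# Shapiro-Wilk normality check for each condition separately.
for name, counts in (("tts", tts), ("modified", modified), ("full", full)):
    w, p = stats.shapiro(counts)
    print(name, w, p)

# Kruskal-Wallis rank sum test across the three conditions.
h, p = stats.kruskal(tts, modified, full)
print("kruskal", h, p)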
A χ² test did not indicate any interdependence between the winning speech values and
the winning idle motion values across all trials.
We analyzed the effects of display order (left to right or right to left) and the ran-
domly chosen gibberish sentence on the winning fidelity values with a repeated mea-
sures ANOVA and paired-samples t-tests.
subject id | text to speech | modified recording | full recording
1  | 16 | 66 | 62
2  | 23 | 44 | 77
3  | 44 | 44 | 56
4  | 25 | 63 | 56
5  | 35 | 49 | 60
6  | 49 | 50 | 45
7  | 27 | 55 | 62
8  | 38 | 34 | 72
9  | 30 | 41 | 67
10 | 37 | 55 | 51
11 | 16 | 63 | 65
12 | 17 | 74 | 53
13 | 58 | 33 | 51
14 | 46 | 23 | 75
15 | 42 | 33 | 69
Mean | 33.53 | 48.47 | 61.40
SD | 12.92 | 14.40 | 9.49

Table 1: These are the absolute counts of how often each value for speech fidelity has been in the winning configuration, per participant.
subject id | no idle motion | reduced idle motion | mo-cap idle motion
1  | 53 | 44 | 47
2  | 52 | 38 | 54
3  | 14 | 65 | 65
4  | 39 | 42 | 63
5  | 26 | 49 | 69
6  | 20 | 62 | 62
7  | 24 | 60 | 60
8  | 35 | 52 | 57
9  | 34 | 51 | 53
10 | 17 | 61 | 65
11 | 46 | 50 | 48
12 | 39 | 51 | 54
13 | 43 | 57 | 42
14 | 52 | 43 | 49
15 | 19 | 60 | 65
Mean | 34.20 | 52.33 | 56.87
SD | 13.52 | 8.27 | 8.06

Table 2: These are the absolute counts of how often each value for idle motion fidelity has been in the winning configuration, per participant.
Figure 9: These diagrams show the number of times, for each participant, when a par-
ticular value for the fidelity of speech or movement was part of the winning
configuration. As the fidelity gets higher, the corresponding variable is chosen
more often and more consistently.
Figure 10: These two diagrams show the cumulative number of times each value of the
two variables appeared in the winning configuration across all participants.
This makes the weight towards the higher-fidelity values more obvious.
For the winning speech value, there are no significant interactions. For the winning idle motion value, there is a highly significant
interaction between the winning value and the gibberish sentence, but no significant
interactions with the display order, nor any interaction effects between the sentence and
the display order.
For the time it took the participants to make their individual binary choices (from
here on out “choice duration”), we measured delays between 189 and 14850 milliseconds
(M = 1384, SD = 1303). Even though the value range spans two orders of magnitude,
even the highest outliers exist within the realm of plausibility, which is why we do not
discard any of the data points (see figure 11).
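The outlier count reported in figure 11 follows the standard box plot convention; as a minimal Python sketch (the function name is ours):

import numpy as np

def iqr_outlier_count(durations_ms):
    """Count points outside the 1.5 * IQR whiskers of a box plot."""
    durations_ms = np.asarray(durations_ms)
    q1, q3 = np.percentile(durations_ms, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(np.sum((durations_ms < lower) | (durations_ms > upper)))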
To create a useful measure for the difference in fidelity between two configurations, we have to set a fidelity value for each individual configuration. We do this by interpreting the three values of each of our two variables as integer values in {0, 1, 2}, with 2 being the highest fidelity value and 0 being the lowest. We then define the fidelity value fv of a configuration as the sum of the fidelity values of its two components. Lastly, we define the fidelity distance as fd_AB = |fv_A − fv_B| (visualized in figure 12).
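These definitions translate directly into code. As a small Python illustration (the labels and names are ours):

# Fidelity levels per variable, as integers in {0, 1, 2}.
SPEECH_FV = {"text to speech": 0, "modified recording": 1, "full recording": 2}
MOTION_FV = {"no idle motion": 0, "reduced idle motion": 1, "mo-cap idle motion": 2}

def fidelity_value(speech, motion):
    """fv of a configuration: the sum of its two component levels."""
    return SPEECH_FV[speech] + MOTION_FV[motion]

def fidelity_distance(conf_a, conf_b):
    """fd_AB = |fv_A - fv_B| for two (speech, motion) configurations."""
    return abs(fidelity_value(*conf_a) - fidelity_value(*conf_b))

# Example: fv = 4 versus fv = 1 gives fd = 3.
assert fidelity_distance(("full recording", "mo-cap idle motion"),
                         ("modified recording", "no idle motion")) == 3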
If we examine the distribution of the choice duration in relation to the fidelity distance
(see figure 13), we see that there are visual hints for a small negative correlation, and the
Pearson product-moment correlation coefficient of the two variables is indeed −0.036 at the p < 0.1 level.
Figure 11: This box plot shows the spread of the choice duration (milliseconds). Even
though a large majority of all data points are lower than 1500 milliseconds,
there are outliers up to ten times as big. Altogether, 233 points exist outside
the 1.5 × IQR distance (233 of 2151, 10.8%).
                    | text to speech | modified recording | full recording
no idle motion      | fv = 0         | fv = 1             | fv = 2
reduced idle motion | fv = 1         | fv = 2             | fv = 3
mo-cap idle motion  | fv = 2         | fv = 3             | fv = 4

Figure 12: This is a tabular visualization of the fidelity value fv of every configuration, from which the fidelity distance between two configurations is defined as fd_AB = |fv_A − fv_B|. The fidelity distance between the top left and the bottom middle configuration is given as an example (fd = |0 − 3| = 3).
Figure 13: This is an array of box plots showing the spread of the choice duration (in milliseconds) depending on the fidelity distance (0 to 4). The plots largely resemble the independent one shown in figure 11. For fd ≤ 2 there is not much variation, but for fd = 3 and especially fd = 4 it is obvious that the choice duration has far fewer outliers and even a slightly lower median.
The remaining parts of the questionnaire did not lead to any interesting results. The Lateral Preference Inventory aligned very well with the stated handedness of the participants and did not offer any further insight, and the Simulator Sickness Questionnaire (fortunately) gave no signs of any health problems more significant than mild fatigue.
4.2. Discussion
Going back to our hypotheses from section 3.1, we are unfortunately not able to sub-
stantiate hypotheses 1a and 1b (the existence of significant correlations between speech
or idle motion fidelity and the social presence of the virtual agent) based on our experi-
mental data. To the naked eye it seems apparent that the higher fidelity configurations
were chosen more often, but it appears that our sample size compared with the relatively
small difference between the values (between one and two standard deviations) is not big
enough to prove it conclusively, at least not at any worthwhile level of significance.
However, it does seem like a worthwhile avenue for further research.
It could be a boon to focus on only one of our two fidelity scales per experiment –
perhaps it would have been easier to prove a correlation, even with the same number of
participants, if all trials were geared towards one scale of comparison instead of mixing
both. At this stage that is a wild guess though.
In reference to hypothesis 2, we were able to provide some evidence that there are
no significant interaction effects between the influences that speech fidelity and idle motion fidelity have on social presence. Even though true independence cannot be statistically proven, the data suggests that it is a safe assumption that there are no
interactions between the two.
Regarding hypothesis 3 (a negative correlation between the overall fidelity and the duration of the choice phase), we were indeed able to demonstrate a negative correlation at the p < 0.1 level. The effect is small but noticeable. From a user perspective it is not very surprising that configurations with a greater fidelity distance are easier to compare, but it is reassuring that the data corroborates the hypothesis.
Although it is hard to say whether a larger sample size would have led to more significant results, a larger number of participants would likely benefit experiments of this kind. Furthermore, our conjecture is that the design of our experimental trials – the binary comparison of two configurations, each combining more than one variable – generates only a small amount of usable information per trial. Perhaps a proper measurement scale for social presence would make it easier to draw reliable conclusions, even if it comes at the cost of an increased experimental duration.
5. Conclusion
During the work on this thesis, we consolidated several different sources to create suit-
able notions of both technical fidelity and social presence and to relate them to one
another. We conceived an experimental framework based around large numbers of fast-
paced binary comparisons to measure the social presence of virtual agents based largely
on participants’ instant reactions, implemented these ideas in a real-time VR scenario capable of using a state-of-the-art HMD, and executed our experiment with 15 participants, then analyzed the gathered data statistically.
Our intention for this experiment was to investigate any interactions between the
fidelity of virtual agents’ speech and movement (specifically idle motions) and their
social presence in a virtual reality context – one based around a head-mounted display
with directional tracking, in our case.
It is unfortunate that we were unable to prove whether higher technical fidelity of virtual agents leads to a stronger social presence, which would have helped explain and steer some of the current developments around virtual agents. In the absence of such statistical proof, and having only been able to draw some incidental conclusions, the experiment would have to be considered a partial success at best. But of course not every experimental result has to be groundbreaking, and especially in its role as part of a master's thesis, perhaps the lesson that not every experiment leads to clean results is all the more fitting.¹³
Viewed from a more constructive perspective, it bears mentioning that our framework for virtual agents' technical fidelity and how to manipulate it along different axes is a potentially useful tool that did not exist before we developed it in preparation for our experiment. Now that at least two examples of three-step fidelity manipulation have been established, it will be much easier for future experiments in the same area to establish controlled and reproducible fidelity conditions.
¹³ And needless to say, the author learned a lot about laboratory experiments, VR technology, real-time 3D engines and other methods and technologies throughout the preparation and execution of this thesis.
References

beyerdynamic GmbH & Co. KG. (2012–2015). beyerdynamic MMX 2. Retrieved February 18th, 2015, from http://www.beyerdynamic.de/shop/mmx-2.html

Biocca, F., Harms, C., & Gregg, J. (2001). The networked minds measure of social presence: Pilot test of the factor structure and concurrent validity. In 4th annual international workshop on presence (pp. 1–9). Philadelphia, PA, USA.

Blascovich, J., & Bailenson, J. (2011). Infinite reality: Avatars, eternal life, new worlds, and the dawn of the virtual revolution. New York, NY, USA: William Morrow & Co.

Caridakis, G., Raouzaiou, A., Bevacqua, E., Mancini, M., Karpouzis, K., Malatesta, L., & Pelachaud, C. (2008). Virtual agent multimodal mimicry of humans. Language Resources and Evaluation, 41(3–4), 367–388.

Coren, S. (1993). The lateral preference inventory for measurement of handedness, footedness, eyedness, and earedness: Norms for young adults. Bulletin of the Psychonomic Society, 31(1), 1–3.

Craighead, J., Burke, J., & Murphy, R. (2007). Using the Unity game engine to develop SARGE: A case study. Computer, 4552, 366–372.

Egges, A., Molet, T., & Magnenat-Thalmann, N. (2004). Personalised real-time idle motion synthesis. Computer Graphics and Applications, 121–130.

Esposito, N. J., & Pelton, L. H. (1971). Review of the measurement of semantic satiation. Psychological Bulletin, 75(5), 330–340.

Esser, T. (2012–2015). Home Audiometer Hörtest. Retrieved February 18th, 2015, from http://www.esseraudio.com/de/home-audiometer-hoertest.html

Institut für Informatik, Universität Leipzig. (2001). Projekt Deutscher Wortschatz – Wortlisten. Retrieved February 18th, 2015, from http://wortschatz.uni-leipzig.de/html/wliste.html

Kennedy, R., Lane, N., Berbaum, K., & Lilienthal, M. (1993). Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. International Journal of Aviation Psychology, 3(3), 203–220.

Keuleers, E., & Brysbaert, M. (2010). Wuggy: A multilingual pseudoword generator. Behavior Research Methods, 42(3), 627–633.

Kopp, S., Sowa, T., & Wachsmuth, I. (2003). Imitation games with an artificial agent: From mimicking to understanding shape-related iconic gestures. Gesture-Based Communication in Human-Computer Interaction, 5th International Gesture Workshop.

Krueger, M. W. (1991). Artificial reality 2 (2nd ed.). Reading, MA, USA: Addison-Wesley.

Merriam-Webster Dictionary. (2015). Fidelity – Definition and more. Retrieved February 18th, 2015, from http://www.merriam-webster.com/dictionary/fidelity

Moeslund, T. B., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2–3), 90–126.

Naletto, A. (2011). UnityAnswers: Any way of “automatic” lip syncing? Retrieved February 18th, 2015, from http://answers.unity3d.com/questions/139323/any-way-of-quotautomaticquot-lip-syncing.html

Oculus VR, LLC. (2014–2015). The All New Oculus Rift Development Kit 2 (DK2) Virtual Reality Headset. Retrieved February 18th, 2015, from https://www.oculus.com/dk2/

Pausch, R., Proffitt, D., & Williams, G. (1997). Quantifying immersion in virtual reality. In Proceedings of the 24th annual conference on computer graphics and interactive techniques. New York, NY, USA: ACM.

Slater, M., Usoh, M., & Steed, A. (1994). Depth of presence in virtual environments. Presence: Teleoperators and Virtual Environments, 3, 130–144.

Sproat, R. W. (Ed.). (1997). Multilingual text-to-speech synthesis. Norwell, MA, USA: Kluwer Academic Publishers.

Starck, J., Miller, G., & Hilton, A. (2005). Video-based character animation. In Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on computer animation (pp. 49–58). New York, NY, USA: ACM.

Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. Journal of Communication, 42, 73–93.

Tolani, D., Goswami, A., & Badler, N. I. (2000). Real-time inverse kinematics techniques for anthropomorphic limbs. Graphical Models, 62(5), 353–388.

Please note that there are several mentions of commercial products and/or websites in this thesis, some of which we deliberately excluded from this reference list, so that it contains only the sources from which we cite information. If a product or website is merely mentioned in context, but not cited, a web link is included as a footnote in the main text instead of here.
Appendix
The experiment used a digital questionnaire to acquire subject data beyond the bound-
aries of the HMD experiment. The following is a complete reproduction of the question-
naire. It uses the “for print” view in lieu of the web-based version to facilitate inclusion
in a print document, so any references that seem counterintuitive (for example directions
to click a button) would make more sense in the interactive web-based version of the
questionnaire.
The experiment included n = 15 participants in total. Each one was guided through the questionnaire and the experiment. Since the experiment contained 144 trials per subject, the total maximum number would have been 15 × 144 = 2160 trials. However, 9 trials were faulty and had to be discarded, mostly because of outside interruptions and short-term software failures, leaving 2151 trial data points available for interpretation.
Because the measurement data from the hearing assessments was not used in the evaluation of the experiment (partly because there was no need, partly because none of the results were at all surprising or interesting), and even though the participants consented to a full publication of all experimental data, we have decided not to include the hearing assessment results in this publication: we feel that, in this particular case, the participants' interest in keeping potentially medically sensitive data safe and anonymous outweighs the public's interest in fully open data access.
(The raw questionnaire data and the raw experimental data are attached to this document as CSV files.)
A. Questionnaire
Experiment Questionnaires
All details are collected only in the context of the present study. Thank you for your
participation!
* Required
1. Age *
2. Height *
3. Profession / field of study: *
4. Gender * – Mark only one oval: Male / Female
5. How would you rate your German language skill? * – Mark only one oval: Native speaker / Fluent / Proficient / Basic / None
6. Vision correction: * – Mark only one oval: None / Glasses / Contact lenses
7. Do you have a known eye disorder? – Check all that apply: Color blindness / Night blindness / Dyschromatopsia (red-green color weakness) / Strong eye dominance / Other
8. Do you suffer from hearing loss? – Mark only one oval: No (healthy hearing capacity) / Mild hearing loss (difficulties understanding speech) / Moderate to severe hearing loss (impossible to understand speech) / Profound hearing loss (impossible to hear speech or most noises)
9. If you suffer from hearing loss, please check all that apply: Asymmetrical hearing loss, more pronounced on the left side / Asymmetrical hearing loss, more pronounced on the right side / Symmetrical hearing loss (both ears affected at about the same level) / Congenital hearing loss (present since birth) / Acquired/Delayed hearing loss (onset later in life)
10. Hearing correction: – Mark only one oval: None / External hearing aids / Cochlear implants / Other
11. Do you suffer from a displacement of equilibrium or similar? * – Mark only one oval: Yes / No
12. Do you have any experience with virtual reality HMDs (such as the Oculus Rift)? * – Scale from 1 (no experience) to 5 (a lot of experience)
13. Do you have experience with 3D computer games? * – Scale from 1 (no experience) to 5 (a lot of experience)
14. How many hours do you play per week? *
15. Do you have experience with 3D stereoscopic display (cinema, games etc.)? * – Scale from 1 (no experience) to 5 (a lot of experience)
16. Are you left- or right-handed? * – Mark only one oval: Left-handed / Right-handed / Ambidextrous
17. Inter-pupillary distance (IPD) * – Please contact the experimenter to measure your IPD.
Hearing assessment
Please contact the experimenter for a short assessment of your hearing ability (approximately 10 minutes).
Please note: This is a very broad test that serves only to highlight any obvious patterns in the context of our experiment. Our staff does not (and cannot) perform medical diagnoses. This assessment is not a substitute for a hearing test conducted by trained personnel using calibrated equipment. If you suspect that your hearing may be impaired, please arrange further steps with your medical doctor.
The Lateral Preference Inventory
Simply read each of the questions below. Decide which hand, foot, etc. you use for each activity and then put a check mark next to the answer that describes you the best. If you are unsure of any answer, try to act out the action. Each of the following items is required and offers the options Left / Right / Either (mark only one oval):

18. With which hand do you draw? *
19. Which hand would you use to throw a ball to hit a target? *
20. In which hand would you use an eraser on paper? *
21. Which hand removes the top card when you are dealing from a deck? *
22. With which foot would you kick a ball to hit a target? *
23. If you wanted to pick up a pebble with your toes, which foot would you use? *
24. Which foot would you use to step on a bug? *
25. If you had to step up onto a chair, which foot would you place on the chair first? *
26. Which eye would you use to look through a telescope? *
27. If you had to look into a dark bottle to see how full it was, which eye would you use? *
28. Which eye would you use to peep through a keyhole? *
29. Which eye would you use to sight down a rifle? *
30. If you wanted to listen in on a conversation going on behind a closed door, which ear would you place against the door? *
31. Into which ear would you place the earphone of a transistor radio? *
32. If you wanted to hear someone’s heartbeat, which ear would you place against their chest? *
33. Imagine a small box resting on a table. This box contains a small clock. Which ear would you press against the box to find out if the clock was ticking? *
Simulator Sickness Questionnaire (Pre)
Each of the following symptoms is required and rated on a scale from 1 (None) to 4 (Severe); mark only one oval per item:

34. General discomfort (DE: "Unwohlsein") *
35. Fatigue (DE: "Ermüdung") *
36. Headache (DE: "Kopfschmerzen") *
37. Eyestrain (DE: "Ermüdung der Augen") *
38. Difficulty focusing (DE: "Schwierigkeiten mit der Sehschärfe") *
39. Increased salivation (DE: "Erhöhte Speichelbildung") *
40. Sweating (DE: "Schwitzen") *
41. Nausea (DE: "Übelkeit") *
42. Difficulty concentrating (DE: "Konzentrationsschwierigkeiten") *
43. Fullness of head (DE: "Druckgefühl im Kopfbereich") *
44. Blurred vision (DE: "verschwommene Sicht") *
45. Dizzy (eyes open) (DE: "Schwindelgefühl bei geöffneten Augen") *
46. Dizzy (eyes closed) (DE: "Schwindelgefühl bei geschlossenen Augen") *
47. Vertigo (DE: "Gleichgewichtsstörungen") *
48. Stomach awareness (DE: "Magenbeschwerden") *
49. Burping (DE: "Aufstoßen") *
Experiment Procedure
In the experiment you will be asked to perform a task in a virtual environment while
wearing a head-mounted display as well as headphones.
You will see and hear pairs of virtual actors performing an act of speech. You will then be
prompted to decide, for each pair, which one has the stronger "social presence" (this term
is defined on an introductory slide during the experiment).
Each trial lasts about 12 to 15 seconds. The experiment will be conducted in blocks of 12
trials (about 2.5 minutes each) and will end once all 12 blocks have been completed. The
experiment usually takes about 30 minutes. You may take short breaks between blocks,
but please try to hold your concentration throughout each block, as the trials within a
block happen consecutively.
Thank you!
(Please click "continue".)
You are now ready to start the experiment. Please
contact the experimenter.
If you have completed the experiment, please click "continue".
Simulator Sickness Questionnaire (Post)
The same symptoms as in the pre-experiment questionnaire, again required and rated on a scale from 1 (None) to 4 (Severe); mark only one oval per item:

50. General discomfort (DE: "Unwohlsein") *
51. Fatigue (DE: "Ermüdung") *
52. Headache (DE: "Kopfschmerzen") *
53. Eyestrain (DE: "Ermüdung der Augen") *
54. Difficulty focusing (DE: "Schwierigkeiten mit der Sehschärfe") *
55. Increased salivation (DE: "Erhöhte Speichelbildung") *
56. Sweating (DE: "Schwitzen") *
57. Nausea (DE: "Übelkeit") *
58. Difficulty concentrating (DE: "Konzentrationsschwierigkeiten") *
59. Fullness of head (DE: "Druckgefühl im Kopfbereich") *
60. Blurred vision (DE: "verschwommene Sicht") *
61. Dizzy (eyes open) (DE: "Schwindelgefühl bei geöffneten Augen") *
62. Dizzy (eyes closed) (DE: "Schwindelgefühl bei geschlossenen Augen") *
63. Vertigo (DE: "Gleichgewichtsstörungen") *
64. Stomach awareness (DE: "Magenbeschwerden") *
65. Burping (DE: "Aufstoßen") *
Post Questionnaire
66. Did you feel immersed in the virtual world? * – Scale from 1 (no) to 5 (yes)
67. Were you distracted from the virtual world by real-world ambient noise? * – Scale from 1 (no) to 5 (yes)
68. Have you been able to see parts of the real laboratory during the experiment? * – Scale from 1 (no) to 5 (yes)
69. Do you think the experiment task was too difficult? * – Scale from 1 (no) to 5 (yes)
70. Do you think the experiment was too long? * – Scale from 1 (no) to 5 (yes)
71. How would you subjectively describe your level of attention during the experiment? * – Scale from 1 (very low) to 5 (very high)
72. Which strategy did you use (e.g., concentrating on certain signals, making a "decision from the gut", etc.)? * – Free-text answer
73. Any observations regarding the difficulty of the task that you made during the experiment and would like to share? – Free-text answer
74. Additional comments: – Free-text answer
Slater-Usoh-Steed Questionnaire (SUS)
All items are required and rated on a scale from 1 to 7 (mark only one oval):

75. Please rate your sense of being in the virtual environment, on a scale of 1 to 7, where 7 represents your normal experience of being in a place. * – "I had a sense of 'being there'..." – 1 (not at all) to 7 (very much)
76. To what extent were there times during the experience when the virtual environment was the reality for you? * – "There were times when the virtual environment was the reality for me..." – 1 (not at all) to 7 (almost all the time)
77. When you think back to the experience, do you think of the virtual environment more as images that you saw or more as somewhere that you visited? * – "The virtual environment seems to me to be more like..." – 1 (images that I saw) to 7 (somewhere that I visited)
78. During the time of the experience, which was the strongest on the whole, your sense of being in the virtual environment or of being elsewhere? * – "I had a stronger sense of..." – 1 (being elsewhere) to 7 (being in the virtual environment)
79. Consider your memory of being in the virtual environment. How similar in terms of the structure of the memory is this to the structure of the memory of other places you have been today? By 'structure of the memory' consider things like the extent to which you have a visual memory of the virtual environment, whether that memory is in colour, the extent to which the memory seems vivid or realistic, its size, location in your imagination, the extent to which it is panoramic in your imagination, and other such structural elements. * – "I think of the virtual environment as a place in a way similar to other places that I have been today..." – 1 (not at all) to 7 (very much so)
80. During the time of your experience, did you often think to yourself that you were actually in the virtual environment? * – "During the experiment I often thought that I was really standing in the virtual environment..." – 1 (not very often) to 7 (very much so)
B. Data: Questionnaire
subject id | Timestamp | Age | Height | Profession / field of study | Gender | How would you rate your German language skill?
1 2014-12-19 15:19:46 22 190 Human-Computer-Interaction Male Native speaker
2 2014-12-19 16:18:05 27 180 Student Informatik Male Native speaker
3 2014-12-19 17:25:14 21 177 Student Informatik Male Native speaker
4 2014-12-19 18:34:41 24 172 HCI Female Native speaker
5 2014-12-19 19:33:24 19 181 MCI Male Native speaker
6 2014-12-22 11:19:25 25 172 Bachelor MCI Male Native speaker
7 2014-12-22 14:26:18 33 182 Post-doc CS Male Native speaker
8 2014-12-22 16:11:59 33 185 Informatics Male Native speaker
9 2014-12-23 11:35:20 34 168 MCI Female Native speaker
10 2014-12-23 14:05:30 24 180 MCI Male Native speaker
11 2014-12-23 17:07:39 45 155 computer science Female Native speaker
12 2014-12-23 19:01:57 20 192 Computer Science Male Native speaker
13 2015-01-12 15:18:22 21 167 MCI Student Female Native speaker
14 2015-01-12 18:00:27 24 183 HCI Male Native speaker
15 2015-01-12 19:12:41 28 173 phd student Male Native speaker
subject id | Vision correction: | Do you have a known eye disorder? | Do you suffer from hearing loss? | If you suffer from hearing loss, please check all that apply:
1 None No (healthy hearing capacity)
2 Glasses No (healthy hearing capacity)
3 Glasses No (healthy hearing capacity)
4 Glasses No (healthy hearing capacity)
5 None No (healthy hearing capacity)
6 Glasses No (healthy hearing capacity)
7 Contact lenses No (healthy hearing capacity)
8 None No (healthy hearing capacity)
9 Glasses No (healthy hearing capacity)
10 None
11 None No (healthy hearing capacity)
12 None No (healthy hearing capacity)
13 None No (healthy hearing capacity)
14 None No (healthy hearing capacity)
15 Glasses Mild hearing loss (difficulties understanding speech) Symmetrical hearing loss (both ears affected at about the same level)
subject id | Hearing correction: | Do you suffer from a displacement of equilibrium or similar? | Do you have any experience with virtual reality HMDs (such as the Oculus Rift)? | Do you have experience with 3D computer games?
1 None No 1 3
2 None No 4 4
3 None No 1 5
4 None No 2 1
5 None No 2 5
6 None No 1 5
7 None No 5 3
8 None No 3 5
9 None No 1 2
10 None No 1 5
11 None No 4 4
12 None No 2 1
13 None No 2 1
14 None No 3 3
15 None No 5 5
subject id | How many hours do you play per week? | Do you have experience with 3D stereoscopic display (cinema, games etc.)? | Are you left- or right-handed? | Inter-pupillary distance (IPD)
1 1 2 Right-handed 4.4
2 10 4 Right-handed 6.1
3 10 2 Right-handed 7.2
4 4 3 Right-handed 5.8
5 25 3 Right-handed 6.5
6 12 3 Right-handed 6.6
7 0 5 Right-handed 6.5
8 3 4 Right-handed 6.0
9 0 3 Right-handed 6.2
10 10 3 Right-handed 6.7
11 0.2 3 Right-handed 5.7
12 6 4 Right-handed 6.5
13 0 3 Right-handed 5.6
14 10 3 Right-handed 6.5
15 20 5 Right-handed 6.8
subject id | With which hand do you draw? | Which hand would you use to throw a ball to hit a target? | In which hand would you use an eraser on paper? | Which hand removes the top card when you are dealing from a deck? | With which foot would you kick a ball to hit a target?
1 Right Right Right Left Right
2 Right Right Right Either Right
3 Right Right Right Right Right
4 Right Right Right Either Either
5 Right Right Right Either Right
6 Right Right Right Right Right
7 Right Right Either Right Right
8 Right Right Right Left Right
9 Right Right Right Right Right
10 Right Right Right Right Right
11 Right Right Right Right Right
12 Right Right Right Either Right
13 Right Right Right Either Right
14 Right Right Right Right Right
15 Right Right Either Either Right
subject id | If you wanted to pick up a pebble with your toes, which foot would you use? | Which foot would you use to step on a bug? | If you had to step up onto a chair, which foot would you place on the chair first? | Which eye would you use to look through a telescope? | If you had to look into a dark bottle to see how full it was, which eye would you use?
1 Right Right Right Left Left
2 Right Right Either Either Either
3 Right Right Either Right Right
4 Right Left Left Right Right
5 Right Right Left Either Either
6 Either Either Right Right Right
7 Either Either Left Right Either
8 Right Right Right Left Right
9 Right Either Right Left Left
10 Right Right Right Right Right
11 Right Right Right Right Right
12 Right Either Right Right Right
13 Either Either Right Left Left
14 Right Right Right Right Right
15 Either Either Either Right Right
subject id | Which eye would you use to peep through a keyhole? | Which eye would you use to sight down a rifle? | If you wanted to listen in on a conversation going on behind a closed door, which ear would you place against the door? | Into which ear would you place the earphone of a transistor radio? | If you wanted to hear someone’s heartbeat, which ear would you place against their chest?
1 Left Left Right Either Either
2 Either Either Left Right Left
3 Right Right Left Left Left
4 Right Right Right Right Right
5 Right Left Right Right Right
6 Right Right Left Left Right
7 Either Right Either Either Either
8 Right Left Left Left Left
9 Left Left Right Right Left
10 Right Right Right Right Left
11 Right Right Right Right Left
12 Right Right Right Either Either
13 Left Left Left Right Either
14 Right Right Right Right Right
15 Right Right Either Either Either
subject id | Imagine a small box resting on a table. This box contains a small clock. Which ear would you press against the box to find out if the clock was ticking? | General discomfort | Fatigue | Headache | Eyestrain | Difficulty focusing
1 Right 1 2 1 1 1
2 Left 1 2 1 2 1
3 Left 2 2 2 2 1
4 Right 1 2 1 1 1
5 Right 2 2 3 2 2
6 Right 1 1 1 1 1
7 Left 1 1 1 2 1
8 Left 1 1 1 2 1
9 Left 1 2 1 2 3
10 Right 1 1 2 1 1
11 Either 1 1 1 1 1
12 Right 1 1 1 2 1
13 Left 1 3 1 2 1
14 Right 1 2 1 3 1
15 Either 1 1 1 1 1
subject id | Increased salivation | Sweating | Nausea | Difficulty concentrating | Fullness of head | Blurred vision
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 2 1 2 2 1
4 1 1 1 2 1 1
5 1 1 1 2 1 2
6 1 1 1 1 1 1
7 1 1 1 2 1 1
8 1 1 1 1 1 1
9 1 1 1 3 1 2
10 1 2 1 1 1 1
11 1 1 1 1 1 1
12 1 1 1 2 1 1
13 2 1 1 2 1 1
14 1 1 1 2 1 1
15 1 1 1 1 1 1
subject id | Dizzy (eyes open) | Dizzy (eyes closed) | Vertigo | Stomach awareness | Burping | General discomfort
1 1 1 1 1 1 1
2 1 1 1 1 1 2
3 1 1 1 1 1 2
4 2 2 2 1 2 1
5 1 1 1 1 1 1
6 1 1 1 1 1 2
7 1 1 1 1 1 1
8 1 1 1 1 1 2
9 1 1 1 1 1 1
10 1 1 1 1 1 1
11 1 1 1 1 1 1
12 1 1 1 1 1 3
13 1 1 1 1 1 2
14 1 1 1 1 1 1
15 1 1 1 1 1 1
subject id | Fatigue | Headache | Eyestrain | Difficulty focusing | Increased salivation | Sweating
1 3 1 1 1 1 1
2 2 1 2 1 1 1
3 2 2 2 2 1 1
4 3 2 2 2 1 1
5 2 1 3 1 1 1
6 2 1 1 1 1 1
7 1 1 1 1 1 1
8 2 1 2 1 1 1
9 2 1 3 2 1 1
10 1 2 1 1 1 1
11 1 1 2 1 1 1
12 3 1 3 1 1 1
13 4 1 3 1 2 1
14 3 1 4 2 1 1
15 2 1 1 1 1 1
subject id | Nausea | Difficulty concentrating | Fullness of head | Blurred vision | Dizzy (eyes open) | Dizzy (eyes closed)
1 1 1 1 1 1 1
2 2 1 1 1 1 2
3 2 2 3 1 1 2
4 1 2 3 1 2 2
5 1 2 1 2 1 1
6 1 1 1 2 1 1
7 1 1 1 1 1 1
8 1 1 2 1 1 1
9 1 2 1 1 1 1
10 1 1 2 1 1 1
11 1 1 1 2 1 1
12 1 2 1 1 1 1
13 1 2 2 1 1 1
14 1 3 2 1 1 1
15 1 1 1 1 1 1
subject id | Vertigo | Stomach awareness | Burping | Did you feel immersed in the virtual world? | Were you distracted from the virtual world by real-world ambient noise?
1 1 1 1 4 3
2 1 1 1 3 2
3 2 1 1 4 4
4 1 1 2 4 2
5 1 1 1 4 2
6 1 1 1 4 1
7 1 1 1 4 2
8 1 1 1 3 1
9 1 1 1 2 1
10 1 1 1 3 1
11 1 1 1 1 1
12 1 1 2 2 3
13 1 1 1 2 2
14 1 1 1 2 2
15 1 1 1 3 2
subject id | Have you been able to see parts of the real laboratory during the experiment? | Do you think the experiment task was too difficult? | Do you think the experiment was too long? | How would you subjectively describe your level of attention during the experiment?
1 1 1 2 3
2 1 2 3 4
3 1 1 2 4
4 1 1 2 4
5 1 1 1 4
6 2 1 2 4
7 2 1 1 4
8 1 1 2 4
9 1 1 1 5
10 1 1 3 4
11 3 3 1 4
12 5 1 3 2
13 2 1 2 4
14 2 1 1 4
15 1 1 2 3
subject id | Which strategy did you use (e.g., concentrating on certain signals, making a "decision from the gut", etc.)?
1 decision from the gut, voice
2 decision from the gut, clean audio
3 concentrating on audio and movement of the body, speach clearlyness
4 partly specific signals, partly gut feeling
5 I mainly concentrated on the voice of the actors, but didn’t really have a strategy elsewise. “from the gut” describes it pretty well.
6 Comparing the actor’s pattern of movement, i.e. choosing the actor with the most natural movement while speaking his text.
7 motion > no motion, actual voice > tts, rest from the gut
8 at first hearing experince, then body language and facial expressions
9 First and foremost I chose based on the audio; speech that was too technical, too clean, or distorted like a field reporter’s was the first to be eliminated. Otherwise I relied on my gut feeling and did not follow a real strategy.
10 differentiate between moving and non-moving person, differentiate between natural speech and synthezid speech
11 I thought of one of them being the real person and the other as a virtual language teacher. Still it was not easy to decide.
12 1. loudest speaker, 2. if equal, the one who moves, 3. generally what felt best
13 general impression; whether the voice fits “into the room”; the speaker’s direction of attention (towards me or elsewhere); if the impression was equal, deciding along the lines of “whom does the voice fit better”
14 decision from the gut, clearer voice maybee
15 Movement & computer voice vs recorded voice as hints
subject id | Any observations regarding the difficulty of the task that you made during the experiment and would like to share?
1
2
3
4 you could see every single image pixel, which disturbs the “realism”
5
6 Slight difficulties fitting my normal glasses in the Oculus Rift, but nothing too complicated.
7
8
9
10
11 When it was exactly the same recording I had difficulties to choose.
12 the Oculus Rift has a too low resolution for prolonged watching -> the eyes feel severe pain
13 The two persons blink little / not at all / it was hard to tell, which may have contributed to me blinking less myself and thus straining my eyes more.
14
15 Headtracking would be nice. Felt like the actors looked past me sometimes.
subject id | Additional comments:
1
2
3 very nice setup (and chair ;-) )
4 reloading the scene after each comparison sometimes made the image appear to jump slightly (a perceived small jolt to the right or left) – led to mild bouts of dizziness
5
6 No.
7 nice work!
8 if the voice sounds “metallic/robotic” than the experience is reduced in naturalness
9
10
11
12
13
14
15
subject id | Q75: sense of being in the virtual environment | Q76: extent to which the VE was the reality | Q77: images that you saw vs. somewhere that you visited | Q78: stronger sense of being in the VE or elsewhere | Q79: similarity of the memory structure to other places (...) | Q80: often thought of actually being in the VE (question numbers refer to the SUS items in appendix A; all rated 1–7)
1 4 7 7 6 2 4
2 6 5 5 5 5 5
3 5 2 2 5 3 2
4 4 4 5 5 4 3
5 4 2 2 7 5 4
6 5 2 2 5 6 2
7 6 5 7 7 6 5
8 4 4 5 3 5 4
9 3 4 4 5 1 1
10 3 1 1 2 3 2
11 4 4 4 4 1 2
12 1 1 1 1 7 1
13 3 1 5 5 4 2
14 2 2 3 4 2 2
15 4 4 4 4 4 4
C. Data: Experiment
subject id | trial id | body left | body right | speech left | speech right | sentence | order | choice | duration [ms]
1 0 still mocap recording text to speech 3 left to right left 1494
1 1 mocap reduced text to speech processed 1 left to right right 517
1 2 mocap mocap text to speech processed 6 right to left right 1478
1 3 reduced mocap processed processed 6 right to left left 1279
1 4 mocap still text to speech processed 4 right to left right 2008
1 5 mocap still text to speech text to speech 1 left to right right 1510
1 6 reduced mocap recording processed 5 left to right left 517
1 7 still mocap processed processed 6 right to left left 3565
1 8 mocap still text to speech recording 5 left to right right 881
1 9 still reduced processed processed 8 right to left right 2869
1 10 mocap still recording recording 5 left to right left 898
1 11 still reduced processed text to speech 5 left to right left 716
1 12 reduced still text to speech processed 3 left to right right 517
1 13 reduced mocap processed processed 5 left to right left 4376
1 14 still mocap processed text to speech 4 right to left left 997
1 15 still mocap text to speech text to speech 4 right to left right 2174
1 16 still still processed recording 4 right to left right 2853
1 17 mocap still text to speech processed 3 left to right right 715
1 18 mocap reduced text to speech text to speech 7 left to right right 1511
1 19 reduced mocap recording processed 6 right to left left 2919
1 20 mocap still processed recording 5 left to right right 517
1 21 mocap reduced processed text to speech 8 right to left left 1643
1 22 reduced mocap text to speech text to speech 3 left to right left 1462
1 23 mocap reduced text to speech text to speech 8 right to left right 2124
1 24 mocap still processed processed 4 right to left left 1743
1 25 reduced mocap processed recording 8 right to left right 1743
1 26 mocap mocap recording text to speech 5 left to right left 914
1 27 reduced still processed processed 4 right to left right 2124
1 28 reduced still processed recording 6 right to left left 1809
1 29 still mocap processed processed 5 left to right left 189
1 30 still reduced text to speech recording 2 right to left right 1544
1 31 reduced still text to speech text to speech 2 right to left left 518
1 32 reduced mocap text to speech recording 7 left to right right 1113
1 33 mocap reduced text to speech recording 3 left to right right 517
1 34 still mocap text to speech processed 6 right to left right 2207
1 35 mocap reduced recording text to speech 8 right to left left 517
1 36 reduced mocap processed recording 7 left to right right 798
1 37 reduced mocap text to speech recording 8 right to left right 4112
1 38 still still recording processed 3 left to right right 517
1 39 mocap mocap processed text to speech 5 left to right left 2570
1 40 mocap reduced recording recording 3 left to right right 881
1 41 reduced mocap recording text to speech 3 left to right left 897
1 42 mocap mocap text to speech processed 5 left to right right 3747
1 43 reduced reduced text to speech processed 8 right to left right 517
1 44 still mocap processed recording 7 left to right right 517
1 45 still reduced recording recording 2 right to left left 948
1 46 reduced still text to speech processed 4 right to left right 2671
1 47 reduced reduced text to speech processed 7 left to right right 1909
1 48 mocap mocap processed recording 8 right to left right 1196
1 49 reduced still processed processed 3 left to right right 2342
1 50 still reduced recording processed 8 right to left left 4079
1 51 reduced still recording recording 6 right to left right 848
1 52 mocap still recording processed 4 right to left right 2289
1 53 still still recording processed 4 right to left right 1875
1 54 mocap mocap processed text to speech 6 right to left left 2323
1 55 still mocap text to speech recording 7 left to right right 1411
1 56 mocap reduced processed text to speech 7 left to right left 616
1 57 mocap reduced processed processed 1 left to right right 1975
1 58 still reduced recording text to speech 6 right to left left 3267
1 59 mocap reduced text to speech recording 4 right to left right 517
1 60 still reduced text to speech processed 8 right to left right 1047
1 61 mocap still processed processed 3 left to right right 1794
1 62 still mocap text to speech text to speech 3 left to right left 1710
1 63 still mocap recording processed 6 right to left right 1958
1 64 still reduced processed recording 1 left to right left 1511
1 65 mocap reduced text to speech processed 2 right to left left 517
1 66 still reduced text to speech recording 1 left to right right 5123
1 67 mocap reduced recording text to speech 7 left to right left 699
1 68 still still processed recording 3 left to right left 3714
1 69 reduced reduced processed text to speech 8 right to left left 517
1 70 mocap still processed text to speech 2 right to left left 632
1 71 reduced mocap recording recording 8 right to left right 3036
1 72 still reduced text to speech text to speech 6 right to left right 2571
1 73 mocap still text to speech text to speech 2 right to left left 517
1 74 mocap mocap recording text to speech 6 right to left left 964
1 75 still still text to speech recording 3 left to right right 4626
1 76 reduced still text to speech text to speech 1 left to right left 782
1 77 reduced reduced processed recording 1 left to right right 1627
1 78 still mocap recording text to speech 4 right to left left 2241
1 79 still reduced recording processed 7 left to right left 1080
1 80 reduced reduced text to speech recording 1 left to right right 1080
1 81 mocap reduced recording recording 4 right to left right 2919
1 82 still reduced processed text to speech 6 right to left left 948
1 83 mocap mocap text to speech recording 7 left to right right 1130
1 84 mocap mocap processed recording 7 left to right left 1345
1 85 still reduced recording text to speech 5 left to right left 567
1 86 mocap mocap text to speech recording 8 right to left left 8286
1 87 still still text to speech recording 4 right to left right 2339
1 88 reduced mocap processed text to speech 3 left to right left 1047
1 89 still still text to speech processed 2 right to left right 1113
1 90 mocap mocap recording processed 7 left to right left 1394
1 91 still mocap recording recording 8 right to left right 815
1 92 mocap reduced processed recording 3 left to right right 2356
1 93 still reduced text to speech text to speech 5 left to right left 566
1 94 reduced reduced processed text to speech 7 left to right left 616
1 95 reduced reduced processed recording 2 right to left left 865
1 96 still reduced text to speech processed 7 left to right left 517
1 97 mocap still processed text to speech 1 left to right left 438
1 98 reduced still recording text to speech 2 right to left left 682
1 99 mocap reduced recording processed 1 left to right left 3548
1 100 reduced reduced text to speech recording 2 right to left right 4625
1 101 reduced still text to speech recording 5 left to right right 865
1 102 reduced still recording processed 3 left to right right 2554
1 103 still mocap recording processed 5 left to right right 1047
1 104 reduced still recording recording 5 left to right right 749
1 105 still still recording text to speech 1 left to right left 1080
1 106 reduced still recording text to speech 1 left to right left 2637
1 107 still mocap text to speech processed 5 left to right right 517
1 108 mocap still recording text to speech 1 left to right left 517
1 109 reduced mocap processed text to speech 4 right to left left 881
1 110 mocap still processed recording 6 right to left left 1528
1 111 mocap still recording processed 3 left to right right 1726
1 112 mocap mocap recording processed 8 right to left left 2389
1 113 mocap reduced processed processed 2 right to left left 517
1 114 still reduced processed processed 7 left to right left 517
1 115 reduced reduced recording text to speech 8 right to left left 517
1 116 reduced mocap recording text to speech 4 right to left left 848
1 117 reduced still recording processed 4 right to left right 2554
1 118 reduced reduced recording processed 1 left to right right 2770
1 119 still mocap processed text to speech 3 left to right left 733
1 120 reduced mocap text to speech processed 6 right to left right 800
1 121 mocap still text to speech recording 6 right to left right 1908
1 122 mocap reduced recording processed 2 right to left right 683
1 123 still still recording text to speech 2 right to left left 1776
1 124 reduced reduced recording processed 2 right to left right 1345
1 125 reduced mocap text to speech processed 5 left to right right 4643
1 126 reduced still text to speech recording 6 right to left right 1445
1 127 still still processed text to speech 2 right to left left 2091
1 128 reduced still processed text to speech 2 right to left right 517
1 129 mocap still recording recording 6 right to left right 4179
1 130 mocap still recording text to speech 2 right to left left 517
1 131 still reduced processed recording 2 right to left left 1577
1 132 reduced mocap text to speech text to speech 4 right to left right 517
1 133 still mocap recording recording 7 left to right right 583
1 134 mocap reduced processed recording 4 right to left left 1891
1 135 reduced still processed recording 5 left to right left 1859
1 136 reduced mocap recording recording 7 left to right right 3366
1 137 still still text to speech processed 1 left to right right 750
1 138 still mocap text to speech recording 8 right to left right 3235
1 139 still mocap processed recording 8 right to left left 1130
1 140 still still processed text to speech 1 left to right left 3879
1 141 still reduced recording recording 1 left to right right 6315
1 142 reduced still processed text to speech 1 left to right left 517
1 143 reduced reduced recording text to speech 7 left to right left 1709
2 0 still reduced processed text to speech 6 right to left left 2622
2 1 still reduced processed recording 2 right to left left 1477
2 2 mocap mocap processed text to speech 6 right to left left 1396
2 3 still mocap recording recording 8 right to left right 1229
2 4 still reduced recording text to speech 5 left to right left 1163
2 5 mocap reduced processed recording 3 left to right right 6249
2 6 still reduced processed recording 1 left to right right 1263
2 7 still reduced text to speech text to speech 5 left to right left 1362
2 8 mocap still recording recording 5 left to right left 699
2 9 still still text to speech processed 2 right to left right 1246
2 10 mocap still recording text to speech 1 left to right left 1842
2 11 mocap still text to speech processed 3 left to right left 931
2 12 reduced reduced text to speech processed 8 right to left right 799
2 13 mocap reduced recording processed 2 right to left left 1395
2 14 mocap mocap processed recording 8 right to left right 521
2 15 still still text to speech recording 3 left to right right 1362
2 16 reduced reduced recording text to speech 8 right to left left 517
2 17 mocap reduced processed text to speech 8 right to left left 11255
2 18 mocap still processed recording 6 right to left left 11190
2 19 mocap still text to speech text to speech 2 right to left right 1743
2 20 still reduced recording text to speech 6 right to left left 1013
2 21 still reduced recording recording 1 left to right right 517
2 22 reduced mocap processed text to speech 4 right to left left 881
2 23 mocap still recording processed 3 left to right left 2107
2 24 reduced mocap text to speech recording 8 right to left right 518
2 25 mocap reduced processed recording 4 right to left right 666
2 26 still still recording processed 3 left to right left 517
2 27 still reduced text to speech processed 8 right to left left 1147
2 28 reduced mocap recording text to speech 3 left to right left 517
2 29 reduced mocap processed recording 7 left to right right 1013
2 30 still reduced text to speech processed 7 left to right right 980
2 31 still mocap text to speech text to speech 3 left to right left 518
2 32 reduced mocap recording processed 6 right to left right 1047
2 33 mocap reduced processed processed 2 right to left left 882
2 34 still reduced recording processed 7 left to right left 2869
2 35 mocap mocap recording processed 7 left to right left 405
2 36 mocap mocap recording processed 8 right to left left 914
2 37 reduced still recording text to speech 1 left to right left 2935
2 38 still mocap text to speech text to speech 4 right to left right 882
2 39 mocap reduced text to speech text to speech 7 left to right right 2670
2 40 still still text to speech processed 1 left to right left 518
2 41 reduced still recording processed 4 right to left left 915
2 42 still mocap processed processed 6 right to left left 1030
2 43 still still processed text to speech 2 right to left left 931
2 44 still mocap processed text to speech 4 right to left left 1063
2 45 still mocap recording processed 5 left to right left 2190
2 46 mocap still text to speech processed 4 right to left right 1032
2 47 mocap mocap text to speech recording 7 left to right right 914
2 48 still mocap text to speech recording 8 right to left right 914
2 49 mocap mocap recording text to speech 6 right to left left 533
2 50 mocap reduced text to speech processed 2 right to left right 517
2 51 still mocap processed recording 7 left to right left 749
2 52 reduced still processed recording 5 left to right right 1229
2 53 still mocap processed text to speech 3 left to right left 1046
2 54 reduced still text to speech text to speech 2 right to left right 190
2 55 mocap reduced recording text to speech 8 right to left left 1114
2 56 reduced reduced processed recording 2 right to left left 517
2 57 mocap reduced text to speech processed 1 left to right left 1014
2 58 mocap reduced text to speech recording 4 right to left left 865
2 59 mocap still recording text to speech 2 right to left left 1014
2 60 still still recording text to speech 1 left to right left 1013
2 61 mocap still recording recording 6 right to left right 1279
2 62 mocap reduced text to speech text to speech 8 right to left left 617
2 63 mocap reduced processed text to speech 7 left to right left 650
2 64 reduced reduced recording text to speech 7 left to right left 1031
2 65 still mocap text to speech processed 6 right to left right 732
2 66 reduced reduced recording processed 2 right to left left 583
2 67 reduced mocap text to speech processed 5 left to right left 3250
2 68 reduced reduced text to speech processed 7 left to right right 517
2 69 reduced reduced text to speech recording 1 left to right right 699
2 70 reduced still processed text to speech 1 left to right right 964
2 71 still still processed text to speech 1 left to right left 517
2 72 still still processed recording 4 right to left right 2406
2 73 reduced mocap recording recording 8 right to left left 1080
2 74 reduced reduced processed text to speech 8 right to left left 948
2 75 reduced reduced processed recording 1 left to right right 517
2 76 reduced mocap recording processed 5 left to right left 517
2 77 mocap mocap text to speech processed 6 right to left left 1047
2 78 reduced mocap processed recording 8 right to left right 948
2 79 still mocap recording recording 7 left to right left 517
2 80 still reduced processed processed 8 right to left right 932
2 81 reduced reduced recording processed 1 left to right left 1097
2 82 reduced still processed recording 6 right to left right 517
2 83 mocap mocap text to speech processed 5 left to right right 616
2 84 reduced mocap recording text to speech 4 right to left left 699
2 85 still reduced text to speech recording 2 right to left right 520
2 86 still mocap recording text to speech 3 left to right left 4078
2 87 reduced still recording text to speech 2 right to left left 1187
2 88 reduced mocap recording recording 7 left to right left 999
2 89 mocap reduced recording text to speech 7 left to right left 517
2 90 still reduced text to speech recording 1 left to right right 1428
2 91 reduced mocap text to speech processed 6 right to left right 1014
2 92 reduced mocap processed processed 5 left to right right 848
2 93 still reduced processed processed 7 left to right left 1743
2 94 still reduced recording recording 2 right to left left 517
2 95 reduced still processed text to speech 2 right to left right 898
2 96 reduced mocap text to speech recording 7 left to right right 1196
2 97 mocap still text to speech text to speech 1 left to right left 1096
2 98 mocap mocap processed recording 7 left to right right 517
2 99 still reduced recording processed 8 right to left left 798
2 100 mocap still recording processed 4 right to left left 798
2 101 reduced mocap processed processed 6 right to left right 947
2 102 mocap reduced processed processed 1 left to right left 931
2 103 mocap mocap text to speech recording 8 right to left right 882
2 104 mocap reduced text to speech recording 3 left to right right 1677
2 105 still still text to speech recording 4 right to left right 1014
2 106 mocap still processed text to speech 1 left to right left 1080
2 107 mocap still text to speech recording 5 left to right right 617
2 108 still still processed recording 3 left to right right 964
2 109 reduced still text to speech recording 6 right to left right 1014
2 110 reduced still recording processed 3 left to right left 947
2 111 reduced mocap text to speech text to speech 4 right to left right 1262
2 112 reduced reduced text to speech recording 2 right to left right 518
2 113 mocap mocap recording text to speech 5 left to right left 981
2 114 reduced still recording recording 6 right to left right 1080
2 115 still mocap processed processed 5 left to right right 567
2 116 still still recording processed 4 right to left left 1709
2 117 reduced still text to speech processed 3 left to right left 715
2 118 reduced reduced processed text to speech 7 left to right left 11588
2 119 still mocap text to speech recording 7 left to right right 1909
2 120 reduced still text to speech processed 4 right to left right 931
2 121 reduced still text to speech text to speech 1 left to right right 1129
2 122 reduced still processed processed 4 right to left right 898
2 123 mocap reduced recording processed 1 left to right left 815
2 124 still reduced text to speech text to speech 6 right to left left 799
2 125 still mocap recording text to speech 4 right to left right 520
2 126 mocap still processed processed 4 right to left left 2704
2 127 mocap still processed recording 5 left to right right 848
2 128 mocap still processed text to speech 2 right to left left 1030
2 129 still mocap processed recording 8 right to left right 521
2 130 still mocap recording processed 6 right to left left 1030
2 131 mocap still text to speech recording 6 right to left right 518
2 132 reduced mocap text to speech text to speech 3 left to right left 517
2 133 still mocap text to speech processed 5 left to right right 600
2 134 mocap reduced recording recording 3 left to right left 981
2 135 mocap still processed processed 3 left to right left 1743
2 136 reduced mocap processed text to speech 3 left to right left 915
2 137 reduced still text to speech recording 5 left to right right 980
2 138 still reduced processed text to speech 5 left to right left 948
2 139 mocap mocap processed text to speech 5 left to right left 931
2 140 reduced still recording recording 5 left to right right 699
2 141 still still recording text to speech 2 right to left left 898
2 142 reduced still processed processed 3 left to right right 567
2 143 mocap reduced recording recording 4 right to left right 517
3 0 reduced mocap processed recording 8 right to left right 518
3 1 reduced mocap text to speech processed 5 left to right left 1643
3 2 mocap still text to speech processed 4 right to left left 1113
3 3 mocap mocap processed recording 7 left to right right 517
3 4 still mocap recording processed 6 right to left left 815
3 5 mocap mocap text to speech processed 5 left to right left 1245
3 6 still reduced recording recording 2 right to left right 5072
3 7 reduced mocap recording recording 7 left to right right 716
3 8 mocap mocap processed recording 8 right to left left 848
3 9 mocap mocap text to speech recording 7 left to right right 1014
3 10 still still recording processed 3 left to right left 518
3 11 reduced reduced processed text to speech 8 right to left left 11220
3 12 mocap reduced recording recording 3 left to right right 1180
3 13 mocap reduced text to sp eech processed 2 right to left right 1064
3 14 still reduced text to speech recording 2 right to left right 518
3 15 reduced reduced text to speech processed 7 left to right right 1229
3 16 reduced mocap processed processed 5 left to right right 716
3 17 reduced reduced processed recording 2 right to left right 1213
3 18 reduced still processed processed 3 left to right left 2920
3 19 still mocap text to sp eech text to speech 4 right to left right 2769
3 20 reduced mo cap processed text to speech 3 left to right left 1047
3 21 reduced reduced recording processed 1 left to right left 1098
3 22 still mocap text to sp eech processed 6 right to left right 1229
3 23 reduced reduced recording text to speech 7 left to right left 1146
3 24 still mocap recording text to speech 4 right to left right 1180
3 25 still mocap recording recording 8 right to left right 1064
3 26 mocap still recording recording 5 left to right left 1113
3 27 still mocap pro cessed recording 7 left to right right 914
3 28 mocap reduced recording processed 2 right to left left 981
3 29 reduced mo cap recording text to speech 4 right to left left 1064
3 30 mocap mo cap recording text to speech 6 right to left right 832
3 31 reduced mo cap processed processed 6 right to left left 1014
3 32 reduced still recording text to speech 1 left to right left 1097
3 33 mocap still text to speech text to speech 2 right to left left 1212
3 34 mocap still recording processed 4 right to left left 1213
3 35 still still recording text to sp eech 1 left to right left 3664
3 36 still still processed text to speech 2 right to left right 517
3 37 still still recording pro cessed 4 right to left left 1015
3 38 still mocap pro cessed text to sp eech 3 left to right right 1213
3 39 mocap still processed processed 4 right to left left 1081
3 40 mocap still recording text to speech 1 left to right left 1229
3 41 reduced reduced text to speech processed 8 right to left left 998
3 42 mocap still recording processed 3 left to right left 1263
3 43 still mocap text to sp eech recording 7 left to right right 1048
3 44 still still text to speech recording 3 left to right right 1047
3 45 still mocap recording processed 5 left to right right 517
3 46 reduced reduced text to speech recording 2 right to left right 881
3 47 still mocap pro cessed text to sp eech 4 right to left right 1163
3 48 mocap reduced pro cessed recording 4 right to left left 1130
3 49 mocap still text to speech processed 3 left to right left 1229
3 50 reduced reduced processed text to sp eech 7 left to right right 1312
3 51 reduced still processed text to speech 1 left to right left 1196
3 52 reduced reduced recording text to speech 8 right to left right 1843
3 53 reduced mo cap recording processed 5 left to right right 1230
3 54 still still text to speech recording 4 right to left right 1065
3 55 reduced mo cap text to speech recording 8 right to left right 981
3 56 still reduced processed recording 1 left to right right 997
3 57 still reduced recording text to speech 5 left to right right 1096
3 58 mocap reduced text to speech recording 4 right to left left 1676
3 59 reduced still recording processed 3 left to right left 1312
3 60 reduced still recording text to speech 2 right to left left 2886
3 61 reduced still text to speech text to speech 1 left to right left 2273
3 62 reduced still text to speech recording 6 right to left left 1047
3 63 reduced still text to speech processed 3 left to right right 1594
3 64 mocap still recording text to speech 2 right to left left 1065
3 65 mocap still text to speech text to speech 1 left to right left 1163
3 66 still reduced recording recording 1 left to right right 1047
3 67 mocap still processed text to speech 1 left to right left 964
3 68 still reduced processed processed 8 right to left right 1097
3 69 mocap reduced processed recording 3 left to right right 1247
3 70 still mocap text to speech text to speech 3 left to right right 964
3 71 reduced mocap processed recording 7 left to right right 898
3 72 reduced still text to speech recording 5 left to right left 1013
3 73 reduced reduced processed recording 1 left to right right 1461
3 74 still reduced recording processed 7 left to right right 1363
3 75 reduced mocap text to speech recording 7 left to right left 1065
3 76 still still text to speech processed 2 right to left right 1013
3 77 reduced mocap text to speech text to speech 3 left to right right 1096
3 78 still mocap text to speech processed 5 left to right right 931
3 79 still mocap text to speech recording 8 right to left right 1131
3 80 still still processed text to speech 1 left to right right 864
3 81 mocap reduced text to speech text to speech 7 left to right left 1046
3 82 still reduced text to speech text to speech 5 left to right right 1031
3 83 still reduced processed text to speech 6 right to left right 1113
3 84 reduced still recording processed 4 right to left left 1378
3 85 mocap mocap recording processed 7 left to right left 1379
3 86 still still text to speech processed 1 left to right left 981
3 87 mocap reduced recording text to speech 7 left to right left 865
3 88 still reduced processed processed 7 left to right right 815
3 89 mocap reduced recording recording 4 right to left right 1295
3 90 mocap mocap recording text to speech 5 left to right right 866
3 91 still reduced recording text to speech 6 right to left right 1212
3 92 mocap mocap processed text to speech 5 left to right right 848
3 93 still still recording text to speech 2 right to left right 1080
3 94 reduced mocap recording text to speech 3 left to right left 1411
3 95 reduced mocap processed text to speech 4 right to left left 1892
3 96 reduced mocap recording processed 6 right to left right 2521
3 97 still still processed recording 4 right to left right 765
3 98 mocap mocap text to speech recording 8 right to left left 998
3 99 mocap still text to speech recording 5 left to right left 568
3 100 reduced still recording recording 5 left to right left 1544
3 101 reduced still processed processed 4 right to left left 914
3 102 mocap still text to speech recording 6 right to left left 1097
3 103 mocap reduced processed processed 2 right to left right 898
3 104 still reduced recording processed 8 right to left right 1163
3 105 reduced still text to speech text to speech 2 right to left left 815
3 106 mocap still processed recording 6 right to left left 1279
3 107 still still processed recording 3 left to right right 865
3 108 reduced mocap recording recording 8 right to left left 517
3 109 reduced reduced recording processed 2 right to left left 1096
3 110 still reduced text to speech text to speech 6 right to left right 1196
3 111 reduced still processed recording 5 left to right left 832
3 112 mocap reduced recording processed 1 left to right right 1014
3 113 mocap mocap recording processed 8 right to left right 1014
3 114 mocap reduced processed text to speech 8 right to left left 914
3 115 reduced still processed recording 6 right to left left 981
3 116 still mocap processed recording 8 right to left right 931
3 117 mocap reduced text to speech text to speech 8 right to left left 1147
3 118 still mocap processed processed 5 left to right right 1610
3 119 mocap reduced recording text to speech 8 right to left left 8485
3 120 reduced reduced text to speech recording 1 left to right right 1560
3 121 still reduced processed text to speech 5 left to right right 898
3 122 still mocap recording recording 7 left to right right 1146
3 123 still mocap recording text to speech 3 left to right right 1262
3 124 reduced still recording recording 6 right to left left 1146
3 125 reduced still text to speech processed 4 right to left left 1246
3 126 mocap mocap processed text to speech 6 right to left right 997
3 127 mocap reduced text to speech recording 3 left to right right 898
3 128 still reduced text to speech processed 7 left to right right 997
3 129 mocap mocap text to speech processed 6 right to left left 832
3 130 still reduced text to speech processed 8 right to left right 1346
3 131 reduced still processed text to speech 2 right to left left 883
3 132 still reduced text to speech recording 1 left to right right 865
3 133 mocap still processed recording 5 left to right left 1047
3 134 mocap reduced processed processed 1 left to right right 848
3 135 mocap reduced processed text to speech 7 left to right left 865
3 136 reduced mocap text to speech processed 6 right to left right 799
3 137 mocap still recording recording 6 right to left left 798
3 138 mocap still processed processed 3 left to right left 848
3 139 still reduced processed recording 2 right to left right 1195
3 140 mocap reduced text to speech processed 1 left to right right 832
3 141 reduced mocap text to speech text to speech 4 right to left left 981
3 142 still mocap processed processed 6 right to left right 848
3 143 mocap still processed text to speech 2 right to left left 1114
4 0 still mocap recording processed 6 right to left right 1412
4 1 mocap reduced recording recording 3 left to right left 1478
4 2 reduced still text to speech text to speech 2 right to left right 3068
4 3 mocap still recording recording 5 left to right right 6497
4 4 reduced mocap recording processed 6 right to left right 518
4 5 mocap still processed processed 4 right to left right 1461
4 6 mocap still recording text to speech 2 right to left left 2191
4 7 still reduced processed text to speech 5 left to right left 1014
4 8 reduced still processed text to speech 1 left to right left 1080
4 9 mocap reduced processed processed 1 left to right left 3333
4 10 reduced reduced text to speech processed 8 right to left right 898
4 11 reduced mocap text to speech recording 7 left to right right 1113
4 12 reduced still text to speech processed 3 left to right right 1047
4 13 reduced still recording text to speech 1 left to right left 1279
4 14 still mocap text to speech text to speech 3 left to right left 1163
4 15 mocap still processed recording 5 left to right left 914
4 16 still reduced processed recording 1 left to right left 1113
4 17 reduced still processed processed 4 right to left right 1096
4 18 reduced mocap processed text to speech 4 right to left right 1180
4 19 still mocap recording recording 8 right to left right 1015
4 20 reduced mocap processed processed 5 left to right left 2588
4 21 still reduced processed processed 7 left to right right 1629
4 22 reduced still recording text to speech 2 right to left right 1014
4 23 mocap reduced processed processed 2 right to left left 1212
4 24 still reduced recording recording 1 left to right right 1229
4 25 mocap mocap processed text to speech 5 left to right left 1196
4 26 mocap reduced recording text to speech 8 right to left left 998
4 27 reduced reduced text to speech recording 2 right to left right 1213
4 28 still mocap processed processed 5 left to right left 2024
4 29 still still text to speech recording 4 right to left right 1014
4 30 mocap mocap processed recording 8 right to left right 1063
4 31 mocap still processed text to speech 1 left to right left 1031
4 32 mocap reduced processed text to speech 7 left to right left 882
4 33 mocap mocap text to speech recording 7 left to right right 948
4 34 mocap still text to speech text to speech 1 left to right left 4128
4 35 reduced reduced processed recording 2 right to left right 997
4 36 reduced reduced recording text to speech 8 right to left right 948
4 37 mocap still recording text to speech 1 left to right left 1015
4 38 still mocap text to speech processed 5 left to right right 964
4 39 still still text to speech processed 1 left to right right 947
4 40 reduced still processed text to speech 2 right to left left 1163
4 41 reduced still text to speech processed 4 right to left right 981
4 42 mocap mocap recording processed 8 right to left left 1627
4 43 still mocap text to speech text to speech 4 right to left right 948
4 44 still still processed text to speech 1 left to right left 1180
4 45 reduced still text to speech text to speech 1 left to right right 1031
4 46 mocap still text to speech recording 5 left to right right 1014
4 47 mocap reduced text to speech processed 2 right to left right 882
4 48 still mocap processed text to speech 4 right to left right 881
4 49 mocap still text to speech processed 3 left to right right 1180
4 50 still still processed recording 4 right to left right 5206
4 51 reduced reduced recording processed 2 right to left right 517
4 52 reduced mocap text to speech recording 8 right to left right 964
4 53 mocap still processed processed 3 left to right left 1213
4 54 reduced reduced processed text to speech 8 right to left left 881
4 55 still still processed text to speech 2 right to left left 1345
4 56 still mocap processed recording 8 right to left right 964
4 57 mocap reduced processed text to speech 8 right to left left 1047
4 58 still reduced text to speech recording 2 right to left right 1246
4 59 mocap still processed recording 6 right to left right 3102
4 60 reduced mocap recording text to speech 4 right to left right 997
4 61 reduced reduced recording processed 1 left to right left 1395
4 62 mocap reduced recording processed 1 left to right left 1229
4 63 still mocap processed text to speech 3 left to right left 1212
4 64 still still processed recording 3 left to right right 1428
4 65 still still text to speech recording 3 left to right right 964
4 66 mocap still processed text to speech 2 right to left left 1179
4 67 reduced mocap recording recording 8 right to left right 1229
4 68 still mocap recording recording 7 left to right left 2090
4 69 reduced still processed recording 6 right to left right 1096
4 70 mocap still recording recording 6 right to left right 1230
4 71 mocap mocap recording text to speech 5 left to right left 1047
4 72 mocap reduced recording text to speech 7 left to right left 864
4 73 mocap reduced processed recording 4 right to left left 1047
4 74 reduced mocap processed recording 8 right to left right 932
4 75 reduced still recording processed 3 left to right right 981
4 76 still mocap recording text to speech 4 right to left right 1626
4 77 reduced still text to speech recording 6 right to left left 4608
4 78 still reduced text to speech processed 7 left to right right 881
4 79 reduced still text to speech recording 5 left to right left 1179
4 80 reduced still recording recording 6 right to left left 914
4 81 still still recording text to speech 2 right to left left 947
4 82 reduced reduced processed text to speech 7 left to right left 881
4 83 reduced mocap recording recording 7 left to right right 4759
4 84 still mocap processed processed 6 right to left right 1081
4 85 reduced reduced recording text to speech 7 left to right left 1163
4 86 mocap still recording processed 3 left to right left 1709
4 87 still mocap processed recording 7 left to right right 948
4 88 still reduced recording text to speech 5 left to right left 948
4 89 reduced mocap recording processed 5 left to right left 1196
4 90 reduced mocap processed text to speech 3 left to right left 881
4 91 mocap mocap text to speech processed 6 right to left right 881
4 92 still mocap text to speech recording 8 right to left right 898
4 93 mocap still text to speech processed 4 right to left left 1245
4 94 mocap mocap text to speech recording 8 right to left right 881
4 95 mocap still text to speech text to speech 2 right to left left 981
4 96 still reduced recording processed 8 right to left right 881
4 97 still still recording text to speech 1 left to right left 881
4 98 reduced reduced processed recording 1 left to right left 2173
4 99 still reduced text to speech text to speech 5 left to right left 1212
4 100 reduced mocap text to speech processed 5 left to right right 916
4 101 reduced mocap text to speech processed 6 right to left right 998
4 102 still reduced recording text to speech 6 right to left left 914
4 103 mocap mocap text to speech processed 5 left to right right 898
4 104 still mocap recording processed 5 left to right right 897
4 105 mocap reduced recording processed 2 right to left right 981
4 106 still reduced processed processed 8 right to left right 882
4 107 still reduced text to speech processed 8 right to left right 1047
4 108 mocap still text to speech recording 6 right to left right 998
4 109 mocap mocap processed text to speech 6 right to left right 1941
4 110 mocap reduced processed recording 3 left to right left 914
4 111 still reduced recording recording 2 right to left right 947
4 112 still mocap text to speech recording 7 left to right right 964
4 113 still still text to speech processed 2 right to left right 1014
4 114 reduced reduced text to speech processed 7 left to right right 898
4 115 mocap reduced text to speech recording 4 right to left left 964
4 116 still mocap recording text to speech 3 left to right left 964
4 117 reduced mocap text to speech text to speech 3 left to right right 1859
4 118 reduced mocap recording text to speech 3 left to right right 897
4 119 reduced still recording processed 4 right to left right 1146
4 120 still reduced processed recording 2 right to left right 915
4 121 mocap mocap recording text to speech 6 right to left right 997
4 122 mocap reduced text to speech recording 3 left to right right 1163
4 123 mocap reduced text to speech text to speech 8 right to left left 999
4 124 still reduced text to speech text to speech 6 right to left right 898
4 125 still still recording processed 4 right to left right 1179
4 126 reduced still processed processed 3 left to right left 882
4 127 mocap reduced text to speech processed 1 left to right right 881
4 128 mocap still recording processed 4 right to left left 898
4 129 reduced reduced text to speech recording 1 left to right right 782
4 130 mocap mocap processed recording 7 left to right left 1080
4 131 still reduced processed text to speech 6 right to left left 948
4 132 mocap reduced recording recording 4 right to left right 964
4 133 reduced mocap text to speech text to speech 4 right to left right 1179
4 134 reduced still recording recording 5 left to right left 882
4 135 still still recording processed 3 left to right left 1328
4 136 reduced still processed recording 5 left to right left 881
4 137 still reduced text to speech recording 1 left to right right 881
4 138 reduced mocap processed recording 7 left to right left 1378
4 139 still reduced recording processed 7 left to right right 865
4 140 still mocap text to speech processed 6 right to left right 914
4 141 mocap mocap recording processed 7 left to right right 1065
4 142 mocap reduced text to speech text to speech 7 left to right left 964
4 143 reduced mocap processed processed 6 right to left right 799
5 0 reduced still text to speech recording 6 right to left right 518
5 1 still still recording text to speech 2 right to left left 1179
5 2 mocap mocap recording processed 8 right to left left 518
5 3 mocap still processed text to speech 1 left to right left 517
5 4 reduced still recording text to speech 1 left to right left 964
5 5 reduced mocap recording recording 8 right to left right 3267
5 6 reduced still recording recording 5 left to right left 517
5 7 mocap reduced text to speech processed 2 right to left right 517
5 8 mocap reduced recording recording 4 right to left left 666
5 9 reduced still recording recording 6 right to left right 521