© 2023 IEEE. This is the author’s version of the article that has been published in the proceedings of the IEEE Virtual Reality conference. The final version of this record is available at: 10.1109/TVCG.2023.3247099
Eating, Smelling, and Seeing: Investigating Multisensory Integration
and (In)congruent Stimuli while Eating in VR
Florian Weidner, Jana E. Maier, Wolfgang Broll
Fig. 1: Core components of the experiment with easy-to-reproduce and home-made smell samples (A & B), the low-cost Smell-O-
Spoon that delivers gustatory and olfactory cues (C), the virtual products participants saw in VR (D) and a user experiencing our
experiment (E).
Abstract— Integrating taste in AR/VR applications has various promising use cases — from social eating to the treatment of disorders.
Despite many successful AR/VR applications that alter the taste of beverages and food, the relationship between olfaction, gustation,
and vision during the process of multisensory integration (MSI) has not been fully explored yet. Thus, we present the results of
a study in which participants were confronted with congruent and incongruent visual and olfactory stimuli while eating a tasteless
food product in VR. We were interested (1) if participants integrate bi-modal congruent stimuli and (2) if vision guides MSI during
congruent/incongruent conditions. Our results contain three main findings: First, and surprisingly, participants were not always able to
detect congruent visual-olfactory stimuli when eating a portion of tasteless food. Second, when confronted with tri-modal incongruent
cues, a majority of participants did not rely on any of the presented cues when forced to identify what they eat; this includes vision
which has previously been shown to dominate MSI. Third, although research has shown that basic taste qualities like sweetness,
saltiness, or sourness can be influenced by congruent cues, doing so with more complex flavors (e.g., zucchini or carrot) proved to be
harder to achieve. We discuss our results in the context of multimodal integration, and within the domain of multisensory AR/VR. Our
results are a necessary building block for future human-food interaction in XR that relies on smell, taste, and vision and are foundational for applications such as affective AR/VR.
Index Terms—Virtual reality, gustatory interfaces, olfactory interfaces, multisensory interfaces
1 INTRODUCTION
Multisensory integration (MSI) is the process that combines the infor-
mation delivered by the sensory systems into a single percept. This
influences our behavior and experiences [53]. In general, MSI is more
straightforward when the sensory systems deliver stimuli that match
with respect to their identity or meaning. This is called semantic con-
gruency [50].
Relying on MSI, it has been shown that augmented reality (AR)
and virtual reality (VR) can be used to manipulate the perceived taste
of food and beverages by displaying congruent olfactory and visual
stimuli (c.f. Sect. 2). Including such olfactory but also additional
gustatory stimuli in AR/VR and non-immersive applications has shown
potential in, for example, treatment of obesity and eating disorders [37],
psychiatric conditions [44], in consumer behavior research [62], for the
sense of presence in VR [21, 64], in learning environments [23], when
sharing emotions via smell and taste [41], or when enhancing affective
qualities of applications [40].
Despite these benefits and the eagerness of prior research to investi-
gate if perception can be manipulated altogether, it is not sufficiently
explored how olfaction, vision, and gustation interact and influence
MSI. For example, it has been shown that the perception of sweetness
(e.g., Narumi et al. [33]) can be altered by additional congruent cues.
• Florian Weidner (florian.weidner@tu-ilmenau.de), Gerd Boettcher (gerd.boettcher@tu-ilmenau.de), Jana E. Maier (jana.maier@tu-ilmenau.de), and Wolfgang Broll (wolfgang.broll@tu-ilmenau.de) are with the Virtual Worlds and Digital Games Group, Technische Universität Ilmenau, Germany.
However, it is unclear how vision, olfaction, and gustation interplay
and influence MSI when trying to change perception beyond the basic
tastes of salty, sweet, bitter, sour, and umami. Further, while it has
been shown that vision dominates when participants are confronted
with competing visual and olfactory cues [29, 57], it is unclear how
a third stimulus — in our case, a tasteless food product — impacts
MSI. Thus, our objective is to further expand the understanding of MSI
in multisensory AR/VR applications by investigating the following
research questions:
RQ1: Do participants integrate congruent visual and olfactory stimuli into a single percept while eating a tasteless food?
RQ2: Are participants guided by their vision when forced to identify what they consume during visual-olfactory-gustatory incongruency?
To do this, we report on two pre-studies that we performed to find a tasteless and odorless food product and suitable odor samples. Based on
these results, we report on our main study and its three experiments
where participants experienced and rated pictures, odors, and a multi-
sensory VR environment. Our core contributions can be summarized
as follows:
• We present food and smell samples that can easily be reproduced and do not rely on expensive equipment.
• We present the design of our prototype “Smell-O-Spoon”, a device that can be used to alter smell in VR when eating mashes or soups.
• We report on the interplay of vision, olfaction, and gustation in VR and how (in)congruency influences perception.
By that, we add to the fundamental understanding of MSI — specifi-
cally, how humans react when two or more senses do not agree. We
also enlarge the body of literature on whether complex flavor objects
can be produced or influenced by artificial and virtual stimuli and try to
reproduce and verify prior results.
The remainder of this paper is structured as follows: In Sect. 2, we
present related work on human-food interaction in AR/VR and MSI.
Sect. 3 presents the pre-studies that derive a close-to-tasteless food
product and the appropriate smell samples. Next, we briefly introduce
the Smell-O-Spoon in Sect. 4. The design of the main study is outlined
in Sect. 5. Sect. 6 and Sect. 7 then present and discuss the results. We finish in Sect. 8 with a conclusion and an outlook.
2 BACKGROUND & RELATED WORK
2.1 Multisensory Integration (MSI)
The fusion of information delivered by various senses in a spatial and
temporal relationship is called MSI and is often researched by deliver-
ing cross-modal stimuli [52]. MSI is highly important as multisensory
perception has been shown to be stronger than uni-sensory percep-
tion [10] due to cross-modal summation. In addition to that, congruent
stimuli also improve speed and accuracy during perception [63]. In
general, MSI is influenced by the strength, spatial location, and timing
of the stimuli [7].
Considering gustation, olfaction, and vision, not only the quality
of the stimuli is important but also aspects like background color,
background noise, color of the plateware/glassware, or scene lighting
[51]. It has been shown that the flavor of an object can be changed inside
and outside of VR by presenting cross-modally corresponding stimuli.
For example, Sinding et al. [46] successfully changed the perceived
saltiness of a food item by adding a salty-congruent odor. Similarly,
Frank and Byram [13] showed that sweet-congruent odors can increase
perceived sweetness. Narumi et al. [34] added chocolate and tea flavors
(both sweet) to a cookie via various combinations of always congruent
visual and olfactory stimuli and changed the perceived flavor in up to
80% of the cases. While these and other applications have investigated
the interplay of congruent stimuli, there is still a need to fully explore
the delicate mechanisms during MSI as well as for the reproduction of
results, especially in more complex cases beyond the modification of isolated flavors as in the mentioned examples, which change basic tastes such as saltiness or sweetness.
Some authors argue that there exists only one perceptual system and that integrating information from different perceptual systems is therefore not necessary [55, 56]. In this understanding, the perceived information is already structured, but humans cannot interpret it. We discuss our results in the context of this theoretical framework in Sect. 7.4.
2.2 Food, gustation, olfaction, and vision (in VR)
The domain of human-food-interaction has a long history and covers
devices and approaches that facilitate and use smell, taste, electro-
stimulation on the tongue, haptics, and more when working with food
and user interfaces. We briefly present relevant work in the domain of
olfaction, gustation, and multimodal human-food interaction and put it
in relation to MSI and our work.
2.2.1 Olfaction
So far, research has shown that smell can be used for increasing pres-
ence [38], attention management [11], navigation [4], and also to con-
trol the feeling of satiation [24]. Thus, various methods for smell
displays have been presented (next to bulky and expensive commer-
cial olfactometers): Brooks et al. [4] presented a device for digital experiences that is clipped onto the nose and simulates smell via electro-stimulation. Dozio et al. [11] used a diffuser placed on top of a monitor to disperse essential oils toward the user and successfully guided attention. Smell-O-Vision [30] (not evaluated) uses a
similar approach and couples it with machine learning to emit essential
oils that match the video that participants are watching. While these
devices are either desktop- or body-mounted, Niedenthal et al. [35]
present a hand-held device that can be used to disperse various smells
according to the VR scene that participants experience. By that, they
enhance presence. Devices that simulate smell in VR are also already
commercially available (e.g., Feelreal [19] or OVR Ion [58]) and have
been used in research. However, they are often expensive, bulky, and
limited to the provided odor samples.
2.2.2 Gustation
Similar to olfactometers, gustometers are used in research, development,
and clinical applications. However, these devices are again bulky,
expensive, and hard to integrate with various VR and AR applications.
Thus, researchers investigated alternatives to create taste samples and
to integrate them into user interfaces (c.f. Vi et al. [59]).
Creating taste samples traditionally follows either a chemical or a
digital approach. The chemical approach uses chemical substitutes
with particular smells or tastes that participants lick or consume. For
example, ideal substitutes for sweet (glucose), sour (citric acid), bitter
(caffeine or quinine), salty (sodium chloride), and umami (monosodium
glutamate) have been identified [59]. This approach was applied by,
for example, Maynes-Aminzade [27], who injected jellybeans with the above-mentioned ingredients, served them to participants, and successfully improved memory retention. The digital approach [59] creates
different sensations of taste through electrical and thermal stimulation.
For example, Karunanayaka et al. [22] showed that temperature in-
creases sensitivity to certain sensations such as sweetness. Similarly,
Ranasinghe et al. [40] present a well-perceived device for electro-
stimulation. A third approach — using real food items in user studies —
has been used in human-computer interaction due to its relatively easy
use (e.g., Ranasinghe et al. [40] and Narumi et al. [34]). Here, people
use modified utensils, plateware, or liquid containers to consume the
food. However, no structured approach for generating such samples
has been derived yet. We also opt for this method because of its ease of use and present such an approach.
2.2.3 Vision
An often-used approach is to use augmented reality (AR) to modify the
visual appearance of food. For example, Narumi et al. [31] changed
the size of a food item or the color [33] using AR. By that, they could
successfully influence satiety and control nutritional intake. Nakano et
al. [28] did a similar experiment and changed the flavor of noodles by
overlaying machine-learning-generated images in AR. Similarly but in
VR, Ammann et al. [2] investigated how a simple change in the color
of a cake (yellow = lemon/sour; brown = chocolate/sweet) influences
the perception of participants: Results revealed that people had more
difficulties identifying the real flavor when the color was modified,
indicating a conflict in the process of MSI. We expand on this study by
examining the influence of smell in a similar experiment.
2.2.4 Multimodal
Multimodal approaches stimulate several senses via a plethora of ac-
tuators. Ranasinghe et al. enhanced flavor using electro-stimulation
on the tongue and color (Taste+, [42]); electric stimulation, smell, and
color (Vocktail, [43]), and electric stimulation, smell, color, and ther-
mal stimuli [40] by using glasses and spoons equipped with electrodes,
LEDs, fans, and Peltier elements: they present congruent stimuli to
successfully influence the process of MSI and to change the perceived
flavor of a beverage. They also highlight the need for such systems in applications like communicating smell and taste [41]. Narumi et al.
present MetaCookie [33] and MetaCookie+ [34], a multisensory device
that can change the taste of a cookie by overlaying an image and dis-
playing a different smell via an AR-HMD equipped with tubes and fans.
Similar to our study, they investigate if modulated vision and smell
affect perception by relying on the fact that artificial congruent stimuli
(vision and olfaction) can change the perception of the cookie by being
the dominant stimuli during MSI. However, the influence and interplay
of various combinations of congruent and especially incongruent cues
have not been explored yet.
Lin et al. [25] developed a tool called “TransFork”. As it was an
inspiration for our setup, we present it in more detail. The device
consists of a regular fork, a container with the olfactory stimulus, a fan,
a battery, and an AR marker. An AR-HMD tracks the marker and by
that, knows the position of the food item. Thus, the color of food can
be changed. The fan disperses the smell from the container toward the
user’s nose. We build upon this device with our Smell-O-Spoon (c.f.
Sect. 4).
The related work shows that there is ongoing research interest in un-
derstanding and formalizing how VR and AR can be used to influence
MSI. So far, it has been shown that inherent properties like saltiness
or sweetness can be modified by presenting quality-congruent odors.
It has also been shown that these properties can be influenced while
consuming a food or a beverage by applying one or a set of congruent
stimuli (e.g., MetaCookie+ or Vocktail). Besides trying to reproduce
these results in other settings and with other food items — a contribu-
tion in itself — we set out to investigate the relationship during MSI
when visual, olfactory, and gustatory stimuli are congruent but also
incongruent and when related to more complex flavor objects (e.g.,
zucchini to cucumber). By that, we hope to deepen the understanding
of MSI and pave the way for future multisensory user interfaces. Thus,
we present an experimental laboratory study where we investigate how
participants react when they eat a tasteless food product and are, at the
same time, presented with virtual visual and artificial olfactory stimuli.
3 PRE-STUDIES
In the following sections, we present the results of two pre-studies. In
the first one, we derived a food product with a neutral taste and without
odor. In the second one, we selected appropriate odor samples. The
pre-studies were performed in accordance with the ethical guidelines
of the host institution and the guidelines proposed in the declaration of
Helsinki.
3.1 Identifying a neutral food product
As we wanted to investigate vision and olfaction, we needed a product that — ideally — has no taste and no odor at all. At the same time, we wanted to be able to process the food into a mash. We selected a mash as
our gustatory stimulus as it is easy to process but is also a believable
product as various food mashes already exist (e.g., for children). To get
a neutral product, we tested a variety of food products regarding taste
and odor but also towards consistency when mashed. A pre-selection of
fruits and vegetables resulted in the inner part of raw zucchini, raw potatoes,
raw cucumber, and cooked tofu (inspired by Narumi et al. [34]).
For our pre-study, we invited participants to taste the products that
were served as a cold mash. Participants wore sleeping masks, so they
could not see the color of each mash and were told to focus on gustation
and olfaction. The experimenter placed a small portion of mash on the tip of a spoon and told the participants to start eating. When a participant
finished the trial, they were told to put down the mask and answer
the questionnaire. The mash of each product was served one after the
other, following the same order for each participant. After each trial,
participants were advised to drink water to clean the oral cavity [25].
3.1.1 Measures
The questionnaire asks for the name of the product, for neutrality (1
= “not neutral at all” to 5 = “very neutral”) and if participants are
confident in identifying the product (1 = “not able to identify at all” to
5 = “clear identification”). The next questions ask for familiarity (1
= “not familiar at all” to 5 = “very familiar”) and intensity (1 = “very
weak” to 5 = “very strong”). The questionnaire can be found in the
supplemental material [61].
3.1.2 Sample
We invited ten participants (6 female, 4 male, mean age 26.5 years, 21 –
36 years, SD = 4.4 years). None had food allergies or intolerances. All
gave informed consent, were informed about their rights, and were told
that they can stop the experiment at any time. We did not undertake any
specific measures for sample diversity. There was no compensation.
3.1.3 Results and Discussion
Fig. 2: Evaluation results presenting the mean and standard deviation per product (raw potato mash, raw zucchini mash, soy tofu mash, cucumber mash) [min = 1, max = 5].
Fig. 2 shows the results on neutrality (higher is better), identification (lower is better), familiarity (lower is better), and intensity (lower is better). Raw zucchini mash and boiled tofu mash were evaluated as very neutral, with zucchini reaching an average of 4.7 (SD = 0.46) compared to tofu with M = 4.0 (SD = 0.63). Zucchini mash received the lowest intensity rating (M = 1.3, SD = 0.64). Raw zucchini mash also received the lowest identification value (M = 2.1, SD = 1.3), and 7 out of 10 participants
could not identify the product at all. The other 3 had a vague idea but
could not confidently identify zucchini. Based on these results, we
chose a mash made out of the inner part of zucchinis (cooked but served
cold) as our close-to-neutral and close-to-odorless product.
3.2 Deriving odor samples
Having identified a food product with little-to-no identifiable taste, the next ingredient for our study was a set of odor samples of food items.
3.2.1 Initial odor selection and sample creation process
We tested artificial odors (essential oils) from two companies called
herrlan-shop.de [49] and aromakonzentrate.com [20]. We ordered the
following samples: apple, cherry, banana, tomato [49], pear, orange, radish, cucumber, carrot, and cabbage [20]. This selection was done
based on availability.
For each chemical odor, we created a matching natural odor [15,16].
Here, we took 45 grams of the most intense part — the peel [17] — and
mixed it with 15 ml of glycerin. The mixture was then conserved in
a hermetic preserving jar for two weeks. Every day, the jars were shaken for 20 seconds. After two weeks of preserving, the liquid was
separated from the peel and the glycerin had acquired the smell [16]
(c.f. A and B in Fig. 1).
Having a set of odors, an initial pre-selection by the authors was
necessary: the samples of apple, pear, and cabbage were excluded
because of a weak natural smell. The samples of radish and cherry
were excluded because of an aggressive and unpleasant smell.
3.2.2 Procedure
Participants arrived on site and were informed about their rights and
the purpose of the study. Next, participants were instructed to smell the
odor and to answer the questionnaire. The odors were banana, cucumber, tomato, carrot, and orange; for each one, we prepared a chemical as well as a natural version. For each odor, we applied 5 drops on a separate
piece of tissue. Participants experienced the odors in random order. We
ensured that the testing environment was well-aired to avoid lingering
odors.
3.2.3 Measures
We again asked for the name of the product, the perceived neutrality
of the smell, confidence in the identification of the smell, familiarity,
and intensity. Inclusion criteria are high values of identification and
medium to high-intensity ratings. A product that smells very strong is not ideal because the level of pleasantness decreases [6]. Moreover, the odors should reach high values in familiarity [6]. The questionnaire can be found in the supplemental material [61].
Fig. 3: Results of the smell sample evaluation (pleasantness, identification, familiarity, intensity) [min = 1, max = 5]. Selected items are labeled with “*”. Ideal product would have (5, 5, 5, 2.5–3.5).
3.2.4 Sample
The preliminary smell experiment was conducted with ten participants
(5 female, 5 male, M = 26.4 years, 21 – 36, SD = 4.03 years). All
reported no condition that impaired their sense of smell or taste and
no food allergies or intolerances. All gave informed consent, were
informed about their rights, and were told that they can stop the ex-
periment at any time. We did not undertake any specific measures for
sample diversity. There was no compensation.
3.2.5 Results and Discussion
Fig. 3 illustrates the results. Four products achieved high identification scores: chemical banana, natural carrot, chemical tomato, and natural cucumber. Chemical banana reached the highest mean value with 5.0 (SD = 0). Natural carrot scored M = 4.2 (SD = 0.56), natural cucumber was rated M = 4.2 (SD = 0.98), and chemical tomato was rated with M = 3.7 (SD = 0.781). Chemical banana as well as natural cucumber also scored high in pleasantness and familiarity and showed an above-average intensity. Chemical tomato was also rather familiar and intense but tended to be slightly unpleasant. The natural carrot was likewise easily identified, familiar, and intense, but less pleasant; its high identification value nevertheless led us to select it for the main study. Thus, the
final odors are chemical banana, natural carrot, natural cucumber, and
chemical tomato (labeled with “*” in Fig. 3).
4 SMELL-O-SPOON
We needed a device to display the smell while providing participants
with the possibility to eat the mash while wearing the VR headset.
Fig. 4 shows our Smell-O-Spoon, inspired by “TransFork” [25]. It
consists of a fan (A, 15mm, 5V, 9300rpm), a metal spring for the smell
samples (B), a common household spoon (C), a USB power supply
(D), and a marker for a motion tracking system (E) attached to a 3D-
printed extension. We selected a wired solution (0.25mm wire) to avoid
the heavy weight of a battery and avoid a non-uniform airflow due to
decreasing battery life. The metal spring as well as the marker are
attached via Velcro tape. We covered the metal spring with black, non-
reflective tape to avoid interference with the optical tracking system.
Similarly, we sanded down the spoon to minimize reflections. The fan
has a USB cable connection including a switch and a potentiometer.
The potentiometer was used to regulate the speed of the fan and was set to 500 Ω. To minimize the vibration caused by the fan, a piece of foam was added directly below the smell container (Fig. 4, bottom/right).
Fig. 4: The olfactory feedback device “Smell-O-Spoon”: (a) mini fan, (b) container for aromatic box, (c) main spoon, (d) USB power supply with on/off switch and potentiometer, (e) OptiTrack markers.
Fig. 5: Smell sample for the “Smell-O-Spoon” (2.5 cm × 0.5 cm).
An OptiTrack motion tracking system [36] tracked the real spoon’s
position.
Following the procedure of Vi et al. [59], a little smell pad was
prepared for each of the four odors. The pad consists of tissue infused
with five drops of the odor sample and is wrapped in tape. We used
tweezers to place the smell pad into the spring. The pads are easily
interchangeable and due to the tape, no liquid contaminates the Smell-
O-Spoon. Fig. 5 shows such a smell sample.
Together, the Smell-O-Spoon and the pad weigh 83 grams. For
comparison, the spoon alone weighs 23 grams.
5 MAIN STUDY
With the odor samples and the neutral food product ready, we describe
the main study in the following sections. Visual and olfactory stimuli
are banana, carrot, tomato, and cucumber. In addition to that, two
irritation products were included (visual only; mushroom and radish).
We included these products as we assumed that participants would
quickly match the smells and images. The irritation products were
supposed to disrupt any pattern. Fig. 1 (D) shows images of the products
as seen in VR.
Our main experiment had three phases:
1.
(online) Evaluation of screenshots of the products: participants
received a link to the online questionnaire.
2. (onsite) Evaluation of the four odors.
3. (onsite) Evaluation of the multisensory VR application.
In each phase, participants answered a series of questions. In gen-
eral, our questionnaires are inspired by prior research from Chen et
al. [5], Chrea et al. [8] and Chifala and Polzella [6] (as indicated in
the appendix). Chen et al. (2020) researched the influence of visual-olfactory (in)congruency on sweetness in VR. Chrea et al. (2003; cited 135 times according to Google Scholar, mostly in works on odor perception and multimedia) researched the impact of culture on odor perception and categorization. We adopted their questions about familiarity, pleasantness, and intensity.
Fig. 6: Third-person view of the virtual environment. It is a recreation
of the local laboratory.
Chifala and Polzella investigated the contribution of odor to flavor per-
ception. We adopted their grid-based rating system for sweet/sour and intense/mild as well (1995; cited by 12 works according to Google Scholar, none of which are in the scope of this paper). We had to modify their questions to evaluate the specific aspects of our tri-modal setting that also includes gustation. None of these questionnaires has been validated. We acknowledge this limitation but still believe that the questions are appropriate to answer our research questions.
5.1 Phase 1: Picture evaluation
5.1.1 Procedure
After they registered for the experiment, participants received a link
to the online questionnaire to evaluate the screenshots. It contained the informed consent form and the purpose of the study, and asked for demographic information (age, gender) and VR familiarity. The
screenshots were presented in random order.
5.1.2 Measures
For each image, we asked for identification of the product, how much participants liked it (1 = “not at all” to 5 = “like it a lot”), and for pleasantness (1 = “not pleasant at all” to 5 = “very pleasant”). We also asked for familiarity (1 = “not at all familiar” to 5 = “very familiar”) and intensity (1 = “very weak” to 5 = “very strong”).
5.2 Phase 2: Smell evaluation
5.2.1 Procedure
The second phase took place onsite in the laboratory and included an
evaluation of the olfactory impulse. The experimenter handed the smell
samples to the participant in randomized order.
5.2.2 Measures
For each sample, participants answered a questionnaire. First, they had
to identify the odor. Next, they answered the same questions as in phase
1 about how much they like this product, its pleasantness, familiarity,
and intensity. We use the values from phases 1 and 2 to determine if
the perception of the products changes when congruent or incongruent
stimuli are presented in phase 3.
5.3
Phase 3: Evaluation of congruent and incongruent
pairs of olfactory and visual stimuli in VR
5.3.1 Procedure
When participants had finished phase 2, they were guided to the VR
lab. Here, they put on the HTC Vive Pro Wireless, standing in the door
frame. The VR scene showed a replication of the actual laboratory and
static objects provided passive haptic feedback (e.g., desks, computers;
c.f. Fig. 6). Next, participants received training in handling the spoon
so that they could confidently put it in their mouth and back on the
table (without food). As soon as they felt comfortable, the experiment
started. For each trial, the food was visualized as a whole product, sliced, and as a mash to present salient visual cues. At the beginning of each trial,
the experimenter inserted a smell pad into the Smell-O-Spoon, scooped
a portion of the neutral product on the spoon, and laid it in the bowl.
Condition | Trials per participant | Participants | Trials per product | Total trials
Bi-modal congruent | 4 | 30 | 1 | 120
Tri-modal incongruent | 4 | 30 | 3 | 360
Irritation products | 4 | 30 | 1 | 120
Table 1: Number of trials per condition participants experienced.
Then the participant was asked to pick it up and eat the product while
smelling the odor from the pad and seeing the product in VR.
Each participant performed 20 trials in a counterbalanced order.
Each trial represents a combination of the olfactory and visual stimuli banana, carrot, tomato, and cucumber (4 × 4 = 16). In addition to
that, 2 trials with mushrooms and 2 trials with radish were added as
irritation products (without any artificial olfactory stimulus). Note,
participants always ate the raw zucchini mash but saw and smelled the
selected products. We split up the 16 + 2 + 2 = 20 combinations into
4 sub-groups. Each group contained 3 incongruent pairs of olfactory
and visual stimuli, one irritation product, and one where olfaction and
vision are congruent. Thus, each group contains three unmatched trials,
one matched trial, and one irritation product. Overall, we ensured that
every participant tried every combination of the olfactory and visual
stimuli under observation (16 in total).
Table 1 provides an overview of the total trials in the VR condition. 30 × 4 = 120 trials were bi-modal congruent trials where visual and olfactory stimuli displayed the same product (30 participants, 4 trials per participant, 1 product) and the gustatory stimulus was incongruent. In 30 × 4 × 3 = 360 trials, all three stimuli were incongruent (tri-modal incongruency, 30 participants, 4 trials per product, 3 products). In addition, 30 × 2 × 2 = 120 trials used the irritation products (2 × 30 = 60 for radish and 2 × 30 = 60 for mushroom). We added them to disrupt any patterns. Thus, each participant had to do 20 trials; 4 of those were irritation products, 4 were congruent (each product once), and 12 were incongruent.
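To make this trial structure concrete, the following minimal Python sketch (our own illustration, not the original study software; only the product names are taken from the paper) builds such a 20-trial list for one participant.

import itertools
import random

PRODUCTS = ["banana", "carrot", "tomato", "cucumber"]
IRRITATION = ["mushroom", "radish"]  # visual only, no artificial odor


def build_trials(seed):
    """Return a shuffled 20-trial list for one participant: 16 visual x olfactory
    combinations (4 congruent, 12 incongruent) plus 2 x 2 irritation trials.
    The gustatory stimulus is always the neutral zucchini mash."""
    rng = random.Random(seed)
    trials = [{"visual": v, "olfactory": o, "gustatory": "zucchini mash"}
              for v, o in itertools.product(PRODUCTS, PRODUCTS)]
    trials += [{"visual": v, "olfactory": None, "gustatory": "zucchini mash"}
               for v in IRRITATION for _ in range(2)]
    rng.shuffle(trials)  # the study used a counterbalanced order; shuffling is a simplification
    return trials


for trial in build_trials(seed=1):
    print(trial)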
Between each trial, participants drank pure, non-sparkling water to
clean the oral cavity [25]. The laboratory was ventilated with a constant
stream of fresh air.
5.3.2 Measures
Following each trial, participants answered a questionnaire. We again
asked for the name of the product they thought they ate, pleasantness
(1 to 5), and intensity (1 to 5). In addition to that, we asked participants
if what they saw and what they smelled differed (yes/no).
5.3.3 Materials
The VR environment was built using Unity 2020.3. The Smell-O-Spoon
was tracked using Motive 2.1. A view from a third-person perspective
is illustrated in Fig. 1 (E). More information on the VR application is
available upon request.
5.4 Sample
A total of 30 participants (15 female, 15 male, mean age 28.83 years,
max. 61 years, min 21 years, SD = 8.79 years) were recruited via a mail-
ing list of the local university and via Facebook. The selection criteria
were as follows: healthy individuals with normal vision and olfaction
and no interfering history of neurological or psychiatric disorders, no
synaesthesia, and no food intolerances or allergies regarding fruits and vegetables. Participants were informed about the risks of bodily reactions
and that the food products could trigger adverse effects. Besides that, a
contact tracing form due to the COVID-19 situation had to be signed by
the participants. All gave informed consent, were informed about their
rights, and were told that they can stop the experiment at any time. The
study was executed following the guidelines of the local university, the
national research organization, and the declaration of Helsinki. We did
not undertake any specific measures for sample diversity. The experiment was scheduled for about 45 minutes per participant. There was no compensation.
Influence | What I see and what I smell ...is the same | ...is not the same
Congruent trials | 11.0% (12.5%; -1.50%) | 14.0% (12.5%; +1.50%)
Incongruent trials | 15.4% (37.5%; -22.1%) | 59.6% (37.5%; +22.1%)
Table 2: Percentages of people who thought they experienced congruent or incongruent visual-olfactory stimuli grouped by trials. The number in brackets is the chance at random and the deviation of our results from random.
6 RESULTS
Participants experienced congruent or incongruent visual, gustatory,
and olfactory stimuli in VR. The visual stimuli were banana, cucumber,
tomato, carrot, radish, and mushroom (the latter two were irritation
products). Each olfactory and visual stimulus was sampled four times
per participant. For the irritation products, no smell at all was presented.
The olfactory stimuli were banana, cucumber, tomato, and carrot. The gustatory stimulus was always a cold-served zucchini mash. The
olfactory and neutral gustatory stimuli were derived in pre-studies (c.f.
Sect. 3).
6.1
RQ1: Ability to integrate congruent visual and olfactory
stimuli while eating tasteless food
We split the analysis into two parts: First, we present data about whether
participants were able to form a unified percept out of the bi-modal
congruent visual-olfactory stimuli (“Does what you smell and see
differ/is the same?”). Second, we report on whether the bi-modal
congruent visual-olfactory stimuli together with gustation lead to a
percept that aligns with the visual-olfactory stimuli (“What do you
eat?”).
6.1.1 Identification of visual-olfactory congruency
In the VR environment, participants had to state if what they smelled
and what they saw represented the same stimulus with yes or no. Table 2
shows the percentages of people who said the stimuli are congruent
or incongruent. The table also contains the chance levels and the difference between chance and our results. Note, out of 480 total
trials, the number of bi-modal congruent trials (120) was lower than
the number of tri-modal incongruent trials (360).
A Chi-squared test revealed a difference in distributions between congruent and incongruent trials (χ²(1, N = 480) = 24.6, p < 0.0001, φ = 0.23). That means that participants answered differently in the tri-modal incongruent trials compared to the bi-modal congruent trials.
Responses on a perceived visual-olfactory congruency in bi-modal visual-olfactory congruent trials are not significantly different from chance (p > 0.3338), according to a binomial test. Thus, participants were — unexpectedly — close to chance (12.5% +/- 1.5%; 53 and 67 out of 480 in total and out of 120 bi-modal congruent trials). That indicates that participants were not able to integrate the visual and olfactory stimuli into a single percept while eating a tasteless food product: they did not detect matching olfactory and visual stimuli.
Results of a binomial test suggest that responses on perceived visual-olfactory congruency in tri-modal incongruent trials are significantly different from chance (p < 0.001). Thus, participants were — as expected — better than chance when detecting the incongruent stimuli, with 59.58% (286 out of 480; +22.1%). Still, 15.42% (74 out of 480) stated that what they saw was the same as what they smelled (22.1% less than chance).
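As an illustration of this analysis, the following Python sketch (our reconstruction, assuming SciPy is available; the counts are taken from Table 2 and the text, so exact p-values may differ slightly from the reported ones depending on the test variant) runs the chi-squared test on the 2×2 response table and the two binomial tests against the stated chance levels.

from scipy.stats import chi2_contingency, binomtest

# Rows: bi-modal congruent trials (120), tri-modal incongruent trials (360).
# Columns: answered "is the same", answered "is not the same".
observed = [[53, 67],
            [74, 286]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}, N=480) = {chi2:.1f}, p = {p:.2e}")  # approx. 24.6, p < 0.0001

# Binomial tests against the chance levels from Table 2
# (expressed as proportions of all 480 trials, as in the paper).
print(binomtest(53, n=480, p=0.125).pvalue)   # congruent & "same": not significant
print(binomtest(286, n=480, p=0.375).pvalue)  # incongruent & "not same": p < 0.001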
6.1.2 Identification of consumed food (gustation)
Overall and considering all trials, participants mentioned a large variety
of products. Fig. 7 illustrates these products.
Zucchini was only mentioned 7 out of 480 times, indicating that the
zucchini mash is a good neutral product whose lingering taste can be
hidden by olfactory and visual stimuli. Participants mentioned radish and mushroom 21 and 23 times, respectively. However, radish was mentioned only once while the visual cue also showed radish. The remaining times
Fig. 7: Overview of the products participants thought they consumed grouped by tri-modal incongruency (blue, orange, gray) and visual-olfactory congruency (yellow, green, red). Legend: tri-modal incongruency (followed none; followed olfaction; followed vision) and bi-modal congruency (did not follow congruent stimuli; followed congruent stimuli; unrelated food). Cucumber, carrot, banana, and tomato (*) were visual and olfactory stimuli. Participants actually ate zucchini (+).
Product | N | Identified food is equal to visual-olfactory bi-modal congruent stimuli | % of N = 480
Cucumber | 30 | 20 | 4.17
Carrot | 30 | 17 | 3.54
Banana | 30 | 14 | 2.92
Tomato | 30 | 9 | 1.88
Table 3: Number of identifications of matched stimuli when considering only the 120 matched trials: participants named the product they saw and smelled (chance: 3.13%).
radish and mushroom were named occurred either during visual-olfactory congruency (where neither radish nor mushroom was displayed) or during tri-modal incongruency (where, again, neither mushroom nor radish was displayed). Note, both products acted as distractors to prevent participants from focusing on the four core products banana, tomato, carrot, and cucumber. In retrospect, given the large variety of foods that were named, these distractors would not have been necessary.
While participants mentioned cucumber, carrot, banana, and tomato
most often — regardless of congruent or incongruent condition — they
also mentioned other products like apple, melon, or cornflakes. Some
participants were not able to name any product and gave no answer at
all — even after being prompted repeatedly — or simply said vegetable
or fruit. This means that participants had trouble forming a unified
percept out of the multisensory stimuli. Note, this happened almost
equally often in tri-modal incongruency (n = 36; 10% out of 360 trials)
and in trials with visual-olfactory congruency (n = 10, 9.2% out of 120
trials). During tri-modal incongruency, many items were completely
unrelated to the stimuli we displayed to them (201 out of 360, blue in
Fig. 7). This means that participants mentioned a food that was neither
displayed by vision, olfaction, or gustation.
For 120 trials, vision and olfaction were congruent (bi-modal visual-
olfactory congruency; yellow, green, and red in Fig. 7). Table 3 shows
the number of correct identifications for the matched trials, grouped by
stimulus under investigation. Cucumber was identified most often (20
out of 30). Besides this, more than half of the matched influences of
carrot were identified as such (17 out of 30). Banana was mentioned in 46.7% of the 30 matched trials (14 times) and tomato was named 9 times. Out of
a total of 120 matched trials, participants identified 60 products as what
they saw and smelled, following the congruent visual and olfactory
stimuli. Assuming that participants do not integrate the multisensory
Influence | N | Stimulus aligned with the identified product during tri-modal incongruency | % of N = 480
Visual stimuli | 360 | 97 | 20.2
Olfactory stimuli | 360 | 54 | 11.3
Neither | 360 | 209 | 43.5
Table 4: Number of times participants followed vision or olfaction during incongruent trials. Chance to randomly follow vision or olfaction and not gustation or neither is 18.75% each.
cues and considering the 480 trials, a 25.0% chance to get a congruent
trial, and a 50% chance to mention either vision or olfaction (and not
zucchini or a random product), and the four products to guess, the
chance to guess the product at random is 3.13%.
Results of a binomial test suggest that the number of times participants followed the bi-modal congruent visual-olfactory cues was not significantly different from chance (p > 0.1467). Thus, our results are very close to chance, with cucumber (4.17%) and carrot (3.54%) being slightly above chance whereas banana (2.92%) and tomato (1.88%) are below chance. This means that, while some participants named the product they saw and smelled, overall, the bi-modal visual-olfactory congruent stimuli did not influence multisensory perception while eating our neutral food product. Rather, results are close to chance.
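For transparency, this chance level follows directly from the stated assumptions (our reconstruction of the arithmetic): P(guess the congruent product at random) = 0.25 (congruent trial) × 0.5 (report vision or olfaction) × 0.25 (one of four products) = 0.03125 ≈ 3.13%.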
6.2
RQ2: Guiding sensation during tri-modal incongruency
Table 4 illustrates the proportions of which stimulus aligned with the product identification of participants when presented with tri-modal incongruent stimuli. Here, 97 (26.9% of 360) named the visual stimulus when asked to identify the product. 54 (15.0% of 360) identified the smell. 209 times, participants named a completely different product. This means that 41.9% of 360 named a product that was represented by either the visual or the olfactory stimulus. 58.1%, on the other hand, mentioned a completely different product (or no product at all; c.f. Fig. 7). Assuming that participants do not integrate the multisensory cues and considering the 480 trials, a 75% chance to get an incongruent trial, and a 25% chance to mention either vision or olfaction (and not zucchini or a random product), the chances for following vision and olfaction are 18.75% each, whereas the chance for another product is 35%. According to binomial tests, participants did not mention the product they saw more or less often than chance (p = 0.413). However, they mentioned what they smelled less often than chance (p < 0.001) and mentioned other products more often than chance (p < 0.001). Thus, fewer people than chance predicts followed olfaction (-7.5%). More people than chance predicts mentioned a completely different product (+6.04%).
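Analogously (our reconstruction of the stated assumptions): P(follow vision) = P(follow olfaction) = 0.75 (incongruent trial) × 0.25 (report vision or olfaction, respectively) = 0.1875 = 18.75%.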
6.3 Intensity and Pleasantness
Fig. 8 shows the data on pleasantness and intensity grouped by the
three phases. In phase 1, participants were asked about how intense
and pleasant an image of the product was, in phase 2 how intense
and pleasant the smell was, and in phase 3 they answered the same
questions about the product they thought they consumed.
A repeated measures ANOVA for pleasantness suggests significant differences between the three phases (picture, odor, VR; F(2, 717) = 27.66, p < .001). We calculated a post-hoc Tukey-Kramer test which revealed differences between the picture evaluation and the odor evaluation (p < 0.001, 95% C.I. = [0.53; 1.07]) as well as the picture evaluation and the VR evaluation (p < 0.001, 95% C.I. = [-0.80; -0.38]). That means that participants — overall — gave higher pleasantness ratings when they just saw the image.
Similarly, we analyzed intensity by phases. A repeated measures ANOVA indicated significant differences (F(2, 717) = 17.58, p < .001) between the three phases (picture, odor, VR). A post-hoc Tukey-Kramer test indicates differences between the VR evaluation and the odor evaluation (p < 0.001, 95% C.I. = [-0.67; -0.20]) as well as the VR evaluation and the picture evaluation (p < 0.001, 95% C.I. = [-0.70; -0.23]). This means that participants rated the VR experience as, on average, less intense.
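For illustration, a minimal Python sketch of such an analysis (our assumption; the paper does not state the analysis software). It approximates the reported repeated-measures ANOVA with a one-way ANOVA over per-trial ratings and uses Tukey's HSD for the post-hoc comparisons; the ratings below are random placeholders, and a full repeated-measures model would additionally require participant IDs (e.g., statsmodels' AnovaRM).

import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder pleasantness ratings (1-5), one value per trial and phase.
rng = np.random.default_rng(0)
picture = rng.integers(1, 6, size=240).astype(float)
odor = rng.integers(1, 6, size=240).astype(float)
vr = rng.integers(1, 6, size=240).astype(float)

# Omnibus test across the three phases.
f_stat, p_value = f_oneway(picture, odor, vr)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Post-hoc pairwise comparisons (Tukey's HSD).
ratings = np.concatenate([picture, odor, vr])
phases = ["picture"] * 240 + ["odor"] * 240 + ["vr"] * 240
print(pairwise_tukeyhsd(endog=ratings, groups=phases, alpha=0.05))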
Fig. 8: Mean values of intensity and pleasantness according to the three phases (picture, odor, VR) with standard deviation (c.f. Sect. 6.3). Note, these values do not come from the pre-studies but from the main study.
For both intensity and pleasantness in the VR trial, split by type of trial (bi-modal congruent vs. tri-modal incongruent), there were no significant differences in group means as indicated by a Student's t-test (t(478) < -0.8751, p > .362), with M_congruent = 2.80 (SD = 0.91) and M_incongruent = 2.71 (SD = 0.96) for intensity and M_congruent = 3.23 (SD = 0.82) and M_incongruent = 3.16 (SD = 0.77) for pleasantness.
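The corresponding comparison of congruent and incongruent VR trials can be sketched analogously (again with placeholder ratings; note that 120 + 360 trials give the 478 degrees of freedom reported above).

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
congruent = rng.integers(1, 6, size=120).astype(float)    # placeholder intensity ratings
incongruent = rng.integers(1, 6, size=360).astype(float)  # placeholder intensity ratings

t_stat, p_value = ttest_ind(congruent, incongruent)  # df = 120 + 360 - 2 = 478
print(f"t(478) = {t_stat:.3f}, p = {p_value:.3f}")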
7 DISCUSSION
7.1 Intensity and Pleasantness
We measured the intensity and pleasantness of the three phases (picture,
odor, VR) to control for the general attitude of participants towards the
product.
Overall, pleasantness ratings were average (M = 3.3) on a scale
from 1–5. However, there are significant differences between the
picture phase (phase 1) and phases 2 and 3. A possible explanation
for this could be that, when humans see a product, they imagine the
best taste [60]. It could also indicate that the selected odors were not
perceived as pleasant and impaired the ratings. The latter assumption is
backed by Fig. 3 where the pleasantness ratings of the products range
from M = 2.1 (SD = 1.37) for tomato to only M = 3.6 (SD = 0.97)
for cucumber. In addition to that, in the pre-studies, the scents were
compared with ones that did not end up in the main study, but which
were rated as less pleasant, thereby inflating the ratings of the ones that
were, comparatively, better.
Regarding intensity, the VR condition was rated less intense com-
pared to odor and picture evaluation. We believe that the cross-modal
mismatch led to the low-intensity ratings because participants could
not form a common percept and attribute intensity to it. They rather
perceived the incongruent cues and found none of them particularly
intense (c.f. Dalton et al. [10]). Thus, our results suggest that when
intensity is relevant in VR-based human-food-interaction, cross-modal
congruency is a factor to consider.
7.2 RQ1: MSI during bi-modal congruency
RQ1: Do participants integrate congruent visual and olfactory stimuli
into a single percept while eating a tasteless food?
RQ1 asks about the ability of people to integrate congruent visual-olfactory stimuli while experiencing gustatory incongruency. We expected that a majority of participants would detect the tri-modal incongruent and also the bi-modal congruent stimuli. To answer this question, we asked participants if what they saw and what they smelled was the same and to name the product they ate.
As expected, significantly more participants than chance predicts
detected the tri-modal incongruency (see Sect. 6.1.1). Still, in 74
incongruent trials, participants thought they sensed matching stimuli
— thus seemingly integrating the three incongruent cues. A closer
look at the data did not reveal a specific product combination among
those 74 trials. This rather large number motivates further studies on
what led to their answers. We suggest adding open questions or a semi-
structured interview to the experimental procedure to further investigate
participants’ reasoning and decision processes. Grounding the research
within an alternative theoretical framework might also explain these
results (c.f. Sect. 7.4).
Contrary to our expectations, the number of identified visual-
olfactory congruent stimuli is not significantly different than chance
predicts (see Sect. 6.1.1). These results suggest that people were not
able to integrate the bi-modal congruent cues (vision and olfaction)
while eating a neutral food product into a single percept. One reason
for this could be that the texture of the real food product as well as the
retronasal smelling lead to a sensory conflict and hampered MSI. The
former means that, while we presented a mash, the mouthfeel of the
zucchini mash was different from when consuming a mash depicted
by our bi-modal congruent cues (banana, tomato, carrot, cucumber).
The latter would mean that a light lingering smell of the zucchini mash
— although not identifiable (c.f. Sect. 3) — creates a sensation that is
perceived by the olfactory system through the oral cavity. In addition
to that and despite the encouraging results of the pre-study where we
selected the smell samples (c.f. Fig. 3), we cannot rule out that the
visual and olfactory stimuli were simply perceived as different by par-
ticipants. In summary, most participants did not integrate tri-modal
incongruent stimuli (as expected) and had problems integrating the
congruent visual-olfactory stimuli in a unified percept (unexpected).
We further expected that MSI in the bi-modal congruent stimuli
would lead to participants identifying the consumed food more often
as what they saw and smelled (as it has been shown for AR with a
cookie [33] and for beverages [43]).
In 60 out of 120 trials participants named the product that they saw
and smelled (c.f. Table 3). Interestingly, this includes 7 identifications
where participants did not think that what they saw and what they
smelled was the same — but they still answered correctly. The other 60
trials did not lead to identifications that followed the congruent cues.
Overall, the results were not significantly different from chance and stand in opposition to other approaches such as MetaCookie+ [32], Ammann
et al. [2], and Vocktail [43]. It could be that changing an inherent
quality of a food (neutral to sweet or chocolate [32] or neutral to sour
or citrus [2]) is more promising than changing the food category (e.g.,
zucchini to tomato) as the difference between the perceived cues and
the actual tasted food is too large. This is backed by the assimilation-
contrast theory [48] which warns of too large discrepancies during MSI.
It could be that our discrepancies were too large. These specific aspects
need to be explored by further psychophysiological and perceptual user
studies. For example, by investigating if combined effects from Narumi
et al. [32] and Ammann et al. [2] can be reproduced with mashes (e.g.,
change sweetness and sourness).
7.3
RQ2: Guided by vision during tri-modal incongruency?
RQ2: Are participants guided by their vision when forced to identify
what they consume during visual-olfactory-gustatory incongruency?
To answer RQ2, we analyzed the 360 incongruent trials and tested if
more people than chance predicted named the product they saw.
Considering only the 360 tri-modal incongruent trials, the identifica-
tion of products revealed that 97 of 360 times, participants named the
product they saw, regardless of what they ate and what it smelled like.
Similarly, 54 out of 360 times, they mentioned the smell and ignored
what they tasted and saw. We expected that the visual stimuli would be more dominant than olfaction when identifying incongruent stimuli, as vision is the dominant sense of humans [14, 18].
However, our results do not support this dominance during tri-modal
incongruency. While participants mentioned the smell significantly
less often than chance, they mentioned what they saw not significantly
more or less often than chance. This is indeed different from previous
results observed in bi-modal experiments (e.g., Nambu et al. [29] and
Tanikawa et al. [57]). We assume that the tri-modal incongruency
(but also the combination of mouth-feel and texture and latent/faint
retro-nasal smelling due to the residual smell of the zucchini) hampers
multisensory integration compared to previous bi-modal conditions.
Participants often mentioned other products that were not related to
any cue: in more than half of the incongruent trials (209 of 360), they
selected a completely different product (c.f. Fig. 7). Here, it seems
that participants were aware of the incongruent stimuli and were not
able to integrate them into a unified percept. Thus, they tried to find
a food product that matches all three sensations as closely as possible —
without relying on any of the presented stimuli. This, again, could be
related to the assimilation-contrast theory that states that too large of a
discrepancy reduces participants’ ability to form a percept [45, 47].
7.4
Our work in the context of ecological psychology and
the existence of only a single perceptual system
Stoffregen & Bardy [54, 56] argue that there are no different parallel per-
ceptual systems that perceive input from corresponding single ambient
energy arrays (e.g., the optical energy array). In this case, there would
be no need for imposing a structure on those inputs (a process related
to multisensory integration). In consequence, they argue, intersensory
conflicts do not happen or exist at all as information does not need to be
integrated. They postulate that perception in the organism-environment
system happens via higher-order information available in the global en-
ergy array. In this understanding, errors in perception and performance
do not imply a lack of specificity but rather the need for further percep-
tual differentiation (learning) to properly exploit the information in the
global energy array. In other words, there is no conflict between visual,
gustatory, and olfactory stimulation but rather the pattern described
by these three single energy arrays does not correspond to the human
experience and, thus, cannot be interpreted by our single perception
system (without further perceptual differentiation). This view offers
alternative explanations for our observations. For example, this would
explain why so many participants in our experiment mentioned seemingly completely unrelated items or had trouble settling on a single
item: the mapping between the higher-order information of the global
array and the perception system was simply not specified.
With our setup, we present an experimental design that allows for
the manipulation of individual parameters in the global array. In this
interpretation, we do not manipulate the individual senses with our
setup but rather expose the single perception system to a global energy array that is composed of our manipulated single-energy arrays.
Thus, we follow a method reciprocal to the one proposed by Fouque et al. [12], where we keep parts of the global array fixed and vary the
individual forms of energy (e.g., within the optical array by changing
the visual stimulus).
With that, a procedure similar to that of Mantel et al. [26] that objec-
tively quantifies the contribution of our individual stimuli towards the
global energy array might provide means to better interpret and under-
stand perception in tri-modal situations. Continuing this line of thought
and inspired by ecological psychology, this might result in equations
that describe the human’s perception process of the global array and
the parameter specifying “taste” within the organism-environment sys-
tem. To do this, additional measures can be applied. For example, a
scale from 0–100 matching the proportion of food the mash was made
out of to what you see/smell and/or a scale from 0–100 to judge the
strength of the flavor of the corresponding food could help to explore
this theoretical framework. By that, it is possible to test the influence
of the stimuli on the perceived strength of food’s flavor (whereas flavor
can be seen as a higher-order variable).
To our knowledge, the perception of a global array by a single per-
ception system has not yet been discussed within the context of AR/VR,
taste/flavor, and the human-food-interaction community. However, our
technical setup and our results can act as a basis for a complete set of
pairwise comparisons following Fouque et al. [12]. Here, further con-
ditions are necessary to allow for a full description of the comparison
space (i.e. adding tomato, carrot, banana, and cucumber mashes) to
describe the individual influence of single energy arrays on the struc-
ture in the global array and thus the higher-order information perceived by humans. Our work can then help to describe the organism-environment interaction within the context of taste perception, similar to Mantel et al. [26], and it thereby contributes to research that tries to understand the fundamental processes of human perception and the sensitivity to structures in the global array [54, 56].
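To make the size of this comparison space concrete, the following sketch (purely illustrative; the stimulus set is assumed from the mashes and smells discussed above) enumerates all tri-modal combinations and labels their congruency:

```python
from itertools import product

# Assumed stimulus set per modality; a follow-up study may use a different one.
foods = ["zucchini", "tomato", "carrot", "banana", "cucumber"]

conditions = []
for visual, olfactory, gustatory in product(foods, repeat=3):
    distinct = {visual, olfactory, gustatory}
    if len(distinct) == 1:
        congruency = "tri-modal congruent"
    elif len(distinct) == 3:
        congruency = "tri-modal incongruent"
    elif visual == olfactory:
        congruency = "visual-olfactory congruent"
    elif visual == gustatory:
        congruency = "visual-gustatory congruent"
    else:
        congruency = "olfactory-gustatory congruent"
    conditions.append((visual, olfactory, gustatory, congruency))

print(len(conditions), "combinations in total")  # 5^3 = 125
```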
7.5 Limitations
There are numerous individual factors that can affect the perception of
taste, and due to the scope of this research, we could only integrate a few.
For example, trigeminal sensations, mouthfeel, and also the individual
contribution of retro-nasal and ortho-nasal olfaction [50] are promising
factors for future research. Further, our participants formed an international sample with varying experience with food and food products, a factor that should be investigated further [39]. We also did not instruct participants not to wear scented deodorants, aftershaves, or perfume, so our
environment might have been contaminated despite the ventilation
system. In addition to that, our VR system had some limitations. The spoon always emitted the smell as soon as a sample had been inserted, and always at the same strength (the fan was always on). However, a more advanced but still low-fidelity version that uses the positions of the spoon and the headset to drive the fan speed via a microcontroller could better support the bottom-up processes of MSI, such as stimulus strength and synchronicity.
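A rough sketch of such an extension is given below; it was not part of our system, and the distance thresholds, the one-byte serial protocol, and the control scheme are assumptions. The headset-to-spoon distance is mapped to a fan duty cycle and sent to a microcontroller:

```python
import serial  # pyserial, assumed to talk to a hypothetical fan controller


def fan_duty_from_distance(distance_m: float, near_m: float = 0.05, far_m: float = 0.40) -> int:
    """Map the headset-to-spoon distance to a PWM duty cycle (0-255).

    Closer than near_m runs the fan at full speed, farther than far_m turns it
    off, and the range in between is interpolated linearly. The thresholds are
    illustrative and would need calibration.
    """
    if distance_m <= near_m:
        return 255
    if distance_m >= far_m:
        return 0
    return int(255 * (far_m - distance_m) / (far_m - near_m))


def update_fan(port: serial.Serial, headset_pos, spoon_pos) -> None:
    """Send a new duty cycle whenever fresh tracking data arrives."""
    distance = sum((h - s) ** 2 for h, s in zip(headset_pos, spoon_pos)) ** 0.5
    port.write(bytes([fan_duty_from_distance(distance)]))  # assumed 1-byte protocol
```

Called once per tracking frame, such a mapping would scale the perceived smell intensity with proximity and keep the olfactory cue roughly synchronous with the eating motion.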
A further limitation of our study is the fact that we only investi-
gated perception in tri-modal incongruency and visual-olfactory con-
gruency. In return, this means that we did not investigate conditions
with tri-modal congruency, gustatory-visual congruency, and gustatory-
olfactory congruency. However, our results still provide a meaningful contrast to prior work. For example, we show that an effect previously observed during bi-modal incongruency, the dominance of vision [29, 57], does not necessarily occur during tri-modal incongruency. We also show that adding a seemingly tasteless food item disturbs MSI when trying to change food categories with congruent olfactory-visual cues (in contrast to strengthening or weakening an inherent quality of a food item, which is possible [1, 43]). Also, investigating all five conditions would
have meant keeping participants in the lab for a long time and risking
habituation or adaptation due to a prolonged and repeated exposure to
the stimuli [3, 9]. Still, our interpretations of the results on detecting incongruence (Sect. 6.1) and on the guiding sensation (Sect. 6.2) are limited. For example, we cannot say whether participants would have wrongly detected an incongruency in a tri-modal congruent condition, a gustatory-visual congruency, or a gustatory-olfactory congruency (i.e., Sect. 6.1). We aimed to mitigate this problem with a rigorous selection procedure for our stimuli, which ensured that participants were able to identify each single stimulus independently.
8 CONCLUSION AND FUTURE WORK
This research investigated multisensory integration (MSI) by modify-
ing visual and olfactory sensations while eating real food in a virtual
environment. We let participants eat a zucchini mash — a neutral food
product — and presented them with congruent or incongruent visual
and olfactory stimuli in VR. We were interested in whether participants
can integrate the bi-modal visual-olfactory congruent cues into a single
percept that differs from zucchini mash and how their sensory system
reacts when vision, olfaction, and gustation are incongruent.
We first elicited a neutral food product (a cold mash of cooked zuc-
chini) and smell samples (natural cucumber, chemical banana, natural
carrot, and natural banana). We also presented the “Smell-O-Spoon”,
an easy-to-build device that allows eating in VR and can also display
smell. Results of our main study indicate that participants are not guided by vision during tri-modal incongruency (as they are during bi-modal incongruency). We further show that changing complex flavor objects (e.g., zucchini to banana) seems to be harder than changing an inherent property of a food item (e.g., increasing sweetness).
AR/VR systems that cater to multiple senses — besides vision and
audio and especially including gustation and olfaction — promise
to enhance aspects such as presence but also social applications in
AR/VR, social eating in AR/VR, and applications for the treatment
of disorders such as autism spectrum disorder or schizophrenia. To
realize these objectives and to build human-food user interfaces for
AR/VR, a deep and thorough understanding of MSI and perception is
necessary. Considering that, our tools (Smell-O-Spoon, smell samples,
neutral food) and findings (complex properties harder to change, vision
not a guiding sense during tri-modal incongruency) motivate further research in the domain of multisensory virtual environments to enable and support these beneficial applications.
Possible future research directions are manifold (beyond those mentioned in the limitations). Our work motivates future research on MSI, immersion, and perceived intensity to further explain our results: integrating real food mashes of carrot, banana, cucumber, and tomato would provide knowledge on how three congruent stimuli (gustation, vision, olfaction), but also other congruent combinations (e.g., only gustation and olfaction), influence MSI. In addition, to overcome the issue of participants being aware of incongruent stimuli, it might help to embed the experiment into a more playful or game-like setting that redirects the focus. For example, the whole experiment could take place in a restaurant setting.
ACKNOWLEDGMENTS
This work has been partially supported by the CYTEMEX project, funded by the Free State of Thuringia, Germany (FKZ: 2018-FGI-0019).
REFERENCES
[1] H. Aisala, J. Rantala, S. Vanhatalo, M. Nikinmaa, K. Pennanen, R. Raisamo, and N. Sözer. Augmentation of Perceived Sweetness in Sugar Reduced Cakes by Local Odor Display. In Companion Publication of the 2020 International Conference on Multimodal Interaction, ICMI '20 Companion, pp. 322–327. Association for Computing Machinery, New York, NY, USA, Oct. 2020. doi: 10.1145/3395035.3425650
[2] J. Ammann, M. Stucki, and M. Siegrist. True colours: Advantages and challenges of virtual reality in a sensory science experiment on the influence of colour on flavour identification. Food Quality and Preference, 86:103998, 2021-04-29, 2020-12-01. doi: 10.1016/j.foodqual.2020.103998
[3] K. Benson and H. A. Raynor. Occurrence of habituation during repeated food exposure via the olfactory and gustatory systems. Eating Behaviors, 15(2):331–333, Apr. 2014. doi: 10.1016/j.eatbeh.2014.01.007
[4] J. Brooks, S.-Y. Teng, J. Wen, R. Nith, J. Nishida, and P. Lopes. Stereo-Smell via Electrical Trigeminal Stimulation. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI '21, pp. 1–13. Association for Computing Machinery, New York, NY, USA, May 2021. doi: 10.1145/3411764.3445300
[5] Y. Chen, A. X. Huang, I. Faber, G. Makransky, and F. J. A. Perez-Cueto. Assessing the influence of visual-taste congruency on perceived sweetness and product liking in immersive VR. Foods (Basel, Switzerland), 9(4):465, 2021-04-29, 2020-04-09. doi: 10.3390/foods9040465
[6] W. Chifala and D. Polzella. Smell and taste classification of the same stimuli. The Journal of General Psychology, 122:287–294, Aug. 1995. doi: 10.1080/00221309.1995.9921240
[7] I. Choi, J.-Y. Lee, and S.-H. Lee. Bottom-up and top-down modulation of multisensory integration. Current Opinion in Neurobiology, 52:115–122, Oct. 2018. doi: 10.1016/j.conb.2018.05.002
[8] C. Chrea, D. Valentin, C. Sulmont-Rossé, H. L. Mai, D. H. Nguyen, and H. Abdi. Culture and odor categorization: Agreement between cultures depends upon the odors. Food Quality and Preference, 15(7):669–679, 2021-05-13, 2021. doi: 10.1016/j.foodqual.2003.10.005
[9] P. Dalton. Psychophysical and Behavioral Characteristics of Olfactory Adaptation. Chemical Senses, 25(4):487–492, Aug. 2000. doi: 10.1093/chemse/25.4.487
[10] P. Dalton, N. Doolittle, H. Nagata, and P. A. S. Breslin. The merging of the senses: Integration of subthreshold taste and smell. Nature Neuroscience, 3(5):431–432, May 2000. doi: 10.1038/74797
[11] N. Dozio, E. Maggioni, D. Pittera, A. Gallace, and M. Obrist. May I Smell Your Attention: Exploration of Smell and Sound for Visuospatial Attention in Virtual Reality. Frontiers in Psychology, 12:671470, July 2021. doi: 10.3389/fpsyg.2021.671470
[12] F. Fouque, B. G. Bardy, T. A. Stoffregen, and R. J. Bootsma. Action and Intermodal Information Influence the Perception of Orientation. Ecological Psychology, 11(1):1–43, Mar. 1999. doi: 10.1207/s15326969eco1101_1
[13] R. A. Frank and J. Byram. Taste-smell interactions are tastant and odorant dependent. Chemical Senses, 13(3):445–455, 1988. doi: 10.1093/chemse/13.3.445
[14] J. A. Gottfried and R. J. Dolan. The Nose Smells What the Eye Sees: Crossmodal Visual Facilitation of Human Olfactory Perception. Neuron, 39(2):375–386, July 2003. doi: 10.1016/S0896-6273(03)00392-1
[15] K. Hadidi and F. Mohammed. Nicotine content in tobacco used in hubble-bubble smoking. Saudi Medical Journal, 25:912–917, Aug. 2004.
[16] M. Herbs. How to make glycerine extracts. https://blog.mountainroseherbs.com/make-glycerin-extracts-glycerites, June 2021.
[17] T. P. Hilditch. Production and Uses of Glycerine. Nature, 172(4389):1066–1067, Dec. 1953. doi: 10.1038/1721066b0
[18] R. J. Hirst, L. Cragg, and H. A. Allen. Vision dominates audition in adults but not children: A meta-analysis of the Colavita effect. Neuroscience and Biobehavioral Reviews, 94:286–301, Nov. 2018. doi: 10.1016/j.neubiorev.2018.07.012
[19] F. Inc. FEELREAL Multisensory VR Mask. https://feelreal.com/, Feb. 2022.
[20] L. Jakubowski. 10 gr. Aroma Typ Radieschen. https://www.aromakonzentrate.com/50-gr-Aroma-Typ-Radieschen, May 2021.
[21] L. Jones Moore, C. Bowers, D. Washburn, A. Cortes, and R. Satya. The effect of olfaction on immersion into virtual environments. In Human Performance, Situation Awareness and Automation: Issues and Considerations for the 21st Century, pp. 282–285. Lawrence Erlbaum Associates, Jan. 2004.
[22] K. Karunanayaka, N. Johari, S. Hariri, H. Camelia, K. S. Bielawski, and A. D. Cheok. New Thermal Taste Actuation Technology for Future Multisensory Virtual Reality and Internet. IEEE Transactions on Visualization and Computer Graphics, 24(4):1496–1505, Apr. 2018. doi: 10.1109/TVCG.2018.2794073
[23] A. Klašnja-Milićević, Z. Marošan, M. Ivanović, N. Savić, and B. Vesin. The future of learning multisensory experiences: Visual, audio, smell and taste senses. In T. Di Mascio, P. Vittorini, R. Gennari, F. De la Prieta, S. Rodríguez, M. Temperini, R. Azambuja Silveira, E. Popescu, and L. Lancia, eds., Methodologies and Intelligent Systems for Technology Enhanced Learning, 8th International Conference, pp. 213–221. Springer International Publishing, Cham, 2019.
[24] B. J. Li and J. N. Bailenson. Exploring the influence of haptic and olfactory cues of a virtual donut on satiation and eating behavior. Presence: Teleoperators and Virtual Environments, 26(3):337–354, May 2018. doi: 10.1162/pres_a_00300
[25] Y.-L. Lin, T.-Y. Chou, Y.-C. Lieo, Y.-C. Huang, and P.-H. Han. TransFork: Using olfactory device for augmented tasting experience with video see-through head-mounted display. In Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology, VRST '18, pp. 1–2. Association for Computing Machinery, New York, NY, USA, 2021-04-29, 2018-11-28. doi: 10.1145/3281505.3281560
[26] B. Mantel, T. A. Stoffregen, A. Campbell, and B. G. Bardy. Exploratory Movement Generates Higher-Order Information That Is Sufficient for Accurate Perception of Scaled Egocentric Distance. PLOS ONE, 10(4):e0120025, Apr. 2015. doi: 10.1371/journal.pone.0120025
[27] D. Maynes-Aminzade. Edible bits: Seamless interfaces between people, data and food. In Proceedings of the 2005 ACM Conference on Human Factors in Computing Systems (CHI 2005), 2005.
[28] K. Nakano, D. Horita, N. Sakata, K. Kiyokawa, K. Yanai, and T. Narumi. DeepTaste: Augmented Reality Gustatory Manipulation with GAN-Based Real-Time Food-to-Food Translation. In 2019 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 212–223, Oct. 2019. doi: 10.1109/ISMAR.2019.000-1
[29] A. Nambu, T. Narumi, K. Nishimura, T. Tanikawa, and M. Hirose. Visual-olfactory display using olfactory sensory map. In 2010 IEEE Virtual Reality Conference (VR), pp. 39–42, Mar. 2010. doi: 10.1109/VR.2010.5444817
[30] P. Nandal. Smell-O-Vision Device. In V. Singh, V. K. Asari, S. Kumar, and R. B. Patel, eds., Computational Methods and Data Engineering, Advances in Intelligent Systems and Computing, pp. 321–329. Springer, Singapore, 2021. doi: 10.1007/978-981-15-7907-3_24
[31] T. Narumi, Y. Ban, T. Kajinami, T. Tanikawa, and M. Hirose. Augmented perception of satiety: Controlling food consumption by changing apparent size of food with augmented reality. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, pp. 109–118. Association for Computing Machinery, New York, NY, USA, May 2012. doi: 10.1145/2207676.2207693
[32] T. Narumi, T. Kajinami, S. Nishizaka, T. Tanikawa, and M. Hirose. Pseudo-gustatory display system based on cross-modal integration of vision, olfaction and gustation. In 2011 IEEE Virtual Reality Conference, pp. 127–130, Mar. 2011. doi: 10.1109/VR.2011.5759450
[33] T. Narumi, T. Kajinami, T. Tanikawa, and M. Hirose. Meta cookie. In ACM SIGGRAPH 2010 Emerging Technologies, SIGGRAPH '10, p. 1. Association for Computing Machinery, New York, NY, USA, July 2010. doi: 10.1145/1836821.1836839
[34] T. Narumi, S. Nishizaka, T. Kajinami, T. Tanikawa, and M. Hirose. Meta Cookie+: An illusion-based gustatory display. In R. Shumaker, ed., Virtual and Mixed Reality - New Trends, Lecture Notes in Computer Science, pp. 260–269. Springer, Berlin, Heidelberg, 2011. doi: 10.1007/978-3-642-22021-0_29
[35] S. Niedenthal, P. Lundén, M. Ehrndal, and J. K. Olofsson. A Handheld Olfactory Display For Smell-Enabled VR Games. In 2019 IEEE International Symposium on Olfaction and Electronic Nose (ISOEN), pp. 1–4, May 2019. doi: 10.1109/ISOEN.2019.8823162
[36] OptiTrack. Motion Capture Systems. http://www.optitrack.com/index.html, Mar. 2022.
[37] F. Pallavicini, S. Serino, P. Cipresso, E. Pedroli, I. A. Chicchi Giglioli, A. Chirico, G. M. Manzoni, G. Castelnuovo, E. Molinari, and G. Riva. Testing augmented reality for cue exposure in obese patients: An exploratory study. Cyberpsychology, Behavior and Social Networking, 19(2):107–114, Feb. 2016. doi: 10.1089/cyber.2015.0235
[38] S. Persky and A. P. Dolwick. Olfactory Perception and Presence in a Virtual Reality Food Environment. Frontiers in Virtual Reality, 1, 2020. doi: 10.3389/frvir.2020.571812
[39] M. D. Rabin and W. S. Cain. Odor recognition: Familiarity, identifiability, and encoding consistency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(2):316–325, 1984. doi: 10.1037/0278-7393.10.2.316
[40] N. Ranasinghe, M. N. James, M. Gecawicz, J. Bland, and D. Smith. Influence of Electric Taste, Smell, Color, and Thermal Sensory Modalities on the Liking and Mediated Emotions of Virtual Flavor Perception. In Proceedings of the 2020 International Conference on Multimodal Interaction, pp. 296–304. Association for Computing Machinery, New York, NY, USA, Oct. 2020. doi: 10.1145/3382507.3418862
[41] N. Ranasinghe, K. Karunanayaka, A. D. Cheok, O. N. N. Fernando, H. Nii, and P. Gopalakrishnakone. Digital taste and smell communication. In Proceedings of the 6th International Conference on Body Area Networks, BodyNets '11, pp. 78–84. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, BEL, Nov. 2011.
[42] N. Ranasinghe, K.-Y. Lee, G. Suthokumar, and E. Y.-L. Do. Taste+: Digitally Enhancing Taste Sensations of Food and Beverages. In Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, pp. 737–738. Association for Computing Machinery, New York, NY, USA, Nov. 2014. doi: 10.1145/2647868.2654878
[43] N. Ranasinghe, T. N. T. Nguyen, Y. Liangkun, L.-Y. Lin, D. Tolley, and E. Y.-L. Do. Vocktail: A Virtual Cocktail for Pairing Digital Taste, Smell, and Color Sensations. In Proceedings of the 25th ACM International Conference on Multimedia, MM '17, pp. 1139–1147. Association for Computing Machinery, New York, NY, USA, Oct. 2017. doi: 10.1145/3123266.3123440
[44] P. A. Schroeder, J. Lohmann, M. V. Butz, and C. Plewnia. Behavioral bias for food reflected in hand movements: A preliminary study with healthy subjects. Cyberpsychology, Behavior and Social Networking, 19(2):120–126, Feb. 2016. doi: 10.1089/cyber.2015.0311
[45] H. S. Seo, D. Buschhüter, and T. Hummel. Contextual influences on the relationship between familiarity and hedonicity of odors. Journal of Food Science, 73(6):S273–278, Aug. 2008. doi: 10.1111/j.1750-3841.2008.00818.x
[46] H.-S. Seo, E. Iannilli, C. Hummel, Y. Okazaki, D. Buschhüter, J. Gerber, G. E. Krammer, B. van Lengerich, and T. Hummel. A salty-congruent odor enhances saltiness: Functional magnetic resonance imaging study. Human Brain Mapping, 34(1):62–76, 2013. doi: 10.1002/hbm.21414
[47] M. Shankar, C. Simons, B. Shiv, S. McClure, C. A. Levitan, and C. Spence. An expectations-based approach to explaining the cross-modal influence of color on orthonasal olfactory identification: The influence of the degree of discrepancy. Attention, Perception & Psychophysics, 72(7):1981–1993, Oct. 2010. doi: 10.3758/APP.72.7.1981
[48] M. U. Shankar, C. A. Levitan, and C. Spence. Grape expectations: The role of cognitive influences in color-flavor interactions. Consciousness and Cognition, 19(1):380–390, Mar. 2010. doi: 10.1016/j.concog.2009.08.008
[49] H. Shop. Obst Aromen. https://herrlan-shop.de/aromen/lebensmittel-aroma/obst-aromen/?language=de, May 2021.
[50] C. Spence. Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics, 73(4):971–995, May 2011. doi: 10.3758/s13414-010-0073-7
[51] C. Spence. Chapter 10 - Multisensory flavor perception: A cognitive neuroscience perspective. In K. Sathian and V. S. Ramachandran, eds., Multisensory Perception, pp. 221–237. Academic Press, Jan. 2020. doi: 10.1016/B978-0-12-812492-5.00010-3
[52] B. E. Stein and T. R. Stanford. Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience, 9(4):255–266, Apr. 2008. doi: 10.1038/nrn2331
[53] B. E. Stein, T. R. Stanford, and B. A. Rowland. The Neural Basis of Multisensory Integration in the Midbrain: Its Organization and Maturation. Hearing Research, 258(1-2):4–15, Dec. 2009. doi: 10.1016/j.heares.2009.03.012
[54] T. A. Stoffregen and B. G. Bardy. On specification and the senses. Behavioral and Brain Sciences, 24(2):195–213, Apr. 2001. doi: 10.1017/S0140525X01003946
[55] T. A. Stoffregen and B. G. Bardy. Specification in the global array. Behavioral and Brain Sciences, 24(2):246–254, Apr. 2001. doi: 10.1017/S0140525X0157394X
[56] T. A. Stoffregen, B. Mantel, and B. G. Bardy. The Senses Considered as One Perceptual System. Ecological Psychology, 29(3):165–197, July 2017. doi: 10.1080/10407413.2017.1331116
[57] T. Tanikawa, A. Nambu, T. Narumi, K. Nishimura, and M. Hirose. Olfactory Display Using Visual Feedback Based on Olfactory Sensory Map. In R. Shumaker, ed., Virtual and Mixed Reality - New Trends, Lecture Notes in Computer Science, pp. 280–289. Springer, Berlin, Heidelberg, 2011. doi: 10.1007/978-3-642-22021-0_31
[58] O. Technology. OVR Technology. https://ovrtechnology.com/, Feb. 2022.
[59] C. T. Vi, D. Ablart, D. Arthur, and M. Obrist. Gustatory interface: The challenges of ‘how’ to stimulate the sense of taste. In Proceedings of the 2nd ACM SIGCHI International Workshop on Multisensory Approaches to Human-Food Interaction, MHFI 2017, pp. 29–33. Association for Computing Machinery, New York, NY, USA, Nov. 2017. doi: 10.1145/3141788.3141794
[60] P. Vrticka, L. Lordier, B. Bediou, and D. Sander. Human amygdala response to dynamic facial expressions of positive and negative surprise. Emotion (Washington, D.C.), 14(1):161–169, 2014. doi: 10.1037/a0034619
[61] F. Weidner, J. E. Maier, and W. Broll. Eating, smelling, and seeing: Investigating multisensory integration and (in)congruent stimuli while eating in VR, 2023. doi: 10.5281/zenodo.7524934
[62] C. Xu, M. Siegrist, and C. Hartmann. The application of virtual reality in food consumer behavior research: A systematic review. Trends in Food Science & Technology, 116:533–544, 2021. doi: 10.1016/j.tifs.2021.07.015
[63] W. Zhou, Y. Jiang, S. He, and D. Chen. Olfaction Modulates Visual Perception in Binocular Rivalry. Current Biology, 20(15):1356–1358, Aug. 2010. doi: 10.1016/j.cub.2010.05.059
[64] M. Zybura and G. A. Eskeland. Olfaction for virtual reality. Industrial Engineering 543. http://hitl.washington.edu/people/tfurness/courses/inde543/reports/3doc/, Dec. 1999.