Visual Informatics (2022)
Metaverse: Perspectives from Graphics, Interactions and Visualization
Yuheng Zhao a, Jinjing Jiang a, Yi Chen b, Richen Liu c, Yalong Yang d, Xiangyang Xue a, Siming Chen a,*
a School of Data Science, Fudan University, China
b Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, China
c School of Computer and Electronic Information, Nanjing Normal University, China
d Department of Computer Science, Virginia Tech, USA
* Corresponding author. E-mail addresses: yuhengzhao@fudan.edu.cn (Yuheng Zhao), simingchen@fudan.edu.cn (Siming Chen).
ARTICLE INFO
Article history:
Received 16 February 2022
Revised 11 March 2022
Accepted 13 March 2022
Keywords:
Metaverse
Virtual reality/Augmented reality
Computer graphics
User interaction
Immersive visualization
ABSTRACT
The metaverse is a visual world that blends the physical and digital worlds. At present, the development of the metaverse is still at an early stage, and a framework for its visual construction and exploration is lacking. In this paper, we propose a framework that summarizes how graphics, interaction, and visualization techniques support the visual construction of the metaverse and user-centric exploration. We introduce three kinds of visual elements that compose the metaverse and two graphical construction methods organized in a pipeline. We propose a taxonomy of interaction technologies based on interaction tasks, user actions, feedback, and various sensory channels, and a taxonomy of visualization techniques that assist user awareness. Current potential applications and future opportunities are discussed in the context of the visual construction and exploration of the metaverse. We hope this paper can provide a stepping stone for further research in the area of graphics, interaction, and visualization in the metaverse.
©2022 Published by Elsevier B.V. on behalf of Zhejiang University and Zhejiang University Press.
This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
The word “metaverse” was first coined in the 1992 science fiction novel Snow Crash, written by Neal Stephenson (Joshua (2017)). The novel depicts people in a virtual reality world competing against each other for social status by controlling their digital avatars. Over the next 30 years or so, the concept of the metaverse has appeared in books and films. At this stage, the concept of the metaverse is ambiguous and is more understood as a virtual world parallel to the
ous and is more understood as a virtual world parallel to the
real world. For example, the film Ready Player One (Spielberg
et al. (2018)) described a virtual world in which everyone could
customize their avatars and explore freely through multimedia
techniques (VR/AR, etc.). It was considered a classic film that
interpreted the concept of the metaverse. However, relying on multimedia technologies alone is not enough. The metaverse needs to be able to provide users with a more realistic experience and rich activities, which requires more advanced technologies to support its construction and user-centric exploration.
Recently, some scholars have defined the metaverse from the perspective of a comprehensive technical architecture. Lee et al. defined the metaverse as a 3D virtual cyberspace blending the physical and digital world, facilitated by the convergence between the Internet and Web technologies and Extended Reality (XR) (Lee et al. (2021b)). Duan et al. also categorized the related technologies of the metaverse into three levels: infrastructure, interaction and ecosystem (Duan et al. (2021)). Different from their overall perspective, we do not discuss infrastructure techniques such as blockchain, networking and edge computing. We focus on the critical technologies that support visual construction and user exploration, including Graphics, Interaction and Visualization.
The visual construction of metaverse is based on the graph-
ical techniques that build the integrated world combining the
physical and virtual world, including the 3D construction of
scenes, non-player characters (NPCs) and player characters
(Avatar).
Fig. 1. The framework of this survey: the essential visual elements that users can see compose the metaverse environment; the exploratory behaviors are what users can conduct in such an environment; and three key technologies fuel the visual construction and exploration of the metaverse.
Interactive technology enables users to operate visual
elements and explore freely in the metaverse, and provides an immersive experience. To improve users’ awareness of the virtual
world, supplementary instructions and guidance are required.
Visualization can provide such guidance by processing the data
in the metaverse and presenting it to users in an appropriate
form. The development of these technologies makes metaverse
more realistic and interesting for users to perceive and explore.
In this work, we propose a framework for visual construc-
tion and exploration of metaverse (Figure 1), where we start
with what can be seen in the metaverse and what people can do
in it, summarizing (1) the visual elements (scenes, NPCs and
Avatar) in metaverse environment and how the graphical tech-
niques used to create them, and (2) exploration of metaverse
(interaction and awareness) and how interaction and visualiza-
tion techniques can support these behaviors. We discuss the
current applications and future opportunities in the context of
our framework. To the best of our knowledge, this survey serves as the first effort to offer a view of the visual construction and exploration of the metaverse. The detailed contributions are as follows.
- A framework of visual construction and user exploration of the metaverse from the perspectives of graphics, interaction and visualization.
- A visual construction pipeline that consists of two methods and three phases supported by graphics techniques, and two taxonomies of interaction and visualization techniques for user interaction and awareness when exploring the metaverse.
- A set of research challenges and opportunities derived from our review of techniques and applications.
We hope this paper can provide a stepping stone for further
research in the area of visual construction and user exploration
for the metaverse.
2. Overview
As the metaverse is still in its infancy of development, no
work has been done to systematically summarize the technical
framework for its complete visual construction and exploration,
nor have graphics, interaction and visualization been explored
separately from the context of the metaverse. The work most
closely related to ours investigates the current state of research
in their respective fields.
In terms of computer graphics, many scholars have intro-
duced how to build 3D models. Some of them proposed
software-based authoring processes (Tang and Ho (2020)) and
automated generation to reduce processes (Freiknecht and Ef-
felsberg (2017)). Some introduced 3D reconstruction of real
objects (Intwala and Magikar (2016)). However, a comprehensive introduction to the construction of visual elements in the metaverse is lacking. We summarize three kinds of elements, propose a pipeline that consists of the above two methods, and compare the construction differences between different elements.
As the elements in metaverse are often dynamic and interactive,
we investigated 3D animation as a part of the pipeline.
Surveys on interactions specifically related to metaverse are
limited as well. Some of them summarize from different sensory channels, such as haptic solutions (Bouzbib et al. (2021)). Besides, there are surveys summarizing interaction methods for certain devices, such as smart glasses (Lee and Hui (2018)), optical head-mounted displays (Grubert et al. (2017)) and virtual reality headsets (Kelly et al. (2021)). Rather than focusing on interaction methods via certain sensory channels or concentrating on interaction methods supported by particular devices, we try to summarize interaction methods used in the metaverse from the perspectives of interaction tasks, user actions and feedback via various channels and devices.
In the perspective of visualization, many scholars have sum-
marized immersive visual analytics from the data types and en-
vironments (AR/VR) (Kraus et al. (2021)), interactions for vi-
sualization (Fonnet and Prié (2019); Besançon et al. (2021)) or its future challenges (Ens et al. (2021a)). However, there is a lack of discussion about the connections and challenges in the metaverse. A basic connection is to help users perceive the environ-
ment, for example, the map visualization of spatial data is used
for navigation. However, with the development of the meta-
verse, there will be more complex visualizations that need to be
investigated to help perceive the environment or further applica-
tion to understanding and analysis. Therefore, we will discuss
visualization from data types, visualization view, position, in-
teractions, and environments, by fully considering the diversity,
shareability, and user-centered experience of the metaverse.
This paper serves as the first effort to summarize the technical
framework for visual construction and exploration in the meta-
verse. In the next section, we will first introduce the visual ele-
ments that compose the metaverse and how they are constructed
in terms of design-based and physical-based construction meth-
ods with a pipeline. In section 4, we will introduce the two tax-
onomies of interaction that support user interaction, and visual-
ization techniques that support user awareness in exploring the
metaverse. In section 5, we will summarize current metaverse
applications (including virtual social, smart city, medical and
games) in terms of visual construction and exploration. We will
discuss the current limitations and future research opportunities
derived from our analysis in section 6.
Fig. 2. The two ways to construct a virtual scene in metaverse: physical-based construction and design-based construction, and the construction process
can be divided into three steps: Initialization, Modeling and Rendering, and Animation.
3. Virtual World Construction
The virtual environment of metaverse blends physical and
digital, which consists of various scenes, non-player characters
(NPCs) and player characters (Avatars). Scene refers to diversi-
fied virtual spaces, such as virtual campus (Duan et al. (2021))
or virtual museum (Beer (2015)). NPC is an object that cannot
be controlled by the player but has an important role in the game
itself so that it makes the world in the game feel alive (Warpefelt
and Verhagen (2015)). Avatar refers to the digital representa-
tion of players in the metaverse, where players interact with the
other players or the computer agents through the avatar (Davis
et al. (2009)). The creation of these objects is based on com-
puter graphics techniques. The scene, NPC, and avatar differ in the details of their creation because they focus on different fea-
tures. In this section, we will introduce a pipeline and two ways
for constructing virtual models in metaverse: physical-based
or design-based construction, and compare the construction of
scenes, NPCs and Avatars.
3.1. Visual Construction Pipeline
The pipeline can be divided into three stages: initialization,
modeling and rendering, and animation (Figure 2).
3.1.1. Physical-based Construction
One way to build 3D models is the physical-based construc-
tion by using 3D measurement methods, primarily laser scan-
ning and photogrammetry, to create digital twin models, termed
3D reconstruction (Deng et al. (2021)). Recently, the global
trend towards Virtual Reality (VR) and Augmented Reality
(AR) has increased the demand for creating high-quality and
detailed photorealistic 3D content based on real objects and en-
vironments. One major difficulty is how to restore high fidelity
with collected data.
Data acquisition is the first stage of 3D model reconstruction
in the metaverse, which determines how realistic the model is.
With the rapid development of 3D acquisition technologies, 3D
sensors are becoming increasingly available and affordable. At present, there are two popular and convenient point cloud acquisition methods: laser scanners (LiDARs) and RGB-D cameras
(such as Kinect, RealSense and Apple depth cameras). 3D data
acquired by these sensors can provide rich geometric, shape and
scale information (Guo et al. (2021)).
Preprocessing refers to the registration, denoising and re-
sampling of the obtained point cloud model (Ma and Liu
(2018)). Notably, which of the three steps are required depends on the application requirements and the quality of the data.
An object usually needs many site scans, which
requires multi-site data registration, that is, the data of each
site is converted to the same coordinate by using a cloud-to-
cloud alignment tool (Intwala and Magikar (2016)). The al-
gorithm typically used for registration is the ICP algorithm
(Khaloo and Lattanzi (2017)), which estimates the rigid trans-
formation between two point clouds iteratively by minimizing
the distance. Recently, the developments of optimization-based
methods and deep learning methods have improved registration
robustness and efficiency (Huang et al. (2021); Perez-Gonzalez
et al. (2019); Wang and Solomon (2019)). Point clouds after
registration incorporate the points from the objects of interest as well as noise points. Therefore, de-noising is normally a needed step. An effective method is random sample consensus
(Liu et al. (2016)). Finally, the amount of data after registration will be extremely large and overlapped regions will become denser, which will reduce the efficiency of subsequent processing. Re-
sampling is necessary to solve such problems. In practice, three
commonly used approaches are random, uniform, and point-
spacing (Son et al. (2015)).
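To make the registration and resampling steps above concrete, the following is a minimal, illustrative sketch (not the implementation of any cited system) of a single point-to-point ICP iteration solved in closed form via SVD (the Kabsch method), together with random resampling; the function names `icp_step` and `random_resample` and all numeric settings are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target):
    """One point-to-point ICP iteration: match each source point to its
    nearest target point, then solve the rigid transform in closed form (SVD)."""
    _, idx = cKDTree(target).query(source)        # nearest-neighbor correspondences
    matched = target[idx]
    mu_s, mu_t = source.mean(axis=0), matched.mean(axis=0)
    H = (source - mu_s).T @ (matched - mu_t)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_t - R @ mu_s
    return source @ R.T + t                       # source moved toward the target

def random_resample(points, keep_ratio=0.5, rng=np.random.default_rng(0)):
    """Random resampling: thin overly dense, overlapping regions after registration."""
    keep = rng.choice(len(points), size=int(len(points) * keep_ratio), replace=False)
    return points[keep]

# Toy usage: align a rotated, shifted copy of a scan back onto the original.
target = np.random.default_rng(1).random((2000, 3))
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
source = target @ R_true.T + np.array([0.05, -0.02, 0.0])
for _ in range(20):                               # iterate until convergence in practice
    source = icp_step(source, target)
```

In practice the step is iterated until the alignment error stops decreasing, and libraries such as Open3D or PCL provide robust, optimized implementations of this pipeline.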
Modeling is the core process in the visual construction. The preprocessed point cloud contains rich information about the target object, but it still needs to be converted into a 3D model represented by basic geometric shapes such as planes, surfaces and cuboids. There are two major modeling methods: geometric modeling and mesh reconstruction.
One modeling method is to generate geometric models by
Journal Pre-proof
Journal Pre-proof
4 Yuheng Zhao et al. /Visual Informatics (2022)
point cloud segmentation and modeling. A segment is charac-
terized by a planar cluster because it is a set of points located
within a given threshold distance about a calculated plane. Af-
ter segmentation, the segmented planar clusters are used to ex-
tract the main contours of architectural components. Region growing (Pu et al. (2006)) and RANSAC (Liu et al. (2016)) are two efficient segmentation methods (Nguyen and Le (2013)).
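As an illustration of the RANSAC idea behind planar segmentation, the sketch below repeatedly fits a plane to three random points and keeps the hypothesis with the most inliers; it is a toy example under assumed thresholds rather than the method of the cited works, and `fit_plane_ransac` is a hypothetical name.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=500, threshold=0.02, seed=0):
    """RANSAC plane fit: repeatedly fit a plane to 3 random points and keep
    the hypothesis supported by the largest number of inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                       # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ p0                      # plane equation: normal . x + d = 0
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers

# Toy usage: extract the dominant plane (e.g., a floor), then recurse on the rest.
rng = np.random.default_rng(1)
cloud = rng.random((5000, 3))
cloud[:3000, 2] = 0.0                          # synthetic floor at z = 0
plane, mask = fit_plane_ransac(cloud)
remaining = cloud[~mask]                       # points left for further segments
```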
Another commonly used method is to generate a mesh model
since it can be used for complex surface modeling, which can
be generally established by constructing a triangular net or
NURBS (Non-uniform rational basis spline). Nevertheless, the
two approaches can also be combined to create semantically
segmented 3D mesh models (Leotta et al. (2019)).
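As a hedged example of mesh reconstruction from a preprocessed point cloud, the snippet below uses the open-source Open3D library to estimate normals and run Poisson surface reconstruction; exact module paths and parameters may differ across Open3D versions, and the file names are placeholders.

```python
import open3d as o3d

# Load a registered, denoised point cloud and estimate per-point normals,
# which Poisson reconstruction requires.
pcd = o3d.io.read_point_cloud("scan_preprocessed.ply")          # placeholder path
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Fit an implicit surface to the oriented points and extract a triangle mesh.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("scan_mesh.ply", mesh)
```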
Rendering is an important step for endowing virtual objects with real features and presenting them in front of our eyes. Rendering usually includes coloring, texturing and lighting. Laser scanning lacks the color information that is required in many applications; therefore, a hybrid 3D reconstruction based on images and scan data is adopted to colorize the point cloud data. For example, Ma et al. proposed a differential framework for freestyle material capturing (Ma et al. (2021)). This technique can greatly improve the sampling efficiency and restore the material properties with higher precision, which supports the creation of more realistic “virtual things” in the metaverse.
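The image-based colorization mentioned above boils down to projecting each scanned point into a registered photograph with a pinhole camera model and sampling the pixel it lands on. The following is a minimal numpy sketch under that assumption; `colorize_points`, the intrinsics `K` and the pose `(R, t)` are illustrative placeholders.

```python
import numpy as np

def colorize_points(points, image, K, R, t):
    """Assign each 3D point the RGB value of the pixel it projects to,
    given camera intrinsics K and a world-to-camera pose (R, t)."""
    cam = points @ R.T + t                  # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6
    uv = cam @ K.T                          # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    h, w = image.shape[:2]
    visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colors = np.zeros((len(points), 3), dtype=image.dtype)
    colors[visible] = image[v[visible], u[visible]]   # sample pixel colors
    return colors, visible
```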
Physical-based animation has emerged as a core area of
computer graphics and is widely applied in many areas such as
film, game and virtual reality (Bargteil et al. (2020)). In general,
physical-based animation is a method to simulate and animate
the dynamic changes of objects based on physical rules or mo-
tion control theory. For example, the animation of rain falling
on leaves, the swing of leaves under the wind and the floating
animation of leaves on the water can be realized through physi-
cal simulation. These physical animations of objects in a virtual
scene will make people wander the metaverse as if they were in
the real world. When applied in the virtual reality environment,
the research focuses more on how to generate animations based
on interactive models (Llobera et al. (2021)).
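As a small, self-contained illustration of physical-based animation (not a production simulator), the sketch below integrates a leaf-like particle under gravity and a gusting wind with explicit Euler steps; the mass, drag coefficient and wind profile are made-up values.

```python
import numpy as np

def simulate_leaf(n_steps=600, dt=1/60, mass=0.002, drag_coeff=0.01):
    """Explicit-Euler simulation of a leaf-like particle: gravity plus
    air drag that pulls the leaf toward the local wind velocity."""
    pos = np.array([0.0, 0.0, 5.0])          # start 5 m above the ground
    vel = np.zeros(3)
    g = np.array([0.0, 0.0, -9.81])
    trajectory = []
    for step in range(n_steps):
        t = step * dt
        wind = np.array([1.5 * np.sin(0.8 * t), 0.5, 0.0])   # gusting wind velocity (m/s)
        force = mass * g + drag_coeff * (wind - vel)          # gravity + drag toward wind
        vel = vel + (force / mass) * dt                       # explicit Euler step
        pos = pos + vel * dt
        if pos[2] <= 0.0:                                     # settle on the ground plane
            pos[2], vel = 0.0, vel * 0.0
        trajectory.append(pos.copy())
    return np.asarray(trajectory)

path = simulate_leaf()          # positions to drive a falling-leaf animation
```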
3.1.2. Design-based Construction
Another way to create a 3D model is using specialized 3D
modeling software.
Design-based construction starts from the designer’s ideas (e.g., the conception of shape and appearance) and iterates during the drawing process. The advantage
of this approach is that we can design entirely based on imagi-
nation, for example with objects designed with futuristic tech-
nology and unheard of fantasy creatures, or things that cannot
be seen or scanned up close.
Modeling and rendering can be implemented by a num-
ber of modeling software available and they can be divided
into three categories. First, parametric 3D modeling or CAD
(computer-aided design) such as AutoCAD1 and SketchUp2, is
the preferred method for engineers and designers to create mod-
els by setting the parameters as the real thing: materials, weight,
1 https://web.autocad.com/
2 https://www.sketchup.com
Fig. 3. Examples of scene construction. (a) Urban 3D reconstruction from
multi-view satellite imagery (Leotta et al. (2019)). (b) A room scene 3D
reconstruction with an RGB-D camera (Navarro et al. (2017)). (c) A remote
client-server system for real-time reconstruction (Stotko et al. (2019)).
etc. Second, unlike CAD modeling, polygon models are more
concept-oriented than measurement-oriented. Popular software packages are 3ds Max3, Maya4, Blender5, etc. Third, digital sculpture modeling software such as ZBrush6 requires more artistic skill than polygon modeling. The modeling software can be combined with game engines (Unity3D7 or Unreal Engine8) to construct the virtual environment.
Design-based animation is achieved by computer graphics
software to make objects appear to move in 3D space. The de-
signers use such software to construct the simple object first,
followed by rigging. The animator places rigs at strategic points
to make it appear to be moving. Some of the common modeling software we mentioned above can support animation creation. For example, Blender is an all-rounder, capable of handling a full line of tasks from 3D modeling and animation to video editing. Recently, some interactive controller-based animation approaches have been proposed. For example, AnimationVR is proposed as a plugin
for Unity, which supports beginners to easily create animations
in virtual reality (Vogel et al. (2018)).
3.2. Scene Construction
In the metaverse, users will see many colorful and realistic
scenes. The construction of the scene focuses more on the real-
ism of architecture and layout.
In the modeling process, mesh models are more suitable for
reconstructing some buildings with complex surface shapes,
such as murals and statues. For example, Leotta et al. recon-
structed urban buildings combining segmentation methods into
mesh models (Leotta et al. (2019), Figure 3a). Navarro et al.
used the mesh method to reconstruct indoor rooms into virtual
worlds (Navarro et al. (2017), Figure 3b). Some researchers
also studied how to make remote multiple clients realize real-
time reconstruction access in the virtual environment. SLAM-
Cast achieved a practical client-server system for real-time cap-
ture and many-user exploration of static 3D scenes (Stotko et al.
(2019), Figure 3c).
3 https://www.autodesk.com.hk/products/3ds-max/
4 https://www.autodesk.com.hk/products/maya/
5 https://www.blender.org/
6 https://pixologic.com/
7 https://unity.com/
8 https://www.unrealengine.com/
Fig. 4. Examples of construction and animation of NPCs. (a) A deep neural
network that directly reconstructs the motion of a 3D human skeleton from
monocular video (Shi et al. (2020)). (b) Adapted Q-networks to create a
control system that can respond to disturbances and allows steering and
other user interactions (Liu and Hodgins (2017)).
3.3. Non-player Character (NPC)
In the metaverse, people can interact with NPCs through communication, gestures, etc., and can even befriend AI-driven NPCs (Duan et al. (2021)). These vivid characters are created by computer graphics technologies. The creation process is similar to scene construction, but the modeling and animation of NPCs mainly focus on how to make them more like real people or animals in appearance, behavior and intelligence.
Modeling and rendering for characters in metaverse pay more
attention to the features of people, such as skin, hair and cloth-
ing, etc. This has high requirements for modeling and rendering
technology. Many researchers have proposed relevant meth-
ods, for example, Lyu et al. (2020) proposed the first CNN-
integrated framework for simulating various hairstyles.
The research goal of generating animation for NPCs is to
model the action, behavior and decision-making process of hu-
mans and animals. Li et al. summarized seven commonly
used animation methods (Li et al. (2021)), of which finite
state machines (FSM) are the most straightforward and widely adopted model for NPCs to respond to players’ behavior (Lau and Kuffner (2005)). For example, Dehesa et al. proposed a
framework based on a state machine to generate responsive in-
teractive sword fighting animations against player attacks in vir-
tual reality (Dehesa et al. (2020)). Some scholars also used deep
learning techniques to reconstruct motions from videos (Shi
et al. (2020), Figure 4a), and reinforcement learning has been combined to enable NPCs to learn automatically from the interactive experience of their surroundings (Liu and Hodgins (2017),
Figure 4b).
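To make the finite-state-machine idea concrete, the following toy sketch drives an NPC's state from player events through a transition table; the states, events and the class name `NPCStateMachine` are invented for illustration, and a real system would bind each state to an animation clip and richer game logic.

```python
# Transition table: (current state, observed player event) -> next state.
TRANSITIONS = {
    ("idle",     "player_near"):    "greet",
    ("greet",    "player_talks"):   "converse",
    ("greet",    "player_leaves"):  "idle",
    ("converse", "player_attacks"): "flee",
    ("converse", "player_leaves"):  "idle",
    ("flee",     "safe"):           "idle",
}

class NPCStateMachine:
    """Minimal finite state machine driving an NPC's response to player behavior."""
    def __init__(self, start="idle"):
        self.state = start

    def handle(self, event):
        # Unknown (state, event) pairs leave the NPC in its current state.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state   # the animation layer would play the clip bound to this state

npc = NPCStateMachine()
for event in ["player_near", "player_talks", "player_attacks", "safe"]:
    print(event, "->", npc.handle(event))   # greet -> converse -> flee -> idle
```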
3.4. Player Character (Avatar)
Compared to the other two elements, the construction of the
avatar is more user-defined, with many features coming from
users, in both the 3D modeling and animation creation.
Traditionally, users can modify and edit the appearance of
their avatars with many options. To mimic users’ real-life appearances, some applications have emerged that allow users to scan their physical appearance and subsequently choose their virtual outfits. Although design-based avatars have improved greatly in their sense of realism, they are still cartoon-like.
During various social activities in the metaverse, the details of
the avatar’s face (Wei et al. (2004)) and the micro-expression
(Murphy (2017)), the whole body (Kocur et al. (2020)) could
impact user perceptions. Therefore, to improve the sense of realism, many reconstruction technologies have been developed for
highly realistic 3D faces and bodies.
Reconstruction of the face is an important part, which is usually based on the 3D Morphable Model (Booth et al. (2016)) generated by the Principal Component Analysis (PCA) algorithm, but the performance is limited by data quality and it is difficult to express facial details. In recent years, with the development of deep learning technology, a series of Generative Adversarial Network (GAN) and SDF based methods have emerged and achieved a higher degree of realism. For example, Luo et al. (2021) used StyleGAN to generate highly photorealistic 3D face templates.
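The PCA-based morphable model mentioned above can be summarized in a few lines: a face is the mean shape plus a linear combination of learned principal components. The sketch below uses random arrays as stand-ins for a trained basis, so it only illustrates the arithmetic, not any published model.

```python
import numpy as np

def sample_face(mean_shape, basis, stddevs, coeffs):
    """3D morphable model: a new face is the mean shape plus a linear
    combination of PCA basis vectors scaled by per-component std deviations.
    Shapes are stored as flattened (3N,) vertex arrays."""
    return mean_shape + basis @ (coeffs * stddevs)

# Toy dimensions: 5,000 vertices, 50 PCA components (a real model is larger).
n_vertices, n_components = 5000, 50
mean_shape = np.zeros(3 * n_vertices)
basis = np.random.randn(3 * n_vertices, n_components) * 1e-3   # stand-in for a learned PCA basis
stddevs = np.linspace(1.0, 0.1, n_components)                  # stand-in for PCA singular values
coeffs = np.random.randn(n_components)                         # identity parameters to fit or sample
face = sample_face(mean_shape, basis, stddevs, coeffs).reshape(n_vertices, 3)
```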
Research on the 3D body has been extensive, e.g., 3D body micro-neural rendering based on different data types (Wu et al. (2020)) and highly realistic human dynamic geometry and texture reconstruction based on simple input (Liu et al. (2021b)). When scanning or capturing the full body, the hands are usually occluded due to their small size. To cope with such low-resolution and occlusion problems, researchers have proposed many models to express human hands. The most famous model is MANO (Romero et al. (2017)), which adapts well to deep learning networks.
The animation of avatars is commonly generated by user ma-
nipulation, e.g. interactions with controllers or real-time track-
ing (Genay et al. (2021)). On one hand, through interaction, users can share actions with their avatars. On the other hand, since control through inputs that are not physically representative of the user’s face, hands or body might not favor strong ownership illusions, real-time tracking technologies that map user movements have become an important tool for animation generation. For example,
Saragih et al. proposed a real-time system that achieved con-
vincing photorealistic avatar and faces animation from a single
image (Saragih et al. (2011)). Mueller et al. combined a convo-
lutional neural network with a kinematic 3D hand model, which
addressed the highly challenging problem of real-time 3D hand
tracking (Mueller et al. (2018)).
4. Real-world User in Metaverse
The metaverse is a user-centric application by design. As such, every component of the metaverse should place the human user at its core (Lee et al. (2021b)). Therefore, the key to exploring the metaverse is to provide a good experience for users, including reasonable and effective interaction, and acces-
sible visual guidance or hints to aid in rapid awareness, com-
prehension and analysis. This section aims to describe state-of-
the-art interactive and visualization techniques that can support
user interaction and awareness in the metaverse.
4.1. User Interaction
For a given interaction task, an effective and complete interaction process can be realized by the user action (input) and
feedback (output) from various devices. To realize the whole
process, various sensory channels are combined (Figure 5).
Interaction tasks refer to various ways that enable users to
contact, control or influence the metaverse. There are various
Fig. 5. The taxonomy of interaction tasks, user action (input) and device feedback (output) with various sensory channels that we summarized in this paper.
kinds of classification methods; for example, Raaen and Sørum divided interaction tasks into menus, locomotion and interaction (Raaen and Sørum (2019)). In the context of the metaverse,
we decompose interaction tasks into three elementary manipu-
lation processes: navigation, contact and editing.
User Actions refer to behaviors that users can achieve
through various sensory channels, such as gaze and gestures
through body language channel, which is the input of an inter-
action task.
Feedback refers to the responses from devices to a user ac-
tion, such as changes of view in smart glasses and forces gener-
ated by controllers, which requires various sensory channels of
users to participate at the same time. Feedback can be decom-
posed into 5 types based on the different sensory channels used: the visual channel, acoustic (auditory) channel, haptic channel, olfactory channel and gustatory channel.
Sensory channels refer to the perceptual senses used by de-
vices during an interaction task. We divide channels in two ways by different tasks: converting user actions into digital input and converting digital output into feedback to users.
4.1.1. Interaction Tasks
Navigation refers to user operations that result in view
changes in the metaverse. Navigation tasks can be decom-
posed into navigation by geographic cues and navigation by
non-geographic cues. Mainly there are four dierent kinds
of navigation by geographic cues: real walking with the dis-
placement of users, panning using controllers, changing view-
point with head movement and pointing and teleport using con-
trollers. Users can also be navigated by non-geographic cues
such as navigation by query, by tasks, or other specified move-
ments (Jankowski and Hachet (2013)).
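As a concrete example of the pointing-and-teleport technique, the sketch below intersects a controller ray with a flat ground plane to obtain the landing point; the function `teleport_target`, the z-up convention and the range limit are illustrative assumptions.

```python
import numpy as np

def teleport_target(origin, direction, max_range=10.0):
    """Intersect a controller ray with the ground plane z = 0 and return
    the landing point for a pointing-and-teleport navigation action."""
    direction = direction / np.linalg.norm(direction)
    if direction[2] >= -1e-6:               # ray does not point downward
        return None
    s = -origin[2] / direction[2]           # ray parameter where it reaches z = 0
    if s > max_range:
        return None                         # beyond the allowed teleport distance
    return origin + s * direction

# Controller held at 1.2 m, pointing forward and slightly down.
print(teleport_target(np.array([0.0, 0.0, 1.2]), np.array([0.0, 1.0, -0.4])))
```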
Contact refers to the ability to touch and feel objects by con-
trollers in the environment. Contact tasks can be divided into
two different ways: direct contact and indirect contact. Di-
rect contact refers to interactions by using controllers as bare
hands, fingertips or any body parts, and are highly similar to
movements in the real world when direct touching happens. In-
direct touching refers to interactions using certain things in the
metaverse. Under that circumstance, devices, such as hand con-
trollers, are considered as continuity to the user’s hands. For
instance, the user may need to use the controller as a knife to
touch and feel the organs of the body when operating surgeries
in the metaverse.
Editing refers to the interaction process that involves chang-
ing either properties, position or orientation of any objects in
metaverse. The editing process includes two phases: selection
and manipulation.
4.1.2. Various Channels of User Action
Body language and sonification channel can be decoded
from gaze, which requires eye-tracking technology, or from
gesture, which can be recognized with gesture recognition-
based sensors, or from sonification, which can communicate
with objects in metaverse.
Haptic channel consists of two types: tactile and kinesthetic,
which combine the sense of touch used in the interaction pro-
cess. The tactile cues are developed through the skin, while the
kinesthetic ones come from proprioception and are through the
muscles and the tendons (Bouzbib et al. (2021)).
Brain signal channel also can be used as an input, which is
widely utilized in brain-computer interface systems. Electroencephalography (EEG), electromyography (EMG) biopotentials, SSVER, P3 EP and many other signals can be used as control inputs for several specific tasks in the metaverse (Friedman et al. (2007)). For example, the brain can let users change the camera position in the metaverse toward the left or right by using two different brain signals, such as left- or right-hand motor imagery (MI) or two steady-state visual-evoked potentials (SSVEPs) at different frequencies (Lécuyer et al. (2008)).
4.1.3. Various Channels of Feedback
Visual channel and auditory channel refer to feedback generated based on vision and the sense of hearing. The visual feedback is related to changes generated by user actions, displayed from a stereoscopic viewpoint via smart glasses or headsets. The auditory feedback can be further decomposed into two different types: sound effects, which can make users better immerse themselves in the metaverse, and voice, which is key acoustic feedback that allows users to communicate with an avatar in the metaverse with a greater sense of reality.
Haptic channel refers to feedback generated based on the
sense of touch. Users can feel different materials, textures, tem-
peratures or feel shapes and patterns through their fingertips,
and can perceive stickiness, smoothness, pressure, vibration or
friction. Feedback mentioned above involves multiple force
types: tension, traction, reaction, resistance and impact, which
help enhance the user experience in the metaverse. Gloves or
exoskeletons constrain the user’s hands for simulating shapes
or stimulate other haptic features such as stiffness, friction or
slippage (Bouzbib et al. (2021)).
Olfactory channel refers to feedback generated based on the
sense of smell. The major way of olfactory feedback is dispers-
ing specific scents to match the requirements of certain interac-
tion tasks. Olfactory displays are widely used to disperse scent,
which synchronizes odorants from a digital description. The
feedback from the olfactory channel can better immerse users
in the metaverse. Generally, an olfactory display consists of
a palette of odorants, a flow delivery system and a control al-
gorithm that determines the mixing ratios, concentration and
timing of the stimulus (Richard et al. (2006)).
Gustatory channel refers to feedback generated based on
the sense of taste. There are mainly two ways of actuating the sense of taste: spraying chemical combinations into the user’s mouth or onto an area the user can lick with the tongue, or using digital taste actuation technology to produce various taste sensations. Devices can be developed to fulfill
the process of gustatory stimulation with or without chemicals,
for example, the “Virtual cocoon” can spray flavors directly
into the user’s mouth, and the “Food Simulator” uses chemical
and mechanical linkages to simulate food chewing sensations
by providing flavoring chemicals, biting force, chewing sound,
and vibration to the user (Ranasinghe et al. (2011)). Applying
thermal stimulation on the tongue and stimulating the TRPM5
(Transient receptor potential cation channel subfamily M mem-
ber 5) taste channel can enhance the flavor of sweet, bitter, and
umami tastes (Karunanayaka et al. (2018)).
4.1.4. Collaborative Interaction
Generally, collaborative interaction can be decomposed into
three categories: communication, joint navigation and collabo-
rative editing.
Communication can be further categorized as gestural com-
munication, verbal communication and other ways using body
language. Users can transfer information by gestures, such as
tracing along the boundaries of objects or simply pointing at
each other (Beck et al. (2013)). Certain gestures can be utilized
as established interaction methods according to design mecha-
nism, taking advantage of users’ social intuition and commu-
nicative skills (Roth et al. (2015)). In verbal communication,
the combination of intonation, speech speed and sound volume of an avatar, if not transmitted directly from a user’s voice, is frequently adjusted to affect the level of effectiveness during the social interaction process (Eynard et al. (2015)).
Other body languages, such as body motion, facial expressions
can be tracked, transmitted and represented via virtual avatars
in the metaverse (Roth et al. (2017)). Devices and algorithms
may be required for high rapport and better understanding when
intercultural communications happen. For instance, in a con-
versation between a Japanese and a German, the bow may need
to translate into the handshake displayed by the corresponding
avatar (Roth et al. (2015)).
Joint navigation refers to multi-user operations that result in
view changes in the metaverse. In a complete joint navigation
process, four different techniques are required: forming naviga-
tional groups by multi-users, distributing navigational respon-
sibilities, performing navigation tasks together and ending joint
navigation by splitting up (Weissker et al. (2020)).
Collaborative editing refers to the interaction process that
involves changing either properties, position or orientation of
any objects by multi-users in the metaverse. Users can manipu-
late objects together, and any user-created element can be kept,
changed or moved by other users (Greenwald et al. (2017)).
4.2. User Awareness
In the metaverse, users are exposed to the environment com-
posed of different data. The perception of the complex environ-
ment through the processing of data is needed, and in further
scenarios, especially with understanding and analytical needs
(VR meeting, etc.), a detailed analysis of the data is required.
These perception, comprehension and analysis scenarios are
considered as user awareness when exploring in the metaverse.
In this section, we summarize existing advanced research from
five dimensions (Figure 6) to help guide the related research
in the metaverse. To build our taxonomy, we collected papers
published on leading conferences and journals in Visualization
and HCI, including Vis, TVCG, Eurovis, CHI, etc. We chose
the papers if they discuss the metaverse from perspectives of vi-
sualization, immersive environment and immersive interaction.
Eventually, we found 25 key publications and organized them
into a meaningful taxonomy for our main analysis.
4.2.1. Data Visualization
Different data types often correspond to different scenarios and exploration tasks in the metaverse, so there are different data visualizations, different ways of interaction, etc. We find that the currently available types of immersive visualization can be divided into spatial and tabular data, multidimensional and relational data, and multimedia data.
Spatial and tabular data generally refers to common data
such as spatial data (e.g. geographic, scientific) (Lin et al.
(2021); Hurter et al. (2018)), temporal data (Cantu et al. (2018);
Prouzeau et al. (2020)). For example, some scholars proposed
immersive map visualizations (Yang et al. (2018b, 2020); Satri-
adi et al. (2020); Yang et al. (2018a)) or visualized geographic
data (White and Feiner (2009)), which are essential for naviga-
tion in metaverse.
Relational data include multidimensional data (Filho et al. (2018)), graph/tree structure data, etc., and often correspond to
[Figure 6 lists the surveyed publications (2016-2021), each coded by data type, view, position, interaction for visualization, and environment.]
Fig. 6. We reviewed 20 key articles from 2016 to 2021 and summarized the taxonomy of immersive visualization from 5 dimensions. The boxes with different colors represent Data Visualization, View, Position, Interaction for visualization, and Environment.
higher analytical needs in the metaverse, such as virtual analysis workshops or presentations. For example, ImAxes is an interactive multidimensional visualization tool for understanding such relational data, which is a basic technology for future analysis in the metaverse (Cordeil et al. (2017)).
Multimedia data (text, audio, image and video) often appear in the metaverse and need to be perceived in an appropriate way. Due to their complexity and unstructured nature, related work is still relatively scarce. Currently, text visualization is primarily used to help users explore by generating descriptions to aid comprehension (Ivanov et al. (2018); Liu et al. (2021c)) or to help analysts understand semantic features (Onorati et al. (2018)). Audio data can also be visualized in an immersive environment for music interaction (Johnson et al. (2019)) and storytelling (Latif et al. (2021)), and Kaplan et al. (2016) visualized kinetic metrics of foot pedals in video data to aid motor training.
4.2.2. View
A visualization may consist of a single view or multiple
views. A single view refers to the representation of a set of
data in a single window. Multiple views refer to any instance
where data is represented in multiple windows. While simple
exploration can be implemented with a single view, many com-
plex tasks such as multiple or data types, visual comparison
tasks benefit from multiple views. For example, users can per-
form map exploration search, comparison and route-planning
tasks by a multi-view map visualization (Satriadi et al. (2020)).
Therefore, connection and organization interactions between
multiple views are required for such tasks. Such connection
includes simultaneous updating or highlighting corresponding
information between views, etc. The organization is to arrange
the views, e.g., in a gallery or sequence (Batch et al. (2019)).
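A minimal sketch of such a connection between views is an observer-style coordinator that broadcasts a selection made in one view to all others; the class names below are hypothetical, and a real system would also handle filtering, camera syncing and view arrangement.

```python
class View:
    """A single visualization window; highlights whatever record ids it is told to."""
    def __init__(self, name):
        self.name, self.highlighted = name, set()

    def highlight(self, ids):
        self.highlighted = set(ids)
        print(f"{self.name}: highlight {sorted(self.highlighted)}")

class ViewCoordinator:
    """Connection between multiple views: a selection in one view is broadcast
    so corresponding items update simultaneously everywhere else."""
    def __init__(self, views):
        self.views = views

    def select(self, source_view, ids):
        for view in self.views:
            if view is not source_view:
                view.highlight(ids)

map_view, chart_view, table_view = View("map"), View("bar chart"), View("table")
coordinator = ViewCoordinator([map_view, chart_view, table_view])
coordinator.select(map_view, [3, 7, 11])     # brushing on the map updates the other views
```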
4.2.3. Position
To provide a good perception for the user, the position of
the view(s) relative to the user also needs to be adjusted with
an egocentric or an exocentric perspective (Ens et al. (2014)).
An egocentric position means that view is arranged regarding
the user’s position and around the user. An exocentric position
means that the view(s) is arranged regardless of the user’s posi-
tion. By reviewing the literature, we found that position type is
related to task accuracy and task mode.
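The difference between the two positioning strategies can be sketched as two layout functions: one places views on an arc that follows the user, the other pins them to fixed world coordinates. The function names and distances below are illustrative assumptions, not taken from the cited studies.

```python
import numpy as np

def egocentric_layout(n_views, user_pos, radius=1.2, height=1.5):
    """Place views on an arc around (and relative to) the user: an egocentric layout."""
    angles = np.linspace(-np.pi / 3, np.pi / 3, n_views)      # 120-degree arc in front
    offsets = np.stack([radius * np.sin(angles),
                        radius * np.cos(angles),
                        np.full(n_views, height)], axis=1)
    return user_pos + offsets

def exocentric_layout(n_views, room_anchor, spacing=0.8, height=1.5):
    """Place views at fixed world positions (e.g., along a wall), independent of the user."""
    xs = room_anchor[0] + spacing * np.arange(n_views)
    return np.stack([xs,
                     np.full(n_views, room_anchor[1]),
                     np.full(n_views, height)], axis=1)

user = np.array([0.0, 0.0, 0.0])
print(egocentric_layout(3, user))                   # moves with the user
print(exocentric_layout(3, np.array([2.0, 4.0])))   # stays put regardless of the user
```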
The exocentric position is more accurate for search and judgment tasks since being outside of the data in an exocentric view affords a full overview, which is less fatiguing and easier to use in existing analyst work-spaces (Wagner et al. (2021)). By con-
trast, being inside the data in egocentric allows the observation
of details through spontaneous exploration. Yang et al. also
found that exocentric globes are more accurate and faster with
overview observations (direction or area), while egocentric is
more suitable for observing small variations in detail (Yang
et al. (2018b)). Therefore, exocentric and egocentric are suit-
able for different tasks. When exploring details of a large net-
work, a simple egocentric interface can considerably improve
the efficiency (Sorger et al. (2021)).
There are two kinds of task modes: exploration and presenta-
tion. Batch et al. conducted an evaluation (Batch et al. (2019))
Fig. 7. Examples of interaction for visualization. (a) Tilting a 3D prism map to a 2D chart by controllers (Yang et al. (2020)). (b) Comparing tangible
axes and gesture interaction (Cordeil et al. (2020)). (c) Vibration feedback
encodes the density in a 3D scatter visualization (Prouzeau et al. (2019)).
that used the ImAxes immersive visualization system (Cordeil
et al. (2017)). The results indicated that participants placed visualizations egocentrically and at close range in exploration tasks, since it is more efficient to use the local space around the user. Conversely, participants used more space to arrange visualizations in an exocentric way when presenting insights to others in a collaborative setting. Based on such evaluations, Satriadi et al. also adopted an egocentric position that allows users to create large hierarchies of multiple maps at different scales and arrange them in 3D space (Satriadi et al. (2020)).
4.2.4. Interaction for Visualization
Different from section 4.1, this section focuses on the inter-
actions in the exploration of dierent data visualizations in the
metaverse. For example, we add connection and organization
for multi-views, which refer to the capability to support the co-
ordination and arrangement of multiple views. Some channels,
such as brain signal and gustation, have limited integration with
data visualization and are therefore not explicitly listed.
Interaction for visualization with various channels is im-
plemented by using haptic controllers or other senses to perform
various manipulations.
In terms of haptic controllers, Yang et al. introduced tilting
for transitioning between a 2D and 3D map (Yang et al. (2020),
Figure 7a). When using controllers for navigation, zoom or
overview is better than standard locomotion alone (Yang et al.
(2021)). Moreover, some scholars explored tangible widgets in
immersive visualization (Smiley et al. (2021)). For example,
embodied axes (Cordeil et al. (2020)) utilized three orthogonal
arms to represent data axes for selection and navigation (Figure
7b). Butscher et al. explored parallel coordinates for multi-
dimensional data using a touchable desktop (Butscher et al.
(2018)). These efforts tend to have better accuracy since they
are fixed in physical space.
Other sensing channels can also help interactions, which can
be freer and more in tune with human behaviors, but with a cor-
responding accuracy reduction. Body languages like gestures
are widely used, for example embodied axes supported mid-air
gestures for selection (Cordeil et al. (2020), Figure 7b). The sonification channel is usually used for voice input commands; combined with natural language techniques, it allows the machine to help users interact with the visualization (Liu et al. (2021a)).
However, such methods with immersive visualization in meta-
verse still need to be explored.
Interaction for visualization with feedback refers to using
various channels to perceive the results of an interaction. Haptic
Fig. 8. Examples of collaborative interaction. (a) The FIESTA system for
collaborative immersive analytics in VR (Lee et al. (2021a)). (b) The UpLift system uses tangible widgets and the HoloLens to visualize and understand building energy (Ens et al. (2021b)).
feedback often takes the form of vibration, which improves the
accuracy of the user’s perception. Prouzeau et al. encoded data
density as vibration intensity and provided feedback through the
haptic controller (Prouzeau et al. (2019), Figure 7c). The vibra-
tion can improve user performance for identifying void regions,
which is helpful for users to perceive with the assistance of data
visualization, especially unknown areas in the metaverse. Usher
et al. applied vibration with selection, which enables users to
have a stronger perception of accuracy (Usher et al. (2018)).
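A minimal sketch of such a density-to-vibration encoding is given below: the number of scatter-plot points near the haptic cursor is normalized and mapped to a vibration amplitude. The mapping and constants are illustrative assumptions rather than the encoding used by the cited work.

```python
import numpy as np

def vibration_intensity(cursor, points, radius=0.1, max_amplitude=1.0):
    """Map local point density around a 3D cursor to a controller vibration
    amplitude, so denser regions of a scatter plot 'feel' stronger."""
    dists = np.linalg.norm(points - cursor, axis=1)
    local_count = np.count_nonzero(dists < radius)
    reference = max(1, len(points) // 100)          # density treated as 'full strength'
    return max_amplitude * min(1.0, local_count / reference)

points = np.random.rand(10000, 3)
print(vibration_intensity(np.array([0.5, 0.5, 0.5]), points))
```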
Sonic feedback tends to be sound effects that fit the context
of the exploration. Ivanov et al. implemented a novel unit vi-
sualization with many 3D avatars, and they proposed the op-
portunity to leverage audio mappings to create a strong associa-
tion between visual and audio elements, such as unique sounds,
pitches, or speech (Ivanov et al. (2018)). Olfaction can also be used to transmit information (Batch et al. (2020)). For example, in a 2D network graph, Patnaik et al. chose smell-color combinations such as lemon-orange, leather-red and coconut-white to represent nodes (Patnaik et al. (2018)).
Collaborative interaction for visualization allows multiple users to simultaneously explore a metaverse environment with the help of data visualization.
From a perspective of environment, most of them are imple-
mented in an AR environment (Ens et al. (2021b); Lee et al.
(2021a); Butscher et al. (2018); Reipschlager et al. (2020))
since AR environments can decrease the cognitive and func-
tional load on the user (Billinghurst and Kato (1999)). How-
ever, these works rely on external displays such as tabletops and large displays (Figure 8b), which limit free movement (Ens et al. (2021b); Butscher et al. (2018)). For this reason, the FI-
ESTA collaborative system (Lee et al. (2021a)) is built in an
unconstrained VR environment (Figure 8a), whereby users can
move freely.
In terms of interaction channels, collaborative interactions
are more based on voice chat, through co-located or remote
communication. For example, Embodied Axes (Cordeil et al.
(2020)) deployed the system to a remote client, but the com-
munication is still through third-party video calling software.
In addition to sonic communication, gaze, deictic pointing ges-
tures or placement gestures can also indicate collaborator’s at-
tention focus to facilitate communication (Figure 8b). More-
over, Lee et al. mentioned that the use of avatars and pointers
also facilitated collaboration, with deixis allowing participants
to work while up close or when far apart (Lee et al. (2021a),
Figure 8a).
5. Application Status
Since metaverse applications are still in their infancy, we in-
vestigated potential applications that might be the core applications of the future metaverse, such as virtual social, virtual medical, virtual city or virtual games, to prospect what the future metaverse might look like.
5.1. Virtual Social
A number of virtual platforms have been proposed that officially support social networking. These platforms have strong collaborative interaction capabilities, such as Rec Room9, VRChat10 and AltspaceVR11. These prototypes are different from each other in the aspects of navigation, spaces, and social mechanics. For example, Rec Room and AltspaceVR use teleportation as the primary mode of navigation. Rec Room
supports more embodied modes of friending (through a virtual
“handshake”) as well as muting (by putting one’s hand up as
if to say “stop”). Recently, Duan et al. implemented an initial
micro metaverse prototype of a university campus, which sup-
ports rich virtual social activities (Duan et al. (2021)). These
platforms have very strong collaborative communication capa-
bilities through high fidelity graphical interface and rich inter-
active approaches.
5.2. Virtual Medical
The metaverse can help medical professionals work faster, cleaner, and safer when caring for their patients. For
example, AccuVein12 projects a map of a patient’s veins onto
the skin. Beyond Metaverse13 makes innovative extended re-
ality (XR) solutions to improve medical education, training for
clinicians, surgical planning, procedure, treatment, and diag-
nosis. Among these applications, we found that visualization
aids to understanding are most used in the medical field and are
most closely linked to related scientific visualization research,
but users in these applications are usually co-located to explore
in AR environments.
5.3. Virtual City
Although there are fewer specialized platforms for virtual cities, they are more often integrated into other areas. We found that the sim-
ulation of transport and reconstruction of urban buildings were
more prominent in the few applications. For example, Mega-
World14 is an open platform with avatars, public transport such
as subways and buses, etc. Users can choose from any mode
of transportation, such as different bus routes, to experience
real city life. In this world, handcraft is allowed for users to
create something by themselves. In addition, like real city life, the core of this application is a vibrant player-driven economy, including trade and taxes. We can imagine that the
9 https://recroom.com
10 https://hello.vrchat.com/
11 https://altvr.com/
12 https://www.accuvein.com/
13 https://www.veyondmetaverse.com/
14 https://megaworld.io
future metaverse platform can simulate real government affairs and daily business, so as to facilitate the management of cities. The scenes in current applications are very similar to real cities, which means that high-fidelity city reconstruction technology will become an important foundation for realizing the combination of the metaverse and the smart city.
5.4. Virtual Games
Virtual games are the closest applications to the concept of
a metaverse, which are much more flexible once they are at-
tached to holographic virtual reality environments with futur-
istic technologies. What brought the metaverse to the global stage was the listing of Roblox15. Compared to other games, Roblox games have their own characters and focus on social needs, which is considered an early form of the metaverse. Other games or apps that most closely resemble the form of the metaverse include Decentraland16, The Sandbox17 and Cryptovoxels18. Compared with the framework we have proposed for the metaverse, we found that virtual games have many of the desired qualities: high realism, freedom, and a high degree of sharing and sociality. How-
ever, the visual representation and exploration of the metaverse
still has further requirements to be explored, such as complex
exploration and perception tasks for single or multiple users,
with very high demands for real-time rendering, interaction and
visualization.
6. Opportunities for Future Research
In this section, we discuss the future research opportunities
of graphics for the visual construction of the metaverse, and those of interaction and visualization for user-centered exploration.
6.1. Visual Construction
Building a more realistic virtual world. As technology
continues to innovate and the real world evolves at a rapid pace,
how to make the metaverse more realistic will need to be fur-
ther developed through graphics technology. For example, the realism of avatars’ dress and fabrics, and the high fidelity of faces, all require more accurate modeling algorithms. This high realism is also reflected in the creation of animations in the metaverse, where people often see dynamic and interactive scenes, NPCs and avatars; however, everything in the world cannot rely entirely on manual modeling and requires a certain degree of automation. By reviewing the literature, we have
found that deep learning-based techniques help to improve 3D
construction, and thus modeling with high accuracy, automation
and interactive VR animation construction is a future challenge
and research direction.
Incorporating human creativity. Real-world people can be the inspiration and source of many products in the metaverse.
15. https://www.roblox.com/
16. https://decentraland.org
17. https://www.sandbox.game/en/
18. https://www.cryptovoxels.com/
The construction of the metaverse should allow for user input, which can then be turned into content with the aid of graphical techniques. In addition to the creation of avatars, many of a person's 3D artifacts, such as dwellings and vehicles, could be reconstructed in the metaverse. This places higher demands on image-based or video-based reconstruction from captures taken with portable devices. In addition, 2D objects created by people, including images, videos, artworks, visualizations, and human-computer interfaces, can be transferred into the metaverse and combined with 3D objects to enable better immersive experiences.
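As a concrete illustration of this direction, the following is a minimal sketch, assuming the Open3D library and a hypothetical scan file living_room.ply captured with a portable device, of turning a user's point-cloud scan into a mesh that a metaverse engine could import; it is not a prescription of any particular platform's pipeline.

# Minimal sketch: reconstructing a user-scanned object for import into a
# virtual scene. Assumes the Open3D library and a hypothetical input file
# "living_room.ply" captured with a phone or depth camera.
import open3d as o3d

# Load the raw point cloud produced by a portable scanning app.
pcd = o3d.io.read_point_cloud("living_room.ply")

# Reduce density and remove noise so reconstruction stays tractable.
pcd = pcd.voxel_down_sample(voxel_size=0.01)
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Poisson surface reconstruction requires per-point normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Build a mesh from the cleaned point cloud.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# Export in a format that common VR/metaverse engines can ingest.
o3d.io.write_triangle_mesh("living_room_mesh.obj", mesh)

Image- or video-based pipelines would add a structure-from-motion step before this point, but the cleanup, meshing, and export stages remain similar.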
6.2. User Interaction in the Metaverse
Reducing the user interaction burden. The variety of user actions depends on the development of sensors and devices. The types of action input captured by various devices can fulfill the requirements of many different types of immersive interaction. As mentioned before, various devices are designed to support users in navigating, touching, and editing things in the metaverse. Users can employ body language such as gaze, gestures, or head position; they can also act through sonification and haptic channels, or simply by thought, that is, using brain signals such as EEG. There are, however, a few limitations. The types of user action are constrained by the design decisions of different developers and still differ from the natural interactions that happen in the real world. Users are required to operate specific devices and to remember specific operations designed by different developers, applications, and platforms.
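One way to reduce this burden is an input-abstraction layer that maps heterogeneous device events onto a small, shared vocabulary of semantic actions, so that applications need not hard-code each vendor's controls. The sketch below is purely illustrative; the channel names, event names, and mappings are hypothetical rather than drawn from any existing platform.

# Illustrative sketch of an input-abstraction layer: raw events from different
# sensing channels (gaze, gesture, EEG, controller) are normalized into a small
# vocabulary of semantic actions. All names and mappings are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class RawEvent:
    channel: str   # e.g. "gaze", "gesture", "eeg", "controller"
    kind: str      # device-specific event name
    payload: dict  # device-specific data (coordinates, confidence, ...)

# Map (channel, device-specific kind) -> semantic action understood by the app.
ACTION_MAP: Dict[Tuple[str, str], str] = {
    ("gaze", "dwell"): "select",
    ("gesture", "pinch"): "select",
    ("controller", "trigger"): "select",
    ("gesture", "swipe_left"): "navigate_back",
    ("eeg", "imagined_push"): "navigate_forward",
}

class InputRouter:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[RawEvent], None]] = {}

    def on(self, action: str, handler: Callable[[RawEvent], None]) -> None:
        self._handlers[action] = handler

    def dispatch(self, event: RawEvent) -> None:
        action = ACTION_MAP.get((event.channel, event.kind))
        if action and action in self._handlers:
            self._handlers[action](event)

# Usage: the same "select" handler serves gaze dwell, pinch, and trigger input.
router = InputRouter()
router.on("select", lambda e: print(f"selected via {e.channel}"))
router.dispatch(RawEvent("gaze", "dwell", {"x": 0.4, "y": 0.6}))
router.dispatch(RawEvent("gesture", "pinch", {"hand": "right"}))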
Feedback with multi-sensory channels. From the visual, acoustic, and haptic channels to the olfactory and gustatory channels, many types of user feedback have been researched. Various devices try to reproduce the realism of the virtual environment in as many respects as possible, and improvements are being made in all of them to refine the immersive user experience. For example, for the gustatory channel, instead of using chemicals to simulate a taste, electronic, thermal, and magnetic stimulation are applied to reduce reliance on physical substances, which otherwise create maintenance problems such as the need for refilling. We expect the emergence of more compact, all-in-one devices that combine multiple sensory channels to better immerse users in the metaverse.
6.3. User Comprehension in the Metaverse
Enriching visualizations in the metaverse. Data visualization in the real world has already penetrated all aspects of life and is deeply integrated with different kinds of data to assist people's perception, but in the metaverse, visualization is not yet widely used and people's experience of information perception there is still insufficient. In the future, if people work and study in the metaverse, or even present their work remotely, it is an open question how complex data such as network structures, trees, and 3D scatterplots can be reasonably perceived and analyzed. Moreover, in order to better experience the metaverse and understand things in it, data visualization should be more deeply integrated into the basic elements of the metaverse, such as scene self-description and storytelling.
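As a small example of what such integration involves, an immersive 3D scatterplot must map data columns into room-scale coordinates that sit comfortably in front of the user. The sketch below, using synthetic data and hypothetical sizing choices, illustrates only this data-to-space mapping; a VR engine would then instantiate a mark at each resulting position.

# Minimal sketch of the data-to-space mapping behind an immersive 3D
# scatterplot: three data columns are normalized into a 1 m cube positioned
# at a comfortable viewing distance. Dataset and sizing choices are synthetic.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))             # 500 records, 3 numeric columns

# Normalize each column to [0, 1] independently.
mins, maxs = data.min(axis=0), data.max(axis=0)
unit = (data - mins) / (maxs - mins)

# Scale to a 1 m cube placed 1.5 m in front of the user, starting at 1 m height.
CUBE_SIZE = 1.0                               # metres
origin = np.array([-0.5, 1.0, -1.5])          # x: left/right, y: up, -z: forward
positions = origin + unit * CUBE_SIZE

# Each row is now an (x, y, z) position, in metres, for one data mark.
print(positions[:3])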
Associating visualization with user needs. How to rationally link visualization to user needs is the main challenge in providing a user-centric awareness experience; its main manifestations are when, where, and how. First, determining when the user will need a piece of information may require user behavioral data (trajectories, clicks, eye movements, etc.) to develop an information-assistance strategy or automatic recommendations with the help of machine learning algorithms. Second, where such a visualization is placed, and whether its presence interferes with the observation of other things, requires empirical analysis of the interaction design. Third, interaction with the visualization can be controlled by the user or mediated by NPCs; for example, intelligent robots can help users perceive the metaverse environment and its visualizations by offering hints, communicating, and interacting with them.
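To make the "when" concrete, a simple starting point is a dwell-time trigger: an object's associated visualization is surfaced once the user's gaze has rested on it beyond a threshold. The following sketch is a hypothetical baseline only; in practice the fixed threshold could be replaced by a model learned from richer behavioral data such as trajectories and clicks.

# Hypothetical baseline for the "when" question: show an object's
# visualization once the user's gaze has dwelled on it long enough.
from typing import Optional

class DwellTrigger:
    def __init__(self, dwell_seconds: float = 1.2) -> None:
        self.dwell_seconds = dwell_seconds
        self._target: Optional[str] = None
        self._elapsed = 0.0

    def update(self, gazed_object: Optional[str], dt: float) -> Optional[str]:
        """Call once per frame; returns an object id when its info should appear."""
        if gazed_object != self._target:
            self._target, self._elapsed = gazed_object, 0.0
            return None
        if self._target is None:
            return None
        self._elapsed += dt
        if self._elapsed >= self.dwell_seconds:
            self._elapsed = float("-inf")     # fire once per continuous dwell
            return self._target
        return None

# Usage with a simulated gaze stream sampled at 30 frames per second.
trigger = DwellTrigger(dwell_seconds=1.0)
for frame in range(60):
    hit = trigger.update("statue_03", dt=1 / 30)
    if hit:
        print(f"show visualization for {hit} at frame {frame}")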
Enhancing collaboration in visualization. One of the advantages of engaging with the metaverse is that it comes with social properties that facilitate collaborative communication. As users perceive the virtual world, their personal insights and observations should be passed on to others. This transmission is sometimes limited to co-located shared observation with voice communication; other methods of interaction and sensory channels should be explored to aid collaboration. It is worth noting that the avatar plays an important role, allowing collaborators to quickly and naturally understand what a user wants to convey, just as in the real world.
7. Conclusions
In this work, we present a framework describing the metaverse from the perspectives of graphics, interaction, and visualization. We first describe the graphical techniques used to construct the visual elements of the metaverse (scenes, NPCs, and avatars) through a technical pipeline organized within a taxonomy. We then propose two taxonomies to summarize the research status of interaction and visualization techniques that can support user interaction with and awareness of such visual elements in the metaverse. Through our framework, scholars in related fields can readily learn how to create interactive, comprehensible visual elements of the metaverse. We investigate the related potential applications of the metaverse in the fields of virtual games, virtual social platforms, virtual medicine, and virtual cities, and envision what the metaverse might look like in the future. Finally, we discuss the research opportunities for future work based on our review. We believe this survey can provide useful insights into the field of visual construction and exploration of the metaverse.
Acknowledgments
This work is supported by Shanghai Municipal Science
and Technology Major Project 2018SHZDZX01 and ZJLab.
This work is also supported by Shanghai Sailing Program No. 21YF1402900 and the Science and Technology Commission of Shanghai Municipality (Grant No. 21ZR1403300).
Author Contributions
Yuheng Zhao: Conceptualization, Formal analysis, Writing - original draft;
Jinjing Jiang: Writing - original draft;
Yi Chen: Writing - review & editing;
Richen Liu: Writing - review & editing;
Yalong Yang: Writing - review & editing;
Xiangyang Xue: Project administration, Resources;
Siming Chen: Conceptualization, Supervision, Writing - review & editing
Ethical Approval
This study does not contain any studies with human or animal subjects performed by any of the authors.