What if we have MetaGPT?
From Content Singularity to
Human-Metaverse Interaction
in AIGC Era
Lik-Hang Lee
Hong Kong Polytechnic University, Hong Kong SAR
Pengyuan Zhou
University of Science and Technology of China, China
Chaoning Zhang
Kyung Hee University, South Korea
Simo Hosio
University of Oulu, Finland
Abstract
The global metaverse development is facing a “cooldown moment”, as academic and industry attention shifted drastically from the Metaverse to AI-Generated Content (AIGC) in 2023. Nonetheless, the current discussion rarely considers the connection between AIGCs and the Metaverse. We can imagine the Metaverse, i.e., immersive cyberspace, as the black void of space, which AIGCs can fill with content while simultaneously serving diverse user needs. As such, this article argues that AIGCs can be a vital technological enabler for the Metaverse. The article first provides a retrospective of the major pitfalls of metaverse applications in 2022. Second, we discuss from a user-centric perspective how metaverse development will accelerate with AIGCs. Next, the article conjectures future scenarios combining the Metaverse and AIGCs. Accordingly, we advocate for an AI-Generated Metaverse (AIGM) framework for energizing the creation of metaverse content in the AIGC era.
1. Retrospect: Experimental Metaverse
We have witnessed a surge of investment and
rigorous discussion regarding the Metaverse since
2021. Many believe a fully realized metaverse
is not far off, so tech firms, e.g., Meta, Niantic,
Roblox, Sandbox, just to name a few, have started
creating their immersive cyberspaces with diver-
sified visions and business agendas. After the
metaverse heat wave in 2022, we all remain as vague as ever about what the Metaverse actually is. At the same time, the hype surrounding the metaverse shows signs of cooling down, primarily due to multiple metrics reflecting persistently low numbers of daily active users, a decreasing volume of projects, and high uncertainty about return on investment.
When the tech giants dipped their toes into the
experimentation pool in 2022, they brought a few
playful tasks to their self-defined virtual venues,
giving users something to do. The fascinating difficulty is that the metaverse is already fundamentally split among the forward-thinking firms establishing their own metaverse realms. With limited time and resources, these firms focused on resolving the technical issues that shape their immersive cyberspaces, for instance, developing efficient infrastructure that supports unlimited numbers of users in the same virtual venue or offering a decentralized transaction ecosystem driven by blockchain technology.
Nonetheless, content development is dele-
gated to third parties and thus goes beyond the
firms’ core concerns. Tech firms commonly leave
content creation to the designers and creators,
having an unattainable hope that designers and
creators can fill up the rest of the metaverse.
As a result, one can argue the current virtual
spaces have become aimless, primarily caused
by the lack of content and, therefore, activi-
ties, while users cannot find good reasons to
spend time at such venues daily. Moreover, the experimental metaverses of 2022 often neglected usability issues, leading to user experiences far from satisfactory. A prominent example is that first-time users struggle to understand the interaction techniques with their avatars in 3D virtual environments. Even worse, after hours of practice, these unskilled users still cannot master such interaction techniques, resulting in poor overall usability. Without addressing the gaps in content and usability, the firms' ambition, namely the mass adoption of the Metaverse, i.e., the immersive cyberspace [1], exceeds what is practically feasible. The core user-centred values needed to make the Metaverse a reality are simply not there yet.
We can briefly look back at the transition from
the static web (Web 1.0) to its interactive coun-
terpart (Web 2.0) in the 2D-UIs era, characterized
by the empowerment of content creation. Among the static webpages of Web 1.0, only a limited number of people with the relevant skills could publish information online, while ordinary users could only read that information and had no means of two-way interaction. Accordingly, Web 2.0, exemplified by social networks (SNS), offers participatory and dynamic methods and empowers two-way user interaction, i.e., reading and writing information in 2D UIs. The critical transition from Web 1.0 to 2.0 is that users, regardless of their technology literacy, can freely contribute content on SNS, such as text and images, and then put the content
online. We must note that we are good at writing
a message on a (soft-)keyboard and taking photos
or videos with cameras. Also, most 2D UIs follow
certain design paradigms, requiring only simple
yet intuitive interactions like clicks, taps, swipes,
drags, etc., to accomplish new content creation.
In contrast, although the metaverse suppos-
edly allows everyone to access many different
virtual worlds, three unprecedented barriers arise.
First, the current users have extensive experience
with 2D UIs but not their 3D counterparts. As the
entire experiences proceed with 3D UIs, the users
in the Metaverse have to deal with unfamiliar
virtual worlds with increasing complexity. More
importantly, 3D objects do not explicitly show
user interaction cues. As the Metaverse claims to be a digital twin of our physical environment [2], a user who encounters a virtual chair will employ analogies between the virtual and physical worlds. A
simple question could be: Can the user’s virtual
hands lift or push the chair? As such, users,
in general, may not be aware of how virtual
objects interact in the Metaverse and thus rely on
educated guesses and trial-and-error approaches.
The above question can be generalized into sub-
questions, including but not limited to: What
are the available interaction techniques? When to
activate the user-object interaction? How does the
user understand the related functions mapped to
the object? How can we manage the user expec-
tation after a particular click? Which visual and
audio effects impact the user’s task performance?
Second, the current interaction techniques al-
low users to manipulate a virtual object, such
as selecting, rotating, translating, etc. Still, user
efforts in object manipulation are a big concern.
Commercial input hardware for headsets (e.g.,
controllers or joysticks) or even hand gestural
inputs are barely sufficient for simple point-and-
select operations on 2D UIs in virtual environ-
ments [3] but largely insufficient for 3D models,
especially those with irregular shapes, causing intolerably long editing times and high dissimilarity with the intended shape [4]. Therefore, users with
the current techniques, primarily point-and-select
or drag-and-drop, can only manipulate objects
with low granularity. However, content creation
involves careful manipulation of a 3D object, i.e.,
Figure 1: AIGCs can prevent us from falling into another ‘Web 1.0’ in the metaverse era, where layman end-users suffer from the missing capability of creating unique content. We are natively skilful at texting and photo-taking on social networks but not at editing 3D content in virtual 3D spaces. AIGCs may serve as a saviour that enables general users to express themselves freely, while owners of the platforms or virtual spaces can still delegate the content creation tasks to their peer users.
modifying the vertex positions in great detail.
Even though users nowadays engage in immersive 3D environments, most can only create 2D text and select some standard 3D objects from an
asset library. The creation of metaverse content is not fully supported by the current authoring tools and the existing techniques for user interaction with the Metaverse. Over the past two decades, the human-computer interaction community has attempted to improve the ease of user interaction in diversified virtual environments.
Nonetheless, usability gaps still exist, resulting
in low efficiency and user frustration [5]. We
see that such gaps will not be overcome if we
purely rely on investigating user behaviours with
alternative interfaces and interaction techniques,
especially since the tasks inside virtual 3D spaces
grow more complicated.
Third, creating large objects, e.g., a dragon floating in mid-air, requires a relatively spacious environment. Users unavoidably encounter a lot
of distal operations between the user position
and the virtual creation. It is worth mentioning
that users are prone to errors during such distal
operations. A prior work [6] provides evidence
that users with headsets achieve lower pointing
accuracy to a distal target. Considering such
complicated operations in content creation, typ-
ical metaverse users cannot immediately create objects beyond those already in the asset library.
In other words, metaverse users have no appro-
priate approaches to unleash the full potential
of creating content in the endless canvas of the
Metaverse. Instead, they hire professionals to
draw and mould virtual instances on traditional
desktops. For virtual space owners, a team of pro-
fessionals, e.g., Unity developers, may spend days
or weeks creating virtual environments. Further
change requests (e.g., adding a new 3D model)
for such environments may take additional hours
or days. Without the time or skills, general users can only experience the content built by virtual
space owners. As shown in Figure 1, this rigid
circumstance is analogous to the ‘read mode’ in
Web 1.0. Creating unique metaverse content has
become highly inconvenient and demanding. We
will likely face the circumstance of ‘Web 1.0’ in
3D virtual worlds, with some features inherited
from Web 2.0, such as making new texts and
uploading photos.
To alleviate the barriers mentioned above, this article argues for using AI-generated content (AIGCs) for both content generation and AI-mediated user interaction in the metaverse. The article envisions that GPT-like models can trigger a content singularity in the Metaverse and assist the interaction between human users and virtual objects.
Before we move on to the main discussion, we
provide some background information regarding
the Metaverse and AIGCs, as follows.
Metaverse: The Metaverse refers to the NEXT Internet, featuring diversified virtual spaces and immersive experiences [1]. Similar
to existing cyberspace, we can regard the Meta-
verse as a gigantic application that simultaneously accommodates countless users of diverse types.
The application comprises computer-mediated
Figure 2: Generating a vessel that fits the context of Victoria Harbour, Hong Kong. As a result, a junk boat appears: original view (left), sketching (middle), and the generated vessel on top of the physical world (right).
worlds under the Extended Reality (XR) spec-
trum and emerging derivatives like Diminished
Reality (DR). Ideally, users will create content
and engage in activities surrounding such content.
Multitudinous underlying technologies serve as
the backbone of the Metaverse, including AI, IoT,
mobile networks, edge and cloud servers, etc.
Among the technologies, we can view AI as the
fuel to support the automation of various tasks
and content creation. Our discussion in this article
goes beyond the well-known applications, in-
cluding creating avatars, virtual buildings, virtual
computer characters and 3D objects, automatic
digital twins, and personalized content presenta-
tion [2].
AI-Generated Content (AIGC): Apart from
the analytical AI focusing on traditional prob-
lems like classification, AIGC can leverage high-
dimensional data, such as text, images, audio,
and video, to generate new content. For instance,
OpenAI announced its conversational agent, ChatGPT [7], whose underlying GPT-3.5 and GPT-4 models support text-only and multimodal (text-and-image) interaction, respectively. Moreover, the generated content can support the generation of metaverse objects, such as speech for in-game agents, 3D objects, artistic artefacts, and background scenes in many virtual worlds. The most popular techniques, including GANs, diffusion models, and transformer architectures, support the challenging context-to-content task. It is important to note that generative AI and AIGC differ subtly [8]: AIGC focuses on content production problems, whereas generative AI refers to the underlying technological underpinnings that facilitate the development of multiple AIGC activities.
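As a minimal illustration of such context-to-content generation, the following sketch, assuming the open-source diffusers library and the publicly released Stable Diffusion weights, turns a text prompt into an image asset that could later be lifted into a virtual scene; it illustrates AIGC in general rather than any specific system discussed in this article.

```python
# Minimal text-to-image sketch of AIGC (assumes the open-source `diffusers`
# library and the publicly released Stable Diffusion weights are installed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A context-to-content request: the prompt describes the desired content.
prompt = "a traditional junk boat sailing in Victoria Harbour at golden hour"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("junk_boat.png")  # the asset could then be textured onto a plane in a 3D scene
```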
2. Content Singularity
The most widely used metaverse applications have appeared in industrial settings over the past two decades [9]. Such firms have the resources to build proprietary systems and prepare content for their domains of interest. Work-related content drives the adoption of AR/VR applications in industrial sectors, as the following two examples show. First, workers at warehouse docks and assembly lines can obtain helpful information (e.g., the next step) through the lens of AR [10]. Second, personnel at elderly care centres can nurture compassion through perspective-taking scenarios in virtual reality (VR) [11]. Content is one of the incentives, and end-users achieve enhanced abilities or knowledge, perhaps resulting in better productivity.
As discussed in the three main barriers of the Retrospect section, users have limited ability and resources to create unique content in the Metaverse. General users can only draw simple yet rough sketches to indicate an object in Extended Reality. Nonetheless, such expressiveness
is insufficient for daily communication or on-site
discussion for specific work tasks. We may expect
the content on AR devices to be no worse than
what we have in Web 2.0. To alleviate the issue,
AIGCs can play an indispensable role in lowering
the barriers and democratizing content creation.
Figure 2 illustrates a potential scenario where
users can effectively create content in virtual-
physical environments. For instance, a user with
an AR device is situated in a tourist spot and
attempts to show the iconic vessels to explain
the cultural heritage of Hong Kong’s Victoria
Harbour. First, the AIGC model can understand
the user’s situation and context through sensors
on the AR device, for instance, depth cameras.
Second, the user can make a quick and dirty sketch to indicate the shape and position of the generated object. In addition, a prompt containing the user's description, e.g., ‘a vessel that fits this view’, is sent to the AIGC model through methods like speech recognition. It is important to note that our speech often involves ‘this’ or ‘that’ to indicate a particular scene or object. The AIGC model can employ the user's situation and context in such a scenario. Finally, a junk boat appears in Victoria Harbour through the lens of AR.
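A minimal sketch of how the Figure 2 scenario could be wired together is shown below; every helper, class, and call (SceneContext, build_prompt, generate_3d_asset) is a hypothetical placeholder for the speech-recognition, scene-understanding, and generative components rather than an existing API.

```python
# Hypothetical pipeline for the Figure 2 scenario: resolve a deictic speech
# prompt ("a vessel fits this view") against the sensed context, then request
# a matching 3D asset. All helpers below are placeholders, not existing APIs.
from dataclasses import dataclass

@dataclass
class SceneContext:
    location: str            # e.g., from GPS or place recognition
    visible_objects: list    # e.g., from the AR device's depth camera and a detector
    sketch_bbox: tuple       # rough mid-air sketch region (normalised x, y, w, h)

def build_prompt(speech: str, ctx: SceneContext) -> str:
    # Replace deictic words ("this view") with an explicit scene description
    # so the generative model receives a self-contained request.
    scene = f"{ctx.location}, containing {', '.join(ctx.visible_objects)}"
    return speech.replace("this view", scene)

def generate_3d_asset(prompt: str, anchor: tuple) -> dict:
    # Placeholder for a text-to-3D generator; a real system would return a mesh
    # and hand it to the AR runtime for anchoring at the sketched region.
    return {"prompt": prompt, "anchor": anchor, "mesh": "<generated mesh>"}

ctx = SceneContext(
    location="Victoria Harbour, Hong Kong",
    visible_objects=["sea", "skyline", "ferry pier"],
    sketch_bbox=(0.4, 0.5, 0.3, 0.2),
)
asset = generate_3d_asset(build_prompt("a vessel fits this view", ctx), ctx.sketch_bbox)
print(asset["prompt"])
```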
Singularity can refer to a point in time or a condition at which something reaches a significant and irreversible milestone, depending on the context of such changes. Also, it is frequently
used in technology and artificial intelligence (AI)
to describe the hypothetical moment when robots
or AI transcend human intellect and become
self-improving or perhaps independent [12]. This
notion is also known as technological singularity
or AI singularity. This becomes a contentious issue for the Metaverse once AIGCs are widely adopted by end users. We believe the proliferation of AI-generated content might have far-reaching consequences for cyberspace. Next, the concept of
content singularity refers to the belief that we
are reaching a time when there will be abundant
virtual material available on the Internet that
people will consume as their daily routine. This
is owing to the demand for immersive cyberspace
and related technological ability, perhaps AIGCs,
to pave the path towards the exponential prolif-
eration of virtual 3D content. This is similar to social networks, in which people both contribute and consume content.
Since the launch of ChatGPT¹, pioneering prototypes shed light on the daily uses of GPT-driven intelligence on AR wearables, such as generating simple 3D content with WebAR (A-Frame) by entering prompts² and providing suggested answers for conversations during dates and job interviews³. These examples go beyond the industrial scenarios, implying that AIGC-driven conversational interfaces can open new opportunities for enriching virtual-physical
¹https://openai.com/blog/chatgpt
²https://www.youtube.com/watch?v=J6bSCVaXoDs&ab_channel=ARMRXR
³https://twitter.com/bryanhpchiang/status/1639830383616487426?cxt=HHwWhMDTtfbC7MEtAAAA
blended environments [13]. Generative AI models
can recognise the user context using the sensors
on mobile devices (e.g., cameras on AR headsets
or smartphones) to generate appropriate objects
according to given prompts. In this decade, gen-
eral users will treat generative AI models as
utilities like water, electricity, and mobile networks. Meanwhile, the metaverse is an endless
container to display AI-generated content so users
can read and interact with the AI utility mid-
air. Users can make speech prompts to generative
AI models to create characters, objects, back-
drop scenes, buildings, and even audio feedback
or speeches in virtual 3D environments. These
content generations should not pose any hurdle
or technical difficulties to the general users. It
will be as simple as posting a new photo on
Instagram, typing a tweet on Twitter, or uploading
a new video on TikTok. The lowered barrier will
encourage people to create content, and more
content consumers will follow, eventually leading
to a metaverse community. In addition, rewarding
schemes should be established when the content
singularity arrives to sustain the content creation
ecosystem. AIs and the data owners behind them become the primary enablers, while users become the principal actors. How the reward should be split among them is still unknown, and the ongoing debates will continue.
Generative AI models are obviously drivers of content generation. But we should not neglect their potential for removing content, primarily physical counterparts, through the lens of XR devices, also known as Diminished Reality (DR). It is important to note that the naive approach of overlaying digital content on top of the physical world may hurt the user experience. A virtual instance may not match the environmental context, and it may be necessary to alter that context to improve perception when the metaverse application strongly relates to daily functions.
We may accept a virtual Pokémon appearing on top of a physical rubbish bin. However, it feels odd when a virtual table overlaps a physical
table being disposed of. Therefore, AIGCs may
serve as a critical step of DR to smoothen the
subsequent addition of digital overlays (AR). In
this sense, the demands of AIGCs will penetrate
throughout the entire process of metaverse con-
tent generation. More importantly, the diminished items should not compromise user safety or violate ethical norms. Hiding a warning sign may put users in danger, and removing a person's clothing may expose inappropriate content, i.e., a naked body. It is essential to reinforce regulation
and compliance when generative AI models are
widely adopted in the content-generation pipeline.
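The DR-then-AR order described above could be prototyped roughly as follows, assuming the open-source diffusers inpainting pipeline; the mask, placement coordinates, and file names are illustrative, and tracking and safety checks are omitted.

```python
# Diminished Reality sketch: inpaint away a masked physical object before
# compositing a digital overlay (assumes the open-source `diffusers` library;
# file names, the mask, and the placement are illustrative).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("camera_frame.png")   # current view through the headset
mask = Image.open("object_mask.png")     # white where the object to remove is

# Step 1 (DR): fill the masked region with plausible background.
clean = pipe(prompt="empty room, plain floor, photorealistic",
             image=frame, mask_image=mask).images[0].convert("RGB")

# Step 2 (AR): composite a generated overlay onto the diminished frame.
overlay = Image.open("generated_table.png").convert("RGBA")
clean.paste(overlay, (320, 200), overlay)  # placement would come from the AR runtime
clean.save("blended_view.png")
```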
On the other hand, content singularity can also
refer to the challenges of information overload in
a virtual-physical blended environment, in which
people are assaulted with so much information
that it is impossible to digest and make sense of
it all [14]. The sheer volume of online informa-
tion, including text, photos, videos, and music,
is already daunting and rapidly increasing. As
such, the virtual-physical blended environment
may cause a lot of disturbance to users if we
neglect such exponential proliferation of 3D con-
tent.
Information or knowledge in the tangible world can indeed be thought of as limitless, whereas augmentation within the relatively limited field of view of headsets is challenging. Consequently,
we must optimise the presentation of digital con-
tent. Typically, metaverse users with a naive ap-
proach to virtual content delivery will experience
information inundation, thereby requiring addi-
tional time to consume the augmentation. Context
awareness, covering the user, the environment, and social dynamics, is a prominent strategy for managing the information display. The AIGCs at the
periphery, with the assistance of recommendation
systems, can interpret user context and provide
the most pertinent augmentation [14].
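A minimal sketch of such periphery-side filtering is given below: candidate augmentations are scored against the user's context and only the top few are shown within the headset's field-of-view budget. The scoring weights and fields are illustrative assumptions, not a published algorithm.

```python
# Context-aware filtering sketch: rank candidate augmentations against the
# current user context and keep only the top-k that fit the headset's FOV.
from dataclasses import dataclass

@dataclass
class Augmentation:
    label: str
    topic: str           # e.g., "navigation", "social", "commerce"
    distance_m: float    # distance of the anchor from the user

def relevance(aug: Augmentation, user_topics: dict, max_distance: float = 20.0) -> float:
    # Illustrative score: topical interest weighted against spatial proximity.
    interest = user_topics.get(aug.topic, 0.0)
    proximity = max(0.0, 1.0 - aug.distance_m / max_distance)
    return 0.7 * interest + 0.3 * proximity

def select_augmentations(candidates, user_topics, budget=3):
    ranked = sorted(candidates, key=lambda a: relevance(a, user_topics), reverse=True)
    return ranked[:budget]   # the FOV "budget" caps simultaneous overlays

candidates = [
    Augmentation("Bus times", "navigation", 5.0),
    Augmentation("Friend nearby", "social", 12.0),
    Augmentation("Shop discount", "commerce", 3.0),
    Augmentation("Ferry pier info", "navigation", 18.0),
]
user_topics = {"navigation": 0.9, "social": 0.6, "commerce": 0.1}
for aug in select_augmentations(candidates, user_topics):
    print(aug.label)
```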
Although we foresee a rise in content volume
when AIGCs are fully engaged as a utility in the
Metaverse, two significant issues should be ad-
dressed. First, content uniqueness raises concerns about the quality and relevance of the material provided. With so much material accessible, users
are finding it increasingly difficult to identify
what they seek and discern between high-quality
and low-quality content. To address the issues of content singularity, further research is needed to create new tools and methodologies that will assist users in filtering,
prioritizing, and personalizing the material they
consume. Current solutions in Web 2.0 include
search engines, recommendation algorithms, and
content curation tools. Yet, the issue of content
singularity remains a complicated and continuing
one that will undoubtedly need further innovation
and adaptation as the volume and diversity of
digital information increase in the Metaverse.
Second, contemporary conversational inter-
faces have long been criticized for lacking trans-
parency as a ‘black box’ [15]. In other words,
conversational AIs do not expose a complete list of their abilities, so general users usually have no clue about what the AI can achieve.
Significantly, users with low AI literacy cannot
quickly master the interaction with GPT-like AI
agents through a conversational interface. Ex-
ploring the perfect fit between the generative AI
models and the XR environment is necessary.
For instance, the AI models can suggest some
potential actions to the users by putting digital
overlays on top of the user’s surroundings. As
such, the user can understand the AI’s ability
and will not make ineffective enquiries or wasted
interactions with the generative AI model. In
addition, more intuitive clues should be prepared,
according to the user context, to inform the user
about ‘what cannot be done’ with a generative AI
model.
3. Human-Metaverse Interaction
Besides generating virtual content, AIGC can
be considered an assistive tool for user interaction
in the metaverse. From other users’ perspectives,
a user’s movements and interaction with virtual
objects can be a part of the content in virtual
worlds. The difficulties of controlling an avatar’s
movements and interacting with virtual objects
can negatively impact an individual’s workload
and the group’s perceptions of a metaverse appli-
cation. For example, a group may have to wait for an individual to finish a task, causing frustration.
Before discussing how prompts should be extended in the Metaverse for easier interaction between users and metaverse instances, we consider some fundamentals of human-computer interaction (HCI) and prompt engineering [16]. Prompts raise different concerns in HCI and NLP. From the HCI perspective, effective prompts are clear, concise, and intuitive: users have to design prompts for an interactive system, and their workload lies in taking specific actions or providing relevant input. Once the user's needs and goals
Figure 3: An example pipeline of content creation and human-metaverse interaction supported by AIGCs: (a) brainstorming with conversational agents (collecting requirements simultaneously); (b) auto-generation of the content; (c) start of manual editing, where huge pointing errors exist; (d) following (c), AI-assisted pointing for selecting a vertex; (e) following (d), AI-assisted vertex editing; (f) manual editing of subtle parts; (g) AI-assigned panel and user interaction on the virtual objects; (h) user reviews of the objects while AIGCs attempt to understand the user perceptions; (i) content sharing, e.g., for educational purposes in a classroom. Photos are extracted and modified from [4] for illustration purposes.
have been identified, the next step is to craft
effective prompts that guide the user towards
achieving those goals. And the AI-generated re-
sults provide users with the information they need
to take action in a particular context. Therefore,
prompt engineering is an essential aspect of de-
signing interactive systems that are easy to use
and achieve high levels of user satisfaction.
Prompt engineering, in NLP and particularly for LLMs, refers to methods of communicating with an LLM to steer its behaviour towards desired outcomes. The traditional chatbot (e.g.,
ChatGPT) considers primarily text prompts. In
contrast, the prompts from the metaverse users
can become more diverse by considering both
the context as discussed above and multiple user
modalities, including gaze, body movements, and
psychological and physiological factors. In ad-
dition, perhaps employing certain personaliza-
tion techniques, prompts should be tested and
refined iteratively to ensure that they effectively
guide LLMs towards the desired output. As such,
metaverse-centric prompt engineering requires a
new understanding of the user’s needs and goals,
as well as their cognitive abilities and limitations.
This information can be gathered through user
testing, A/B testing, user surveys and usability
testing in many virtual worlds.
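The following sketch illustrates one way such a metaverse-ready prompt could be assembled, fusing the spoken request with gaze, posture, and physiological context before it is sent to an LLM; the field names and serialisation format are illustrative assumptions.

```python
# Sketch of a "metaverse-ready" prompt: fuse the spoken request with gaze,
# body, and physiological context before sending it to an LLM. Field names
# and the serialisation format are illustrative assumptions.
import json

def build_metaverse_prompt(speech: str, gaze_target: str, posture: str,
                           heart_rate_bpm: int, scene: str) -> str:
    context = {
        "scene": scene,
        "gaze_target": gaze_target,       # object currently looked at
        "posture": posture,               # e.g., "seated", "walking"
        "arousal": "high" if heart_rate_bpm > 100 else "normal",
    }
    return (
        "You are an assistant embedded in an immersive environment.\n"
        f"User context: {json.dumps(context)}\n"
        f"User request: {speech}\n"
        "Respond with a short action the system should take."
    )

prompt = build_metaverse_prompt(
    speech="make this corner feel calmer",
    gaze_target="bedroom corner with a desk",
    posture="seated",
    heart_rate_bpm=92,
    scene="small bedroom, evening lighting",
)
print(prompt)  # this string would be sent to an LLM endpoint
```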
The prompt design can be extended to the sub-
tle interaction between virtual objects and users.
VR sculpting is a popular application where users
can freely mould their virtual objects in virtual
spaces. A usability issue of VR, inaccurate pointing at vertices, becomes a hurdle [4]. VR sculpting is still far from being the main tool of creativity due to its low efficiency. A hybrid model can be considered: generative AI models like LLMs first generate a 3D content model, and then we customise the model with manual editing in VR. In this sense, an important issue arises: we cannot get rid of manual operations on virtual instances.
In the future, AIGCs should assist human users in virtual tasks that are inherently complex and clumsy under hardware constraints such as a limited field of view (FOV). AIGCs can parse user actions in virtual environments, for instance, limb movements and gaze towards a virtual object, to take over appropriate parts of the manual editing work. As such, AIGCs can serve as assistants for metaverse users. It is important to note that AI-assisted tasks already happen on everyday ubiquitous devices, i.e., smartphones.
A prevalent example of 2D UIs is typing text on
soft keyboards. Users tap on keys repetitively and make typos when adjacent keys are accidentally triggered.
Such an erroneous task can be assisted by auto-
correction. Users can tap the mistyped word and
select the correct spelling from the suggested
words. To achieve this, an AI model learns the
words in the English dictionary and then under-
stands user habits by recording the user’s word
choice.
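As a concrete analogue of this soft-keyboard example, the minimal sketch below ranks correction candidates by spelling similarity and the user's own word frequencies; it illustrates the idea rather than any production autocorrect system.

```python
# Soft-keyboard auto-correction sketch: combine spelling similarity to
# dictionary words with the user's own word frequencies to rank suggestions.
from difflib import get_close_matches
from collections import Counter

DICTIONARY = ["harbour", "habit", "vessel", "virtual", "metaverse", "content"]

# Learned from the user's past choices (their "habits").
user_history = Counter({"harbour": 12, "vessel": 7, "metaverse": 30})

def suggest(typed: str, k: int = 3):
    # Candidates that are close in spelling to the typed (possibly mistyped) word.
    candidates = get_close_matches(typed.lower(), DICTIONARY, n=6, cutoff=0.6)
    # Prefer words the user types often.
    return sorted(candidates, key=lambda w: -user_history[w])[:k]

print(suggest("habour"))   # e.g., ['harbour']
```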
Typing on a soft keyboard is a good example
of an AI-assisted task. In virtual environments,
the interaction tasks, including dragging an object
to a precise position and editing an object of
irregular shapes, can be challenging to the users.
AIGCs open opportunities to help human users
accomplish the task. Nonetheless, typing tasks on soft keyboards are manageable because the dictionary is a reasonable search space. In contrast, AIGC-driven assistance faces a much larger search space. In an editing task, a user may first select a vertex at a rabbit's tail. The next action could be changing the vertex property and then moving to another vertex, which could be on the head, the bottom, etc. With current technology, predicting the user's next action with high accuracy is very unlikely.
However, if available, AIGCs may leverage prior users' behaviours from a dataset of user interaction footprints and accordingly recommend several ‘next’ edits to facilitate the process.
Eventually, the user can choose one of them and
accomplish the task without huge burdens.
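The following minimal sketch illustrates such footprint-based assistance with a simple first-order (Markov-style) model over prior users' edit sequences; the mesh regions and logged sessions are illustrative assumptions.

```python
# Next-edit recommendation sketch: a first-order model over prior users'
# edit sequences suggests which region of the mesh to edit next. The regions
# and logged sessions are illustrative assumptions.
from collections import Counter, defaultdict

# Logged sessions from prior users: sequences of edited mesh regions.
sessions = [
    ["tail", "tail", "back", "head", "ear"],
    ["tail", "back", "back", "head"],
    ["head", "ear", "ear", "tail"],
]

# Count region-to-region transitions (a simple Markov chain).
transitions = defaultdict(Counter)
for seq in sessions:
    for current, nxt in zip(seq, seq[1:]):
        transitions[current][nxt] += 1

def recommend_next(current_region: str, k: int = 2):
    # Offer the k most common follow-up regions; the user picks one or ignores them.
    return [region for region, _ in transitions[current_region].most_common(k)]

print(recommend_next("tail"))   # e.g., ['back', 'tail']
```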
In a broader sense, diversified items exist
in many virtual worlds, and a virtual item can
have many possible relationships with another. As
such, the user interaction with AIGCs’ predictions
becomes complicated. For instance, a user may pick up an apple and then lift a tool to cut it. Other possible actions include putting down the apple,
grabbing an orange, etc. It is also important
to note that building an ontology for unlimited
items in the Metaverse is nearly impossible. One
potential tactic is to leverage the user’s in-situ
actions. Generative AI models can read the user’s
head and hand movements to predict the user’s
interested regions and, thus, upcoming activities.
Ideally, a user may give a rough pointing location
to a particular item. Then, Generative AI models
can make personalized and in-situ suggestions
for the user’s subsequent interactions with virtual
objects, with sufficient visualization to ensure
intuitiveness. We believe that the above examples
are only the tip of the iceberg but sufficient to
illustrate the necessity of re-engineering the ways
of making metaverse-ready prompts for Genera-
tive AI models.
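One way such in-situ suggestion could work is sketched below: the user's region of interest is inferred from the gaze ray and hand position, and plausible interactions for the closest item are surfaced. Geometry, items, and affordances are illustrative assumptions.

```python
# In-situ suggestion sketch: infer the user's region of interest from head
# (gaze) and hand positions, then surface plausible interactions for the
# closest item. Geometry, items, and affordances are illustrative assumptions.
import math

items = {
    "apple":  {"pos": (0.4, 1.0, 1.2), "actions": ["pick up", "cut", "put down"]},
    "orange": {"pos": (0.9, 1.0, 1.5), "actions": ["pick up", "peel"]},
    "knife":  {"pos": (0.5, 1.0, 1.1), "actions": ["grab", "cut with"]},
}

def angle_to(gaze_dir, origin, target):
    # Angle between the gaze direction and the ray from the head to the target.
    v = tuple(t - o for t, o in zip(target, origin))
    dot = sum(a * b for a, b in zip(gaze_dir, v))
    norm = math.sqrt(sum(a * a for a in gaze_dir)) * math.sqrt(sum(a * a for a in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def suggest_interactions(head_pos, gaze_dir, hand_pos):
    def score(item):
        ang = angle_to(gaze_dir, head_pos, item["pos"])   # is the user looking at it?
        dist = math.dist(hand_pos, item["pos"])            # is the user reaching for it?
        return ang + 0.5 * dist
    name, item = min(items.items(), key=lambda kv: score(kv[1]))
    return name, item["actions"]

print(suggest_interactions(head_pos=(0.0, 1.6, 0.0),
                           gaze_dir=(0.3, -0.4, 0.9),
                           hand_pos=(0.45, 1.1, 1.0)))
```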
Then, there is the issue of how natural people
will feel in the metaverse environments built,
or in some cases hallucinated, with AIGCs. Ur-
ban designers and architects are now looking
into what factors of our ordinary environments
matter most when attempting to translate the
environments into digital ones, beyond the 3D
replication. Here, issues such as subjective pres-
ence (comfort, feeling, safety, senses) or active
involvement (activities taking place, other peo-
ple’s presence), in addition to the traditionally
considered structural aspects (colour, furnishing,
scale, textures), will play a pivotal role in how the
metaverse experience will feel like for its users
(see, e.g., [17]). The questions to solve will therefore include to what degree we want generative AI to spawn experiences that feel safe, or whether the spaces should more closely reflect the world as we know it outside the metaverse, where even adjacent spaces carry very different perceived human characteristics.
The technical capability of AIGCs merely opens up a landscape of generating metaverse content, whether by adding backdrops (AR) or removing objects that cause strong emotions (DR). But we know very little about the user aspects once AIGCs are scaled up. As the metaverse moves beyond purely digital interfaces, i.e., 2D UIs, AIGC can be embedded in the physical world and alter the user's situated environment to fulfil the user's
Figure 4: AIGM framework showing the relationship between human users, AIGCs and virtual-physical
cyberspace (i.e., the Metaverse).
subjective presence that can be abstract. It can
vary greatly due to the user’s beliefs (norms,
customs, ego, and so on) and their environment.
A machine may not truly interpret the meaning of ‘calm’, especially when multiple subjective presences underlie it, e.g., ‘safe and calm’. Suppose a user makes a simple prompt of ‘calm’ to an AIGC model; the results may be unsatisfactory because the user has not made an effective prompt, for example, by adding words like ‘meditation, wellness and sleep’ when inside a bedroom. It is worth noting that users with headsets
may expect quick and accurate feedback, instead
of requesting the generative AI models to revise
the content with multiple iterations. In addition,
subjective presence is not limited to a single user. Multiple users will interact with metaverse
content in a shared space, potentially causing co-
perception and communication issues. Generating the right content at the right time is thus a challenge that goes beyond purely technical aspects.
AIGC in the Metaverse will lead to a novel niche
of understanding the dynamics among metaverse
content, physical space, and users.
4. Towards AIGM Framework
We argue that AIGM is a must if we aim to unleash all of the latent potential in the metaverse concept. That is, regardless of who the leading developer is, the metaverse must be built for humans, and as humans, everything we do is embodied in the space around us [17]. The leading developers do not have the authority to dictate what content we should have on the Next Internet, as we saw in the Metaverse of 2022, in which the virtual spaces were office-like environments. We already spend eight working hours in the physical office, and it is insane to spend another eight hours in the virtual office. Ironically, except for the standard items given in asset libraries, we don't even have the right to decorate such office spaces with our unique creations. The popular trends in the Metaverse are ultimately the users' call. Google Image searches conducted since Q3 2021 make it evident that creators have consistently defined the metaverse with blue, dark, and purple colours. We believe the trend of popular content is ever-changing.
Driven by the vital role of AIGCs in democ-
ratizing content creation, everyone in the Meta-
verse can decide, (co-)create, and promote their
unique content. To scale up the use of AIGCs,
we propose a framework for an AI-Generated
Metaverse (AIGM) that depicts the relationships
among AIGCs, virtual-physical blended worlds,
and human users (see Figure 4). AIGC is the fuel
to spark the content singularity, and Metaverse
content is expected to surround everyone like
the atmosphere. This creates an entire creation
pipeline in which AIGCs are the key actors. First,
the users can talk to generative AI models to
obtain inspiration during human-AI conversations
(Human-AI collaboration). Consequently, gener-
ative AI models provide the very first edition
of the generated content (AI-Generation). AIGC then supports subtle editing during content creation (AI-Assistance). Some precise details can be done
manually (Human users); if necessary, multiple
users can be involved in the task (Multi-user
collaboration). In addition, it is important to note that AIGCs can assign properties that define how users and virtual instances will interact, e.g., through a tap on a panel, and accordingly, AIGC-driven evaluations will be performed to understand user performance and cognitive load [18].
Eventually, content sharing and the corresponding
user interaction can be backed by AIGCs.
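The creation pipeline above could be organised as a simple sequence of stages, as in the sketch below; the stage names follow the text, while every implementation body is a placeholder.

```python
# Sketch of the AIGM creation pipeline as a sequence of stages. Stage names
# follow the text above; every implementation body is a placeholder.
from typing import Callable, Dict, List

def human_ai_collaboration(state: Dict) -> Dict:
    state["requirements"] = "a calm bedroom corner for meditation"   # from brainstorming chat
    return state

def ai_generation(state: Dict) -> Dict:
    state["draft_content"] = f"<3D scene generated for: {state['requirements']}>"
    return state

def ai_assistance(state: Dict) -> Dict:
    state["edits"] = ["AI-assisted vertex selection", "AI-assisted vertex editing"]
    return state

def manual_and_multiuser_editing(state: Dict) -> Dict:
    state["edits"] += ["manual fine detail", "co-editing by a second user"]
    return state

def evaluation_and_sharing(state: Dict) -> Dict:
    state["interaction_panel"] = "tap-to-inspect"      # AIGC-assigned interaction property
    state["shared_with"] = ["classroom group"]
    return state

PIPELINE: List[Callable[[Dict], Dict]] = [
    human_ai_collaboration, ai_generation, ai_assistance,
    manual_and_multiuser_editing, evaluation_and_sharing,
]

state: Dict = {}
for stage in PIPELINE:
    state = stage(state)
print(state)
```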
5. Concluding Remarks
During a deceleration of global metaverse
development, the authors contend that AIGCs can be a critical facilitator for the Metaverse. This
article shares some perspectives and visions of
when AIGCs meet the Metaverse. Our discussion started with a look back at the key flaws of metaverse applications in 2022, highlighting the fundamental difficulties the metaverse encountered. Accordingly, we examine how AIGCs
will speed up metaverse development from a user
standpoint. The article eventually speculates on
future possibilities that combine the Metaverse
with AIGCs. We call for a conceptual framework
of AIGM that facilitates content singularity and
human-metaverse interaction in the AIGC era. We
also hope to provide a more expansive discussion
within the HCI and AI communities.
REFERENCES
1. L.-H. Lee, P. Zhou, T. Braud, and P. Hui, “What is
the metaverse? an immersive cyberspace and open
challenges, ArXiv, vol. abs/2206.03018, 2022.
2. L.-H. Lee, T. Braud, P. Zhou, L. Wang, D. Xu, Z. Lin,
A. Kumar, C. Bermejo, and P. Hui, “All one needs to
know about metaverse: A complete survey on tech-
nological singularity, virtual ecosystem, and research
agenda, 2021.
3. L. H. Lee, T. Braud, F. H. Bijarbooneh, and P. Hui,
“Ubipoint: Towards non-intrusive mid-air interaction for
hardware constrained smart glasses, in Proceedings
of the 11th ACM Multimedia Systems Conference,
ser. MMSys ’20. New York, NY, USA: Association
for Computing Machinery, 2020, p. 190–201. [Online].
Available: https://doi.org/10.1145/3339825.3391870
4. K. Y. Lam, L.-H. Lee, and P. Hui, “3deformr: Freehand
3d model editing in virtual environments considering
head movements on mobile headsets, in Proceedings
of the 13th ACM Multimedia Systems Conference,
ser. MMSys ’22. New York, NY, USA: Association
for Computing Machinery, 2022, p. 52–61. [Online].
Available: https://doi.org/10.1145/3524273.3528180
5. L.-H. Lee, T. Braud, S. Hosio, and P. Hui,
“Towards augmented reality driven human-city
interaction: Current research on mobile headsets
and future challenges, ACM Comput. Surv.,
vol. 54, no. 8, oct 2021. [Online]. Available:
https://doi.org/10.1145/3467963
6. A. U. Batmaz, M. D. B. Machuca, D.-M. Pham, and
W. Stuerzlinger, “Do head-mounted display stereo de-
ficiencies affect 3d pointing tasks in ar and vr?” 2019
IEEE Conference on Virtual Reality and 3D User Inter-
faces (VR), pp. 585–592, 2019.
7. C. Zhang, C. Zhang, C. Li, S. Zheng, Y. Qiao, S. K. Dam,
M. Zhang, J. U. Kim, S. T. Kim, G.-M. Park, J. Choi, S.-
H. Bae, L.-H. Lee, P. Hui, I. S. Kweon, and C. S. Hong,
“One small step for generative ai, one giant leap for agi:
A complete survey on chatgpt in aigc era, researchgate
DOI:10.13140/RG.2.2.24789.70883, 2023.
8. C. Zhang, C. Zhang, S. Zheng, Y. Qiao, C. Li, M. Zhang,
S. K. Dam, C. M. Thwal, Y. L. Tun, L. L. Huy, D. kim, S.-
H. Bae, L.-H. Lee, Y. Yang, H. T. Shen, I.-S. Kweon, and
C.-S. Hong, “A complete survey on generative ai (aigc):
Is chatgpt from gpt-4 to gpt-5 all you need?” ArXiv, vol.
abs/2303.11717, 2023.
9. S. Büttner, M. Prilla, and C. Röcker, “Augmented reality
training for industrial assembly work - are projection-
based ar assistive systems an appropriate tool for
assembly training?” in Proceedings of the 2020 CHI
Conference on Human Factors in Computing Systems,
ser. CHI ’20. New York, NY, USA: Association
for Computing Machinery, 2020, p. 1–12. [Online].
Available: https://doi.org/10.1145/3313831.3376720
10. A. C. C. Reyes, N. P. A. Del Gallego, and J. A. P.
Deja, “Mixed reality guidance system for motherboard
assembly using tangible augmented reality,” in
Proceedings of the 2020 4th International Conference
on Virtual and Augmented Reality Simulations, ser.
ICVARS 2020. New York, NY, USA: Association for
Computing Machinery, 2020, p. 1–6. [Online]. Available:
https://doi.org/10.1145/3385378.3385379
11. V. Paananen, M. S. Kiarostami, L.-H. Lee, T. Braud,
and S. J. Hosio, “From digital media to empathic reality:
A systematic review of empathy research in extended
reality environments, ArXiv, vol. abs/2203.01375, 2022.
12. T. J. Prescott, “The ai singularity and runaway human
intelligence, in Living Machines, 2013.
13. P. Zhou, “Unleasing chatgpt on the metaverse: Savior or
destroyer?” arXiv preprint arXiv:2303.13856, 2023.
14. K. Y. Lam, L. H. Lee, and P. Hui, “A2w: Context-
aware recommendation system for mobile augmented
reality web browser, in Proceedings of the 29th
ACM International Conference on Multimedia, ser. MM
’21. New York, NY, USA: Association for Computing
Machinery, 2021, p. 2447–2455. [Online]. Available:
https://doi.org/10.1145/3474085.3475413
15. A. B. Arrieta, N. D. Rodríguez, J. D. Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-Lopez, D. Molina,
R. Benjamins, R. Chatila, and F. Herrera, “Explain-
able artificial intelligence (xai): Concepts, taxonomies,
opportunities and challenges toward responsible ai,
ArXiv, vol. abs/1910.10045, 2019.
16. V. Liu and L. B. Chilton, “Design guidelines for prompt engineering text-to-image generative models,” in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022.
17. V. Paananen, J. Oppenlaender, J. Goncalves, D. Hetti-
achchi, and S. Hosio, “Investigating human scale spa-
tial experience, Proceedings of the ACM on Human-
Computer Interaction, vol. 5, no. ISS, pp. 1–18, 2021.
18. Y. Hu, M. L. Yuan, K. Xian, D. S. Elvitigala, and A. J.
Quigley, “Exploring the design space of employing ai-
generated content for augmented reality display,” 2023.
Throughout the past decade, numerous interaction techniques have been designed for mobile and wearable devices. Among these devices, smartglasses mostly rely on hardware interfaces such as touch-pad and buttons, which are often cumbersome and counter-intuitive to use. Furthermore, smartglasses feature cheap and low-power hardware preventing the use of advanced pointing techniques. To overcome these issues, we introduce UbiPoint, a freehand mid-air interaction technique. UbiPoint uses the monocular camera embedded in smartglasses to detect the user's hand without relying on gloves, markers, or sensors, enabling intuitive and non-intrusive interaction. We introduce a computationally fast and lightweight algorithm for fingertip detection, which is especially suited for the limited hardware specifications and the short battery life of smartglasses. UbiPoint processes pictures at a rate of 20 frames per second with high detection accuracy-no more than 6 pixels deviation. Our evaluation shows that UbiPoint, as a mid-air non-intrusive interface, delivers a better experience for users and smart glasses interactions, with users completing typical tasks 1.82 times faster than when using the original hardware.