What if we have MetaGPT?
From Content Singularity to
Human-Metaverse Interaction
in AIGC Era
Lik-Hang Lee
Hong Kong Polytechnic University, Hong Kong SAR
Pengyuan Zhou
University of Science and Technology of China, China
Chaoning Zhang
Kyung Hee University, South Korea
Simo Hosio
University of Oulu, Finland
Abstract—
Global metaverse development is facing a “cooldown moment”, while academic and industry attention moved drastically from the Metaverse to AI-Generated Content (AIGC) in 2023. Nonetheless, the current discussion rarely considers the connection between AIGCs and the Metaverse. We can picture the Metaverse, i.e., immersive cyberspace, as the black void of space, which AIGCs can fill with content while simultaneously facilitating diverse user needs. As such, this article argues that AIGCs can be a vital technological enabler for the Metaverse. The article first provides a retrospective on the major pitfalls of metaverse applications in 2022. Second, we discuss from a user-centric perspective how metaverse development will accelerate with AIGCs. Next, the article conjectures future scenarios that combine the Metaverse and AIGCs. Accordingly, we advocate for an AI-Generated Metaverse (AIGM) framework for energizing the creation of metaverse content in the AIGC era.
1. Retrospect: Experimental Metaverse
We have witnessed a surge of investment and rigorous discussion regarding the Metaverse since 2021. Many believe a fully realized metaverse is not far off, so tech firms such as Meta, Niantic, Roblox, and Sandbox, just to name a few, have started creating their own immersive cyberspaces with diversified visions and business agendas. After the metaverse heat wave in 2022, all of us are still vague about what the Metaverse actually is. At the same time, the hype surrounding the metaverse shows signs of slowing down, primarily due to multiple metrics reflecting consistently low numbers of daily active users, a decreasing volume of projects, and high uncertainty about return on investment.
When the tech giants dipped their toes into the experimentation pool in 2022, they brought a few playful tasks to their self-defined virtual venues, giving users something to do. The fascinating
difficulty is that the metaverse is already fundamentally split among the forward-thinking firms establishing their own metaverse realms. With limited time and resources, these firms focused on resolving the technical issues that shape their immersive cyberspaces, for instance, developing efficient infrastructure that supports unlimited numbers of users in the same virtual venue or offering a decentralized transaction ecosystem driven by blockchain technology.
Nonetheless, content development is delegated to third parties and thus falls outside the firms’ core concerns. Tech firms commonly leave content creation to designers and creators, in the unattainable hope that they will fill up the rest of the metaverse. As a result, one can argue that the current virtual spaces have become aimless, primarily due to the lack of content and, therefore, activities, so users cannot find good reasons to spend time at such venues daily. Moreover, the experimental metaverses of 2022 often neglect usability issues, leading to user experiences far from satisfactory. A prominent example is that first-time users struggle to understand the interaction techniques with their avatars in 3D virtual environments. Even worse, after hours of practice, these unskillful users still cannot master such interaction techniques, resulting in poor usability overall. Without addressing the gaps in content and usability, the firms’ ambition, namely mass adoption of the Metaverse, i.e., the immersive cyberspace [1], exceeds what is practically feasible. The user-centred core values needed to make the Metaverse a reality are not there yet.
We can briefly look back at the transition from the static web (Web 1.0) to its interactive counterpart (Web 2.0) in the 2D-UI era, characterized by the empowerment of content creation. Among the static webpages of Web 1.0, only a limited number of people with the relevant skills could publish information online, while users could only read the information and had no means of two-way interaction. Accordingly, Web 2.0, exemplified by social networking services (SNS), offers participatory and dynamic methods and empowers two-way user interaction, i.e., reading and writing information in 2D UIs. The critical transition from Web 1.0 to 2.0 is that users, regardless of their technology literacy, can freely contribute content on SNS, such as text and images, and then put that content online. We must note that we are good at writing messages on a (soft) keyboard and taking photos or videos with cameras. Also, most 2D UIs follow certain design paradigms, requiring only simple yet intuitive interactions like clicks, taps, swipes, drags, etc., to accomplish new content creation.
In contrast, although the metaverse supposedly allows everyone to access many different virtual worlds, three unprecedented barriers arise. First, current users have extensive experience with 2D UIs but not with their 3D counterparts. As the entire experience proceeds through 3D UIs, users in the Metaverse have to deal with unfamiliar virtual worlds of increasing complexity. More importantly, 3D objects do not explicitly show user interaction cues. Since the Metaverse claims to offer digital twins of our physical environment [2], a user encountering a virtual chair naturally draws analogies between the virtual and physical worlds. A simple question could be: can the user’s virtual hands lift or push the chair? As such, users in general may not be aware of how virtual objects can be interacted with in the Metaverse and thus rely on educated guesses and trial-and-error approaches. The above question can be generalized into sub-questions, including but not limited to: What are the available interaction techniques? When is user-object interaction activated? How does the user understand the functions mapped to the object? How can we manage the user’s expectation after a particular click? Which visual and audio effects impact the user’s task performance?
Second, the current interaction techniques allow users to manipulate a virtual object by selecting, rotating, translating it, and so on. Still, the user effort required for object manipulation is a big concern. Commercial input hardware for headsets (e.g., controllers or joysticks), and even hand-gesture input, is barely sufficient for simple point-and-select operations on 2D UIs in virtual environments [3] and largely insufficient for 3D models, especially those with irregular shapes, which lead to intolerably long editing times and high dissimilarity between the result and the intended shape [4]. Therefore, with the current techniques, primarily point-and-select or drag-and-drop, users can only manipulate objects with low granularity. However, content creation involves careful manipulation of a 3D object, i.e., modifying vertex positions in great detail.

Figure 1: AIGCs can prevent us from falling into another ‘Web 1.0’ in the metaverse era, where layman end-users suffer from the missing capability of creating unique content. We are natively skilful at texting and photo-taking on social networks but not at editing 3D content in virtual 3D spaces. AIGCs may serve as a saviour that enables general users to express themselves freely, while owners of the platforms or virtual spaces can still delegate content creation tasks to peer users.
Even though users nowadays engage in immersive 3D environments, most can only create 2D text and select standard 3D objects from an asset library. The creation of metaverse content is not fully supported by current authoring tools and the existing techniques for user interaction with the Metaverse. In the past two decades, the human-computer interaction community has attempted to improve the ease of user interaction in diversified virtual environments. Nonetheless, usability gaps still exist, resulting in low efficiency and user frustration [5]. We see that such gaps will not be overcome by purely investigating user behaviours with alternative interfaces and interaction techniques, especially as the tasks inside virtual 3D spaces grow more complicated.
Third, creating large objects, e.g., a dragon floating in mid-air, requires a relatively large spatial environment. Users unavoidably perform many distal operations between their position and the virtual creation, and it is worth mentioning that users are prone to errors during such distal operations. A prior work [6] provides evidence that users with headsets achieve lower pointing accuracy on distal targets. Considering such complicated operations in content creation, typical metaverse users cannot readily create objects other than those already in the asset library. In other words, metaverse users have no appropriate means of unleashing the full potential of content creation in the endless canvas of the Metaverse. Instead, they hire professionals to draw and mould virtual instances on traditional desktops. For virtual space owners, a team of professionals, e.g., Unity developers, may spend days or weeks creating virtual environments. Further change requests (e.g., adding a new 3D model) for such environments may take additional hours or days. Lacking the time or skills, general users can only experience the content built by virtual space owners. As shown in Figure 1, this rigid circumstance is analogous to the ‘read mode’ of Web 1.0. Creating unique metaverse content has become highly inconvenient and demanding. We will likely face a ‘Web 1.0’ circumstance in 3D virtual worlds, with some features inherited from Web 2.0, such as posting new text and uploading photos.
To alleviate the barriers mentioned above, this article argues for using AI-Generated Content (AIGC) both for content generation in the Metaverse and for AI-mediated user interaction within it. Our vision is that GPT-like models can trigger a content singularity in the Metaverse and assist the interaction between human users and virtual objects. Before we move on to the main discussion, we provide some background information regarding the Metaverse and AIGC, as follows.
Metaverse: The Metaverse refers to the next Internet, featuring diversified virtual spaces and immersive experiences [1]. Similar to existing cyberspace, we can regard the Metaverse as a gigantic application that simultaneously accommodates countless users of diverse types.
The application comprises computer-mediated worlds under the Extended Reality (XR) spectrum and emerging derivatives like Diminished Reality (DR). Ideally, users will create content and engage in activities surrounding such content. Multitudinous underlying technologies serve as the backbone of the Metaverse, including AI, IoT, mobile networks, edge and cloud servers, etc. Among these technologies, we can view AI as the fuel that supports the automation of various tasks and content creation. Our discussion in this article goes beyond the well-known applications, including creating avatars, virtual buildings, virtual computer characters and 3D objects, automatic digital twins, and personalized content presentation [2].

Figure 2: Generating a vessel that fits the context of Victoria Harbour, Hong Kong. As a result, a junk boat appears: original view (left), sketching (middle), and the generated vessel on top of the physical world (right).
AI-Generated Content (AIGC): Apart from analytical AI focusing on traditional problems like classification, AIGC leverages high-dimensional data, such as text, images, audio, and video, to generate new content. For instance, OpenAI released its conversational agent, ChatGPT [7], whose latest underlying models, GPT-3.5 and GPT-4, can generate text, with GPT-4 additionally accepting image input. Moreover, the generated content can support the creation of metaverse objects, such as speech for in-game agents, 3D objects, artistic artefacts, and background scenes in many virtual worlds. The most popular techniques, including GANs, diffusion models, and transformer architectures, support the challenging context-to-content task. It is important to note that generative AI and AIGC differ subtly [8]. AIGC focuses on the content-production problem, whereas generative AI refers to the underlying technologies that enable the development of multiple AIGC applications.
2. Content Singularity
The most widely used metaverse applications have appeared in industrial settings over the past two decades [9]. Firms have the resources to build proprietary systems and prepare content for their domains of interest. Work content drives the adoption of AR/VR applications in industrial sectors, as the following two examples show. First, workers at warehouse docks and assembly lines can obtain helpful information (e.g., the next step) through the lens of AR [10]. Second, personnel at elderly care centres can nurture compassion through perspective-taking scenarios in virtual reality (VR) [11]. Content is one of the incentives, and end-users gain enhanced abilities or knowledge, perhaps resulting in better productivity.
As discussed under the three main barriers in the Retrospect section, users have limited ability and resources to create unique content in the Metaverse. General users can only draw simple, rough sketches to indicate an object in Extended Reality. Nonetheless, such expressiveness is insufficient for daily communication or on-site discussion of specific work tasks. We may expect the content on AR devices to be no worse than what we have in Web 2.0. To alleviate the issue, AIGCs can play an indispensable role in lowering the barriers and democratizing content creation.
Figure 2 illustrates a potential scenario in which users can effectively create content in virtual-physical environments. For instance, a user with an AR device is situated at a tourist spot and attempts to show the iconic vessels that explain the cultural heritage of Hong Kong’s Victoria Harbour. First, the AIGC model can understand the user’s situation and context through sensors on the AR device, for instance, depth cameras. Second, the user can make a quick-and-dirty sketch to indicate the shape and position of the object to be generated. In addition, the prompt containing the user’s description, e.g., ‘a vessel fits this view’, is sent to the AIGC model through methods like speech recognition. It is important to note that our speech often involves deictic words like ‘this’ or ‘that’ to indicate a particular scene or object; the AIGC model can resolve them using the user’s situation and context. Finally, a junk boat appears in Victoria Harbour through the lens of AR.
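To make this pipeline concrete, the following is a minimal sketch of how an AR client might bundle the sensed context, the rough sketch, and the speech prompt into a single request for a generative model. All names (SceneContext, build_prompt) and fields are hypothetical illustrations, not an existing AIGC API.

```python
# A minimal sketch of assembling a context-aware prompt on an AR client.
# All names and fields are hypothetical; they do not refer to a real API.
from dataclasses import dataclass, field

@dataclass
class SceneContext:
    location: str                 # e.g., resolved from GPS or visual place recognition
    depth_map_summary: str        # coarse geometry from the headset's depth camera
    anchor_points: list = field(default_factory=list)  # user sketch strokes in world space

def build_prompt(speech: str, ctx: SceneContext) -> dict:
    """Resolve deictic words ('this view') against the sensed scene context."""
    resolved = speech.replace("this view", f"the view of {ctx.location}")
    return {
        "text": resolved,
        "geometry_hint": ctx.depth_map_summary,
        "placement": ctx.anchor_points,   # where the sketch says the object should sit
    }

# Usage: the user sketches a hull outline and says "a vessel fits this view".
ctx = SceneContext(
    location="Victoria Harbour, Hong Kong",
    depth_map_summary="open water plane, skyline in background",
    anchor_points=[(0.2, 0.0, 5.0), (1.4, 0.3, 5.2)],
)
prompt = build_prompt("a vessel fits this view", ctx)
print(prompt["text"])  # -> "a vessel fits the view of Victoria Harbour, Hong Kong"
```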
Singularity can refer to a point in time, or a condition, at which something undergoes a significant and irreversible change, depending on the context. The term is frequently used in technology and artificial intelligence (AI) to describe the hypothetical moment when robots or AI transcend human intellect and become self-improving or perhaps independent [12]; this notion is also known as the technological singularity or AI singularity. How a comparable moment plays out in the Metaverse once AIGCs are widely adopted by end users is a contentious issue, and we believe the proliferation of AI-generated content might have far-reaching consequences for cyberspace. In this article, the concept of content singularity refers to the belief that we are reaching a time when abundant virtual material will be available on the Internet and people will consume it as part of their daily routine. This is owing to the demand for immersive cyberspace and the related technological ability, perhaps AIGCs, to pave the path towards the exponential proliferation of virtual 3D content. It resembles social networks, in which people both contribute and consume content.
Since the launch of ChatGPT^1, pioneering prototypes shed light on the daily uses of GPT-driven intelligence on AR wearables, such as generating simple 3D content in WebAR (A-Frame) by entering prompts^2 and providing suggested answers for conversations during dates and job interviews^3. These examples go beyond industrial scenarios, implying that AIGC-driven conversational interfaces can open new opportunities for enriching virtual-physical blended environments [13]. Generative AI models can recognise the user context using the sensors on mobile devices (e.g., cameras on AR headsets or smartphones) and generate appropriate objects according to given prompts. In this decade, general users will treat generative AI models as utilities like water, electricity, and mobile networks. Meanwhile, the metaverse is an endless container for displaying AI-generated content, so users can read and interact with the AI utility in mid-air. Users can make speech prompts to generative AI models to create characters, objects, backdrop scenes, buildings, and even audio feedback or speech in virtual 3D environments. Such content generation should not pose any hurdle or technical difficulty to general users. It will be as simple as posting a new photo on Instagram, typing a tweet on Twitter, or uploading a new video on TikTok. The lowered barrier will encourage people to create content, and more content consumers will follow, eventually leading to a metaverse community. In addition, reward schemes should be established when the content singularity arrives to sustain the content creation ecosystem. AIs, the data owners behind them, and the users will become the primary enablers and principal actors. How the reward should be split among them is still unknown and will remain the subject of ongoing debate.

^1 https://openai.com/blog/chatgpt
^2 https://www.youtube.com/watch?v=J6bSCVaXoDs&ab_channel=ARMRXR
^3 https://twitter.com/bryanhpchiang/status/1639830383616487426?cxt=HHwWhMDTtfbC7MEtAAAA
Generative AI models are obviously drivers of content generation, but we should not neglect their potential for removing content, primarily physical counterparts, through the lens of XR devices, a capability also known as Diminished Reality (DR). It is important to note that naively overlaying digital content on top of the physical world may hurt the user experience. A virtual instance may not match the environmental context, and it may be necessary to change the context to provide better perceptions when the metaverse application strongly relates to daily functions. We may accept a virtual Pokémon appearing on top of a physical rubbish bin, but it feels odd when a virtual table overlaps a physical table that is being disposed of. Therefore, AIGCs may serve as a critical step of DR that smooths the subsequent addition of digital overlays (AR). In this sense, the demand for AIGCs will penetrate the entire process of metaverse content generation. More importantly, the diminished items must respect the user’s safety and ethical constraints. Hiding a warning sign may put the user in danger, and removing a person’s clothes may expose inappropriate content, i.e., a naked body. It is essential to reinforce regulation and compliance when generative AI models are widely adopted in the content-generation pipeline.
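As a minimal illustration of such a compliance step, the sketch below checks a DR removal request against hypothetical safety and privacy categories before an object is diminished. The category lists and policy are assumptions for illustration, not an established standard.

```python
# A minimal sketch of a compliance check applied before Diminished Reality (DR)
# removes a physical object from the user's view. Categories are illustrative.
SAFETY_CRITICAL = {"warning_sign", "fire_exit", "traffic_light", "staircase_edge"}
PRIVACY_SENSITIVE = {"person", "license_plate", "screen_with_text"}

def may_diminish(object_label: str) -> bool:
    """Reject removal requests that could endanger the user or violate ethics."""
    if object_label in SAFETY_CRITICAL:
        return False          # hiding these may put the user in danger
    if object_label in PRIVACY_SENSITIVE:
        return False          # removing these may expose inappropriate content
    return True

for label in ["rubbish_bin", "warning_sign"]:
    print(label, "->", "can be diminished" if may_diminish(label) else "blocked")
```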
On the other hand, content singularity can also refer to the challenge of information overload in a virtual-physical blended environment, in which people are bombarded with so much information that it is impossible to digest and make sense of it all [14]. The sheer volume of online information, including text, photos, videos, and music, is already daunting and rapidly increasing. As such, the virtual-physical blended environment may cause considerable disturbance to users if we neglect this exponential proliferation of 3D content.
Information or knowledge in the tangible world can indeed be thought of as limitless, whereas augmentation must fit within the relatively limited field of view of headsets. Consequently, we must optimise the presentation of digital content. With a naive approach to virtual content delivery, metaverse users will typically experience information inundation and require additional time to consume the augmentation. Context awareness, covering the users, their environments, and social dynamics, is a prominent strategy for managing the information displayed. AIGCs at the periphery, with the assistance of recommendation systems, can interpret the user context and provide the most pertinent augmentation [14].
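As a rough illustration of such context-aware filtering, the sketch below ranks candidate augmentations by a simple relevance heuristic and keeps only the few that fit a field-of-view budget. The scoring function, field names, and data are illustrative assumptions, not the recommendation method of [14].

```python
# A minimal sketch of context-aware filtering of candidate augmentations.
# The relevance heuristic and data are illustrative assumptions.
def rank_augmentations(candidates, user_context, fov_budget=3):
    """Keep only the few items most relevant to the current user context."""
    def relevance(item):
        tag_overlap = len(set(item["tags"]) & set(user_context["interests"]))
        proximity = 1.0 / (1.0 + item["distance_m"])   # nearer content scores higher
        return tag_overlap + proximity

    ranked = sorted(candidates, key=relevance, reverse=True)
    return ranked[:fov_budget]   # respect the headset's limited field of view

candidates = [
    {"id": "menu",    "tags": ["food"],       "distance_m": 2.0},
    {"id": "ad",      "tags": ["shopping"],   "distance_m": 1.0},
    {"id": "transit", "tags": ["navigation"], "distance_m": 5.0},
]
context = {"interests": ["navigation", "food"]}
print([a["id"] for a in rank_augmentations(candidates, context, fov_budget=2)])
# -> ['menu', 'transit']
```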
Although we foresee a rise in content volume when AIGCs are fully engaged as a utility in the Metaverse, two significant issues should be addressed. First, content uniqueness raises concerns about the quality and relevance of the material provided. With so much material accessible, users find it increasingly difficult to identify what they seek and to distinguish high-quality from low-quality content. To address this aspect of content singularity, additional research is needed to create new tools and methodologies that assist users in filtering, prioritizing, and personalizing the material they consume. Current solutions in Web 2.0 include search engines, recommendation algorithms, and content curation tools. Yet, the issue of content singularity remains a complicated and continuing one that will undoubtedly need further innovation and adaptation as the volume and diversity of digital information increase in the Metaverse.
Second, contemporary conversational interfaces have long been criticized for lacking transparency, acting as a ‘black box’ [15]. In other words, conversational AIs do not expose a complete list of their abilities, and general users usually have no clue about what the AI can achieve. Significantly, users with low AI literacy cannot quickly master the interaction with GPT-like AI agents through a conversational interface. Exploring the right fit between generative AI models and the XR environment is therefore necessary. For instance, the AI models can suggest potential actions to users by putting digital overlays on top of the user’s surroundings. As such, the user can understand the AI’s abilities and avoid ineffective enquiries or wasted interactions with the generative AI model. In addition, more intuitive cues should be prepared, according to the user context, to inform the user about ‘what cannot be done’ with a generative AI model.
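One lightweight way to surface ‘what the AI can do here’ is a capability map keyed by the object the user is currently gazing at, as sketched below. The categories and actions are illustrative assumptions rather than an established design.

```python
# A minimal sketch of surfacing 'what the AI can do here' as contextual hints,
# addressing the black-box concern above. The capability map is an assumption.
CAPABILITIES = {
    "chair":   ["re-texture", "resize", "remove (DR)"],
    "wall":    ["add mural", "open virtual window"],
    "default": ["describe scene", "generate object from sketch"],
}

def affordance_hints(gazed_object: str):
    """Return the actions the generative model can offer for the gazed object."""
    return CAPABILITIES.get(gazed_object, CAPABILITIES["default"])

print(affordance_hints("chair"))   # hints rendered as overlays next to the chair
```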
3. Human-Metaverse Interaction
Besides generating virtual content, AIGC can be considered an assistive tool for user interaction in the metaverse. From other users’ perspectives, a user’s movements and interactions with virtual objects can be part of the content in virtual worlds. The difficulty of controlling an avatar’s movements and interacting with virtual objects can negatively impact an individual’s workload and the group’s perception of a metaverse application; for example, when a group has to wait for an individual to finish a task, frustration arises.
Before discussing how prompts should be extended in the Metaverse for easier interaction between users and metaverse instances, we consider some fundamentals of human-computer interaction (HCI) and prompt engineering [16]. Prompts raise different concerns in HCI and NLP. From the HCI perspective, effective prompts are clear, concise, and intuitive. Users have to design prompts for an interactive system, and this imposes a workload when taking specific actions or providing relevant input. Once the user’s needs and goals have been identified, the next step is to craft effective prompts that guide the user towards achieving those goals, and the AI-generated results provide users with the information they need to take action in a particular context. Therefore, prompt engineering is an essential aspect of designing interactive systems that are easy to use and achieve high levels of user satisfaction.

Figure 3: An example pipeline of content creation and human-metaverse interaction supported by AIGCs: (a) brainstorming with conversational agents (collecting requirements simultaneously); (b) auto-generation of the content; (c) manual editing starts, but large pointing errors exist; (d) following (c), AI-assisted pointing for selecting a vertex; (e) following (d), AI-assisted vertex editing; (f) manual editing of subtle parts; (g) AI-assigned panel and user interaction on the virtual objects; (h) user reviews of the objects while AIGCs attempt to understand the user’s perceptions; (i) content sharing, e.g., for educational purposes in a classroom. Photos are extracted and modified from [4] for illustration purposes.
Prompt engineering, in NLP and particularly for LLMs, refers to the methods for communicating with an LLM to steer its behaviour towards desired outcomes. A traditional chatbot (e.g., ChatGPT) considers primarily text prompts. In contrast, the prompts from metaverse users can become more diverse by considering both the context discussed above and multiple user modalities, including gaze, body movements, and psychological and physiological factors. In addition, perhaps employing certain personalization techniques, prompts should be tested and refined iteratively to ensure that they effectively guide LLMs towards the desired output. As such, metaverse-centric prompt engineering requires a new understanding of the user’s needs and goals, as well as their cognitive abilities and limitations. This information can be gathered through user testing, A/B testing, user surveys, and usability testing in many virtual worlds.
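The sketch below illustrates one possible shape of such a metaverse-centric prompt, packing the extra modalities into a structured object before serializing it for an LLM. The MetaversePrompt fields and the serialization format are assumptions for illustration; a real system would need validated sensing and prompt formats.

```python
# A minimal sketch of a metaverse-centric prompt that extends a text prompt
# with additional modalities (gaze, body movement, physiology). All fields
# are illustrative assumptions.
from dataclasses import dataclass, asdict

@dataclass
class MetaversePrompt:
    text: str                   # spoken or typed instruction
    gaze_target: str            # object id the user is currently looking at
    hand_pose: str              # e.g., "pinch", "point", "open-palm"
    locomotion: str             # e.g., "stationary", "walking"
    arousal_estimate: float     # 0..1, from physiological sensing (if available)

def to_llm_input(p: MetaversePrompt) -> str:
    """Serialize the multimodal context into a single conditioning string."""
    ctx = ", ".join(f"{k}={v}" for k, v in asdict(p).items() if k != "text")
    return f"[context: {ctx}] {p.text}"

prompt = MetaversePrompt(
    text="make this chair more comfortable-looking",
    gaze_target="chair_07", hand_pose="point", locomotion="stationary",
    arousal_estimate=0.3,
)
print(to_llm_input(prompt))
```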
Prompt design can be extended to the subtle interaction between virtual objects and users. VR sculpting is a popular application in which users can freely mould virtual objects in virtual spaces. A usability issue of VR, namely inaccurate pointing at vertices, becomes a hurdle [4], and VR sculpting is still far from being a primary creativity tool due to its low efficiency. A hybrid model can be considered: generative AI models like LLMs first generate a draft of the 3D content, and then the model is customized with manual editing in VR. In this sense, an important issue arises: we cannot get rid of manual operations on virtual instances. AIGCs, in the future, should assist human users in virtual tasks that are inherently complex and clumsy under hardware constraints such as a limited field of view (FOV). AIGCs can parse the user’s actions in virtual environments, for instance, limb movements and gazes towards a virtual object, to take over appropriate parts of the manual editing. As such, AIGCs can serve as assistants for metaverse users. It is important to note that AI-assisted tasks already happen on everyday ubiquitous devices, i.e., smartphones. A prevalent example on 2D UIs is typing text on soft keyboards. Users tap keys repetitively and make typos when adjacent keys are triggered. Such an error-prone task can be assisted by auto-correction: users tap the mistyped word and select the correct spelling from the suggested words. To achieve this, an AI model learns the words in an English dictionary and then models user habits by recording the user’s word choices.
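A minimal sketch of this familiar auto-correction loop is shown below: candidate corrections come from a dictionary, and a personalization step re-ranks them by the user’s own word choices. The tiny dictionary and frequency model are deliberately simplified assumptions.

```python
# A minimal sketch of soft-keyboard auto-correction: candidate words from a
# dictionary, re-ranked by the user's own word-choice history (personalization).
from difflib import get_close_matches
from collections import Counter

DICTIONARY = ["metaverse", "interaction", "immersive", "content", "context"]
user_history = Counter({"content": 12, "context": 3})   # learned from past picks

def suggest(mistyped: str, k: int = 3):
    candidates = get_close_matches(mistyped, DICTIONARY, n=k, cutoff=0.6)
    # prefer words the user has chosen before (personalization step)
    return sorted(candidates, key=lambda w: -user_history[w])

print(suggest("contnet"))   # -> ['content', 'context']
```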
Typing on a soft keyboard is a good example of an AI-assisted task. In virtual environments, interaction tasks, including dragging an object to a precise position and editing an object with an irregular shape, can be challenging for users, and AIGCs open opportunities to help human users accomplish them. Nonetheless, typing on soft keyboards is manageable because the dictionary is a reasonable search space; in contrast, AIGC-driven assistance faces a much larger search space. In an editing task, a user may first select a vertex at a rabbit’s tail. The next action may be changing the vertex property and then moving to another vertex, which could be on the head, the bottom, etc. With current technology, predicting the user’s next action with high accuracy is very unlikely. However, if available, AIGCs may leverage prior users’ behaviours from a dataset containing user interaction footprints and accordingly recommend several ‘next’ edits to facilitate the process. Eventually, the user can choose one of them and accomplish the task without a huge burden.
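The sketch below illustrates one simple way such ‘next edit’ recommendations could be derived, using a first-order frequency model over hypothetical interaction footprints; the action labels and data are assumptions, and a deployed system would need far richer models.

```python
# A minimal sketch of recommending 'next' edits from prior users' interaction
# footprints, as a simple first-order (Markov-style) frequency model.
from collections import Counter, defaultdict

# sequences of editing actions logged from earlier sessions (hypothetical data)
footprints = [
    ["select_tail_vertex", "move_vertex", "select_head_vertex", "move_vertex"],
    ["select_tail_vertex", "scale_region", "select_tail_vertex", "move_vertex"],
    ["select_head_vertex", "move_vertex", "smooth_region"],
]

transitions = defaultdict(Counter)
for seq in footprints:
    for cur, nxt in zip(seq, seq[1:]):
        transitions[cur][nxt] += 1

def recommend_next(action: str, k: int = 2):
    """Return the k most frequent follow-up edits observed after `action`."""
    return [a for a, _ in transitions[action].most_common(k)]

print(recommend_next("select_tail_vertex"))   # e.g., ['move_vertex', 'scale_region']
```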
In a broader sense, diversified items exist in many virtual worlds, and a virtual item can have many possible relationships with other items. As such, user interaction with AIGCs’ predictions becomes complicated. For instance, a user picks up an apple and then lifts a tool to cut it; other possible actions include putting down the apple, grabbing an orange, etc. It is also important to note that building an ontology for the unlimited items in the Metaverse is nearly impossible. One potential tactic is to leverage the user’s in-situ actions. Generative AI models can read the user’s head and hand movements to predict the user’s regions of interest and, thus, upcoming activities. Ideally, a user may give a rough pointing location towards a particular item. Then, generative AI models can make personalized and in-situ suggestions for the user’s subsequent interactions with virtual objects, with sufficient visualization to ensure intuitiveness. We believe that the above examples are only the tip of the iceberg, but they are sufficient to illustrate the necessity of re-engineering the ways of making metaverse-ready prompts for generative AI models.
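As a small illustration of leveraging in-situ actions, the sketch below resolves a rough pointing gesture to the most likely target object so that follow-up suggestions can be scoped to that item. The object positions and the pointing ray are made-up values for illustration.

```python
# A minimal sketch of resolving a rough pointing gesture to the most likely
# target object; positions and the gesture ray are illustrative assumptions.
import numpy as np

objects = {"apple":  np.array([0.4, 1.0, 1.2]),
           "orange": np.array([-0.3, 1.0, 1.5]),
           "knife":  np.array([0.5, 0.9, 1.1])}

def resolve_target(origin, direction):
    """Pick the object whose centre lies closest to the pointing ray."""
    d = direction / np.linalg.norm(direction)
    def ray_distance(p):
        v = p - origin
        return np.linalg.norm(v - np.dot(v, d) * d)   # perpendicular distance
    return min(objects, key=lambda name: ray_distance(objects[name]))

print(resolve_target(np.array([0.0, 1.4, 0.0]), np.array([0.3, -0.2, 1.0])))
```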
Then, there is the issue of how natural people will feel in metaverse environments built, or in some cases hallucinated, with AIGCs. Urban designers and architects are now looking into which factors of our ordinary environments matter most when attempting to translate those environments into digital ones, beyond mere 3D replication. Here, issues such as subjective presence (comfort, feeling, safety, senses) and active involvement (activities taking place, other people’s presence), in addition to the traditionally considered structural aspects (colour, furnishing, scale, textures), will play a pivotal role in how the metaverse experience will feel for its users (see, e.g., [17]). The questions to solve will include, for example, to what degree we want generative AI to be able to spawn experiences that feel safe, or whether the spaces should more closely reflect the world as we know it outside the metaverse, where even adjacent spaces can have very different perceived human characteristics.
The technical capability of AIGCs only opens up a landscape for generating metaverse content, whether adding backdrops (AR) or removing objects that cause strong emotions (DR). But we know very little about the user aspects once AIGCs are deployed at scale. As the metaverse moves beyond sole digital interfaces, i.e., 2D UIs, AIGC can be embedded in the physical world and alter the user’s situated environment to fulfil the user’s subjective presence, which can be abstract and vary greatly with the user’s beliefs (norms, customs, ego, and so on) and environment. A machine may not truly interpret the meaning of ‘calm’, especially when multiple subjective qualities are involved, e.g., ‘safe and calm’. Suppose a user makes a simple prompt of ‘calm’ to an AIGC model; the results may be unsatisfactory because the user did not craft an effective prompt, for example, by adding words like ‘meditation, wellness and sleep’ when inside a bedroom. It is worth noting that users with headsets expect quick and accurate feedback rather than having to ask the generative AI models to revise the content over multiple iterations. In addition, subjective presence is not limited to a single user. Multiple users will interact with metaverse content in a shared space, potentially causing co-perception and communication issues. Generating the right content at the right time is therefore a challenge that goes beyond technical aspects. AIGC in the Metaverse will open a novel niche of understanding the dynamics among metaverse content, physical space, and users.

Figure 4: AIGM framework showing the relationship between human users, AIGCs, and virtual-physical cyberspace (i.e., the Metaverse).
4. Towards AIGM Framework
We argue that AIGM is a must if we aim to unleash all of the latent potential in the metaverse concept. That is, regardless of who the leading developer is, the metaverse must be built for humans, and as humans, everything we do is embodied in the space around us [17]. The leading developers do not have the authority to dictate what content we should have on the Next Internet, as we have seen in the Metaverse of 2022, in which the virtual spaces are office-like environments. We usually spend eight working hours at the physical office, and it makes little sense to spend another eight hours in a virtual office. Ironically, except for the standard items given in asset libraries, we do not even have the means to decorate such office spaces with our own unique creations. Ultimately, it is the users’ call what becomes the popular trend in the Metaverse. Google Image searches conducted since Q3 2021 make it evident that creators have so far defined the metaverse with blue, dark, and purple colours, yet we believe the trend of popular content is ever-changing.
Driven by the vital role of AIGCs in democratizing content creation, everyone in the Metaverse can decide, (co-)create, and promote their unique content. To scale up the use of AIGCs, we propose a framework for an AI-Generated Metaverse (AIGM) that depicts the relationships among AIGCs, virtual-physical blended worlds, and human users (see Figure 4). AIGC is the fuel that sparks the content singularity, and metaverse content is expected to surround everyone like the atmosphere. This implies an entire creation pipeline in which AIGCs are the key actors. First, users can talk to generative AI models to obtain inspiration during human-AI conversations (Human-AI collaboration). Consequently, generative AI models provide the very first draft of the generated content (AI-Generation). The models then support subtle editing during content creation (AI-Assistance). Some precise details can be done manually (Human users), and, if necessary, multiple users can be involved in the task (Multi-user collaboration). In addition, it is important to note that AIGCs can assign properties governing how users and virtual instances interact, e.g., through a tap on a panel, and AIGC-driven evaluations can then be performed to understand user performance and cognitive load [18]. Eventually, content sharing and the corresponding user interaction can be backed by AIGCs.
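A minimal sketch of the pipeline’s ordering is given below, with each stage reduced to a placeholder label. The stage names mirror the steps described above, while the data structure and function are purely illustrative.

```python
# A minimal sketch of the AIGM creation pipeline as an ordered list of stages.
# The stage functions are placeholders; real systems would back each stage with
# a generative model or an interaction subsystem.
PIPELINE = [
    "human_ai_brainstorm",        # collect requirements through conversation
    "ai_generation",              # first draft of the 3D content
    "ai_assisted_editing",        # e.g., assisted vertex selection and editing
    "manual_refinement",          # precise details done by the user(s)
    "multi_user_collaboration",
    "interaction_assignment",     # AIGC assigns how users may interact (tap, grab, ...)
    "aigc_evaluation",            # estimate user performance and cognitive load [18]
    "content_sharing",
]

def run_pipeline(asset: dict) -> dict:
    """Thread an asset through every stage, recording the provenance trail."""
    for stage in PIPELINE:
        asset.setdefault("history", []).append(stage)
    return asset

print(run_pipeline({"name": "junk_boat"})["history"])
```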
5. Concluding Remarks
During a deceleration of global metaverse development, the authors contend that AIGCs can be a critical facilitator for the Metaverse. This article shares some perspectives and visions for when AIGCs meet the Metaverse. Our discussion started with a look back at the key flaws of metaverse applications in 2022, and we highlighted the fundamental difficulties the metaverse has encountered. Accordingly, we examined how AIGCs will speed up metaverse development from a user standpoint. The article eventually speculates on future possibilities that combine the Metaverse with AIGCs. We call for a conceptual AIGM framework that facilitates the content singularity and human-metaverse interaction in the AIGC era, and we hope to spark a more expansive discussion within the HCI and AI communities.
REFERENCES
1. L.-H. Lee, P. Zhou, T. Braud, and P. Hui, “What is
the metaverse? an immersive cyberspace and open
challenges,” ArXiv, vol. abs/2206.03018, 2022.
2. L.-H. Lee, T. Braud, P. Zhou, L. Wang, D. Xu, Z. Lin,
A. Kumar, C. Bermejo, and P. Hui, “All one needs to
know about metaverse: A complete survey on tech-
nological singularity, virtual ecosystem, and research
agenda,” 2021.
3. L. H. Lee, T. Braud, F. H. Bijarbooneh, and P. Hui,
“Ubipoint: Towards non-intrusive mid-air interaction for
hardware constrained smart glasses,” in Proceedings
of the 11th ACM Multimedia Systems Conference,
ser. MMSys ’20. New York, NY, USA: Association
for Computing Machinery, 2020, p. 190–201. [Online].
Available: https://doi.org/10.1145/3339825.3391870
4. K. Y. Lam, L.-H. Lee, and P. Hui, “3deformr: Freehand
3d model editing in virtual environments considering
head movements on mobile headsets,” in Proceedings
of the 13th ACM Multimedia Systems Conference,
ser. MMSys ’22. New York, NY, USA: Association
for Computing Machinery, 2022, p. 52–61. [Online].
Available: https://doi.org/10.1145/3524273.3528180
5. L.-H. Lee, T. Braud, S. Hosio, and P. Hui,
“Towards augmented reality driven human-city
interaction: Current research on mobile headsets
and future challenges,” ACM Comput. Surv.,
vol. 54, no. 8, oct 2021. [Online]. Available:
https://doi.org/10.1145/3467963
6. A. U. Batmaz, M. D. B. Machuca, D.-M. Pham, and
W. Stuerzlinger, “Do head-mounted display stereo de-
ficiencies affect 3d pointing tasks in ar and vr?” 2019
IEEE Conference on Virtual Reality and 3D User Inter-
faces (VR), pp. 585–592, 2019.
7. C. Zhang, C. Zhang, C. Li, S. Zheng, Y. Qiao, S. K. Dam,
M. Zhang, J. U. Kim, S. T. Kim, G.-M. Park, J. Choi, S.-
H. Bae, L.-H. Lee, P. Hui, I. S. Kweon, and C. S. Hong,
“One small step for generative ai, one giant leap for agi:
A complete survey on chatgpt in aigc era,” researchgate
DOI:10.13140/RG.2.2.24789.70883, 2023.
8. C. Zhang, C. Zhang, S. Zheng, Y. Qiao, C. Li, M. Zhang,
S. K. Dam, C. M. Thwal, Y. L. Tun, L. L. Huy, D. kim, S.-
H. Bae, L.-H. Lee, Y. Yang, H. T. Shen, I.-S. Kweon, and
C.-S. Hong, “A complete survey on generative ai (aigc):
Is chatgpt from gpt-4 to gpt-5 all you need?” ArXiv, vol.
abs/2303.11717, 2023.
9. S. Büttner, M. Prilla, and C. Röcker, “Augmented reality training for industrial assembly work - are projection-based ar assistive systems an appropriate tool for assembly training?” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, ser. CHI ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 1–12. [Online]. Available: https://doi.org/10.1145/3313831.3376720
10. A. C. C. Reyes, N. P. A. Del Gallego, and J. A. P.
Deja, “Mixed reality guidance system for motherboard
assembly using tangible augmented reality,” in
Proceedings of the 2020 4th International Conference
on Virtual and Augmented Reality Simulations, ser.
ICVARS 2020. New York, NY, USA: Association for
Computing Machinery, 2020, p. 1–6. [Online]. Available:
https://doi.org/10.1145/3385378.3385379
11. V. Paananen, M. S. Kiarostami, L.-H. Lee, T. Braud,
and S. J. Hosio, “From digital media to empathic reality:
A systematic review of empathy research in extended
reality environments,” ArXiv, vol. abs/2203.01375, 2022.
12. T. J. Prescott, “The ai singularity and runaway human
intelligence,” in Living Machines, 2013.
13. P. Zhou, “Unleashing chatgpt on the metaverse: Savior or destroyer?” arXiv preprint arXiv:2303.13856, 2023.
14. K. Y. Lam, L. H. Lee, and P. Hui, “A2w: Context-
aware recommendation system for mobile augmented
reality web browser,” in Proceedings of the 29th
ACM International Conference on Multimedia, ser. MM
’21. New York, NY, USA: Association for Computing
Machinery, 2021, p. 2447–2455. [Online]. Available:
https://doi.org/10.1145/3474085.3475413
15. A. B. Arrieta, N. D. Rodríguez, J. D. Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, “Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai,” ArXiv, vol. abs/1910.10045, 2019.
16. V. Liu and L. B. Chilton, “Design guidelines for prompt engineering text-to-image generative models,” Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022.
17. V. Paananen, J. Oppenlaender, J. Goncalves, D. Hetti-
achchi, and S. Hosio, “Investigating human scale spa-
tial experience,” Proceedings of the ACM on Human-
Computer Interaction, vol. 5, no. ISS, pp. 1–18, 2021.
18. Y. Hu, M. L. Yuan, K. Xian, D. S. Elvitigala, and A. J.
Quigley, “Exploring the design space of employing ai-
generated content for augmented reality display,” 2023.