Toward a Generative Pipeline
for an AR Tour of Contested Heritage Sites
Eytan Mann
Technion Israel Institute of Technology
Haifa, Israel
https://orcid.org/0000-0003-0146-0677
Jonathan Dortheimer
Technion Israel Institute of Technology
Haifa, Israel
https://orcid.org/0000-0002-7464-8526
Aaron Sprecher
Technion Israel Institute of Technology
Haifa, Israel
https://orcid.org/0000-0002-2621-7350
Abstract—This paper envisions a pipeline for automating the
generation of augmented reality tours of contested heritage sites
while employing a critical approach toward the representation
of history. Through the design of a generative pipeline, the
paper identifies and discusses the potential and pitfalls associated
with extracting spatial features from archival manuscripts and
presenting them using an augmented reality application. The
paper proposes a number of design approaches that assist in
automating the transformation of manuscripts into interactive
tours while taking into consideration historical, narrative, and
technical challenges.
Index Terms—augmented reality, historiography, tour, generative, artificial intelligence
I. INTRODUCTION
Augmented Reality (AR) can serve as a powerful tool for exploring the city augmented by archival materials, effectively turning an urban explorer into a reader. In this paper, we outline a design pipeline that automatically analyzes and spatially contextualizes archival materials about a site, toward generating an AR touring experience. We focus on historical manuscripts: archival materials such as letters, reports, articles, memoirs, and recorded interviews. We aim to allow urban historians and tourists to interact with historical information while at the location, thereby contextualizing archival materials and enabling what philosopher Timothy Barker describes as making the past present, a way to re-present the past [1].
A first step toward spatializing information can be achieved by analyzing written materials, followed by archival photography and reconstruction modeling. Augmenting the site with archival materials involves methodologies from both historiography and computing. From a historiographical standpoint: how should the validity of historical materials be assessed, which sources can be used, and why should certain materials be included in a historical account while others are left out? From a computational standpoint: can the identification of spatial attributes in large numbers of documents be automated? The pipeline we propose takes a step toward generating a tour of the site while maintaining a critical eye.
We envision an AR historical tour that is less a final-cut
film and more an ongoing collection of historical materials
about a site, following a logic of “database histories” defined
by media theorist Steve Anderson as “histories comprised of
not narratives that describe an experience of the past but rather
collections of infinitely retrievable fragments, situated within
categories and organized according to changing associations”
[2]. Being physically present at the site while engaging with digital materials provides users with a sense of historical empathy that cannot be achieved in a classroom with a textbook [3]. This state of immersion supports a sense of place, a “combination of feelings of attachment, dependence, concern, identity, and belonging that people develop regarding a place” [4]. AR also enables Situated Learning, a pedagogical model based on the notion that knowledge is contextually situated and is shaped by the activity, context, and culture in which it is used [5].
In the context of difficult heritage, AR can potentially address sites associated with war and with contested cultural heritage: places that hold multiple and often conflicting narratives.
II. BACKGROUND AND RELATED WORKS
Geographers and computer scientists are both concerned
with the semantic spatial extraction of locations and other
spatial dimensions from unstructured text. In Literary Geography, scholars have attempted to map and model literary worlds described in books. In such attempts, the “space of the text” is always anchored in some form to the ‘reality’ of existing
spaces and places [6]. Literary Geography reveals the ‘place-
bound nature of literary forms’, using maps and other visual
diagrams to explore the internal logic of narrative [7].
In computing and language analysis, several approaches to extracting locations from human language have been developed. For example, Corpus Linguistics is a
methodology used to study language using a large naturally
occurring body of text – a corpus – on various levels, including
lexis, syntax, semantics, and pragmatics or discourse [8].
Corpus techniques are increasingly being exploited across a
wide range of areas within linguistics, such as the description
of grammar, the analysis of literary style, or the investigation
of language change. As a preliminary step in many corpus-
based techniques, automatic language analysis techniques from
the closely related area of computational linguistics, otherwise
known as natural language processing (NLP), are used to
enhance the corpus data with some annotation to code one
or more levels of the analysis robustly and consistently [9].
NLP techniques can be used for named entity extraction [10]. These techniques allow the automatic discovery of names of people, places, and organizations, as well as times, dates, and other quantities, using criteria other than simply spotting grammatical markers of proper nouns. A variety of approaches exists, including knowledge-based [11], rule-based [12], and statistical [13] methods, and these may need to be tailored to a particular domain or input text.
2022 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR)
2771-7453/22/$31.00 ©2022 IEEE
DOI 10.1109/AIVR56993.2022.00026
Extracting space from natural language can also focus on
language descriptions rather than place names, which may
provide a more detailed spatial context. Spatial Semantics studies how spatial relations are expressed and understood through language. For example, in the sentence “the book is on the table,” a spatial arrangement emerges from the naming of locations. The semantic
primitives used are “trajectors” (the book) and “landmarks”
(the table), whereby the location of the first is determined by
its relationship with the second [14].
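As a minimal illustration of the trajector/landmark distinction, the sketch below pulls (trajector, relation, landmark) triples out of simple copular sentences with a hand-written pattern. The preposition list, the verb list, and the regular expression are our own assumptions for illustration; realistic spatial role labeling would require a syntactic parser or a trained model.

```python
import re
from typing import NamedTuple, Optional

class SpatialRelation(NamedTuple):
    trajector: str  # the located object
    relation: str   # the spatial preposition
    landmark: str   # the reference object

# Hypothetical, deliberately small preposition list (multi-word items listed first).
PREPOSITIONS = ["next to", "in front of", "behind", "near", "on", "in", "under", "above"]

_PATTERN = re.compile(
    r"^(?:the\s+)?(?P<trajector>.+?)\s+(?:is|was|are|were|stood|stands)\s+"
    r"(?P<relation>" + "|".join(re.escape(p) for p in PREPOSITIONS) + r")\s+"
    r"(?:the\s+)?(?P<landmark>.+?)[.!]?$",
    re.IGNORECASE,
)

def extract_relation(sentence: str) -> Optional[SpatialRelation]:
    """Return a trajector/relation/landmark triple, or None if the pattern does not match."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    return SpatialRelation(
        match.group("trajector").strip(),
        match.group("relation").lower(),
        match.group("landmark").strip(),
    )

print(extract_relation("The book is on the table."))
# SpatialRelation(trajector='book', relation='on', landmark='table')
```

Sentences without such a pattern (for example, "Life was hard.") return None, which is where context-aware methods such as LLMs become necessary.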
III. AUGMENTED REALITY GENERATIVE PIPELINE
IV. METHODOLOGY
Our methodology combines historical archival research with a Research through Design (RtD) approach, used here as a prototyping-based research approach in human-computer interaction [15]. We selected a case study site and evaluated the potential and pitfalls of deploying the envisioned pipeline.
To examine the suggested pipeline, we used a case study of Wadi Salib, a neighborhood in downtown Haifa, Israel. The site tells a painful story of violence and destruction during the early years of the State of Israel. First, during the 1948 Arab-Israeli war, the Palestinian inhabitants fled the city; then, from 1960 to 1980, a second displacement occurred when the State of Israel emptied the neighborhood of its Moroccan Jewish inhabitants, who had settled in the abandoned Palestinian homes [16]. Since the evacuation of Wadi Salib’s last inhabitants, many of the neighborhood’s homes have been demolished, with some still standing empty, as part of a strategic plan to “modernize the city” (see Figure ??). Today, Wadi Salib stands mostly in ruins and has never been rebuilt. It is an urban-archaeological site stratified by narratives. Although the destroyed neighborhood seems silent, multiple narratives are stored in archives such as the National Archive, the Municipal Archive, NGOs’ archives, and the city museum, as well as in recorded interviews from film and television, publications, and social media groups.
These sources tell multiple narratives about Wadi Salib and naturally use varying rhetoric. Some are more objective, such as reports, and some are subjective, such as interviews. In addition to general descriptions, some give more detailed descriptions of places, such as houses, streets, and shops.
A. Large Language Models for data extraction
For this preliminary experiment, we used a publicly available Large Language Model (LLM), GPT-3, to identify locations and other spatial dimensions in a body of unstructured text [17]. One of GPT-3’s use cases is understanding and extracting data from text, which suits our needs. We designed a simple prompt that made the language model predict the location and time of the event described in each sentence. To test the accuracy of the LLM’s predictions, we compared the predicted locations and times to a manually labeled data set. Through an iterative process, we refined the text prompt to improve the accuracy of the predictions.
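The prompting and evaluation loop described above could be sketched as follows. The prompt wording, the “Location: ...; Time: ...” answer format, and the `complete` callable (a stand-in for a call to whichever LLM completion endpoint is used) are hypothetical, since the actual prompt is not published. Accuracy is computed here as case-insensitive exact-match agreement with the manual labels, which is only one of several possible metrics.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt; the wording used in the study is not published.
PROMPT_TEMPLATE = (
    "For the sentence below, state where and when the described event takes place.\n"
    "Answer as 'Location: <place>; Time: <time>', using 'unknown' if not stated.\n\n"
    "Sentence: {sentence}\n"
)

def parse_answer(answer: str) -> Tuple[str, str]:
    """Parse 'Location: X; Time: Y' into (X, Y); missing fields become 'unknown'."""
    fields: Dict[str, str] = {}
    for part in answer.split(";"):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields.get("location", "unknown"), fields.get("time", "unknown")

def evaluate(
    sentences: List[str],
    gold: List[Tuple[str, str]],
    complete: Callable[[str], str],
) -> Dict[str, float]:
    """Exact-match accuracy of predicted locations and times against manual labels."""
    if not sentences or len(sentences) != len(gold):
        raise ValueError("need one manual label per sentence")
    loc_hits = time_hits = 0
    for sentence, (gold_loc, gold_time) in zip(sentences, gold):
        answer = complete(PROMPT_TEMPLATE.format(sentence=sentence))
        pred_loc, pred_time = parse_answer(answer)
        loc_hits += pred_loc.casefold() == gold_loc.casefold()
        time_hits += pred_time.casefold() == gold_time.casefold()
    n = len(sentences)
    return {"location_accuracy": loc_hits / n, "time_accuracy": time_hits / n}
```

In an iterative refinement loop, each prompt revision would be scored with `evaluate` on the same labeled set, keeping the better-scoring variant; passing a stub function as `complete` lets the harness be tested offline.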
We conceptualize the generative process as consisting of seven steps that derive from an archival collection about a defined site.
1) Archival Collecting: Following the definition of the site boundaries, historical archival materials are collected. The collection phase includes archival research and bibliography collection. The sources are digitized and labeled with metadata parameters. The collection targets sources that may include spatial descriptions, mentions of place names, and addresses. Materials that lack any geographical context are discarded.
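A minimal sketch of this step’s output, assuming each digitized source is stored with metadata fields like those later used for sequencing (source identity, historical period, keywords). The `ArchivalSource` type is hypothetical, and testing for “geographical context” by matching against a gazetteer of known place names is a simplifying assumption; plain substring matching will produce false positives and miss unlisted names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

@dataclass
class ArchivalSource:
    source_id: str     # hypothetical identifier, e.g. archive name plus file number
    text: str          # digitized, transcribed text
    source_type: str   # e.g. "report", "letter", "interview"
    period: str        # historical period the source describes
    keywords: List[str] = field(default_factory=list)

def has_geographic_context(source: ArchivalSource, gazetteer: Iterable[str]) -> bool:
    """True if the text mentions at least one known place name (case-insensitive)."""
    text = source.text.casefold()
    return any(name.casefold() in text for name in gazetteer)

def collect(sources: Iterable[ArchivalSource], gazetteer: Iterable[str]) -> List[ArchivalSource]:
    """Keep only sources with some geographical context; the rest are set aside."""
    names = list(gazetteer)  # materialize once so it can be reused for every source
    return [s for s in sources if has_geographic_context(s, names)]
```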
2) Extracting and Segmenting Texts: Original manuscripts are automatically analyzed using GPT-3 and segmented according to the location and time of the described events. This segmentation process outputs an array of textual segments ready to be attached to the physical environment.
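One way to turn per-sentence predictions into the array of segments described above is to merge consecutive sentences that share the same predicted location and time. In this sketch, `predict` is a hypothetical callable wrapping the LLM query (for instance, a GPT-3 prompt whose answer is parsed into a `(location, time)` pair), sentence splitting is assumed to happen upstream, and “unknown” predictions are merged like any other value, a choice a real pipeline might revisit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Segment:
    location: str
    time: str
    sentences: List[str]

    @property
    def text(self) -> str:
        return " ".join(self.sentences)

def segment_text(
    sentences: List[str],
    predict: Callable[[str], Tuple[str, str]],
) -> List[Segment]:
    """Merge consecutive sentences whose predicted (location, time) pair is identical."""
    segments: List[Segment] = []
    for sentence in sentences:
        location, time = predict(sentence)
        if segments and (segments[-1].location, segments[-1].time) == (location, time):
            segments[-1].sentences.append(sentence)  # same place and time: extend segment
        else:
            segments.append(Segment(location, time, [sentence]))  # context changed: new segment
    return segments
```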
3) Geocoding: The extracted locations are georeferenced using a geographic information system (GIS). Available geocoding services, such as a municipal GIS or OpenStreetMap, can identify places that are precisely named and still exist today. The outcome of this process is a geocoder that converts location texts into GIS spatial entities.
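As an illustration of geocoding against OpenStreetMap, the sketch below builds a request to the public Nominatim `/search` endpoint and parses its JSON reply into coordinates. The choice of Nominatim, appending the city name to the query, the `countrycodes=il` restriction, and the `GeoEntity` type are our assumptions. The request itself is not sent here; Nominatim’s usage policy requires an identifying User-Agent and a low request rate, so a real client would send it with `urllib.request.Request(url, headers={"User-Agent": ...})` and throttle calls.

```python
import json
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

@dataclass
class GeoEntity:
    query: str   # the location text extracted from the manuscript
    name: str    # the matched OpenStreetMap display name
    lat: float
    lon: float

def build_query_url(location_text: str, city: str = "Haifa", limit: int = 1) -> str:
    """Build a Nominatim free-form search URL for a location mentioned in a text."""
    params = {"q": f"{location_text}, {city}", "format": "jsonv2", "limit": limit, "countrycodes": "il"}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"

def parse_response(location_text: str, body: str) -> Optional[GeoEntity]:
    """Convert the first Nominatim result into a GeoEntity, or None if nothing matched."""
    results: List[dict] = json.loads(body)
    if not results:
        return None  # e.g. a demolished or renamed place that is absent from OSM
    best = results[0]
    return GeoEntity(location_text, best["display_name"], float(best["lat"]), float(best["lon"]))
```

A `None` result marks exactly the cases the text anticipates: places that are vague, renamed, or no longer exist, which need manual handling.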
4) Environmental Scanning: After the locations associated with the various text segments are extracted, Terrestrial Laser Scanning (TLS) is deployed to acquire point cloud data of the relevant locations. Point cloud data are essential for an AR experience to “recognize” the physical context and to improve surface recognition [18]. Based on the locations extracted in the previous step, we used TLS to produce point clouds of the relevant places.
5) Reconstructive Modeling: After the locations’ current state has been acquired, additional 3D modeling might be required. Architectural drawings, photographs, and municipal records can inform this modeling. The level of detail of the produced 3D models corresponds to that of the supporting evidence, ranging from low-detail block models to detailed architectural models.
6) Scene Assembling: This phase aims to augment the site with text segments, 3D reconstruction models, and complementary photographs. The process outputs a collection of three-dimensional, AR-ready scenes. We explored methods for the automatic sequencing of segments based on metadata (such as source identity, historical period, and keywords): (1) linear sequencing composes a tour following the original text’s sequence, and (2) nonlinear sequencing mixes scenes taken from various sources. Nonlinear sequencing is achieved through hyperlinks that guide the user toward the location where a related scene appears. Together, these modes allow users to navigate both within and between narratives.
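The two sequencing modes can be sketched with a minimal scene model. Linear sequencing orders one source’s scenes by their position in the original text; nonlinear sequencing links scenes from different sources that share a keyword. The `Scene` fields and the shared-keyword linking rule are illustrative assumptions, not the pipeline’s actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Scene:
    scene_id: str
    source_id: str
    position: int                  # index of the segment in its original text
    keywords: frozenset = frozenset()

def linear_tour(scenes: List[Scene], source_id: str) -> List[str]:
    """Scenes from one source, in the order they appear in the original text."""
    own = [s for s in scenes if s.source_id == source_id]
    return [s.scene_id for s in sorted(own, key=lambda s: s.position)]

def hyperlinks(scenes: List[Scene]) -> Dict[str, List[Tuple[str, str]]]:
    """For each scene, (target scene, shared keyword) pairs pointing into other sources."""
    links: Dict[str, List[Tuple[str, str]]] = {s.scene_id: [] for s in scenes}
    for a in scenes:
        for b in scenes:
            if a.source_id == b.source_id:
                continue  # scenes within one source are already connected by the linear order
            for keyword in sorted(a.keywords & b.keywords):
                links[a.scene_id].append((b.scene_id, keyword))
    return links
```

Keeping the linear order and the cross-source links as separate structures lets the interface offer the original sequence by default and the hyperlinks as optional detours.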
7) AR User Interface: The user interface targets mobile device screens and adds interactivity through hyperlinks. It integrates geolocation, point cloud data, and real-time camera data from a mobile device to render a narrative sequence.
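One simple way the interface could use geolocation is to activate a scene when the device comes within a given radius of the scene’s anchor. The sketch computes the haversine great-circle distance; the 25 m default radius and the dictionary of anchors are arbitrary assumptions, and in practice GPS would only coarsely position the user before point cloud matching refines the alignment.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def active_scenes(
    user: Tuple[float, float],
    anchors: Dict[str, Tuple[float, float]],
    radius_m: float = 25.0,
) -> List[str]:
    """Scene ids whose anchor lies within radius_m of the user, nearest first."""
    nearby = [(haversine_m(user, pos), sid) for sid, pos in anchors.items()]
    return [sid for dist, sid in sorted(nearby) if dist <= radius_m]
```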
Fig. 1. Generative Framework process diagram, from raw materials to AR experience
V. INITIAL ASSESSMENT
We envisioned a system that can potentially transform
archival materials in text format into an AR tour experience.
However, several significant pitfalls arise:
A. Historiographical Pitfalls
1) Non-spatial manuscripts: It was often challenging to work with archival materials that did not refer to spatial features or locations. How can these be integrated into an AR application? Such samples may be written as an overview that relates to the site in general without mentioning specific locations, or in a way that does not relate to the physicality of the site at all. Yet these materials may offer valuable evidence about the site’s history, and omitting them would undermine the goal of a rich and scientifically sound AR application that may serve as a research tool. We therefore conclude that non-spatial information must be incorporated into the AR tour.
We suggest adding a voice-over narrator that presents information not directly linked to a physical feature on site, providing an extra layer of information, a sort of footnote related to the site. We suggest using GPT-3 to identify and aggregate non-spatial, overview-style descriptive language.
2) Scientifically Sound Information: Archival materials vary in historical validity and source integrity: some are official governmental reports, and some are personal accounts. To avoid a reductive presentation of the sources, we suggest integrating a labeling system into the application’s user interface that indicates validity levels. Such visual cues may help users tour places augmented by textual and visual materials with some criticality, while maintaining awareness of the sources.
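A labeling system of this kind could start from a simple mapping from a source’s type to a displayed cue. The categories, label texts, and fallback below are hypothetical placeholders: they describe the kind of source rather than rank its reliability, since deciding validity levels is a historiographical judgment that code cannot settle.

```python
from enum import Enum

class SourceLabel(Enum):
    OFFICIAL = "Official record"        # e.g. governmental or municipal reports
    PRESS = "Published account"         # e.g. newspaper articles
    PERSONAL = "Personal testimony"     # e.g. memoirs, letters, interviews
    UNVERIFIED = "Unclassified source"  # fallback when the type is unknown

# Hypothetical mapping from archival metadata to the label shown in the AR interface.
LABEL_BY_TYPE = {
    "report": SourceLabel.OFFICIAL,
    "article": SourceLabel.PRESS,
    "memoir": SourceLabel.PERSONAL,
    "letter": SourceLabel.PERSONAL,
    "interview": SourceLabel.PERSONAL,
}

def label_for(source_type: str) -> SourceLabel:
    """Look up the label for a source type, falling back to UNVERIFIED."""
    return LABEL_BY_TYPE.get(source_type.strip().lower(), SourceLabel.UNVERIFIED)

def caption(text: str, source_type: str, citation: str) -> str:
    """Overlay caption that keeps the label and citation visible next to the excerpt."""
    return f"[{label_for(source_type).value}] {text} ({citation})"
```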
3) Language and Translation: Another critical concern is that most NLP methods were developed for English, and current LLMs were primarily trained on English corpora. This means that historical materials need to be translated from their original language (such as Hebrew or Arabic) into English. As a result, the original linguistic form and important cultural meanings might be lost in translation. To reduce misinterpretation and reduction, we employ professional translators. Moreover, the original text must be kept alongside translations so that users can view and read texts in their original language and meaning.
4) Sampling Bias: A limited dataset may undermine our aim of augmenting the site with multiple narratives by offering a biased perspective [19]. How can we overcome biased representation in the archival collecting phase? While the problem of bias can never be wholly solved, we suggest engaging researchers from various subject positions concerning the site’s history to minimize misrepresentation. For example, in the case of Wadi Salib, this means engaging historians and research assistants from various cultural backgrounds, including both Jewish and Palestinian researchers, who can read multiple languages and access different archives.
B. Information extraction challenges
We identified several technical challenges in automating the
extraction:
1) Several or Relative Locations: A sentence may contain several locations, which makes it difficult for an algorithm to determine which one is the story’s actual location. Addressing this challenge requires algorithms that can extract locations based on their context in a sentence. We partly addressed this by using an LLM.
The challenge of extracting spatial features from unstructured text based on spatial terms led us to realize that it is necessary not only to extract spatial terms but also to cross-reference other text segments that may indicate a landmark, so that it can serve as a spatial anchor. Thus, to handle segments that contain only relational terms, we suggest identifying previously mentioned locations in other segments that can serve as the landmark of the relation.
Fig. 2. AR application mock-up scenes (from top to bottom): Top scene, showing augmentation using graphical markings on the relevant building as extracted from the text, and a navigational cue toward the next scene; Top middle scene, indicating the general relational position; Bottom middle scene, indicating the location of the building; Bottom scene, an example of augmenting the site with a photograph taken from the same location as that given in the segment.
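The cross-referencing idea can be sketched as a simple fallback: when a segment contains only a relational phrase (e.g., “across the street”), its landmark is taken to be the most recently mentioned explicit location in the preceding segments. The recency heuristic and the two data types are our assumptions; segments with neither a location nor a relation are left unanchored, as candidates for the non-spatial narration layer.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ExtractedSegment:
    text: str
    location: Optional[str]         # explicit location, if the extractor found one
    relation: Optional[str] = None  # relational phrase such as "across the street"

@dataclass
class AnchoredSegment:
    text: str
    landmark: Optional[str]         # location used as the spatial anchor
    relation: Optional[str]         # None when the location is explicit

def anchor_segments(segments: List[ExtractedSegment]) -> List[AnchoredSegment]:
    """Resolve relation-only segments against the latest explicit location before them."""
    last_known: Optional[str] = None
    anchored: List[AnchoredSegment] = []
    for seg in segments:
        if seg.location is not None:
            last_known = seg.location
            anchored.append(AnchoredSegment(seg.text, seg.location, None))
        elif seg.relation is not None:
            # Relational phrase only: borrow the most recent landmark (None if there is none yet).
            anchored.append(AnchoredSegment(seg.text, last_known, seg.relation))
        else:
            anchored.append(AnchoredSegment(seg.text, None, None))  # non-spatial segment
    return anchored
```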
2) Uncertainty: Locations within manuscripts may be referred to by different aliases in different sources, or even within the same source. Even where a list of location aliases exists, the challenge lies in interpreting each alias in its context. Moreover, the places mentioned are often vague and demand further inquiry.
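Interpreting an alias in context could start from an alias table whose entries are qualified by context, here the historical period the source describes. All names, periods, and place identifiers below are hypothetical. A lookup that yields no candidate or several candidates is returned unresolved and flagged for the further inquiry the text calls for, rather than guessed.

```python
from typing import Dict, List, Optional, Tuple

def resolve_alias(
    alias: str,
    period: str,
    table: Dict[Tuple[str, str], List[str]],
) -> Tuple[Optional[str], List[str]]:
    """Look up an alias for a given period.

    Returns (place_id, candidates): place_id is set only when exactly one
    candidate exists; otherwise it is None and the case needs manual inquiry.
    """
    candidates = table.get((alias.casefold(), period), [])
    if len(candidates) == 1:
        return candidates[0], candidates
    return None, candidates

# Hypothetical entries: one nickname pointing to different places in different periods,
# and an ambiguous alias with two candidates.
ALIASES = {
    ("the old market", "1940s"): ["place:market_a"],
    ("the old market", "1960s"): ["place:market_b"],
    ("the square", "1960s"): ["place:square_1", "place:square_2"],
}
```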
C. Storytelling Challenges
Having reviewed several methods for sequencing segments, we argue that linear and branching narratives should be used to preserve the original sequence of events, as described in the text, in order to maintain historical validity and coherence. By facilitating new connections between thematically related texts, hyperlinks encourage serendipity and new perspectives on the site’s heritage.
VI. CONCLUSION
This paper envisions and outlines a generative pipeline for AR tours of contested heritage sites and presents an initial assessment of the framework. The extraction of locations and spatial relations from unstructured text retrieved from various archives is identified as a major challenge in a generative pipeline for AR in contested heritage sites. This challenge highlights potential and pitfalls across disciplines, from cultural heritage practices such as history and preservation to computing, where natural language must be analyzed and processed.
Historiographical challenges, as we identify them, are mostly related to authenticity, which stands as a core ethical concern when writing history: a claim to facts, “faithful to an original” or a “reliable, accurate representation” [20]. In order to maintain a “claim to facts” inside an AR user interface, we suggest maintaining a connection to the original materials. This can be achieved by keeping a citation trail, indicating the types of sources, and keeping the original language accessible as data. Balancing story and fact may be achieved by foregrounding the original source metadata while using visual and auditory cues to produce an immersive experience. We found that the design of a user interface in the AR production phase must foreground the archive to allow multiple narratives to be voiced.
From a language analysis perspective, we found that spatial entities appearing in historical manuscripts commonly have uncertain and vague boundaries. In such cases, determining the precise location is not always possible. The uncertainty of literary worlds underlines the challenge of spatial extraction from text, where “the available data are often rather coarse” and thus, for visualization, they “must be converted to sharply delineated data; so it could be said, that a non-existing precision is assumed” [6]. In order to augment the site with text segments from the archive, we suggest incorporating degrees of uncertainty and devising a language extraction method that can output a range of resolutions and levels of detail.
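One way to let extraction output “a range of resolutions” is to attach each segment to a list of candidate geometries at different levels (building, street, neighborhood), each with an uncertainty radius, and let the interface pick the finest estimate that meets its precision needs. The level names, the point-plus-radius representation, and the tolerance parameter are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class LocationEstimate:
    level: str       # e.g. "building", "street", "neighborhood"
    lat: float
    lon: float
    radius_m: float  # uncertainty radius around the point, in meters

def finest_acceptable(estimates: List[LocationEstimate], max_radius_m: float) -> Optional[LocationEstimate]:
    """Most precise estimate whose uncertainty fits the display's tolerance, else None."""
    acceptable = [e for e in estimates if e.radius_m <= max_radius_m]
    return min(acceptable, key=lambda e: e.radius_m, default=None)
```

Returning `None` rather than an arbitrarily precise point avoids assuming “a non-existing precision”: the interface can instead fall back to area-level presentation, such as narration tied to the neighborhood as a whole.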
REFERENCES
[1] T. S. Barker, Time and the Digital: connecting technology, aesthetics,
and a process philosophy of time. UPNE, 2012.
[2] S. F. Anderson, Technologies of History: Visual Media and the Eccentricity of the Past. UPNE, 2011.
[3] J. Challenor and M. Ma, “A review of augmented reality applications for
history education and heritage visualisation,” Multimodal Technologies
and Interaction, vol. 3, no. 2, p. 39, 2019.
[4] Y.-L. Chang, H.-T. Hou, C.-Y. Pan, Y.-T. Sung, and K.-E. Chang, “Apply
an augmented reality in a mobile guidance to increase sense of place for
heritage places,” Journal of Educational Technology & Society, vol. 18,
no. 2, pp. 166–178, 2015.
[5] J. S. Brown, A. Collins, and P. Duguid, “Situated cognition and the culture of learning,” Educational Researcher, vol. 18, no. 1, pp. 32–42, 1989.
[6] A.-K. Reuschel and L. Hurni, “Mapping literature: Visualisation
of spatial uncertainty in fiction,” The Cartographic Journal,
vol. 48, no. 4, pp. 293–308, 2011. [Online]. Available:
https://doi.org/10.1179/1743277411Y.0000000023
[7] N. Alexander, “On literary geography,” Literary Geographies, vol. 1,
no. 1, pp. 3–6, 2015.
[8] I. Gregory, D. Cooper, A. Hardie, and P. Rayson, “Spatializing and analyzing digital texts: Corpora, GIS, and places,” pp. 150–178, 2015.
[Online]. Available: https://core.ac.uk/download/pdf/161889410.pdf
[9] J. Song, J. Kim, and J.-K. Lee, “Spatial information enrichment using NLP-based classification of space objects for school bldgs. in Korea,” International Association for Automation and Robotics in Construction (I.A.A.R.C.), May 2019, pp. 415–420.
[10] A. Goyal, V. Gupta, and M. Kumar, “Recent named entity recognition and classification techniques: A systematic review,” Computer Science Review, vol. 29, pp. 21–43, Aug. 2018.
[11] D. Nadeau and S. Sekine, “A survey of named entity recognition and classification,” Lingvisticae Investigationes, vol. 30, pp. 3–26, Aug. 2007.
[12] T. Eftimov, B. Koroušić Seljak, and P. Korošec, “A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations,” PLoS ONE, vol. 12, no. 6, p. e0179488, 2017.
[13] A. Mansouri, L. S. Affendey, and A. Mamat, “Named entity recognition
approaches,” International Journal of Computer Science and Network
Security, vol. 8, no. 2, pp. 339–344, 2008.
[14] J. Zlatev, “Spatial semantics,” Jun. 2012. [Online]. Available: http://oxfordhandbooks.com/view/10.1093/oxfordhb/9780199738632.001.0001/oxfordhb-9780199738632-e-13
[15] J. Zimmerman, J. Forlizzi, and S. Evenson, “Research through design as a method for interaction design research in HCI,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, Apr. 2007, pp. 493–502. [Online]. Available: https://dl.acm.org/doi/10.1145/1240624.1240704
[16] Y. Weiss, “Central European ethnonationalism and Zionist binationalism,” Jewish Social Studies, vol. 11, no. 1, pp. 93–117, 2004.
[17] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal,
A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal,
A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M.
Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin,
S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford,
I. Sutskever, and D. Amodei, “Language models are few-shot learners,”
2020. [Online]. Available: https://arxiv.org/abs/2005.14165
[18] W. Liu, B. Lai, C. Wang, X. Bian, W. Yang, Y. Xia, X. Lin, S.-H.
Lai, D. Weng, and J. Li, “Learning to match 2D images and 3D LiDAR
point clouds for outdoor augmented reality,” in 2020 IEEE Conference
on Virtual Reality and 3D User Interfaces Abstracts and Workshops
(VRW), 2020, pp. 654–655.
[19] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan,
“A survey on bias and fairness in machine learning,” ACM Computing
Surveys (CSUR), vol. 54, no. 6, pp. 1–35, 2021.
[20] S. Varga, Authenticity as an ethical ideal. Routledge, 2012.