All content in this area was uploaded by Fatma Aladağ on Nov 28, 2023
I. INTERNATIONAL EVLIYA CELEBI SYMPOSIUM:
(5-7 October 2023, Istanbul)
(Preprint V1)
The Potential of GPT in Ottoman Studies: Computational Analysis of Evliya Çelebi’s
Travelogue with NLP and Text Mining and Digital Edition with TEI
Fatma Aladağ
Abstract
This study analyzes Evliya Çelebi’s 17th-century
travelogue of Istanbul through the application of Natural Language Processing (NLP), multi-model
text mining, particularly the use of GPT, and a digital edition based on the Text Encoding
Initiative (TEI). Utilizing co-occurrence and frequency analysis, sentiment analysis through
transformer models like BERT, and topic modeling with Latent Dirichlet Allocation (LDA),
along with Named Entity Recognition (NER) applications, the research tests the technical
boundaries in uncovering semantic and thematic patterns within the travelogue. It specifically
addresses the challenges of integrating computational analysis methods with Ottoman Turkish
and emphasizes the need for developing language models tailored for historical texts,
highlighting the potentials of GPT in this context. Additionally, the study undertakes the
application of TEI standards for the digital edition of the travelogue, addressing the challenges
and potential of reconstituting Ottoman Turkish texts in digital formats. As a combination of
quantitative and thematic analyses, this work offers a deep analysis of the socio-cultural and
historical content of the travelogue. The article aims to contribute to digital humanities and
specifically digital history in Ottoman studies, through the integration of technological methods
in the analysis and understanding of historical and linguistic texts.
Keywords: GPT, LLM, NLP, NER, LDA, Co-occurrence
Analysis, Frequency Analysis, TEI, 17th Century, Digital Humanities, Digital History
Introduction
Digital Humanities open new horizons in the in-depth analysis and understanding of historical
texts in the context of text mining. This interdisciplinary field enables complex analytical
[...] When the rich
linguistic structure of Ottoman Turkish is analyzed with the computational analysis techniques
offered to researchers by digital humanities, in-depth information on the social and cultural
dynamics of historical periods can be obtained.
Text mining, which has an important place in
the field of digital humanities and extracts high-quality information from text, is an
interdisciplinary field that intersects with data science, linguistics and computer science. Text
mining is an approach that focuses on analyzing large volumes of text using advanced
algorithms and statistical methods to extract information from it. To this end, various
methodologies and algorithms are used to transform the unstructured texts used as the source
of the research into structured data suitable for analysis. This article presents a comprehensive
application and methodology for analyzing the 17th-century Ottoman traveler Evliya Çelebi’s
Istanbul travelogue, which has a wealth of socio-cultural, geographical and historical
content. [...] but also an important historical document that provides detailed information about the
[...] encyclopedic in scope,
covering descriptions of daily life, architecture, customs and administrative systems of the
Ottoman Empire. The application of text mining to such a corpus requires an in-depth
understanding of the language and context, which is complicated by the fact that Ottoman
Turkish has Arabic and Persian influences.
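As a minimal illustration of what transforming unstructured text into structured data can mean in practice, the following Python sketch turns a raw string into a small frequency table. The function name `term_frequencies` and the sample sentence are invented for illustration and are not taken from the travelogue or from the pipeline used in this study.

```python
import re
from collections import Counter


def term_frequencies(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Turn raw text into a structured (token, count) table."""
    # \w+ is Unicode-aware in Python 3, so letters such as ç, ğ, ş and â are kept.
    tokens = re.findall(r"\w+", text.lower())
    # Counter tallies each token; most_common sorts by descending count.
    return Counter(tokens).most_common(top_n)


# Invented sample sentence, used only to show the shape of the output.
sample = "mahalle ve çarşı, mahalle ve cami, mahalle"
print(term_frequencies(sample, top_n=2))
```

A table of this kind is only a starting point; for a historical corpus the tokenization rule itself usually has to be adapted before the counts are meaningful.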
The unique features of Ottoman Turkish reveal the necessity of language models that can handle
[...] of Ottoman Turkish, standard text mining tools need to be improved. Preprocessing techniques
such as tokenization, lemmatization and tagging should be redefined to accommodate historical
syntax and morphology. Efforts to digitize Ottoman Turkish, which computer scientists began in
the 1990s, had by 2023 reached a significant level of development in automatic
transcription and digital edition thanks to artificial intelligence.
This study discusses
the challenges, limitations and potentials of state-of-the-art text mining techniques when
applied to travelogue texts. In this way, the capacities of existing technological approaches will be
revealed through their application to both the Ottoman Turkish texts converted into Latin script
and the simplified Turkish texts of the travelogue. The study also aims to provide insights
for the computational analysis and digitization of Ottoman Turkish texts.
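To make concrete why standard preprocessing needs adapting for Latin-script Ottoman Turkish, here is a hedged Python sketch of a tokenizer. It assumes a transcription that uses the Turkish dotted and dotless i, circumflexed vowels, apostrophes before suffixes and hyphenated izafet compounds. The helper names `turkish_lower` and `tokenize` and the sample phrase are hypothetical and do not reproduce the preprocessing used in this study.

```python
import re


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish dotted/dotless i convention."""
    # Python's default lower() maps "I" to "i" and "İ" to "i" plus a combining dot,
    # which is wrong for Turkish, so the two letters are mapped explicitly first.
    return text.replace("I", "ı").replace("İ", "i").lower()


# A token is a run of letters, optionally continued by an apostrophe or hyphen
# followed by more letters (e.g. "istanbul'un", "âb-ı").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’\-][^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Split transcribed text into lowercase tokens, keeping compounds whole."""
    return TOKEN_RE.findall(turkish_lower(text))


# Invented sample phrase, used only for illustration.
print(tokenize("İstanbul'un ILIK Âb-ı hayât"))
```

Keeping the apostrophe and hyphen inside a token is a design choice: it preserves suffixed proper names and izafet constructions as single units, which a later lemmatization step could then split deliberately rather than by accident.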
Computational Analyses on Travelogue: Application of Multi-Model Text Mining and
NLP Techniques
Transformation from Text to Data: Preparing the Travelogue for Analysis and Preprocessing
Analyzing the Digital Interaction of Tradesmen and Shops
Frequency Analysis of the Travelogue with Named Entity Recognition (NER)
Network-Based Semantic Profile of "Istanbul" and "Mahalle" (Neighborhood): Co-occurrence Analysis
Topic Modeling of the Travelogue Using the Latent Dirichlet Allocation (LDA)
Istanbul, Galata and Üsküdar: Sentiment Analysis of Spaces
A Regex-Based Analysis of the Travelogue: Phrases and Reduplications
Designing a Digital Edition of the Travelogue with the Text Encoding Initiative (TEI)
Conclusion
Bibliography