Exploring Gulf Manumission Documents with Word Vectors

Abstract

In this article we analyze a corpus related to manumission and slavery in the Arabian Gulf in the late nineteenth and early twentieth centuries that we created using Handwritten Text Recognition (HTR). The corpus comes from India Office Records (IOR) R/15/1/199 File 5. Spanning the period from the 1890s to the early 1940s and composed of 977K words, it contains a variety of perspectives on manumission and slavery in the region, from manumission requests to administrative documents relevant to colonial approaches to the institution of slavery. We use word2vec with the wordVectors package in R to highlight how the method can uncover semantic relationships within historical texts, demonstrating some exploratory semantic queries, investigation of word analogies, and vector operations using the corpus content. We argue that advances in applied computer vision such as HTR are promising for historians working in colonial archives and that, while our method is reproducible, there are still issues related to language representation and limitations of scale within smaller datasets. Even though HTR corpus creation is labor intensive, word vector analysis remains a powerful tool of computational analysis for corpora where HTR error is present.
Published with license by Koninklijke Brill […]
This is an open access article distributed under the terms of the CC BY 4.0 license.
[…] 2 (2024) 1–29
Suphan Kirmizialtin | ORCID: 0000-0001-5020-0578
suphan@nyu.edu
David Joseph Wrisley | ORCID: 0000-0002-0355-1487
Corresponding author
djw12@nyu.edu
Received 7 […] 2024 | Accepted 26 […] 2024 | […] 27 […] 2024
Keywords
Handwritten Text Recognition (HTR) – word vector models (WVM) – India Office Records (IOR) – manumission – Gulf Studies – colonial archives – slavery
1 Introduction
Persian Gulf archival records, shaped by colonial processes of collection and preservation, are scattered across global institutions, […] asymmetries in power that continue to determine which narratives remain accessible.¹ Amid this fragmented and politically charged archival landscape, the Qatar Digital Library (QDL) stands as a critical access point for Gulf histo[…]. Within this expansive digital repository, […] File 5 Slave Trade (IOR/R/15/1/199-234) […] comprising approximately 14,000 pages related to manumission and slavery in the Gulf during the late nineteenth and early twentieth centuries (hereafter, IOR File 5). […], including manumission requests recorded by British […], […] window into the lived experiences of servitude as well as the colonial discourse surrounding liberation.
Responding to scholarly calls, such as those by Zdanowski (2011), […]cal engagement with manumission documents to uncover embedded forms of violence and power, we have created a […] corpus of IOR File 5 using Handwritten Text Recognition (HTR) technology. […] vector analysis to map thematic structures across the corpus and close reading to explore contextual layers within individual narratives, our approach bridges computational and traditional textual analysis. In balancing these computational and interpretive methods, we follow an approach informed by existing scholarship on close and distant reading, where iterative engagement with the text and the computational model helps illuminate both […] patterns […], nuanced narratives ([…] 2019; […] 2022; […] 2019). This dual approach allows us to examine the language of manumission with a […], advancing the study of Gulf histories within this complex, distributed archival framework.
¹ […] Gulf region, see […] 2020.
2 Word Vectors for the Study of Historical Corpora
In this article, we employ word vector analysis to examine our collection of historical documents. Word vector analysis is a computational technique that creates mathematical representations of individual words as vectors in a continuous vector space derived from the corpus. This approach approximates […] “spatial analogy to the relationship between words” ([…] 2015). […] words appearing in similar contexts tend to have similar or related meanings (Verheul et al. 2022). […] between words without incorporating external knowledge about their semantics. It is important to note that while similar vector contexts can indicate similar meanings, proximity in a word vector model does not always imply synonymity; it might also indicate antonyms, abbreviations, or other instances of words that appear in similar contexts.
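The point about proximity and synonymity can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not the word2vec training used in this study: it swaps in raw co-occurrence counts for learned embeddings, and the sentences are invented for illustration rather than drawn from the corpus. Two words of opposite meaning that occur in identical contexts end up with identical context vectors.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy sentences, invented for illustration (not corpus text).
SENTENCES = [
    "the man was enslaved in the gulf",
    "the man was freed in the gulf",
    "the woman was enslaved in muscat",
    "the woman was freed in muscat",
    "pearls were sold at the market",
]

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = context_vectors(SENTENCES)
# Opposite meanings, identical contexts: similarity is maximal.
sim_antonyms = cosine(vectors["enslaved"], vectors["freed"])
# No shared context words: similarity is zero.
sim_unrelated = cosine(vectors["enslaved"], vectors["pearls"])
```

In this toy setting the antonym pair is indistinguishable by context alone, which is why nearest neighbours in a vector model call for interpretation rather than being read as synonyms.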
In our study, […] (Schmidt and […] 2017), […] Women’s […].² Although […] vector models are available for analyzing text collections in various languages (Pennington et al. 2014; […] 2022), […] characteristics of our historical corpus lend themselves to […] models. The […] space, validating it, making queries based on semantic relationships, investigating word analogies, and performing vector operations using our colonial corpus content. We will elaborate on these operations later; however, it is important to note that, like many other digital historians, we use this method for exploratory data analysis (EDA) […]d’s limitations for similar corpora. Such exploration requires familiarity with both the topical content of the corpus and the computational parameters of its creation, enabling a deeper investigation into the interrelated concepts it contains.
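To make the notions of word analogies and vector operations concrete, the following Python sketch works through one analogy on hand-made vectors. Everything in it is an assumption for illustration: the three-dimensional vectors, the vocabulary, and the helper names `nearest_words` and `analogy` are hypothetical and are not taken from the model trained in this study or from any library.

```python
import math

# Hypothetical three-dimensional vectors chosen by hand for illustration;
# a trained model would have many more learned dimensions.
TOY_MODEL = {
    "man": [1.0, 0.0, 0.0],
    "woman": [-1.0, 0.0, 0.0],
    "master": [1.0, 1.0, 0.0],
    "mistress": [-1.0, 1.0, 0.0],
    "boat": [0.0, 0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def nearest_words(model, target, exclude=(), n=1):
    """Return the n words whose vectors are most similar to `target`."""
    scored = [
        (cosine(vec, target), word)
        for word, vec in model.items()
        if word not in exclude
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:n]]

def analogy(model, a, b, c):
    """Solve 'a is to b as c is to ?' with the vector operation b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(model[a], model[b], model[c])]
    return nearest_words(model, target, exclude={a, b, c})[0]

# man : master :: woman : ?
result = analogy(TOY_MODEL, "man", "master", "woman")
```

With these toy vectors the query resolves to "mistress". In a model trained on noisy HTR output, the same operation would return a ranked list of candidates whose plausibility has to be judged against the sources.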
3 Source Material
The archival collections at the heart of our research are sourced from the extensive […], held in London by the British Library, now digitized and made publicly available by the QDL. […], our focus lies on the records of the British Residency in the Persian Gulf that pertain to slavery and manumission in the region.³ […] 1890 […] 1940, these documents provide insights into manumission as well as the experiences of enslaved or indentured individuals in the Gulf region during a time of significant […]. The collection includes manumission requests as well as a variety of administrative records related to the slave trade, […], regulations, procedures concerning manumission, and correspondence detailing the fates of freed individuals. They also contain a vast array of additional documentation, contextualizing the complex institution of slavery and its integral role within the […] fabric of the Indian Ocean world in general and the Arabian Peninsula in particular.
The manumission statements, a key source in this collection, were typically created through a process shaped by the constraints and practices of colonial administration. Most applicants were illiterate and communicated their situations orally, typically in Arabic, to assistants at British agencies or consulates across the region. The assistants would then translate the statements into […]. […] applicant’s place of birth and origin, followed by a brief account of their enslavement […]. These […], typically validated by the applicant’s thumbprint as a mark of authenticity ([…] 2011).
It is crucial to acknowledge that these documents are heavily mediated through the colonial lens. […] people come through, […], who recorded and translated accounts using formulaic language and selecting information deemed relevant to colonial administration. The power dynamics embedded in the creation and preservation of these documents often obscure the full experiences of enslaved individuals, imposing silences and reinforcing the priorities of colonial authorities.
[…] bleeding ink, especially evident in typewritten texts, to others left intentionally blank, likely to prevent ink transfer between pages. The documents’ complex layout, […], […].
² […], Sarah Connell and the others in digital scholarship at the […] the context of their Advanced Topics in the Digital Humanities seminar.
³ […] Appendix […]. For more on the records […], see “The Political Residency, Bushire.”
4 Historical Background: Slavery in the Gulf
In the history of the Indian Ocean world, the phenomenon of slavery stands out for its complexity and persistence, particularly in the age of abolition. Despite the British Parliament’s […] 1807, and subsequent naval patrols around the African coast to curb the shipments of enslaved people, the trade in the Indian Ocean world not only persisted but, in some areas, […] ([…] 2013, 23).⁴ This surge was partly due to […] rerouting their operations through the Mozambique Channel into the Indian Ocean zone, […] ([…] 2013, 187).⁵ It was also due to the rise of plantation economies in Zanzibar and Pemba, operated predominantly by Omani Arabs with slaves brought from the […] ([…] 2013).
Figure 1 Sample pages from the volume […] 5/190 Manumission of slaves at Muscat: […]. Depicted here are […] 149–151r (306–310). […] Government License.
The Gulf region, interconnected with the broader Indian Ocean world, experienced its particular dynamics of slavery and manumission against the backdrop of economic and colonial pressures of the nineteenth and early twentieth centuries ([…] 2015). […] economic and social fabric of the region, particularly the pearl diving industry, […] ([…] 2017). […], enslaved men were utilized in diverse capacities as mercenaries, agricultural hands, and other forms of manual labor. […], on the other hand, primarily served as domestic servants and concubines, highlighting the […] dependence on enslaved labor within both the economic and domestic spheres of the Gulf societies where general emancipation did not take place until the […] century ([…] 2011, 864).⁶
As evident from the information above, the questions of slavery and manumission in the Gulf region are complex and […]. Thomas McDow points out that the status of slaves in this particular context cannot be fully understood through a simple “slave versus free dichotomy” as the enslaved people “existed in hierarchies of dependenc[…].” The rights and circumstances of […] performed. […] independence or full freedom due to the obligations their masters had towards them, shaped by Islamic and local customs, whereas plantation slaves were rarely emancipated. Many enslaved individuals sought to improve their social […]. For some, remaining a slave client under a powerful patron was economically and socially more advantageous than achieving complete freedom (McDow, 2013).
[…] complex and […] as the institution of slavery itself ([…] 2015; Bishara 2017). […]. Islamic teachings emphasize the moral and religious virtues of freeing slaves, and the practice of manumission was deeply embedded in the region’s cul[…] ([…] 2011, 866). Interestingly, […]. Since individuals freed by British authorities were not emancipated according to Islamic law and prevailing social conventions, they were often perceived as merely being purchased by colonial authorities. Consequently, these freed individuals were sometimes referred to as “slaves of the government” or “slaves of the Consul” ([…] 2011, 866).
⁴ […] regarding slavery, Campbell writes: “During the late […] imperialist surge in the Indian Ocean World, […]. Moreover, ‘liberated’ slaves were a potentially vital source of both taxation and manpower under colonial regimes governed by precepts of […]. A colonial priority was thus to transform the local working population into a free […] force. […] elites, whose assistance was required to administer the colony.” ([…] 2013, 34).
⁵ […]: “These […], when added to those on British, Dutch, French, and Portuguese slave trading within the Indian Ocean noted earlier, […] transportation of at least 826,000 and perhaps as many as 1,089,700 […] 1500 […] 1850.” ([…] 2013, 188).
⁶ […] 1900, Zdanowski estimates that enslaved individuals made up approximately 14.5% of the Gulf’s population, […] 36,880 […] 253,000 […].
Another issue complicating manumission practices was the diverse origins of enslaved people in the Gulf. In manumission statements, individuals were required to identify their ethnic or geographic origins, often revealing three main categories: those kidnapped from British colonies, those originating from […], and those born into slavery to parents who might have been born as free people but who had been kidnapped or brought to the Gulf from the former two categories.⁷ This distinction was crucial, […] 1833 […] for those who could prove their origins in British colonies. However, for individuals from protectorates and those born into slavery, the situation was more complex, as their status did not automatically guarantee manumission.
Throughout the period under study here, […], striking agreements with local authorities to curtail slave imports, and advocating for individual cases of emancipation. However, […] implement […] abolition, recognizing the importance of these leaders as allies. The British were concerned that pushing too hard for abolition could incite political unrest and rebellion against these local authorities, thereby destabilizing the region ([…] 2013).
In the Trucial Coast, […] 1930 […], the advent of cultured pearl […], […] the lives of slaves, prompting many to seek manumission. There was a sudden […] 1935, with the overwhelming majority of applications coming from men employed on pearling boats ([…] 2011, 877).⁸ Their primary complaints included inadequate food and clothing provided by their masters, being forced to dive, and not receiving payment for their work.
[…] provided emancipated individuals with some protection and a means to seek assistance against […] or mistreatment, British policy was not aimed at dismantling the institution of slavery in the region as a whole. Instead, the gradual abolition of slavery in the Gulf unfolded over the early to […] century ([…] 2011, 880).
⁷ […], […] records make mention of enslaved people from […], Baluchistan and India.
⁸ […] 956 […] the Gulf for the years between 1921 and 1946, […] 545 […] 1934 […].
5
[…] IOR File 5 involved transforming the digitized documents available at the QDL into computable plain text and then analyzing them with the computational technique of word vectors. Three critical aspects of our documentary base are essential to highlight in this context: the discursive diversity of documents, their scale, and their multilingual nature. These factors are crucial […], from selecting the appropriate HTR models and determining the parameters for training the word vector models to interpreting our results. The pipeline developed for […]; yet it is presented here in detail to facilitate adaptation to analogous scenarios, highlighting the versatility of HTR and computational analysis frameworks in historical research.
5.1 HTR for the Automated Recognition of Documents
In digital humanities, the transition from traditional archival research to computational analysis of digitized archival documentation presents numerous opportunities and challenges. The potential of algorithmic reading and sum[…], given the complexities of the documents at hand and the relative immaturity of digital archival systems. Our project to […] IOR File 5 occupies a critical position in this transition: […]ing the archives of the […], comparatively little attention was given to preparing them as fully […] documents.⁹
The […] volumes of manumission material in our corpus, like many historical archives, are an assemblage of various types of documentation. Browsing them is akin to reading a scrapbook: a heterogeneous collection of materials loosely connected by subject, yet presented in idiosyncratic layouts, in both handwritten and typewritten formats, and across multiple languages.
⁹ […], typewritten documents have been processed using Optical Character Recognition (OCR), […] page level for browsing. This is not the case for the handwritten pages.
To transform these historical documents into computationally tractable text, the most practical technology that we had at our disposal for archival documents was HTR.¹⁰ For this project, we employed […] HTR technology available via the Transkribus platform.
The choice of Transkribus was a pragmatic one. Recent advancements in Transkribus […], […] that was, until recently, out of reach. Transkribus has introduced ‘super models’ powered by transformer […] architecture, enabling the […]ing of documents […]. As the Transkribus user community has historically been focused on archives […], these super models have been trained on vast […]. This allowed us to process IOR File 5, the lion’s share […], both in the original and in translation.
Prior to the advent of these advanced models, transcribing multilingual archives would have necessitated the creation of […] bespoke HTR models, customized to accommodate the diverse handwriting styles and the mixture of handwritten and printed documents in our collection. Language detection would have had to be integrated into layout analysis in a computationally expensive process. The introduction of […], general models has streamlined such research, enabling us to transcribe all of the […]-language documents in our collection simultaneously and rather quickly, though not without challenges related to transcription accuracy and error management ([…] 2024).¹¹
Our choice to work with […] from other HTR projects that focus on creating very clean text with bespoke training. While we recognize that bespoke HTR models potentially yield higher accuracy rates in the transcription output, […] scholarly value in using general models, especially when working with large, […] corpora. […] text analysis, which prioritizes the exploration of […] patterns and themes within the corpus over perfect accuracy in text capture, with analytical […] (Cordell and Smith, 2022).¹²
¹⁰ The distinction between digitized and […] documents is crucial for anyone interested in digital humanities methods, particularly for scholars working with Arabic or other […] script writing systems. Achieving computer readability means that each word and character string within the documents can be processed computationally, as opposed to merely being displayed as static digital images.
¹¹ In this study, we utilize the general model named Text Titan available to us at the time of writing; this model was trained on historical documents in German, French, Dutch, Finnish, Swedish, […], spanning from the 16th to the 21st century. Text Titan […] 2.95%.
At the same time, we recognize the potential downstream challenges of using general HTR models. The transcription output generated by […] corpus can be employed for distant reading and conceptual analysis. Two key issues in this regard warrant mention at the outset as they shape our approach to the corpus; additional considerations will emerge in the analysis below.
First, the process of corpus creation using HTR involves […] or unsupervised steps that predict the words on the page, sometimes resulting in misspellings, particularly of proper names (Kapan et al. 2023; Zhou et al. 2024). […], orthography in IOR File 5 is inherently unstable. To analyze […], it is essential to employ text analysis methods that are not reliant on strict string matching, accounting for both historically […] spelling and transcription artifacts introduced by HTR.
An additional challenge lies in the page layout, abbreviations, and hyphenation present in the manumission documentation. Our corpus includes various types of documents, many of which are handwritten in an informal style. Common words, titles, and metrological terms are frequently abbreviated, often inconsistently. Moreover, clerks copying handwritten documents broke words at syllable boundaries, sometimes inserting hyphens and other times […]. As a result, the corpus contains numerous orthographic variances, errors, and fragmented words. The implications of these issues are addressed in the analysis section below.
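As a small illustration of why strict string matching falls short for unstable orthography, the Python sketch below looks up spelling variants of a place name by approximate matching with the standard library's `difflib`. It is a stand-in for the general idea only: the study itself relies on word vectors rather than edit-based matching, and the vocabulary and the spelling variants listed here are hypothetical examples, not forms attested in the corpus.

```python
import difflib

# Hypothetical vocabulary with invented spelling variants and unrelated words.
VOCABULARY = [
    "muskat",
    "maskat",
    "masqat",
    "bahrein",
    "manumission",
    "certificate",
]

def spelling_variants(word, vocabulary, cutoff=0.6):
    """Return vocabulary entries that approximately match `word`.

    `cutoff` is the minimum similarity ratio (0 to 1) used by difflib;
    0.6 is the library default and is kept here as an untuned assumption.
    """
    return difflib.get_close_matches(word, vocabulary, n=10, cutoff=cutoff)

# An exact lookup for "muscat" finds nothing in this vocabulary,
# whereas approximate matching recovers the three variant spellings.
exact_hits = [w for w in VOCABULARY if w == "muscat"]
variants = spelling_variants("muscat", VOCABULARY)
```

Approximate matching of this kind only captures surface similarity. A word vector model can additionally group variants that share contexts even when their spellings diverge more widely.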
Second, as of the time of writing, the Transkribus user community includes very few groups working with Arabic text, with only one Arabic public model currently available.¹³ […], the off-the-shelf transformer models used to create our corpus of IOR File 5 were not trained on […] materials. Consequently, pages written in Arabic or containing Arabic segments remain untranscribed. Although the absence of […] transcription might seem problematic for an archive containing Arabic materials, the distant reading approach adopted for IOR File 5 mitigates this issue to some extent. Many of the Arabic texts in our corpus […], allowing the […] they are missed in Arabic. However, even if the Arabic texts were transcribed, additional challenges would arise when working with multilingual corpora, including the training of word vector models. These issues, which extend beyond Arabic texts, are discussed further in Section 5.2.
¹² […] approaches are sometimes aligned with […] techniques such as part-of-speech tagging and lemmatization. However, the impact of applying these […] techniques to text generated by HTR, which can be messy and inconsistent, is not yet fully understood. […] methods on […] text.
¹³ At NYU Abu Dhabi we have accumulated ground truth for a public handwritten Arabic model. We have been combining crowdsourcing approaches with synthetic data for its creation. See […] 2023 and […] Abu Dhabi […] Working […] 2024.
These challenges highlight the broader implications of working with general HTR models for multilingual corpora. The model used to transcribe IOR File 5, […], […] training set, resulting in inconsistent transcriptions for other languages present in the collection. For example, while the model captures some French and German words found in the British materials, it frequently misinterprets Arabic handwritten text as French. While setting a frequency threshold during […] words, such inconsistencies in transcription still pose challenges for querying […].
Had we opted for bespoke HTR models, some of the issues highlighted above, such as handling abbreviations and hyphenated words, might have been mitigated through approaches like the “smart model” strategy ([…] 2022) or post-processing techniques. However, the size of the corpus made such post-transcription corrections impractical, both in terms of time and resources. Given the sheer volume of material and our decision not to pursue extensive editing, conducting word vector analysis directly on the HTR output […]. To extract meaningful insights and use the imperfect HTR outputs for historical interpretation, it remains essential to understand both the complexities of the archival materials and the inherent imperfections of the trained word vector models. These issues will be explored further in Section 6.
5.2 Training Word Vector Models with the HTR Output
[...] documents into computer-readable [...], processed and exported by the
[...] system volume by volume, as detailed in Appendix [...], we
custom-trained Word Vector Models [...]. This approach diverges from many
[...] projects that rely on pre-trained embeddings like [...], which are
optimized for modern language tasks. Scholars have highlighted the enduring
relevance of static embedding models such as word2Vec for digital humanities
research, particularly in tracing the evolution of ideas and language over
time (Ehrmanntraut et al. 2021). [...] the particular needs of our project,
where understanding the historical speci[...].
Several factors informed our decision to train a custom model. First, the
historical context of these documents required an approach tailored to our
corpus, as pre-trained embeddings based on modern language were less suitable
[...]. Second, the corpus contains numerous proper names of persons and places
from the Gulf region, which are transliterations from Arabic script and exhibit
orthographic instability. Capturing these was essential to the topical
analysis of slavery and manumission. Finally, a custom model was also
necessary to address abbreviations and truncated words, as discussed in
Section 5.1.
With more than 10,000 pages (excluding blank ones), File 5 represents
a substantial corpus for traditional historical reading. The HTR output for the
[...] pages alone resulted in approximately 977,000 words [...].¹⁴ However, as
an experiment in [...] training and analysis, this corpus falls at the lower
end of what is typically recommended for generating stable embeddings and
conducting longitudinal studies, such as tracing conceptual history or
semantic change (Wevers and [...] 2020).
Moreover, creating a word vector model for a corpus is not a
one-size-fits-all process; instead, [...] parameters.¹⁵ Our research on
training models for HTR output from colonial [...] of evidence within the
data. Put another way, we paid particular attention to [...] that responded to
typical queries, [...] models surface. Whereas there are a fair number of
possible parameters in the wordVectors package, some of our most interesting
results were found by
14 Blank pages were not processed with Transkribus, nor were [...] pages. Some
[...].
15 Detailed information on the parameters used for model training can be found in the
documentation at http󰂹󰃈󰃈dr󰂺󰃈ithu󰃈mschmid󰃈ordVector󰃈a󰃈rain_word2vec
.html. A lesson for an analogous word2Vec method in Python has been published by the
Programming Historian, including a discussion of parameters (Blankenship et al. 2024).
varying the training using n-grams, which, in the end, allowed us to access
[...]. For example, we found that training the model for [...] was
particularly useful for word groupings common in the narrative and formulaic
manumission requests or for Arabic idafa constructions. By contrast, [...]-word
[...] provided access to a more conceptual, abstract vocabulary found in
administrative and political documents of the collection.
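The idea of training on n-grams can be illustrated with a minimal sketch. It is written in Python rather than R, does not use the wordVectors package, and is not the procedure used for the models in this article. The function name `join_frequent_bigrams`, the `min_count` cutoff, and the three toy sentences are all invented for illustration; real tools typically rank candidate pairs with a collocation score rather than a raw count.

```python
from collections import Counter

def join_frequent_bigrams(sentences, min_count=2):
    """Join adjacent word pairs seen at least `min_count` times into one token."""
    pair_counts = Counter()
    for words in sentences:
        pair_counts.update(zip(words, words[1:]))

    joined = []
    for words in sentences:
        out, i = [], 0
        while i < len(words):
            pair = tuple(words[i:i + 2])
            if len(pair) == 2 and pair_counts[pair] >= min_count:
                out.append("_".join(pair))  # e.g. "sufficient_food"
                i += 2
            else:
                out.append(words[i])
                i += 1
        joined.append(out)
    return joined

# Invented toy sentences, not quotations from the corpus.
toy = [
    "he never gave me sufficient food".split(),
    "i was not given sufficient food".split(),
    "the master took all my earnings".split(),
]
bigrams = join_frequent_bigrams(toy)
print(bigrams[0])
```

Running the same function again over its own output would merge already-joined tokens with their neighbors, which is one simple way to arrive at 3- and 4-grams.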
For this study, we have trained multiple models using the same 977[...]
corpus, [...]. We have listed those models and their parameters in Appendix
[...]. These variations allowed us to examine how dif[...] models. In the
following section, we discuss querying these models to illustrate salient
observations.
6 Observations and Results: Querying the Models
The wordVectors package enables querying the models trained on one’s own
corpora and facilitates the exploration of the corpus and the variety of
common contexts for words found therein. [...] significantly faster and less
[...] step than the previous two steps of generating textual transcriptions
with HTR and training the model. In other words, once the initial steps of
transcription and training are complete, itera[...].
It is worth highlighting again, however, [...] which one can query the
models depend on the historian’s [...] knowledge of the corpus and the
complexity of the discourses it contains. Querying a model, in other words,
is ideally aligned with informed historical inquiry. Moreover, experimenting
with training parameters can also sometimes pro[...], yielding results that
are not (easily) interpretable.
Word vector analysis has some notable limitations. First, an instance of any
given [...] in a model is a highly abstract concept, [...] synonyms or fail to
account for nuanced distinctions present in the original text. Additionally,
once a model is created, the ability to contextualize terms through direct
reading, as provided by a keyword-in-context (KWIC) approach, is lost.
For researchers accustomed to wildcard searching in databases, querying a
[...]. However, when combined with string searching and other natural language
processing techniques, querying a model can become a powerful method for
conducting distant conceptual reading, [...].
We emphasize the importance of “learning to query” a model because the
[...]. These models encourage histori[...] critically engaging with forms of
visualization to uncover insights and identify new terms for exploration. In
the sections that follow, we discuss three key analytic frameworks for
understanding and working with word vector models: cosine similarity, vector
arithmetic, and [...] analysis.
6.1 Cosine Similarity of Query Terms
One of the simplest queries to perform with a model involves measuring the
cosine similarity of words or n-grams. Cosine similarity, represented as a
value between 0 and 1 ([...] 1), [...] set of terms. High cosine similarity in
a corpus does not necessarily imply a [...] between strings.
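A cosine similarity query can be sketched in a few lines. This is a plain Python illustration under stated assumptions, not the wordVectors implementation: the helper names `cosine` and `closest` are hypothetical, and the three-dimensional vectors are invented toy values, whereas a trained model stores one much longer vector per term.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest(vectors, query, n=3):
    """Rank every term in `vectors` by cosine similarity to the query term."""
    q = vectors[query]
    scored = [(term, cosine(q, vec)) for term, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Invented toy vectors; the values carry no meaning beyond the illustration.
toy_vectors = {
    "sufficient_food": [0.9, 0.1, 0.0],
    "enough_food": [0.8, 0.2, 0.1],
    "my_earnings": [0.5, 0.5, 0.1],
    "signatory_powers": [0.0, 0.1, 0.9],
}
ranking = closest(toy_vectors, "sufficient_food")
for term, score in ranking:
    print(term, round(score, 3))
```

As in a ranked similarity table, the query term itself comes first with a similarity of 1, followed by the terms whose vectors point in the most similar direction.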
In the query demonstrated in Table 1, we look at a trope of food and resource
deprivation found in the numerous requests for manumission made to British
[...], a query that relates to Zdanowski’s observation about an increase in
manumission requests during the collapse of the [...] 1930s. The results of
the query for expressions similar to the 2-gram “sufficient_food” reveal a
strong contextual relationship with [...], physical violence, deprivation of
food and clothing, as well as [...] or seizure of earnings. These common
motifs recur throughout the many hundred manumission requests documented in
our [...] corpus, highlighting the conditions under which such documents
presented to [...].¹⁶ Additionally, [...] 45 [...] 48 [...] (“each_season” and
“diving_every_year”) [...]. This particular query also reveals two artifacts
of the HTR (position 13: “my carnings”
16 Despite the closeness of this query to Zdanowski’s observation, we did not develop our
queries directly from the critical literature on enslavement, but rather from our selective
prior reading of the corpus combined with a list of results from a number of exploratory
cosine similarity queries. It is important to underscore that this kind of querying process
in a corpus will not reveal much of the critical vocabulary used by modern historians, but
returns a [...] reading of the terms used in the corpus.
󰆼 󰆘󰆓󰆔󰄀󰂶󰆕󰄀󰂶󰆖󰄀 or 󰆗󰄀 grams for the expression “s󰄀
cient_food” based on a word vector 󰃍󰃎 calculated using our  Manumission corpu󰂺
The model has been trained for 󰆔󰄀󰂹󰆔󰆘󰆓󰂶󰆕󰆓󰂶 6 word window with
negative sampling of 󰆘󰂺
Ranking n-gram Cosine
similarity
Ranking n-gram Cosine
similarity
1t_food 󰆞26 giving_me 󰆝󰂺󰆣󰆢󰆝󰆦󰆟󰆦󰆤󰆞󰆟󰆣
2 never_gave_me 󰆝󰂺󰆥󰆢󰆟󰆡󰆝󰆦󰆢󰆦󰆤󰆦 27 was_always_
illtreating_me
󰆝󰂺󰆣󰆡󰆢󰆢󰆤󰆟󰆤󰆣
󰆖 t_food_and_
clothing
󰆝󰂺󰆥󰆠󰆞󰆞󰆢󰆠󰆟󰆡󰆤󰆣 28 giving_me_much_
trouble
󰆝󰂺󰆣󰆡󰆞󰆟󰆦󰆞󰆞󰆥󰆦󰆤
4 i_therefore_managed 󰆝󰂺󰆤󰆦󰆥󰆦󰆟󰆡󰆤󰆝󰆝󰆢 29 wear 󰆝󰂺󰆣󰆠󰆦󰆢󰆠󰆟󰆠󰆟󰆟󰆟
5 enough_food 󰆝󰂺󰆤󰆦󰆞󰆝󰆦󰆡󰆣󰆥󰆟󰆡 󰆖󰆓 me_every_year 󰆝󰂺󰆣󰆠󰆟󰆤󰆦󰆣󰆝󰆢󰆝󰆠
6was_not_supplying 󰆝󰂺󰆤󰆡󰆥󰆢󰆞󰆤󰆡󰆡󰆢󰆠 󰆖󰆔 beating_me 󰆝󰂺󰆣󰆟󰆦󰆢󰆠󰆢󰆤󰆝󰆥󰆢
7 given_st_food 󰆝󰂺󰆤󰆡󰆤󰆝󰆟󰆣󰆞󰆥󰆢󰆤 󰆖󰆕 illtreat 󰆝󰂺󰆣󰆟󰆞󰆢󰆞󰆞󰆠󰆠󰆢󰆡
8 all_my_earnings 󰆝󰂺󰆤󰆞󰆥󰆝󰆞󰆤󰆠󰆝󰆤󰆠 󰆖󰆖 without_any_
reason
󰆝󰂺󰆣󰆞󰆣󰆣󰆦󰆢󰆦󰆤󰆦󰆟
9 and_clothing 󰆝󰂺󰆤󰆝󰆥󰆥󰆥󰆠󰆢󰆟󰆝󰆤 󰆖󰆗 was_taking_all 󰆝󰂺󰆣󰆞󰆡󰆟󰆟󰆞󰆥󰆡󰆞󰆡
󰆔󰆓 illtreating_me 󰆝󰂺󰆤󰆝󰆣󰆝󰆡󰆝󰆝󰆢󰆣󰆣 󰆖󰆘 take_my_earnings 󰆝󰂺󰆣󰆞󰆠󰆝󰆟󰆝󰆠󰆟󰆢󰆤
11 with_st_food 󰆝󰂺󰆤󰆝󰆠󰆦󰆢󰆡󰆠󰆤󰆤󰆡 󰆖󰆙 taking_all_my_
earnings
󰆝󰂺󰆣󰆞󰆞󰆞󰆤󰆟󰆤󰆝󰆠󰆞
12 was_illtreating 󰆝󰂺󰆣󰆦󰆣󰆡󰆡󰆞󰆣󰆞󰆠󰆠 󰆖󰆚 taking_my_
earnings
󰆝󰂺󰆣󰆞󰆝󰆦󰆝󰆤󰆤󰆞󰆞󰆣
󰆔󰆖 my_carnings 󰆝󰂺󰆣󰆥󰆣󰆤󰆝󰆤󰆡󰆟 󰆖󰆛 food_and_clothing 󰆝󰂺󰆣󰆞󰆝󰆢󰆡󰆤󰆥󰆤󰆥󰆢
14 not_giving_me 󰆝󰂺󰆣󰆤󰆢󰆞󰆣󰆟󰆢󰆡󰆝󰆢 󰆖󰆜 treat_me 󰆝󰂺󰆣󰆝󰆢󰆤󰆤󰆠󰆣󰆣󰆡󰆟
15 was_illtreating_me 󰆝󰂺󰆣󰆤󰆠󰆦󰆠󰆦󰆤󰆝󰆦󰆣 󰆗󰆓 clothes 󰆝󰂺󰆣󰆝󰆟󰆟󰆡󰆢󰆢󰆟󰆠󰆣
16 he_was_always 󰆝󰂺󰆣󰆤󰆠󰆟󰆡󰆟󰆥󰆥󰆥󰆤 41 ford 󰆝󰂺󰆣󰆝󰆝󰆥󰆤󰆤󰆟󰆝󰆠󰆢
17 cruelly 󰆝󰂺󰆣󰆤󰆞󰆡󰆣󰆥󰆣󰆞󰆟󰆠 42 me 󰆝󰂺󰆣󰆝󰆝󰆡󰆣󰆤󰆦󰆢󰆦󰆡
18 very_cruel 󰆝󰂺󰆣󰆤󰆞󰆡󰆟󰆟󰆦󰆠󰆦 󰆗󰆖 treating_me 󰆝󰂺󰆢󰆦󰆤󰆠󰆣󰆤󰆤󰆣󰆦󰆟
19 supplying_me 󰆝󰂺󰆣󰆣󰆦󰆤󰆟󰆞󰆡 44 illtreating 󰆝󰂺󰆢󰆦󰆡󰆞󰆦󰆥󰆥󰆠󰆝󰆦
󰆕󰆓 me_anything 󰆝󰂺󰆣󰆣󰆢󰆡󰆝󰆞󰆤󰆤󰆤󰆞 45 each_season 󰆝󰂺󰆢󰆦󰆟󰆥󰆥󰆥󰆢󰆤󰆝󰆡
21 beat_me 󰆝󰂺󰆣󰆢󰆦󰆞󰆥󰆠󰆥󰆟󰆣󰆡 46 was_always 󰆝󰂺󰆢󰆦󰆞󰆝󰆦󰆟󰆤󰆤󰆠󰆣
22 escape_from_him 󰆝󰂺󰆣󰆢󰆥󰆤󰆠󰆦󰆣󰆟󰆥󰆡 47 therefore_ran_
away_from
󰆝󰂺󰆢󰆦󰆝󰆢󰆦󰆥󰆥󰆠󰆤󰆟
󰆕󰆖 my_earnings 󰆝󰂺󰆣󰆢󰆥󰆞󰆣󰆠󰆤󰆤󰆦󰆠 48 diving_every_year 󰆝󰂺󰆢󰆥󰆠󰆣󰆝󰆢󰆢󰆟󰆝󰆣
24 give_me_anything 󰆝󰂺󰆣󰆢󰆣󰆡󰆦󰆞󰆡󰆠󰆡󰆞 49 clothing 󰆝󰂺󰆢󰆥󰆟󰆞󰆟󰆠󰆥󰆤󰆟󰆥
25 did_not_give 󰆝󰂺󰆣󰆢󰆠󰆤󰆥󰆝󰆥󰆝󰆢󰆤 󰆘󰆓 ghaus 󰆝󰂺󰆢󰆥󰆝󰆦󰆥󰆦󰆤󰆡󰆠
󰃍arning󰂶 si󰃎󰂶󰆗󰆔󰂹 “ford” 󰃍oo󰂶 si󰃎󰃎󰂶
corpus contains spelling errors that can impede human understanding of the
expression󰂶 generally speakin󰂶 the  approach detects them as belonging
to similar semantic context󰂺 We have found that  artifacts in the corpus
are not as distracting as we previously imagine󰂺
In this example, there is a relatively close semantic relationship between
the n-grams, but this is not always necessarily the case. Although the corpus
[...] manumission, it is not composed entirely of the manumission statements
mentioned by Zdanowski, as this query demonstrates. The many thousands
of other documents articulated other discourses about the institution of
slavery. [...] expressions underscores how repetitive, even formulaic, the
nature of these requests was. [...] therein, we have, [...], found a
particular “corner” of the model. Had we chosen [...] such as “food,”
“clothing,” or “diving,” the query would have returned a wider variety, and
perhaps not as obviously related or clear, set of related terms. While useful
for building semantic clusters in a corpus based on iterative, individual
query terms, cosine similarity’s measure of the position of various words in a
model can also be seen as somewhat [...], with limited discovery potential.
6.2 Vector Arithmetic with Multiple Query Terms
Since cosine similarity is calculated based on the value of a given string within
vector space, more complex and combined queries are also possible. Vector
arithmetic, for example, allows us to perform mathematical operations on
word vectors to explore relationships between a number of given query terms.
And importantly, as the term “arithmetic” suggests, the exploration is not
always an additive one, but works like boolean logic. By adding and subtracting
multiple vectors, semantic relationships and patterns can emerge at the
intersection of concepts of interest to the historian. Such relationships might
be thought of as combined or contrasting similarities that can yield more
complex relationships akin to analogy. Such an arithmetic approach could be
very useful in exploring concepts that have received limited attention by
historians in the critical literature, say, the gendered aspects of manumission
in the Gulf.
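The mechanics of such a query can be sketched as follows: add and subtract the query vectors, then rank the vocabulary by cosine similarity to the resulting vector. This is a plain Python illustration, not the wordVectors API; the helpers `combine` and `rank` are hypothetical, and the toy vectors are invented so that the three dimensions loosely stand for maritime, domestic, and authority contexts.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def combine(vectors, plus, minus):
    """Sum the vectors of the `plus` terms and subtract those of the `minus` terms."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for term in plus:
        result = [r + x for r, x in zip(result, vectors[term])]
    for term in minus:
        result = [r - x for r, x in zip(result, vectors[term])]
    return result

def rank(vectors, target):
    """Order all terms by cosine similarity to an arbitrary target vector."""
    scored = [(term, cosine(target, vec)) for term, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented toy vectors: (maritime, domestic, authority).
toy_vectors = {
    "boat": [1.0, 0.0, 0.1],
    "house": [0.0, 1.0, 0.1],
    "master": [0.3, 0.3, 0.9],
    "nakhuda": [0.9, 0.0, 0.6],
    "husband": [0.0, 0.9, 0.6],
}
boat_side = rank(toy_vectors, combine(toy_vectors, ["boat", "master"], ["house"]))
house_side = rank(toy_vectors, combine(toy_vectors, ["house", "master"], ["boat"]))
print(boat_side[0][0], house_side[0][0])
```

Swapping which term is added and which is subtracted moves the target vector to a different region of the space, so the two rankings differ even though the same three query terms are involved.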
For example, in Table 2, the operation combining both subtraction and
addition (“boat” − “house” + “master”) [...] (“boat”) [...] (“master”), [...]
(“house”). [...], in Table 3, (“house” − “boat” + “master”) [...] “house”
combined with “master,” [...] “boat.” Here, the operation of subtraction [...]
subtracted term, [...]. In Tables 2 and 3, we see that the order of the query
terms in vector arithmetic [...], creating a list of words and cosine
similarities that illustrate the [...].

slavery in the Gulf, as detailed above in Section 4. The experiences of
enslaved [...]. Women were predominantly employed in domestic service, while
men were more commonly engaged in labor outside the household, particularly in
pearl diving. This gen[...] corpus, most of which were submitted by pearl
divers.
In Tables 2 and 3, any number of query words could have been chosen,
but we use “house” and “boat,” suggesting the gendered quality of spaces,
allowing us to investigate how these contexts intersect with the concept of a
[...], represented by the term “master,” [...] frequently used by enslaved
persons requesting manumission to refer to slave owners. The comparison
indicates the diverging experiences of male and
Table 2  Top results for the vector operation “boat” − “house” + “master,”
highlighting terms associated with maritime labor and authority in the context
of enslavement. The model has been trained for 1-[...]: 150 [...], 20 [...],
6 word window with negative sampling of 5.

Word <chr>   Similarity to “boat” − “house” + “master” <dbl>
boat         0.6147923
master       0.5823759
beat         0.5259384
nakhuda      0.4721254
dhahi        0.4298578
sailed       0.4278552
jolly        0.4163437
lauuch       0.4066766
bankes       0.4001707
diving       0.3976721
female enslaved persons. The term “beating” appearing in the “boat” context
suggests that “master” was frequently mentioned alongside physical violence
in relation to men working on boats. In contrast, in the “house” context, the
term “husband” emerges, [...]ized power dynamics for women.
It is worth mentioning that for the two arithmetic operations, there is a
slightly higher overall cosine similarity score for “house” − “boat” +
“master,” suggesting a richer relationship between these terms than for the
second set. The choice of query terms is crucial in such an inquiry since it
situates us in the zones of the model where related terms lie. The choice of
“house,” “boat,” and “master” would almost certainly not lead us directly to
legal, administrative, or diplomatic discussions of manumission, but as we saw
in Section 6.1, they situate us squarely in the part of the corpus with the
manumission requests. In a corpus that captures a range of voices and
discourses, vector arithmetic is invaluable for identifying localized semantic
“neighborhoods” [...], such as we have seen here with domestic or maritime
settings. However, understanding the broader relationships between these
clusters requires an approach that can map complex, [...] associations. By
utilizing principal component analysis (PCA) in the next section, we can
extend this exploration to visualize how multiple terms and their contextual
neighborhoods interact
Table 3  Top results for the vector operation “house” − “boat” + “master,”
highlighting terms associated with domestic labor and authority in the context
of enslavement. The model has been trained for 1-[...]: 150 [...], 20 [...],
6 word window with negative sampling of 5.

Word <chr>   Similarity to “house” − “boat” + “master” <dbl>
house        0.6893260
master       0.6671767
husband      0.5175228
naster       0.5140424
mistress     0.5006217
died         0.4928051
death        0.4899177
serri        0.4820387
hasir        0.4780189
grandmother  0.4758750
within a larger semantic space. This method allows us to observe not only how
individual “neighborhoods” are organized but also how they overlap and
contrast, [...] corpus and highlighting both expected and novel connections.
6.3 Terms Associated with Multiple Keywords with PCA
When analyzing a corpus, certain concepts might be nuanced or [...], [...]
(Schmidt and [...] 2021). [...] perspectives within a corpus. One last example
of how we use the wordVectors package to explore complex word relationships in
our custom model is by plotting terms associated with multiple keywords using
Principal Component Analysis (PCA). This approach has proven useful given the
heterogeneity and [...] quality of File 5. By using PCA to plot terms
associated with multiple input keywords, we can explore how these concepts
interact and overlap within the corpus. By examining these relationships, we
gain insights into the semantic connections between terms, enabling a richer
understanding of the corpus’s multi-layered nature.
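The projection step behind such a plot can be sketched without any plotting library. The example below is a hedged Python illustration, not the routine used by the wordVectors package: it centers a handful of invented four-dimensional toy vectors, estimates the two leading principal axes by power iteration (one simple way to approximate them), and returns two coordinates per term. The function names `leading_axis` and `pca_2d` are hypothetical.

```python
import math

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def leading_axis(matrix, steps=300):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    vec = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary, non-uniform start
    for _ in range(steps):
        nxt = matvec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(a * b for a, b in zip(vec, matvec(matrix, vec)))
    return vec, eigenvalue

def pca_2d(vectors):
    """Project each term's vector onto the first two principal axes."""
    terms = list(vectors)
    dim = len(vectors[terms[0]])
    means = [sum(vectors[t][j] for t in terms) / len(terms) for j in range(dim)]
    centered = {t: [x - m for x, m in zip(vectors[t], means)] for t in terms}
    cov = [[sum(centered[t][i] * centered[t][j] for t in terms) / (len(terms) - 1)
            for j in range(dim)] for i in range(dim)]
    axis1, value1 = leading_axis(cov)
    # Deflate: remove the first axis so the next power iteration finds the second.
    rest = [[cov[i][j] - value1 * axis1[i] * axis1[j] for j in range(dim)]
            for i in range(dim)]
    axis2, _ = leading_axis(rest)
    return {t: (sum(a * b for a, b in zip(centered[t], axis1)),
                sum(a * b for a, b in zip(centered[t], axis2))) for t in terms}

# Invented toy vectors for two loose groups of terms.
toy_vectors = {
    "nakhuda": [0.9, 0.1, 0.2, 0.0],
    "diving": [0.8, 0.0, 0.1, 0.1],
    "boat": [1.0, 0.1, 0.0, 0.1],
    "powers": [0.1, 0.9, 0.0, 0.2],
    "signatory": [0.0, 1.0, 0.1, 0.1],
    "regulations": [0.1, 0.8, 0.2, 0.0],
}
coords = pca_2d(toy_vectors)
for term, (x, y) in coords.items():
    print(f"{term:12s} {x:+.3f} {y:+.3f}")
```

In this toy setup the first axis separates the two groups of terms, which is the kind of left/right contrast a PCA plot makes visible. The sign of each axis is arbitrary, so only the relative positions are meaningful.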
Figure 2 [...]: “nakhuda” ([...] boat captain), “powers,” “weapons,” “tc,”
“clerk,” and “mistress.” For example, the term “powers,” chosen as one of the
search terms, is used by colonial forces to refer to themselves, and it can be
found in the upper right quadrant of the plot. [...], we see related terms like
“power,” “undertake,” “regulations,” “signatory powers,” and “limits,” [...]
and sovereignty. While this terminology has not appeared in earlier analyses
in this article, it is prominent in File 5, where the colonial understanding
of enslavement intersects with the economic and political concerns of
empire. By selecting terms that represent key topics in the corpus, this type
of analysis and visualization highlights the nuances and interconnectedness
of these concepts.
As noted earlier in the discussion of cosine similarity queries, the keywords
for this analysis were not selected based on critical literature about
enslavement in the Gulf. Instead, they emerged through iterative cycles of
close and distant reading of File 5. For demonstration purposes, we chose
terms that produced a clear and evenly distributed arrangement in Figure 2.
This distri[...], with one notable exception. In the center-right area of the
plot, the terms “tc” and “weapons” appear much closer to each other. The term
“weapons,” in fact, is partially obscured by the red vector line and its
related terms, emphasizing that these two keywords are semantically the most
connected in this particular query.
Figure 2  [...] (“nakhuda” [upper left], “powers” [upper right], “weapons”
[center right], “tc” [center right], “clerk” [bottom center], “mistress”
[lower left]) in red and their contextual associations in black plotted in a
two-dimensional space, generated using PCA on our custom trained model. [...].
The model has been trained for 2-[...]: 150 [...], 20 [...], 6 word
window with negative sampling of 5.
This proximity suggests a meaningful relationship in the corpus between the
[...] further research exploration.
Another observation about the PCA plot concerns the overall orientation of
its axes. [...] patterns in the data. In Figure 2, clusters on the right side
of the plot largely correspond to colonial infrastructure, Western
sovereignty, and the arms and slave trades. In contrast, the left side
contains terms related to the lived experiences of enslaved individuals in the
Gulf. This dichotomy provides insight into the contrasting discursive
landscapes within the corpus.
The distinct clusters in Figure 2 were chosen deliberately to illustrate
key discursive “neighborhoods” in the data. By using multiterm queries, we
can uncover nuanced relationships between terms, [...] understanding of how
concepts are interconnected and contributing to broader themes in the debates
surrounding slavery. This exploratory approach, like those discussed in
Sections 6.1 and 6.2, reveals both familiar and unexpected patterns, deepening
our interpretation of File 5. Had we selected rare or [...] less informative.
By carefully choosing keywords, we can identify meaningful semantic clusters,
which serve as starting points for further comparative close reading. This
iterative process of “reading” and “rereading” enables researchers to navigate
[...] concepts in the corpus, ultimately bridging the diverse [...].
6.4 Discussion
Our research into the manumission and slavery documents from the Gulf
region underscores the complexities and limitations inherent in applying
computational methods to [...], multilingual corpora. Such documents, which
include a wealth of entities from [...] languages and explore topics both
inside and outside Western perspectives, present unique challenges. [...],
drawing on decades of digital humanities expertise in Western environments, are
not always easily or directly transferable to such contexts. This disparity is
[...]: the digital infrastructures and resources that support [...]
computational analysis at scale (HTR models, [...], pre-trained language
models) are more advanced in Western contexts than in the Islamicate world and
other non-Western regions.
In this article we have used HTR to create a corpus and then used a single R
package written by digital humanists, demonstrating three analytical querying
approaches with custom models. The package uses a “bag of words” approach
to custom train its model. Our approach should be considered, therefore, an
initial and exploratory one for understanding the larger digitization of the
[...] by the [...]. [...] available within the same package, but there are
other methods for working with word vectors including dynamic vectors, for
example, if we are interested in semantic change over time, or even others
that stretch beyond the word2Vec methodology ([...] 2022). [...] seem
desirable, our corpus in its current state does not have enough data yet
[...].¹⁷
This disparity becomes particularly evident when considering the scale and
scope of our corpus. While computational methods like word vector analysis
are powerful when applied to large datasets, our corpus is relatively small
compared to those used in Western digital humanities projects. This limitation
has implications both for the kinds of analysis we can perform and the
conclusions we can draw. Had we been working with a much larger document
collection or a more homogeneous corpus, the patterns and associations we
could uncover might be more robust and revealing. Scaling up the corpus,
however, may not [...] concerning slavery and manumission. The idea of scale
is not only complex for discourses of manumission but might even be
impractical for the way we traditionally conceive many projects in the
historical humanities.
The creation of a humanities corpus is an expensive process, especially
in terms of human labor. In our research, corpus creation not only involved
digitizing documents, but also making critical decisions about how to process
them and choosing the best methods for their analysis. This challenge is
ampli[...] sources, where the methods for text creation do not always lend
themselves to straightforward querying or analysis. [...]. [...], where data
can often be collected in large quantities through automated processes,
humanities research frequently involves manual data collection and curation
from discrete archival collections. The question of scaling humanities data is
an important one, but it lies behind [...] article. HTR with [...] transformer
models allows us to create much more text for analysis than bespoke models,
and yet custom word vector models allow us to [...], including the artifacts
of a scaled HTR process. Although the future of computational methods used
in archives is uncertain, it is tempting to suggest that a similar combination
of methods will become popular in historical research: text creation as a more
generic (and perhaps [...]) process and iterative analysis by researchers
using scripting languages such as Python or R.
We discussed the issue of multilingual corpora above, but the process of
working with the [...] computational processes has also raised many questions
of method for us.
󰆞󰆤 Code available her󰂹 http󰂹󰃈󰃈ithu󰂺o󰃈󰄀󰄀󰃈󰄀
Data󰂺
While it may be possible to bring together very large collections of multiple
millions of words for certain moments of time about the Gulf region, there
is neither consistency nor representativeness within those archival data. How
[...] different kinds of voices? How can we be sure that any chosen model’s
transcription of typewritten and handwritten materials is of equal quality?
And the list of questions goes on. As we keep these questions in mind, we also
contend that computational methods in historical research should not be judged
solely by whether they produce anticipated results; rather, their value often
lies in the insights gained through experimentation, even when unexpected
outcomes arise. In our study, each analytical stage has provided new
understandings that inform our approach, [...], digitization, transcription,
and broader infrastructure design for future research. These iterative
adjustments are essential in developing robust digital resources and [...]
File 5.
7 Future Works and Conclusion
While the documents we have selected provide insight into the history of
manumission and slavery in the Gulf region, they represent just one part of a
[...] Records (IOR). [...] perspective among many, [...].
We are fortunate to have the [...] for the study of the historical Gulf from
the perspective of the [...]. It contains a wealth of information about how the
British were trying to make sense of questions such as manumission and its
position within Islamic society. The [...] materials also provide perspectives
through which this moment in Gulf history can be studied from the perspective
of coloniality.
Some doubt remains about using the method we have discussed here at
scale. While working on a robust HTR model for Arabic remains an important
objective, [...] contents of the [...] archives given the method we have laid
out in this article. The [...] is indeed one of the largest online
repositories of digitized material in [...], but since so much of the [...]
material contained in it was removed before the collections were transferred
back to London in the twentieth century, it is an open question whether a
custom word vector model would be a feasible approach. An alternative approach
using [...]-created text and [...] vector models may need to be adopted. It is
also unclear whether [...] the corpus.
[...], the method we propose of creating a corpus using HTR from
the digitized [...] collections is not only transferable to other [...] or
[...]. In our case, we believe that there is promise in identifying other
[...] interest that are topically connected, such as those about pearling and
piracy. We expect that the topical diversity within them will be similar to
that of the results discussed in Section 6.3, but will provide new vantage
points from which to explore more deeply the interconnected discourses of
coloniality in the Gulf with computational methods.
Appendix 
  󰆘󰄍     
copies located in the Qatar Digital Library.
Title of  document Document 
󰆘󰃈󰆔󰆓󰆗󰂶󰄍
trade correspondence
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈
vdc_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆙󰆛󰄍 Manumission of slaves
on Arab Coast: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆘
󰆘󰃈󰆔68 󰄍
Arab Coast: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆙
󰆘󰃈󰆔󰆙󰆛󰄍 Manumission of slaves
on Arab Coast: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆛
File 5󰃈󰆔󰆙󰆛󰄍 Manumission of slaves
on Arab Coast: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈8󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆜
󰆘󰃈󰆔󰆛󰆖󰃍󰆖󰆔󰃎󰄍
slaves at Kuwait
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔87 󰄍
slave trade
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔88 󰂶 189 󰄍
as a result of slaves taking refuge in
consulates and agencies; manumission
of slaves and general treatment of slave
trade cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆜󰆓󰄍 Manumission of slaves at
Muscat: individual cases
http󰂹󰃈󰃈w󰂺qd󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆜󰆓󰄍 Manumission of slaves
at Muscat: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈8󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆜󰆓󰄍 Manumission of slaves at
Muscat: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆜󰆓󰄍
Muscat: individual cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆜󰆔󰄍
and Indians on the Mekran Coast and
exporting them for sale at Oman and
Trucial Coast
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆕
󰆘󰃈󰆔󰆜󰆔󰄍 Individual slavery cases http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆖
󰆘󰃈󰆔󰆜󰆔 Individual slavery cases http󰂹󰃈󰃈w󰂺d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆘
󰆘󰃈󰆔󰆜󰆖 󰃍󰆖󰆛󰃎󰄍 http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆚
󰆘󰃈󰆔󰆜󰆖 󰃍󰆖󰆛󰃎󰄍 http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆙
󰆘󰃈󰆔󰆜󰆖 󰃍 5󰆘󰃎󰄍
Persian Gulf
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈archiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆜
󰆘󰃈󰆔94 󰂶 195 󰂶 179 󰂶 169 󰂶 󰆔󰆓󰆗 󰄍
Kidnapping of individuals; manumission
of slaves at Kuwait and Bushire;
miscellaneous slavery cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔94 󰂶 195 󰂶 179 󰂶 169 󰂶 󰆔󰆓󰆗 󰄍
Kidnapping of individuals; manumission
of slaves at Kuwait and Bushire;
miscellaneous slavery cases
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓󰆓
File 5󰃈󰆔97 󰄍
Sharja and Henja
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔98 󰄍
Trucial Coast
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆔󰆜󰆛󰂶 19󰆜󰂶󰆕󰆓󰆓󰄍
persons on the Trucial Coast; purchase
and export of slaves from the Trucial
Coast; Saudi Government’s regulations

http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆕󰆓󰆔󰄍
rules relating to cases arising out of the
pearling industry
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆙 󰄍
general rules and procedure on slave

http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆙5 󰄍
emancipated slaves and proposal to
󰂶 Oman
ports and Zanzibar
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
󰆘󰃈󰆚󰆗󰄍
authorities of surrendering fugitive
slaves
http󰂹󰃈󰃈w󰂺 d󰂺󰃈󰃈rchiv󰃈󰆛󰆔󰆓󰆘󰆘󰃈dc
_1󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆓󰆔󰆜󰆖󰂺󰆓󰆓󰆓󰆓󰆓
Appendix 
We trained four Word Vector Models with the parameters described in the table below.
These models are available at https://doi.org/10.5281/zenodo.14192194.
Name of model                 n-gram  Dimensions  Iterations  Window  Negative samples  Total training words
HTR997K1g150d20i6w5ns.bin     1       150         20          6       5                 914596
HTR997K2g150d20i6w5ns.bin     2       150         20          6       5                 857994
HTR997K2g150d20i6w15ns.bin    2       150         20          6       15                820391
HTR997K3g150d20i6w5ns.bin     3       150         20          6       5                 741542
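The model file names appear to encode the training parameters listed in the table: the number before "g" matches the n-gram column, and the numbers before "d", "i", "w", and "ns" match dimensions, iterations, window, and negative samples. The following is a minimal Python sketch, under that assumption, for recovering the parameters from a downloaded file name. The naming pattern is inferred from the four names above, not documented by the authors, and the names ModelParams and parse_model_name are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class ModelParams(NamedTuple):
    """Training parameters as listed in the appendix table."""
    ngram: int
    dimensions: int
    iterations: int
    window: int
    negative_samples: int


# Assumed naming pattern, inferred from the four file names in the table:
# HTR997K<n>g<dims>d<iters>i<window>w<neg>ns.bin
_NAME_RE = re.compile(
    r"^HTR997K(?P<ngram>\d+)g(?P<dimensions>\d+)d(?P<iterations>\d+)i"
    r"(?P<window>\d+)w(?P<negative_samples>\d+)ns\.bin$"
)


def parse_model_name(name: str) -> ModelParams:
    """Split a model file name into its encoded training parameters."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model file name: {name!r}")
    # Every captured group is a run of digits, so int() is safe here.
    return ModelParams(**{key: int(value) for key, value in match.groupdict().items()})
```

For example, parse_model_name("HTR997K2g150d20i6w15ns.bin") yields an n-gram setting of 2, 150 dimensions, 20 iterations, a window of 6, and 15 negative samples, which agrees with the third row of the table.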