Available via license: CC BY 4.0
Published with license by Koninklijke Brill
This is an open access article distributed under the terms of the CC BY 4.0 license.
Exploring Gulf Manumission Documents with Word Vectors
Suphan Kirmizialtin
suphan@nyu.edu
David Joseph Wrisley
Corresponding author
djw12@nyu.edu
Received 7 Accepted 26
Abstract
In this article we analyze a corpus related to manumission and slavery in the Arabian Gulf in the late nineteenth and early twentieth centuries [...] Handwritten Text Recognition [...] R/15/1/199 File 5 [...] of 977K words, it contains a variety of perspectives on manumission and slavery in the region, from manumission requests to administrative documents relevant to colonial approaches to the institution of slavery. We use word2Vec with the wordVectors package in R to highlight how the method can uncover semantic relationships within historical text, demonstrating some exploratory semantic queries, investigation of word analogies, and vector operations using the corpus content. We argue that advances in applied computer vision such as HTR are promising for historians working in colonial archives, and that while our method is reproducible, there are still issues related to language representation and limitations of scale within smaller datasets. [...] corpus creation is labor intensive, word vector analysis remains a powerful tool of computational analysis for corpora where error is present.
Keywords
Handwritten Text Recognition; word vector models; [...] Records; manumission; Gulf Studies; colonial archives; slavery
1 Introduction
Persian Gulf archival records, shaped by colonial processes of collection and preservation, are scattered across global institutions [...] asymmetries in power that continue to determine which narratives remain accessible. Amid this fragmented and politically charged archival landscape, the Qatar Digital Library stands as a critical access point for Gulf histo[...]
Within this expansive digital repository, File 5 [...] Slave Trade [...] comprising approximately 1[...] pages related to manumission and slavery in the Gulf during the late nineteenth and early twentieth centuries (hereafter [...]), including manumission requests recorded by British [...] window into the lived experiences of servitude as well as the colonial discourse surrounding liberation.
Responding to scholarly calls, such as those by Zdanowski, [...] engagement with manumission documents to uncover embedded forms of violence and power, we have created a corpus of File 5 using Handwritten Text Recognition (HTR) technology. [...] vector analysis to map thematic structures across the corpus and close reading to explore contextual layers within individual narratives, our approach bridges computational and traditional textual analysis. In balancing these computational and interpretive methods, we follow an approach informed by existing scholarship on close and distant reading, where iterative engagement with the text and the computational model helps illuminate both patterns [...] and nuanced narratives.
This dual approach allows us to examine the language of manumission with a [...] advancing the study of Gulf histories within this complex, distributed archival framework.
[...] Gulf region, see [...]
2 Word Vectors for the Study of Historical Corpora
In this article we employ word vector analysis to examine our collection of historical documents. Word vector analysis is a computational technique that creates mathematical representations of individual words as vectors in a continuous vector space derived from the corpus. This approach approximates a “spatial analogy to the relationship between words” [...]: words appearing in similar contexts tend to have similar or related meanings ([...]erheul [...]) [...] between words without incorporating external knowledge about their semantics. It is important to note that while similar vector contexts can indicate similar meaning, proximity in a word vector model does not always imply synonymity; it might also indicate antonyms, abbreviations, or other instances of words that appear in similar contexts.
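The distributional idea behind these vectors can be made concrete with a small sketch. Assuming a skip-gram setup (one of the two training architectures word2Vec offers), every word is paired with the words that fall within a fixed window around it, and the model learns vectors that predict those pairs, so words that repeatedly share contexts end up close together. The function `context_pairs` and the sample sentence are hypothetical illustrations, not part of the wordVectors package.

```python
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return every (target, context) pair within `window` positions.

    Illustrative sketch only: word2Vec additionally shrinks the window at
    random for each target word and subsamples very frequent words.
    """
    pairs = []
    for i, target in enumerate(tokens):
        # Clip the window at the start and end of the sentence.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical sentence modeled on the formulaic phrasing of the requests.
sentence = "my master was not giving me food".split()
for pair in context_pairs(sentence, window=2)[:4]:
    print(pair)
```

With `window=2`, “master” is paired with “my,” “was,” and “not.” A wider window tends to admit looser, more topical neighbors, while a narrower one tends to favor near-synonyms and spelling variants.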
In our study [...] (Schmidt and [...]) [...] Women’s [...]. Although [...] vector models are available for analyzing text collections in various languages (Pennington [...]), [...]istics of our historical corpus lend themselves to [...] model[...]. The [...] space, validating it, making queries based on semantic relationships, investigating word analogies, and performing vector operations using our colonial corpus content. We will elaborate on these operations later; however, it is important to note that, like many other digital historians, we use this method for exploratory data analysis [...] [...]d’s limitations for similar corpora.
Such exploration requires familiarity with both the topical content of the corpus and the computational parameters of its creation, enabling a deeper investigation into the interrelated concepts it contains.
3 Source Material
The archival collections at the heart of our research are sourced from the extensive [...] held in London by the British Library, now digitized [...] Sarah Connell and the others in digital scholarship at the [...] the context of their Advanced Topics in the Digital Humanities seminar [...] and made publicly available by the [...], our focus lies on the records of the British Residency in the Persian Gulf that pertain to slavery and manumission in the region. [...] these documents provide insights into manumission as well as the experiences of enslaved or indentured individuals in the Gulf region during a time of significant [...]. The collection includes manumission requests as well as a variety of administrative records related to the slave trade, [...] regulation procedures concerning manumission, and correspondence detailing the fates of freed individuals. They also contain a vast array of additional documentation contextualizing the complex institution of slavery and its integral role within the fabric of the Indian Ocean world in general and the Arabian Peninsula in particular.
The manumission statements, a key source in this collection, were typically created through a process shaped by the constraints and practices of colonial administration. Most applicants were illiterate and communicated their situations orally, typically in Arabic, to assistants at British agencies or consulates across the region. The assistants would then translate the statements into [...] applicant’s place of birth and origin, followed by a brief account of their enslavement [...]. These [...] typically validated by the applicant’s thumbprint as a mark of authenticity.
It is crucial to acknowledge that these documents are heavily mediated through the colonial lens. [...] people come through [...], who recorded and translated accounts using formulaic language and selecting information deemed relevant to colonial administration. The power dynamics embedded in the creation and preservation of these documents often obscure the full experiences of enslaved individuals, imposing silences and reinforcing the priorities of colonial authorities.
[...] bleeding in, especially evident in typewritten text, to others left intentionally blank, likely to prevent ink transfer between pages. The documents’ complex layout [...] Appendix [...]. For more on the records, see “The Political Residency, Bushire”.
4 Historical Background: Slavery in the Gulf
In the history of the Indian Ocean world, the phenomenon of slavery stands out for its complexity and persistence, particularly in the age of abolition. Despite the British Parliament’s [...] and subsequent naval patrols around the African coast to curb the shipments of enslaved people, the trade in the Indian Ocean world not only persisted but in some areas [...]. This surge was partly due to [...] rerouting their operations through the Mozambique Channel into the Indian Ocean zone [...]. It was also due to the rise of plantation economies in Zanzibar and [...] regarding slavery, Campbell writes: “During the late imperialist surge in the Indian Ocean World [...] Moreover, ‘liberated’ slaves were a potentially vital source of both taxation and manpower under colonial regimes governed by precepts of [...]. A colonial priority was thus to transform the local working population into a free force [...] elite whose assistance was required to administer the colony.” [...] “These [...] when added to those on British, Dutch, French, and Portuguese slave trading within the Indian Ocean noted earlier [...]”
Sample pages from the volume “Manumission of slaves at Muscat” [...]. Depicted here are [...] 14951r [...]. [...] Government License.
Pemba, operated predominantly by Omani Arabs with slaves brought from the [...]
The Gulf region, interconnected with the broader Indian Ocean world, experienced its particular dynamics of slavery and manumission against the backdrop of economic and colonial pressures of the nineteenth and early twentieth centuries. [...] economic and social fabric of the region, particularly the pearl diving industry. [...] enslaved men were utilized in diverse capacities, as mercenaries, agricultural hands, and other forms of manual labor. [...] on the other hand, primarily served as domestic servants and concubines, highlighting the dependence on enslaved labor within both the economic and domestic spheres of the Gulf societies, where general emancipation did not take place until the [...] century.
As evident from the information above, the questions of slavery and manumission in the Gulf region are complex and [...]. Thomas McDow points out that the status of slaves in this particular context cannot be fully understood through a simple “slave versus free dichotomy,” as the enslaved people “existed in hierarchies of dependency.” The rights and circumstances of [...] performed [...] independence or full freedom due to the obligations their masters had towards them, shaped by Islamic and local custom, whereas plantation slaves were rarely emancipated. Many enslaved individuals sought to improve their social [...]. For some, remaining a slave client under a powerful patron was economically and socially more advantageous than achieving complete freedom (McDow [...]). [...]plex and [...] as the institution of slavery itself (Bishara [...]).
Islamic teachings emphasize the moral and religious virtues of freeing slaves, and the practice of manumission was deeply embedded in the region’s culture [...].
Interestingly, [...] transportation of at least 82 and perhaps as many as 8[...] [...]”
Zdanowski estimates that enslaved individuals made up approximately 1[...] of the Gulf’s population [...].
Since individuals freed by British authorities were not emancipated according to Islamic law and prevailing social convention, they were often perceived as merely being purchased by colonial authorities. Consequently, these freed individuals were sometimes referred to as “slaves of the government” or “slaves of the Consul.”
Another issue complicating manumission practices was the diverse origins of enslaved people in the Gulf. In manumission statements, individuals were required to identify their ethnic or geographic origin, often revealing three main categories: those kidnapped from British colonies, those originating from [...], and those born into slavery to parents who might have been born as free people but who had been kidnapped or brought to the Gulf from the former two categories. This distinction was crucial [...] for those who could prove their origins in British colonies. However, for individuals from protectorates and those born into slavery, the situation was more complex, as their status did not automatically guarantee manumission.
Throughout the period under study here, [...] striking agreements with local authorities to curtail slave imports and advocating for individual cases of emancipation. However, [...] implement abolition, recognizing the importance of these leaders as allies. The British were concerned that pushing too hard for abolition could incite political unrest and rebellion against these local authorities, thereby destabilizing the region.
In the Trucial Coast [...] the advent of cultured pearls [...] the lives of slaves, prompting many to seek manumission. There was a sudden [...], with the overwhelming majority of applications coming from men employed on pearling boats [...]. Their primary complaints included inadequate food and clothing provided by their masters, being forced to dive, and not receiving payment for their work. [...] provided emancipated individuals with some protection and a means to [...] records make mention of enslaved people from Baluchistan and India [...] the Gulf for the years between 1921 and 194[...] seek assistance against [...] or mistreatment. British policy was not aimed at dismantling the institution of slavery in the region as a whole. Instead, the gradual abolition of slavery in the Gulf unfolded over the early to [...] century.
5
[...] File 5 involved transforming the digitized documents available at the Qatar Digital Library into computable plain text and then analyzing them with the computational technique of word vectors. Three critical aspects of our documentary base are essential to highlight in this context: the discursive diversity of documents, their scale, and their multilingual nature. These factors are crucial [...] from selecting the appropriate models and determining the parameters for training the word vector models to interpreting our results. The pipeline developed for [...] yet it is presented here in detail to facilitate adaptation to analogous scenarios, highlighting the versatility of [...] and computational analysis frameworks in historical research.
5.1 HTR for the Automated Recognition of Documents
In digital humanities, the transition from traditional archival research to computational analysis of digitized archival documentation presents numerous opportunities and challenges. The potential of algorithmic reading and sum[...] given the complexities of the documents at hand and the relative immaturity of digital archival systems. Our project to [...] File 5 occupies a critical position in this transition. [...]ing the archives of the [...], comparatively little attention was given to preparing them as fully document[...].
The volumes of manumission material in our corpus, like many historical archives, are an assemblage of various types of documentation. Browsing them is akin to reading a scrapbook: a heterogeneous collection of materials loosely connected by subject, yet presented in idiosyncratic layouts, in both handwritten and typewritten formats, and across multiple languages. [...] typewritten documents have been processed using Optical Character Recognition [...] page level for browsing. This is not the case for the handwritten pages.
To transform these historical documents into computationally tractable text, the most practical technology that we had at our disposal for archival documents was HTR. For this project, we employed HTR technology available via the Transkribus platform.
The choice of Transkribus was a pragmatic one. Recent advancements in Transkribus [...] that was, until recently, out of reach. Transkribus has introduced ‘super models’ powered by transformer architecture, enabling the [...]ing of docu[...]. As the Transkribus user community has historically been focused on archives [...], these super models have been trained on vast [...]. This allowed us to process File 5, the lion’s share [...] both in the original and in translation.
Prior to the advent of these advanced models, transcribing multilingual archives would have necessitated the creation of bespoke [...] models customized to accommodate the diverse handwriting styles and the mixture of handwritten and printed documents in our collection. Language detection would have had to be integrated into layout analysis in a computationally expensive process. The introduction of [...] general models has streamlined such research, enabling us to transcribe all of the [...] language documents in our collection simultaneously and rather quickly, though not without challenges related to transcription accuracy and error management.
Our choice to work with [...] from other projects that focus on creating very clean text with bespoke training. While we recognize that bespoke models potentially yield higher accuracy rates in the transcription output, [...] scholarly value in using general models, especially when working with large [...].
The distinction between digitized and [...] documents is crucial for anyone interested in digital humanities methods, particularly for scholars working with Arabic or other [...] script writing systems. Achieving computer readability means that each word and character string within the documents can be processed computationally, as opposed to merely being displayed as static digital images.
In this study, we utilize the general model named Text Titan [...] available to us at the time of writing; this model was trained on historical documents in German, French, Dutch, Finnish, Swedish, [...] spanning from the 16th to the 21st century. Text Titan [...] corpora [...] text analysis, which prioritizes the exploration of patterns and themes within the corpus over perfect accuracy in text capture, with analytical [...] (Cordell and Smith [...]).
At the same time, we recognize the potential downstream challenges of using general models. The transcription output generated by [...] corpus can be employed for distant reading and conceptual analysis. Two key issues in this regard warrant mention at the outset, as they shape our approach to the corpus; additional considerations will emerge in the analysis below.
First, the process of corpus creation using HTR involves [...] or unsupervised steps that predict the words on the page, sometimes resulting in misspellings, particularly of proper names ([...]apan; Zhou [...]). [...] orthography in File 5 is inherently unstable. To analyze [...], it is essential to employ text analysis methods that are not reliant on strict string matching, accounting for both historically [...] spelling and transcription artifacts introduced by HTR.
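Word vectors are one such method; approximate string matching is another, simpler one. The sketch below uses the `difflib` module from Python's standard library to map misrecognized tokens that appear later in this article (“naster,” “carnings,” “ford”) onto a small, hypothetical reference vocabulary. It is a swapped-in illustration, not a step in the pipeline described here, and the `cutoff` value is an assumption.

```python
import difflib
from typing import List

# Hypothetical reference vocabulary; a real one would be drawn from the corpus.
VOCABULARY = ["master", "mistress", "earnings", "house", "husband", "food"]


def normalize(token: str, vocabulary: List[str], cutoff: float = 0.8) -> str:
    """Return the closest vocabulary entry, or the token itself if none is close.

    `cutoff` is difflib's similarity-ratio threshold, between 0 and 1.
    """
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


for noisy in ["naster", "carnings", "ford"]:
    print(noisy, "->", normalize(noisy, VOCABULARY))
```

At a cutoff of 0.8, “ford” is left unchanged because its similarity ratio to “food” is only 0.75; lowering the cutoff to 0.7 maps it to “food,” but also raises the risk of merging genuinely distinct words, which is why a single threshold is hard to choose for an orthographically unstable corpus.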
An additional challenge lies in the page layout, abbreviation, and hyphenation present in the manumission documentation. Our corpus includes various types of documents, many of which are handwritten in an informal style. Common words, titles, and metrological terms are frequently abbreviated, often inconsistently. Moreover, clerks copying handwritten documents broke words at syllable boundaries, sometimes inserting hyphens and other times not [...]. As a result, the corpus contains numerous orthographic variances, errors, and fragmented words. The implications of these issues are addressed in the analysis section below.
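Post-transcription correction was not part of the workflow for File 5, but one of the simplest such corrections can be sketched for comparison: rejoining words broken with a trailing hyphen at a line end. The sketch assumes the transcription is exported as plain text with one line per transcribed line; the regular expression and the sample string are hypothetical.

```python
import re

# A word fragment, a hyphen at the end of a line, then the continuation.
LINE_END_HYPHEN = re.compile(r"(\w+)-\n(\w+)")


def rejoin_hyphenated(text: str) -> str:
    """Join words that were split across lines with a trailing hyphen.

    Limitations: true compounds that happen to break at a line end are
    also joined, and breaks written without a hyphen are not detected.
    """
    return LINE_END_HYPHEN.sub(r"\1\2", text)


sample = "he was not giving me suffi-\ncient food and cloth-\ning"
print(rejoin_hyphenated(sample))
```

Because clerks sometimes broke words without any hyphen, a rule like this would recover only part of the fragmentation described above.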
Second, as of the time of writing, the Transkribus user community includes very few groups working with Arabic text, with only one Arabic public model currently available [...]; the off-the-shelf transformer models used to create our corpus of File 5 were not
[...] approaches are sometimes aligned with techniques such as tagging and lemmatization. However, the impact of applying these techniques to text generated by HTR, which can be messy and inconsistent, is not yet fully understood [...] methods on text [...].
At [...] Abu Dhabi, we have accumulated ground truth for a public handwritten Arabic model. We have been combining crowdsourcing approaches with synthetic data for its creation. See [...] and [...] Abu Dhabi Working [...].
trained on Arabic material. Consequently, pages written in Arabic or containing Arabic segments remain untranscribed. Although the absence of transcription might seem problematic for an archive containing Arabic material, the distant reading approach adopted for File 5 mitigates this issue to some extent. Many of the Arabic texts in our corpus [...] allowing the [...] they are missed in Arabic. However, even if the Arabic texts were transcribed, additional challenges would arise when working with multilingual corpora, including the training of word vector models. These issues, which extend beyond Arabic text, are discussed further in [...].
These challenges highlight the broader implications of working with general models for multilingual corpora. The model used to transcribe [...] training set, resulting in inconsistent transcriptions for other languages present in the collection. For example, while the model captures some French and German words found in the British material, it frequently misinterprets Arabic handwritten text as French. While setting a frequency threshold during [...] words, such inconsistencies in transcription still pose challenges for querying [...].
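A frequency threshold of this kind can be sketched as follows. word2vec-style training typically discards tokens below a minimum count before building its vocabulary; the threshold of 5 and the token list are hypothetical, with “quelle” standing in for Arabic handwriting misread as French.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(tokens: List[str], min_count: int = 5) -> Dict[str, int]:
    """Keep only tokens that occur at least `min_count` times.

    Rare misrecognitions fall below the threshold, but a systematic error
    that recurs often enough will still enter the vocabulary.
    """
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if n >= min_count}


tokens = ["master"] * 6 + ["food"] * 5 + ["naster"] * 2 + ["quelle"]
print(sorted(build_vocabulary(tokens)))
```

Lowering `min_count` to 2 would admit “naster,” which illustrates the trade-off: a higher threshold removes more noise but also more genuine rare words, such as infrequent proper names.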
Had we opted for bespoke models, some of the issues highlighted above, such as handling abbreviations and hyphenated words, might have been mitigated through approaches like the “smart model” strategy [...] or post-processing techniques. However, the size of the corpus made such post-transcription corrections impractical, both in terms of time and resources. Given the sheer volume of material and our decision not to pursue extensive editing, conducting word vector analysis directly on the output [...]. To extract meaningful insights and use the imperfect outputs for historical interpretation, it remains essential to understand both the complexities of the archival materials and the inherent imperfections of the trained word vector models. These issues will be explored further in [...].
5.2 Training Word Vector Models with the HTR Output
[...] documents into computer-readable [...], processed and exported by the system volume by volume, as detailed in Appendix [...], we custom-trained Word Vector Models [...]. This approach diverges from many projects that rely on [...] embeddings like [...], which are optimized for modern language tasks. Scholars have highlighted the enduring relevance of static embedding models such as word2Vec for digital humanities research, particularly in tracing the evolution of ideas and language over time (Ehrmanntraut [...]). [...] the particular needs of our project, where understanding the historical speci[...]
Several factors informed our decision to train a custom word vector model. First, the historical context of these documents required an approach tailored to our corpus, as embeddings based on modern language were less suitable. Second, the corpus contains numerous proper names of persons and places from the Gulf region, which are transliterations from Arabic script and exhibit orthographic instability. Capturing these was essential to the topical analysis of slavery and manumission. Finally, a custom word vector model was also necessary to address abbreviations and truncated words, as discussed in [...].
With more than 1[...] pages (excluding blank ones), File 5 represents a substantial corpus for traditional historical reading. The output for the [...] pages alone resulted in approximately 97[...]. However, as an experiment in [...] training and analysis, this corpus falls at the lower end of what is typically recommended for generating stable embeddings and conducting longitudinal studies, such as tracing conceptual history or semantic change ([...]evers and [...]). Moreover, creating a word vector model for a corpus is not a [...] process; instead, [...] parameters. Our research on training word vector models for output from colonial [...] of evidence within the data. Put another way, we paid particular attention to word vector models that responded to typical queries [...] models surface. Whereas there are a fair number of possible parameters in the wordVectors package, some of our most interesting results were found by
Blank pages were not processed with Transkribus, nor were pages [...]. Some [...]
Detailed information on the parameters used for model training can be found in the documentation at httpdrithumschmidordVectorarain_word2vec
tml. A lesson for an analogous word2Vec method in Python has been published by the Programming Historian, including a discussion of parameters (Blankenship [...]).
varying the training [...] using [...], which, in the end, allowed us to access [...]. For example, we found that training the model for [...] was particularly useful for word groupings common in the narrative and formulaic manumission requests, or for Arabic idafa constructions. By contrast, [...] words provided access to a more conceptual, abstract vocabulary found in administrative and political documents of the collection.
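The underscore-joined tokens in the query results reported below (for example “never_gave_me” and “diving_every_year”) indicate that frequent word sequences were bundled into single tokens before training. To our knowledge, the wordVectors package exposes a `bundle_ngrams` argument in `prep_word2vec` for this purpose; the sketch below is not that implementation but a simplified, hypothetical scoring rule in the spirit of word2vec's phrase detection: a pair (a, b) is merged when (count(a b) - delta) / (count(a) * count(b)) exceeds a threshold. Running it twice lets bigrams grow into trigrams.

```python
from collections import Counter
from typing import List


def bundle_bigrams(tokens: List[str], delta: int = 1, threshold: float = 0.1) -> List[str]:
    """Merge strongly associated adjacent pairs into underscore-joined tokens.

    score(a, b) = (count(a b) - delta) / (count(a) * count(b))
    `delta` discounts pairs that co-occur only once by chance. Simplified
    sketch: production phrase detectors also scale the score by corpus size.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            a, b = tokens[i], tokens[i + 1]
            score = (bigrams[(a, b)] - delta) / (unigrams[a] * unigrams[b])
            if score > threshold:
                merged.append(f"{a}_{b}")
                i += 2  # skip both halves of the merged pair
                continue
        merged.append(tokens[i])
        i += 1
    return merged


text = "never gave me food never gave me clothes master beat me".split()
first_pass = bundle_bigrams(text)
second_pass = bundle_bigrams(first_pass)
print(first_pass)
print(second_pass)
```

The first pass produces “never_gave”; the second joins it with “me” to form “never_gave_me,” the kind of formulaic phrase that recurs across the manumission requests.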
For this study, we have trained multiple models using the same 977K-word corpus. We have listed those models and their parameters in Appendix [...]. These variations allowed us to examine how different [...]. In the following section, we discuss querying these models to illustrate salient observations.
6 Observations and Results: Querying the Models
The wordVectors package enables querying the word vector models trained on one’s own corpora and facilitates the exploration of the corpus and the variety of common contexts for words found therein. [...]cantly faster and involves fewer steps than the previous two steps of generating textual transcriptions with HTR and training the word vector model. In other words, once the initial steps of transcription and training are complete, itera[...]. It is worth highlighting again, however, [...] which one can query the word vector models depend on the historian’s knowledge of the corpus and the complexity of the discourses it contains. Querying a model, in other words, is ideally aligned with informed historical inquiry. Moreover, experimenting with training parameters can also sometimes pro[...], yielding results that are not easily interpretable. [...] analysis has some notable limitations. First, an instance of any given [...] in a word vector model is a highly abstract concept [...] [...]onyms or fail to account for nuanced distinctions present in the original text. Additionally, once a word vector model is created, the ability to contextualize terms through direct reading, as provided by a [...] approach, is lost.
For researchers accustomed to wildcard searching in database querying, a [...]. However, when combined with string searching and other natural language processing techniques, querying a word vector model can become a powerful method for conducting distant conceptual reading.
We emphasize the importance of “learning to query” a word vector model because the [...]. These models encourage historians [...] critically engaging with forms of visualization to uncover insights and identify new terms for exploration. In the sections that follow, we discuss three key analytic frameworks for understanding and working with word vector models: cosine similarity, vector arithmetic, and principal component analysis (PCA).
6.1 Cosine Similarity of Query Terms
One of the simplest queries to perform with a word vector model involves measuring the cosine similarity of words or n-grams. Cosine similarity, represented as a value between -1 and 1, [...] set of terms. High cosine similarity in a corpus does not necessarily imply a [...] between strings.
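Cosine similarity is the cosine of the angle between two vectors: their dot product divided by the product of their lengths. The wordVectors package provides this kind of ranking through functions such as `closest_to`; the stand-alone sketch below computes it directly for hypothetical three-dimensional vectors and ranks the remaining words by similarity to a query.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|), ranging from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest(query: str, vectors: Dict[str, Vector], n: int = 3) -> List[Tuple[str, float]]:
    """Rank the other words in `vectors` by cosine similarity to `query`."""
    target = vectors[query]
    scores = [(word, cosine(target, vec)) for word, vec in vectors.items() if word != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:n]


# Hypothetical 3-dimensional vectors; trained models typically use 100 or more.
TOY = {
    "food": [0.9, 0.1, 0.0],
    "clothing": [0.8, 0.2, 0.1],
    "beat_me": [0.5, 0.7, 0.0],
    "boat": [0.0, 0.1, 0.9],
}

for word, score in closest("food", TOY):
    print(f"{word:10s} {score:.3f}")
```

Unlike this sketch, tools such as `closest_to` may return the query term itself at the top of the list with a similarity of 1.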
In the query demonstrated in [...], we look at a trope of food and resource deprivation found in the numerous requests for manumission made to British [...], a query that relates to Zdanowski’s observation about an increase in manumission requests during the collapse of the [...]. The results of the query for expressions similar to the n-gram “st_food” reveal a strong contextual relationship [...] with physical violence, deprivation of food and clothing, as well as [...] or seizure of earnings. These common motifs recur throughout the many hundreds of manumission requests documented in our corpus, highlighting the conditions under which such documents were presented to [...]. Additionally, [...] “each_season” and “diving_every_year” [...]. This particular query also reveals two artifacts of the [...] (position 1[...]) “my carnings” [...]
Despite the closeness of this query to Zdanowski’s observation, we did not develop our queries directly from the critical literature on enslavement, but rather from our selective prior reading of the corpus combined with a list of results from a number of exploratory cosine similarity queries. It is important to underscore that this kind of querying process in a corpus will not reveal much of the critical vocabulary used by modern historians, but returns a reading of the terms used in the corpus.
[...] or n-grams for the expression “s[...]cient_food” based on a word vector calculated using our Manumission corpus. The model has been trained [...] 6-word window, with negative sampling of [...].
Ranking  n-gram                        Ranking  n-gram
1        t_food                        26       giving_me
2        never_gave_me                 27       was_always_illtreating_me
3        t_food_and_clothing           28       giving_me_much_trouble
4        i_therefore_managed           29       wear
5        enough_food                   30       me_every_year
6        was_not_supplying             31       beating_me
7        given_st_food                 32       illtreat
8        all_my_earnings               33       without_any_reason
9        and_clothing                  34       was_taking_all
10       illtreating_me                35       take_my_earnings
11       with_st_food                  36       taking_all_my_earnings
12       was_illtreating               37       taking_my_earnings
13       my_carnings                   38       food_and_clothing
14       not_giving_me                 39       treat_me
15       was_illtreating_me            40       clothes
16       he_was_always                 41       ford
17       cruelly                       42       me
18       very_cruel                    43       treating_me
19       supplying_me                  44       illtreating
20       me_anything                   45       each_season
21       beat_me                       46       was_always
22       escape_from_him               47       therefore_ran_away_from
23       my_earnings                   48       diving_every_year
24       give_me_anything              49       clothing
25       did_not_give                  50       ghaus
arning si “ford” oo si
[...] corpus contains spelling errors that can impede human understanding of the expression; generally speaking, the approach detects them as belonging to similar semantic contexts. We have found that artifacts in the corpus are not as distracting as we previously imagined.
In this example, there is a relatively close semantic relationship between the [...], but this is not always necessarily the case. Although the corpus [...]ing manumission, it is not composed entirely of the manumission statements mentioned by Zdanowski, as this query demonstrates. The many thousands of other documents articulated other discourses about the institution of slavery. [...] underscores how repetitive, even formulaic, the nature of these requests was. [...] therein, we have found a particular “corner” of the word vector model. Had we chosen [...] such as “food,” “clothing,” or “diving,” the query would have returned a wider variety, and perhaps not as obviously related or clean a set of related terms. While useful for building semantic clusters in a corpus based on iterative, individual query terms, cosine similarity’s measure of the position of various words in a word vector model can also be seen as somewhat [...] with limited discovery potential.
6.2 Vector Arithmetic with Multiple Query Terms
Since cosine similarity is calculated based on the value of a given string within vector space, more complex and combined queries are also possible. Vector arithmetic, for example, allows us to perform mathematical operations on word vectors to explore relationships between a number of given query terms. And, importantly, as the term “arithmetic” suggests, the exploration is not always an additive one, but works like Boolean logic. By adding and subtracting multiple vectors, semantic relationships and patterns can emerge at the intersection of concepts of interest to the historian. Such relationships might be thought of as combined or contrasting similarities that can yield more complex relationships, akin to analogies. Such an arithmetic approach could be very useful in exploring concepts that have received limited attention by historians in the critical literature, say, the gendered aspects of manumission in the Gulf.
For example, in the operation combining both subtraction and addition, “boat” - “house” + “master” [...] “boat” [...] “master” [...] “house” in “house” - “boat” + “master” [...] “house” combined with “master” [...] “boat”. Here, the operation of subtraction [...] subtracted term.
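Operations of this kind can be sketched as adding and subtracting vectors and then ranking the vocabulary by cosine similarity to the result. In wordVectors such a query is commonly written as a formula passed to `closest_to`, for instance `closest_to(model, ~"boat" - "house" + "master")`; the Python below is a stand-alone illustration with hypothetical three-dimensional vectors whose axes loosely stand for maritime, domestic, and authority contexts. Query terms are deliberately left in the results, as in the tables that follow.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical vectors; axes loosely read as (maritime, domestic, authority).
TOY = {
    "boat": [1.0, 0.0, 0.1],
    "house": [0.0, 1.0, 0.1],
    "master": [0.1, 0.1, 1.0],
    "nakhuda": [0.8, 0.0, 0.6],
    "husband": [0.0, 0.8, 0.6],
    "diving": [0.9, 0.1, 0.0],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def combine(vectors: Dict[str, Vector], positive: Sequence[str], negative: Sequence[str]) -> Vector:
    """Sum the vectors of the positive terms and subtract those of the negative terms."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for word in positive:
        result = [r + x for r, x in zip(result, vectors[word])]
    for word in negative:
        result = [r - x for r, x in zip(result, vectors[word])]
    return result


def closest_to_vector(target: Vector, vectors: Dict[str, Vector], n: int = 3) -> List[Tuple[str, float]]:
    """Rank every word, query terms included, by cosine similarity to `target`."""
    scores = [(word, cosine(target, vec)) for word, vec in vectors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:n]


maritime = combine(TOY, positive=["boat", "master"], negative=["house"])
domestic = combine(TOY, positive=["house", "master"], negative=["boat"])
print([w for w, _ in closest_to_vector(maritime, TOY)])
print([w for w, _ in closest_to_vector(domestic, TOY)])
```

Swapping which of “boat” and “house” is subtracted flips the neighborhood from maritime terms (here “nakhuda”) to domestic ones (here “husband”), which is the kind of contrast the two vector operations are meant to expose.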
In Tables 2 and 3, we see that the order of the query terms in vector arithmetic [...], creating a list of words and cosine similarities that illustrate the [...] slavery in the Gulf, as detailed above in [...]. The experiences of enslaved [...] Women were predominantly employed in domestic service, while men were more commonly engaged in labor outside the household, particularly in pearl diving. This gendered [...] corpus, most of which were submitted by pearl divers.
In Tables 2 and 3, any number of query words could have been chosen, but we use “house” and “boat,” suggesting the gendered quality of space, allowing us to investigate how these contexts intersect with the concept of a [...] represented by the term “master” [...], frequently used by enslaved persons requesting manumission to refer to slave owners. The comparison indicates the diverging experiences of male and
Table 2: Top results for the vector operation “boat” - “house” + “master”, highlighting terms associated with maritime labor and authority in the context of enslavement. The model has been trained [...] 6-word window, with negative sampling of [...].

Word
boat
master
beat
nakhuda
dhahi
sailed
jolly
lauuch
bankes
diving
female enslaved persons. The term “beating” appearing in the “boat” context suggests that “master” was frequently mentioned alongside physical violence in relation to men working on boats. In contrast, in the “house” context, the term “husband” emerges [...]ized power dynamics for women.
It is worth mentioning that, for the two arithmetic operations, there is a slightly higher overall cosine similarity score for “house” - “boat” + “master”, suggesting a richer relationship between these terms than for the second set. The choice of query terms is crucial in such an inquiry, since it situates us in the zones of the word vector model where related terms lie. The choice of “house,” “boat,” and “master” would almost certainly not lead us directly to legal, administrative, or diplomatic discussions of manumission, but, as we saw in [...], they situate us squarely in the part of the corpus with the manumission requests. In a corpus that captures a range of voices and discourses, vector arithmetic is invaluable for identifying localized semantic “neighborhoods,” such as we have seen here with domestic or maritime settings. However, understanding the broader relationships between these clusters requires an approach that can map complex associations. By utilizing principal component analysis (PCA) in the next section, we can extend this exploration to visualize how multiple terms and their contextual neighborhoods interact
Top results for the vector operation “house”“boat”“maste” highlighting terms
associated with domestic labor and authority in the context of enslavemen The
model has been trained for 6 word window
with negative sampling of
Similarity to “house”“boat”“master”
house
master
husband
naster
mistress
died
death
serri
hasir
grandmother
within a larger semantic spac This method allows us to observe not only how
individual “neighborhoods” are organized but also how they overlap and con
tras
corpus and highlighting both expected and novel connection
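The vector operations behind the two tables above can be illustrated outside R. The following is a minimal Python sketch, not the wordVectors implementation used in this article: the four-dimensional vectors in `TOY_VECTORS` are invented for demonstration, and the helper names `cosine` and `analogy` are hypothetical. It adds the vectors of the positive terms, subtracts the negative term, and ranks every vocabulary word by cosine similarity to the resulting target vector. As in the tables, the query words themselves are not excluded from the ranking.

```python
import math

# Hypothetical four-dimensional vectors, invented for illustration only.
# A real model (e.g. one trained with word2vec) would supply these.
TOY_VECTORS = {
    "boat":     [0.9, 0.1, 0.3, 0.0],
    "house":    [0.1, 0.9, 0.3, 0.0],
    "master":   [0.4, 0.4, 0.9, 0.2],
    "nakhuda":  [0.8, 0.0, 0.7, 0.1],
    "diving":   [0.9, 0.0, 0.1, 0.0],
    "mistress": [0.0, 0.8, 0.7, 0.1],
    "husband":  [0.1, 0.7, 0.6, 0.3],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, plus, minus, top_n=5):
    """Rank every word by similarity to sum(plus) - sum(minus)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in plus:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in minus:
        target = [t - x for t, x in zip(target, vectors[word])]
    scored = [(word, cosine(target, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# "boat" - "house" + "master"
for word, score in analogy(TOY_VECTORS, plus=["boat", "master"], minus=["house"]):
    print(f"{word:10s} {score:.3f}")
```

This sketch adds raw vectors; some implementations normalize each vector to unit length before the arithmetic, which can change the ranking, so results from a real model depend on that choice.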
6.3 Terms Associated with Multiple Keywords with PCA
When analyzing a corpus, certain concepts might be nuanced or […] (Schmidt and Connell) […] perspectives within a corpus. One last example of how we use the wordVectors package to explore complex word relationships in our custom model is by plotting terms associated with multiple keywords using Principal Component Analysis (PCA). This approach has proven useful given the heterogeneity and variable quality of the corpus. By using PCA to plot terms associated with multiple input keywords, we can explore how these concepts interact and overlap within the corpus. By examining these relationships, we gain insights into the semantic connections between terms, enabling a richer understanding of the corpus’s multilayered nature.
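To make the projection step concrete, here is a minimal, dependency-free Python sketch of PCA, assuming the keyword and neighbor vectors have already been gathered into a dictionary. It is not the routine used to produce Figure 2: it mean-centers the vectors, builds their covariance matrix, finds the top two principal components by power iteration with deflation, and projects each word onto them. The toy vectors and all function names are hypothetical.

```python
import math
import random

# Hypothetical 3-d vectors for two invented clusters (illustration only).
TOY_VECTORS = {
    "nakhuda":  [2.0, 0.1, 0.0],
    "boat":     [1.8, -0.1, 0.1],
    "diving":   [2.1, 0.0, -0.1],
    "mistress": [-2.0, 0.2, 0.0],
    "house":    [-1.9, -0.2, 0.1],
    "husband":  [-2.1, 0.0, -0.1],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_center(rows):
    """Subtract column means so PCA describes variance around the centroid."""
    dim = len(rows[0])
    means = [sum(row[j] for row in rows) / len(rows) for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def covariance(centered):
    n = len(centered)
    dim = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def power_iteration(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    vec = [rng.uniform(0.1, 1.0) for _ in range(len(matrix))]
    for _ in range(iterations):
        product = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(product, product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
    eigenvalue = dot(vec, [dot(row, vec) for row in matrix])
    return vec, eigenvalue


def pca_2d(words, vectors):
    """Project the given words onto the first two principal components."""
    centered = mean_center([vectors[w] for w in words])
    cov = covariance(centered)
    pc1, var1 = power_iteration(cov)
    # Deflation: remove the first component so the second can be found.
    size = len(cov)
    deflated = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    pc2, var2 = power_iteration(deflated, seed=1)
    coords = {w: (dot(row, pc1), dot(row, pc2)) for w, row in zip(words, centered)}
    return coords, (var1, var2)


coords, variances = pca_2d(list(TOY_VECTORS), TOY_VECTORS)
for word, (x, y) in coords.items():
    print(f"{word:10s} PC1={x:+.2f} PC2={y:+.2f}")
print("variance along PC1, PC2:", variances)
```

Because the sign of a principal component is arbitrary, which cluster ends up on the left or right of such a plot can differ between implementations; what PCA preserves is the separation along each axis, not its direction.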
Figure 2 plots terms associated with six keywords: “nakhuda” (boat captain), “powers”, “weapons”, “tc”, “clerk”, and “mistress”. For example, the term “powers”, chosen as one of the search terms, is used by colonial forces to refer to themselves, and it can be found in the upper right quadrant of the plot. There, we see related terms like “power”, “undertake”, “regulation”, “signatory power”, and “limit”, […] and sovereignty. While this terminology has not appeared in earlier analyses in this article, it is prominent in the parts of the corpus where the colonial understanding of enslavement intersects with the economic and political concerns of empire. By selecting terms that represent key topics in the corpus, this type of analysis and visualization highlights the nuances and interconnectedness of these concepts.
As noted earlier in the discussion of cosine similarity queries, the keywords for this analysis were not selected based on critical literature about enslavement in the Gulf. Instead, they emerged through iterative cycles of close and distant reading of the corpus. For demonstration purposes, we chose terms that produced a clear and evenly distributed arrangement in Figure 2. This distribution holds with one notable exception. In the center-right area of the plot, the terms “tc” and “weapons” appear much closer to each other. The term “weapons”, in fact, is partially obscured by the red vector line and its related terms, emphasizing that these two keywords are semantically the most connected in this particular query.
Figure 2 Keywords “nakhuda” [upper left], “powers” [upper right], “weapons” [center right], “tc” [center right], “clerk” [bottom center], and “mistress” [lower left] in red and their contextual associations in black, plotted in a two-dimensional space generated using PCA on our custom-trained model. The model has been trained for […] with a 6-word window and negative sampling of […].
This proximity suggests a meaningful relationship in the corpus between the two terms, one that warrants further research and exploration.
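Visual proximity in a two-dimensional PCA plot can be checked against the full vector space. The following Python sketch uses invented vectors for the six keywords in Figure 2 (the numbers are hypothetical and do not come from our model) and ranks every keyword pair by cosine similarity in the original dimensions. The names `KEYWORD_VECTORS` and `rank_keyword_pairs` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical vectors for the six Figure 2 keywords; the numbers are
# invented and do not come from the model discussed in this article.
KEYWORD_VECTORS = {
    "nakhuda":  [0.9, 0.1, 0.0, 0.2],
    "powers":   [0.0, 0.9, 0.3, 0.1],
    "weapons":  [0.1, 0.5, 0.9, 0.0],
    "tc":       [0.2, 0.4, 0.9, 0.1],
    "clerk":    [0.3, 0.6, 0.1, 0.7],
    "mistress": [0.7, 0.0, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_keyword_pairs(vectors):
    """Return every unordered keyword pair, most similar first."""
    pairs = [((a, b), cosine(vectors[a], vectors[b]))
             for a, b in combinations(sorted(vectors), 2)]
    pairs.sort(key=lambda item: item[1], reverse=True)
    return pairs


for (a, b), score in rank_keyword_pairs(KEYWORD_VECTORS)[:3]:
    print(f"{a} / {b}: {score:.3f}")
```

Distances in a PCA projection only approximate the full-dimensional geometry, so a pair that looks close on the plot is worth confirming in the original space this way.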
Another observation about the plot concerns the overall orientation of its axes, which reflects broader patterns in the data. In Figure 2, clusters on the right side of the plot largely correspond to colonial infrastructure, Western sovereignty, and the arms and slave trade. In contrast, the left side contains terms related to the lived experiences of enslaved individuals in the Gulf. This dichotomy provides insight into the contrasting discursive landscapes within the corpus.
The distinct clusters in Figure 2 were chosen deliberately to illustrate key discursive “neighborhoods” in the data. By using multiterm queries, we can uncover nuanced relationships between terms, enriching our understanding of how concepts are interconnected and contribute to broader themes in the debates surrounding slavery. This exploratory approach, like those discussed in the previous sections, reveals both familiar and unexpected patterns, deepening our interpretation of the corpus. Had we selected rare or less informative terms, the resulting clusters would likely have been less distinct. By carefully choosing keywords, we can identify meaningful semantic clusters, which serve as starting points for further comparative close reading. This iterative process of “reading” and “rereading” enables researchers to navigate concepts in the corpus, ultimately bridging the diverse voices and discourses it contains.
6.4 Discussion
Our research into the manumission and slavery documents from the Gulf region underscores the complexities and limitations inherent in applying computational methods to multilingual corpora. Such documents, which include a wealth of entities from multiple languages and explore topics both inside and outside Western perspectives, present unique challenges. Approaches drawing on decades of digital humanities expertise in Western environments are not always easily or directly transferable to such contexts. This disparity is reflected in the digital infrastructures and resources that support computational analysis at scale: HTR models and pre-trained language models are more advanced in Western contexts than in the Islamicate world and other non-Western regions.
In this article, we have used HTR to create a corpus and then used a single R package written by digital humanists, demonstrating three analytical querying approaches with custom-trained models. The package uses a “bag of words” approach to custom-train its model. Our approach should therefore be considered an initial and exploratory one for understanding the larger digitization of the India Office Records by the Qatar Digital Library. Other options are available within the same package, but there are other methods for working with word vectors, including dynamic vectors (for example, if we are interested in semantic change over time) or even others that stretch beyond the word2Vec methodology. While such approaches seem desirable, our corpus in its current state does not yet have enough data for them.
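The window and negative-sampling parameters reported in our table captions can be illustrated with a short sketch. The Python below is a simplified illustration of how word2vec-style training examples are formed, not the code that wordVectors runs: the hypothetical `context_pairs` pairs each token with its neighbors within a fixed window, and the hypothetical `negative_sampler` draws “negative” words in proportion to their frequency raised to the power 0.75, the smoothing proposed in the original word2vec work. The sample sentence is invented and not taken from the corpus.

```python
import random
from collections import Counter


def context_pairs(tokens, window):
    """Pair each token with every neighbor at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def negative_sampler(tokens, power=0.75, seed=0):
    """Return a function that draws negative words with P(w) ~ count(w)**power."""
    counts = Counter(tokens)
    vocab = sorted(counts)
    weights = [counts[w] ** power for w in vocab]
    rng = random.Random(seed)

    def sample(k, exclude):
        if all(w == exclude for w in vocab):
            raise ValueError("vocabulary has no word other than the excluded one")
        drawn = []
        while len(drawn) < k:
            word = rng.choices(vocab, weights=weights, k=1)[0]
            if word != exclude:  # skip the true context word
                drawn.append(word)
        return drawn

    return sample


# Hypothetical example sentence (not taken from the corpus).
tokens = "the nakhuda took him on the boat to the diving banks".split()
pairs = context_pairs(tokens, window=6)
sample = negative_sampler(tokens)
center, context = pairs[0]
print(center, context, sample(5, exclude=context))
```

The original word2vec implementation also shrinks the window by a random amount for each token, so nearer neighbors are used more often; this sketch keeps the window fixed for clarity. It also makes the data problem visible: a word that occurs only a few times yields only a few training pairs, which is one reason small corpora produce noisier vectors.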
This disparity becomes particularly evident when considering the scale and scope of our corpus. While computational methods like word vector analysis are powerful when applied to large datasets, our corpus is relatively small compared to those used in Western digital humanities projects. This limitation has implications both for the kinds of analysis we can perform and for the conclusions we can draw. Had we been working with a much larger document collection or a more homogeneous corpus, the patterns and associations we could uncover might be more robust and revealing. Scaling up the corpus, however, may not be straightforward for archival material concerning slavery and manumission. The idea of scale is not only complex for discourses of manumission but might even be impractical for the way we traditionally conceive many projects in the historical humanities.
The creation of a humanities corpus is an expensive process, especially in terms of human labor. In our research, corpus creation not only involved digitizing documents but also making critical decisions about how to process them and choosing the best methods for their analysis. This challenge is amplified with historical sources, where the methods for text creation do not always lend themselves to straightforward querying or analysis. Unlike fields where data can often be collected in large quantities through automated processes, humanities research frequently involves manual data collection and curation from discrete archival collections. The question of scaling humanities data is an important one, but it lies beyond the scope of this article. HTR with transformer models allows us to create much more text for analysis than bespoke models, and yet custom-trained word vector models allow us to analyze the resulting text, including the artifacts of a scaled process. Although the future of computational methods used in archives is uncertain, it is tempting to suggest that a similar combination of methods will become popular in historical research: text creation as a more generic and perhaps automated process, and iterative analysis by researchers using scripting languages such as Python or R.
We discussed the issue of multilingual corpora above, but the process of working with the archive through computational methods has also raised many questions of method for us.
Code available here: httpithuo
Data
While it may be possible to bring together very large collections of multiple millions of words for certain moments of time about the Gulf region, there is neither consistency nor representativeness within those archival data. How can we account for different kinds of voices? How can we be sure that any chosen model’s transcription of typewritten and handwritten materials is of equal quality? And the list of questions goes on. As we keep these questions in mind, we also contend that computational methods in historical research should not be judged solely by whether they produce anticipated results; rather, their value often lies in the insights gained through experimentation, even when unexpected outcomes arise. In our study, each analytical stage has provided new understandings that inform our approach to digitization, transcription, and broader infrastructure design for future research. These iterative adjustments are essential in developing robust digital resources and […]
7 Future Works and Conclusion
While the documents we have selected provide insight into the history of manumission and slavery in the Gulf region, they represent just one part of a much larger collection, the India Office Records, and one perspective among many.
We are fortunate to have the India Office Records for the study of the historical Gulf from the perspective of the British. The collection contains a wealth of information about how the British were trying to make sense of questions such as manumission and its position within Islamic society. The materials also provide perspectives through which this moment in Gulf history can be studied from the perspective of coloniality.
Some doubt remains about using the method we have discussed here at scale. While working on a robust model for Arabic remains an important objective, […] the contents of the archives given the method we have laid out in this article. The Qatar Digital Library is indeed one of the largest online repositories of digitized material in […], but since so much of the material contained in it was removed before the collections were transferred back to London in the twentieth century, it is an open question whether a custom word vector model would be a feasible approach. An alternative approach using […] created text and vector models may need to be adopted. It is also unclear whether […] the corpus […] the method we propose of creating a corpus using HTR from the digitized collections is transferable to other […]. In our case, we believe that there is promise in identifying other files of interest that are topically connected, such as those about pearling and piracy. We expect that the topical diversity within them will be similar to that of the results discussed in this article but will provide new vantage points from which to explore more deeply the interconnected discourses of coloniality in the Gulf with computational methods.
Bibliography
Allen, Richard B. “Slave Trading, Abolitionism, and ‘New Systems of Slavery’ in the Indian Ocean World.” In Indian Ocean Slavery in the Age of Abolition, edited by Robert Harms, Bernard K. Freamon, and David W. Blight. New Haven: Yale University Press.
Bishara, Fahad Ahmad. 2017. A Sea of Debt: Law and Economic Life in the Western Indian Ocean, 1780–1950. Cambridge: Cambridge University Press.
Blankenship, Avery, Sarah Connell, and Quinn Dombrowski. 2024. “Understanding and Creating Word Embeddings.” Programming Historian.
Bsheer, Rosie. Archive Wars: The Politics of History in Saudi Arabia. Stanford: Stanford University Press.
Campbell, Gwyn. “Servitude and the Changing Face of the Demand for Labor in the Indian Ocean World.” In Indian Ocean Slavery in the Age of Abolition, edited by Robert Harms, Bernard K. Freamon, and David W. Blight. New Haven: Yale University Press.
Cordell, Ryan, and David Smith. Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines. https://viraltexts.org.
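Ehrmanntraut, Anton, Thora Hagen, Leonard Konle, and Fotis Jannidis. 2021. “Type- and Token-Based Word Embeddings in the Digital Humanities.” In Computational Humanities Research 2021, edited by Maud Ehrmann, Folgert Karsdorp, Melvin Wevers, Tara Lee Andrews, Manuel Burghardt, Mike Kestemont, Enrique Manjavacas, Michael Piotrowski, and Joris van Zundert. Amsterdam: CEUR-WS.org.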
fastText. https://fasttext.cc.
Freamon, Bernard K. “Straight, No Chaser: Slavery, Abolition, and Modern Islamic Thought.” In Indian Ocean Slavery in the Age of Abolition, edited by Robert Harms, Bernard K. Freamon, and David W. Blight. New Haven: Yale University Press.
Gavin, Michael. 2019. “Is There a Text in My Data? (Part 1): On Counting Words.” Journal of Cultural Analytics.
Harms, Robert. “Introduction.” In Indian Ocean Slavery in the Age of Abolition, edited by Robert Harms, Bernard K. Freamon, and David W. Blight. New Haven: Yale University Press.
Hopper, Matthew S. 2015. Slaves of One Master: Globalization and Slavery in Arabia in the Age of Empire. New Haven: Yale University Press.
Kirilloff, Gabi. 2022. “Computation as Context: New Approaches to the Close/Distant Reading Debate.” College Literature 49 (1): 1–25.
[…], Fady. “Building Handwritten Ground Truth for […] with the Google Vision […] in Google Drive.” OpenGulf.
Kapan, Almazhan, Suphan Kirmizialtin, Rhythm Kukreja, […]. “[…] Collections from the Multilingual Persian Gulf.” In Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), edited by Karl Berglund, Matti La Mela, and Inge Zwart. Sweden.
McDow, Thomas F. “Deeds of Freed Slaves: […] Mobility in Zanzibar.” In Indian Ocean Slavery in the Age of Abolition, edited by Robert Harms, Bernard K. Freamon, and David W. Blight. New Haven: Yale University Press.
Abu Dhabi Working Group. “Arabic […].” HTR model. Transkribus.
Pedrazzini, Nilo, and Barbara McGillivray. “D[…] 19th-Century […].” [Data set]. Zenodo.
Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “GloVe: Global Vectors for Word Representation.” Accessed August 6. https://nlp.stanford.edu/projects/glove/.
Rabus, Achim. 2022. “Handwritten Text Recognition for Croatian Glagolitic.” Slovo 72: 181–192.
READ-COOP. 2024. “Introducing Transkribus Super Models: Get Access to the ‘Text Titan I’.” READ-COOP [blog].
Schmidt, Ben. 2015. “Vector Space Models for the Digital Humanities.” Bookworm [blog].
Schmidt, Ben, and Sarah Connell. “Word Vectors Visualization” [notebook].
“The Political Residency, Bushire.” Qatar Digital Library.
Underwood, Ted. 2019. Distant Horizons: Digital Evidence and Literary Change. Chicago: University of Chicago Press.
Verheul, Jaap, Hannu Salmi, Martin Riedl, Asko Nivala, Lorella Viola, Jana Keck, and Emily Bell. 2022. “Using Word Vector Models to Trace Conceptual Change over Time and Space in Historical Newspapers, 1840–1914.” Digital Humanities Quarterly 16 (2).
Wevers, Melvin, and Marijn Koolen. “Digital Begriffsgeschichte: Tracing Semantic Change Using Word Embeddings.” Historical Methods: A Journal of Quantitative and Interdisciplinary History 53 (4).
Women Writers Project. https://wwp.northeastern.edu.
Zdanowski, Jerzy. 2011. “The Manumission Movement in the Gulf in the First Half of the Twentieth Century.” Middle Eastern Studies 47 (6).
Zhou, […], Camille Lyans Cole, and […] Chen. 2024. “Basreh or Basra? Geoparsing Historical Locations in the Svoboda Diaries.” In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop).
Appendix
copies located in the Qatar Digital Library.
Title of document Document
trade correspondence
Manumission of slaves on Arab Coast: individual cases
68
Arab Coast: individual cases
Manumission of slaves on Arab Coast: individual cases
File 5 Manumission of slaves on Arab Coast: individual cases
slaves at Kuwait
87
slave trade
88 189
as a result of slaves taking refuge in consulates and agencies; manumission of slaves and general treatment of slave trade cases
Manumission of slaves at Muscat: individual cases
Manumission of slaves at Muscat: individual cases
Manumission of slaves at Muscat: individual cases
Muscat: individual cases
and Indians on the Mekran Coast and exporting them for sale at Oman and Trucial Coast
Individual slavery cases
Individual slavery cases
5
Persian Gulf
94 195 179 169
Kidnapping of individuals, manumission of slaves at Kuwait and Bushire, miscellaneous slavery cases
94 195 179 169
Kidnapping of individuals, manumission of slaves at Kuwait and Bushire, miscellaneous slavery cases
File 597
Sharjah and Henjam
98
Trucial Coast
19
persons on the Trucial Coast; purchase and export of slaves from the Trucial Coast; Saudi Government’s regulations
rules relating to cases arising out of the pearling industry
general rules and procedure on slave
5
emancipated slaves and proposal to
Oman
ports and Zanzibar
authorities of surrendering fugitive slaves
Appendix
We trained four word vector models with the parameters described in the table below.
These models are available at httpor28enod4192194
Name of model | n-gram | Dimensions | Iterations | Window | Negative samples | Total training words
HTR997K1g150d20i6w5ns.bin
HTR997K2g150d20i6w5ns.bin
HTR997K2g150d20i6w15ns.bin
HTR997K3g150d20i6w5ns.bin