Table 6 - uploaded by Ryder A Wishart

Source publication
The focus of this study is Hellenistic Greek, a variety of Greek that continues to be of particular interest within the humanities. The Hellenistic variety of Greek, we argue, requires tools that are specifically tuned to its orthographic and semantic idiosyncrasies. This paper aims to put available documents to use in two ways: 1) by describing...
Context in source publication
Context 1
... almost all of the recorded errors involved unique tokens. When the tokens with a frequency of 1 are removed from consideration, and the frequency of the erroneous terms is factored in, the error counts take on a slightly different significance, as seen in Table 6 (Error counts when factoring in token frequency). ...
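A minimal sketch of the reweighting described above, assuming hypothetical token frequencies and an illustrative set of erroneous tokens (none of the names or numbers below are drawn from the study): hapax legomena are dropped and each remaining error is weighted by how often its token occurs.

```python
from collections import Counter

# Hypothetical data: corpus token frequencies and the tokens recorded as
# erroneous (purely illustrative, not the study's actual error lists).
token_frequencies = Counter({"λογος": 120, "και": 950, "εγραψεν": 1, "ανηρ": 45})
erroneous_tokens = {"λογος", "εγραψεν"}

# Raw error count treats every erroneous token type equally.
raw_error_count = len(erroneous_tokens)

# Frequency-weighted count: exclude tokens with a frequency of 1 and
# weight each remaining error by the token's corpus frequency.
weighted_error_count = sum(
    token_frequencies[tok]
    for tok in erroneous_tokens
    if token_frequencies[tok] > 1
)

print(raw_error_count)       # 2
print(weighted_error_count)  # 120
```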
Similar publications
The present didactic intervention aims to highlight the effective use of electronic text corpora in the teaching of the Ancient Greek course. In particular, the teaching of a unit from the school textbook of the 3rd grade of lower secondary school is examined at the vocabulary and semantic level, using the Digital Resources (http://www...
Citations
... Latent Dirichlet Allocation (LDA) is based on the underlying assumptions of the distributional hypothesis (i.e., similar topics consist of similar terms) (Wishart and Prokopidis, 2017) and the statistical combination hypothesis (i.e., documents talk about several topics), for which a statistical distribution can be determined (Chen and Li, 2009). ...
Scientific research creates substantial quantities of peer-reviewed literature on a wide variety of constantly growing topics and sub-topics, and scientists and practitioners find it increasingly arduous to assimilate this massive collection of literature. This study explores topic modelling using Latent Dirichlet Allocation (LDA) as a form of unsupervised learning and applies it to the abstracts, keywords, and titles of 11,187 articles from the Web of Science database to provide maximum contextual analysis of hydrology and flood-related literature published since 1939. We identified several essential topics in the corpus. The work structured the body of literature into its principal components, giving researchers a quick grasp of which topics constitute hydrology and flood research. Relationships between specific individual topics were at times more prominent than others, with implications for the interdisciplinary character and extensive impact of topics such as 'Modeling and Forecasting'. In contrast, the topics 'Wetland and Ecology' and 'Urban Risk Management' are studied largely independently of other topics within hydrology and flood research. Trends in research themes were also identified: research on the topics 'Precipitation and Extremes' and 'Coastal hydrology' increased significantly, which was not the case for all topics in hydrology and flood research. In contrast to a manual literature review, this study used quantitative and qualitative methods and employed labelled topics to examine topic trends, inter-topic relationships, and topic diversity. This thesis's methodology and findings may be advantageous to scientists and researchers seeking contextual knowledge of the present state of flood-related literature. In the long run, we see topic modelling as a tool that will enhance the efficiency of literature reviews and scientific communication, and that can benefit science-informed legislation and decision making.
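A minimal sketch of the kind of LDA topic-modelling pipeline described above, here using scikit-learn; the toy abstracts, the number of topics, and all parameter values are assumptions for illustration, not the study's actual corpus or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-ins for article abstracts (the study used 11,187
# Web of Science records; these toy documents only show the mechanics).
abstracts = [
    "flood forecasting model rainfall runoff simulation",
    "wetland ecology habitat restoration river basin",
    "urban flood risk management drainage infrastructure",
    "precipitation extremes climate trend analysis",
]

# Bag-of-words representation of the documents.
vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(abstracts)

# Fit an LDA model; n_components is the assumed number of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic_distributions = lda.fit_transform(doc_term_matrix)

# Show the most probable terms for each inferred topic.
terms = vectorizer.get_feature_names_out()
for topic_idx, term_weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in term_weights.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")
```

In practice, the labelled topics and their trends would come from inspecting these per-topic term lists and the document-topic distributions over time.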
... At the same time, attention to the second direction may clarify this methodological development: that is, improving input data. There is a particular need to develop better tools for cleaning ancient-language texts more consistently, efficiently, and accurately, along with the simultaneous need to procure larger corpora (Wishart and Prokopidis 2017). Thirdly, the notion of lexical fields as a hierarchical, distributional superstructure for lexical semantics requires more nuance, especially in terms of modelling what Ruhl refers to as contextual domains, which represent the contextual factors (whether stereotypical or probabilistic) that motivate the usage of more and more specific lexical sub-fields (see discussion of the 'domain condition' in [Ruhl 1989:175]). ...
This paper argues that the underdeveloped notion of semantic similarity in Louw and Nida’s lexicon can be improved by taking account of distributional information. Their use of componential analysis relies on a set of metalinguistic terms, or components, that are ultimately arbitrary. Furthermore, both the polysemy within their semantic domains and the organization of those domains problematize their categories. By contrast, distributional data provide an empirical measurement of semantic similarity, and lexicogrammatical categorization provides a non-intuition-driven principle of classification. Distributional data are gathered by word embedding, and lexicogrammatical categorization is based largely on a derived metric of abstraction. This argument is tested by considering probable semantic field relationships for a number of Greek lexemes. Ultimately, this approach provides directions to address some of the critical weaknesses in semantic domain or semantic field theory as applied to the study of Hellenistic Greek, by introducing empirical means of approximating lexical fields.
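A minimal sketch of how distributional similarity between lexemes might be measured with word embeddings, here using gensim's Word2Vec on a toy corpus; the sentences, hyperparameters, and lexemes below are illustrative assumptions, not the paper's actual data or model.

```python
from gensim.models import Word2Vec

# Toy tokenized "corpus"; a real application would train on a large
# Hellenistic Greek corpus rather than these illustrative sentences.
sentences = [
    ["ανηρ", "λεγει", "λογον"],
    ["γυνη", "λεγει", "λογον"],
    ["ανηρ", "γραφει", "επιστολην"],
    ["γυνη", "γραφει", "επιστολην"],
]

# Train a small skip-gram model; all hyperparameters are placeholders.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

# Cosine similarity between distributional vectors serves as an
# empirical proxy for semantic similarity between lexemes.
print(model.wv.similarity("ανηρ", "γυνη"))
print(model.wv.most_similar("λεγει", topn=3))
```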
Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses at a scale and level of detail that are reshaping the humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.
The lexical and grammatical tradition within biblical studies leaves the interpretive guidelines for exegesis unformalized. Polysemy provides no direction in addressing this issue, but serves only to blur the distinction between the invariant meaning of linguistic signs and the contexts and co-texts that specify and constrain those invariant meanings. Rather than proliferating senses and functions, the minimalist priority of monosemy provides a better entry point into the task of modelling interpretive protocols, since it better enables empirical linguistic analysis. To this end I outline a robust theoretical basis, survey relevant works in the field, and through a case study of ἐν and its semantic field illustrate and explore the challenges and potential of empirical linguistic analysis of the biblical languages.