About
140 publications
32,905 reads
1,200 citations
Publications (140)
In this article, we describe the development of two syllabification systems for South African Sesotho. First, we implemented a rule-based syllabification system based on the syllabification rules proposed by Guma (An outline structure of Southern Sotho, 2nd edn, Shuter and Shooter Publishers, Pietermaritzburg, 1982). The rules describe the location...
It is with immense pride and anticipation that we introduce the fifth volume of the Journal of Digital Humanities Association of Southern Africa (JDHASA), centred on the theme “Digital Humanities for Inclusion.”
Based on new methods and algorithms to improve the detection of emotions from facial expressions employing sophisticated software during certain situations, we explore if a virtual robotic animal appearance design with the role of a virtual instructor can affect such emotions in users. A total of 131 students from two secondary public schools in Bo...
This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repos...
There has recently been an increasing interest in Learning Management Systems (LMSs). It is currently unclear, however, exactly how these systems are perceived by their users. This article analyzes data on user acceptance for two LMSs (Blackboard and Canvas). The respective data are collected using a questionnaire modeled after the Technology Accep...
The Third workshop on Resources for African Indigenous Languages (RAIL) was held (in person) on 30 November 2022 in Potchefstroom. RAIL was organized by the South African Centre for Digital Language Resources (SADiLaR). It was co-located with the 10th Southern African Microlinguistics Workshop, which took place 1 to 3 December 2022.
The editors are happy to present the first special issue of the Journal of the Digital Humanities Association of Southern Africa (DHASA). In this issue, we bring together articles from the field of Digital Humanities with the underlying theme “Crossroads DH”. Under this umbrella topic, we investigate the manifold connections of Digital Humanities w...
The Covid-19 pandemic, which required more people to work and learn remotely, emphasized the benefits of online learning. However, these online learning environments, which are typically used on an individual basis, can make it difficult for many to finish courses effectively. At the same time, online learning allows for the monitoring of users, wh...
The book XR Academia: Research and Experiences in Virtual Reality, Augmented Reality, Mixed Reality, and Artificial Intelligence in Latin America and Europe, has at its core the objective of making immersive technology accessible and visible worldwide, with the simultaneous breaking-down of linguistic barriers. Both European and Latin American auth...
In this appendix, this work describes a framework for the creation of a conversational character in a mixed reality empathetic experience. The framework allows for the synchronization of emotional animations of the virtual character in line with the character's dialogue text, with the aim to improve the users' empathetic experience. The dialogue is...
South Africa recognizes eleven official languages, although more languages are spoken in the country. Most of these languages are considered underresourced: there is only a limited set of computational resources available. This includes linguistic data collections as well as computational linguistic tools. This scarcity of resources limits the comp...
Sands, Bonny & Kerry Jones (chief editors) in collaboration with Katrina Esau, et al. (2022). Nǀuuki Namagowab Afrikaans English ǂXoakiǂxanisi/Mîdi di ǂKhanis/Woordeboek/Dictionary. Stellenbosch: African Sun Media for African Tongue.
Available online at:
Web Portal: https://dictionary.sadilar.org/
Available as a phone app under the name Saasi Eps...
In recent years, a wide variety of mentorship programmes targeting issues that cannot be addressed through traditional teaching and learning methods alone have been developed. Mentoring plays significant roles in the growth and development of both mentors and mentees, and the positive impacts of mentoring have been well documented. Mentorship progr...
Virtual robots, including virtual animals, are expected to play a major role within affective and aesthetic interfaces, serious games, video instruction, and the personalization of educational instruction. Their actual impact, however, will very much depend on user perception of virtual characters as the uncanny valley hypothesis has shown that the...
The study of revision has been a topic of interest in writing research over the past decades. Numerous studies have, for instance, shown that learning-to-revise is one of the key competences in writing development. Moreover, several models of revision have been developed, and a variety of taxonomies have been used to measure revision in empirical s...
Feedback is important to improve writing quality; however, to provide timely and personalized feedback is a time-intensive task. Currently, most literature focuses on providing (human or machine) support on product characteristics, especially after a draft is submitted. However, this does not assist students who struggle during the writing process....
The use of virtual robot animals (VRAs) can have a potential impact on applications with affective and aesthetic interfaces. In particular, VRAs can be used in instructional videos in order to develop new ways to engage young learners and to foster personalization of educational instruction. In this paper, we explore the perception of the virtual i...
When performing a distant reading analysis of large amounts of literary texts, we would like to be able to automatically identify the high level structure or story lines of these texts. Story lines are not always linear, but contain transitions, such as flashbacks or changes of scenery. To identify these transitions, we propose a system that aims t...
Readability metrics provide information on how difficult a text is to read. This information is relevant, for instance, to identify suitable texts for learner readers. Readability metrics have been developed for several languages, but no such metrics have been developed for the indigenous South African languages. One of the limitations in the devel...
This is the preface to the DHASA2021 proceedings (including Book of Abstracts and RAIL proceedings).
Current writing support tools tend to focus on assessing final or intermediate products, rather than the writing process. However, sensing technologies, such as keystroke logging, can enable provision of automated feedback during, and on aspects of, the writing process. Despite this potential, little is known about the critical indicators that ca...
Essay tasks are a widely used form of assessment in higher education. Writing analytics can assist with challenges related to using essay tasks at scale and to identifying different issues in academic integrity. In this paper, we combined two techniques to investigate how students’ writing analytics varied across essay tasks with different cognitiv...
South Africa has eleven official languages. However, not all have received similar amounts of attention. In particular, for many of the languages, only a limited number of digital language resources (data sets and computational tools) exist. This scarcity hinders (computational) research in the fields of humanities and social sciences for these lan...
Since 2015, the Tilburg School of Humanities and Digital Sciences at Tilburg University has provided yearly funding for two student assistants in a Research Traineeship Program. The aim of this program is to provide the student assistants with practical research experience by allowing them to work together with two experienced researchers in a prac...
Learning dashboards are often used to provide teachers with insight into students' learning processes. However, simply providing teachers with data on students' learning processes is not necessarily beneficial for improving learning and teaching; the data need to be actionable. Recently, human-centered learning analytics has been suggested as a so...
Background. Empathic interactions with animated game characters can help improve user experience, increase immersion, and achieve better affective outcomes related to the use of the game.
Method. We used a 2x2 between-participant design and a control condition to analyze the impact of the visual appearance of a virtual game character on empathy and...
Given the importance of revision in writing, revision has been a main topic of interest in writing research. Several models of revision have been developed, and a variety of taxonomies have been used to measure revision in empirical studies. Current advances in data collection and analysis have made it possible to study revision in more detail. How...
In view of the growing urgency to protect wildlife, the general goal of our research is to develop an immersive virtual experience where users can step into the ‘shoes’ of wild animals. The specific objective of this research is to explore the possibility of creating a strong emotional connection experience with a virtual animal body. In a game set...
In this paper we explore whether the uncanny valley effect, which is found for human-like appearances, can also be found for animal-like virtual characters such as virtual robots and other types of virtual animals. In contrast to studies that investigate human-like appearance, there is much less information about the effects concerning how a virtua...
Background: The analysis of writing is complex, with planning, translating, and reviewing processes interacting in a non-linear fashion. Intuitively negligible activities such as the revisions of typing errors can have a large influence on the writing process, and hence also on the analysis of writing processes. For the analysis of writing, the imp...
Although music is an important part of cremation rituals, there is hardly any research regarding music and cremations. This lack of research has inspired the authors to conduct a long-term research project, focusing on musical and linguistic aspects of music played during cremations. This article presents the analysis of a playlist consisting of tw...
DHQ: Digital Humanities Quarterly. Preview: http://www.digitalhumanities.org/dhq/vol/13/3/000431/000431.html
This article offers a methodological contribution to manually-assisted topic modeling. With the availability of vast amounts of (online) texts, performing full scale literary analysis using a close reading approach is not practically feasib...
Keystroke logging is used to automatically record writers' unfolding typing process and to get insight into moments when they struggle composing text. However, it is not clear which and how features from the keystroke log map to higher-level cognitive processes, such as planning and revision. This study aims to investigate the sensitivity of freque...
Learning Management Systems (LMSs) play a significant role in educational technology. In this paper, we analyze different approaches in order to investigate the acceptance of an LMS. Utilizing questionnaire information structured on the Technology Acceptance Model (TAM), we apply descriptive network modeling and analysis complementing basic statist...
Automated writing evaluation tools have been shown to improve writing quality. However, the impact of automated feedback, and especially the timing of the feedback, on students’ writing process is still unknown. Hence, we analyzed how feedback timing influences the revision process. Three experimental conditions were implemented into the writing to...
The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using...
The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-d...
We present a wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are aligned with sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH) where names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every li...
Languages vary in the way stress is assigned to syllables within words. This article investigates the learnability of stress systems in a wide range of languages. The stress systems can be described using finite-state automata with symbols indicating levels of stress (primary, secondary, or no stress). Finite-state automata have been the focus of r...
Learning management systems provide an easy and effective means of access to educational material. Students’ access to course material is logged and the amount of interaction is assumed to be a measure of student engagement within the course. In previous research, frequencies of student activities have typically been used, but this disregards any t...
The sequences of keystrokes that are generated when writing texts contain information about the writer as well as the writing task and cognitive aspects of the writing process. Much research has been conducted in the area of writer identification. However, research on the analysis of writing processes based on sequences of keystrokes has received o...
People often ask others for product advice. Once, word-of-mouth (WOM) was, due to practical limitations, shared locally. Nowadays, WOM is shared online (eWOM), which has a much larger reach. As eWOM is publicly accessible (unlike WOM), it can be used as information on brand attitude. eWOM can be aggregated and assessed using sentiment analysis (ide...
In the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, data collection tasks for parallel translation are implemented using a crowdsourcing platform. The educational genre (video lecture subtitles, forum discussions, course assignments), the type of text (segmentation, misspellings, syntax er...
In cremation rituals in the Netherlands, music plays an important role. However, what exactly this role is remains unclear. In the literature on cremation rituals, music has received little attention up to now. A computational analysis of music played during cremations and a subsequent comparison of the results of this analysis with results of the...
This book provides a thorough introduction to the subfield of theoretical computer science known as grammatical inference from a computational linguistic perspective. Grammatical inference provides principled methods for developing computationally sound algorithms that learn structure from strings of symbols. The relationship to computational lingu...
Grammatical inference is a subfield of theoretical computer science which aims to characterize, understand, and solve learning problems in terms of formal languages and grammars. The field of computational linguistics faces many different kinds of tasks which involve natural languages and learning. Many of these tasks aim to automate decisions and...
Research in the field of grammatical inference deals with learnability of languages. In general, the setup is as follows. Given a family of languages, one specific language is selected and a set of sample strings is extracted. The learner now has to identify the language, from the family of languages, that was used to generate the sample strings.
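The setup described above can be sketched in a few lines of Python. This is only an illustrative toy, not a method from the publication: the family `FAMILY`, its three languages, and the function `consistent_languages` are all hypothetical names chosen for the example. Each language is given as a membership test, and the learner simply keeps the languages that contain every sample string.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical family of languages over the alphabet {a, b}.
# Each language is represented by a membership test (an assumption
# made for this sketch; real families are usually given by grammars).
FAMILY: Dict[str, Callable[[str], bool]] = {
    "only_a": lambda s: set(s) <= {"a"},
    "even_length": lambda s: len(s) % 2 == 0,
    "starts_with_a": lambda s: s.startswith("a"),
}


def consistent_languages(sample: Iterable[str]) -> List[str]:
    """Return the names of all languages in FAMILY that contain every sample string."""
    strings = list(sample)
    return [
        name
        for name, is_member in FAMILY.items()
        if all(is_member(s) for s in strings)
    ]


# Sample strings drawn from the (hidden) target language "starts_with_a".
print(consistent_languages(["a"]))        # two candidates remain
print(consistent_languages(["a", "ab"]))  # the second string rules out "only_a"
```

With enough sample strings, only the target language remains consistent in this toy family; whether that holds for a given family is exactly the kind of learnability question the field studies.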
We conclude the book with a brief summary of what has been covered, the main lessons we wish to impart, and the open problems where research efforts ought to be directed.
In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Com...
Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are "glued" together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro projec...
Starcraft II is a popular real-time strategy (RTS) game, in which players compete with each other online. Based on their performance, the players are ranked in one of seven leagues. In our research, we aim at constructing a player model that is capable of predicting the league in which a player competes, using observations of their in-game behavior...
AnswerFinder is a framework for the development of question-answering systems. AnswerFinder is currently being used to test the applicability of graph representations for the detection and extraction of answers. In this paper we briefly describe AnswerFinder and introduce our method to learn graph patterns that link questions with their corre...
This chapter provides an overview of the most well-known settings and paradigms. The task of language learning deals with finding a language given a sample taken from that language. Typically, this language is described using a grammar, which is a compact, finite representation of a possibly infinite language. In general, language learning is perfo...
In this paper we investigate the impact of size of vocabulary, the number of classes in the classification task and the length of patterns in a pattern-based sequence classification approach. So far, the approach has been applied successfully to datasets classifying into two or four classes. We now show results on six different classification tasks...
Finding regularities in large data sets requires implementations of systems that are efficient in both time and space requirements. Here, we describe a newly developed system that exploits the internal structure of the enhanced suffix array to find significant patterns in a large collection of sequences. The system searches exhaustively for all sign...
Grammatical inference is typically defined as the task of finding a compact representation of a language given a subset of sample sequences from that language. Many different aspects, paradigms and settings can be investigated, leading to different proofs of language learnability or practical systems. The general problem can be seen as a one class...
This paper revisits a problem of the evaluation of computational grammatical inference (GI) systems and discusses what role complexity measures can play for the assessment of GI. We provide a motivation for using the Rademacher complexity and give an example showing how this complexity measure can be used in practice.
In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited size n. This approach, which is used with synchronous back-off, allows us to distinguish between alternative sequences using large contexts. We also show that we can build this kind of models with additional information fo...
This paper presents the outcomes of research into using lingual parts of music in an automatic mood classification system. Using a collection of lyrics and corresponding user-tagged moods, we build classifiers that classify lyrics of songs into moods. By comparing the performance of different mood frameworks (or dimensions), we examine to what exte...
In this article, we propose the use of suffix arrays to implement n-gram language models with practically unlimited size n. These unbounded n-grams are called ∞-grams. This approach allows us to use large contexts efficiently to distinguish between different alternative sequences while applying synchronous back-off. From a practical point of view,...
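As a rough illustration of why a suffix array removes the fixed limit on n, the Python sketch below counts n-grams of any length over a token sequence. It is an assumption-laden toy rather than the implementation from the article: the function names `build_suffix_array` and `count_ngram` are hypothetical, the construction is the naive sort-based one, and synchronous back-off is not implemented.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(tokens)), key=lambda i: list(tokens[i:]))


def count_ngram(tokens: Sequence[str], sa: List[int], ngram: Sequence[str]) -> int:
    """Count occurrences of an n-gram of arbitrary length via binary search.

    Truncating the sorted suffixes to length n keeps them sorted, so all
    suffixes starting with the n-gram form one contiguous block.
    For clarity the truncated prefixes are materialised per query; a real
    system would compare against the corpus in place instead.
    """
    n = len(ngram)
    prefixes = [tuple(tokens[i:i + n]) for i in sa]
    target = tuple(ngram)
    return bisect_right(prefixes, target) - bisect_left(prefixes, target)


corpus = "the cat sat on the mat the cat".split()
suffix_array = build_suffix_array(corpus)
print(count_ngram(corpus, suffix_array, ["the", "cat"]))
print(count_ngram(corpus, suffix_array, ["the", "cat", "sat"]))
```

The same index answers queries for unigrams, bigrams, or much longer contexts, which is what makes counts for unbounded n-grams available without storing a separate table per n.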
In this article we give an overview of various aspects of a project developing a spelling checker for Afrikaans. We discuss two of the main aims of the project, viz. for researchers to obtain practical experience, and to further learning of both researchers and students. This article, therefore, consists of two relatively independent parts that eac...
Current text-based question answering (QA) systems usually contain a named entity recogniser (NER) as a core component. Named entity recognition has traditionally been developed as a component for information extraction systems, and current techniques are focused on this end use. However, no formal assessment has been done on the characteristics...
The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in text is typically tackled using specifically trained machine learning classifiers. For each different set of confusibles, a specific classifier is trained and tuned. In this research, we investigate a more generic approach to context-sensitive confusib...
When dealing with language, (machine) learning can take many different faces, of which the most important are those concerned with learning languages and grammars from data. Questions in this context have been at the intersection of the fields of inductive inference and computational linguistics for the past fifty years. To go back to the pioneerin...
In the context of confusible disambiguation (spelling correction that requires context), the synchronous back-off strategy combined with traditional n-gram language models performs well. However, when alternatives consist of a different number of tokens, this classification technique cannot be applied directly, because the computation of the prob...
Empirical grammatical inference systems are practical systems that learn structure from sequences, in contrast to theoretical grammatical inference systems, which prove learnability of certain classes of grammars. All current empirical grammatical inference evaluation methods are problematic, i.e. dependency on language experts, appropriateness and...
Applied Artificial Intelligence discusses applications of grammar induction (GI) that identifies grammar. A grammar is a rule-based, generative model of the elements in a possibly infinite set, where these elements are complex, structured objects like strings, trees, and graphs. The GI problem is to identify a grammar given some of the elements in...
In this article we describe our submission to the Dutch-English QA@CLEF task. We took the publicly available OpenEphyra question answering system, which is an open-source English question answering system. This was turned into a multi-lingual variant by translating questions from Dutch to English using Systran's online-translation system. The cu...
Question answering on speech transcripts (QAst) is a pilot track of the CLEF competition. In this paper we present our contribution to QAst, which is centred on a study of Named Entity (NE) recognition on speech transcripts, and how it impacts on the accuracy of the final question answering system. We have ported AFNER, the NE recogniser of the A...
In this article, we describe our experiences with modifying and applying AnswerFinder, a generic question answering system that was originally designed to perform text-based question answering in English only, to multi-lingual question answering. In particular, we participated in the Dutch-English task. To enable handling of Dutch questions, we ad...
Named Entity Recognisers (NERs) are typically used by question answering (QA) systems as means to preselect answer candidates. However, there has not been much work on the formal assessment of the use of NERs for QA nor on their optimal parameters. In this paper we investigate the main characteristics of a NER for QA. The results show that it...
Macquarie University's contribution to the QAst track of CLEF is centered on a study of Named Entity (NE) recognition on speech transcripts, and how such NE recognition impacts on the accuracy of the final question answering system. We have ported AFNER, the NE recogniser of the AnswerFinder question-answering project, to the types of answer types...
Lay people discussing machine translation systems often perform a round trip translation, that is translating a text into a foreign language and back, to measure the quality of the system. The idea behind this is that a good system will produce a round trip translation that is exactly (or perhaps very close to) the original text. However, peopl...
This paper describes the Tenjinno Machine Translation Competition held as part of the International Colloquium on Grammatical Inference 2006. The competition aimed to promote the development of new and better practical grammatical inference algorithms used in machine translation. Tenjinno focuses on formal models used in machine translation. We dis...
In this article we present a syntax-based translation system, called TABL (Translation using Alignment-Based Learning). It translates natural language sentences by mapping grammar rules (which are induced by the Alignment-Based Learning grammatical inference framework) of the source language to those of the target language. By parsing a sentenc...
Grammatical Inference (GI) concentrates on finding compact representations, i.e. grammars, of possibly infinite sets of sentences. These grammars describe what sentences do or do not belong to a particular language. The process of learning the form of a grammar based on example sentences from the language touches several fields. Here, we give an ov...