Laurette Pretorius's research while affiliated with the University of South Africa and other places

Publications (64)

Article
The central research question that is addressed in this article is: How can ZulMorph, a finite state morphological analyser for Zulu, be employed to add value to Zulu lexical semantics with specific reference to Zulu verbs? The verb is the most complex word category in Zulu. Due to the agglutinative nature of Zulu morphology, limited information ca...
Article
The Grammatical Framework (GF) not only offers state of the art grammar-based machine translation support between an increasing number of languages through its so-called Resource Grammar Library, but is also fast becoming a de facto framework for developing multilingual controlled natural languages (CNLs). For a natural language to share maximally...
Article
A translation memory system attempts to retrieve useful suggestions from previous translations to assist a translator in a new translation task. While assisting the translator with a specific segment, some similarity metric is usually employed to select the best matches from previously translated segments to present to a translator. Automated metho...
Article
In its first 90 years as an official language of South Africa, Afrikaans developed from a so-called kitchen language into a fully fledged language of education, science and culture. Yet Afrikaans is increasingly under pressure in the physical space in which its speakers find themselves, and its official use is once again under threat. Owing to technological progress in the ...
Conference Paper
Performing cross-lingual natural language processing and developing multilingual lexicographic applications for languages with complex agglutinative morphology pose specific challenges that are aggravated when such languages are also under-resourced. In this paper, Zulu, an under-resourced language spoken in Southern Africa, is considered. The verb...
Article
Tswana, a Bantu language in the Sotho group, is characterised by an agglutinative morphology and a disjunctive orthography, which mainly affects the verb category. In particular, verbal prefixes are usually written disjunctively, while suffixes follow a conjunctive writing style. Therefore, Tswana tokenisation cannot be based solely on whitespace,...
Article
The benefits of incorporating Semantic Web Services in web applications are well documented. However, both the real-world implementation and adoption of these services are still rather limited in practice. This is despite the promises that extend syntactic Web services with capabilities such as automatic service discovery, composition, and executio...
Article
Setswana is an agglutinative language where prefixes and suffixes are extensively used in the formation of words. Words such as verbs, pronouns, adjectives and so on, which have a grammatical relationship with nouns in sentences, demonstrate agreement with such nouns by means of agreement morphemes. In certain instances verbs in Setswana sentences...
Article
With regard to the significance of computational morphological analysis for Zulu, and more specifically for Zulu lexicography, the aims of this article are the following: Firstly, we discuss computational morphological analysis as an enabling technology in the field of natural language processing (NLP) with specific reference to lexicography. Secon...
Article
Language resources, by their very nature, serve as a repository of linguistic knowledge. They are therefore essential in the building and improvement of natural language applications. The aim of this paper is to elaborate on the practice and the experience gained in the development, maintenance and management of such resources with specific referen...
Article
This article describes the progress made in developing a broad-coverage computational morphological analyser for Tswana, an agglutinative disjunctively written indigenous South African language. Two central questions are addressed, viz. what is essential in order to build a broad-coverage computational morphological analyser/generator and how do we...
Conference Paper
This paper presents an approach to multilingual ontology verbalisation of controlled language based on the Grammatical Framework (GF) and the lemon model. It addresses specific challenges that arise when classes are used to create a consensus-based conceptual framework, in which many parties individually contribute instances. The approach is presen...
Article
The process of developing semantic services is viewed by service developers as complex and tedious. The main barriers that have been identified include a steep learning curve for emerging semantic models and ontological languages, the lack of integrated tool support for developing semantic services, and lack of interoperability between emerg...
Conference Paper
Real-world implementations of semantic services that could enable seamless, on-the-fly integration of heterogeneous systems are scarce. This could be attributed to the complexity of heavyweight semantic technologies, most of which have a steep learning curve. As a consequence, the emergence of modern approaches that purport to simplify the engin...
Article
Human language technology (HLT) has been identified as a priority area by the South African government. However, despite efforts by government and the research and development (R&D) community, South Africa has not yet been able to maximise the opportunities ...
Article
This article reports on a practical, semi-automated procedure towards creating a clean, morphologically annotated Zulu corpus of tractable size that could eventually serve both as a gold standard for Zulu computational morphology and as basis for further linguistic annotation. A corpus development architecture is proposed which includes the corpus...
Conference Paper
This article explores the relevance of information ethics, the field concerned with the study of ethical issues arising from the development and use of information technologies, for a specific information technology, viz. Web services. In particular, the Web services architecture, as conceptualised by the W3C, is analysed using Floridi's theory...
Conference Paper
The emergence of Semantic Web Services is stimulating the need for modern enterprises to efficiently and rapidly develop and deliver machine-processable and machine-interpretable value-added services in order to automate a variety of tasks on the Web. However, semantic-based services are scarcely adopted and utilised as there are few real-life exam...
Conference Paper
Semantic Web Services are touted as one possible solution to some of the challenges experienced with Web services, such as the lack of automatic service discovery and consumption. Ideally, semantic services are meant to facilitate automatic business service provisioning and consumption on the Web. These services are enriched with semantics, which are...
Conference Paper
Ontologies, and in particular upper ontologies, are foundational to the establishment of the Semantic Web. Upper ontologies are used as equivalence formalisms between domain-specific ontologies. Multilingualism poses one of the key challenges to the development of these ontologies. Fundamental to the challenges of defining upper ontologies is the...
Article
This paper discusses the development of two basic computational aids for Zulu natural language processing, namely a morphological analyser built with the Xerox finite-state tools (Beesley & Karttunen, 2003) and a machine-readable lexicon encoded as an XML document. We briefly consider the linguistic characteristics of an agglutinating language such...
Conference Paper
The paper provides an overview of a project on computational morphological analysers for the Nguni cluster of languages, namely Zulu, Xhosa, Swati and Ndebele. These languages are agglutinative and lesser-resourced. The project adopted a finite-state approach, which is well suited to modelling both regular morphophonological phenomena and linguistic idios...
Conference Paper
Setswana is characterised by a disjunctive orthography according to which verbal prefixal morphemes are usually written disjunctively, while suffixal morphemes to the verb root follow a conjunctive writing style. This article specifically focusses on a finite-state approach to Setswana verb morphology and the challenges of the disjunctive orthograp...
Article
Copyright: 2009 Association for Computational Linguistics. EACL Workshop on Language Technologies for African Languages, Athens, Greece, 31 March 2009.
Setswana, a Bantu language in the Sotho group, is one of the eleven official languages of South Africa. The language is characterised by a disjunctive orthography, mainly affecting the important word...
Conference Paper
Setswana, a Bantu language in the Sotho group, is one of the eleven official languages of South Africa. The language is characterised by a disjunctive orthography, mainly affecting the important word category of verbs. In particular, verbal prefixal morphemes are usually written disjunctively, while suffixal morphemes follow a conjunctive writing s...
Article
Workshop on Controlled Natural Language (CNL 2009), Marettimo Island, Italy, 8-10 June 2009.
In this paper the researchers discuss a number of structural problems faced when designing a machine-oriented controlled natural language for Afrikaans, taking the underlying principles of Attempto Controlled English (ACE) and Processable English (...
Article
Copyright: 2009 Association for Computational Linguistics. EACL Workshop on Language Technologies for African Languages, Athens, Greece, 31 March 2009.
This paper investigates the possibilities that cross-linguistic similarities and dissimilarities between related languages offer in terms of bootstrapping a morphological analyser. In this case an existing Zulu...
Article
The development of a large-coverage, computational morphological analyser for Zulu requires the modelling not only of the regular phenomena often associated with word formation, but also the idiosyncratic behaviour that may occur in Zulu morphology. This paper discusses the application of an existing rule-based, finite-state morphological analyser...
Article
The development of a computational morphological analyser for Setswana necessitates the accurate modelling and implementation of, among others, compounding as a word formation process. Compounding is known to be an area of Setswana morphology that has sadly been neglected and still requires much investigation and research. The main purpose of this...
Article
Lexical information for South African Bantu languages is not readily available in the form of machine-readable lexicons. At present the availability of lexical information is restricted to a variety of paper dictionaries. These dictionaries display considerable diversity in the organisation and representation of data. In order to proceed towards th...
Conference Paper
The development of natural language processing (NLP) components is resource-intensive and therefore justifies exploring ways of reducing development time and effort when building NLP components. This paper addresses the experimental fast-tracking of the development of finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by u...
Conference Paper
Speech-synchronized facial animation forms an increasingly important aspect of computer animation. The majority of commercial animation products are produced using the English language. Major stakeholders in the industry are the producers of animated movies and the developers of computer games, while the creation of conversational agents for commun...
Article
In this paper the development of computational morphological analysers for six South African Bantu languages is discussed. Due to the rich agglutinating morphological structures of these languages, the morphological processing poses particular challenges. These challenges are of an orthographical, a morphological as well as of a lexical nature. The...
Article
During its almost forty years of existence, facial animation has seen a host of technologies being invented, then fading into obsolescence. The modelling and animation methods have mostly been dictated by the available hardware, which greatly evolved through the years. Many animation desiderata that have been considered dreams at the time are now r...
Conference Paper
The development of computational morphological analysers for South African Bantu languages is linked to a project funded by the National Research Foundation in South Africa. The main research question in the project concerns the development of finite-state morphological analysers for five Bantu languages, namely Zulu, Xhosa and Swati (belonging to...
Article
In this paper we discuss the OSI model and an extension thereof, as well as an agent design framework as complementing design methodologies in cyberspace. We explore some ethical issues prevalent in the OSI model and its extension and in the agent design framework. We conclude that this model and framework may constitute complementary bases for exp...
Article
The purpose of this paper is to describe a methodology for requirements elicitation of traditional higher educational environments in order to gain a comprehensive understanding of the critical processes of this application domain. Such an understanding might be instrumental in the strategic planning and the realizing of e-learning implementations...
Article
Agent computing, and in particular intelligent mobile agent computing, is currently receiving increasing prominence in the literature. This is partly due to the pervasive nature of available Internet technologies such as search engines and booking agents. It is within this context that the importance of investigating various characteristics demonstr...
Conference Paper
In this paper certain ethical and social issues, insights and lessons learnt surrounding the spread of misinformation resulting from a hoax e-mail sent in South Africa on September 11, 2001, which linked South Africans to the World Trade Center disaster in New York, are considered. A case study, based on the South African newspaper press coverage that thi...
Article
The aim of this paper is to discuss aspects of an on-going project on the development of grammatical and lexical resources for Zulu with sufficient coverage for unrestricted text. We explain how the basic software tools of computational morphology are used in linguistic processing, more specifically for automatic word form recognition and morpholog...
Article
This paper considers insights and lessons learnt surrounding the spread of misinformation resulting from a hoax email sent in South Africa on September 11, 2001. That email purported to link South Africans to the World Trade Center disaster in New York on 9/11. This paper discusses a case study based on the South African newspaper press coverage thi...
Article
As one of the largest of the 11 official languages of South Africa, Zulu is spoken by approximately 9 million people. It forms part of a language family which is characterized by rich agglutinating morphological structures. This paper discusses a prototype of a computational morphological analyzer for Zulu, built by means of the Xerox finite state...
Article
The advent of the Information Age and global connectivity has placed ethics center stage in the use of Information and Communication Technologies (ICT). As the drive towards the establishment of a so-called IT profession gains momentum, ethical conduct and codes of ethics have recently been formulated and introduced formally. Initiatives in this reg...
Article
Usability in developing countries presents special challenges. Using her own South Africa as an example, Diane Norton examines the issues of diversity in "rainbow nations" and questions whether designing to accommodate the associated differences in understanding ...
Conference Paper
The advent of the Information Age and global connectivity has placed ethics center stage in the use of Information and Communication Technologies (ICT). As the drive towards the establishment of a so-called IT profession gains momentum, ethical conduct and codes of ethics have recently been formulated and introduced formally. Initiatives in this re...
Article
Morphological analysis is a basic enabling application for further kinds of natural language processing, including part-of-speech tagging, parsing, translation and other high-level applications. Automated morphological analyzers exist for many of the European languages, but have not been reported for any of the indigenous languages of southern Afr...
Article
The computational treatment of morphologically complex languages, such as those belonging to the Bantu language family, requires as basic computational aids a machine-readable lexicon containing a list of all the word roots in the language as well as a computational morphological analyser/generator. In Zulu, where words are formed by productive affi...
Article
The purpose of this paper is to describe a methodology for requirements elicitation of traditional higher educational environments in order to gain a comprehensive understanding of the critical processes of this application domain. Such an understanding might be instrumental in the strategic planning and the realizing of e-learning implementations...
Article
The advent of the Information Age and global connectivity has placed ethics center stage in the use of Information and Communication Technologies (ICT). As the drive towards the establishment of a so-called IT profession gains momentum, ethical conduct and codes of ethics have recently been formulated and introduced formally. Initiatives in this r...
Article
With the advent of the Information Age and global connectivity, the ethical use of Information and Communication Technologies (ICT) has become essential. As the drive towards the establishment of a so-called IT profession gains momentum, ethical conduct and codes of ethics have recently been formulated and introduced formally. Forerunners in this i...
Article
The purpose of this paper is to describe a methodology for requirements elicitation of traditional educational environments in order to gain a comprehensive understanding of the critical processes of this application domain. Such an understanding might be instrumental in the strategic planning and the realising of e-learning implementations that a...
Article
Ubiquitous computing and global connectivity via the internet and the worldwide web raise new ethical as well as social questions and issues, and often require new interpretations of, and a fresh look at, existing ones. In this paper we consider the teaching of Computer Ethics (CE) to our computing students. We review issues such as why the teaching...

Citations

... They range from corpus creation for data-driven NLP, such as the IsiZulu National Corpus (Khumalo, 2015), which was used for a statistical language model for a spellchecker (Ndaba et al., 2016), and the Masakhane grassroots initiative, which focuses on data-driven machine translation for multiple African languages (Nekoto et al., 2020), to data-driven text-to-speech (Marais et al., 2020) based on Qfrency, and other language modelling and data augmentation (e.g., Byamugisha, 2020; Mesham et al., 2021; see also Kambarami et al. (2021) for an overview). The main knowledge-driven approaches include terminology development, both general (Khumalo, 2017) and domain-specific (e.g., Engelbrecht et al., 2010), rule-based morphological analysers (Pretorius and Bosch, 2003; Bosch and Pretorius, 2017), grammars (Bamutura et al., 2020; ...), and natural language generation (Keet and Khumalo, 2017; Byamugisha, 2019; Mahlaza and Keet, 2020). Most of the research has taken place over the past 5-10 years and is gaining pace, albeit still for only a slowly increasing number of NCB languages. ...
... Therefore, the embracing of ethical codes by those involved in the education of software development can significantly contribute to improving the ethical awareness of future graduates. It is on that note that Barnard et al. [17] advocate that the teaching of ethics concepts, and of the actions that underpin them, should be an integral part of the training of any future ICT professional. Therefore, those involved in the teaching of software development should be aware of the profession's ethics so that they can be in a position to teach the students accordingly. ...
... Usually, post-editing MT output takes more time than correcting TM matches [6] because of conflicts in translations [7]. Another significant metric used to judge the performance of a TM is the match metric, which evaluates the similarity between the input sentence to be translated and the matched sentence retrieved from the TM [8]. Moreover, when matches are only approximate, a TM still offers the advantage that the translator can modify and reuse them. ...
... They should further be made aware of an Afrikaans academic world while their ties with their cultural group are strengthened. For example, students can be made aware of Afrikaans academic journals and an internet resource such as the Afrikaans Wikipedia (Pretorius 2016). They can also be encouraged to contribute to these resources themselves. ...
... As of 2019, the RGL has "complete" implementations of 35 languages, complete in the sense of covering the common abstract syntax and a set of inflectional morphology paradigms able to produce all word forms. The non-Indo-European languages are Arabic (Dada and Ranta 2006), Basque, Chinese (Ranta, Haiyan, and Tian 2015), Estonian (Listenmaa and Kaljurand 2014), Finnish, Japanese (Zimina 2012), Maltese (Camilleri 2013), Mongolian (Erdenebadrakh 2015), and Thai. At least 20 more languages are under construction, many of them Bantu languages (e.g. Ng'ang'a 2012; Pretorius, Marais, and Berg 2017; Kituku, Nganga, and Muchemi 2019). ...
... Case studies reflecting ethical dilemmas in Information Systems were used to help the students understand how to make ethical decisions using the theories that they had studied, applying the Code of Ethical Conduct of the Australian Computer Society and their own moral values. The case studies were used to help students to develop their analytical and critical thinking skills while making them aware of the ethical and social concerns in computing (Pretorius & Barnard, 2004). ...
... This category includes methodologies aiming to integrate ontology development into software engineering community practices and vice versa. On the one hand, using UML as an intermediate representation reduces the learning curve for developers, which promotes the use of ontologies as formal knowledge representation structures [65]. On the other hand, the software engineering discipline has matured through growing interest from industry. ...
... Similarly, nouns in classes 3, 5, 7, and 9 are all singular, while their associated plurals are assigned to classes 4, 6, 8, and 10. The assignment to a class is almost exclusively for ease of description [43,44], and is not semantically systematic or consistent. As an example, nouns associated with humans can occur in classes 1, 2, 3, 4, 7, 8, 10, and 11, while inanimate objects can occur in classes 3, 4, 9, 10, and 11. ...
... To acquire a common abstract syntax, a common semantic API, we have extracted a set of shared semantico-syntactic frame valence patterns from the annotated sentences in BFN and SweFN. For instance, the shared valence patterns for the frame Desiring are: ... In addition to phrase types, the extracted valence patterns also specify inferred grammatical relations of NP-typed FEs: nsubj (subject), nsubjpass (passive subject), dobj (direct object) and iobj (indirect object), which correspond to the universal dependency relations (de Marneffe et al. 2014). Therefore, we also include the grammatical voice (Act=Pass) in the pattern comparison and in the pattern identifiers used in the abstract syntax. ...
... The languages in this project are considered resource-scarce compared to most other languages listed by the Global WordNet Association (2016). Although prototypes of rule-based morphological analysers have been developed for the two languages mentioned, these are not yet freely available (cf. Bosch & Pretorius 2011). ...