Gerhard Beukes Van Huyssteen

North-West University | NWU · Centre for Text Technology

Bachelor of Arts; BA Hons; Magister Artium; PhD

About

86
Publications
31,339
Reads
365
Citations
Additional affiliations
October 2010 - present
North-West University
Position
  • Professor (Full)

Publications

Article
Full-text available
Views on constructionalisation and constructional change are at the forefront of construction grammar approaches to language change. In order to be able to talk about constructionalisation and constructional changes in a particular part of the constructicon, it is necessary to have both a diachronic and synchronic view of that network of constructi...
Conference Paper
Full-text available
Lexical units that are identical in form and that are traditionally referred to as either adpositions, adverbs, or particles (based on their morphosyntactic properties), can also be grouped together (based on semantic properties) under the term P-items (see for example Fontaine 2017). Although it is for most linguistic endeavours sufficient to refe...
Article
Full-text available
In Afrikaans and other Germanic languages there is a subcategory of exocentric compounds that can be used evaluatively as personal names. The focus of this article is on such exocentric compounds that are used pejoratively as epithets (i.e., epithetic exocentric compounds (EECs)), and even more specifically on EECs that are based on one of two conc...
Conference Paper
One of the central ideas in construction grammar (CxG) is that our linguistic knowledge is structured as a mental inventory of constructions, i.e., a constructicon. Descriptive representations of such constructicons are best characterised as a blend between CxG and lexicography, an approach that has come to be known as constructicography (cf. Lyngfelt, 2018:2)....
Presentation
Full-text available
One of the central ideas in construction grammar (CxG) is that our linguistic knowledge is structured as a mental inventory of constructions, i.e., a constructicon. Descriptive representations of such constructicons are best characterised as a blend between CxG and lexicography, an approach that has come to be known as constructicography (cf. Lyngfelt, 2018:2)....
Presentation
Full-text available
South Africa’s various multilingual communities – a mix of eleven official languages and many other, smaller languages – offer unique opportunities to study constructions in contact. The influence of two of the West Germanic languages on each other, English and Afrikaans, has been the subject of study for many years – see for example Bekker (2019),...
Article
Full-text available
Censorship is the practice of suppressing or controlling the creation of and/or public access to artefacts based on their content – for political and/or moral reasons. Such artefacts include the fine arts; performing arts, including film and theatre; the art of words, science, and journalism; and digital products such as computer games and websites...
Article
The overarching aim of this article is to give an exemplary summary of the existing linguistic descriptions and lexicographic treatments of present-day fok and other formally related words (such as befok, fokken, fokkol and opfok). The article's specific objectives are to (1) set up a construction network of the fok family that ...
Presentation
Full-text available
For many decades now, automatic content classification has stood at the centre of artificial intelligence design and machine learning systems. One of the most famous examples is probably the Netflix Prize, which was an open competition to find the best algorithm that could predict user ratings for films (e.g., Greene 2006; Kasula 2020). Content classif...
Article
Full-text available
Research on swear words (used here as a hypernym that includes other phenomena and/or synonyms, among them cursing, scolding, blasphemy and foul language) has been conducted internationally for many years in a variety of scientific disciplines. By contrast, very little to no research on swearing has been done in the South African context....
Conference Paper
Full-text available
Likert-type data is commonly used in many research fields in the humanities: from gauging the usability of different user interface designs, to determining users’ likelihood of voting for a particular political party, to evaluating course materials, to name but a few examples. Despite its prevalence, there is still some disagreement within the stati...
Chapter
Within the blended learning environment, it is important to consolidate expert content and pedagogy inside and outside the classroom. Subject experts who serve as content developers play a vital role by contributing quality controlled subject content covered by the curriculum, which can be made available to students on digital platforms. However, i...
Article
Full-text available
Research on swearing (used here as a hypernym to include other phenomena and/or synonyms like cursing, profanity, taboo language, etc.) has been prevalent for many years internationally, also from a variety of scientific disciplines. Most of the research literature, however, is on swearing in English, although studies have also been conducted on so...
Article
Full-text available
Likert-type data is commonly used in many research fields in the humanities: from gauging the usability of different user-interface designs, to determining users’ likelihood of voting for a particular political party, to evaluating course materials, to name but a few examples. Despite its prevalence, there is still some disagreement within the statistic...
Article
In Germanic languages the linking morpheme, like the ·s· in Afrikaans seun·s·naam ‘boy’s name’ or the ·en· in Dutch pann·en·koek ‘pancake’, is quite common. This word element has been the topic of discussion in the past, with no definite consensus about its origin or possible semantic input. There has been a renewed interest in this phenomenon, especia...
Article
Full-text available
A corpus exploration of huidiglik. In tandem with Van Huyssteen (2018a), this article examines the current usage of the word huidiglik (‘currently’) (an alleged Anglicism), together with other associated words (e.g. its base, huidig ‘current’). Based on a comprehensive literature review, Van Huyssteen (2018a) concludes that apart from stylistic pre...
Article
Full-text available
Norms of huidiglik. In Afrikaans, huidiglik is a truly Janus-faced word: it is used with high frequency, especially in spoken language, while at the same time being one of the biggest language pet peeves of language practitioners (and even ordinary speakers of Afrikaans). When asked why huidiglik should be avoided, these language practitioners...
Chapter
Full-text available
Over the past more than 100 years, Afrikaans associative plural constructions – especially constructions with hulle (‘they’) and goed (‘things/stuff; good’) as right-hand components – have been studied from both diachronic and synchronic perspectives, but with the main interest in their origins, and what they could tell us about the genesis of Afri...
Article
Full-text available
When the first Afrikaanse woordelys en spelreëls (AWS) was conceived more than a hundred years ago, it was not yet standard practice to spell out the nature, purpose and scope of dictionaries clearly, comprehensively and systematically in a dictionary conceptualisation plan. In 2006 the TK began compiling two documents, namely a style guide (...
Article
Full-text available
Since 2006 the Taalkommissie (TK) has been compiling formal guidelines for the inclusion or elimination of lemmas in the Afrikaanse woordelys en spelreëls (AWS); these guidelines were formalised in an operationalisation framework by Van Huyssteen (2017b; see Part 1 in this issue). The aim of this article, with respect to the operationalisation framework, is to ...
Article
Full-text available
The Virtuele Instituut vir Afrikaans (VivA) is a research institute and service provider for Afrikaans in digital contexts. In order to make well-founded choices regarding VivA's product and service offering, quantitative and qualitative research was conducted to identify shortcomings in the Afrikaans market for digital language products. Seven ...
Conference Paper
Full-text available
Compared to well-resourced languages such as English and Dutch, natural language processing (NLP) tools for Afrikaans are still not abundant. In the context of the AfriBooms project, KU Leuven and the North-West University collaborated to develop a first, small treebank, a dependency parser, and an easy-to-use online linguistic search engine for Afr...
Article
Full-text available
Following Den Besten’s (2009) desiderata for historical linguistics of Afrikaans, this article aims to contribute some modern evidence to the debate regarding the founding dialects of Afrikaans. From an applied perspective (i.e. human language technology), we aim to determine which West Germanic language(s) and/or dialect(s) would be best suited fo...
Article
Full-text available
Aan die and besig in Afrikaans progressive construction: a corpus investigation (2). Progressive aspect is a grammatical category which signifies that an event is continuing or taking place (Comrie 1976:33-36; Bybee et al. 1994:126). In two articles, this article and Breed & Van Huyssteen (2014), we investigate the manner in which two periphrastic...
Article
Full-text available
Progressive aspect is a grammatical category which signifies that an event is continuing or taking place (Comrie 1976:33-36; Bybee et al. 1994:126). In two articles, this article and Breed & Van Huyssteen (2014), we investigate the manner in which two periphrastic constructions, namely the Vam, besig om te V and the Vcop aan die V constructions, ar...
Article
Full-text available
Aan die and besig in Afrikaans progressive constructions: the origin and development. Progressive aspect is a grammatical category which signifies that an event is continuing or taking place (Comrie 1976:33-36; Bybee et al. 1994:126). In two articles, this article and Breed and Van Huyssteen (submitted), we investigate the manner in which two perip...
Conference Paper
Full-text available
In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Com...
Conference Paper
Full-text available
Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are "glued" together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro projec...
Conference Paper
Full-text available
The linguistic categorisation of compounds dates back to some of the earliest work in linguistics. The cross-linguistic compound taxonomy of Bisetto and Scalise (2005), later refined in Scalise and Bisetto (2009), is well-known in linguistics for understanding the grammatical relations in compounds. Although this taxonomy has not been used extensiv...
Article
Full-text available
In the field of text processing, the metadata about a particular text plays an important role in many cases. Such metadata is often added with the help of automatic text classifiers, which, on the basis of a text's content, automatically assign one or more predefined classes or categories to the text. One of the dimensions according to which...
Conference Paper
Compound semantic analysis is the task of finding the correct internal relation between the constituents of a compound [3, 10]. In this paper we use a measure of semantic similarity [14] based on the relations in the Afrikaans WordNet [2] to determine the similarity between two Afrikaans compounds. We infer that if the different constituents of two...
Conference Paper
Full-text available
The computational processing of compound semantics poses several interesting challenges. Up to now, the processing of nominal compounds with non-noun left-hand constituents (henceforth XN compounds) has not received any attention, despite the fact that these also seem to be rather productive in Germanic languages. In our research project, we aim to...
Conference Paper
Full-text available
In this paper we introduce a solution for disease surveillance and monitoring in the primary animal health care (PAHC) domain that uses inbound voice-based services and voice- and text-based outbound services for connecting rural veterinarians and livestock owners with a PAHC service provider. We describe our findings from the ongoing pilots, where...
Article
Full-text available
Development of human language technologies for the indigenous South African languages is currently being undertaken in various projects across South Africa. In one such project a lemmatizer for Setswana is being developed, and this article reports on work towards the development of a first prototype. A prerequisite of lemmatization is to determine...
Conference Paper
Full-text available
This article presents initial results on a supervised machine learning approach to determine the semantics of noun-noun compounds in Dutch and Afrikaans. After a discussion of previous research on the topic, we present our annotation methods used to provide a training set of compounds with the appropriate semantic class. The support vector machine...
Conference Paper
Full-text available
Multilingual emerging markets hold many opportunities for the application of spoken language technologies, such as interactive voice response (IVR) systems. Designing such systems requires an in-depth understanding of the business drivers and salient design decisions pertaining to these markets. In this paper we analyze the business drivers and des...
Article
Full-text available
Research using voice-based services as a technology platform for providing information access and services within developing world regions has shown much promise. The results for design and deployment of such voice-based services have varied depending on the application domain, user community and context. In this paper we describe our work on devel...
Article
Full-text available
http://search.sabinet.co.za/WebZ/Authorize?sessionid=0&next=ej/ej_content_literat.html&bad=error/authofail.html
Article
Full-text available
Automatic lemmatisation for Afrikaans. Automatic lemmatisation is a general normalisation procedure in text processing, where all inflected forms of a lexical word are normalised to a single lemma (i.e. a meaningful, uninflected base form from which more complex word forms could be formed). Traditionally, lemmatisers are developed by writing languag...
Article
Volume 194 of the series Current Issues in Linguistic Theory, entitled Lexicology, Semantics and Lexicography, comprises a selection of eleven academic contributions that were originally presented at the Tenth International Conference of English Historical Linguistics. The disciplinary focus of this volume of texts is the diachronics of English vo...
Article
Full-text available
Volume 194 of the series Current Issues in Linguistic Theory, entitled Lexicology, Semantics and Lexicography, comprises a selection of eleven academic contributions that were originally presented at the Tenth International Conference of English Historical Linguistics. The disciplinary focus of this volume of texts is the diachronics of English vo...
Article
Full-text available
Human language technology (HLT) has been identified as a priority area by the South African government. However, despite efforts by government and the research and development (R&D) community, South Africa has not yet been able to maximise the opportunities of HLT and create a thriving HLT industry. One of the key challenges is the fact that there...
Conference Paper
Full-text available
Afrikaans is one of the eleven official languages of South Africa. It is classified as an under-resourced language. No annotated broadband speech corpora currently exist for Afrikaans. This article reports on the development of speech resources for Afrikaans, specifically a broadband speech corpus and an extended pronunciation dictionary. Baseline...
Conference Paper
Full-text available
South Africa (SA) epitomises diversity, with the nation boasting eleven official languages. The field of human language technology (HLT) can play a vital role in bridging the digital divide and thus has been recognised as a priority area by the South African government. The current HLT landscape in South Africa consists mostly of a relatively young...
Conference Paper
Full-text available
HLT resource development for a resource scarce language (L2) can be expedited by recycling existing technologies for a closely related language (L1). To improve the success of L1 technologies on L2 data, one can convert L2 data to make it appear more L1-like. We explore this possibility by developing an Afrikaans-to-Dutch lexical conversion module...
Conference Paper
Full-text available
South Africa (SA) is one of the few countries in the world that boasts a large number of official languages. Due to the efforts of government and the local research and development (R&D) community (comprising universities, science councils and a few private sector companies) all the official languages are -- to varying degrees -- enabled with regar...
Conference Paper
Full-text available
Human Language Technologies, Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2-4 June 2010, Los Angeles, California, USA. In this research, the authors use machine learning techniques to provide solutions for descriptive linguists in the domain of language standardization. With regard to the personal...
Conference Paper
Full-text available
Telephone-based information access has the potential to deliver a significant positive impact in the developing world. We discuss some of the most important issues that must be addressed in order to realize this potential, including matters related to resource development, automatic speech recognition, text-to-speech systems, and user-interface des...
Article
Full-text available
Julie Coleman and Christian J. Kay (Eds.). Lexicology, Semantics and Lexicography: Selected Papers from the Fourth G.L. Brook Symposium, Manchester, August 1998. 2000, xiv + 249 pp. Current Issues in Linguistic Theory Volume 194. ISBN 90 272 3701 8 (Eur.), 1 55619 972 4 (US). Amsterdam/Philadelphia: John Benjamins. Price US$ 75,00 (Hb).
Article
Full-text available
7th International Conference on Language Resources and Evaluation, 19-21 May 2010, Valletta, Malta. Human language technologies (HLT) can play a vital role in bridging the digital divide and thus the HLT field has been recognised as a priority area by the South African government. The authors present the work on conducting a technology audit on the S...
Article
Full-text available
In this article we give an overview of various aspects of a project developing a spelling checker for Afrikaans. We discuss two of the main aims of the project, viz. for researchers to obtain practical experience, and to further learning of both researchers and students. This article, therefore, consists of two relatively independent parts that eac...
Conference Paper
Full-text available
Determining the algorithmic parameter combinations that deliver the best performance in applications using machine learning algorithms is a very important part in the development process. Exhaustive searches are slow and computationally expensive, which motivates the investigation of more efficient methods of automatic algorithmic parameter optimis...
Article
Full-text available
This is the author's version of the work. The definitive version was published in the Proceedings of the RANLP'2009 International Conference Recent Advances in Natural Language Processing, Borovets, Bulgaria, 14-16 September 2009, pp. 65-70. Annotation of training data for machine learning is often a laborious and costly process. In Active Learning...
Article
Full-text available
The development of a hyphenator and compound analyser for Afrikaans. The development of two core-technologies for Afrikaans, viz. a hyphenator and a compound analyser, is described in this article. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core-technologies in questi...
Conference Paper
Full-text available
Developing digital resources is an expensive and time-consuming endeavour, especially in the case of less-resourced languages. We developed TurboAnnotate in an attempt to accelerate the annotation of linguistic data by means of bootstrapping linguistic data for machine-learning purposes. The design and functionality of the tool is given to sho...
Article
Full-text available
This article is intended to address two under-explored aspects in Van Huyssteen (2000), namely, the verification of his findings and their refinement by means of a corpus of written Afrikaans, and also to provide a better description of the phonological pole of the Afrikaans reduplication construction. Within a usage-based approach it is shown that...
Article
Full-text available
Within the South African context, e-learning provides various opportunities to contribute towards a multilingual society. This paper describes a new project, ICALLESAL (Intelligent Computer-Assisted Language Learning for Eleven South African Languages), where an e-learning system is being developed for the acquisition of the official South African...
Article
Full-text available
In this article, aspects of a cognitive usage-based descriptive framework for Afrikaans grammar are spelled out in detail, with the aim of setting the agenda for the description of Afrikaans morphological (and other) constructions. Constructs and terms from cognitive grammar that are necessary for the description of such constructions are explicate...
Article
Full-text available
The BA Language Technology program was recently introduced at the North-West University and is, to date, the only one of its kind in South Africa. This paper gives an overview of the program, which consists of computational linguistic subjects as well as subjects from languages, computer science, mathematics, and statistics. A brief discussion of the c...
Conference Paper
Full-text available
Current spelling checkers for Afrikaans still do not provide full access to desired linguistic performance, especially with respect to high lexical recall and error precision. One of the main problems is that Afrikaans is an agglutinative language, with a high lexical generative power, using concatenative compound formation. This means that...
Article
Full-text available
This contribution is aimed at an explanation of the motivation for the composition of Afrikaans grammatical and onomatopoeic reduplications within a cognitive grammar framework. It is illustrated that we can adequately describe the formation of reduplications in terms of four valence factors, viz. correspondence, profile determinacy, autonomy/...
Article
Full-text available
Three theoretical presuppositions of a cognitive usage-based model of grammar are identified in this article, viz. that language is an integral part of human cognition, that grammar is inherently meaningful, and that usage-based data are essential for any linguistic description and explanation. These three presuppositions can be considered the most...
Conference Paper
Full-text available
In this paper we describe the development of an improved spelling checker for Afrikaans. We compare two currently available spelling checkers and discuss their shortcomings. The existing applications are restricted in their suggestion capabilities, as well as their precision and recall, mainly because they cannot treat morphologically complex wor...
Article
Full-text available
This article investigates the vowels of Black South African English (BSAE). Existing research findings are summarised and discussed alongside data from two new experiments. The findings indicate that BSAE differs considerably from ‘White’ South African English with regard to both monophthongs and diphthongs. The most important monophthong characteristics of BSAE ...
Article
Full-text available
Lexical items that refer to aspects of the sexual domain often pose problems for the lexicographer. For example, it is often difficult to account for the operation of extension mechanisms such as metaphor and metonymy in the lexicographic treatment of sexual expressions. In this article, four Afrikaans dictionaries are ...
Article
Full-text available
Cognitive metaphor theories indicate that sexual metaphors are metaphors in which a taboo domain of knowledge is described in terms of a non-taboo domain of knowledge. Data from the Afrikaans language substantiate the idea that mappings between these two domains are motivated on cognitive as well as on pragmatic grounds. In this article, it will be...
Article
Full-text available
20th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Stellenbosch, South Africa, 30 November - 01 December 2009. For fast-tracking the development of resources for resource-scarce languages, one could transfer existing technologies from one language to another well-sourced, closely-related language. In this contribut...
Article
Full-text available
2nd AFLaT workshop at the Seventh International Conference on Language Resources and Evaluation (LREC) 2010, Valletta, Malta, 19-21 May 2010. Human language technologies (HLT) have been identified as a priority area by the South African government to enable its eleven official languages technologically. We present the results of a technology audit for t...
