Hans Paulussen’s research while affiliated with KU Leuven and other places


Publications (15)

Figure 7.1: Architecture of TTNWW. 
TTNWW to the Rescue: No Need to Know How to Handle Tools and Resources
  • Chapter

December 2017


86 Reads

Marc Kemps-Snijders






"But I don’t know how to work with [name of tool or resource]" is something one often hears when researchers in Human and Social Sciences (HSS) are confronted with language technology, be it written or spoken, tools or resources. The TTNWW project shows that these researchers do not need to be experts in language or speech technology, or to know all kinds of details about the tools involved. In principle they only need to make clear what they want to achieve. In this paper we describe a series of tools that are already available as a webservice. Details are not presented - interested readers are referred to the papers mentioned in the References and to the TTNWW website.


Building an NLP pipeline within a digital publishing workflow

December 2014


58 Reads


2 Citations

Computational Linguistics in the Netherlands Journal

Outside the laboratory environment, NLP tool developers have always been obliged to use robust techniques in order to clean and streamline the ubiquitous formats of authentic texts. In most cases, the cleaned version simply consisted of the bare text stripped of all typographical information, tokenised in such a way that even the reconstruction of a simple sentence resulted in a displeasing layout. In order to integrate the NLP output within the production workflow of digital publications, it is necessary to keep track of the original layout. In this paper, we present an example of an NLP pipeline developed to meet the requirements of real-world applications of digital publications. The NLP pipeline presented here was developed within the framework of the iRead+ project, a cooperative research project between several industrial and academic partners in Flanders. The pipeline aims at enabling automatic enrichment of texts with word-specific and contextual information in order to create an enhanced reading experience on tablets and to support automatic generation of grammatical exercises. The enriched documents contain both linguistic annotations (part-of-speech and lemmata) and semantic annotations based on the recognition and disambiguation of named entities. The whole enrichment process, provided via a web service, can be integrated into an XML-based production flow. The input of the NLP enrichment engine consists of two documents: a well-formed XML source file and a control file containing XPath expressions describing the nodes in the source file to be annotated and enriched. As nodes may contain a pre-defined set of mixed data, reconstruction of the original document (with selected enrichments) is enabled.
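The two-document input described in this abstract (an XML source file plus a control file of XPath expressions) can be illustrated with a minimal sketch. This is not the iRead+ implementation: the sample document, the one-expression-per-line control format, the `tokens` attribute, and the whitespace tokeniser are all assumptions made for illustration, and Python's standard `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document with mixed content (inline <i> markup).
SOURCE = """<article>
  <title>De <i>kleine</i> prins</title>
  <meta>internal id 42</meta>
  <p>Hij woonde op een planeet.</p>
</article>"""

# Hypothetical control file: one XPath expression per line,
# naming the nodes that should be annotated.
CONTROL = """.//title
.//p"""


def select_nodes(root, xpaths):
    """Return the elements matched by any expression, in document order."""
    wanted = set()
    for xp in xpaths:
        wanted.update(id(el) for el in root.findall(xp))
    return [el for el in root.iter() if id(el) in wanted]


def enrich(source_xml, control_text):
    """Annotate only the selected nodes; all other nodes stay untouched."""
    root = ET.fromstring(source_xml)
    xpaths = [ln.strip() for ln in control_text.splitlines() if ln.strip()]
    for el in select_nodes(root, xpaths):
        text = "".join(el.itertext())   # text across inline children
        tokens = text.split()           # stand-in for a real tokeniser/tagger
        el.set("tokens", str(len(tokens)))
    return root


root = enrich(SOURCE, CONTROL)
print(ET.tostring(root, encoding="unicode"))
```

Because the annotation is written back onto the selected elements rather than onto an extracted plain-text copy, the inline markup and the unselected `<meta>` node survive unchanged, which is the layout-preserving property the abstract emphasises.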

From input to output: The potential of parallel corpora for CALL

March 2014


99 Reads


7 Citations

Language Resources and Evaluation

The aim of this paper is to illustrate the potential of a parallel corpus in the context of (computer-assisted) language learning. In order to do so, we propose to answer two main questions (1) what corpus (data) to use and (2) how to use the corpus (data). We provide an answer to the what-question by describing the importance and particularities of compiling and processing a corpus for pedagogical purposes. In order to answer the how-question, we first investigate the central concepts of the interactionist theory of second language acquisition: comprehensible input, input enhancement, comprehensible output and output enhancement. By means of two case studies, we illustrate how the abovementioned concepts can be realized in concrete corpus-based language learning activities. We propose a design for a receptive and productive language task and describe how a parallel corpus can be at the basis of powerful language learning activities. The Dutch Parallel Corpus, a ten-million word sentence aligned and annotated parallel corpus, is used to develop these language tasks.

Fig. 11.2 DPC sentence-aligned files format
Table 11.1 DPC text types and subtypes according to data source
Table 11.2 DPC word counts per text type and translation direction
3 Performance of the PoS taggers and lemmatizers on a manually validated DPC sample
Dutch Parallel Corpus: A Balanced Parallel Corpus for Dutch-English and Dutch-French

January 2013


506 Reads


20 Citations

This chapter presents the Dutch Parallel Corpus (DPC), a 10-million-word, high-quality, sentence-aligned parallel corpus for the language pairs Dutch-English and Dutch-French. The corpus contains five different text types and is balanced with respect to text type and translation direction. Rich metadata information is stored for each text sample. All texts included in the corpus have been cleared from copyright. The entire corpus is aligned at sentence level and enriched with linguistic annotations. Twenty-five thousand words of the Dutch-English part have been manually aligned at the sub-sentential level. The corpus is released as full texts in XML format and can also be queried via a web concordancer.

Fig. 1. Mockup showing help option for vocabulary in a conversation with an NPC, and subsequent elaborated input.  
Fig. 2. System architecture realizing the adaptations  
Vocabulary Treatment in Adventure and Role-Playing Games: A Playground for Adaptation and Adaptivity

September 2011


487 Reads


20 Citations

Communications in Computer and Information Science

Although there is pedagogical support for using computer adventure and role-playing games in order to learn a second language (L2), commercial games often lack the instructional qualities for making their language comprehensible for learners. In an interdisciplinary approach, this paper proposes a technique for adapting in-game text in order to teach L2 vocabulary, grounded in research on second language acquisition and adaptive learning systems.

Keywords: adventure games, role-playing games, second language acquisition, vocabulary learning, input enhancement, adaptive learning systems, adaptivity

Dutch Parallel Corpus: A Balanced Copyright-Cleared Parallel Corpus

June 2011


53 Reads


65 Citations

Meta Journal des traducteurs

This paper presents the Dutch Parallel Corpus, a high-quality parallel corpus for Dutch, French and English consisting of more than ten million words. The corpus contains five different text types and is balanced with respect to text type and translation direction. All texts included in the corpus have been cleared from copyright. We discuss the importance of parallel corpora in various research domains and contrast the Dutch Parallel Corpus with existing parallel corpora. The Dutch Parallel Corpus distinguishes itself from other parallel corpora by having a balanced composition and by its availability to the wide research community, thanks to its copyright clearance. All texts in the corpus are sentence-aligned and further enriched with basic linguistic annotations (lemmas and word class information). Approximately 25,000 words of the Dutch-English part have been manually aligned at the sub-sentential level. Rich metadata facilitates the navigability of the corpus and enables users to select the texts that satisfy their needs. The entire corpus is released as full texts in XML format and is also available via a web interface, which supports basic and complex search queries and presents the results as parallel concordances. The corpus will be distributed by the Flemish-Dutch Human Language Technology Agency (TST-Centrale).

Article outline: 1. Introduction; 2. Parallel Corpora in Translation Studies; 2.1. Parallel Corpus Projects; 3. Corpus Design, Copyright Clearance and Metadata; 4. Alignment and Linguistic Annotation; 4.1. Sentence Alignment; 4.2. Sub-sentential Alignment; 4.3. Linguistic Annotation; 5. Corpus Exploitation; 6. Conclusion

Annotating the Dutch Parallel Corpus

January 2010


28 Reads


1 Citation

Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 63-72. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893 .

Table 4 summarizes the results of the evaluation.
Sentence Alignment in DPC: Maximizing Precision, Minimizing Human Effort.

January 2008


79 Reads


5 Citations

A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus where all sentences are aligned at very high precision with a minimal human effort involved. The experiments on a combination of sentence aligners with different underlying algorithms described in this paper showed that by verifying only those links which were not recognized by at least two aligners, an error rate can be reduced by 93.76% as compared to the performance of the best aligner. Such manual involvement concerned only a small portion of all data (6%). This significantly reduces a load of manual work necessary to achieve nearly 100% accuracy of alignment.
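The selection rule in this abstract (send a link to manual verification only when fewer than two aligners proposed it) can be sketched as a simple vote count. This is an illustrative simplification, not the DPC project's code: the function name `split_links`, the `min_votes` parameter, and the representation of a link as a `(source_index, target_index)` pair are assumptions, and real sentence alignments may also be one-to-many.

```python
from collections import Counter


def split_links(aligner_outputs, min_votes=2):
    """Split proposed links into auto-accepted and to-be-verified sets.

    aligner_outputs: one iterable of (source_idx, target_idx) links per aligner.
    A link is accepted when at least `min_votes` aligners proposed it.
    """
    votes = Counter(
        link for links in aligner_outputs for link in set(links)
    )
    accepted = sorted(l for l, n in votes.items() if n >= min_votes)
    to_verify = sorted(l for l, n in votes.items() if n < min_votes)
    return accepted, to_verify


# Toy output of three hypothetical aligners on the same text pair.
aligner_a = [(0, 0), (1, 1), (2, 2)]
aligner_b = [(0, 0), (1, 1), (2, 3)]
aligner_c = [(0, 0), (1, 2), (2, 2)]

accepted, to_verify = split_links([aligner_a, aligner_b, aligner_c])
print("accepted:", accepted)
print("manual check:", to_verify)
```

In this toy case only the two links proposed by a single aligner go to a human annotator, which mirrors the idea of restricting manual work to a small share of the data.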

Citations (11)

... Parallel corpora, also known as bitexts, consist of large texts containing parallel translations and serve as valuable resources for researchers in a wide range of disciplines, including machine translation, computer-assisted translation, terminology extraction, computer-assisted language learning, contrastive linguistics, and translation studies [1]. Parallel corpora can be constructed through various approaches. ...


TamSiPara: A Tamil – Sinhala Parallel Corpus
Dutch Parallel Corpus: A Balanced Parallel Corpus for Dutch-English and Dutch-French

... Mapping out the gerunds' distinct functional usage profiles in translated and nontranslated English as well as their corresponding translation equivalents calls for a highly specific dataset as well as a thorough and systematic annotation of the data. We consulted two high-quality, comparable and parallel corpora which are partic- (Macken et al., 2011). The following subsections provide more details on the data selection and the annotation of language-internal and language-external variables. ...

Dutch Parallel Corpus: A Balanced Copyright-Cleared Parallel Corpus
  • Citing Article
  • June 2011

Meta Journal des traducteurs

... Those patterns can relate to any linguistic level, such as lexicon, morphology, or syntax. While the idea of learning languages utilizing language material (as opposed to learning by prescribed rules) has been around for several decades, and its efficacy has been experimentally substantiated, the use of parallel corpora for that purpose has received significantly less attention (Lawson, 2001;Bluemel, 2014;Montero Perez et al. 2014, to name a few). ...

From input to output: The potential of parallel corpora for CALL
  • Citing Article
  • March 2014

Language Resources and Evaluation

... Aligned parallel corpora play a fundamental role in developing corpus-based statistical MT (Koehn 2005) and example-based MT (Carl and Way 2003). Apart from machine translation, they are also a helpful resource for computer-assisted translation tools (Hutchins 2005) and computer-assisted language learning (Deville, Dumortier et al. 2004). Parallel corpora have proven especially useful when studying translated text (Halverson 1998) and when it comes to contrastive linguistics, they are often combined with comparable corpora to validate research hypotheses. ...

Génération de corpus multilingues dans la mise en œuvre d'un outil en ligne d'aide à la lecture de textes en langue étrangère
  • Citing Article

... In translation dictionaries, at any rate, they count as a translation pair. This is also confirmed by an overview of the translation pattern of the two markers in the Dutch Parallel Corpus (Macken et al. 2007). In literary texts, dus is translated as donc in 42% of cases, and in political speeches this percentage rises to 65%. ...

Dutch Parallel Corpus: A Multilingual Annotated Corpus

... The assumption underlying this is that the former encourages learners to try to understand the sequences without the help of subtitles, which might motivate and result in viewing the same clip a few more times (Türel, 2002). The latter enables learners to understand what they hear, pick up a great deal of language, and help them feel relaxed and attentive (Vanderplank, 1988;Porter & Roberts, 1981;Deville, Kelly, Paulussen, Vandecasteele, & Zimmer, 1996). ...

  • Citing Article
  • February 1996

Computer Assisted Language Learning

... Charturong used a hidden Markov model for detecting the consonant and transient regions [1], and Raset and Motlotle applied a wavelet transform for extracting these regions in clean speech [2]. Demol et al. also used non-uniform time scaling to slow down the speech and redistributed available time between the vowels and consonants to emphasize these regions [3]. In addition, Ekramul et al. presented a speech intelligibility improvement process in which speech is modified based on an inverse Wiener filter on the vowel and consonant regions [4]. ...

Efficient non-uniform time-scaling of speech with WSOLA for CALL applications

... From the affective aspect, the use of digital games increases learners' intrinsic motivation, interest, and curiosity (Dickey, 2011;Lee, 2019;Malone, 2018). It encourages active participation, self-directed learning, and discovery learning (Cornillie, Jacques, De Wannemacker, Paulussen & Desmet, 2011;Lee, 2019). Digital games also promote incidental vocabulary learning by providing rich language input and an interesting learning environment (Chen et al., 2019;Sundqvist, 2019;Sykes & Reinhart, 2013). ...

Vocabulary Treatment in Adventure and Role-Playing Games: A Playground for Adaptation and Adaptivity

Communications in Computer and Information Science