Vincent Vandeghinste

Instituut voor de Nederlandse Taal

PhD in Linguistics

About

82 Publications
15,386 Reads
588 Citations
Citations since 2016
32 Research Items
276 Citations
[Chart: citations per year, 2016 to 2022]
Introduction
Vincent Vandeghinste currently works at the Instituut voor de Nederlandse Taal. He does research in Computational Linguistics, Linguistic Infrastructure, Syntax and Data Mining, and is currently working on CLARIN. He is also a member of the Centre for Computational Linguistics at KU Leuven and of Leuven.ai, and teaches Computational Linguistics, Language Engineering Applications and Machine Translation at the Leuven and Antwerp campuses of KU Leuven.
Additional affiliations
January 2020 - July 2020
KU Leuven
Position
  • Senior Researcher
March 2018 - present
Instituut voor de Nederlandse Taal
Position
  • Senior Researcher
December 2000 - July 2020
KU Leuven
Position
  • PostDoc Position
Education
April 2004 - April 2008
KU Leuven
Field of study
  • Linguistics
October 1992 - September 1997
KU Leuven
Field of study
  • Psychology

Publications (82)
Conference Paper
Full-text available
Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and standardization issues in the available data. The former challenge relates to the volume of data ava...
Conference Paper
Full-text available
Communication between physician and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one of the solutions that could help to overcome this issue. In this preliminary study, we present the French version of a translation system using the Arasa...
Article
Full-text available
In this pilot study, we investigate the potential of pictograph translation technologies for facilitating communication and integration in the context of migration. We incorporate a new pictograph set in an existing text-to-pictograph translation system and carry out evaluations on three sets of authentic data (language classes, news articles, webs...
Conference Paper
Full-text available
We describe the creation of CLARIN Belgium (CLARIN-BE) and, associated with that, the plans of the CLARIN-VL consortium within the CLARIAH-VL infrastructure for which funding was secured for the period 2021-2025.
Poster
Full-text available
The Dutch Text2Picto system (Sevens, 2018; Vandeghinste et al., 2015) aims to automatically translate text into pictographs for people with an intellectual disability in the context of Augmentative and Alternative Communication (AAC). The AAC technologies are used by disabled people to help them to communicate in daily life and to be more independe...
Article
Full-text available
This article discusses the automatic linguistic enrichment of historical Dutch corpora through the use of part-of-speech tagging and lemmatization. Such a type of enrichment facilitates linguistic research where manual annotation is unfeasible. We built a neural network-based model using the PIE framework and performed an in-depth error analysis, i...
Article
Full-text available
The aim of this study is to identify linguistic proxies of readability in Dutch, i.e. those linguistic features that define text as being easy-to-read. To this end, we compare the Wablieft corpus (Vandeghinste et al. 2019) (Flemish easy-to-read newspaper archives) to articles that appeared in the regular Flemish newspaper De Standaard, using a wide...
Conference Paper
Full-text available
This paper presents the Wablieft corpus, a two-million-word corpus of a Belgian easy-to-read newspaper, written in Dutch. The corpus was automatically annotated with CLARIN tools and is made available in several formats for download and online querying, through the CLARIN infrastructure. Annotations consist of part-of-speech tagging, chunking, dep...
Article
Full-text available
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technol...
Preprint
Full-text available
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technol...
Article
Full-text available
We discuss the design, development and evaluation of an automated lexical simplification tool for Dutch. A basic pipeline approach is used to perform both text adaptation and annotation. First, sentences are preprocessed and word sense disambiguation is performed. Then, the difficulty of each token is estimated by looking at their average age of ac...
Article
In order to enable or facilitate online communication for people with an intellectual disability, the Text-to-Pictograph translation system automatically translates Dutch written text into a series of Sclera or Beta pictographs. The baseline system presents the reader with a more or less verbatim pictograph-per-word translation. As a result, long a...
Conference Paper
Full-text available
Translation memories (TM) and machine translation (MT) both are potentially useful resources for professional translators, but they are often still used independently in translation workflows. As translators tend to have a higher confidence in fuzzy matches than in MT, we investigate how to combine the benefits of TM retrieval with those of MT, by...
Conference Paper
Full-text available
We test a series of techniques to predict punctuation and its effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memories (LSTM), sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchica...
Conference Paper
Full-text available
We present key results of the now-finished 4-year SCATE (Smart Computer Aided Translation Environment) project, which was completed in February 2018. The project investigated algorithms, user interfaces and methods that can contribute to the development of more efficient tools for translation work.
Conference Paper
Full-text available
Translation environments offer various translation aids to support professional translators. However, translation aids typically provide only limited justification for the translation suggestions they propose. In this paper we present Intellingo, a translation environment that explores intelligibility for translation aids, to enable more sensible u...
Chapter
Full-text available
"But I don’t know how to work with [name of tool or resource]" is something one often hears when researchers in Human and Social Sciences (HSS) are confronted with language technology, be it written or spoken, tools or resources. The TTNWW project shows that these researchers do not need to be experts in language or speech technology, or to know al...
Article
Full-text available
The amount of data that is available for research grows rapidly, yet technology to efficiently interpret and excavate these data lags behind. For instance, when using large treebanks for linguistic research, the speed of a query leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to make the sear...
Conference Paper
Full-text available
We present the SCATE prototype: A Smart Computer-Aided Translation Environment, developed in the SCATE research project. Its user interface displays translation suggestions coming from different resources, in an intelligible and interactive way. It contains carefully designed representations that show relevant context to clarify why certain suggest...
Poster
Full-text available
We describe the improvements to the interface of GrETEL, an online tool for querying treebanks. We demonstrate how we employed the results of two usability tests and individual user feedback in order to create a more user-friendly interface which meets the users’ needs.
Conference Paper
In order to enable or facilitate online communication for people with Intellectual Disabilities, the Text-to-Pictograph translation system automatically translates Dutch written text into a series of Sclera or Beta pictographs. The baseline system presents the reader with a more or less verbatim pictograph-per-word translation. As a result, long an...
Article
Full-text available
We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-...
Article
In present-day society, we communicate over the Internet in several media forms. We put videos and images online, listen to music made by famous bands or by our friends, and read and write a lot of text. Never in the history of mankind have we produced more text than at this present moment, so being able to read and write is an important way of tak...
Article
This paper presents a pictograph interface for Pictograph-to-Text translation, which facilitates the construction of written text on social media platforms for users with Intellectual Disabilities. For the design of the interface, a user-centred approach was adopted. Results show that the target group can appreciate accessing social media through p...
Conference Paper
Full-text available
Compared to well-resourced languages such as English and Dutch, natural language processing (NLP) tools for Afrikaans are still not abundant. In the context of the AfriBooms project, KU Leuven and the NorthWest University collaborated to develop a first, small treebank, a dependency parser, and an easy-to-use online linguistic search engine for Afr...
Conference Paper
Full-text available
We present Poly-GrETEL, an online tool which enables syntactic querying in parallel treebanks and which is based on the monolingual GrETEL environment. We provide online access to the Europarl parallel treebank for Dutch and English, allowing users to query the treebank using either an XPath expression or an example sentence in order to look for si...
Article
Full-text available
This paper has both a theoretical and a methodological objective. The theoretical one concerns the modeling of number agreement in copular constructions. For that purpose it adopts the distinction, familiar from Head-driven Phrase Structure Grammar, between morpho-syntactic agreement (also known as concord) and index agreement. The methodological o...
Conference Paper
Full-text available
We describe the implementation of a Word Sense Disambiguation (WSD) tool in a Dutch Text-to-Pictograph translation system, which converts textual messages into sequences of pictographic images. The system is used in an online platform for Augmentative and Alternative Communication (AAC). In the original translation process, the appropriate sense of...
Article
We describe and evaluate a text-to-pictograph translation system that is used in an online platform for Augmentative and Alternative Communication, which is intended for people who are not able to read and write, but who still want to communicate with the outside world. The system is set up to translate from Dutch into Sclera and Beta, two publicly...
Conference Paper
Full-text available
We describe how a Dutch Text-to-Pictograph translation system, designed to augment written text for people with Intellectual or Developmental Disabilities (IDD), was adapted in order to be usable for English and Spanish. The original system has a language-independent design. As far as the textual part is concerned, it is adaptable to all natural la...
Article
We describe the implementation and evaluation of a word sense disambiguation (WSD) tool in a translation system that converts English text messages into sequences of pictographic images. The Text-to-Picto tool for Dutch, English, and Spanish is used on the online communication platform “WAI-NOT” by people who have tro...
Chapter
Full-text available
The traditional overviews of copular verbs are incomplete and draw poorly motivated distinctions, such as the one between the ‘real’ or prototypical copular verbs and their semantic equivalents. This contribution shifts the attention from the verbs to the constructions in which they occur. We call a construction copular when ...
Conference Paper
Full-text available
We describe our efforts to scale up a syntactic search engine from a 1 million word treebank of written Dutch text to a treebank of 500 million words, without increasing the query time by a factor of 500. This is not a trivial task. We have adapted the architecture of the database in order to allow querying the syntactic annotation layer of the SoN...
Conference Paper
Full-text available
Knowledge-based multilingual language processing benefits from having access to correctly established relations between semantic lexicons, such as the links between different WordNets. WordNet linking is a process that can be sped up by the use of computational techniques. Manual evaluations of the partly automatically established synonym set (syns...
Chapter
Full-text available
This chapter presents the Lassy Small and Lassy Large treebanks, as well as related tools and applications. Lassy Small is a corpus of written Dutch texts (1,000,000 words) which has been syntactically annotated with manual verification and correction. Lassy Large is a much larger corpus (over 500,000,000 words) which has been syntactically annotat...
Chapter
Full-text available
In this paper the PaCo-MT project is described, in which Parse and Corpus-based Machine Translation has been investigated: a data-driven approach to stochastic syntactic rule-based machine translation. In contrast to the phrase-based statistical machine translation systems (PB-SMT) which are string-based and do not use any linguistic knowledge, an M...
Conference Paper
Full-text available
Although several syntactically annotated corpora (or treebanks) exist for Dutch, they are seldom used for descriptive linguistic research because there are no easy-to-use exploitation tools available. This demonstration paper describes GrETEL, a linguistic search engine (http://nederbooms.ccl.kuleuven.be/eng/gretel) that enables non-technical us...
Conference Paper
Full-text available
The recent construction of large linguistic treebanks for spoken and written Dutch (e.g. CGN, LASSY, Alpino) has created new and exciting opportunities for the empirical investigation of Dutch syntax and semantics. However, the exploitation of those treebanks requires knowledge of specific data structures and query languages such as XPath. Linguist...
Conference Paper
Full-text available
Standards and the need for standards, for example for annotation purposes, only emerge after a period of time. Before, people just did what they thought was right. This may have resulted in large amounts of data in a format that in the end did not turn out to be on speaking terms with the (new) standard. This format may even have become a de facto...
Article
Full-text available
The increasing use of eXtensible Markup Language (XML) is bringing additional challenges to statistical machine translation (SMT) and computer assisted translation (CAT) workflow integration in the translation industry. This paper analyzes the need to handle XML markup as a part of the translation material in a technical domain. It explores diffe...
Article
Full-text available
The Varro toolkit offers an intuitive mechanism for extracting syntactically motivated multi-word expressions (MWEs) from dependency treebanks by looking for recurring connected subtrees instead of subsequences in strings. This approach can find MWEs that are in varying orders and have words inserted into their components. This paper also propos...
Conference Paper
Full-text available
In this paper we want to point out some issues arising when a natural language processing task involves several languages (like multilingual, multidocument summarization and the machine translation aspects involved) which are often neglected. These issues are of a more cultural nature, and may even come into play when several documents in a singl...
Article
Full-text available
This paper describes the transfer component of a syntax-based Example-based Machine Translation system. The source sentence parse tree is matched in a bottom-up fashion with the source language side of a parallel example treebank, which results in a target forest which is sent to the target language generation component. The results on a 500 se...
Article
Full-text available
In this paper we describe and evaluate a top-down transfer component of a hybrid example-based machine translation system with an architecture similar to that of transfer MT systems, but with automatically derived transfer-rules and dictionary entries based on a parallel treebank. The tests were applied on the translation pair Dutch to English. Eva...
Article
Full-text available
In this paper we describe an approach to target language modeling which is based on a large treebank. We assume a bag of bags as input for the target language generation component, leaving it up to this component to decide upon word and phrase order. An experiment with Dutch as target language shows that this approach to candidate translation r...
Article
Full-text available
This article describes a hybrid approach to machine translation (MT) that is inspired by the rule-based, statistical, example-based, and other hybrid machine translation approaches currently used or described in academic literature. It describes how the approach was implemented for language pairs using only limited monolingual resources and hardly...
Article
Full-text available
METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project has four par...
Conference Paper
Full-text available
In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is required and in which no full parser or extensive rule sets are needed. We describe the evaluation on a developme...