
Brigitte Grau, Computer Sciences Laboratory for Mechanics and Engineering Sciences (LIMSI)
PhD
155 Publications
21,690 Reads
1,138 Citations
Publications (155)
This paper tackles the task of event detection that aims at identifying and categorizing event mentions in texts. One of the difficulties of this task is the problem of event mentions corresponding to misspelled, custom, or out-of-vocabulary words. To analyze the impact of character-level features, we propose to integrate character embeddings, that...
In many information extraction applications, entity linking (EL) has emerged as a crucial task that allows leveraging information about named entities from a knowledge base. In this paper, we address the task of multimodal entity linking (MEL), an emerging research field in which textual and visual information is used to map an ambiguous mention to...
Using deep learning models on small scale datasets would result in overfitting. To overcome this problem, the process of pre-training a model and fine-tuning it to the small scale dataset has been used extensively in domains such as image processing. Similarly for question answering, pre-training and fine-tuning can be done in several ways. Commonl...
Since end-to-end deep learning models have started to replace traditional pipeline architectures of question answering systems, features such as expected answer types, which are based on the question semantics, are seldom used explicitly in the models. In this paper, we propose a convolutional neural network model to predict these answer types based on...
Question answering has been the focus of much research and many evaluation campaigns, either for text-based systems (TREC and CLEF evaluation campaigns, for example) or for knowledge-based systems (QALD, BioASQ). Few systems have effectively combined both types of resources and methods in order to exploit the fruitfulness of merging the two kinds...
Extractive Question Answering (QA) focuses on extracting precise answers from a given paragraph to questions posed in natural language. Deep learning models are widely used to address this problem and can fetch good results, provided there exists enough data for learning. Such large datasets have been released in open domain, but not in specific do...
Relation extraction (RE) between a pair of entity mentions from text is an important and challenging task, especially for open domain relations. Generally, relations are extracted based on lexical and syntactic information at the sentence level. However, global information about known entities has not yet been explored for the RE task. In this pape...
Textual Implication: Problems and methods for NLP
This paper deals with Machine Reading and the automatic resolution of textual inferences seen as passage reformulation. We focus on the linguistic processes that make it possible to encompass these various forms. In order to study and evaluate machine capabilities, we place our work within the frame of questio...
The correct identification of the link between an entity mention in a text and a known entity in a large knowledge base is important in information retrieval or information extraction. The general approach for this task is to generate, for a given mention, a set of candidate entities from the base and, in a second step, determine which is the best...
In this paper we present a relation validation method for the KBP slot filling task that explores graph features to classify candidate slot fillers as correct or incorrect. The proposed features, combined with the voting feature, collectively perform better than the baseline voting feature.
For our participation in QALD-5, we developed a system for answering questions on a knowledge base. We proposed an unsupervised method for the semantic analysis of questions that generates queries, based on graph transformations, in two steps. The first step is independent of the knowledge base schema and makes use of very general constraints on the q...
Multiple choice questions represent a widely used evaluation mode; yet writing items that properly evaluate student learning is a complex task. Guidelines were developed for manual item creation, but automatic item quality evaluation would constitute a helpful tool for teachers.
In this paper, we present a method for evaluating distractor (i.e. inc...
ABSTRACT. Semantic Web knowledge bases are generally represented as RDF triples forming a graph. Querying them requires a SPARQL-like language, which non-expert users have not mastered and which requires knowledge of the base's schema. This is why natural language querying systems...
Nowadays sensors and actuators associated with control devices can be installed anywhere, including in our homes, creating smart environments. Our goal is to allow a user to configure her own smart environment by describing her needs, i.e. the environment behavioral rules, in natural language (NL). We explore the possibilities offered by an ontology, to tr...
Many high level natural language processing problems can be framed as determining if two given sentences are a rewriting of each other. In this paper, we propose a class of kernel functions, referred to as type-enriched string rewriting kernels, which, used in kernel-based machine learning algorithms, make it possible to learn sentence rewritings. Unlike prev...
Most research in Information Extraction concentrates on the extraction of relations from texts, but less work has been done on their organization after their extraction. We present in this article a multi-level clustering method to group semantically equivalent relations: a first step groups relation instances with similar expressions to form clu...
The development of a system is usually based on shared and accepted requirements. Hence, to be widely understood by the stakeholders, requirements are often written in natural language (NL). However, checking the completeness and consistency of requirements requires having them in a formal form. In this article, we focus on user requirements describing a...
In order to check requirement specifications written in natural language, we have chosen to model domain knowledge through an ontology and to formally represent user requirements by its population. Our approach to ontology population focuses on instance property identification from texts. We do so using extraction rules automatically acquired from...
In the context of prenatal diagnosis of malformation, knowledge of “similar” and resolved cases (i.e. previous cases with a diagnosis validated by fetus autopsy) is essential for diagnosis orientation. Therefore, access to biomedical data accumulated over the years by fetopathology experts specializing in the study of foetal malformations is crucia...
This article takes place in the context of unsupervised information extraction in open domain and focuses on the extraction and the clustering at a large scale of relations between named entities without defining their type a priori. The extraction step combines the use of basic but efficient criteria and a filtering procedure based on machine lear...
In this paper, we present LIMSI's participation in QA4MRE 2013. We decided to test two kinds of methods. The first one focuses on complex questions, such as causal questions, and exploits discourse relations. Relation recognition shows promising results; however, it has to be improved to have an impact on answer selection. The second method is b...
Question answering (QA) systems aim at providing a precise answer to a given user question. Their major difficulty lies in the lexical gap problem between question and answering passages. We present here the different types of morphological phenomena in question answering, the resources available for French, and in particular a resource that we bui...
Information extraction in specialized domains raises different problems related to the types of information sought. In this article, we are interested in identifying relations between concepts in medical reports, a task evaluated in the i2b2 campaign in 2010. Since the relations are expressed by formulations...
Question answering systems correctly answer different questions because they are based on different strategies. In order to increase the number of questions which can be answered by a single process, we propose solutions to combine two question answering systems, QAVAL and RITEL. QAVAL proceeds by selecting short passages, annotates them by ques...
Abstract. Nowadays sensors and actuators can be used in many ways and in different spaces, thus creating intelligent environments. This article describes a formalization of an intelligent environment and of its operation, which makes it possible to check the logical consistency and the conformity of the environment. This formalization...
The study of the Tip of the Tongue phenomenon (TOT) provides valuable clues and insights concerning the organisation of the mental lexicon (meaning, number of syllables, relation with other words, etc.). This paper describes a tool based on psycho-linguistic observations concerning the TOT phenomenon. We've built it to enable a speaker/writer to fi...
This article presents the automatic construction, filtering, and validation of a morphological resource for deverbal agent nouns. The validation uses various resources and corpora to test whether verbs and nouns belong to the same morphological family, as well as the link between them, a method that can be generalized to other...
This paper presents how to build a reliable morphological knowledge base by means of automatic generation, filtering of the obtained words, and their validation. The generated base concerns verbs and their agents. The validation method relies on different resources and corpora in order to verify that a verb and a noun belong to the same famil...
This chapter is dedicated to factual question answering, i.e., extracting precise and exact answers to questions given in natural language from texts. A question in natural language gives more information than a bag-of-words query (i.e., a query made of a list of words), and provides clues for finding precise answers. The author first focuses on the...
Question answering systems correctly answer different questions because they are based on different strategies. In order to increase the number of questions which can be answered by a single process, we propose solutions to combine three question answering systems, QAVAL and two versions of RITEL. QAVAL is based on an answer validation method an...
Nowadays sensors and actuators are increasingly used in different spaces, creating intelligent environments. This article aims to describe a conceptualization of an intelligent environment and its operation, in order to check its consistency and its conformity. This conceptualization is done through an ontology representing the domain knowledge, who...
Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information and allowing the design of more open information extraction systems. In this new domain of unsupervised information extraction, we focus on the task of extracting and characterizing a priori unknown relations...
Question answering (QA) systems aim at finding answers to questions posed in natural language using a collection of documents. When the collection is extracted from the Web, the structure and style of the texts are quite different from those of newspaper articles. We developed a QA system based on an answer validation process able to handle Web spec...
This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.
The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features...
Information extraction in specialized texts raises different problems related to the kind of information sought. In this paper, we are interested in relation identification between some concepts in medical reports, a task that was evaluated in the i2b2 2010 challenge. As relations are expressed in natural language with a great variety of forms, we...
In the QA and information retrieval domains, progress has been assessed via evaluation campaigns (CLEF, NTCIR, EQueR, TREC). In these evaluations, the systems handle independent questions and should provide one answer to each question, extracted from textual data, for both open domain and restricted domain. Quaero is a program promoting research and i...
In open domain question-answering systems, numerous questions expect answers of an explicit type. The method we present in this article aims at verifying that an answer given by a system corresponds to the given type. This verification is done by combining criteria provided by different methods dedicated to verifying the appropriateness between an a...
In open domain question-answering systems, numerous questions expect answers of an explicit type. For example, the question "Which president succeeded Jacques Chirac?" requires an instance of president as answer. The method we present in this article aims at verifying that an answer given by a system corresponds to the given type. This verificati...
Evaluating a complex system is a complex task. Evaluation campaigns are organized each year to test different systems on global results, but they do not evaluate the relevance of the criteria used. Our purpose consists in modifying the intermediate results created by the components and inserting the new results into the process, without modifying the...
Abstract. Searching for precise answers to questions, also called "question answering", is an evolution of information retrieval systems: can it, like its predecessors, make do with essentially numerical methods that use extremely little linguistic knowledge? After presenting the question-answering task...
Searching for precise answers to questions, also called "question-answering", is an evolution of information retrieval systems: can it, as its predecessors, rely mostly on numeric methods, using exceedingly little linguistic knowledge? After a presentation of the question-answering task and the issues it raises, we examine to what extent it can be...
Question answering (QA) aims at retrieving precise information from a large collection of documents, typically the Web. Different techniques can be used to find relevant information, and to compare these techniques, it is important to evaluate question answering systems. The objective of an Answer Validation task is to estimate the correctness of a...
Question answering (QA) aims at retrieving precise information from a large collection of documents. Different techniques can be used to find relevant information, and to compare these techniques, it is important to evaluate QA systems. The objective of an Answer Validation task is thus to judge the correctness of an answer returned by a QA system...
This paper presents our bilingual question-answering system MUSCLEF. We underline the difficulties encountered when shifting from a monolingual to a cross-lingual system, then we focus on the evaluation of three modules of MUSCLEF: question analysis, answer extraction and fusion. We finally present how we re-use different modules of MUSCLEF to participate in AVE...
Question-answering (QA) systems aim at providing either a small passage or just the answer to a question in natural language. We have developed several QA systems that work on both English and French. This way, we are able to provide answers to questions given in both languages by searching documents in both languages also. In this article, we pres...
This article presents a bilingual question answering system, which is able to process questions and documents both in French and in English. Two cross-lingual strategies are described and evaluated. First, we study the contribution of biterms translation, and the influence of the completion of the translation dictionaries. Then, we propose a strate...
The aim of this contribution is the presentation of a dynamic model of learning. It combines the massively parallel treatment of data for the emergence of stable concepts, and the symbolic manipulation of these concepts in schemas representing the pragmatic knowledge which one can build from his experience.
Following Vygotsky, we assume that the sp...
The huge quantity of available electronic information leads to a growing need for users to have tools able to be precise and selective. These kinds of tools have to provide answers to requests quite rapidly without requiring the user to explore each document, to reformulate her request or to search for the answer inside documents. From that viewpoint...
We present in this chapter the QALC system, which has participated in the four TREC QA evaluations. We focus here on the problem of linguistic variation in order to be able to relate questions and answers. We present first variation at the term level, which consists in retrieving question terms in document sentences even if morphologic, syntactic o...
This paper describes the EQueR-EVALDA Evaluation Campaign, the French evaluation campaign of Question-Answering (QA) systems. The EQueR Evaluation Campaign included two tasks of automatic answer retrieval: the first one was a QA task over a heterogeneous collection of texts, mainly newspaper articles, and the second one a specialised one in the Med...
For our second participation in the Question Answering task of CLEF, we kept last year's system named MUSCLEF, which uses two translation strategies implemented in two modules. The multilingual module MUSQAT analyzes the French questions, translates "interesting parts", and then uses these translated terms to search the reference collection. The...
Keywords: intra-document navigation, thematic analysis, discourse structures, discourse relations, subordination and coordination, lexico-syntactic-semantic parallelism, learning model, linguistic analyses. Abstract: In this paper, we present a system for detecting fine-grained text structures (called DST). DST uses...
QA systems need semantic knowledge to find in documents variations of the question terms. They benefit from the use of knowledge resources such as synonym dictionaries or ontologies like WordNet. Our goal here is to study to what extent variations are needed and to determine what kinds of variations are useful or necessary for these systems. T...
Our question-answering system MUSCLEF, which participated in the CLEF evaluation in 2004, was designed to provide answers in English to questions asked in French. It is based on our system for English, QALC, which participated in TREC and obtained good results there when we combined several strategies. QALC searched for...
Most of the question answering systems currently developed adopt a fairly similar architecture, which can be divided into three modules: question analysis, document retrieval and answer extraction. However they differ in their tools (indexing engine, parsers...) and the knowledge bases they use. Thus, for each of these systems, it is important to...