Guy Lapalme's research while affiliated with Université de Montréal and other places

Publications (229)

Preprint
Full-text available
This paper describes the design principles behind jsRealB, a surface realizer written in JavaScript for English or French sentences from a specification inspired by the constituent syntax formalism. It can be used either within a web page or as a Node.js module. We show that the seemingly simple process of text realization involves many interestin...
Conference Paper
The web is constantly growing and its documents, getting progressively more dynamic, are well-suited to presentation automation by a text realizer. Current browser-based information display systems have mostly focused on the display and layout of textual data, restricting the generation of non-numerical information to canned text or formatted strin...
Conference Paper
In order to help businesses to communicate fruitfully, we present a solution based on ontology alignment for integrating business documents. We focus on detecting and resolving semantic conflicts encountered during the integration process due to different terminologies used in xCBL, cXML and RosettaNet. Our contribution is to benefit from research...
Article
Full-text available
Speaking a language can be an overwhelming task. The message (what to say), its corresponding linguistic expression (how to say it) and sound form (say it, i.e. articulation) have to be determined practically on the fly. To allow for this, parts of the process, in general the mechanical aspects (sentence structures) are automated, that is, they are...
Conference Paper
Modelling a customized view is a daunting task that takes into account several parameters, the most important being the user profile. In this paper we study many facets of modelling and the issues that must be taken into account. We propose an approach in four steps to model a personalised visualization: extract data, guess user's needs and pref...
Article
In this paper we describe the many steps involved in building a production quality Machine Translation system for translating weather warnings between French and English. Although in principle this task may seem straightforward, the details, especially corpus preparation and final text presentation, involve many difficult aspects that are often glo...
Conference Paper
In order to customize the display of meteorological data for different users, we use clustering to group similar users. We compute a rate of similarity between the current user and all others in the same cluster. We use this rate for weighting users' preferences and then compute an average to be compared with a threshold to decide to display this p...
Article
Full-text available
To speak fluently is a complex skill. In order to help the learner to acquire it we propose an electronic version of an age old method: pattern drills (PD). While being highly regarded in the fifties, pattern drills have become unpopular since then. Despite certain shortcomings we do believe in the virtues of this approach, at least with regard to...
Article
This paper presents an FCA-based methodology for concept detection in a flat ontology. We apply this approach to an automatically generated ontology for a RosettaNet Partner Interface Process (PIP) which does not take advantage of some important OWL semantic relations like subClassOf. The goal of our approach is to regroup ontology classes sharing...
Article
This paper presents a methodology for adapting RosettaNet B2B Standard to Semantic Web technologies. We present an approach for mapping RosettaNet Partner Interface Process (PIP) descriptions defined currently with DTD or XML Schemas format to an ontological representation using an OWL/XML rendering. It has been applied to the full set of PIPs. The...
Conference Paper
This paper shows that full abstraction can be accomplished in the context of guided summarization. We describe a work in progress that relies on Information Extraction, statistical content selection and Natural Language Generation. Early results already demonstrate the effectiveness of the approach.
Conference Paper
The goal of our work is to propose models or methods to personalize the visualization of a large amount of weather information in a simple way and to make sure that a user can analyze all needed information. We personalize this visualization for each user according to an automatically detected profile based on clustering. Clustering is used to grou...
Article
Full-text available
Business negotiations represent a form of communication where informativeness, i.e., the amount of provided information, depends on context and situation. In this study, we hypothesize that relations exist between language signals of informativeness and the success or failure of negotiations. We support our hypothesis through linguistic and statist...
Article
Full-text available
Historically two types of NLP have been investigated: fully automated processing of language by machines (NLP) and autonomous processing of natural language by people, i.e. the human brain (psycholinguistics). We believe that there is room and need for another kind, INLP: interactive natural language processing. This intermediate approach starts fr...
Article
We describe a study conducted on the proposal of Druide informatique inc., in collaboration with RALI, aiming at developing a system capable of detecting "overdetections", i.e. designed for filtering detections erroneously flagged by a grammar checker. Various families of classifiers have been trained in a supervised way for 14 types of detections...
Conference Paper
Full-text available
To lighten the work of annotating contexts that illustrate the syntactic-semantic behaviour of terms from the specialized domain of computing and the Internet in French, an automatic annotation method was designed. In this article, we propose to evaluate part of the automatic annotation system for verbal lexical units. We...
Article
Full-text available
In this paper we describe our method for the summarization of legal documents helping a legal expert determine the key ideas of a judgment. Our approach is based on the exploration of the document's architecture and its thematic structures in order to build a table style summary for improving coherency and readability of the text. We present t...
Article
Full-text available
The user-generated Web content has been intensively analyzed in Information Extraction and Natural Language Processing research. Web-posted reviews of consumer goods are studied to find customer opinions about the products. We hypothesize that nonemotionally charged descriptions can be applied to predict those opinions. The descriptions may include...
Conference Paper
Full-text available
Several methods have been proposed for automatically annotating the semantic roles of verbal lexical units. Statistical or machine learning methods, based on lexical resources, particularly FrameNet and PropBank for English, have been tried, proposing to classify the arguments of a verb in...
Conference Paper
We propose a new, ambitious framework for abstractive summarization, which aims at selecting the content of a summary not from sentences, but from an abstract representation of the source documents. This abstract representation relies on the concept of Information Items (InIt), which we define as the smallest element of coherent information in a te...
Conference Paper
We describe the development of an “overdetection” identifier, a system for filtering detections erroneously flagged by a grammar checker. Various families of classifiers have been trained in a supervised way for 14 types of detections made by a commercial French grammar checker. Eight of these were integrated in the most recent commercial version o...
Article
As basic as bilingual concordancers may appear, they are some of the most widely used computer-assisted translation tools among professional translators. Nevertheless, they still do not benefit from recent breakthroughs in machine translation. This paper describes the improvement of the commercial bilingual concordancer TransSearch in order to embe...
Conference Paper
Full-text available
In this article, we address the automatic identification of the actant and circumstant participants of verbal predicative lexical units drawn from a specialized French-language corpus. Actants contribute to realizing the meaning of the lexical unit, whereas circumstants are optional: they add supplementary information that does not...
Conference Paper
Full-text available
This paper presents a supervised machine learning approach for summarizing legal documents. A commercial system for the analysis and summarization of legal documents provided us with a corpus of almost 4,000 text and extract pairs for our machine learning experiments. That corpus was pre-processed to identify the selected source sentences in extra...
Article
An electronic version of LVF compared with other lexical resources. Open access to rich English lexical resources helped the development of natural language processing research in English. Unfortunately there is no comparable lexical resource in French. While WordNet is surely the most well known, we can also think of VerbNet and FrameNet. We show t...
Conference Paper
We will present the research and products developed by members of the RALI for more than 15 years in many areas of NLP: translation tools, spelling checkers, summarization, text generation, information extraction and information retrieval. We will focus on projects involving industrial partners and will point out what we feel to be the benefits and...
Conference Paper
Full-text available
To speak fluently is a complex skill. If reaching this goal in one's mother tongue is already quite a feat, to do so in a foreign language can be overwhelming. One way to overcome the expression problem when going abroad is to use a dictionary or a phrasebook. While neither of them ensures fluency, both of them are useful translation tools. Yet, ne...
Conference Paper
This paper presents an information system for legal professionals that integrates natural language processing technologies such as text classification and summarization. We describe our experience in the use of a mix of linguistics aware transductor and XML technologies for bilingual information extraction from judgements in both French and English...
Article
Introduction. Translation from a source language into a target language has become a very important activity in recent years, both in official institutions (such as the United Nations and the EU, or in the parliaments of multilingual countries like Canada and Spain), as well as in the private sector (for example, to translate user's manuals or newsp...
Conference Paper
Full-text available
The objective of our work is to develop an automatic method for identifying actants (also called arguments) of predicative lexical units in running text. This work is carried out within a larger project that aims at providing rich contextual information in terminological databases. More specifically, the project consists in annotating predicative t...
Article
Full-text available
This paper presents a systematic analysis of twenty four performance measures used in the complete spectrum of Machine Learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical. For each classification task, the study relates a set of changes in a confusion matrix to specific characteristics of data. Then the analys...
Conference Paper
Full-text available
This paper shows that a detailed, although non-emotional, description of an event or an action can be a reliable source for learning opinions. Empirical results show the practical utility of our approach and its competitiveness in comparison with previously used methods. 1 Motivation Humans can infer opinion from details of event or action description...
Conference Paper
This paper presents the machine translation system known as TransLI (Translation of Legal Information) developed by the authors for automatic translation of Canadian Court judgments from English to French and from French to English. Normally, a certified translation of a legal judgment takes several months to complete. The authors attempted to shor...
Conference Paper
Full-text available
Despite the impressive amount of recent studies devoted to improving the state of the art of Machine Translation (MT), Computer Assisted Translation (CAT) tools remain the preferred solution of human translators when publication quality is of concern. In this paper, we present our perspectives on improving the commercial bilingual concordancer Tran...
Article
RALI developed the system NESS2 for the TAC 2009 summarization task, with the main goal of testing the hypothesis that performing sentence selection in 2 steps improves the quality of the created summaries. The first step selects a number of top sentences, while the second step selects the best combination of the top scored sentences. We use the...
Article
Full-text available
This article presents an attempt to establish an upper bound on purely extractive summarization techniques. Altogether, five human summarizers composed 88 standard and update summaries of the TAC 2009 competition. Only entire sentences of the source documents were selected by the human "extractors", without modification, to form 100-word summ...
Article
Translation spotting consists in automatically identifying the translations of a user query inside a bitext. This task, when it relies solely on statistical word alignment algorithms, fails to achieve excellent results. In this paper, we show that identifying the translations of a query during a first translation spotting stage provides relevant...
Article
TransCheck, the RALI's automatic translation checker, has recently undergone a field trial at the Government Translation Service of Ontario, where the system was used not only to detect inconsistent terminology, but also to find new source language terms in texts sent to outside translation suppliers. We describe a specialized term-spotting modul...
Conference Paper
Full-text available
We show that verbs reliably represent texts when machine learning algorithms are used to learn opinions. We identify semantic verb categories that capture essential properties of human communication. Lexical patterns are applied to construct verb-based features that represent texts in machine learning experiments. Our empirical results show that ex...
Article
This document presents an experiment in the automatic translation of Canadian Court judgments from English to French and from French to English. We show that although the language used in this type of legal text is complex and specialized, an SMT system can produce intelligible and useful translations, provided that the system can be trained on a v...
Article
NESS, RALI's summarization system for the TAC 2008's update task, brings improvements and continuation to our last year's "all-symbolic" approach. The most distinctive feature of our system is to rely on the syntactical parser FIPS to extract linguistic knowledge from source documents. NESS selects sentences based on linguistic metrics, especia...
Article
Full-text available
We present a work in progress on machine learning of affect in human verbal communications. We identify semantic verb categories that capture essential properties when human communication combines spoken and written language properties. Information Extraction methods then are used to construct verb-based features that represent texts in machine l...
Article
Full-text available
Notwithstanding machine translation's impressive progress over the last decade, many translators remain convinced that the output of even the best MT systems is not sufficient to facilitate the production of publication-quality texts. To increase their productivity they turn instead to translator support tools. We examine the use of one such to...
Conference Paper
We describe an architecture for organizing and summarizing consumer reviews about products that have been posted on specialized web sites. The core technology is based on the automatic extraction of product features for which we report experiments on two types of corpora. We thus show that NLP techniques can be fruitfully used in this context for h...
Article
This paper describes a first attempt to base a paraphrase generation system upon Mel'čuk and Žolkovskij's linguistic meaning‐text (MT) model whose purpose is to establish correspondences between meanings, represented by networks, and (ideally) all synonymous texts having this meaning. The system described here contains a Prolog implementation of a s...
Conference Paper
Full-text available
Business negotiations represent a form of communication where informativeness, the amount of provided information, depends on context and situation. This study shows that there are relations between language signals of informativeness and success or failure of negotiations. We support our claim by machine learning experiments. We use linguistic an...
Conference Paper
Full-text available
This study emphasizes the importance of using appropriate measures in particular text classification settings. We focus on methods that evaluate how well a classifier performs. The effect of transformations on the confusion matrix is considered for eleven well-known and recently introduced classification measures. We analyze the measure's abilit...
Article
Confidence measures are a practical solution for improving the usefulness of Natural Language Processing applications. Confidence estimation is a generic machine learning approach for deriving confidence measures. We give an overview of the application of confidence estimation in various fields of Natural Language Processing, and present experiment...
Conference Paper
Full-text available
The LIA-Thales system is made of five different sentence selection systems and a fusion module. Among the five sentence selection systems used, two were originally developed for the Question-Answering task (QA) and three specifically built for DUC-2006. The outputs of the five systems are combined in a weighted graph where the cost function...
Article
Machine Translation (MT) is the focus of extensive scientific investigations driven by regular evaluation campaigns, but which are mostly oriented towards a somewhat particular task: translating news articles into English. In this paper, we investigate how well current MT approaches deal with a real-world task. We have rationally reconstructed...
Article
Machine Translation (MT) is the focus of extensive scientific investigations driven by regular evaluation campaigns, but which are mostly oriented towards a somewhat artificial task: translating news articles into English. In this paper, we investigate how well current MT approaches deal with a real-world task. We have rationally reconstructed on...
Conference Paper
We describe the use of a translation memory in the context of a reconstruction of a landmark application of machine translation, the Canadian English to French weather report translation system. This system, which has been in operation for more than 20 years, was developed using a classical symbolic approach. We describe our experiment in develop...
Conference Paper
Full-text available
The case-based reasoning approach to email response consists of reusing past messages to synthesize new responses to incoming requests. This task presents various challenges due to the nature of the messages: textual descriptions, multiple topics, heterogeneous content, variable text length and varying recurrence of the statements. In this paper, w...
Article
Text prediction is a form of interactive machine translation that is well suited to skilled translators. In principle it can assist in the production of a target text with minimal disruption to a translator's normal routine. However, recent evaluations of a prototype prediction system showed that it significantly decreased the productivity of most...
Article
Full-text available
We present results of a statistical method we developed for the detection of what we define as generalized named entities from manually transcribed conversations. This work is part of an ongoing project for an information extraction system in the field of maritime Search And Rescue (SAR). Our purpose is to automatically detect relevant words and a...
Article
LetSum is a summarization system developed for producing short summaries for legal decisions. LetSum is built with an approach based on the exploration of the document structure and thematic segmentation in order to produce a table-style summary for improving coherency and readability of the text. We present the components of the system and its...
Conference Paper
We describe experiments carried out with adaptive language and translation models in the context of an interactive computer-assisted translation program. We developed cache-based language models which were then extended to the bilingual case for a cache-based translation model. We present the improvements we obtained in two contexts: in a theore...
Article
Full-text available
We present the results of the statistical approach we developed for spotting informative words in spoken texts. This work is part of a project launched by the Canadian defence department to develop an information extraction system in the field of Search and Resc...
Article
We present the results of a semantic tagger we developed for the detection of informative words from manually transcribed conversations. This work is part of a project for developing an information extraction system in the field of maritime Search And Rescue (SAR). Our purpose is to automatically detect relevant words and annotate them with conc...
Article
Full-text available
This paper presents our work on the development of a new methodology for automatic summarization of judicial decisions. We describe LetSum (Legal text Summarizer), a prototype system, which determines the thematic structure of a judgment in four themes: INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION. Then it identifies the relevant sentence...
Article
This paper presents a detailed analysis of the factors determining the performance of Lesk-based word sense disambiguation methods. We conducted a series of experiments on the original Lesk algorithm, adapted to WORDNET, and on some variants. These methods were evaluated on the test corpus from SENSEVAL2, English All Words, and on excerpts from SEM...
Article
Full-text available
TT2 is an innovative tool for speeding up and facilitating the work of translators by automatically suggesting translation completions. Different versions of the system are being developed for English, French, Spanish and German by an international team of researchers from Europe and Canada. Two professional translation agencies are currently evalu...
Article
We aim at synthesizing an executable specification for a real-time reactive system by integrating real-time scenarios into a reduced timed automaton (TA). A scenario is a part of the specification of a system behavior. The integration of scenarios into a single TA is based on its formal semantics. The TA, which results from the integration of a set...
Article
Today, more and more companies are facing an over-abundance of questions from their customers who now use an electronic medium to communicate rather than a more traditional medium such as mail or telephone. In this paper, we discuss the use of techniques developed in the area of question-answering to improve customer service management by respo...
Conference Paper
Full-text available
In this paper, we describe a case-based reasoning approach for the semi-automatic generation of responses to email messages. This task poses some challenges from a case-based reasoning perspective especially to the precision of the retrieval phase and the adaptation of textual cases. We are currently developing an application for the Investor rela...
Article
Full-text available
This paper discusses the design and the approach we have developed in order to deal effectively with customer e-mails sent to a corporation. We first present the current state of the art and then make the point that natural language tools are needed in order to deal effectively with the rather informal style encountered in the e-mails. In our proj...
Conference Paper
Full-text available
Lexical relationships allow a textual CBR system to establish case similarity beyond the exact correspondence of words. In this paper, we explore statistical models to insert associations between problems and solutions in the retrieval process. We study two types of models: word co-occurrences and translation alignments. These approaches offer th...
Article
In our second participation in the DUC evaluation, we used the SumUM system for the multi-document summarization focused by events task and summaries in response to a question task. Our multi-document summarization algorithm is based on the use of background information gathered by summarizing previous texts, which is then combined with a new docu...
Article
Introduction. SumUM is an automatic summarization system developed by H. Saggion (2000) that produces short automatic summaries of long scientific and technical documents. It is a summary-generation approach that produces indicative and informative summaries. SumUM produces the summary in two steps: the user first receives an indicative summary...
Article
Full-text available
A frequently encountered problem in urban life is navigation. In order to get to some place we use private means or public transportation, and if we lack clear directions we tend to ask for help. We will deal in this paper with the descriptions of subway routes and their automatic generation. In particular, we will try to show how the relative i...
Article
We describe the results of a corpus study of more than 400 text excerpts that accompany graphics. We show that text and graphics play complementary roles in transmitting information from the writer to the reader and derive some observations for the automatic generation of texts associated with graphics.
Article
Full-text available
n, it is also possible for an agent to execute non-linguistic speech acts. In this context, we propose a model of speech act planner to be used in a cooperative responses generation system. The model can explain an agent's general behavior during a conversation involving two agents or more. It deals with the reasoning process between the perception...
Article
Full-text available
This paper argues for looking at Controlled Languages (CL) from a Natural Language Generation (NLG) perspective. We show that CLs are used in a normative environment in which different textual modules can be identified, each having its own set of rules constraining the text. These rules can be used as a basis for natural language generation. Th...
Article
Full-text available
This paper discusses an approach to planning the content of instructional texts. The research is based on a corpus study of 15 French procedural texts ranging from step-by-step device manuals to general artistic procedures. The approach taken starts from an AI task planner building a task representation, from which semantic carriers are selected. Th...
Article
Full-text available
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, desc...