
Marie-Jean Meurs
Université du Québec à Montréal | UQAM · Department of Computer Science
PhD
About
76
Publications
11,551
Reads
399
Citations
Citations since 2017
Introduction
Additional affiliations
August 2015 - February 2021
September 2010 - August 2015
September 2006 - August 2010
Education
September 2006 - December 2009
September 2004 - August 2006
Publications (76)
The rise of explainable natural language processing spurred a bulk of work on datasets augmented with human explanations, as well as technical approaches to leverage them. Notably, generative large language models offer new possibilities, as they can output a prediction as well as an explanation in natural language. This work investigates the capab...
Recently, there has been interest in multiplicative recurrent neural networks for language modeling. Indeed, simple Recurrent Neural Networks (RNNs) encounter difficulties recovering from past mistakes when generating sequences due to high correlation between hidden states. These challenges can be mitigated by integrating second-order terms in the...
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Université de Montréal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).
Accurately predicting students' future performance can ensure their successful graduation, and help them save both time and money. However, achieving such predictions faces two challenges, mainly due to the diversity of students' backgrounds and the necessity of continuously tracking their evolving progress. The goal of this work is to create a syste...
This chapter describes the participation of the RELAI team in the eRisk 2020 second task. The 2020 edition of eRisk proposed two tasks: (T1) Early assessment of the risk of self-harm and (T2) Signs of depression in social media users. The second task focused on automatically filling a depression questionnaire given the user’s writing history. The R...
Composing the representation of a sentence from the tokens that it comprises is difficult, because such a representation needs to account for how the words present relate to each other. The Transformer architecture does this by iteratively changing token representations with respect to one another. This has the drawback of requiring computation tha...
Acquiring training data to improve the robustness of dialog systems can be a painstakingly long process. In this work, we propose a method to reduce the cost and effort of creating new conversational agents by artificially generating more data from existing examples, using paraphrase generation. Our proposed approach can kick-start a dialog system...
Combining the representations of the words that make up a sentence into a cohesive whole is difficult, since it needs to account for the order of words, and to establish how the words present relate to each other. The solution we propose consists in iteratively adjusting the context. Our algorithm starts with a presumably erroneous value of the con...
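The iterative context adjustment described above can be sketched as a simple fixed-point loop. This is an illustrative interpretation with made-up toy vectors and a softmax re-weighting, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word vectors for a 4-token sentence (illustrative values only).
tokens = rng.normal(size=(4, 8))

def refine_context(tokens, steps=10):
    """Start from a crude context (the mean token vector) and iteratively
    re-weight tokens by their similarity to the current context estimate."""
    context = tokens.mean(axis=0)  # presumably erroneous initial value
    for _ in range(steps):
        weights = tokens @ context            # similarity to current context
        weights = np.exp(weights - weights.max())
        weights /= weights.sum()              # softmax attention weights
        context = weights @ tokens            # re-estimated sentence context
    return context

sentence_vec = refine_context(tokens)
print(sentence_vec.shape)  # (8,)
```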
On average, one in three Canadians will be affected by a legal problem over a three-year period. Unfortunately, whether it is legal representation or legal advice, the very high cost of these services excludes disadvantaged and most vulnerable people, forcing them to represent themselves. For these people, accessing legal information is therefore c...
Accurately predicting students' future performance can ensure their successful graduation, and help them save both time and money. However, achieving such predictions faces two challenges, mainly due to the diversity of students' backgrounds and the necessity of continuously tracking their evolving progress. The goal of this work is to create a syste...
In this paper, we propose a joint model, composed of neural and linguistic sub-models, to address classification tasks in which the distribution of labels over samples is imbalanced. Different experiments are performed on tasks 1 and 2 of the DEFT 2013 shared task [10]. In one set of experiments, the joint model is used for both classification task...
We take interest in the early assessment of risk for depression in social media users. We focus on the eRisk 2018 dataset, which represents users as a sequence of their written online contributions. We implement four RNN-based systems to classify the users. We explore several aggregation methods to combine predictions on individual posts. Our best...
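Post-level predictions can be combined into a user-level score in several simple ways. A minimal sketch with hypothetical scores (the aggregation methods and thresholds actually evaluated in the paper may differ):

```python
import numpy as np

# Hypothetical per-post depression-risk scores from an RNN classifier
# (values are illustrative, not from the paper).
post_scores = np.array([0.12, 0.35, 0.80, 0.65, 0.20])

# Three common ways to aggregate post-level scores into a user-level one:
mean_score = post_scores.mean()          # average risk across posts
max_score = post_scores.max()            # most alarming single post
vote_score = (post_scores > 0.5).mean()  # fraction of posts flagged risky

print(mean_score, max_score, vote_score)  # 0.424 0.8 0.4
```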
Recently, there has been interest in multiplicative recurrent neural networks for language modeling. Indeed, simple Recurrent Neural Networks (RNNs) encounter difficulties recovering from past mistakes when generating sequences due to high correlation between hidden states. These challenges can be mitigated by integrating second-order terms in the...
We take interest in the early assessment of risk for depression in social media users. We focus on the eRisk 2018 dataset, which represents users as a sequence of their written online contributions. We implement four RNN-based systems to classify the users. We explore several aggregation methods to combine predictions on individual posts. Our best...
This book constitutes the refereed proceedings of the 32nd Canadian Conference on Artificial Intelligence, Canadian AI 2019, held in Kingston, ON, Canada, in May 2019.
The 27 regular papers and 34 short papers presented together with 8 Graduate Student Symposium papers and 4 Industry Track papers were carefully reviewed and selected from 132 submis...
This paper presents a multipronged approach to predict early risk of mental health issues from user-generated content in social media. Supervised learning and information retrieval methods are used to estimate the risk of depression for a user given the content of their posts on Reddit. The approach presented here was evaluated on the CLEF eRisk 2017...
It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors co...
This work presents the bioMine system, a full-text natural language search engine for biomedical literature. bioMine provides search capabilities based on the full-text content of documents belonging to a database composed of scientific articles and allows users to submit their search queries using natural language. Beyond the text content of artic...
This paper presents machine learning approaches based on supervised methods applied to triage of health and biomedical data. We discuss the applications of such approaches in three different tasks, and evaluate the usage of triage pipelines, as well as data sampling and feature selection methods to improve performance on each task. The scientific d...
This paper presents our ongoing work on the Vertex Separator Problem (VSP), and its application to knowledge discovery in graphs representing real data. The classic VSP is modeled as an integer linear program. We propose several variants to adapt this model to graphs with various properties. To evaluate the relevance of our approach on real data, w...
This paper presents the ongoing development of a full-text natural language search engine for biomedical literature. The system aims to provide search on the full-text content of documents belonging to a database composed of scientific articles, while allowing users to submit their search queries using natural language. Beyond the text content of a...
This paper presents a supervised learning approach to support the screening of HIV literature. The manual screening of biomedical literature is an important task in the process of systematic reviews. Researchers and curators have the very demanding, time-consuming and error-prone task of manually identifying documents that should be included in a s...
Local search engines are specialized information retrieval systems enabling users to discover amenities and services in their neighbourhood. Developing a local search system still raises scientific questions, as well as very specific technical issues. Those issues come for example from the lack of information about local events and actors, or the s...
Customer experience management (CEM) denotes a set of practices, processes, and tools, that aim at personalizing customer’s interactions with a company according to customer’s needs and desires (Weijters et al., J Serv Res 10(1):3–21, 2007 [29]). E-business specialists have long realized the potential of ubiquitous computing to develop context-awar...
This paper presents a supervised learning approach to support the screening of HIV literature. The manual screening of biomedical literature is an important task in the process of systematic reviews. Researchers and curators have the very demanding, time-consuming and error-prone task of manually identifying documents that must be included in a sys...
Enzymes active on components of lignocellulosic biomass are used for industrial applications ranging from food processing to biofuels production. These include a diverse array of glycoside hydrolases, carbohydrate esterases, polysaccharide lyases and oxidoreductases. Fungi are prolific producers of these enzymes, spurring fungal genome sequencing e...
This paper presents a machine learning system for supporting the first task of the biological literature manual curation process, called triage. We compare the performance of various classification models, by experimenting with dataset sampling factors and a set of features, as well as three different machine learning algorithms (Naive Bayes, Suppo...
The disambiguation algorithm presented in this paper is implemented in SemLinker, an entity linking system. First, named entities are linked to candidate Wikipedia pages by a generic annotation engine. Then, the algorithm re-ranks candidate links according to mutual relations between all the named entities found in the document. The evaluation is b...
In this paper, we present an algorithm for improving named entity resolution and entity linking by using surface form generation and rewriting. Surface forms consist of a word or a group of words that matches lexical units like Paris or New York City. Used as matching sequences to select candidate entries in a knowledge base, they contribute to the...
Background / Purpose:
The mycoSORT system is a machine learning-based system to support the automatic triage of candidate PubMed abstracts for the mycoCLAP database. The classification performed by mycoSORT consists in analyzing a set of discriminative properties that can be used to predict the relevance of a particular abstract being chosen as...
Numerous initiatives have allowed users to share knowledge or opinions using collaborative platforms. In most cases, the users provide a textual description of their knowledge, following very limited or no constraints. Here, we tackle the classification of documents written in such an environment. As a use case, our study is made in the context of...
Wikimeta Lab participation in DeFT 2013 - Machine Learning for Information Extraction and Classification of Cooking Recipes.
This paper presents Wikimeta Lab participation in the Défi Fouille de Texte (DeFT) 2013. In 2013, this evaluation campaign is focused on mining cooking recipes in French. The campaign consists of three classification tasks an...
Fungi secrete a variety of enzymes that work efficiently to degrade lignocellulosic biomass. Since the breakdown of lignocellulose is vital for a number of industrial processes that stand to be improved, interest in these enzymes is high. Fungal lignocellulose-degrading enzymes are numerous and display a wide range of characteristics and properties...
Discovery and development of effective fungal enzyme cocktails are cornerstones of the biorefinery industry because these cocktails can convert lignocellulose into fermentable sugars for biofuel production. The manual curation of fungal genes encoding lignocellulose-active enzymes is an essential step for supporting further research and experiments...
Personalization nowadays is a commodity in a broad spectrum of computer systems. Examples range from online shops recommending products identified based on the user's previous purchases to web search engines sorting search hits based on the user's browsing history. The aim of such adaptive behavior is to help users to find relevant content easier...
Background / Purpose:
We propose a personalized information system that integrates natural language processing (NLP) to support users in analysing, transforming, and creating knowledge from large amounts of textual content. The whole system is designed to give users full control over personalization and leverage visualizations to adjust the adapt...
The number of scientific publications available in multiple repositories nowadays is huge and exponentially growing. Accessing this information is of critical importance to conducting research and designing experiments. However, retrieving data of particular interest for a specific research field in such a sheer volume of publications is often like...
Web portals are a major class of web-based content management systems. They can provide users with a single point of access to a multitude of content sources and applications. However, further analysis of content brokered through a portal is not supported by current portal systems, leaving it to their users to deal with information overload. We pre...
Biofuels produced from biomass are considered to be promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuels production requires the use of enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the...
Researchers need to extract critical knowledge from a massive amount of literature available in multiple and ever-growing repositories. The sheer volume of information makes the exhaustive analysis of literature a labor-intensive and time consuming task, during which significant knowledge can be easily missed.
In addition, the metadata generated fr...
Semantic technologies, including natural language processing (NLP), ontologies, semantic web services and web-based collaboration tools, promise to support users in dealing with complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brin...
Given G = (V, E) a connected undirected graph and a positive integer β(|V|), the vertex separator problem is to find a partition of V into three nonempty classes A, B, C such that there is no edge between A and B, max{|A|, |B|} ≤ β(|V|) and |C| is minimum. In this paper we consider the vertex separator problem from a polyhedral point of view. We...
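As an illustration of this definition only (a naive brute-force check, not the integer linear program or polyhedral approach studied in the paper):

```python
from itertools import product

def feasible(edges, A, B, beta):
    """VSP feasibility: A and B nonempty, no edge joins A and B,
    and max(|A|, |B|) respects the bound beta."""
    if not A or not B:
        return False
    if any((u in A and v in B) or (u in B and v in A) for u, v in edges):
        return False
    return max(len(A), len(B)) <= beta

def min_separator(vertices, edges, beta):
    """Brute-force search over all partitions (A, B, C) of the vertices
    for a feasible partition minimizing the separator size |C|."""
    best = None
    for labels in product("ABC", repeat=len(vertices)):
        A = {v for v, lab in zip(vertices, labels) if lab == "A"}
        B = {v for v, lab in zip(vertices, labels) if lab == "B"}
        C = {v for v, lab in zip(vertices, labels) if lab == "C"}
        if feasible(edges, A, B, beta) and (best is None or len(C) < len(best)):
            best = C
    return best

# Path graph 1-2-3-4-5: one interior vertex separates the two ends.
sep = min_separator([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5)], beta=3)
print(len(sep))  # 1
```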
We present our ongoing development of a semantic infrastructure supporting biofuel research. Part of this effort is the automatic curation of knowledge from the massive amount of information on fungal enzymes that is available in genomics. Working closely with biologists who manually curate the existing literature, we developed ontological NLP pipe...
Focusing on the interpretation component of spoken dialog systems, this paper introduces a stochastic approach based on dynamic Bayesian networks to infer and compose semantic structures from speech. Word strings, basic concept sequences and composed semantic frames (as defined in the Berkeley FrameNet paradigm) are derived sequentially from the us...
Spoken dialog systems enable users to interact with computer systems via natural dialogs, as they would with human beings. These systems are deployed into a wide range of application fields from commercial services to tutorial or information services. However, the communication skills of such systems are bounded by their spoken language understandi...
In the context of spoken language interpretation, this paper introduces a stochastic approach to infer and compose semantic structures. Semantic frame structures are directly derived from word and basic concept sequences representing the users' utterances. A rule-based process provides a reference frame annotation of the speech training data....
A stochastic approach based on Dynamic Bayesian Networks (DBNs) is introduced for spoken language understanding. DBN-based models allow to infer and then to compose semantic frame-based tree structures from speech transcriptions. Experimental results on the French MEDIA dialog corpus show the appropriateness of the technique which both lead to good...
This paper introduces a stochastic interpretation process for composing semantic structures. This process, dedicated to spoken language interpretation, allows to derive semantic frame structures directly from word and basic concept sequences representing the users' utterances. First a two-step rule-based process has been used to provide a reference...
We introduce novel approaches of graph decomposition based on optimal separators and atoms generated by minimal clique separators. The decomposition process is applied to co-word graphs extracted from Web Of Science database. Two types of graphs are considered: co-keyword graphs based on the human indexation of abstracts and terminology graphs base...
A knowledge representation formalism for SLU is introduced. It is used for incremental and partially automated annotation of the Media corpus in terms of semantic structures. An automatic interpretation process is described for composing semantic structures from basic semantic constituents using patterns involving constituents and words. The proc...
In this paper, the use of Markov Logic Networks (MLN) is considered for application in spoken dialogue systems. In spoken dialogues information that can be represented in logical form is often not explicitly expressed, but has to be inferred from detected concepts. Often, it is necessary to perform inferences in presence of incomplete premises and...
A knowledge representation formalism for SLU is introduced. It is used for incremental and partially automated annotation of the Media corpus in terms of semantic structures. An automatic interpretation process is described for composing semantic structures from basic semantic constituents using patterns involving constituents and words. The pr...
A knowledge representation formalism for SLU is introduced. It is used for incremental and partially automated annotation of the Media corpus in terms of semantic structures. An automatic interpretation process is described for composing semantic structures from basic semantic constituents using patterns involving constituents and words. The proces...
This paper introduces a knowledge representation formalism used for annotation of the French MEDIA dialogue corpus in terms of high level semantic structures. The semantic annotation, worked out according to the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process for composing semanti...
We propose a graph-based decomposition methodology of a network of document features represented by a terminology graph. The graph is automatically extracted from raw data based on Natural Language Processing techniques implemented in the TermWatch system. These graphs are Small Worlds. Based on clique minimal separators and the associated graph of...
VSP, polyhedral approach, separator, graph