Patrice Bellot
Aix-Marseille Université | AMU · Laboratoire Informatique et Systèmes (LIS)

Professor

About

232 Publications
36,374 Reads
1,526 Citations
Citations since 2016: 58 research items, 907 citations
[Chart: citations per year, 2016 to 2022]
Introduction
I am Patrice Bellot, Full Professor of Computer Science at Aix-Marseille Université (AMU) and CNRS (LIS and OpenEdition Lab), and Scientific Advisor in Text Mining at CNRS. I am interested in information retrieval and natural language processing approaches for mining large text collections. Application fields include sentiment analysis, digital humanities, web mining, question answering, and digital libraries.
Additional affiliations
January 2018 - April 2021
French National Centre for Scientific Research
Position
  • Consultant
September 2011 - present
Aix-Marseille Université
Position
  • Professor (Full)
September 2000 - August 2011
Université d'Avignon et des Pays du Vaucluse
Position
  • Professor (Associate)

Publications

Conference Paper
Full-text available
Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task that has received growing attention over the years. One application is to reduce the delay between the moment a piece of information about an entity is first observed and the moment the entity's entry in a knowledge base is updated. Current state-of-the-ar...
Chapter
Full-text available
The issues of Natural Language Processing and Information Retrieval have been studied for a long time, but the recent availability of very large resources (Web pages, digital documents…) and the development of statistical machine learning methods exploiting annotated texts (manual annotation through crowdsourcing being a major new approach) have transformed these f...
Conference Paper
Full-text available
Current topic modeling approaches for Information Retrieval do not explicitly model query-oriented latent topics. Moreover, the semantic coherence of the topics has never been considered in this field. We propose a model-based feedback approach that learns Latent Dirichlet Allocation topic models on the top-ranked pseudo-relevant feedback,...
Article
Full-text available
The variety and diversity of published content are currently expanding in all fields of scholarly communication. Yet scientific knowledge graphs (SKGs) provide only a poor picture of the varied directions of alternative scientific choices, and in particular of scientific controversies, which are currently neither identified nor interpreted. We propose to use...
Chapter
Although citizens agree on the importance of objective scientific information, they tend to avoid scientific literature due to access restrictions, its complex language, or their lack of prior background knowledge. Instead, they rely on shallow information on the web or social media, often published for commercial or political incentives rather t...
Preprint
Full-text available
The cognitive manifold of published content is currently expanding in all areas of science. However, Scientific Knowledge Graphs (SKGs) provide only a poor picture of the adversarial directions and scientific controversies that feed the production of knowledge. In this article, we seek to understand the design of the information space of a...
Poster
Full-text available
https://simpletext-project.com. SimpleText tackles technical and evaluation challenges by providing appropriate data and benchmarks for text simplification. We propose the following shared tasks: TASK 1, "What is in (or out)?": select passages to include in a simplified summary, given a query; TASK 2, "What is unclear?": given a passage and a q...
Book
https://simpletext-project.com. SimpleText tackles technical and evaluation challenges by providing appropriate data and benchmarks for text simplification. We propose the following shared tasks: TASK 1, "What is in (or out)?": select passages to include in a simplified summary, given a query; TASK 2, "What is unclear?": given a passage and a quer...
Chapter
The Web and social media have become the main source of information for citizens, with the risk that users rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. Non-experts tend to avoid scientific literature due to its complex language or their lack of prior back...
Preprint
Full-text available
Unlimited change in scientific terminology challenges the integrity of scientific knowledge graph (SKG) representations, while current data and modeling standards, which are mostly document-oriented, hardly allow a resilient semantic upgrade of scholarly content. Moreover, the results of a "multimodal knowledge acquisition" are required for an efficient upgrade of se...
Chapter
Information retrieval has moved from traditional document retrieval, in which search is an isolated activity, to modern information access, in which search and the use of information are fully integrated. But non-experts tend to avoid authoritative primary sources such as scientific literature due to their complex language, internal vernacular, or l...
Chapter
Modern information access systems hold the promise of giving users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources due to their complex language, internal vernacular, or their lack of prior background knowledge. Text simplification approaches can remove some of t...
Preprint
Full-text available
To cite this version: Liana Ermakova, Patrice Bellot, Pavel Braslavski, Jaap Kamps, Josiane Mothe, et al. Text Simplification for Scientific Information Access. 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval: Advances in Information Retrieval (ECIR 2021), Mar 2021, Lucca (virtual), Italy. hal-03121986
Presentation
Full-text available
Modern information access systems hold the promise of giving users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources due to their complex language, internal vernacular, or their lack of prior background knowledge. Text simplification approaches can remove some of t...
Preprint
Full-text available
Consumers are used to consulting reviews posted on the Internet before buying a product. However, given the large number of reviews, it is difficult to grasp the overall opinion. Sentiment analysis makes it possible to detect the polarity (positive, negative, neutral) of an expressed opinion and therefore to classify these reviews. Our purpose is to determi...
Conference Paper
Full-text available
Handling long queries can involve either reducing their size by eliminating unhelpful sentences, or decomposing a long query into several short queries based on its content. Proper sentence classification improves the effectiveness of these procedures. Can sentiment analysis play an effective role in sentence classification? This paper analyses...
Article
Full-text available
This is a report on the ninth edition of the Conference and Labs of the Evaluation Forum (CLEF 2018), held in early September 2018 in Avignon, France. CLEF was a four-day event combining a Conference and an Evaluation Forum. The Conference featured keynotes by Nicholas Belkin, Julio Gonzalo, and Gabriella Pasi, and the presentation of 29 pee...
Article
Full-text available
Text normalization is necessary to correct microblog messages and make them more meaningful for information retrieval purposes. Unfortunately, text normalization tools and resources are rarely shared. In this paper, we present an unsupervised approach to text normalization based on distributed representations of words, known...
Conference Paper
Full-text available
Citation analysis is considered one of the major and most popular branches of bibliometrics. It is based on the assumption that all citations have similar value, and it weights them equally. Specific research fields such as content-based citation analysis (CCA) seek to explain the "how" and "why" of citation behavior. In this paper we...
Conference Paper
Emojis are among the most common ways to convey emotions and sentiments in social messaging applications. To help users choose emojis from a vast range of possibilities, we aim to develop an automatic recommendation system based on user message analysis and real emoji usage, which goes beyond the simple dictionary lookup that is...
Conference Paper
Full-text available
The iterative continuum of scientific production creates a need for filtering and for specific cross-linking of ideas and papers. In this paper, we present BIBLME RecSys, software dedicated to the analysis of bibliographical references extracted from scientific collections of papers. Our goal is to provide users with paper suggestions guided by th...
Chapter
In this paper, we provide a performance analysis of question analysis and describe its different tasks for the Arabic language. Regardless of the approaches used to study this language, the first step is to analyze the question in order to extract all the information exploited by the document search and relevant passage selection processes. Question...
Conference Paper
Full-text available
Text normalisation is necessary to correct microblog messages and make them more meaningful for information retrieval purposes. Unfortunately, text normalisation tools and resources are rarely shared. In this paper, we present an unsupervised approach to text normalisation based on distributed representations of words, known...
Article
Full-text available
Most queries submitted to search engines are composed of keywords, but keywords are not always enough for users to express their needs. Through verbose natural language queries, users can express complex or highly specific information needs. However, it is difficult for search engines to deal with this type of query. Moreover, the emergence of social medi...
Conference Paper
Verbose query reduction and query term weighting are automatic techniques for dealing with verbose queries. The objective is either to assign an appropriate weight to each query term according to its importance in the topic, or to remove unsuitable terms from the query outright and keep only the terms relevant to the topic and the user's need. These techniques...
Article
Full-text available
With the rapid growth of Arabic electronic data on the Web, information extraction, one of the major challenges of question answering, is essential for building document corpora. Indeed, corpus building is currently a research topic listed among other major conference themes in natural l...
Article
Full-text available
This article introduces an automated knowledge inference approach that takes advantage of relationships extracted from texts. It is based on a novel framework that makes it possible to exploit (i) a generated partial ordering of the studied objects (e.g. noun phrases), and (ii) prior knowledge defined in ontologies. This framework is particularly suited for def...
Article
Full-text available
In order to exploit opinions in tweets, this article presents a classification based on the sentiment contained in tweets. We present a method for identifying new seed words. These are used to predict the sentiment intensity of words that co-occur with these seed words. Then, the computation of...
Article
Full-text available
Corpus construction has become an interesting alternative for various natural language processing applications, such as question answering, machine translation, and information retrieval. Similarly, given heterogeneous data and user demand for accurate information, many studies have emphasized the need of the Web to hig...
Article
Full-text available
Term weighting metrics assign weights to terms in order to discriminate the important terms from the less crucial ones. Due to this characteristic, these metrics have attracted growing attention in text classification and recently in sentiment analysis. Using the weights given by such metrics could lead to more accurate document representation whic...
Chapter
This chapter provides a comparative study of Arabic question-answering systems. It presents a review of the main approaches and emphasizes the different experiments carried out in Arabic. It attempts to describe in detail the recent increase in interest and progress in Arabic question-answering research. It compares already existing question-answering syst...
Conference Paper
Full-text available
In this paper, we present our contribution to the Suggestion Track of the Social Book Search Lab. This track aims to develop test collections for evaluating the ranking effectiveness of book retrieval and recommender systems. In our experiments, we combine the results of the Sequential Dependence Model (SDM) with book information that includes the price,...
Article
Full-text available
With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain natural language processing applications, including machine translation, information retrieval, and question answering, has become increasingly pressing. In this manuscript, we seek to create and develop our own corpus...
Conference Paper
Full-text available
In this paper, we present our contribution to SemEval-2016 Task 7, Determining Sentiment Intensity of English and Arabic Phrases, where we use web search engines for unsupervised sentiment intensity prediction in English and Arabic. Our work is based, first, on a group of classic sentiment lexicons (e.g. Sentiment140 Lexicon, SentiWordNet) and, second, on web...
Conference Paper
Full-text available
Exploiting links between pieces of content is crucial in recommendation approaches. In the case of a library of scientific articles, bibliographic references serve as a major source of links. Among them, some are explicit references, such as those found at the end of articles or books, while others are scattered in the text or in the footnotes, according to...
Conference Paper
Full-text available
Designing approaches able to automatically detect uncertain expressions in natural language is central to designing efficient models based on text analysis, in particular in domains such as question answering, approximate reasoning, and knowledge base population. This article presents an overview of several contributions and classifications defining...
Conference Paper
In this paper, we present the automatic annotation of bibliographical reference zones in papers and articles in XML/TEI format. Our work proceeds in two phases: first, we use machine learning to classify bibliographical and non-bibliographical paragraphs in papers, by means of a model that was initially created to differentiate b...
Article
Microblogging platforms such as Twitter are increasingly used for online client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help users understand a tweet by providing them with a short explanatory summary (500 words). This summary should be built...
Article
Full-text available
Detecting uncertainty in natural language is central to the development of many models based on text analysis, e.g. question answering, approximate reasoning, and knowledge base enrichment. After a synthesis of the different classifications of uncertainty and the corresponding detection methods, this a...
Article
Full-text available
So far, various studies have tackled sentiment analysis in several domains, such as restaurant and movie reviews. However, this problem has not been studied for scholarly book reviews, which differ in review style and size. In this paper, we propose to combine different features to be fed to supervised classifiers whic...
Conference Paper
Full-text available
In the context of a library of scientific articles, bibliographic references are a major source of links. Some of them are explicit, such as the references found at the end of articles or books, while others are scattered, with a greater or lesser degree of diffusion, throughout the body...
Conference Paper
Full-text available
In this paper, we present our contribution to the INEX 2016 Social Book Search Track. This year, we participated in a new track called the Mining track, which focuses on detecting and linking book titles in online book discussion forums. We propose a supervised approach based on a Support Vector Machine (SVM) classification process combined with Conditiona...
Article
Full-text available
This article proposes an approach to analyzing recommendation-oriented queries. In particular, we focus on queries where the user is looking for similarities between books, authors, or collections. Our approach first identifies these requests through automatic supervised classification. Second, a dependency analyzer is used to ide...
Conference Paper
Full-text available
Filtering pages about an entity (person, company, music band...) so that only interesting pages are kept is a real challenge. Interest can be qualified using criteria such as recency and novelty. In the last decade, we have seen classification systems trained to detect how interesting a document is with respect to an entity. For scalability reasons, it is...
Article
Full-text available
A combination of multiple information retrieval approaches is proposed for the purpose of book recommendation. In this paper, book recommendation is based on complex user queries. We used different theoretical retrieval models, namely probabilistic models such as InL2 (a Divergence from Randomness model) and language models, and tested their interpolated combination. Gra...
Conference Paper
Full-text available
A new combination of multiple Information Retrieval approaches is proposed for book recommendation based on complex user queries. We used different theoretical retrieval models, namely probabilistic models such as InL2 (a Divergence From Randomness model) and language models, and tested their interpolated combination. We considered the application of a graph-based a...
Conference Paper
Full-text available
In this paper, we present our contribution to the INEX 2015 Social Book Search Track. This track aims to exploit social information (user reviews, ratings, etc.) from the LibraryThing and Amazon collections. We used traditional information retrieval models, namely InL2 and the Sequential Dependence Model (SDM), and tested their combination. We integrat...
Article
Full-text available
Despite their large volume and accessibility, many digital data cannot be properly exploited because they are contained in texts in poorly structured or unstructured forms. Relation extraction is a process that brings together techniques for extracting entities and relations from texts, giving us...
Conference Paper
Full-text available
This paper describes the sentiment analysis systems we built for SemEval-2015 Task 10, Subtasks B and E. For Subtask B, a logistic regression classifier was trained after extracting several groups of features, including lexical, syntactic, lexicon-based, Z-score, and semantic features. A weighting scheme was adapted for positive...
Conference Paper
Full-text available
This paper describes our contribution to the Opinion Target Extraction (OTE) and Sentiment Polarity subtasks of the SemEval-2015 ABSA task. A CRF model with IOB notation was adopted for OTE, with several groups of features including syntactic, lexical, semantic, and sentiment lexicon features. Our OTE submission ranked fifth out of twenty submissions....
Article
Full-text available
This paper deals with the evaluation of tweet contextualization. Text contextualization is defined as providing readers with a summary that allows them to understand a short text that, because of its size, is not self-contained. A general evaluation framework for the contextualization of tweets or other types of short texts is defined. We propose a collection be...
Article
This article addresses the entity-driven filtering task. While detecting and disambiguating entities within documents, our approach strives to select documents of interest according to their centrality to given named entities. We focus on selecting documents that bring novelty or report an important event about an entity. We enhance en...
Conference Paper
Full-text available
This article opens a line of thought on modeling and analyzing Arabic texts within a question-answering system. The aim is to go beyond traditional investigations focused on morpho-syntactic approaches. We present a new approach that analyzes a text, transforms it into logical predicates, and extracts the accurate answer. I...
Conference Paper
Full-text available
Sentiment lexicon-based features have proven effective in recent work on sentiment analysis in Twitter. Features derived from automatically constructed lexicons seem influential enough to attract attention. In this paper, we propose a new metric, called natural entropy (ne), to estimate word polarity scores in order to construct a new s...
Conference Paper
Full-text available
Retrieving important information about a particular named entity in a timely manner is a real challenge. Indeed, it requires being able not only to detect the entity in documents but also to judge whether the information conveyed by a document is important with respect to the entity. In this article, we formalize a...