Liana Ermakova

Université de Bretagne Occidentale | UBO · Héritages et Construction dans le Texte et l’image-HTCI (EA 4249)

PhD Computer Science
Projects: - SimpleText: Automatic Simplification of Scientific Texts - JOKER: Automatic Wordplay Translation

About

78
Publications
15,700
Reads
374
Citations
Introduction
I am currently working on two research projects and organise shared tasks at CLEF 2022 (https://clef2022.clef-initiative.eu/index.php):
  • SimpleText: Automatic Simplification of Scientific Texts (https://simpletext-project.com/)
  • JOKER: Automatic Wordplay Translation (https://www.joker-project.com/)
Additional affiliations
September 2008 - September 2016
Perm State University
Position
  • Lecturer
September 2012 - March 2016

Publications (78)
Book
This edited volume explores how digital humanities can address critical societal challenges in social media, health, education, archives, heritage, and the arts. It features contributions from leading scholars and practitioners in various fields, offering a comprehensive overview of the role of digital humanities in addressing pressing social and e...
Chapter
Full-text available
Everyone agrees on the importance of objective scientific information. However, relevant scientific documents tend to be inherently difficult to find and understand either because of intricate terminology or the potential absence of prior knowledge among their readers. Can we improve accessibility for everyone? This paper introduces the SimpleText...
Chapter
The JOKER Lab at the Conference and Labs of the Evaluation Forum (CLEF) aims to foster research on automated processing of verbal humour, including tasks such as retrieval, classification, interpretation, generation, and translation. Despite the heady success of large language models, humour and wordplay automatic processing are far from being a so...
Chapter
In the introductory chapter of this volume, the authors contemplate the role of Digital Humanities in today’s fast-paced and interconnected world and present an overview of the book’s key values. These include the intersection of Digital Humanities with data science and big data, the application of digital tools and methodologies, and the perspecti...
Chapter
The goal of the JOKER track series is to bring together linguists, translators, and computer scientists to foster progress on the automatic interpretation, generation, and translation of wordplay. Being clearly important for various applications, these tasks are still extremely challenging despite significant recent progress in AI in information re...
Chapter
There is universal consensus on the importance of objective scientific information, yet the general public tends to avoid scientific literature due to access restrictions, its complex language or their lack of prior background knowledge. Academic text simplification promises to remove some of these barriers, by improving the accessibility of scient...
Chapter
The general public tends to avoid reliable sources such as scientific literature due to their complex language and lacking background knowledge. Instead, they rely on shallow and derived sources on the web and in social media – often published for commercial or political incentives, rather than the informational value. Can text simplification help...
Chapter
Understanding and translating humorous wordplay often requires recognition of implicit cultural references, knowledge of word formation processes, and discernment of double meanings – issues which pose challenges for humans and computers alike. This paper introduces the CLEF 2023 JOKER track, which takes an interdisciplinary approach to the creatio...
Chapter
Full-text available
The information spread through the Web influences politics, stock markets, public health, people’s reputation and brands. For these reasons, it is crucial to filter out false information. In this paper, we compare different automatic approaches for fake news detection based on statistical text analysis on the vaccination fake news dataset provided...
Chapter
While humour and wordplay are among the most intensively studied problems in the field of translation studies, they have been almost completely ignored in machine translation. This is partly because most AI-based translation tools require a quality and quantity of training data (e.g., parallel corpora) that has historically been lacking for humour...
Chapter
Although citizens agree on the importance of objective scientific information, they tend to avoid scientific literature due to access restrictions, its complex language or their lack of prior background knowledge. Instead, they rely on shallow information on the web or social media often published for commercial or political incentives rather t...
Poster
Full-text available
https://simpletext-project.com SimpleText tackles technical challenges and evaluation challenges by providing appropriate data and benchmarks for text simplification. We propose the following shared tasks: TASK 1 What is in (or out)? Select passages to include in a simplified summary, given a query TASK 2 What is unclear? Given a passage and a q...
Poster
Full-text available
https://www.joker-project.com/clef-2022/EN/project The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation. Tasks We invite you to submit both automatic and manua...
Chapter
https://www.joker-project.com/clef-2022/EN/project The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation. Tasks We invite you to submit both automatic and manual r...
Chapter
The Web and social media have become the main source of information for citizens, with the risk that users rely on shallow information in sources prioritizing commercial or political incentives rather than the correctness and informational value. Non-experts tend to avoid scientific literature due to its complex language or their lack of prior back...
Book
https://simpletext-project.com SimpleText tackles technical challenges and evaluation challenges by providing appropriate data and benchmarks for text simplification. We propose the following shared tasks: TASK 1 What is in (or out)? Select passages to include in a simplified summary, given a query TASK 2 What is unclear? Given a passage and a quer...
Article
Full-text available
The paper presents the results of the discourse, semantic and sentiment analysis of the medical professional forum publications. Computer-mediated communication (CMC) within the professional community on the portal MirVracha reveals a diversity of peculiarities of medical professional discourse since the portal includes informal posts and chats alo...
Article
Full-text available
The article discusses the emergence of misinformation about COVID-19 treatment on English-language Twitter during the scientific and public debate over the effectiveness of the drug hydroxychloroquine for treating the disease. It analyses the direct and indirect influence of media personalities on the spread of COVID-19 misinformation online. In a collection of 10 million twe...
Article
The power of social media, including Twitter for the English-speaking community, to shape public opinion becomes critical during the current pandemic because of misinformation. Existing studies on the spread of misinformation on social media hypothesise that the initial message is fake. In contrast, we focus on information distortion occurring in cascades as...
Chapter
Information retrieval has moved from traditional document retrieval in which search is an isolated activity, to modern information access where search and the use of the information are fully integrated. But non-experts tend to avoid authoritative primary sources such as scientific literature due to their complex language, internal vernacular, or l...
Article
Full-text available
The abstract is known to be a promotional genre where researchers tend to exaggerate the benefit of their research and use a promotional discourse to catch the reader's attention. The COVID‐19 pandemic has prompted intensive research and has changed traditional publishing with the massive adoption of preprints by researchers. Our aim is to investig...
Chapter
Modern information access systems hold the promise to give users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources due to their complex language, internal vernacular, or lacking prior background knowledge. Text simplification approaches can remove some of t...
Method
Full-text available
Modern information access systems promise to give citizens direct access to key information from authoritative primary sources. Scientific literature is among these sources, but in practice it is hard for non-experts to access because of the linguistic complexity, structure, length, etc. of the documen...
Preprint
Full-text available
To cite this version: Liana Ermakova, Patrice Bellot, Pavel Braslavski, Jaap Kamps, Josiane Mothe, et al. Text Simplification for Scientific Information Access. 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval: Advances in Information Retrieval (ECIR 2021), Mar 2021, Lucca (virtual), Italy. ⟨hal-03121986⟩
Conference Paper
Full-text available
Medical discourse within the professional community has undeservedly received very little attention from researchers. Medical professional discourse exists offline and online. We carried out sentiment analysis on titles and text descriptions of materials published on the Russian portal Mir Vracha (approximately 90,000 word forms). The texts were gener...
Conference Paper
Full-text available
Social media have become a major source of health information for lay people. They have the power to influence the public’s adoption of health policies and to determine the response to the current COVID-19 pandemic. The aim of this paper is to enhance understanding of personality characteristics of users who spread information about controversial COVI...
Conference Paper
Full-text available
Social media have become a valuable source of information. However, their power to shape public opinion can be dangerous, especially in the case of misinformation. Although there are many studies on the detection of misinformation, their underlying hypothesis is that the initial message is fake. In contrast, we focus on information distortion occurring...
Conference Paper
During the current COVID-19 pandemic, social media have become the main source of health-related information for lay people. The attitudes towards the handling of the sanitary crisis and the adoption of health policies by users can be influenced via social media. In particular, discussions about controversial COVID-19 treatments have attracted many...
Presentation
Full-text available
Modern information access systems hold the promise to give users direct access to key information from authoritative primary sources such as scientific literature, but non-experts tend to avoid these sources due to their complex language, internal vernacular, or lacking prior background knowledge. Text simplification approaches can remove some of t...
Conference Paper
Full-text available
The information spread through the Web influences politics, stock markets, public health, people's reputation and brands. For these reasons, it is crucial to filter out false information. In this paper, we compare different automatic approaches for fake news detection based on statistical text analysis on the vaccination fake news dataset provided...
Article
The increasing volume of textual information on any topic requires its compression to allow humans to digest it. This implies detecting the most important information and condensing it. These challenges have led to new developments in the area of Natural Language Processing (NLP) and Information Retrieval (IR) such as narrative summarization and ev...
Conference Paper
Automatic summary evaluation is an important but unsolved problem. Manual assessment is expensive and subjective, and it is not applicable in real time or on a large corpus. Commonly used metrics for summary evaluation still involve substantial human effort since they assume comparison with a set of reference summaries. Existing metrics based on...
Article
Full-text available
An abstract is not only a mirror of the full article; it also aims to draw attention to the most important information of the document it summarizes. Many studies have compared abstracts with full texts for their informativeness. In contrast to previous studies, we propose to investigate this relation based not only on the amount of information giv...
Article
Information retrieval often assumes that relevant documents are "about" the query; the query is thus assumed to reflect the user's information need appropriately. Most search engines assume that being "about" can be measured by matching the terms of the do...
Conference Paper
Full-text available
The MC2 CLEF 2017 lab deals with how the cultural context of a microblog affects its social impact at large. This involves microblog search, classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization. Regular Lab participants have access to the private massive multilingual microblog stream of \tex...
Conference Paper
Full-text available
Sentence ordering is a key component of verbal ability. It is also crucial for automatic text generation. While numerous researchers developed various methods to automatically evaluate the informativeness of the produced contents, the evaluation of readability is usually performed manually. In contrast to that, we present a self-sufficient metric f...
Book
Sentence ordering (SO) is a key component of verbal ability. It is also crucial for automatic text generation. While numerous researchers developed various methods to automatically evaluate the informativeness of the produced contents, the evaluation of readability is usually performed manually. In contrast to that, we present a self-sufficient met...
Conference Paper
Full-text available
Opinion and trend mining on micro-blogs like Twitter recently attracted research interest in several fields including Information Retrieval (IR) and Natural Language Processing (NLP). However, the performance of existing approaches is limited by the quality of available training material. Moreover, explaining automatic systems' suggestions for deci...
Conference Paper
Managing individual expertise is a major concern within any industrial-wide organization. If previous works have extensively studied the related expertise and authority profiling issues, they assume a semantic independence of these two key concepts. In digital libraries, state-of-the-art models generally summarize the researchers’ profile by using...
Article
Full-text available
The MC2 CLEF 2017 Content Analysis task deals with classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization. Festivals have a large presence on social media. The resulting microblog stream and related URLs are appropriate to experiment on advanced social media search and mining methods....
Conference Paper
Full-text available
One of the tasks of text generation is sentence ordering, since it is crucial for readability. Nevertheless, there is no common approach to the evaluation of sentence ordering. The state-of-the-art methods are based on comparison with a human-provided order. However, in many cases this is impossible or time- and resource-consuming. Therefore, we propo...
Conference Paper
Full-text available
This paper introduces a novel approach for document re-ranking in information retrieval based on topic-comment structure of texts. While most information retrieval models make the assumption that relevant documents are about the query and that aboutness can be captured considering bags of words only, we rather consider a more sophisticated analysis...
Conference Paper
Full-text available
Query expansion (QE) aims at improving information retrieval effectiveness by enhancing the query formulation. Because users' queries are generally short and because of the language ambiguity, some information needs are difficult to satisfy. Query reformulation and QE methods have been developed to face this issue. Pseudo relevance feedback (PRF) c...
Thesis
Full-text available
Efficient communication tends to follow the principle of least effort. According to this principle, interlocutors using a given language do not want to work any harder than necessary to reach understanding. This leads to the extreme compression of texts, especially in electronic communication, e.g. microblogs, SMS, search queries. Howev...
Conference Paper
Full-text available
Query expansion (QE) aims at improving information retrieval (IR) effectiveness by enhancing the query formulation. Because users' queries are generally short and because of the language ambiguity, some information needs are difficult to answer. Query reformulation and QE methods have been developed to face this issue. Relevance feedback (RF) is on...
Article
Full-text available
The CLEF Cultural micro-blog Contextualization Workshop aims to provide the research community with data sets to gather, organize and deliver relevant social data related to events generating a large number of micro-blog posts and web documents. It is also devoted to discussing tasks to be run from this data set and that could serve applications...
Conference Paper
Full-text available
This paper presents the approach we developed for automatic multi-document summarization applied to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduced an algorithm...
Article
Full-text available
In this paper, we describe an approach for tweet contextualization developed in the context of INEX 2012. The task was to provide a context of up to 500 words for a tweet, drawn from Wikipedia. As a baseline system, we used TF-IDF cosine similarity measure enriched by smoothing from local context, named entity recognition and part-of-speech weighting...
Article
Full-text available
The paper presents IRIT’s approach used at INEX Tweet Contextualization Track 2014. Systems had to provide a context to a tweet from the perspective of the entity. This year we further modified our approach presented at INEX 2011, 2012 and 2013 underlain by the product of different measures based on smoothing from local context, named entity recogn...
Article
Full-text available
Information retrieval aims at retrieving relevant documents answering a user's need expressed through a query. Users' queries are generally shorter than three words, which makes a correct answer really difficult. Automatic query expansion (QE) improves the precision on average even if it can decrease the results for some queries. We propose a new automatic...
Article
Full-text available
Suicide is a major, preventable public health problem. The problem is particularly critical for young people. In Russia, thousands of teenagers commit suicide every year. In most cases it can be prevented if a risky state is detected. Nowadays the internet is becoming a major means of communication, mainly in text form. Therefore we suggest a metho...
Conference Paper
Full-text available
Most sentiment classifiers are based on dictionaries or require large amounts of training data. Unfortunately, dictionaries contain only limited data, and machine-learning classifiers using word-based features do not consider parts of words, which makes them domain-specific, less effective and not robust to orthographic mistakes. We attempt...
Conference Paper
Full-text available
Faced with the immense volume of documents returned by Web search engines, users are helped in their document selection task by short extracts of one to two sentences, called snippets, associated with the URL of each document returned by the search engine. In this article, we consider the automatic generation...
Article
Full-text available
The paper presents IRIT’s approach used at INEX Tweet Contextualization Track 2013. Systems had to provide a context to a tweet. This year we further modified our approach presented at INEX 2011 and 2012 underlain by the product of scores based on hashtag processing, TF-IDF cosine similarity measure enriched by smoothing from local context and docu...
Conference Paper
Full-text available
In this paper we describe an approach for tweet contextualization developed in the context of the INEX question answering track. The task is to provide a context of up to 500 words for a tweet. The summary should be an extract from Wikipedia. Our approach is based on the index which includes not only lemmas, but also named entities (NE). Sentence r...
Article
Full-text available
Most existing spam filtering techniques suffer from several serious disadvantages. Some of them produce many false positives. Others are suitable only for email filtering and cannot be used in IM and social networks. Therefore content methods seem to be more efficient. One of them is based on signature retrieval. However, it is not...
Article
Full-text available
It is shown that on the Runet extremist texts are freely posted not only in the blogs of individual users but also on the sites of the largest internet companies. Speech practices for commenting on socially significant events have by now taken shape, aimed, as a rule, at expressing hostility and verbal aggression. Among the widesprea...

Questions (2)
Question
One important aspect of a researcher's career is searching for funding for his/her scientific projects. What kinds of projects in computer science are likely to be supported? What are the most important aspects of a project proposal? What are the best opportunities for a young scientist?
Question
The main idea of modern recommender systems is to recommend items similar to your profile. Your profile is calculated from your history (your favorite movies, items bought in online shops, articles suggested by Facebook...). Sometimes it is almost ridiculous, especially when you have just bought, let's say, a hydration pack but contextual advertising offers you other hydration packs. If you like a movie, IMDB suggests only similar movies. You do not have an opportunity to discover something new. Do you think that is a major drawback of modern recommender systems? I would appreciate your opinions.
