About
Publications: 344
Reads: 83,536
Citations: 5,500
Publications (344)
Early Risk Detection (ERD) on the Web aims to promptly identify users facing social and health issues. Users are analyzed post-by-post, and it is necessary to guarantee correct and quick answers, which is particularly challenging in critical scenarios. ERD involves optimizing classification precision and minimizing detection delay. Standard classif...
Medical imaging interpretation plays a vital role in primary health care, and with increasing workloads, the integration of artificial intelligence to automate this task can be useful in assisting doctors in their daily work. In the present study, we develop a novel neural architecture based on Transformer models called the enhanced transformer bas...
Social networks have become one of the most popular ways for people to communicate with others and get informed. For this reason, these platforms are being widely used to spread propaganda and thereby influence the beliefs, opinions and actions of their users. Despite its relevance, current computational approaches to detect propaganda are mainly f...
The use of Deep Learning-based solutions has become popular in Natural Language Processing due to their remarkable performance in a wide variety of tasks. Specifically, Transformer-based models (e.g. BERT) have become popular in recent years due to their outstanding performance and their ease of adaptation (fine-tuning) in a large number of domains...
We present experiments on detecting hyperpartisanship in news using a ‘masking’ method that allows us to assess the role of style vs. content for the task at hand. Our results corroborate previous research on this task in that topic-related features yield better results than stylistic ones. We additionally show that competitive results can be achie...
Social media is frequently plagued with undesirable phenomena such as cyberbullying and abusive content in the form of hateful and racist posts. Therefore, it is crucial to study and propose better mechanisms to automatically identify communication that promotes hate speech, hostility, and aggressiveness. Traditional approaches have only focused on...
Author profiling (AP) is a highly relevant natural language processing (NLP) problem; it deals with predicting features of authors such as gender, age and personality traits. It is done by analyzing texts written by the authors themselves; take for instance documents such as books, articles, and more recently posts in social media platforms. In the...
Misogyny is a severe social problem that affects women’s mental and physical health or even leads to femicide. This cultural problem is visible and prevalent in different communication channels, such as music and social media, confirming or inciting this behavior. Hence, the automatic detection of misogynistic content in social media using computat...
The goal of rank fusion in information retrieval (IR) is to deliver a single output list from multiple search results. Improving performance by combining the outputs of various IR systems is a challenging task. A central point is the fact that many non-obvious factors are involved in the estimation of relevance, inducing nonlinear interrelations be...
In this chapter, we describe the participation of our research team in the two editions of the eRisk early anorexia detection task. We used two domain-independent approaches to address this task. The first approach is based on a temporal-aware document representation, whereas the second one consists of a simple, interpretable, and no...
Currently, self-harm is considered one of the leading causes of death by suicide in young people. Timely detection of self-inflicted injury is important to help people before the illness gets worse, minimizing disabilities and returning them to their normal life. A popular way for people to share information is using social media platforms, where t...
Over the last few years, studies related to the detection of mental disorders in social media have been increasing. This is because the awareness created by health campaigns, which emphasize how common these disorders are, has motivated the creation of new datasets, many of them extracted from social media platforms. In this stu...
The interpretation of medical images is a fundamental process for the diagnosis and treatment of patients. This process contributes to determining the causes of symptoms as well as monitoring the effects of any treatment. Although the generation of medical reports from images is a complex task, deep learning strategies have been integrated with models...
Thanks to the availability of digital media, users receive daily news reports, opinions and information on a wide variety of topics. These same media allow people to easily share and transmit their own opinions, thus enriching the debate and reflection on topics of public interest. Unfortunately, these circumstances have led to the emergence of fak...
The detection of lesions from computed tomography scans is an important and nontrivial task in medical diagnosis. The difficulty of this task is related to the medical data where the appearance of different organs and lesions is not easily distinguished from the background. This paper proposes a One-Stage Lesion Detection method named OSLeD-wA. OSL...
The author profiling task refers to extracting as much information as possible about an author from what they write, such as gender, age, nationality, and location. Although this task arose a few decades ago, the explosion of social networks has shifted author profiling mainly toward digital media. Typically, previous works have used only the text of t...
Adverse drug reactions (ADRs) are a major cause of patients’ morbidity and mortality, and a source of financial burden for health systems. In this context, pharmacovigilance plays a key role, which has led to its application on social media texts, where users often report various personal health issues, including adverse drug reactions and problems...
Public health surveillance via social media can be a useful tool to identify and track potential cases of a disease. The aim of this research was to design a method for identifying tweets describing potential Covid-19 cases. The proposed method uses a Wide & Deep (W&D) architecture, which combines two learning branches fed from different features t...
Depression is a severe mental health problem. Due to its relevance, the development of computational tools for its detection has attracted increasing attention in recent years. In this context, several research works have addressed the problem using word-based approaches (e.g., a bag of words). This type of representation has been shown to be useful, in...
Depression is a common and very important health issue with serious effects in the daily life of people. Recently, several researchers have explored the analysis of user-generated data in social media to detect and diagnose signs of this mental disorder in individuals. In this regard, we tackled the depression detection task in social media conside...
Social networks have become the main means of communication and interaction between people. In them, users share information and opinions, but also their experiences, worries, and personal concerns. Because of this, there is a growing interest in analyzing this kind of content to identify people who commit self-harm, which is often one of the first...
This paper presents DeepBoSE, a novel deep learning model for depression detection in social media. The model is formulated such that it internally computes a differentiable Bag-of-Features (BoF) representation that incorporates emotional information. This is achieved by a reinterpretation of classical weighting schemes like tf-idf into probabilist...
Taking advantage of the increasing amount of user-generated content in social media, some computational methods have already been proposed for detecting people suffering from depression and anorexia. Such complex tasks have been tackled as a binary classification problem using, in most cases, automatically generated training data. Despite its promi...
Millions of people around the world are affected by one or more mental disorders that interfere with their thinking and behavior. Timely detection of these issues is challenging but crucial, since it could open the possibility of offering help to people before the illness gets worse. One alternative to accomplish this is to monitor how people express...
Psychologists have used tests and carefully designed survey questions, such as Beck's Depression Inventory (BDI), to identify the presence of depression and to assess its severity level. On the other hand, methods for automatic depression detection have gained increasing interest since all the information available in social media, such as Twitter...
This paper presents the Deep Bag-of-Sub-Emotions (DeepBoSE), a novel deep learning model for depression detection in social media. The model is formulated such that it internally computes a differentiable Bag-of-Features (BoF) representation that incorporates emotional information. This is achieved by a reinterpretation of classical weighting schem...
This paper summarizes the thesis: "Author Profiling in Social Media with Multimodal Information." Our solution uses a multimodal approach to extracting information from written messages and images shared by users. Previous work has shown the existence of useful information for this task in these modalities; however, our proposal goes further, demon...
Different mental disorders affect millions of people around the world, causing significant distress and interference to their daily life. Currently, the increased usage of social media platforms, where people share personal information about their day and problems, opens up new opportunities to actively detect these problems. We present a new appro...
This work presents an experimental study on the task of Named Entity Recognition (NER) for a narrow domain in the Spanish language. This study considers two approaches commonly used in this kind of problem, namely, a Conditional Random Fields (CRF) model and a Recurrent Neural Network (RNN). For the latter, we employed a bidirectional Long Short-Term Mem...
This paper considers the problem of leveraging multiple sources of information or data modalities (e.g., images and text) in neural networks. We define a novel model called gated multimodal unit (GMU), designed as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data...
A recently introduced classifier, called SS3, has been shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF’s eRisk open tasks. SS3 was created to deal with ERD problems naturally since: it supports incremental t...
The increasing propagation of abusive language in social media is a major concern for supplier companies and governments because of its negative social impact. A large number of methods have been developed for its automatic identification, ranging from dictionary-based methods to sophisticated deep learning approaches. A common problem in all these...
Passage retrieval is an important stage of question answering systems. Closed-domain passage retrieval, e.g., biomedical passage retrieval, presents additional challenges such as specialized terminology, more complex and elaborate queries, and scarcity of available data, among others. However, closed domains also offer some advantages such...
In this study we introduce the k-Strongest Strengths (kSS) Classification Algorithm, a novel approach for classification problems based on the well-known k-Nearest Neighbor (kNN) classifier. The proposed kSS method is motivated by an analogy to the Law of Universal Gravitation. The novelty of kSS resides in that instead of only using the neighbors’...
The facilities provided by social media and computer-mediated communication make it easy to disseminate deceptive behavior, which can affect different entities or people. Deception detection by supervised learning has been widely studied; however, the scenario in which there is one domain of interest and the labeled data is in a...
This paper describes the participation of the MindLab research group in the BioASQ 2019 Challenge for task 7b, document retrieval and snippet retrieval. For document retrieval, Elastic Search was used for the initial document retrieval step with BM25 as a scoring function. In the second stage, the top 100 retrieved documents were re-ranked with sev...
In this study we propose a novel method to generate Document Embeddings (DEs) by means of evolving mathematical equations that integrate classical term frequency statistics. To accomplish this, we employed a Genetic Programming (GP) strategy to build competitive formulae to weight custom Word Embeddings (WEs), produced by cutting edge feature extra...
Automatic systems for the analysis of textual information are highly relevant for the sake of developing completely autonomous assessment and authentication mechanisms in online learning scenarios. Since written texts within e-learning systems are mostly associated with exams and evaluations, it is critical to authenticate the identity of participa...
A recently introduced text classifier, called SS3, has obtained state-of-the-art performance on the CLEF's eRisk tasks. SS3 was created to deal with risk detection over text streams and therefore not only supports incremental training and classification but also can visually explain its rationale. However, little attention has been paid to the pote...
A recently introduced classifier, called SS3, has been shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF's eRisk open tasks. SS3 was created to naturally deal with ERD problems since: it supports incremental t...
Recent works have shown that it is possible to use information extracted from images to address the task of automatic gender identification. These proposals have validated their solutions using monolingual datasets, i.e., collections where images are shared by users having the same mother tongue. This paper aims to test the usefulness of images col...
This paper presents the framework and results from the MEX-A3T track at IberLEF 2019. This track considers two tasks, author profiling and aggressiveness detection, both of them using Mexican Spanish tweets. The author profiling task consists of determining the gender, occupation and place of residence of users from their tweets. As a novelty in th...
Depression is a mental disorder with strong social and economic implications. Due to its relevance, several recent studies have explored the analysis of social media content to identify and track depressed users. Most approaches follow a supervised learning strategy that relies on the availability of labeled training data. Unfortunately, acquiri...
In this paper, we present our approach to the detection of anorexia at eRisk 2019. The main objective of this shared task is to identify as soon as possible if a user shows signs of anorexia by using their posts on Reddit. For this, we evaluate a representation called Bag of Sub-Emotions (BoSE), a new technique that represents user posts by buildin...
Automatic Image Annotation (AIA) is the task of assigning keywords to images, with the aim to describe their visual content. Recently, an unsupervised approach has been used to tackle this task. Unsupervised AIA (UAIA) methods use reference collections that consist of the textual documents containing images. The aim of the UAIA methods is to extrac...
Nowadays, social media platforms are the most popular way for people to share information, from work issues to personal matters. For example, people with health disorders tend to share their concerns for advice, support or simply to relieve suffering. This provides a great opportunity to proactively detect these users and refer them as soon as poss...
This paper summarizes the thesis "Identificación del perfil de autores en redes sociales usando nuevos esquemas de pesado que enfatizan información de tipo personal", whose main idea is that terms located in phrases exposing personal information are highly valuable for the AP task. First, a study is presented on the relevance of this i...
Author Profiling (AP) aims at predicting specific characteristics from a group of authors by analyzing their written documents. Much research has focused on determining suitable features for modeling writing patterns from authors. Reported results indicate that content-based features continue to be the most relevant and discriminant features f...
With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, mu...
Patriarchal behavior, like other social habits, has been transferred online, appearing as misogynistic and sexist comments, posts or tweets. This online hate speech against women has serious consequences in real life, and recently, various legal cases have arisen against social platforms that scarcely block the spread of hate messages towards in...
Several methods have been proposed for determining plagiarism between pairs of sentences, passages or even full documents. However, the majority of these methods fail to reliably detect paraphrase plagiarism due to the high complexity of the task, even for human beings. Paraphrase plagiarism identification consists in automatically recognizing docu...
The Bag-of-Visual-Words (BoVW) representation is a well-known strategy to approach many computer vision problems. The idea behind BoVW is similar to the Bag-of-Words (BoW) used in text mining tasks: to build word histograms to represent documents. Regarding computer vision, most of the research has been devoted to obtaining better visual words, rather...
Academic competitions and challenges comprise an effective mechanism for rapidly advancing the state of the art in diverse research fields and for solving practical problems arising in industry. In fact, academic competitions are increasingly becoming an essential component of academic events, like conferences. With the proliferation of challenges,...
The kNN algorithm has three main advantages that make it appealing to the community: it is easy to understand, it regularly offers competitive performance, and its structure can be easily tuned to adapt to the needs of researchers to achieve better results. One of the variations is weighting the instances based on their distance. In this paper w...
Likability prediction of books has many uses. Readers, writers, as well as the publishing industry, can all benefit from automatic book likability prediction systems. In order to make reliable decisions, these systems need to assimilate information from different aspects of a book in a sensible way. We propose a novel multimodal neural architectur...
Biomedical Question Answering is concerned with the development of methods and systems that automatically find answers to natural language posed questions. In this work, we describe the system used in the BioASQ Challenge task 6b for document retrieval and snippet retrieval (with particular emphasis in this subtask). The proposed model makes use of...
In this work, we propose a variant of a well-known instance-based algorithm: WKNN. Our idea is to exploit task-dependent features in order to calculate the weight of the instances according to a novel paradigm: the Textual Attraction Force, which serves to quantify the degree of relatedness between documents. The proposed method was applied to a cha...
This paper presents the framework and results from the MEX-A3T track at IberEval 2018. This track considers two tasks, author profiling and aggressiveness detection, both of them using Mexican Spanish tweets. The author profiling task aims to identify the place of residence and occupation of Twitter users. On the other hand, the aggressiveness dete...
According to the World Health Organization, recent years have seen a dramatic increase in the number of car accidents worldwide. In an attempt to ameliorate this situation, the automotive and telematics industry has tried to develop technology that can help drivers make better and safer decisions. One approach is to develop systems that give feedba...
Nowadays, online misogynistic abuse has become a serious issue, especially because the anonymity and interactivity of the web facilitate the spread and permanence of offensive comments. In this paper, we present an approach based on stylistic and specific topic information for the detection of misogyny, exploring the several a...
With the uncontrolled increase in fake news, untruthful claims, and rumors on the web, different approaches have recently been proposed to address this problem. In this paper, we present a credibility detector of factual claims in presidential debates. Our approach captures the distribution of the results from the search engines to infer the cr...
Journalists usually work for a long time to investigate presidential debates. Their main role is to extract the sentences in the debates that include information about facts or previous events. These sentences are called claims. Investigating these claims is important because it can reveal how credible the speaker or the other candi...
Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance....
The goal of Author Profiling (AP) is to identify demographic aspects (e.g., age, gender) from a given set of authors by analyzing their written texts. Recently, the AP task has gained interest in many problems related to computer forensics, psychology, and marketing, but especially in those related to social media exploitation. As is known, social media...
Books have the power to make us feel happiness, sadness, pain, surprise, or sorrow. An author's dexterity in the use of these emotions captivates readers and makes it difficult for them to put the book down. In this paper, we model the flow of emotions over a book using recurrent neural networks and quantify its usefulness in predicting success in...
Automatic text summarization systems are nowadays of great help to extract relevant information from large corpora. Many solutions to the task have been proposed from the perspective of the optimization of a single-objective function, aiming at finding the global optimum. This is an unrealistic goal since when multiple objectives are considered a s...
The Author Profiling (AP) task aims to predict specific profile characteristics of authors by analyzing their written documents. Nowadays, its relevance has been highlighted thanks to several applications in computer forensics, security and marketing. Most previous contributions in AP have been devoted to determining a suitable set of features to mod...
Recognizing Textual Entailment (RTE) is a Natural Language Processing task. It is very important in tasks such as Semantic Search and Text Summarization. There are many approaches to RTE, for example, methods based on machine learning, linear programming, probabilistic calculus, optimization, and logic. Unfortunately, none of them can explain why the...
The problem of classification is widely studied in supervised learning. Nonetheless, there are scenarios that have received little attention despite their applicability. One such scenario is early text classification, where one needs to know the category of a document as soon as possible. The importance of this variant of the classification prob...
The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, semantically related to query terms. Word embeddings have been used recently in an attempt to address this problem. Nevertheless query disambiguation is still necessary as the semantic relatedness of each word in the corpus is modeled, but...
Controversial topics are present in everyday life, and opinions about them can be either truthful or deceptive. Deceptive opinions are expressed to mislead other people in order to gain some advantage. In most cases, humans cannot detect whether an opinion is deceptive or truthful; however, computational approaches have been used succe...