Ricardo Baeza-Yates

University Pompeu Fabra | UPF · Department of Information and Communication Technologies (DTIC)

PhD in Computer Science

About

681
Publications
192,751
Reads
35,701
Citations
Additional affiliations
January 2006 - February 2016
Yahoo
Position
  • VP of Research
December 2004 - present
University Pompeu Fabra
Position
  • Professor (Full)

Publications

Book
Information retrieval is a sub-field of computer science that deals with the automated storage and retrieval of documents. Providing the latest information retrieval techniques, this guide discusses Information Retrieval data structures and algorithms, including implementations in C. Aimed at software engineers building systems with book processing...
Preprint
The Web contains several social media platforms for discussion, exchange of ideas, and content publishing. These platforms are used by people, but also by distributed agents known as bots. Although bots have existed for decades, with many of them being benevolent, their influence in propagating and generating deceptive information in the last years...
Article
Full-text available
Children with dyslexia have difficulties learning how to read and write. They are often diagnosed after they fail school even if dyslexia is not related to general intelligence. Early screening of dyslexia can prevent the negative side effects of late detection and enables early intervention. In this context, we present an approach for universal sc...
Article
Ranking items or people is a fundamental operation at the basis of several processes and services, not all of them happening online. Ranking is required for different tasks, including search, personalization, recommendation, and filtering. While traditionally ranking has been aimed solely at maximizing some global utility function, recently the awa...
Chapter
Full-text available
The growing ubiquity of the Internet and the information overload created a new economy at the end of the twentieth century: the economy of attention. While difficult to size, we know that it exceeds proxies such as the global online advertising market which is now over $300 billion with a reach of 60% of the world population. A discussion of the a...
Article
LocWeb and TempWeb 2021 were the eleventh events in their workshop series and took place on 12th April 2021, co-located with The Web Conference (WWW 2021). They were intended to be held in Ljubljana, Slovenia as a potentially hybrid event but, due to the pandemic, were moved fully online. LocWeb and TempWeb were held as one colocated s...
Data
This is the user data that was collected with the game MusVis as well as analyzed and published in different publications. We designed the game content taking into consideration the analysis of mistakes of people with dyslexia in different languages and other parameters related to dyslexia like auditory perception as well as visual perception. We c...
Data
Protocol of the semi-structured literature review to select the content for 'A Universal Screening Tool for Dyslexia by a Web-Game and Machine Learning'.
Data
Protocol of the iterations to select the content for 'A Universal Screening Tool for Dyslexia by a Web-Game and Machine Learning'.
Data
Protocol of the generated audio files selected for 'A Universal Screening Tool for Dyslexia by a Web-Game and Machine Learning'.
Article
Full-text available
Substance abuse and mental health issues are severe conditions that affect millions. Signs of certain conditions have been traced on social media through the analysis of posts. In this paper we analyze textual cues that characterize and differentiate Reddit posts related to depression, eating disorders, suicidal ideation, and alcoholism, along with...
Article
Full-text available
Background: Eating disorders are psychological conditions characterized by unhealthy eating habits. Anorexia nervosa (AN) is defined as the belief of being overweight despite being dangerously underweight. The psychological signs involve emotional and behavioral issues. There is evidence that signs and symptoms can manifest on social media, wherei...
Article
Full-text available
A correction to this paper has been published: https://doi.org/10.1007/s43681-021-00059-y
Article
Full-text available
The recent incidents involving Dr. Timnit Gebru, Dr. Margaret Mitchell, and Google have triggered an important discussion emblematic of issues arising from the practice of AI Ethics research. We offer this paper and its bibliography as a resource to the global community of AI Ethics Researchers who argue for the protection and freedom of this resea...
Chapter
This chapter summarizes contributions made by Ricardo Baeza-Yates, Francesco Bonchi, Kate Crawford, Laurence Devillers and Eric Salobir in the session chaired by Françoise Fogelman-Soulié on AI & Human values at the Global Forum on AI for Humanity. It provides an overview of key concepts and definitions relevant for the study of inequalities and Ar...
Article
Full-text available
When discussing interpretable machine learning results, researchers need to compare them and check for reliability, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in wrong prediction of cancer, incorrect assessment of the COVID-19 pandemic situation, or missing early screening of dyslexia...
Article
Full-text available
Dyslexia is a specific learning disorder related to school failure. Detection is both crucial and challenging, especially in languages with transparent orthographies, such as Spanish. To make detecting dyslexia easier, we designed an online gamified test and a predictive machine learning model. In a study with more than 3,600 participants, our mode...
Article
Full-text available
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Preprint
Full-text available
Background: Eating disorders are psychological conditions characterized by unhealthy eating habits. Anorexia Nervosa (AN) is defined by the thought of being overweight despite being dangerously underweight. Psychological signs involve emotional and behavioral issues. There is evidence that signs and symptoms can be manifested on social media, where...
Preprint
Community search is a well-studied problem which, given a static graph and a query set of vertices, requires finding a cohesive (or dense) subgraph containing the query vertices. In this paper we study the problem of community search in temporal dynamic networks. We adapt to the temporal setting the notion of network inefficiency which is ba...
Conference Paper
Full-text available
When discussing interpretable machine learning results, researchers need to compare results and reflect on their reliability, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in missing early screening of dyslexia or wrong prediction of cancer. We present nine criteria that help avoid ove...
Chapter
We explore different techniques for pruning an inverted index in advance, that is, without building the full index. These techniques provide interesting trade-offs between index size, answer quality and query coverage. We experimentally analyze them in a large public web collection with two different query logs. The trade-offs that we find range fr...
Article
Neuropathologies can be classified, on the basis of post-mortem histopathology and by using machine learning, into six transdiagnostic clusters associated with clinical phenotypes.
Chapter
According to tastes, a person could show preference for a given category of content to a greater or lesser extent. However, quantifying people’s amount of interest in a certain topic is a challenging task, especially considering the massive digital information they are exposed to. For example, in the context of Twitter, aligned with his/her prefere...
Preprint
Web platforms have allowed political manifestation and debate for decades. Technology changes have brought new opportunities for expression, and the availability of longitudinal data of these debates entices new questions regarding who participates, and who updates their opinion. The aim of this work is to provide a methodology to measure these phen...
Chapter
Full-text available
Anorexia Nervosa (AN) is a serious mental disorder that has been proved to be traceable on social media through the analysis of users’ written posts. Here we present an approach to generate word embeddings enhanced for a classification task dedicated to the detection of Reddit users with AN. Our method extends Word2vec’s objective function in order...
Conference Paper
Full-text available
Children with dyslexia are often diagnosed after they fail school even if dyslexia is not related to general intelligence. In this work, we present an approach for universal screening of dyslexia using machine learning models with data gathered from a web-based language-independent game. We designed the game content taking into consideration the an...
Article
Full-text available
Measures of centrality of vertices in a network are usually defined solely on the basis of the network structure. In highly dynamic networks, where vertices appear and disappear and their connectivity constantly changes, we need to redefine our measures of centrality to properly capture the temporal dimension of the network structure evolution, as...
Article
Full-text available
Background: Suicide risk assessment usually involves an interaction between doctors and patients. However, a significant number of people with mental disorders receive no treatment for their condition due to the limited access to mental health care facilities; the reduced availability of clinicians; the lack of awareness; and stigma, neglect, and di...
Preprint
Full-text available
Background: Suicide risk assessment usually involves an interaction between doctors and patients. However, a significant number of people with mental disorders receive no treatment for their condition due to the limited access to mental health care facilities; the reduced availability of clinicians; the lack of awareness; and stigma, neglect, and di...
Thesis
Full-text available
Children with dyslexia have difficulties learning how to read and write. They are often diagnosed after they fail in school, even though dyslexia is not related to general intelligence. In this thesis, we present an approach for earlier screening of dyslexia using a language-independent game in combination with machine learning models trained with...
Conference Paper
A key aspect of the Web Science conference is exploring the ethical challenges of technologies, data, algorithms, platforms, and people in the Web as well as detecting, preventing and predicting anomalies in web data including algorithmic and data biases. Handling Web Bias (HWB) is a new workshop focusing on best practices for identifying and handl...
Conference Paper
Today, more than ever, social networks and micro-blogging platforms are used as tools for political exchange. However, these platforms are biased in several aspects, from their algorithms to the population participating in them. With respect to the latter, we analyze the discussion on Twitter about an abortion bill in Chile, proposed in January 201...
Preprint
Full-text available
Dyslexia is a specific learning disorder related to school failure. Detection is both crucial and challenging, especially in languages with transparent orthographies, such as Spanish. To make detecting dyslexia easier, we designed an online gamified test and associated predictive machine learning model. In a study with more than 4,300 participants,...
Chapter
Nowadays, being excluded from the web is a huge disadvantage. People with dyslexia have, despite their general intelligence, difficulties with reading and writing throughout their lives. Therefore, web technologies can help people with dyslexia to improve their reading and writing experience on the web. This chapter introduces the main technologie...
Conference Paper
Machine learning algorithms increasingly affect both our online and offline experiences. Researchers and policymakers, however, have rightfully raised concerns that these systems might inadvertently exacerbate societal biases. We provide an introduction to fair machine learning, beginning with a general overview of algorithmic fairness, and then di...
Conference Paper
We propose an effective and efficient algorithm for ranking web documents, called CombGenRank. This algorithm introduces a novel selection criterion, called elitism, into the classical genetic programming paradigm, which has already proved to be effective for supporting web document ranking. Extensive experimental results conducted on top of well-known we...
Article
Large-scale dynamic interaction graphs can be challenging to process and store, due to their size and the continuous change of communication patterns between nodes. In this work we address the problem of summarizing large-scale dynamic graphs, while maintaining the evolution of their structure and the communication patterns. Our approach is based o...
Conference Paper
Full-text available
Using serious games to screen dyslexia has been a successful approach for English, German and Spanish. In a pilot study with a desktop game, we addressed pre-readers screening, that is, younger children who have not acquired reading or writing skills. Based on our results, we have redesigned the game content and new interactions with visual and m...
Preprint
Full-text available
According to tastes, a person could show preference for a given category of content to a greater or lesser extent. However, quantifying people's amount of interest in a certain topic is a challenging task, especially considering the massive digital information they are exposed to. For example, in the context of Twitter, aligned with his/her prefere...
Article
Full-text available
Our inherent human tendency of favoring one thing or opinion over another is reflected in every aspect of our lives, creating both latent and overt biases toward everything we see, hear, and do. Any remedy for bias must start with awareness that bias exists; for example, most mature societies raise awareness of social bias through affirmative-actio...
Conference Paper
Full-text available
Detecting dyslexia is important because early intervention is key to avoiding the negative effects of dyslexia such as school failure. Most of the current approaches to detect dyslexia require expensive personnel (e.g., psychologists) or special hardware (e.g., eye trackers or MRI machines). Also, most of the methods can only be used when children are l...
Conference Paper
The Web's content has been going through major changes, triggered by multiple factors including changes in user demographics and authoring behaviour, a shift in device types that access the Web, and changes in common use cases of the Web. More specifically, the number of mobile internet users has surpassed the desktop users according to different st...
Conference Paper
Full-text available
Time is a key dimension to understand the Web. It is fair to say that it has not yet received all the attention it deserves, and TempWeb is an attempt to help remedy this situation by putting time at the center of its reflection. Studying time in this context actually covers a large spectrum, from the extraction of temporal information and knowledge...
Poster
Full-text available
An initial version of this poster was presented at W4A 2017, Perth, Western Australia: M. Rauschenberger, L. Rello, R. Baeza-Yates, E. Gomez, and J. P. Bigham, Towards the Prediction of Dyslexia by a Web-based Game with Musical Elements. In W4A’17: International Web for All Conference, 2017, pp. 4–7. http://doi.org/10.1145/3058555.3058565
Article
Full-text available
Data mining, machine learning, and natural language processing are powerful techniques that can be used together to extract information from large texts. Depending on the task or problem at hand, there are many different approaches that can be used. The methods available are continuously being optimized, but not all these methods have been tested a...
Conference Paper
Full-text available
In this work, we define and solve the Fair Top-k Ranking problem, in which we want to determine a subset of k candidates from a large pool of n ≫ k candidates, maximizing utility (i.e., select the "best" candidates) subject to group fairness criteria. Our ranked group fairness definition extends group fairness using the standard notion of protected...
Conference Paper
Queries are often ambiguous and can be interpreted in many ways, even by humans. Hence, semantic query understanding's primary objective is to understand the intention behind the query. This implies, first, predicting the language used to express the query; second, parsing the query according to that language; and third, extracting the entities and conce...
Conference Paper
The rise of a trending topic on Twitter or Facebook leads to the temporal emergence of a set of users currently interested in that topic. Given the temporary nature of the links between these users, being able to dynamically identify communities of users related to this trending topic would allow for a rapid spread of information. Indeed, individua...
Article
A commonly used technique for improving search engine performance is result caching. In result caching, precomputed results (e.g., URLs and snippets of best matching pages) of certain queries are stored in a fast-access storage. The future occurrences of a query whose results are already stored in the cache can be directly served by the result cach...
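The result-caching idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the class name `ResultCache`, the `backend` callable standing in for the full search engine, and the least-recently-used eviction policy are all hypothetical choices.

```python
from collections import OrderedDict


class ResultCache:
    """Tiny LRU cache mapping a query string to its precomputed results.

    Hypothetical sketch: the eviction policy (LRU) and the interface are
    assumptions, not the design studied in the article.
    """

    def __init__(self, capacity, backend):
        self.capacity = capacity
        self.backend = backend          # assumed callable: query -> results
        self.entries = OrderedDict()    # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def search(self, query):
        if query in self.entries:
            # Cache hit: serve stored results and mark the query as recent.
            self.entries.move_to_end(query)
            self.hits += 1
            return self.entries[query]
        # Cache miss: compute the results with the (slow) backend and store them.
        self.misses += 1
        results = self.backend(query)
        self.entries[query] = results
        if len(self.entries) > self.capacity:
            # Evict the least recently used query.
            self.entries.popitem(last=False)
        return results
```

A repeated query is then answered from `entries` without calling `backend` again, which is the saving the abstract refers to.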
Article
Full-text available
We present a formal problem definition and an algorithm to solve the Fair Top-k Ranking problem. The problem consists of creating a ranking of k elements out of a pool of n >> k candidates. The objective is to maximize utility, and maximization is subject to a ranked group fairness constraint. Our definition of ranked group fairness uses the standa...
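The flavor of this problem can be shown with a simplified sketch. The abstract is truncated before it states the actual ranked group fairness definition, so the constraint used below is an assumption made purely for illustration: every prefix of length i must contain at least floor(min_share * i) protected candidates. The function name `fair_top_k`, the `(score, is_protected)` tuple layout, and the greedy strategy are likewise hypothetical and should not be read as the authors' algorithm.

```python
import math


def fair_top_k(candidates, k, min_share):
    """Greedy ranking of up to k candidates under a simplified prefix rule.

    candidates: list of (score, is_protected) tuples.
    Assumed rule (not the paper's definition): each prefix of length i
    holds at least floor(min_share * i) protected candidates.
    """
    protected = sorted((c for c in candidates if c[1]), key=lambda c: -c[0])
    others = sorted((c for c in candidates if not c[1]), key=lambda c: -c[0])
    ranking = []
    n_protected = 0
    p = o = 0  # next unused index in each sorted list
    for i in range(1, k + 1):
        need = math.floor(min_share * i)
        has_protected = p < len(protected)
        has_other = o < len(others)
        if has_protected and n_protected < need:
            take_protected = True       # constraint forces a protected pick
        elif has_protected and (not has_other or protected[p][0] >= others[o][0]):
            take_protected = True       # protected candidate is also the best
        else:
            take_protected = False
        if take_protected:
            ranking.append(protected[p])
            p += 1
            n_protected += 1
        elif has_other:
            ranking.append(others[o])
            o += 1
        else:
            break                       # pool exhausted before reaching k
    return ranking
```

With `min_share=0` the function reduces to a plain sort by score; raising `min_share` moves protected candidates up only at the positions where the assumed prefix rule would otherwise be violated.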
Article
Full-text available
In this work we introduce the analysis of DysList, a language resource for Spanish composed of a list of unique spelling errors extracted from a collection of texts written by people with dyslexia. Each of the errors was annotated with a set of characteristics as well as with visual and phonetic features. To the best of our knowledge, this is the l...
Article
Full-text available
Background: Analyzing the disease-related web searches of Internet users provides insight into the interests of the general population as well as the healthcare industry, which can be used to shape health care policies. Methods: We analyzed the searches related to neurological diseases and drugs used in neurology using the most popular search en...
Chapter
In this dialogue, the computer scientist Ricardo Baeza-Yates explains why search technologies make it possible to predict behavioral patterns, which are more deterministic for some people than for others, and why people's behavior has a long tail. He explains later why we feel more comfortable with determinism, which is one reason for the adop...
Conference Paper
Contextual data plays an important role in modeling search engine users' behaviors on both query auto-completion (QAC) log and normal query (click) log. User's recent search history on each log has been widely studied individually as the context to benefit the modeling of users' behaviors on that log. However, there is no existing work that explore...
Conference Paper
Full-text available
Current tools for screening dyslexia use linguistic elements, since most dyslexia manifestations are related to difficulties in reading and writing. These tools can only be used with children who have already acquired some reading skills, and sometimes this detection comes too late to apply proper remediation. In this paper, we propose a method a...