Manoel Horta Ribeiro

Swiss Federal Institute of Technology in Lausanne | EPFL · School of Computer and Communication Sciences

PhD Student

Publications: 73
Reads: 60,043
Citations: 2,491
Publications (73)
Conference Paper
Full-text available
An important challenge in the process of tracking and detecting the dissemination of misinformation is to understand the political gap between people who engage with so-called "fake news". A possible factor responsible for this gap is opinion polarization, which may prompt the general public to classify content that they disagree or want to di...
Conference Paper
Full-text available
Hateful speech in Online Social Networks (OSNs) is a key challenge for companies and governments, as it impacts users and advertisers, and as several countries have strict legislation against the practice. This has motivated work on detecting and characterizing the phenomenon in tweets, social media posts and comments. However, these approaches fac...
Conference Paper
Full-text available
Most current approaches to characterize and detect hate speech focus on content posted in Online Social Networks. They face shortcomings to collect and annotate hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often aided with constraints that oversimplify the problem, suc...
Preprint
Full-text available
Non-profits and the media claim there is a radicalization pipeline on YouTube. Its content creators would sponsor fringe ideas, and its recommender system would steer users towards edgier content. Yet, the supporting evidence for this claim is mostly anecdotal, and there are no proper measurements of the influence of YouTube's recommender system. I...
Preprint
Full-text available
In this paper, we present a large-scale characterization of the Manosphere, a conglomerate of Web-based misogynist movements roughly focused on "men's issues," which has seen significant growth over the past years. We do so by gathering and analyzing 28.8M posts from 6 forums and 51 subreddits. Overall, we paint a comprehensive picture of the evolu...
Preprint
Full-text available
AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university as...
Preprint
Full-text available
Can large language models (LLMs) create tailor-made, convincing arguments to promote false or misleading narratives online? Early work has found that LLMs can generate content perceived on par with, or even more persuasive than, human-written messages. However, there is still limited evidence regarding LLMs' persuasive capabilities in direct conver...
Article
Full-text available
Research using YouTube data often explores social and semantic dimensions of channels and videos. Typically, analyses rely on laborious manual annotation of content and content creators, often found by low-recall methods such as keyword search. Here, we explore an alternative approach, Tube2Vec, using latent representations (embeddings) obtained vi...
Article
Full-text available
Fringe communities promoting conspiracy theories and extremist ideologies have thrived on mainstream platforms, raising questions about the mechanisms driving their growth. Here, we hypothesize and study a possible mechanism: new members may be recruited through fringe-interactions: the exchange of comments between members and non-members of fringe...
Article
Full-text available
In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals—what a user would have viewed in the absence of...
Preprint
Full-text available
We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by a...
Preprint
Full-text available
In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals -- what a user would have viewed in the absence...
Preprint
Full-text available
The Conspiracy Theory Identification task is a new shared task proposed for the first time at Evalita 2023. The ACTI challenge, based exclusively on comments published on conspiratorial Telegram channels, is divided into two subtasks: (i) Conspiratorial Content Classification: identifying conspiratorial content and (ii) Conspiratorial Category Cla...
Preprint
Full-text available
Research using YouTube data often explores social and semantic dimensions of channels and videos. Typically, analyses rely on laborious manual annotation of content and content creators, often found by low-recall methods such as keyword search. Here, we explore an alternative approach, using latent representations (embeddings) obtained via machine...
Preprint
Full-text available
Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing,...
Article
Full-text available
Online platforms face pressure to keep their communities civil and respectful. Thus, banning problematic online communities from mainstream platforms is often met with enthusiastic public reactions. However, this policy can lead users to migrate to alternative fringe platforms with lower moderation standards and may reinforce antisocial behaviors....
Article
Full-text available
According to journalistic standards, direct quotes should be attributed to sources with objective quotatives such as "said" and "told," since nonobjective quotatives, e.g., "argued" and "insisted," would influence the readers' perception of the quote and the quoted person. In this paper, we analyze the adherence to this journalistic norm to...
Article
Full-text available
Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached...
Preprint
Full-text available
Large Language Models (LLMs) have democratized synthetic data generation, which in turn has the potential to simplify and broaden a wide gamut of NLP tasks. Here, we tackle a pervasive problem in synthetic data generation: its generative distribution often differs from the distribution of real-world data researchers care about (in other words, it i...
Article
Full-text available
Online platforms have banned (“deplatformed”) influencers, communities, and even entire websites to reduce content deemed harmful. Deplatformed users often migrate to alternative platforms, which raises concerns about the effectiveness of deplatforming. Here, we study the deplatforming of Parler, a fringe social media platform, between 2020 January...
Preprint
Full-text available
Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached...
Article
Full-text available
One of the key emerging roles of the YouTube platform is providing creators the ability to generate revenue from their content and interactions. Alongside tools provided directly by the platform, such as revenue-sharing from advertising, creators co-opt the platform to use a variety of off-platform monetization opportunities. In this work, we focus...
Preprint
Full-text available
According to journalistic standards, direct quotes should be attributed to sources with objective quotatives such as "said" and "told", as nonobjective quotatives, like "argued" and "insisted", would influence the readers' perception of the quote and the quoted person. In this paper, we analyze the adherence to this journalistic norm to study trends...
Preprint
Full-text available
Online social media platforms use automated moderation systems to remove or reduce the visibility of rule-breaking content. While previous work has documented the importance of manual content moderation, the effects of automated content moderation remain largely unknown, in part due to the technical and ethical challenges in assessing their impact...
Preprint
Full-text available
Online platforms face pressure to keep their communities civil and respectful. Thus, the bannings of problematic online communities from mainstream platforms like Reddit and Facebook are often met with enthusiastic public reactions. However, this policy can lead users to migrate to alternative fringe platforms with lower moderation standards and wh...
Conference Paper
Full-text available
In many online communities, community leaders (i.e., moderators and administrators) can proactively filter undesired content by requiring posts to be approved before publication. But although many communities adopt post approvals, there has been little research on its impact on community behavior. Through a longitudinal analysis of 233,402 Facebook...
Preprint
Full-text available
In many online communities, community leaders (i.e., moderators and administrators) can proactively filter undesired content by requiring posts to be approved before publication. But although many communities adopt post approvals, there has been little research on its impact on community behavior. Through a longitudinal analysis of 233,402 Facebook...
Preprint
Full-text available
One of the key emerging roles of the YouTube platform is providing creators the ability to generate revenue from their content and interactions. Alongside tools provided directly by the platform, such as revenue-sharing from advertising, creators co-opt the platform to use a variety of off-platform monetization opportunities. In this work, we focus...
Article
Full-text available
Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions. When the COVID-19 pandemic broke out and mobility restrictions ensued across the globe, it was unclear whether contributions to Wikipedia would decrease in the face of the pandemic, or whether volunteers would withstand the added stress and i...
Article
Full-text available
When toxic online communities on mainstream platforms face moderation measures, such as bans, they may migrate to other platforms with laxer policies or set up their own dedicated websites. Previous work suggests that within mainstream platforms, community-level moderation is effective in mitigating the harm caused by the moderated communities. It...
Article
Full-text available
Computerized electrocardiography (ECG) has been widely used and allows linkage to electronic medical records. The present study describes the development and clinical applications of an electronic cohort derived from a digital ECG database obtained by the Telehealth Network of Minas Gerais, Brazil, for the period 2010–2017, linked to the mortality...
Preprint
Full-text available
Recent research suggests that not all fact checking efforts are equal: when and what is fact checked plays a pivotal role in effectively correcting misconceptions. In this paper, we propose a framework to study fact checking efforts using Google Trends, a signal that captures search interest over topics on the world's largest search engine. Our fra...
Article
Full-text available
The electrocardiogram (ECG) is the most commonly used exam for the evaluation of cardiovascular diseases. Here we propose that the age predicted by artificial intelligence (AI) from the raw ECG (ECG-age) can be a measure of cardiovascular health. A deep neural network is trained to predict a patient’s age from the 12-lead ECG in the CODE study coho...
Conference Paper
Full-text available
We present a large-scale characterization of the Manosphere, a conglomerate of Web-based misogynist movements focused on men's issues, which has prospered online. Analyzing 28.8M posts from 6 forums and 51 subreddits, we paint a comprehensive picture of its evolution across the Web, showing the links between its different communities over the years...
Article
Full-text available
We present a large-scale characterization of the Manosphere, a conglomerate of Web-based misogynist movements focused on men's issues, which has prospered online. Analyzing 28.8M posts from 6 forums and 51 subreddits, we paint a comprehensive picture of its evolution across the Web, showing the links between its different communities over the years...
Article
Full-text available
YouTube plays a key role in entertaining and informing people around the globe. However, studying the platform is difficult due to the lack of randomly sampled data and of systematic ways to query the platform's colossal catalog. In this paper, we present YouNiverse, a large collection of channel and video metadata from English-language YouTube. Yo...
Article
Full-text available
We study how the COVID-19 pandemic, alongside the severe mobility restrictions that ensued, has impacted information access on Wikipedia, the world's largest online encyclopedia. A longitudinal analysis that combines pageview statistics for 12 Wikipedia language editions with mobility reports published by Apple and Google reveals massive shifts in...
Preprint
Full-text available
In 2020, the activist movement @sleeping_giants_pt (SGB) made a splash in Brazil. Similar to its international counterparts, the movement carried out "campaigns" against media outlets spreading misinformation. In those, SGB targeted companies whose ads were shown in these outlets, publicly asking them to remove the ads. In this work, we present a caref...
Preprint
Full-text available
Researchers have suggested that "the Manosphere," a conglomerate of men-centered online communities, may serve as a gateway to far right movements. In that context, this paper quantitatively studies the migratory patterns between a variety of groups within the Manosphere and the Alt-right, a loosely connected far right movement that has been partic...
Preprint
Full-text available
The electrocardiogram (ECG) is the most commonly used exam for the screening and evaluation of cardiovascular diseases. Here we propose that the age predicted by artificial intelligence (AI) from the raw ECG tracing (ECG-age) can be a measure of cardiovascular health and provide prognostic information. A deep convolutional neural network was traine...
Preprint
Full-text available
Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions. When the COVID-19 pandemic broke out and mobility restrictions ensued across the globe, it was unclear whether Wikipedia volunteers would become less active in the face of the pandemic, or whether they would rise to meet the increased demand...
Preprint
Full-text available
YouTube plays a key role in entertaining and informing people around the globe. However, studying the platform is difficult due to the lack of randomly sampled data and of systematic ways to query the platform's colossal catalog. In this paper, we present YouNiverse, a large collection of channel and video metadata from English-language YouTube. Yo...
Conference Paper
Full-text available
WhatsApp is the most popular messaging app in the world. It is not only used as a one-to-one messaging app but also as a platform for group discussion. Recently, WhatsApp has gained the spotlight for its role in disseminating (often low-quality) information. Our study focuses on YouTube videos shared by politically oriented public groups on WhatsApp...
Preprint
Full-text available
When toxic online communities on mainstream platforms face moderation measures, such as bans, they may migrate to other platforms with laxer policies or set up their own dedicated website. Previous work suggests that, within mainstream platforms, community-level moderation is effective in mitigating the harm caused by the moderated communities. It...
Preprint
Full-text available
Timely access to accurate information is crucial during the COVID-19 pandemic. Prompted by key stakeholders' cautioning against an "infodemic", we study information sharing on Twitter from January through May 2020. We observe an overall surge in the volume of general as well as COVID-19-related tweets around peak lockdown in March/April 2020. With...
Article
Full-text available
Aims: Atrial fibrillation (AF) is a public health problem and its prevalence is increasing worldwide. Electronic cohorts, with large electrocardiogram (ECG) databases linked to mortality data, can be useful in determining prognostic value of ECG abnormalities. Our aim is to evaluate the risk of mortality in patients with AF from Brazil. Methods:...
Conference Paper
Full-text available
The popularization of Online Social Networks has changed the dynamics of content creation and consumption. In this setting, society has witnessed an amplification in phenomena such as misinformation and hate speech. This dissertation studies these issues through the lens of users. In three case studies in social networks, we: (i) provide insight on...
Preprint
Full-text available
We study how the coronavirus disease 2019 (COVID-19) pandemic, alongside the severe mobility restrictions that ensued, has impacted information access on Wikipedia, the world's largest online encyclopedia. A longitudinal analysis that combines pageview statistics for 12 Wikipedia language editions with mobility reports published by Apple and Google...
Article
Full-text available
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Article
Full-text available
The role of automatic electrocardiogram (ECG) analysis in clinical practice is limited by the accuracy of existing models. Deep Neural Networks (DNNs) are models composed of stacked transformations that learn tasks by examples. This technology has recently achieved striking success in a variety of tasks and there are great expectations on how it mig...
Article
Digital electrocardiographs are now widely available and a large number of digital electrocardiograms (ECGs) have been recorded and stored. The present study describes the development and clinical applications of a large database of such digital ECGs, namely the CODE (Clinical Outcomes in Digital Electrocardiology) study. ECGs obtained by the Teleh...
Article
Background: Left bundle branch block is recognized as a marker of higher risk of death, but the prognostic value of the right bundle branch block in the general population is still controversial. Our aim is to evaluate the risk of overall and cardiovascular mortality in patients with right (RBBB) and left bundle branch block (LBBB) in a large elec...
Preprint
Full-text available
We present a Deep Neural Network (DNN) model for predicting electrocardiogram (ECG) abnormalities in short-duration 12-lead ECG recordings. The analysis of the digital ECG obtained in a clinical setting can provide a full evaluation of the cardiac electrical activity and has not been studied in an end-to-end machine learning scenario. Using the da...
Preprint
Full-text available
Information diffusion is usually modeled as a process in which immutable pieces of information propagate over a network. In reality, however, messages are not immutable, but may be morphed with every step, potentially entailing large cumulative distortions. This process may lead to misinformation even in the absence of malevolent actors, and unders...
Preprint
Full-text available
We present a model for predicting electrocardiogram (ECG) abnormalities in short-duration 12-lead ECG signals which outperformed medical doctors in the 4th year of their cardiology residency. Such exams can provide a full evaluation of heart activity and have not been studied in previous end-to-end machine learning papers. Using the database of a l...
Article
Introduction: Telehealth systems are an important tool to improve access to and quality of health assistance. Large electrocardiogram (ECG) databases, linked to mortality or hospitalization data, can be useful in determining the prognostic value of ECG markers. Atrial fibrillation (AF) is a public health problem with increasing prevalence as the populati...
Article
Full-text available
Professionals outside of the area of Computer Science have an increasing need to analyze large bodies of data. This analysis often demands a high level of security and has to be done in the cloud. However, current data analysis tools that demand little proficiency in systems programming struggle to deliver solutions which are scalable and safe. In th...
Article
Full-text available
Current approaches to characterize and detect hate speech focus on content posted in Online Social Networks (OSNs). They face shortcomings to get the full picture of hate speech due to its subjectivity and the noisiness of OSN text. This work partially addresses these issues by shifting the focus towards users. We obtain a sample of Twitter's retwe...
Preprint
Most current approaches to characterize and detect hate speech focus on content posted in Online Social Networks. They face shortcomings to collect and annotate hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often aided with constraints that oversimplify the pro...
Conference Paper
Full-text available
Socio-technical systems play an important role in public health screening programs to prevent cancer. Cervical cancer incidence has significantly decreased in countries that developed systems for organized screening engaging medical practitioners, laboratories and patients. The system automatically identifies individuals at risk of developing the d...
Preprint
Many of the state-of-the-art algorithms for gesture recognition are based on Conditional Random Fields (CRFs). Successful approaches, such as the Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose values are mapped to the values of the labels. In this paper we propose a novel methodology to set the latent values according...
Preprint
Socio-technical systems play an important role in public health screening programs to prevent cancer. Cervical cancer incidence has significantly decreased in countries that developed systems for organized screening engaging medical practitioners, laboratories and patients. The system automatically identifies individuals at risk of developing the d...
Conference Paper
Full-text available
Many of the state-of-the-art algorithms for gesture recognition are based on Conditional Random Fields (CRFs). Successful approaches, such as the Latent-Dynamic CRFs, extend the CRF by incorporating latent variables, whose values are mapped to the values of the labels. In this paper we propose a novel methodology to set the latent values according...
Conference Paper
Full-text available
In parallel to the exponential growth of the gaming industry, video game live-streaming is rising as a major form of online entertainment. Gathering a heterogeneous community, the popularity of this new medium led to the creation of web services just for streaming video games, such as Twitch.TV. In this paper, we propose a model to characterize how...