Figure - uploaded by Jakab Buda
Style based features

Source publication
Article
In this notebook, we summarize our process of preparing software for the PAN 2019 Bots and Gender Profiling task. We propose a machine learning approach to determine whether an unknown Twitter user is a bot or a human and, if the latter, their gender. We use logistic regression to identify whether the author is a bot or a human and we use n...
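
To make the bot-versus-human step concrete, here is a minimal sketch of logistic regression over style-based features aggregated per user. It is not the authors' implementation. The truncated abstract does not list the features, so the five features below, the toy tweets, and the helper names (`style_features`, `train_logreg`, `predict_bot`) are all illustrative assumptions. The model is fitted with plain batch gradient descent so that the example needs only the standard library.

```python
import math
import re

def style_features(tweets):
    """Aggregate one user's tweets into style-based features (assumed set)."""
    n = len(tweets)
    return [
        sum(len(t) for t in tweets) / n / 280.0,               # mean length, scaled by 280
        sum("http" in t for t in tweets) / n,                  # share of tweets with a link
        sum(bool(re.search(r"#\w+", t)) for t in tweets) / n,  # share with a hashtag
        sum(bool(re.search(r"@\w+", t)) for t in tweets) / n,  # share with a mention
        len(set(tweets)) / n,                                  # share of distinct tweets
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_bot(x, w, b):
    """Return 1 (bot) if the predicted probability is at least 0.5, else 0 (human)."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)

# Hypothetical toy users: label 1 = bot, 0 = human.
bot_a = ["Buy now http://x.co/1 #deal", "Buy now http://x.co/1 #deal",
         "Buy now http://x.co/2 #deal"]
bot_b = ["Weather update http://w.io/a", "Weather update http://w.io/b",
         "Weather update http://w.io/a", "Weather update http://w.io/a"]
human_a = ["@anna see you tonight!", "cannot believe that game",
           "coffee first, then work"]
human_b = ["great run this morning #running", "@tom thanks for the tip",
           "so tired today", "reading http://blog.example/post"]

X = [style_features(u) for u in (bot_a, bot_b, human_a, human_b)]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
predictions = [predict_bot(x, w, b) for x in X]
print(predictions)
```

In practice a library implementation with regularization and held-out evaluation would be the more natural choice. The hand-written loop here only shows the moving parts: per-user aggregation, a linear score, and a sigmoid threshold.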

Similar publications

Article
This work explores the consolidation of transmedia storytelling for a new kind of television fiction programming by breaking through the Fifth Wall. Social networks enable a new form of interaction with the audience. The research corpus covers the 65 episodes of the first 5 seasons of the American series House of Cards,...
Article
Identifying fake news in the media is an important problem, especially given how widely rumors spread on popular social networks such as Twitter. Various techniques have been proposed for automatic rumor detection. In this work, we study the application of graph neural networks for rumor classification at a lower level,...
Chapter
The Internet and social networks have changed traditional communication patterns since they emerged more than 25 years ago. With names such as Facebook, Twitter, Instagram, and YouTube, communication has gained an immediacy, interactivity, and global reach never seen before. In parallel with this communication revolution, the territor...
Conference Paper
Online misogyny has become a growing concern for Arab women, who experience gender-based online abuse on a daily basis. Automatic misogyny detection systems can help curb anti-women toxic content in Arabic. Developing such systems is hindered by the lack of Arabic misogyny benchmark datasets. In this paper, we introduce an Arab...
Chapter
This study analyzes the Twitter activity of Pablo Iglesias, the Podemos candidate for President of the Government, during the two campaigns for the general elections of December 20, 2015 and June 26, 2016. The aim is to study how the politician's strategy on the social network evolved to improve the res...

Citations

... The study claims that bot detection using this hybrid approach gets similar results in English and Spanish tweets. A similar approach was proposed by Bolonyai et al. (2019), who combined stylometric and user-based features to solve the task in PAN 2019. The paper shows that the combination of features allows better results when detecting bots than when using only lexical features. ...
... The study reports good results detecting bots both in Spanish and English. Several stylometric and user-based features were used to create an aggregated representation of each tweets' track addressing the bot detection task in the same competition (Bolonyai et al. 2019). The proposal shows that a classifier based on LR achieves better results in detecting bots in English than in Spanish. ...
Article
Misleading information spread on social networks is often supported by activists who promote this type of information and bots that amplify their visibility. The need for useful and timely mechanisms of credibility assessment in social media has become increasingly indispensable. Efforts to tackle this problem in Spanish are growing. The last years have witnessed many efforts to develop methods to detect fake news, rumors, stances, and bots on the Spanish social web. This work leads to a systematic review of the literature that relates the efforts to develop this area in the Spanish language. The work identifies pending tasks for this community and challenges that require coordination among the leading investigators on the subject.
Article
Author profiling (AP) is a highly relevant natural language processing (NLP) problem; it deals with predicting features of authors such as gender, age and personality traits. It is done by analyzing texts written by the authors themselves; take for instance documents such as books, articles, and more recently posts in social media platforms. In the present study, we focus on the latter, which is a scenario with a number of applications in marketing, security, health and others. Surprisingly, given the achievements of deep learning (DL) strategies on other NLP tasks, for AP DL architectures regularly underperform, left behind by classical machine learning (ML) approaches. In this study we show how a deep learning architecture based on transformers offers competitive results by exploiting a joint-intermediate fusion strategy called the Wide & Deep Transformer (WD-T). Our methodology implements a fusion of contextualized word vector representations and handcrafted features, by using a self-attention mechanism and a novel encoding technique that incorporates stylistic, topic, and personal information from authors. This allows for the creation of more accurate, fine-grained predictions. Our approach attained competitive performance against top-quartile results from the 2017–2019 editions at the Plagiarism analysis, Authorship identification, and Near-duplicate detection forum (PAN) in English and Spanish languages for gender and language variety predictions, and the Kaggle Myers–Briggs-type indicator (MBTI) dataset for personality forecasting. Our proposal consistently surpasses all other deep learning methods in PAN collections by as much as 2.4%, and up to 3.4% in the MBTI dataset. These results suggest that this DL strategy effectively addresses and improves upon the limitations of previous techniques and paves the way for new avenues of inquiry.