About
90
Publications
26,696
Reads
3,744
Citations
Publications (90)
Open science practices have been widely discussed and have been implemented with varying success in different disciplines. We argue that computational-x disciplines, such as computational social science, are also susceptible to the symptoms of the crises, but in terms of reproducibility. We expand the binary definition of reproducibility into a tier...
Surveys are a cornerstone of empirical social science research, providing invaluable insights into the opinions, beliefs, behaviours, and characteristics of people. However, issues such as refusal to participate, skipping questions, sampling bias, and attrition significantly impact the quality and reliability of survey data. Recently, researchers h...
Pairwise comparisons based on human judgements are an effective method for determining rankings of items or individuals. However, as human biases perpetuate from pairwise comparisons to recovered rankings, they affect algorithmic decision making. In this paper, we introduce the problem of fairness-aware ranking recovery from pairwise comparisons. W...
Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet ML tools can only be as capable as the quality of the data they are trained on allows them. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities a...
There is an increase in the proliferation of online hate commensurate with the rise in the usage of social media. In response, there is also a significant advancement in the creation of automated tools aimed at identifying harmful text content using approaches grounded in Natural Language Processing and Deep Learning. Although it is known that trai...
Human feedback is often used, either directly or indirectly, as input to algorithmic decision making. However, humans are biased: if the algorithm that takes as input the human feedback does not control for potential biases, this might result in biased algorithmic decision making, which can have a tangible impact on people’s lives. In this paper, w...
The characterization and detection of bots with their presumed ability to manipulate society on social media platforms have been subject to many research endeavors over the last decade. In the absence of ground truth data (i.e., accounts that are labeled as bots by experts or self-declare their automated nature), researchers interested in the chara...
Inequality prevails in science. Individual inequality means that most perish quickly and only a few are successful, while gender inequality implies that there are differences in achievements for women and men. Using large-scale bibliographic data and following a computational approach, we study the evolution of individual and gender inequality for...
We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphorically speaking) from the vast...
This review paper provides a conceptualization of AI-assisted content moderation with various degrees of autonomy and summarizes experimental evidence for how different levels of automation in content moderation and related losses of autonomy affect individuals and groups. Our results show that current research predominantly focuses on individual l...
The hipster paradox in Electronic Dance Music is the phenomenon that commercial success is collectively considered illegitimate while serious and aspiring professional musicians strive for it. We study this behavioral dilemma using digital traces of performing live and releasing music as they are stored in the Resident Advisor, Jun...
The hipster paradox in Electronic Dance Music is the phenomenon that commercial success is collectively considered illegitimate while serious and aspiring professional musicians strive for it. We study this behavioral dilemma using digital traces of performing live and releasing music as they are stored in the Resident Advisor, Juno Download, and D...
Network-based people recommendation algorithms are widely employed on the Web to suggest new connections in social media or professional platforms. While such recommendations bring people together, the feedback loop between the algorithms and the changes in network structure may exacerbate social biases. These biases include rich-get-richer effects...
Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an indicator of model robustness. The improvement is credited with promoting core features of the construct over spurious artifacts that happen to correlate with it. Yet, over-relying on core features may lead to unintended model bias. Especially, construct-driven...
Though algorithms promise many benefits including efficiency, objectivity and accuracy, they may also introduce or amplify biases. Here we study two well-known algorithms, namely PageRank and Who-to-Follow (WTF), and show to what extent their ranks produce inequality and inequity when applied to directed social networks. To this end, we propose a d...
Social networks are very important carriers of information. For instance, the political leaning of our friends can serve as a proxy to identify our own political preferences. This explanatory power is leveraged in many scenarios ranging from business decision-making to scientific research to infer missing attributes using machine learning. However,...
Though algorithms promise many benefits including efficiency, objectivity and accuracy, they may also introduce or amplify biases. Here we study two well-known algorithms, namely PageRank and Who-to-Follow (WTF), and show under which circumstances their ranks produce inequality and inequity when applied to directed social networks. To this end, we...
As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust. One way of improving model robustness is to generate counterfactually augmented data (CAD) for training models that can better learn to distinguish between core features and data artif...
People’s activities and opinions recorded as digital traces online, especially on social media and other web-based platforms, offer increasingly informative pictures of the public. They promise to allow inferences about populations beyond the users of the platforms on which the traces are recorded, representing real potential for the social science...
Online community managers work towards building and managing communities around a given brand or topic. A risk imposed on such managers is that their community may die out and its utility diminish to users. Understanding what drives attention to content and the dynamics of discussions in a given community informs the community manager and/or host w...
It has been the historic responsibility of the social sciences to investigate human societies. Fulfilling this responsibility requires social theories, measurement models and social data. Most existing theories and measurement models in the social sciences were not developed with the deep societal reach of algorithms in mind. The emergence of ‘algo...
Research has focused on automated methods to effectively detect sexism online. Although overt sexism seems easy to spot, its subtle forms and manifold expressions are not. In this paper, we outline the different dimensions of sexism by grounding them in their implementation in psychological scales. From the scales, we derive a codebook for sexism i...
Measures of algorithmic fairness often do not account for human perceptions of fairness that can substantially vary between different sociodemographics and stakeholders. The FairCeptron framework is an approach for studying perceptions of fairness in algorithmic decision making such as in ranking or classification. It supports (i) studying human pe...
Data sharing, research ethics, and incentives must improve
To effectively tackle sexism online, research has focused on automated methods for detecting sexism. In this paper, we use items from psychological scales and adversarial sample generation to 1) provide a codebook for different types of sexism in theory-driven scales and in social media text; 2) test the performance of different sexism detection me...
Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for...
People’s perceptions about the size of minority groups in social networks can be biased, often showing systematic over- or underestimation. These social perception biases are often attributed to biased cognitive or motivational processes. Here we show that both over- and underestimation of the size of a minority group can emerge solely from structu...
The interactions and activities of hundreds of millions of people worldwide are recorded as digital traces every single day. When pulled together, these data offer increasingly comprehensive pictures of both individuals and groups interacting on different platforms, but they also allow inferences about broader target populations beyond those platfo...
This volume constitutes the proceedings of the 11th International Conference on Social Informatics, SocInfo 2019, held in Doha, Qatar, in November 2019. The 17 full and 5 short papers presented in these proceedings were carefully reviewed and selected from 86 submissions. The papers presented in this volume cover a broad range of topics, ranging fr...
Do only major scientific breakthroughs hit the news and social media, or does a 'catchy' title help to attract public attention? How strong is the connection between the importance of a scientific paper and the (social) media attention it receives? In this study we investigate these questions by analysing the relationship between the observed atten...
Homophily can put minority groups at a disadvantage by restricting their ability to establish links with a majority group or to access novel information. Here, we show how this phenomenon can influence the ranking of minorities in examples of real-world networks with various levels of heterophily and homophily ranging from sexual contacts, dating c...
Emergent patterns of collective attention towards scientists and their research may function as a proxy for scientific impact which traditionally is assessed via committees that award prizes to scientists. Therefore it is crucial to understand the relationships between scientific impact and online demand and supply for information about scientists...
Relational inference leverages relationships between entities and links in a network to infer information about the network from a small sample. This method is often used when global information about the network is not available or difficult to obtain. However, how reliable is inference from a small labelled sample? How should the network be sampl...
Individuals' perceptions about the prevalence of attributes in their social networks are commonly skewed by the limited information available to them. Filter bubbles -- being exposed to other like-minded people -- and majority illusion -- overestimation of minorities in social networks -- are two examples of how perception biases can manifest. In th...
Previous research has shown the existence of gender biases in the depiction of professions and occupations in search engine results. Such an unbalanced presentation might just as likely occur on Wikipedia, one of the most popular knowledge resources on the Web, since the encyclopedia has already been found to exhibit such tendencies in past studies...
Scientific collaborations shape novel ideas and new discoveries and help scientists to advance their scientific career through publishing high impact publications and grant proposals. Recent studies however show that gender inequality is still present in many scientific practices ranging from hiring to peer review processes and grant applications....
Online freelancing marketplaces have grown quickly in recent years. In theory, these sites offer workers the ability to earn money without the obligations and potential social biases associated with traditional employment frameworks. In this paper, we study whether two prominent online freelance marketplaces - TaskRabbit and Fiverr - are impacted b...
Sampling from large networks represents a fundamental challenge for social network research. In this paper, we explore the sensitivity of different sampling techniques (node sampling, edge sampling, random walk sampling, and snowball sampling) on social networks with attributes. We consider the special case of networks (i) where we have one attribu...
Homophily can put minority groups at a disadvantage by restricting their ability to establish links with people from a majority group. This can limit the overall visibility of minorities in the network. Building on a Barabási-Albert model variation with groups and homophily, we show how the visibility of minority groups in social networks is a...
Wikipedia articles about the same topic in different language editions are built around different sources of information. For example, one can find very different news articles linked as references in the English Wikipedia article titled "Annexation of Crimea by the Russian Federation" than in its German counterpart (determined via Wikipedia's lang...
This tutorial aims at outlining fundamental methods for studying typical social science research questions with organic data (i.e., data that has not been designed for a specific research purpose but can be found on the Web). Further, social theories, statistical methods and models that help to understand the processes that generated the data will...
Computational social scientists often harness the Web as a "societal observatory" where data about human social behavior is collected. This data enables novel investigations of psychological, anthropological and sociological research questions. However, in the absence of demographic information, such as gender, many relevant research questions cann...
Contributing to the writing of history has never been as easy as it is today thanks to Wikipedia, a community-created encyclopedia that aims to document the world's knowledge from a neutral point of view. Though everyone can participate, it is well known that the editor community has a narrow diversity, with a majority of white male editors. While t...
For many people, Wikipedia represents one of the primary sources of knowledge about foreign cultures. Yet, different Wikipedia language editions offer different descriptions of cultural practices. Unveiling diverging representations of cultures provides an important insight, since they may foster the formation of cross-cultural stereotypes, misunde...
Culinary preferences contribute significantly to our sense of self [2]. While gender, race, sexuality and ethnicity describe our "major identity", preferences in music, style and food define our "minor identity". However, we find that only certain parts of them can be explained by gender-specific differences in the food consumption behavior, whi...
Wikipedia is a community-created encyclopedia that contains information about notable people from different countries, epochs and disciplines and aims to document the world's knowledge from a neutral point of view. However, the narrow diversity of the Wikipedia editor community has the potential to introduce systemic biases such as gender biases in...
Food is a central element of human life, and food preferences are, among other things, manifestations of social, cultural and economic forces that influence the way we view, prepare and consume food. Historically, data for studies of food preferences stems from consumer panels which continuously capture food consumption and preference patterns from ind...
For many people, Wikipedia represents one of the primary sources of knowledge about foreign cultures. Yet, different Wikipedia language editions offer different descriptions of cultural phenomena. Unveiling diverging representations of cultures is an important problem since they may foster the formation of cross-cultural stereotypes, misunderstandi...
Assessing political conversations in social media requires a deeper understanding of the underlying practices and styles that drive these conversations. In this paper, we present a computational approach for assessing online conversational practices of political parties. Following a deductive approach, we devise a number of quantitative measures fr...
Since food is central to human life, there is great interest in exploring the temporal and spatial food and dietary patterns of humans. Predominantly, data for such investigations stem from consumer panels which continuously capture food consumption patterns from individuals and households. In this work we leverage data from a...
One potential disadvantage of social tagging systems is that due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, images, users, or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources in social...
Online social networks (OSN) like Twitter or Facebook are popular and powerful since they allow reaching millions of users online. They are also a popular target for socialbot attacks. Without a deep understanding of the impact of such attacks, the potential of online social networks as an instrument for facilitating discourse or democratic process...
In the past, online social networks (OSN) like Facebook and Twitter became powerful instruments for communication and networking. Unfortunately, they have also become a welcome target for socialbot attacks. Therefore, a deep understanding of the nature of such attacks is important to protect the ecosystem of OSNs. In this extended abstract we prop...
One potential disadvantage of social tagging systems is that due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, users or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources may become semanti...
Finding the "right people" is a central aspect of social media systems. Twitter has millions of users who have varied interests, professions and personalities. For those in fields such as advertising and marketing, it is important to identify certain characteristics of users to target. However, Twitter users do not generally provide sufficient info...
Interpreting the meaning of a document represents a fundamental challenge for current semantic analysis methods. One interesting aspect mostly neglected by existing methods is that authors of a document usually assume certain background knowledge of their intended audience. Based on this knowledge, authors usually decide what to communicate and how...
Content injection methods rely on understanding community dynamics (i.e. attention factors) in order to publish content that community users will engage with (e.g. product-related posts); however, such methods require re-training should the community's discussed topics change. In this paper we present an examination of the semantic evolution of comm...
For community managers and hosts it is not only important to identify the current key topics of a community but also to assess the specificity level of the community for a) creating sub-communities and b) anticipating community behaviour and topical evolution. In this paper we present an approach that empirically characterises the topical specif...
This paper sets out to explore whether data about the usage of hashtags on Twitter contains information about their semantics. Towards that end, we perform initial statistical hypothesis tests to quantify the association between usage patterns and semantics of hashtags. To assess the utility of pragmatic features, which describe how a hashtag is u...
Anticipating repliers in online conversations is a fundamental challenge for computer mediated communication systems which aim to make textual, audio and/or video communication as natural as face to face communication. The massive amounts of data that social media generates have facilitated the study of online conversations on a scale unimaginable a...
One of the key challenges for users of social media is judging the topical expertise of other users in order to select trustworthy information sources about specific topics and to judge the credibility of content produced by others. In this paper, we explore the usefulness of different types of user-related data for making sense about the topical expertis...
Online community managers work towards building and managing communities around a given brand or topic. A risk imposed on such managers is that their community may die out and its utility diminish to users. Understanding what drives attention to content and the dynamics of discussions in a given community informs the community manager and/or host wi...
Social bots are automatic or semi-automatic computer programs that mimic humans and/or human behavior in online social networks. Social bots can attack users (targets) in online social networks to pursue a variety of latent goals, such as to spread information or to influence targets. Without a deep understanding of the nature of such attacks or...
Online community managers work towards building and managing communities around a given brand or topic. A risk imposed on such managers is that their community may die out and its utility diminish to users. Understanding what drives attention to content and the dynamics of discussions in a given community informs the community manager and/or host w...
Judging the topical expertise of micro-bloggers is one of the key challenges for information seekers when deciding which information sources to follow. However, it is unclear how useful different types of information are for people to make expertise judgments and to what extent their background knowledge influences their judgments. This study explored d...
Social media has become an integral part of today's web and allows communities to share content and socialize. Understanding the factors that influence how communities evolve over time - for example how their social network and their content co-evolve - is an issue of both theoretical and practical relevance. This paper sets out to study the tempor...
Social media has become an integral part of today's web and allows users to share content and socialize. Understanding the factors that influence how users evolve over time - for example how their social network and their contents co-evolve - is an issue of both theoretical and practical relevance. This paper sets out to study the temporal co-evolu...
This paper presents an adaptable system for detecting trends based on the micro-blogging service Twitter, and sets out to explore to what extent such a tool can support researchers. Twitter has high uptake in the scientific community, but there is a need for a means of extracting the most important topics from a Twitter stream. There are too many t...