Markus Strohmaier's research while affiliated with Universität Mannheim and other places

Publications (275)

Preprint
Improving the position of minorities in networks via interventions is a challenge of high theoretical and societal importance. In this work, we examine how different network growth interventions impact the position of minority nodes in degree rankings over time. We distinguish between two kinds of interventions: (i) group size interventions, such a...
Article
Since its emergence roughly a decade ago, micro-task crowdsourcing has been attracting a heterogeneous set of workers from all over the globe. This paper sets out to explore the characteristics of the international crowd workforce to date and offers a cross-national comparison of crowdworker populations from ten hand-selected countries. We provide...
Preprint
Full-text available
In this chapter, we provide an overview of recent advances in data-driven and theory-informed complex models of social networks and their potential in understanding societal inequalities and marginalization. We focus on inequalities arising from networks and network-based algorithms and how they affect minorities. In particular, we examine how homo...
Preprint
Full-text available
In this paper, we present a unique collection of four data sets to study social behaviour. The data were collected at four international scientific conferences, during which we measured face-to-face contacts along with additional information about individuals. Building on innovative methods developed in the last decade to study human social behavio...
Article
Full-text available
Uncovering how inequality emerges from human interaction is imperative for just societies. Here we show that the way social groups interact in face-to-face situations can enable the emergence of disparities in the visibility of social groups. These disparities translate into members of specific social groups having fewer social ties than the averag...
Preprint
Social media is subject to constant growth and evolution, yet little is known about its early phases of adoption. To shed light on this aspect, this paper empirically characterizes the initial and country-wide adoption of a new type of social media in Saudi Arabia that happened in 2017. Unlike established social media, the studied network Jodel i...
Article
Full-text available
Monitoring hygiene and motivation factors from Herzberg’s Two-Factor Theory is a popular way of understanding the influential aspects for employee satisfaction and motivation. The increased availability of employee feedback comprised in online employer reviews yields a promising data source to learn more about these influential factors and the theo...
Article
Full-text available
Though algorithms promise many benefits including efficiency, objectivity and accuracy, they may also introduce or amplify biases. Here we study two well-known algorithms, namely PageRank and Who-to-Follow (WTF), and show to what extent their ranks produce inequality and inequity when applied to directed social networks. To this end, we propose a d...
Article
Full-text available
This work quantifies the effects of signaling gender through gender-specific user names on the success of reviews written on the popular amazon.com shopping platform. Highly rated reviews play an important role in e-commerce since they are prominently displayed next to products. Differences in reviews, perceived—consciously or unconsciously—with r...
Conference Paper
As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an 'infodemic' -- a flood of disinformation and spread of conspiracy theories leading to health threats and the division of society. To combat this infodemic, there is an urgent need for benchmark datasets that can help researchers develop and evaluate models geare...
Article
Full-text available
Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions. When the COVID-19 pandemic broke out and mobility restrictions ensued across the globe, it was unclear whether contributions to Wikipedia would decrease in the face of the pandemic, or whether volunteers would withstand the added stress and i...
Article
Full-text available
Network analysis provides powerful tools to learn about a variety of social systems. However, most analyses implicitly assume that the considered relational data is error-free and reliable, and that it accurately reflects the system to be analysed. Especially if the network consists of multiple groups (e.g., genders, races), this assumption conflicts with...
Preprint
Full-text available
Though algorithms promise many benefits including efficiency, objectivity and accuracy, they may also introduce or amplify biases. Here we study two well-known algorithms, namely PageRank and Who-to-Follow (WTF), and show under which circumstances their ranks produce inequality and inequity when applied to directed social networks. To this end, we...
Article
Full-text available
Many important decisions in societies such as school admissions, hiring or elections are based on the selection of top-ranking individuals from a larger pool of candidates. This process is often subject to biases, which typically manifest as an under-representation of certain groups among the selected or accepted individuals. The most common approa...
Preprint
Full-text available
Recent work has shown that graph neural networks (GNNs) are vulnerable to adversarial attacks on graph data. Common attack approaches are typically informed, i.e. they have access to information about node attributes such as labels and feature vectors. In this work, we study adversarial attacks that are uninformed, where an attacker only has access...
Preprint
This paper introduces Redescription Model Mining, a novel approach to identify interpretable patterns across two datasets that share only a subset of attributes and have no common instances. In particular, Redescription Model Mining aims to find pairs of describable data subsets -- one for each dataset -- that induce similar exceptional models with...
Preprint
Uncovering how inequality emerges from human interaction is imperative for just societies. Here we show that the way social groups interact in face-to-face situations can enable the emergence of degree inequality. We present a mechanism that integrates group mixing dynamics with individual preferences, which reproduces group degree inequality found...
Article
It has been the historic responsibility of the social sciences to investigate human societies. Fulfilling this responsibility requires social theories, measurement models and social data. Most existing theories and measurement models in the social sciences were not developed with the deep societal reach of algorithms in mind. The emergence of ‘algo...
Conference Paper
Full-text available
We examine how the behavior of software developers changes in response to removing gamification elements from GitHub, an online platform for collaborative programming and software development. We find that the unannounced removal of daily activity streak counters from the user interface (from user profile pages) was followed by significant changes...
Preprint
Full-text available
Quantification represents the problem of predicting class distributions in a given target set. It also represents a growing research field in supervised machine learning, for which a large variety of different algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorith...
Preprint
Full-text available
Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions. When the COVID-19 pandemic broke out and mobility restrictions ensued across the globe, it was unclear whether Wikipedia volunteers would become less active in the face of the pandemic, or whether they would rise to meet the increased demand...
Preprint
Measures of algorithmic fairness often do not account for human perceptions of fairness that can substantially vary between different sociodemographics and stakeholders. The FairCeptron framework is an approach for studying perceptions of fairness in algorithmic decision making such as in ranking or classification. It supports (i) studying human pe...
Chapter
Automatically detecting semantic shifts (i.e., meaning changes) of single words has recently received strong research attention, e.g., to quantify the impact of real-world events on online communities. These computational approaches have introduced various measures, which are intended to capture the somewhat elusive and undifferentiated concept of...
Preprint
Network analysis provides powerful tools to learn about a variety of social systems. However, most analyses implicitly assume that the considered data is error-free and reliable. Especially if the network consists of multiple groups, this assumption conflicts with the range of systematic reporting biases, measurement errors and other inaccuracies t...
Chapter
Bias in Word Embeddings has been a subject of recent interest, along with efforts for its reduction. Current approaches show promising progress towards debiasing single bias dimensions such as gender or race. In this paper, we present a joint multiclass debiasing approach that is capable of debiasing multiple bias dimensions simultaneously. In that...
Preprint
Full-text available
Wikipedia represents the largest and most popular source of encyclopedic knowledge in the world today, aiming to provide equal access to information worldwide. From a global online survey of 65,031 readers of Wikipedia and their corresponding reading logs, we present novel evidence of gender differences in Wikipedia readership and how they manifest...
Preprint
Many important decisions in societies such as school admissions, hiring, or elections are based on the selection of top-ranking individuals from a larger pool of candidates. This process is often subject to biases, which typically manifest as an under-representation of certain groups among the selected or accepted individuals. The most common appro...
Preprint
Full-text available
*To appear ICSE '21* We examine how the behavior of software developers changes in response to removing gamification elements from GitHub, an online platform for collaborative programming and software development. We find that the unannounced removal of daily activity streak counters from the user interface (from user profile pages) was followed by...
Preprint
We systematically evaluate the (in-)stability of state-of-the-art node embedding algorithms due to randomness, i.e., the random variation of their outcomes given identical algorithms and graphs. We apply five node embedding algorithms---HOPE, LINE, node2vec, SDNE, and GraphSAGE---to synthetic and empirical graphs and assess their stability under r...
Preprint
Full-text available
We train word-emoji embeddings on large scale messaging data obtained from the Jodel online social network. Our data set contains more than 40 million sentences, of which 11 million sentences are annotated with a subset of the Unicode 13.0 standard Emoji list. We explore semantic emoji associations contained in this embedding by analyzing associati...
Preprint
Full-text available
We study how the coronavirus disease 2019 (COVID-19) pandemic, alongside the severe mobility restrictions that ensued, has impacted information access on Wikipedia, the world's largest online encyclopedia. A longitudinal analysis that combines pageview statistics for 12 Wikipedia language editions with mobility reports published by Apple and Google...
Preprint
Bias in Word Embeddings has been a subject of recent interest, along with efforts for its reduction. Current approaches show promising progress towards debiasing single bias dimensions such as gender or race. In this paper, we present a joint multiclass debiasing approach that is capable of debiasing multiple bias dimensions simultaneously. In that...
Preprint
Full-text available
We introduce POLAR - a framework that adds interpretability to pre-trained word embeddings via the adoption of semantic differentials. Semantic differentials are a psychometric construct for measuring the semantics of a word by analysing its position on a scale between two polar opposites (e.g., cold -- hot, soft -- hard). The core idea of our appr...
Preprint
Full-text available
This work quantifies the effects of signaling and performing gender on the success of reviews written on the popular amazon.com shopping platform. Highly rated reviews play an important role in e-commerce since they are prominently displayed below products. Differences in how gender-signaling and gender-performing review authors are received can le...
Preprint
Data ownership and data protection are increasingly important topics with ethical and legal implications, e.g., with the right to erasure established in the European General Data Protection Regulation (GDPR). In this light, we investigate network embeddings, i.e., the representation of network nodes as low-dimensional vectors. We consider a typical...
Article
As video game press ("experts") and casual gamers ("amateurs") have different motivations when writing video game reviews, discrepancies in their reviews may arise. To study such potential discrepancies, we conduct a large-scale investigation of more than 1 million reviews on the Metacritic review platform. In particular, we assess the existence an...
Article
Full-text available
People’s perceptions about the size of minority groups in social networks can be biased, often showing systematic over- or underestimation. These social perception biases are often attributed to biased cognitive or motivational processes. Here we show that both over- and underestimation of the size of a minority group can emerge solely from structu...
Article
Full-text available
Crowd employment is a new form of short-term and flexible employment that has emerged during the past decade. To understand this new form of employment, it is crucial to illuminate the underlying motivations of the workforce involved in it. This article introduces the Multidimensional Crowdworker Motivation Scale (MCMS), a scale for measuring the m...
Conference Paper
With the increase of biased information available online, the importance of analysis and detection of such content has also significantly risen. In this paper, we aim to quantify different kinds of social biases using word embeddings. Towards this goal we train such embeddings on two politically biased MediaWiki instances, namely RationalWiki and C...
Chapter
Numerous collaboration websites struggle to achieve self-sustainability—a level of user activity preventing a transition to a non-active state. We know only a little about the factors which separate sustainable and successful collaboration websites from those that are inactive or have a declining activity. We argue that modeling and understanding v...
Conference Paper
In this paper, we quantify the impact of self- and cross-excitation on the temporal development of user activity in Stack Exchange Question & Answer (Q&A) communities. We study differences in user excitation between growing and declining Stack Exchange communities, and between those dedicated to STEM and humanities topics by leveraging Hawkes proce...
Preprint
Full-text available
This paper introduces HopRank, an algorithm for modeling human navigation on semantic networks. HopRank leverages the assumption that users know or can see the whole structure of the network. Therefore, besides following links, they also follow nodes at certain distances (i.e., k-hop neighborhoods), and not at random as suggested by PageRank, which...
Article
Millions of users on the Internet discuss a variety of topics on Question-and-Answer (Q&A) instances. However, not all instances and topics receive the same amount of attention, as some thrive and achieve self-sustaining levels of activity, while others fail to attract users and either never grow beyond being a small niche community or become inact...
Preprint
Full-text available
We present the results of two studies on how individuals interact with each other during an international, interdisciplinary scientific conference. We first show that contact activity is highly variable across the two conferences and between different socio-demographic groups. However, we found one consistent phenomenon: Professors connect and inter...
Preprint
Micro-task crowdsourcing is an international phenomenon that has emerged during the past decade. This paper sets out to explore the characteristics of the international crowd workforce and provides a cross-national comparison of the crowd workforce in ten countries. We provide an analysis and comparison of demographic characteristics and shed light...
Article
Full-text available
Homophily can put minority groups at a disadvantage by restricting their ability to establish links with a majority group or to access novel information. Here, we show how this phenomenon can influence the ranking of minorities in examples of real-world networks with various levels of heterophily and homophily ranging from sexual contacts, dating c...
Conference Paper
Full-text available
The iLCM project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a "Software as a Service" architecture (SaaS). The research environment addresses requirements for the quantitative evaluation of large amounts of qualitative data with text mining methods as well as requirements fo...
Preprint
Full-text available
The iLCM project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a "Software as a Service" architecture (SaaS). The research environment addresses requirements for the quantitative evaluation of large amounts of qualitative data with text mining methods as well as requirements fo...
Preprint
As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by f...
Chapter
Allowing users to organize content by tagging resources in web-based systems has led to the emergence of the so-called Social Web. Tags turned out to be helpful not only for giving recommendations and improving search in social tagging systems but also for enhancing information access by navigating. In this chapter, we will cover much of the pioneer...
Conference Paper
In this paper we present a large-scale quantitative comparison between expert- and crowdsourced writing of history by analysing articles from the English Wikipedia and Britannica. In order to quantify attention to particular periods, we extract mentioned year numbers and utilise them to study historical timelines of nations stretched over the last...
Preprint
Previous research has acknowledged the use of social media in political communication by right-wing populist parties and politicians. Less is known, however, about its pivotal role for right-wing social movements which rely on personalized messages to mobilize supporters and challenge the mainstream party system. This paper analyzes online politica...