About
467
Publications
111,309
Reads
17,281
Citations
Introduction
I am a Project Leader at the Information Sciences Institute and hold a joint appointment as a Research Associate Professor in the USC Viterbi School of Engineering's Computer Science Department. My research focuses on applying network- and machine learning-based methods to problems in social computing.
Additional affiliations
June 1998 - present
Publications (467)
The recent proliferation of short form video social media sites such as TikTok has been effectively utilized for increased visibility, communication, and community connection amongst trans/nonbinary creators online. However, these same platforms have also been exploited by right-wing actors targeting trans/nonbinary people, enabling such anti-trans...
Social media platforms have become critical spaces for discussing mental health concerns, including eating disorders. While these platforms can provide valuable support networks, they may also amplify harmful content that glorifies disordered cognition and self-destructive behaviors. While social media platforms have implemented various content mod...
Affective polarization, the emotional divide between ideological groups marked by in-group love and out-group hate, has intensified in the United States, driving contentious issues like masking and lockdowns during the COVID-19 pandemic. Despite its societal impact, existing models of opinion change fail to account for emotional dynamics or offer...
Despite the rapid growth of cities in the past century, our quantitative, in-depth understanding of how cities grow remains limited due to a consistent lack of historical data. Thus, the scaling laws between a city's features and its population as they evolve over time, known as temporal city scaling, are under-explored, especially for time periods...
The rich and dynamic information environment of social media provides researchers, policymakers, and entrepreneurs with opportunities to learn about social phenomena in a timely manner. However, using these data to understand social behavior is difficult due to heterogeneity of topics and events discussed in the highly dynamic online information en...
Measuring the relative impact of CTs is important for prioritizing responses and allocating resources effectively, especially during crises. However, assessing the actual impact of CTs on the public poses unique challenges. It requires not only the collection of CT-specific knowledge but also diverse information from social, psychological, and cult...
Quantifying the effect of textual interventions in social systems, such as reducing anger in social media posts to see its impact on engagement, poses significant challenges. Direct interventions on real-world systems are often infeasible, necessitating reliance on observational data. Traditional causal inference methods, typically designed for bin...
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs). The knowledge acquired during pre-training is crucial for this few-shot capability, providing the model with task priors. However, recent studies have shown that ICL predominantly relies on retrieving task priors rather t...
Online social platforms use recommender algorithms to collate and sort the universe of messages users see, which is distorted in how popular content will be perceived before any personalization. We call this “exposure bias,” and we focus on evaluating different recommendation and personalization approaches using diverse exposure bias metrics to und...
Online social networks use recommender systems to suggest relevant information to their users in the form of personalized timelines. Studying how these systems expose people to information at scale is difficult to do as one cannot assume each user is subject to the same timeline condition and building appropriate evaluation infrastructure is costly...
Recommender systems underpin many of the personalized services in the online information & social media ecosystem. However, the assumptions in the research on content recommendations in domains like search, video, and music are often applied wholesale to domains that require a better understanding of why and how users interact with the systems. In...
Othering, the act of portraying outgroups as fundamentally different from the ingroup, often escalates into framing them as existential threats, fueling intergroup conflict and justifying exclusion and violence. These dynamics are alarmingly pervasive, spanning from the extreme historical examples of genocides against minorities in Germany and Rwan...
The COVID-19 pandemic profoundly impacted people globally, yet its effect on scientists and research institutions has yet to be fully examined. To address this knowledge gap, we use a newly available bibliographic dataset covering tens of millions of papers and authors to investigate changes in research activity and collaboration during this period...
Following the Russian Federation's full-scale invasion of Ukraine in February 2022, a multitude of information narratives emerged within both pro-Russian and pro-Ukrainian communities online. As the conflict progresses, so too do the information narratives, constantly adapting and influencing local and global community perceptions and attitudes. Th...
In-Context Learning (ICL) in Large Language Models (LLMs) has emerged as the dominant technique for performing natural language tasks, as it does not require updating the model parameters with gradient-based methods. ICL promises to "adapt" the LLM to perform the present task at a competitive or state-of-the-art level at a fraction of the computatio...
Eating disorders are complex mental health conditions that affect millions of people around the world. Effective interventions on social media platforms are crucial, yet testing strategies in situ can be risky. We present a novel LLM-driven experimental testbed for simulating and assessing intervention strategies in ED-related discussions. Our fram...
A sequence of technological inventions over several centuries has dramatically lowered the cost of producing and distributing information. Because culture and societies ride on a substrate of information, these changes have profoundly impacted how we live, work, and interact with each other. This paper explores the nature of information architectur...
Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with o...
Background
Effective communication is crucial during health crises, and social media has become a prominent platform for public health experts (PHEs) to share information and engage with the public. At the same time, social media also provides a platform for pseudoexperts who may spread contrarian views. Despite the importance of social media, key...
The gendered expectations about ideal body types can lead to body image concerns, dissatisfaction, and in extreme cases, disordered eating and other psychopathologies across the gender spectrum. While research has focused on pro-anorexia online communities that glorify the 'thin ideal', less attention has been given to the broader spectrum of body...
Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable creating computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and a...
Members of different political groups not only disagree about issues but also dislike and distrust each other. While social media can amplify this emotional divide—called affective polarization by political scientists—there is a lack of agreement on its strength and prevalence. We measure affective polarization on social media by quantifying the em...
BACKGROUND
Effective communication is crucial during health crises, and social media has become a prominent platform for public health experts to inform and to engage with the public. At the same time, social media also platforms pseudo-experts who may promote contrarian views. Despite the significance of social media, key elements of communication...
Counterspeech (speech that opposes hate speech) has gained significant attention recently as a strategy to reduce hate on social media. While previous studies suggest that counterspeech can somewhat reduce hate speech, little is known about its effects on participation in online hate communities, nor which counterspeech tactics reduce harmful b...
Socio-linguistic indicators of text, such as emotion or sentiment, are often extracted using neural networks in order to better understand features of social media. One indicator that is often overlooked, however, is the presence of hazards within text. Recent psychological research suggests that statements about hazards are more believable than st...
Narratives are a foundation of human cognition and decision making. Because narratives play a crucial role in societal discourses and the spread of misinformation and because of the pervasive use of social media, the narrative dynamics on social media can have profound societal impact. Yet, systematic and computational understanding of online narratives...
The conflict between Israel and Palestinians significantly escalated after the October 7, 2023 Hamas attack, capturing global attention. To understand the public discourse on this conflict, we present a meticulously compiled dataset, IsamasRed, comprising nearly 400,000 conversations and over 8 million comments from Reddit, spanning from August 2023...
Detecting norm violations in online communities is critical to maintaining healthy and safe spaces for online discussions. Existing machine learning approaches often struggle to adapt to the diverse rules and interpretations across different communities due to the inherent challenges of fine-tuning models for such context-specific tasks. In this pa...
Online manipulation is a pressing concern for democracies, but the actions and strategies of coordinated inauthentic accounts, which have been used to interfere in elections, are not well understood. We analyze a five million-tweet multilingual dataset related to the 2017 French presidential election, when a major information campaign led by Russia...
Many online hate groups exist to disparage others based on race, gender identity, sex, or other characteristics. The accessibility of these communities allows users to join multiple types of hate groups (e.g., a racist community and a misogynistic community), which calls into question whether these peripatetic users could be further radicalized compa...
Content moderation on social media platforms shapes the dynamics of online discourse, influencing whose voices are amplified and whose are suppressed. Recent studies have raised concerns about the fairness of content moderation practices, particularly the aggressive flagging of posts from transgender and non-binary individuals as toxic. In this stud...
The scaling relations between city attributes and population are emergent and ubiquitous aspects of urban growth. Quantifying these relations and understanding their theoretical foundation, however, is difficult due to the challenge of defining city boundaries and a lack of historical data to study city dynamics over time and space. To address this...
Different ideological groups within heterogeneous online social networks not only disagree but also dislike and distrust each other. This phenomenon, called affective polarization, widens political divisions and also impacts how information spreads online. We directly measure affective polarization on social media by quantifying emotions and toxi...
Harmful and toxic speech contributes to an unwelcoming online environment that suppresses participation and conversation. Efforts have focused on detecting and mitigating harmful speech; however, the mechanisms by which toxicity degrades online discussions are not well understood. This paper makes two contributions. First, to comprehensively model h...
Narrative is a foundation of human cognition and decision making. Because narratives play a crucial role in societal discourses and spread of misinformation and because of the pervasive use of social media, the narrative dynamics on social media can have profound societal impact. Yet, systematic and computational understanding of online narratives...
The rich and dynamic information environment on social media provides researchers, policy makers, and entrepreneurs with opportunities to learn about social phenomena in a timely manner. However, using this data to understand human affect and behavior poses multiple challenges, such as heterogeneity of topics and events discussed in the highly dyna...
On June 24, 2022, the United States Supreme Court overturned landmark rulings made in its 1973 verdict in Roe v. Wade. The justices, by way of a majority vote in Dobbs v. Jackson Women's Health Organization, decided that abortion wasn't a constitutional right and returned the issue of abortion to the elected representatives. This decision triggered...
Language models can be trained to recognize the moral sentiment of text, creating new opportunities to study the role of morality in human life. As interest in language and morality has grown, several ground truth datasets with moral annotations have been released. However, these datasets vary in the method of data collection, domain, topics, instr...
Effective response to the COVID-19 pandemic required coordinated adoption of mitigation measures, like masking and quarantines, to curb the virus's spread. However, political divisions that emerged early in the pandemic hindered consensus on the appropriate response. To better understand these divisions, our study examines a vast collection of COVID-19...
The rise in eating disorders, a condition with serious health complications, has been linked to the proliferation of idealized body images on social media platforms. However, the relationship between social media and eating disorders is more complex, with online platforms potentially enabling harmful behaviors by linking people to "pro-ana" commu...
Real-world networks are rarely static. Recently, there has been increasing interest in both network growth and network densification, in which the number of edges scales superlinearly with the number of nodes. Less studied but equally important, however, are scaling laws of higher-order cliques, which can drive clustering and network redundancy. In...
Many openly non-binary gender individuals participate in social networks. However, the relationship between gender and online interactions is not well understood, which may result in disparate treatment by large language models. We investigate individual identity on Twitter, focusing on gender expression as represented by users' chosen pronouns. We...
Journalists play a vital role in surfacing issues of societal importance, but their choices of what to highlight and who to interview are influenced by societal biases. In this work, we use natural language processing tools to measure these biases in a large corpus of news articles about the COVID-19 pandemic. Specifically, we identify when experts...
Estimating how a treatment affects different individuals, known as heterogeneous treatment effect estimation, is an important problem in empirical sciences. In the last few years, there has been a considerable interest in adapting machine learning algorithms to the problem of estimating heterogeneous effects from observational and experimental data...
Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast...
The social web has linked people on a global scale, transforming how we communicate and interact. The massive interconnectedness has created new vulnerabilities in the form of social manipulation and misinformation. As the social web matures, we are entering a new phase, where people share their private feelings and emotions. This so-called social...
Emotions play an important role in interpersonal interactions and social conflict, yet their function in the development of controversy and disagreement in online conversations has not been explored. To address this gap, we study controversy on Reddit, a popular network of online discussion forums. We collect discussions from a wide variety of topi...
The growing popularity of wearable sensors has generated large quantities of temporal physiological and activity data. The ability to analyze this data offers new opportunities for real-time health monitoring and forecasting. However, temporal physiological data presents many analytic challenges: the data is noisy, contains many missing values, and eac...
The growing prominence of social media in public discourse has led to greater scrutiny of the quality of information spreading online and the role that polarization plays in this process. However, studies of information spread on social media platforms like Twitter have been hampered by the difficulty of collecting data about the social graph, spec...
The need for emotional inference from text continues to diversify as more and more disciplines integrate emotions into their theories and applications. These needs include inferring different emotion types, handling multiple languages, and different annotation formats. A shared model between different configurations would enable the sharing of know...
Detecting emotions expressed in text has become critical to a range of fields. In this work, we investigate ways to exploit label correlations in multi-label emotion recognition models to improve emotion detection. First, we develop two modeling approaches to the problem in order to capture word associations of the emotion words themselves, by eith...
Morality plays an important role in culture, identity, and emotion. Recent advances in natural language processing have shown that it is possible to classify moral values expressed in text at scale. Morality classification relies on human annotators to label the moral expressions in text, which provides training data to achieve state-of-the-art per...
Diversity in science is necessary to improve innovation and increase the capacity of the scientific workforce. Despite decades-long efforts to increase gender diversity, however, women remain a small minority in many fields, especially in senior positions. The dearth of elite women scientists, in turn, leaves fewer women to serve as mentors and rol...
Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as...