Haewoon Kwak's research while affiliated with Singapore Management University and other places

Publications (135)

Preprint
Full-text available
New leaders in democratic countries typically enjoy high approval ratings immediately after taking office. This phenomenon is called the honeymoon effect and is regarded as a significant political phenomenon; however, its mechanism remains underexplored. Therefore, this study examines how social media users respond to changes in political leadershi...
Preprint
Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain of Explanation Prompting method, inspired by the chain-of-thought prompting study (Wei et al., 2022), to generate high-quality NLE for implicit hate speech. We build a benchmark base...
Article
Full-text available
This research investigates changes in online behavior of users who publish in multiple communities on Reddit by measuring their toxicity at two levels. With the aid of crowdsourcing, we built a labeled dataset of 10,083 Reddit comments, then used the dataset to train and fine-tune a Bidirectional Encoder Representations from Transformers (BERT) neu...
Article
Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) making conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning mod...
Article
Characters in fighting videogames such as Street Fighter V and Tekken 7 typically reveal a phenomenon that we define as virtual enfreakment: their bodies, costumes, and fighting styles are exaggerated (1) in a manner that emphasizes perceived exoticism and (2) to enable them to be easily visually and conceptually distinguishable from one another. H...
Article
While the contagious nature of online toxicity sparked increasing interest in its early detection and prevention, most of the literature focuses on the Western world. In this work, we demonstrate that 1) it is possible to detect toxicity triggers in an Asian online community, and 2) toxicity triggers can be strikingly different between Western and...
Article
A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean L...
Preprint
Achievement systems have been actively adopted in gaming platforms to maintain players' interests. Among them, trophies in PlayStation games are one of the most successful achievement systems. While the importance of trophy design has been casually discussed in many game developers' forums, there has been no systematic study of the historical datas...
Article
Full-text available
The transfer of power stemming from the 2020 presidential election occurred during an unprecedented period in United States history. Uncertainty from the COVID-19 pandemic, ongoing societal tensions, and a fragile economy increased societal polarization, exacerbated by the outgoing president's offline rhetoric. As a result, online groups such as QA...
Preprint
Full-text available
The United States has some of the highest rates of gun violence among developed countries. Yet there is disagreement about the extent to which firearms should be regulated. In this study, we employ social media signals to examine the predictors of offline political activism, at both the population and individual levels. We show that it is possible t...
Article
The United States has some of the highest rates of gun violence among developed countries. Yet there is disagreement about the extent to which firearms should be regulated. In this study, we employ social media signals to examine the predictors of offline political activism, at both the population and individual levels. We show that it is possible t...
Preprint
A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean L...
Preprint
While the contagious nature of online toxicity sparked increasing interest in its early detection and prevention, most of the literature focuses on the Western world. In this work, we demonstrate that 1) it is possible to detect toxicity triggers in an Asian online community, and 2) toxicity triggers can be strikingly different between Western and...
Preprint
Full-text available
Social media is not only a place for people to communicate about daily matters but also a virtual venue to transmit and exchange various ideas. Such ideas are known as the raw voices of potential consumers, which come from a wide range of people who may not participate in consumer surveys, and therefore their opinions may contain high value to compan...
Preprint
Full-text available
False information spreads on social media, and fact-checking is a potential countermeasure. However, there is a severe shortage of fact-checking resources, and an efficient way to scale fact-checking is desperately needed, especially in pandemics like COVID-19. In this study, we focus on spontaneous debunking by social media users, which has bee...
Article
Full-text available
Although established marketing techniques have been applied to design more effective health campaigns, more often than not, the same message is broadcast to large populations, irrespective of unique characteristics. As individual digital device use has increased, so have individual digital footprints, creating potential opportunities for targeted...
Preprint
We investigate predictors of anti-Asian hate among Twitter users throughout COVID-19. With the rise of xenophobia and polarization that has accompanied widespread social media usage in many nations, online hate has become a major social issue, attracting many researchers. Here, we apply natural language processing techniques to characterize social...
Article
We study the response to the Charlie Hebdo shootings of January 7, 2015 on Twitter across the globe. We ask whether the stances on the issue of freedom of speech can be modeled using established sociological theories, including Huntington's culturalist Clash of Civilizations, and those taking into consideration social context, including Density and...
Article
We improve the readability of a personal timeline by weaving multiple social contexts of tweets into a visualization. Our social contexts consist of three dimensions: community membership, key persons, and interesting tweets within a personal timeline. A person is often a member of several communities, such as a family, a class, or a team, simu...
Article
A growing number of people are changing the way they consume news, replacing traditional physical newspapers and magazines with their online versions and/or weblogs. The interactivity and immediacy of online news are changing the way news is produced and presented by media corporations. News websites have to create effective...
Article
Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes ("microframes") that are overrepresented in th...
Article
To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite the importance of writing a compelling message when sharing articles, the research community lacks a sufficient understanding of what kinds of editing strategies effe...
Preprint
Full-text available
The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim or article, either manually or automatically. Thus, many researchers are shifting their attention to higher granularity, aiming to profile entire news outlets, which makes it possible to detect lik...
Conference Paper
Full-text available
As Internet users increasingly rely on social media sites to receive news, they are faced with a bewildering number of news media choices. For example, thousands of Facebook pages today are registered and categorized as some form of news media outlets. This situation boosted the so-called independent journalism, also known as alternative news media...
Preprint
To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite the importance of writing a compelling message when sharing articles, the research community lacks a sufficient level of understanding of what kinds of editing strategies...
Preprint
While established marketing techniques have been applied to design more effective health campaigns, more often than not, the same message is broadcast to large populations, irrespective of unique characteristics. As individual digital device usage has increased, so have individual digital footprints, creating potential opportunities f...
Preprint
Full-text available
Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious...
Article
Online communities adopt various reputation schemes to measure content quality. This study analyzes the effect of a new reputation scheme that exposes one's offline social status, such as an education degree, within an online community. We study two Reddit communities that adopted this scheme, whereby posts include tags identifying education status...
Article
In this work, we empirically validate three common assumptions in building political media bias datasets, which are (i) labelers' political leanings do not affect labeling tasks, (ii) news articles follow their source outlet's political leaning, and (iii) political leaning of a news outlet is stable across different topics. We build a ground-truth...
Preprint
Framing is an indispensable narrative device for news media because even the same facts may lead to conflicting understandings if deliberate framing is employed. Therefore, identifying media framing is a crucial step to understanding how news media influence the public. Framing is, however, difficult to operationalize and detect, and thus tradition...
Article
Although used in many domains, the evaluation of personas is difficult due to the lack of validated measurement instruments. To tackle this challenge, we propose the Persona Perception Scale (PPS), a survey instrument for evaluating how individuals perceive personas. We develop the scale by reviewing relevant literature from social psychology, pers...
Preprint
Online communities adopt various reputation schemes to measure content quality. This study analyzes the effect of a new reputation scheme that exposes one's offline social status, such as an education degree, within an online community. We study two Reddit communities that adopted this scheme, whereby posts include tags identifying education status...
Preprint
We propose FrameAxis, a method of characterizing the framing of a given text by identifying the most relevant semantic axes ("microframes") defined by antonym word pairs. In contrast to the traditional framing analysis, which has been constrained by a small number of manually annotated general frames, our unsupervised approach provides much more de...
Chapter
Gender and racial diversity in media images shapes our perception of different demographic groups. In this work, we investigate gender and racial diversity of 85,957 advertising images shared by the 73 top international brands on Instagram and Facebook. We hope that our analyses give guidelines on how to build a fully automated...
Preprint
Full-text available
We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what's behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame...
Conference Paper
Full-text available
Despite the considerable interest in the detection of toxic comments, there has been little research investigating the causes -- i.e., triggers -- of toxicity. In this work, we first propose a formal definition of triggers of toxicity in online communities. We proceed to build an LSTM neural network model using textual features of comments, and the...
Preprint
Gender and racial diversity in media images shapes our perception of different demographic groups. In this work, we investigate gender and racial diversity of 85,957 advertising images shared by the 73 top international brands on Instagram and Facebook. We hope that our analyses give guidelines on how to build a fully automated...
Article
Online platforms, such as Facebook, Twitter, and Reddit, provide users with a rich set of features for sharing and consuming political information, expressing political opinions, and exchanging potentially contrary political views. In such activities, two types of communication spaces naturally emerge: those dominated by exchanges between political...
Preprint
Online platforms, such as Facebook, Twitter, and Reddit, provide users with a rich set of features for sharing and consuming political information, expressing political opinions, and exchanging potentially contrary political views. In such activities, two types of communication spaces naturally emerge: those dominated by exchanges between political...
Conference Paper
Full-text available
Predicting user confusion can help improve information presentation on websites, mobile apps, and virtual reality interfaces. One promising information source for such prediction is eye-tracking data about gaze movements on the screen. Coupling this data with think-aloud records, we explore whether users' confusion is correlated with primarily fixation-level featu...
Conference Paper
Full-text available
Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an on...
Chapter
Full-text available
As the quantity of social and online analytics data has drastically increased, a wide variety of methods are deployed to make sense of this data, typically via computational and algorithmic approaches. However, in many cases, these approaches trade one form of complexity for another by ignoring the principles of human cognitive processing. In this...
Article
Full-text available
In this critique, we conceptually examine the use of personas in an age of availability of large-scale online analytics data. Based on the criticism and benefits outlined in prior work, we formulate the major arguments for and against the use of personas, analyze these arguments, and demonstrate areas for the productive employment of personas by le...
Article
Full-text available
We develop a methodology to automate creating imaginary people, referred to as personas, by processing complex behavioral and demographic data of social media audiences. From a popular social media account containing more than 30 million interactions by viewers from 198 countries engaging with more than 4,200 online videos produced by a global medi...
Article
Full-text available
We propose a novel approach for isolating customer segments using online customer data for products that are distributed via online social media platforms. We use non-negative matrix factorization to first identify behavioral customer segments and then to identify demographic customer segments. We employ a methodology for linking the two segments t...
Article
Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an on...
Article
Social media analytics is insightful but can also be difficult to use within organizations. To address this, we present Automatic Persona Generation (APG), a system and methodology for quantitatively generating personas using large amounts of online social media data. The APG system is operational, deployed in a pilot version with several organizat...
Article
In this research, we evaluate four widely used face detection tools, which are Face++, IBM Bluemix Visual Recognition, AWS Rekognition, and Microsoft Azure Face API, using multiple datasets to determine their accuracy in inferring user attributes, including gender, race, and age. Results show that the tools are generally proficient at determining g...
Preprint
Because word semantics can substantially change across communities and contexts, capturing domain-specific word semantics is an important challenge. Here, we propose SEMAXIS, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SEMAXIS can capture nuance...
Article
Full-text available
Understanding users in the era of social media is challenging, requiring organizations to adopt novel computation-aided approaches. To exemplify such an approach, we retrieved information on millions of interactions with YouTube video content from a major Middle Eastern media outlet, to automatically generate personas that capture how different aud...
Article
Full-text available
Since virtual identities such as social media profiles and avatars have become a common venue for self-expression, it has become important to consider the ways in which existing systems embed the values of their designers. In order to design virtual identity systems that reflect the needs and preferences of diverse users, understanding how virtual...
Conference Paper
Full-text available
Over the past few years, a number of new "fringe" communities, like 4chan or certain subreddits, have gained traction on the Web at a rapid pace. However, more often than not, little is known about how they evolve or what kind of activities they attract, although recent research has shown that they influence how false information reaches mainstream...
Conference Paper
Full-text available
Over the past few years, a number of new "fringe" communities, like 4chan or certain subreddits, have gained traction on the Web at a rapid pace. However, more often than not, little is known about how they evolve or what kind of activities they attract, although recent research has shown that they influence how false information reaches mainstream...
Conference Paper
Full-text available
In this research, we investigate if and how more photos than a single headshot can heighten the level of information provided by persona profiles. We conduct eye-tracking experiments and qualitative interviews with variations in the photos: a single headshot, a headshot and images of the persona in different contexts, and a headshot with pictures o...
Conference Paper
Full-text available
We report findings and implications from a semi-naturalistic user study of a system for Automatic Persona Generation (APG) using large-scale audience data of an organization's social media channels conducted at the workplace of a major international corporation. Thirteen participants from a range of positions within the company engaged with the sys...
Conference Paper
Full-text available
Personas are widely used in software development, system design, and HCI studies. Yet, their evaluation is difficult, and there are no recognized and validated measurement scales to date. To improve this condition, this research develops a persona perception scale based on reviewing relevant literature. We validate the scale through a pilot study w...
Conference Paper
We investigate the alignment of international attention of news media organizations within 193 countries with the expressed international interests of the public within those same countries from March 7, 2016 to April 14, 2017. We collect fourteen months of longitudinal data of online news from Unfiltered News and web search volume data from Google...
Conference Paper
Full-text available
To more effectively convey relevant information to end users of persona profiles, we conducted a user study consisting of 29 participants engaging with three persona layout treatments. We were interested in confusion engendered by the treatments on the participants, and conducted a within-subjects study in the actual work environment, using eye-tra...
Conference Paper
Full-text available
We present Automatic Persona Generation (APG), a methodology and system for quantitative persona generation using large amounts of online social media data. The system is operational, beta deployed with several client organizations in multiple industry verticals and ranging from small-to-medium sized enterprises to large multi-national corporations...
Article
Full-text available
We investigate the alignment of international attention of news media organizations within 193 countries with the expressed international interests of the public within those same countries from March 7, 2016 to April 14, 2017. We collect fourteen months of longitudinal data of online news from Unfiltered News and web search volume data from Google...
Article
Full-text available
Over the past few years, new "fringe" communities have been created and have gained traction on the Web at a rapid rate. Very often, little is known about how they evolve and what kinds of activities they attract, although recent research has shown that they do influence how misinformation reaches mainstream communities. This motivates the need to monitor...
Conference Paper
Full-text available
The availability of large quantities of online data affords the isolation of key user segments based on demographics and behaviors for many online systems. However, there is an open question of how organizations can best leverage this user information in communication and decision-making. The automatic generation of personas to represent customer s...
Conference Paper
Full-text available
One of the reasons for using personas is to align user understandings across project teams and sites. As part of a larger persona study, at Al Jazeera English (AJE), we conducted 16 qualitative interviews with media producers, the end users of persona descriptions. We asked the participants about their understanding of a typical AJE media consumer,...
Conference Paper
Examining 103,133 news articles that are the most popular for different demographic groups in Daum News (the second most popular news portal in South Korea) during the whole year of 2015, we provided multi-level analyses of gender and age differences in news consumption. We measured such differences in four different levels: (1) by actual news item...
Conference Paper
The objective of this study is to assess the longitudinal trends of media similarity and dissimilarity on the international scale. As news value has well-established political, cultural, and economic consequences, the degree to which media coverage and content is converging across countries has implications for international relations. To study thi...
Conference Paper
Full-text available
We conduct a mixed-method study to better understand the content consumption patterns of Middle Eastern social media users and to explore new ways to present online data by using automatic persona generation. First, we analyze millions of content interactions on YouTube to dynamically generate personas describing behavioral patterns of different de...
Article
We conduct a mixed-method study to better understand the content consumption patterns of Middle Eastern social media users and to explore new ways to present online data by using automatic persona generation. First, we analyze millions of content interactions on YouTube to dynamically generate personas describing behavioral patterns of d...
Conference Paper
We propose a novel method for generating personas based on online user data for the increasingly common situation of content creators distributing products via online platforms. We use non-negative matrix factorization to identify user segments and develop personas by adding personality such as names and photos. Our approach can develop accurate pe...
Conference Paper
We built a multiplex media attention and disregard network (MADN) among 129 countries over 212 days. By characterizing the MADN from multiple levels, we found that it is formed primarily by skewed, hierarchical, and asymmetric relationships. Also, we found strong evidence that our news world is becoming a "global village." However, at the same time...