Keyu Chen’s research while affiliated with Yale-New Haven Hospital and other places


Publications (12)


Partisan US News Media Representations of Syrian Refugees
  • Article

June 2023 · 115 Reads · 2 Citations

Proceedings of the International AAAI Conference on Web and Social Media

Keyu Chen · Yiwen Shi · [...]

We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims, welcome in the US, and right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which detail possibly unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations, and design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes.


Categorizing Memes About the Ukraine Conflict

February 2023 · 137 Reads · 4 Citations

Lecture Notes in Computer Science

The Russian disinformation campaign uses pro-Russia memes to polarize Americans and increase support for the Russian invasion of Ukraine. Thus, it is critical for governments and similar stakeholders to identify pro-Russia memes and counter them with evidence-based information. Identifying broad meme themes is crucial for developing a targeted and strategic counter-response. There is also a range of pro-Ukraine memes that bolster support for the Ukrainian cause. As such, we need to identify pro-Ukraine memes and aid their dissemination to augment global support for Ukraine. We address these issues through the following contributions: 1) Creation of an annotated dataset of pro-Russia (N = 70) and pro-Ukraine (N = 121) memes regarding the Ukraine conflict; 2) Identification of broad themes within the pro-Russia and pro-Ukraine meme categories. Broadly, our findings indicated that pro-Russia memes fall into thematic categories that seek to undermine specific elements of the policy and culture of the US and its allies. Pro-Ukraine memes are far more diffuse thematically, highlighting admiration for Ukraine’s people and its leadership. Stakeholders may utilize our findings to develop targeted strategies to mitigate Russian influence operations, possibly reducing the effects of the conflict.


US News and Social Media Framing Around Vaping

February 2023 · 40 Reads · 4 Citations

Lecture Notes in Computer Science

In this paper, we investigate how vaping is framed differently (2008–2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news media framing of vaping shifted over time in line with emergent regulatory trends, such as flavored vaping bans, with little discussion around vaping as a smoking cessation tool. We found that social media discussions were far more varied, with transitions toward vaping both as a public health harm and as a smoking cessation tool. Our cloze test, dynamic topic models, and question answering showed similar patterns, where social media, but not news media, characterizes vaping as a combustible cigarette substitute. We use n-grams and LDA topic models to detail that social media data first centered on vaping as a smoking cessation tool, and in 2019 moved toward narratives around vaping regulation, similar to news media frames. Overall, social media tracks the evolution of vaping as a social practice, while news media reflects more risk-based concerns. A strength of our work is how the different techniques we have applied validate each other.


Online Platforms' Framing around Vaping

November 2022 · 16 Reads · 3 Citations

Drug Testing and Analysis

In this paper, we provide a descriptive overview of how vaping is framed differently between various online platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow, Facebook, online news media). We provide an overview of >1M posts and news articles about vaping to study the differences in framing between online platforms. Findings indicate an inconsistent framing around vaping across platforms. Stakeholders may utilize our findings to intervene around the framing of vaping, and may design communications campaigns that improve the way society sees vaping, possibly aiding smoking cessation and reducing youth vaping.


Interpretable and High-Performance Hate and Offensive Speech Detection

November 2022 · 151 Reads · 2 Citations

Lecture Notes in Computer Science

The spread of information through social media platforms can create environments possibly hostile to vulnerable communities and silence certain groups in society. To mitigate such instances, several models have been developed to detect hate and offensive speech. Since detecting hate and offensive speech in social media platforms could incorrectly exclude individuals from social media platforms, which can reduce trust, there is a need to create explainable and interpretable models. Thus, we build an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data. For unbalanced Twitter data, XGBoost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection with an F1 score of 0.75 compared to 0.38, 0.37, and 0.38, respectively. When we down-sampled the data to three separate classes of approximately 5,000 tweets, XGBoost performed better than LSTM, AutoGluon, and ULMFiT, with F1 scores for hate speech detection of 0.79 vs 0.69, 0.77, and 0.66, respectively. XGBoost also performed better than LSTM, AutoGluon, and ULMFiT in the down-sampled version for offensive speech detection with F1 score of 0.83 vs 0.88, 0.82, and 0.79, respectively. We use Shapley Additive Explanations (SHAP) on our XGBoost models’ outputs to make them explainable and interpretable, in contrast to LSTM, AutoGluon, and ULMFiT, which are black-box models.

Keywords: Transparency, XGBoost, Performance, Machine learning, Natural language processing, Hate, Offensive


How Is Vaping Framed on Online Knowledge Dissemination Platforms?

September 2022 · 17 Reads · 1 Citation

Lecture Notes in Computer Science

Studying how vaping is framed on various knowledge dissemination platforms (e.g., Quora, Reddit, Wikipedia) is central to understanding the process of knowledge dissemination around vaping. Such understanding can help us craft tools specific to each platform, to dispel vaping misperceptions and reinforce evidence-based information. We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use NLP techniques to understand these differences. As an example, regarding question answering results, for the question "What is vaping for?", we note answers framing vaping as a smoking cessation tool in Quora, Medium, and Stack Exchange. Reddit tended to frame vaping as a hobby. Wikipedia had a mix of answers, some centered on EVALI, and others on vaping as harm reduction. Broadly, results indicate that Quora is an appropriate venue for those looking to transition from smoking to vaping. Other platforms (Reddit, wikiHow) are more for vaping hobbyists and may not sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping harms, dissuading smokers from transitioning. A strength of our work is how the different techniques we have applied validate each other. Stakeholders may utilize our findings to design vaping regulation that clarifies the role of vapes as a smoking cessation tool.


Fig. 1. Word clouds for various forms of speech.
Fig. 2. Shap for the tweet "if you still hate this nigga" labeled as class hate.
F1 Score of the three classes for XGBoost, LSTM, AutoGluon, and ULMFiT.
F1 Score of the three classes for XGBoost, LSTM, AutoGluon, and ULMFiT after down sampling.
Explainable and High-Performance Hate and Offensive Speech Detection

June 2022 · 119 Reads

The spread of information through social media platforms can create environments possibly hostile to vulnerable communities and silence certain groups in society. To mitigate such instances, several models have been developed to detect hate and offensive speech. Since detecting hate and offensive speech in social media platforms could incorrectly exclude individuals from social media platforms, which can reduce trust, there is a need to create explainable and interpretable models. Thus, we build an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data. For unbalanced Twitter data, XGBoost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection with an F1 score of 0.75 compared to 0.38, 0.37, and 0.38, respectively. When we down-sampled the data to three separate classes of approximately 5,000 tweets, XGBoost performed better than LSTM, AutoGluon, and ULMFiT, with F1 scores for hate speech detection of 0.79 vs 0.69, 0.77, and 0.66, respectively. XGBoost also performed better than LSTM, AutoGluon, and ULMFiT in the down-sampled version for offensive speech detection with F1 score of 0.83 vs 0.88, 0.82, and 0.79, respectively. We use Shapley Additive Explanations (SHAP) on our XGBoost models' outputs to make them explainable and interpretable, in contrast to LSTM, AutoGluon, and ULMFiT, which are black-box models.


How is Vaping Framed on Online Knowledge Dissemination Platforms?

June 2022 · 30 Reads

We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand these differences. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venues for those looking to transition from smoking to vaping. Other platforms (Reddit, wikiHow) are more for vaping hobbyists and may not sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping harms, dissuading smokers from transitioning. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to design informational tools to reinforce or mitigate vaping (mis)perceptions online.


Figure 1: (a) Article count related to Syrian refugees (2011-2021) across US partisan news outlets online. (b) and (c): Sentiment and Offensive Speech Scores for articles related to Syrian refugees across US partisan online news outlets.
Partisan US News Media Representations of Syrian Refugees

June 2022 · 93 Reads · 1 Citation

We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims, welcome in the US, and right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which detail possibly unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations, and design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes.


The top four candidate words ranked by BERT probability for the cloze test "Vaping is [MASK] for smoking" for Media Cloud and CrowdTangle data.
US News and Social Media Framing around Vaping

June 2022 · 214 Reads

In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news media framing of vaping shifted over time in line with emergent regulatory trends, such as flavored vaping bans, with little discussion around vaping as a smoking cessation tool. We found that social media discussions were far more varied, with transitions toward vaping both as a public health harm and as a smoking cessation tool. Our cloze test, dynamic topic model, and question answering showed similar patterns, where social media, but not news media, characterizes vaping as a combustible cigarette substitute. We use n-grams to detail that social media data first centered on vaping as a smoking cessation tool, and in 2019 moved toward narratives around vaping regulation, similar to news media frames. Overall, social media tracks the evolution of vaping as a social practice, while news media reflects more risk-based concerns. A strength of our work is how the different techniques we have applied validate each other. Stakeholders may utilize our findings to intervene around the framing of vaping, and may design communications campaigns that improve the way society sees vaping, thus possibly aiding smoking cessation and reducing youth vaping.


Citations (6)


... In follow-on research (KhudaBukhsh et al. 2022), we show that the language of former president Trump's YouTube channel comments was most similar to the news language of One America News Network and Newsmax TV, the two news networks that later got into legal trouble for voter fraud misinformation (Pruitt-Young 2021). We further show that our machine-translation-based framework has broader applications that transcend to other languages (e.g., Spanish Villa-Cox et al. 2022) and other crisis settings (e.g., studying linguistics differences in framing refugee crises Chen et al. 2023). ...

Reference:

Deceptively simple: An outsider's perspective on natural language processing
Partisan US News Media Representations of Syrian Refugees

Proceedings of the International AAAI Conference on Web and Social Media

... Where possible, fathers themselves should be consulted on articles about fatherhood. For example, a panel staffed by fathers can comment on fatherhood-related online news articles, providing suggestions on how articles can more accurately represent fathers' concerns [1,2]. Our findings relied on the validity of data collected with our search terms. ...

Online Platforms' Framing around Vaping
  • Citing Article
  • November 2022

Drug Testing and Analysis

... The approach uses active learning cycles to train the task using the result-label pairs and improves the model's accuracy. Babaeianjelodar et al. [5] built an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data, to detect hate and offensive speech. The paper uses Shapley Additive Explanations (SHAP) on the XGBoost models' outputs to make it explainable and interpretable compared to black-box models. ...

Interpretable and High-Performance Hate and Offensive Speech Detection

Lecture Notes in Computer Science

... While there have been studies on preventing vaping among adolescents [12,13], and the effect of vaping misinformation on attitudes toward vapes [1], and vaping misinformation more broadly [10,14,15], there is limited research on interventions to mitigate misinformation about vapes. Thus, we are far from knowing when and how to intervene best. ...

How Is Vaping Framed on Online Knowledge Dissemination Platforms?
  • Citing Chapter
  • September 2022

Lecture Notes in Computer Science

... As a result, research communities devote a great deal of attention to the study of news bias [5,6,7]. However, the first step in conducting such a study is to identify it [8,9]. Although the task may appear trivial, it is in fact challenging as bias can manifest itself at different levels in complex ways [10]. ...

Partisan US News Media Representations of Syrian Refugees

... Finally, an increased trend corresponding to the symptomatic expressions similar to the Isolation phase is observed in the vaccination intervention graphs of Figure A5. This can be attributed to the other COVID-19 variants and the multiple waves of infection, as well as the anxiety corresponding to the vaccines (Kumar et al. 2022). The highest change is observed in the cumulative changes in the anxiety values (6.23%). ...

COVID-19 vaccine perceptions in the initial phases of US vaccine roll-out: an observational study on reddit

BMC Public Health