U.S. support for marijuana legalization grew from 38% to 65% between 2008 and 2019. To find the discourse features that preceded and followed the shift, I curated a comprehensive corpus of Reddit comments from the same period. Neural networks trained on human annotations of attitude and persuasion attempts separated strategic use of narratives from non-argumentative discourse. Two narrative frames considered important to persuasion in past research were studied: anecdotal and generalized content. I operationalized anecdotal frames based on three linguistic clause-level features: whether the clause is about a generic kind, whether it represents a reliable state or an event, and whether any events are bounded in time. A corpus of Reddit and news text was annotated for these features and more, and neural networks trained on these annotations estimated anecdotal properties in the broader Reddit dataset. Anecdotal themes were less prevalent but present in most comments, particularly in arguments favoring legalization. Nationally, anecdotes within non-argumentative discourse surged over time as a consequence of attitude shifts. Generalized discourse was a potential cause, with major surges around the 2012 and 2016 legal milestones. Attempts to associate generalized discourse with legal changes were complicated by marijuana’s varied status across the U.S. I therefore inferred user locations and compared the rate of anecdotal themes before and after legalization in comments from pioneering states. More generalized frames set the stage for each successful legalization bid. The particular content, however, varied between the two milestones. Character judgments were prominent in 2012, while crimes and politics took center stage in 2016. The generalized precedents of legalization in the two periods shared an argumentative and moralistic focus but had distinctive clause-level profiles.
Meanwhile, legal and medical arguments were sidelined, meaning the novel consensus was not informed by much of the relevant information. Together, my findings present generalized argument framing as a harbinger of attitude shift toward hot-button topics, and anecdotal non-argumentative framing as a consequence of it. The machine learning pipeline that made this insight possible is novel for social media research but general-purpose, allowing similar abstract narrative frames to be broken down into theory-driven constituents, and studied in quantitative detail.
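The three clause-level features named in the abstract above (generic kind, state vs. event, boundedness) could be combined into a per-comment anecdotal rate roughly as follows. This is a minimal sketch and not the author's pipeline: the `Clause` class, the `anecdotal_score` function, and the equal-weight scoring rule are all hypothetical, and the feature values are assumed to come from upstream classifiers or annotators.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    """One clause with the three features described in the abstract."""
    text: str
    generic: bool   # True if the clause is about a generic kind
    eventive: bool  # True if the clause describes an event, False for a state
    bounded: bool   # True if the event is bounded in time


def anecdotal_score(clause: Clause) -> float:
    """Hypothetical rule: one point each for a specific (non-generic)
    referent, an event, and a temporally bounded event, scaled to [0, 1]."""
    points = 0
    if not clause.generic:
        points += 1
    if clause.eventive:
        points += 1
        # Boundedness is only counted for events (assumption).
        if clause.bounded:
            points += 1
    return points / 3


def comment_anecdotal_rate(clauses: List[Clause]) -> float:
    """Mean anecdotal score over the clauses of one comment."""
    if not clauses:
        return 0.0
    return sum(anecdotal_score(c) for c in clauses) / len(clauses)


comment = [
    Clause("My cousin got arrested last year", generic=False, eventive=True, bounded=True),
    Clause("Marijuana is safer than alcohol", generic=True, eventive=False, bounded=False),
]
rate = comment_anecdotal_rate(comment)
```

Averaging per-comment rates over time windows, or before and after a legal milestone, would give the kind of prevalence comparison the abstract describes; the weighting of the three features is a design choice that the abstract does not specify.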
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses, annotated for semantic clause types and coherence relations that allow for nuanced comparison of artificial and natural discourse modes. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and GPT-3 (Brown et al., 2020). We showcase the usefulness of this corpus for detailed discourse analysis of text generation by providing preliminary evidence that less numerous, shorter and more often incoherent clause relations are associated with lower perceived quality of computer-generated narratives and arguments.
Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes ("microframes") that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. It can also provide nuanced insights by considering a rich set of semantic axes. FrameAxis is designed to quantitatively tease out two important dimensions of how microframes are used in the text. Microframe bias captures how biased the text is on a certain microframe, and microframe intensity shows how prominently a certain microframe is used. Together, they offer a detailed characterization of the text. We demonstrate that microframes with the highest bias and intensity align well with sentiment, topic, and partisan spectrum by applying FrameAxis to multiple datasets from restaurant reviews to political news. The existing domain knowledge can be incorporated into FrameAxis by using custom microframes and by using FrameAxis as an iterative exploratory analysis instrument. Additionally, we propose methods for explaining the results of FrameAxis at the level of individual words and documents. Our method may accelerate scalable and sophisticated computational analyses of framing across disciplines.
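The two quantities described above, microframe bias and microframe intensity, can be illustrated with a small sketch. It assumes that a microframe is the difference between the embeddings of two pole words, that bias is the frequency-weighted mean cosine similarity between a document's words and that axis, and that intensity is the frequency-weighted mean squared deviation of that similarity from a baseline bias. The toy two-dimensional embeddings and all function names are hypothetical and only stand in for real pretrained word vectors.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def microframe_axis(emb, pos_pole, neg_pole):
    """Semantic axis as the difference of two pole-word vectors (assumption)."""
    return [p - n for p, n in zip(emb[pos_pole], emb[neg_pole])]


def microframe_bias(tokens, emb, axis):
    """Frequency-weighted mean similarity of in-vocabulary words to the axis."""
    counts = Counter(t for t in tokens if t in emb)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n * cosine(emb[w], axis) for w, n in counts.items()) / total


def microframe_intensity(tokens, emb, axis, baseline_bias=0.0):
    """Frequency-weighted mean squared deviation from a baseline bias."""
    counts = Counter(t for t in tokens if t in emb)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        n * (cosine(emb[w], axis) - baseline_bias) ** 2 for w, n in counts.items()
    ) / total


# Toy embeddings, invented purely for illustration.
TOY_EMB = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "tasty": [0.8, 0.6],
    "bland": [-0.6, 0.8],
    "table": [0.0, 1.0],
}

axis = microframe_axis(TOY_EMB, "good", "bad")
review = ["tasty", "tasty", "table"]
bias = microframe_bias(review, TOY_EMB, axis)
intensity = microframe_intensity(review, TOY_EMB, axis, baseline_bias=0.0)
```

In this sketch a positive bias means the text leans toward the first pole word, and a high intensity means the text uses words that sit far from the baseline on that axis regardless of direction. The baseline would normally be the bias of a reference corpus; zero is used here only to keep the example small.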
Using the results of a detailed survey conducted by the Korea Institute for National Unification in 2020 with representative national samples of South Koreans, we show how various misestimations of societal knowledge about Korean unification and attitudes towards it shape the current public opinion impasse on the topic.
Legalization and commercial sale of non-medical cannabis have led to increasing diversity and potency of cannabis products. Some of the American states that were the first to legalize have seen rises in acute harms associated with cannabis use. For example, Colorado has seen increases in emergency department visits for cannabis-related acute psychological distress and severe vomiting (hyperemesis), as well as a number of high-profile deaths related to ingestion of high doses of cannabis edibles. Over-ingestion of cannabis is related to multiple factors, including the sale of cannabis products with high levels of THC and consumers’ confusion regarding labelling of cannabis products, which disproportionately impact new or inexperienced users. Based on our review of the literature, we propose three approaches to minimizing acute harms: early restriction of cannabis edibles and high-potency products; clear and consistent labelling that communicates dose/serving size and health risks; and implementation of robust data collection frameworks to monitor harms, broken down by cannabis product type (e.g. dose, potency, route of administration) and consumer characteristics (e.g. age, sex, gender, ethnicity). Ongoing data collection and monitoring of harms in jurisdictions that have existing legal cannabis laws will be vital to understanding the impact of cannabis legalization and maximizing public health benefits.
Stance detection on social media is an emerging opinion mining paradigm for various social and political applications in which sentiment analysis may be sub-optimal. There has been growing research interest in developing effective stance detection methods across multiple communities, including natural language processing, web science, and social computing, each of which has modeled stance detection in different ways. In this paper, we survey the work on stance detection across those communities and present an exhaustive review of stance detection techniques on social media, including the task definition, different types of targets in stance detection, feature sets used, and various machine learning approaches applied. Our survey reports state-of-the-art results on the existing benchmark datasets on stance detection, and discusses the most effective approaches. In addition, we explore the emerging trends and different applications of stance detection on social media, including opinion mining and prediction and, more recently, fake news detection. The study concludes by discussing the gaps in existing research and highlighting possible future directions for stance detection on social media.
Scholars from across the social and media sciences have issued a clarion call to address a recent resurgence in criminalized characterizations of immigrants. Do these characterizations meaningfully impact individuals’ beliefs about immigrants and immigration? Across two online convenience samples (total N = 1,054 adult U.S. residents), we applied a novel analytic technique to test how different narratives—achievement, criminal, and struggle-oriented—impacted cognitive representations of German, Russian, Syrian, and Mexican immigrants and the concept of immigrants in general. All stories featured male targets. Achievement stories homogenized individual immigrant representations, whereas both criminal and struggle-oriented stories racialized them along a White/non-White axis: Germany clustered with Russia, and Syria clustered with Mexico. However, criminal stories were unique in making our most egalitarian participants’ representations as differentiated as our least egalitarian participants’. Narratives about individual immigrants also generalized to update representations of nationality groups. Most important, narrative-induced representations correlated with immigration-policy preferences: Achievement narratives and corresponding homogenized representations promoted preferences for less restriction, and criminal narratives promoted preferences for more.
In recent years, marijuana use on U.S. college campuses reached its highest point, and perceptions of risk and social disapproval their lowest, since the early 1980s. However, little attention has been paid to the sources of marijuana-related messages and their relationships with marijuana knowledge and confidence in knowledge, which are proximate protective/risk factors. To fill this gap, a convenience sample of students (N = 249) on a campus located in a U.S. state where recreational marijuana is legal was surveyed to identify their marijuana information sources and explore the relationships among the sources, confidence in marijuana knowledge, and objective knowledge. Peers/media were the most important sources and they were used more than other sources. Use of peers/media sources was related to lower health knowledge and higher confidence in knowledge. Although students named parents and education/science sources as important, these were less frequently used than siblings, the sources they named as the least important. This study advances our understanding of the various sources of marijuana information used by U.S. college students and the relationships between the information sources and confidence in knowledge and objective knowledge, two emerging risk/protective factors in the era of marijuana deregulation.
This project analyzed print news articles on cannabis legalization that were published in 2015 (N = 295) in newspapers across the United States. The following year, 2016, saw more states legalize cannabis for adult use and medical use than before. Therefore, one goal of this research was to investigate the relationship between reports on cannabis legalization and subsequent legal changes that occurred in states that reformed their cannabis laws. Findings reveal that cannabis legalization issues are reported in the media with tones that favor, oppose, or are neutral toward cannabis legalization. Overall, cannabis legalization stories were reported with a neutral tone. Additionally, arguments about whether cannabis should be legalized are framed using criminological, economic, medical, and political themes. The political theme emerged most frequently in all reports. Findings indicate that there is an association between positive reporting tone and subsequent cannabis legalization in states where those reports originated. These findings have implications for allowing policymakers and healthcare professionals to build on their existing knowledge of the relationship between media, public opinion, and emerging cannabis policy. Finally, this study provides some context for how a story’s theme and tone can shed light on cannabis legalization outcomes.
This open access book provides new methodological and theoretical insights into temporal reference and its linguistic expression, from a cross-linguistic experimental corpus pragmatics approach. Verbal tenses, in general, and more specifically the categories of tense, grammatical and lexical aspect are treated as cohesion ties contributing to the temporal coherence of a discourse, as well as to the cognitive temporal coherence of the mental representations built in the language comprehension process. As such, it investigates the phenomenon of temporal reference at the interface between corpus linguistics, theoretical linguistics and pragmatics, experimental pragmatics, psycholinguistics, natural language processing and machine translation.
Events unfold over time, i.e. they have a beginning and endpoint. Previous studies have illustrated the importance of endpoints for event perception and memory. However, this work has only discussed events with a self-evident endpoint, and the internal temporal structure of events has not received much attention. In this study, we hypothesise that event cognition computes boundedness, an abstract feature of the internal temporal structure of events. We further hypothesise that sensitivity to boundedness affects how individual temporal slices of events (such as event midpoints or endpoints) are processed and integrated into a coherent event representation. The results of three experiments confirm these hypotheses. In Experiment 1, viewers identified the class of bounded (non-homogeneous, culminating) and unbounded (homogeneous, non-culminating) events in a categorisation task. In Experiments 2 and 3, viewers reacted differently to temporal disruptions in bounded versus unbounded events. We conclude that boundedness shapes how events are temporally processed.