Chapter

Dictionaries, Supervised Learning, and Media Coverage of Public Policy

Article
To optimize the quality of public sports services, this study analyzes public sports service data in depth by constructing a supervised learning model. First, the theoretical framework of the study is established. Second, the technical framework is constructed based on the supervised learning model. Finally, the comprehensive performance of the model is evaluated using a dataset and a practical application. The results show that the model performs well when processing public sports service data. Specifically, the model's accuracy and recall across the various types of data markedly exceed expectations, with accuracy reaching more than 88% and recall remaining at a similarly high level. This result supports the practicability of the supervised learning model for optimizing the quality of public sports services and highlights its application potential. In addition, the possibilities and challenges of the model in practical application are discussed, which provides a useful reference for further improving the quality of public sports services. The findings enrich the research methods in the field of public sports services and offer a scientific basis for relevant decision-making, which helps promote the continuous optimization and development of public sports services.
Article
To understand and measure political information consumption in the high-choice media environment, we need new methods to trace individual interactions with online content and novel techniques to analyse and detect politics-related information. In this paper, we report the results of a comparative analysis of the performance of automated content analysis techniques for detecting political content in the German language across different platforms. Using three validation datasets, we compare the performance of three groups of detection techniques relying on dictionaries, classic supervised machine learning, and deep learning. We also examine the impact of different modes of data preprocessing on the low-cost implementations of these techniques using a large set (n = 66) of models. Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by deep learning- and classic machine learning-based models, in contrast to the more robust performance of dictionary-based models on noisy data.
Article
A sizable literature finds evidence of public responsiveness to policy change, across a range of salient policy domains and countries. We have a very limited sense for what drives this aggregate-level responsiveness, however. One possibility is that individuals learn at least part of what they need to know from mass media. Work tends to emphasize failures in both media coverage and citizens, but little research explores the prevalence of relevant, accurate information in media content, or citizens' abilities to identify and respond to that information. Using the case of defense spending in the United States, we examine both, through an automated content analysis of 35 years of reporting, validated by a coding exercise fielded to survey respondents. Results prompt analyses of the American National Election Study (ANES), tracing both individual-level perceptions of and preferences for defense spending change over time. These results, supplemented by aggregate analyses of the General Social Survey (GSS), illustrate how media might facilitate, but also confuse, public responsiveness.
Article
quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multithreading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data. The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.
Article
The average citizen often does not experience government policy directly, but learns about it from the mass media. The nature of media coverage of public policy is thus of real importance, for both public opinion and policy itself. It nevertheless is the case that scholars of public policy and political communication have invested rather little time in developing methods to track public policy coverage in media content. The lack of attention is all the more striking in an era in which media coverage is readily available in digital form. This paper offers a proposal for tracking coverage of the actual direction of policy change in mass media. It begins with some methodological considerations, and then draws on an expository case—defense spending in the United States—to assess the effectiveness of our automated content‐analytic methods. Results speak to the quantity and quality in media coverage of policy issues, and the potential role of mass media—to both inform and mislead—in modern representative democracy.
Article
Text has always been an important data source in political science. What has changed in recent years is the feasibility of investigating large amounts of text quantitatively. The internet provides political scientists with more data than their mentors could have imagined, and the research community is providing accessible text analysis software packages, along with training and support. As a result, text-as-data research is becoming mainstream in political science. Scholars are tapping new data sources, they are employing more diverse methods, and they are becoming critical consumers of findings based on those methods. In this article, we first describe the four stages of a typical text-as-data project. We then review recent political science applications and explore one important methodological challenge—topic model instability—in greater detail.
Article
Objective: Work on economic news argues that US coverage focuses primarily on changes rather than levels of future economic conditions; it also both affects and reflects public economic sentiment. Given that economic perceptions are related to policy preferences and government support, this is of consequence for politics. This paper explores the generalizability of these findings. Methods: Using nearly 100,000 stories over 30 years in the US, UK, and Canada, we compare media tone, public opinion and economic conditions. Result: Results demonstrate that media tone and public opinion follow future economic change in all three countries. Media and opinion are also related, but the effect mostly runs from the public to the media, not the other way around. Conclusion: These results confirm the generalizability of prior findings, and the importance of considering more than a simple uni-directional link between media coverage and public economic sentiment.
Article
Social scientists have long hand-labeled texts to create datasets useful for studying topics from congressional policymaking to media reporting. Many social scientists have begun to incorporate machine learning into their toolkits. RTextTools was designed to make machine learning accessible by providing a start-to-finish product in less than 10 steps. After installing RTextTools, the initial step is to generate a document term matrix. Second, a container object is created, which holds all the objects needed for further analysis. Third, users can use up to nine algorithms to train their data. Fourth, the data are classified. Fifth, the classification is summarized. Sixth, functions are available for performance evaluation. Seventh, ensemble agreement is conducted. Eighth, users can cross-validate their data. Finally, users write their data to a spreadsheet, allowing for further manual coding if required.
Article
The responsiveness of government to the preferences of its citizens is considered to be an important indicator of the performance of advanced democracy. This article argues that the thermostatic model of policy/opinion responsiveness can be represented in the form of an error-correction model where policy and public opinion variables are cointegrated, and extends the focus of investigation to government outputs. This models the short-run and long-run equilibrium of interactions between public opinion and policy/bureaucratic outputs. The article assesses the performance of British government – and, in particular, the Immigration and Nationality Directorate of the Home Office – in the operation of border controls and administration of claims for asylum, for the period between 1994 and 2007.
Article
The representation of public preferences in public policy is fundamental to most conceptions of democracy. If representation is effectively undertaken, we would expect to find a correspondence between public preferences for policy and policy itself. If representation is dynamic, policy makers should respond to changes in preferences over time. The integrity of the representational connection, however, rests fundamentally on the expectation that the public actually notices and responds to policy decisions. Such a public would adjust its preferences for 'more' or 'less' policy in response to what policy makers actually do, much like a thermostat. Despite its apparent importance, there is little research that systematically addresses this feedback of policy on preferences over time. Quite simply, we do not know whether the public adjusts its preferences for policy in response to what policy makers do. By implication, we do not fully understand the dynamics of representation. This research begins to address these issues and focuses on the relationships between public preferences and policy in a single, salient domain.
Article
This article develops a model of public responsiveness to social policy in the United States, focusing in particular on the public’s ability to distinguish between direct and indirect government spending as means of financing social benefits. We argue that public opinion should be responsive to changes in both direct (appropriations) and indirect (tax expenditures encouraging the private provision of social goals) spending. Further, the public should respond to changes in direct and indirect spending in distinct ways consistent with the divergent resource and interpretive effects of the two types of spending. We find that while public opinion is not responsive to the total amount of federal social spending, it is attentive to changes in direct and indirect spending, considered as separate concepts. The results show that the electorate treats changes in the relative allocation of government spending as representing important shifts in the ideological direction of public policy.
Article
Entries in the burgeoning “text-as-data” movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended either to establish some target semantic concept (like the content of partisan frames), to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.
Article
In choosing and displaying news, editors, newsroom staff, and broadcasters play an important part in shaping political reality. Readers learn not only about a given issue, but also how much importance to attach to that issue from the amount of information in a news story and its position. In reflecting what candidates are saying during a campaign, the mass media may well determine the important issues--that is, the media may set the "agenda" of the campaign.
Article
Policy feedback refers to the variety of ways in which existing policies can shape key aspects of politics and policymaking. Originating in historical institutionalism, the study of policy feedback has expanded to address resource and interpretative effects on target populations and mass publics, the roles of policy elites, and how feedback effects are conditioned by policy designs and larger institutional contexts. Recently, more attention has also been paid to feedback effects that are not self‐reinforcing in nature. This introduction provides a nonexhaustive review of the existing historical institutionalist literature on policy feedback as well as introducing the contributions to the special issue. The diversity of policy feedback scholarship is reflected in manuscripts building from the social constructions framework and the thermostatic model. Advances in research are captured in contributions empirically testing different forms of feedback, varied strategies of policy elites to shape feedback, and how context may suppress feedback effects. The special issue emphasizes critical, understudied dimensions shaping feedback processes, such as race, and the role of organizations. A major lacuna in policy feedback scholarship, the overwhelming emphasis on the United States, must be addressed through more comparative and international research including closer dialogs between U.S.‐based and non‐U.S. scholars.
Article
Content analysis of large-scale textual data sets poses myriad problems, particularly when researchers seek to analyze content that is both theoretically derived and context dependent. In this piece, we detail the approach we developed to tackle the analysis of the context-dependent content of political incivility. After describing our manually validated organic dictionaries approach, we compare the method to others we could have used and then replicate the method in a different—but still context-dependent—project examining political issue content on social media. We conclude by summarizing the strengths and weaknesses of the approach and offering suggestions for future research that can refine and expand the method.
Article
Political scientists often find themselves analyzing data sets with a large number of observations, a large number of variables, or both. Yet, traditional statistical techniques fail to take full advantage of the opportunities inherent in “big data,” as they are too rigid to recover nonlinearities and do not facilitate the easy exploration of interactions in high‐dimensional data sets. In this article, we introduce a family of tree‐based nonparametric techniques that may, in some circumstances, be more appropriate than traditional methods for confronting these data challenges. In particular, tree models are very effective for detecting nonlinearities and interactions, even in data sets with many (potentially irrelevant) covariates. We introduce the basic logic of tree‐based models, provide an overview of the most prominent methods in the literature, and conduct three analyses that illustrate how the methods can be implemented while highlighting both their advantages and limitations.
Article
Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.
Article
Economic perceptions affect policy preferences and government support. It thus matters that these perceptions are driven by factors other than the economy, including media coverage. We nevertheless know little about how media reflect economic trends, and whether they influence (or are influenced by) public economic perceptions. This article explores the economy, media, and public opinion, focusing in particular on whether media coverage and the public react to changes in or levels of economic activity, and the past, present, or future economy. Analyses rely on content-analytic data drawn from 30,000 news stories over 30 years in the United States. Results indicate that coverage reflects change in the future economy, and that this both influences and is influenced by public evaluations. These patterns make more understandable the somewhat surprising finding of positive coverage and public assessments in the midst of the Great Recession. They also may help explain previous findings in political behavior.
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
Article
Collection and especially analysis of open-ended survey responses are relatively rare in the discipline and when conducted are almost exclusively done through human coding. We present an alternative, semiautomated approach, the structural topic model (STM) (Roberts, Stewart, and Airoldi 2013; Roberts et al. 2013), that draws on recent developments in machine learning based analysis of textual data. A crucial contribution of the method is that it incorporates information about the document, such as the author's gender, political affiliation, and treatment assignment (if an experimental study). This article focuses on how the STM is helpful for survey researchers and experimentalists. The STM makes analyzing open-ended responses easier, more revealing, and capable of being used to estimate treatment effects. We illustrate these innovations with analysis of text from surveys and experiments.
Article
Inquiry into the origins of partisan polarization has generally treated polarization as a simple, symmetric phenomenon—the degree to which the worldviews of the mass Democratic and Republican parties have or have not diverged from one another. In this article, we disaggregate polarization into its constituent parts, the dynamic preferences of the mass Democratic and Republican Parties. This approach allows for the possibility that intraparty dynamics may influence interparty differences and for the integration of studies of polarization with literatures addressing other dynamics in aggregate public opinion. Building on individual-level research on partisan identities and macrolevel research on public mood, we argue that party polarization may be catalyzed, in part, by the mass parties’ differential responsiveness to changes in the macro political-economic context. We find support for this position, showing asymmetries in the dynamics of polarization that are associated with differential partisan responsiveness to domestic policy choices.
Article
There is now substantial evidence that defense spending decisions in the United States are influenced by citizen preferences. However, there is little time-series evidence for countries other than the United States. Regression models of citizen responsiveness and opinion representation in the politics of defense spending in five democracies are estimated. Results show that public opinion in all five countries is systematically responsive to recent changes in defense spending, and the form of the responses across countries uniformly resembles the “thermostat” metaphor developed by Wlezien and the more general theory of opinion dynamics developed by Stimson. Findings show also that defense budgeting is representative: public support for defense spending is the most consistently significant influence on defense budgeting change in four countries; thus, a parsimonious theory of comparative policy representation is potentially within reach. The implications of the results for defense spending in the NATO alliance and the European Union are discussed.
Article
Theory: Democratic accountability requires that the public be reasonably well-informed about what policymakers actually do. Such a public would adjust its preferences for ''more'' or ''less'' policy in response to policy outputs themselves. In effect, the public would behave like a thermostat; when the actual policy ''temperature'' differs from the preferred policy temperature, the public would send a signal to adjust policy accordingly, and once sufficiently adjusted, the signal would stop. Hypotheses: In domains where policy is clearly defined and salient to the public, changes in the public's preferences for more policy activity are negatively related to changes in policy. Methods: A thermostatic model of American public preferences for spending on defense and a set of five social programs is developed and then tested using time series regression analysis. Results: Changes in public preferences for more spending reflect changes in both the preferred levels of spending and spending decisions themselves. Most importantly, changes in preferences are negatively related to spending decisions, whereby the public adjusts its preferences for more spending downward (upward) when appropriations increase (decrease). Thus, consistent with the Eastonian model, policy outputs do ''feed back'' on public inputs, at least in the defense spending domain and across a set of social spending domains.
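The thermostatic logic in this abstract can be written compactly. The notation below is an assumption introduced here for illustration, not taken from the abstract: R_t is the public's relative preference for "more" policy, P*_t the preferred level of policy, P_t the actual level of policy, and a, beta_1, beta_2, e_t are an intercept, coefficients, and an error term in a first-difference regression of the kind the abstract describes.

```latex
\begin{align*}
R_t &= P^{*}_t - P_t \\
\Delta R_t &= a + \beta_1 \, \Delta P^{*}_t + \beta_2 \, \Delta P_t + e_t,
\qquad \beta_2 < 0
\end{align*}
```

Under these assumptions, negative feedback means that an increase in appropriations (a positive change in P_t) lowers the relative preference for more spending, which is the sign pattern the abstract reports.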
Book
This book develops and tests a “thermostatic” model of public opinion and policy, in which preferences for policy both drive and adjust to changes in policy. The representation of opinion in policy is central to democratic theory and everyday politics. So too is the extent to which public preferences are informed and responsive to changes in policy. The coexistence of both “public responsiveness” and “policy representation” is thus a defining characteristic of successful democratic governance, and the subject of this book. The authors examine both responsiveness and representation across a range of policy domains in the United States, the United Kingdom, and Canada. The story that emerges is one in which representative democratic government functions surprisingly well, though there are important differences in the details. Variations in public responsiveness and policy representation responsiveness are found to reflect the “salience” of the different domains and governing institutions – specifically, presidentialism (versus parliamentarism) and federalism (versus unitary government).
Article
The link between public opinion and policy is of special importance in representative democracies. Policymakers' responsiveness to public opinion is critical. Public responsiveness to policy itself is as well. Only a small number of studies compare either policy or public responsiveness across political systems, however. Previous research has focused on a handful of countries - mostly the US, UK and Canada - that share similar cultures and electoral systems. It remains, then, for scholars to assess the opinion-policy connection across a broad range of contexts. This paper takes a first step in this direction, drawing on data from two sources: (1) public preferences for spending from the International Social Survey Program (ISSP) and (2) measures of government spending from OECD spending datasets. These data permit a panel analysis of 17 countries. The article tests theories about the effects of federalism, executive-legislative imbalance, and the proportionality of electoral systems. The results provide evidence of the robustness of the 'thermostatic' model of opinion and policy but also the importance of political institutions as moderators of the connections between them.
Article
An increasing number of studies in political communication focus on the “sentiment” or “tone” of news content, political speeches, or advertisements. This growing interest in measuring sentiment coincides with a dramatic increase in the volume of digitized information. Computer automation has a great deal of potential in this new media environment. The objective here is to outline and validate a new automated measurement instrument for sentiment analysis in political texts. Our instrument uses a dictionary-based approach consisting of a simple word count of the frequency of keywords in a text from a predefined dictionary. The design of the freely available Lexicoder Sentiment Dictionary (LSD) is discussed in detail here. The dictionary is tested against a body of human-coded news content, and the resulting codes are also compared to results from nine existing content-analytic dictionaries. Analyses suggest that the LSD produces results that are more systematically related to human coding than are results based on the other available dictionaries. The LSD is thus a useful starting point for a revived discussion about dictionary construction and validation in sentiment analysis for political communication.
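A minimal sketch of the dictionary-based word count this abstract describes, using only the Python standard library. The tiny word lists, the function names, and the net-tone formula are hypothetical and chosen for illustration; they are not the Lexicoder Sentiment Dictionary or its scoring rules.

```python
import re

# Hypothetical toy dictionary (an assumption for illustration only);
# real sentiment dictionaries contain thousands of entries.
TOY_DICTIONARY = {
    "positive": {"good", "growth", "improve", "success"},
    "negative": {"bad", "crisis", "decline", "failure"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_categories(text, dictionary=TOY_DICTIONARY):
    """Count how many tokens fall into each dictionary category."""
    tokens = tokenize(text)
    counts = {
        category: sum(1 for token in tokens if token in words)
        for category, words in dictionary.items()
    }
    counts["tokens"] = len(tokens)
    return counts


def net_tone(text, dictionary=TOY_DICTIONARY):
    """One possible summary: (positive - negative) / total tokens."""
    counts = count_categories(text, dictionary)
    if counts["tokens"] == 0:
        return 0.0
    return (counts["positive"] - counts["negative"]) / counts["tokens"]
```

For example, `count_categories("Strong growth and success")` counts two positive tokens out of four. Validating such counts against human coding, as the abstract emphasizes, is a separate step that this sketch does not cover.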
Article
This article addresses current methodological research on non-parametric Random Forests. It provides a brief intellectual history of Random Forests that covers CART, boosting and bagging methods. It then introduces the primary methods by which researchers can visualize results, the relationships between covariates and responses, and the out-of-bag test set error. In addition, the article considers current research on universal consistency and importance tests in Random Forests. Finally, several uses for Random Forests are discussed, and available software is identified. Random Forests are ensembles of trees grown from bootstrapped training data. For classification, the trees are combined using majority voting with one vote per tree over all the trees in the forest. For regression, forests are created by averaging over trees. Scholars tend to agree that non-parametric ensemble methods, or 'committee methods', such as Random Forests can offer significant improvements over any single classifier or regression tree [28, 29][38, p. 251]. In constructing the ensemble, Random Forests use two types of randomness. First, in growing any given tree, a random sample of predictors is selected at each node in choosing the best split. A further layer of randomness is added by using a random sample of observations for growing each tree in the first place. In theory, using a random sample of observations and selecting random predictors at each node should reduce dependence between covariates and thus between the resulting trees [14, p. 10-11][5].
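The two layers of randomness and the majority vote described above can be sketched in a few lines of standard-library Python. This is a deliberately simplified illustration: each "tree" is a single-split stump rather than a fully grown tree, and all function names, defaults, and the seed are assumptions made here, not any package's API.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Find the best single split among the candidate features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:  # no usable split: predict the majority class everywhere
        lab = majority(y)
        return (features[0], float("inf"), lab, lab)
    return best[1:]


def predict_stump(stump, row):
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Grow stumps on bootstrap samples with random predictor subsets."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # random sample of observations
        feats = rng.sample(range(p), mtry)          # random sample of predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Classify by majority vote, one vote per tree."""
    return majority([predict_stump(stump, row) for stump in forest])
```

With stumps there is only one node per tree, so drawing `mtry` predictors once per tree stands in for the per-node sampling the text describes. A fuller implementation would repeat that draw at every node of a deeper tree.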
Article
The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
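A short worked example, offered as a sketch, of why a classifier that labels most individual documents correctly can still give a biased category proportion. The sensitivity and specificity values and the function names are assumptions chosen for illustration. The correction shown is the textbook misclassification adjustment for a two-category case with known error rates, not the method developed in the abstract.

```python
def expected_classified_share(true_share, sensitivity, specificity):
    """Share of documents a classifier is expected to label positive."""
    true_positives = sensitivity * true_share
    false_positives = (1 - specificity) * (1 - true_share)
    return true_positives + false_positives


def corrected_share(observed_share, sensitivity, specificity):
    """Invert the relation above to recover the true share.

    Assumes sensitivity + specificity != 1 and that both error
    rates are known, for example from a hand-coded validation set.
    """
    return (observed_share - (1 - specificity)) / (sensitivity + specificity - 1)


# A classifier that is right 80% of the time in both classes, applied to a
# corpus in which 10% of documents truly belong to the category.
observed = expected_classified_share(0.10, sensitivity=0.80, specificity=0.80)
recovered = corrected_share(observed, sensitivity=0.80, specificity=0.80)
```

Here `observed` is about 0.26, so simply counting classified documents would put the category at roughly 26% when the true share is 10%, even though 80% of individual documents are classified correctly. Applying the correction returns approximately 0.10.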
Article
In this innovative account of the way policy issues rise and fall on the national agenda—the first detailed study of so many issues over an extended period—Frank R. Baumgartner and Bryan D. Jones show that rapid change not only can but does happen in the hidebound institutions of government. Short-term, single-issue analyses of public policy, the authors contend, give a narrow and distorted view of public policy as the result of a cozy arrangement between politicians, interest groups, and the media. Baumgartner and Jones upset these notions by focusing on several issues—including civilian nuclear power, urban affairs, smoking, and auto safety—over a much longer period of time to reveal patterns of stability alternating with bursts of rapid, unpredictable change. A welcome corrective to conventional political wisdom, Agendas and Instability revises our understanding of the dynamics of agenda-setting and clarifies a subject at the very center of the study of American politics.
The media frames corpus: Annotations of frames across issues
  • D Card
  • A E Boydstun
  • J H Gross
  • P Resnik
  • N A Smith
Card, D., Boydstun, A. E., Gross, J. H., Resnik, P., & Smith, N. A. (2015, July 26-31). The media frames corpus: Annotations of frames across issues. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers) (pp. 438-444).
Classifier technology and the illusion of progress
  • D J Hand
Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 21(4), 1-14. https://doi.org/10.1214/088342306000000349
DICTION 5.0: The text analysis program
  • R P Hart
Hart, R. P. (2000). DICTION 5.0: The text analysis program. Sage-Scolari.
The general inquirer: A computer approach to content analysis
  • P J Stone
  • D C Dunphy
  • D M Ogilvie
Stone, P. J., Dunphy, D. C., & Ogilvie, D. M. (1966). The general inquirer: A computer approach to content analysis. MIT Press.