The cultural environment: measuring culture with big data

Christopher A. Bail
University of North Carolina at Chapel Hill, 225 Hamilton Hall, Chapel Hill, NC 27599, USA
e-mail: christopherandrewbail@gmail.com

Theory and Society, DOI 10.1007/s11186-014-9216-5
© Springer Science+Business Media Dordrecht 2014
Abstract  The rise of the Internet, social media, and digitized historical archives has produced a colossal amount of text-based data in recent years. While computer scientists have produced powerful new tools for automated analyses of such "big data," they lack the theoretical direction necessary to extract meaning from them. Meanwhile, cultural sociologists have produced sophisticated theories of the social origins of meaning, but lack the methodological capacity to explore them beyond micro-levels of analysis. I propose a synthesis of these two fields that adjoins conventional qualitative methods and new techniques for automated analysis of large amounts of text in iterative fashion. First, I explain how automated text extraction methods may be used to map the contours of cultural environments. Second, I discuss the potential of automated text-classification methods to classify different types of culture such as frames, schemas, or symbolic boundaries. Finally, I explain how these new tools can be combined with conventional qualitative methods to trace the evolution of such cultural elements over time. While my assessment of the integration of big data and cultural sociology is optimistic, my conclusion highlights several challenges in implementing this agenda. These include a lack of information about the social context in which texts are produced, the construction of reliable coding schemes that can be automated algorithmically, and the relatively high entry costs for cultural sociologists who wish to develop the technical expertise currently necessary to work with big data.
Keywords  Culture · Content analysis · Mixed-methods · Evolutionary theory
More data were accumulated in 2002 than in all previous years of human history combined.[1] By 2011, the amount of data collected prior to 2002 was being collected every two days.[2] This dramatic growth in data spans nearly every part of our lives—from gene sequencing to consumer behavior.[3] While most of these data are binary or quantitative, text-based data are also being accumulated on an unprecedented scale. In an era of social science research plagued by declining survey response rates and concerns about the generalizability of qualitative research, these data hold considerable potential (Golder and Macy 2011; King 2011; Lazer et al. 2009). Yet social scientists—and cultural sociologists in particular—have largely ignored the promise of so-called "big data." Instead, cultural sociologists have left this wellspring of information about the arguments, worldviews, or values of hundreds of millions of people from Internet sites and other digitized texts to computer scientists, who possess the technological expertise to extract and manage such data but lack the theoretical direction to interpret their meaning.

[1] International Data Corporation, "The 2011 Digital Universe Study: Extracting Value from Chaos," June 2011. See also Christopher R. Johnson, "How Big is Big Data?" Lecture at the University of Michigan's Cyber-Infrastructure Conference, November 7, 2012.
[2] Ibid.
[3] The US National Science Foundation invested more than $15 million in Big Data projects in 2012, and will easily surpass this amount in upcoming years due to the development of new infrastructure for funding big data projects in collaboration with Britain's Economic & Social Research Council, the Netherlands Organization for Scientific Research, and the Canada Foundation for Innovation, among many others.
The most obvious explosion in text-based data coincided with the rise of the Internet. Between 1995 and 2008, the number of websites expanded by a factor of more than 66 million, recently surpassing 1 trillion.[4] Although sociologists were understandably concerned about digital divides in years past, these inequalities appear to be steadily decreasing (DiMaggio and Bonikowski 2008; DiMaggio et al. 2001). According to a 2012 survey, roughly half of all Americans visit a social media site such as Facebook or Twitter each day, producing billions of lines of text in so doing.[5] These trends are markedly higher among younger people, suggesting they may only continue to grow over time.[6] Most of the text from social media sites is readily accessible via simple computer programs.[7] Yet the outgrowth in text-based data on the Internet is not limited to social media sites. Screen-scraping technologies can be used to extract information from any number of Internet sites within time frames that are limited only by digital storage capacity.[8] And the potential to collect such data is not only tied to the future, but also to the past. Since 1996, a non-profit organization known as the Internet Archive has been storing all text from nearly every website on the Internet.

[4] Jesse Alpert and Nissan Hajaj, "We knew the web was big…," Official Google Blog, July 25, 2008 (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html, accessed January 2012).
[5] Pew Internet & American Life Project, February 1, 2012.
[6] "Social Networking Popular Across Globe," Pew Research Global Attitudes Project, December 12, 2012.
[7] Moreover, the US Library of Congress recently announced plans to release a database of every single Twitter message ever made. Current estimates place the total number of tweets that might be archived at more than 170 billion.
[8] Web-scraping technologies have facilitated the collection of remarkably large datasets. Golder and Macy (2011), for example, recently conducted a study of more than 500 million Twitter messages produced in more than 84 countries over a 2-year period.
The outgrowth of text-based data is also not confined to the Internet. Thanks to new digital technologies from fields as diverse as library science and communications, an unprecedented amount of qualitative data is being archived. Google alone has already created digital copies of nearly every single book ever written, in collaboration with more than 19 million libraries worldwide.[9] Academic data warehouses such as LEXIS-NEXIS or ProQuest now contain digital copies of most of the world's journals, newspapers, and magazines. The Vanderbilt Television News Archive contains copies of most major newscasts produced since 1998. An unprecedented amount of text-based data that describe legislative debates, government reports, and other state discourse is also now available on websites such as the National Archives of the United States and Great Britain. Qualitative academic research is also being compiled within "meta-data" archives on an unprecedented scale—from in-depth interview data to field notes.[10] Continuing improvement in digital speech-recognition technologies has also generated even more text-based data, from historical audio sources to local town hall meetings that are recorded and uploaded to websites for posterity. Indeed, the remarkable growth in text-based data warrants a brief thought experiment: what types of text or speech-based data are not currently being archived?

[9] Though access to the entire Google book archive is limited by paywalls designed to protect copyright privileges, Google has released the entire dataset in "ngram" format, which allows scholars to analyze it via the automated text analysis tools discussed in further detail below.
[10] See, for example, the Dataverse Network, the Interdisciplinary Consortium for Political and Social Research, and the United Kingdom's Qualidata archive.
If the answer is that little—or very little—text is not being archived, then cultural sociology must reckon with big data alongside scholars in other fields.[11] Political scientists are currently exploring the potential of social media to explain political mobilization (Hopkins and King 2010; Livne et al. 2011). Public health scholars use Twitter to identify trends in disease (Paul and Dredze 2011), and communications scholars claim it can be used to predict shifts within the stock market (Bollen et al. 2011). Even humanities scholars have invented the vibrant new field of digital humanities (e.g., Gold 2012; Moretti 2013; Tangherlini and Leonard 2013). By comparison, cultural sociologists have made very few ventures into the universe of big data, even though texts are a central object of study in the field—in the form of primary documents, interview transcriptions, or field notes.[12]

In this article, I argue that inattention to big data among cultural sociologists is doubly surprising since such data is naturally occurring—unlike survey research or cross-sectional qualitative interviews—and therefore critical to understanding the evolution of meaning structures in situ. That is, many archived texts are the product of conversations between individuals, groups, or organizations instead of responses to questions created by researchers who usually have only post-hoc intuition about the relevant factors in meaning-making—much less how culture evolves in "real time."[13]

[11] The neologism "big data" has come to refer to many different types of data. Here, I use the term to refer to the increasingly large volume of text-based data that is often—though not always—produced through digital sources. As the remainder of this article describes, these data are also unique because they are "naturally occurring," unlike survey data, which result from the intrusion of researchers into everyday life.
[12] Exceptions described in additional detail below include Franzosi (2004), Lewis et al. (2008), Bail (2012), Bail (forthcoming), and several other works in progress.
[13] "Real time" refers to the collection, presentation, or analysis of data at or very near the time it is produced by social actors.
For all the promise of big data for cultural sociology, formidable obstacles remain. First, the sheer volume of data can be overwhelming. Large corpora cannot be coded by hand, and automated data-mining techniques are of little utility if they are not guided by theory. Second, big data is untidy. Although computer-assisted data classification and data reduction techniques have improved in the past decade, much big data analysis remains computationally intensive and therefore out of reach for many cultural sociologists—particularly those without any background in statistics or computer programming. Third—and perhaps most importantly—there is much that is of interest to cultural sociologists that is not easily reducible to text. The greatest challenge for cultural sociologists interested in big data is to develop new techniques to measure the unspoken or implicit meanings that occur in between words. The preconscious cultural scripts or frames that shape how people understand the world (e.g., DiMaggio 1997), for example, are not always manifest in speech or text. Similarly, most big data eschews the production of meaning through bodily interaction (e.g., Eliasoph and Lichterman 2003)—though the future of big data may include new techniques to analyze the ever-increasing volume of video on the Internet (Collins 2013; Lan and Raptis 2013).
This article does not offer solutions to each of these limitations of big data. Instead, it provides a critical survey of recent developments within the big data movement and links them to outstanding theoretical debates and measurement challenges within cultural sociology. These include the measurement of cultural environments or meaning systems such as discursive fields; the classification of cultural elements such as frames or schemas within such systems; and the tracing of cultural processes over long segments of time. In describing the promise of big data for cultural sociology, I also detail how the latter field may address some of the most vexing challenges of the former, given its foundational interest in the systematic study of meaning. I provide only limited discussion of the technical and logistical issues that arise in working with big data because these issues are currently being addressed within separate literatures referenced below.[14] I also do not review the promising field of quantitative narrative analysis because it has been addressed elsewhere.[15] This article is thus an invitation to cultural sociologists curious about the potential of big data and a call to shatter the disciplinary silos that inhibit collaboration between this field and those who lead the big data movement.

[14] For a technical overview of techniques designed for analysis of big data, see Manning and Schuetze (1999).
[15] For an overview, see Franzosi (2009).
Mapping cultural environments
By and large, the central objects of study in cultural sociology have been confined to micro-levels of analysis. For example, cultural elements such as symbolic boundaries (e.g., Lamont 1992), cultural toolkits (e.g., Swidler 1986), cognitive schemas (e.g., DiMaggio 1997), and cultural frames (e.g., Benford and Snow 2003) have been defined as judgments, classifications, or pre-conscious decisions that can only be measured through close readings of texts such as interview transcripts, content analysis of key texts, or ethnographic field notes. Yet as Swidler (1995) argues, "the greatest unanswered question in the sociology of culture is whether and how some cultural elements control, anchor, or organize others."[16] For example, how are cultural frames ordered within vast discursive fields? Is there a space between such fields? How do cultural frames shape the evolution of fields more broadly? Addressing such questions requires meso- and macro-level analysis of the relationship between multiple cultural elements or systems of meaning. One of the most promising dimensions of the big data movement for cultural sociology is to enable new analyses at these larger levels of analysis. As I describe below, one can now obtain every website, blog, social media message, newspaper article, or television transcript on a given topic fairly easily.

[16] See also Ghaziani and Baldassarri (2011).
The capacity to capture all—or nearly all—relevant text on a given topic opens exciting new lines of meso- and macro-level inquiry into what I call cultural environments (Bail forthcoming). Ecological or functionalist interpretations of culture have been unpopular with cultural sociologists for some time—most likely because the subfield defined itself as an alternative to the general theory proposed by Talcott Parsons (Alexander 2006). Yet many cultural sociologists also draw inspiration from Mary Douglas (e.g., Alexander 2006; Lamont 1992; Zelizer 1985), who—like Swidler—insists upon the need for our subfield to engage broader levels of analysis. "For sociology to accept that no functionalist arguments work," writes Douglas (1986, p. 43), "is like cutting off one's nose to spite one's face." To be fair, cultural sociologists have recently made several programmatic statements about the need to engage functional or ecological theories of culture. Abbott (1995), for example, explains the formation of boundaries between professional fields as the result of an evolutionary process. Similarly, Lieberson (2000) presents an ecological model of fashion trends in child-naming practices. In a review essay, Kaufman (2004) describes such ecological approaches to cultural sociology as one of the three most promising directions for the future of the subfield.[17]

[17] See also Mark (2003).
The concept of discursive fields is perhaps the most promising theoretical construct to advance an ecological approach to cultural sociology (Bourdieu 1975; Foucault 1970; Martin 2003; Wuthnow 1993). Yet field theory is often castigated for being tautological, or for assuming the existence of invisible or intangible social forces that reproduce structures of inequality or patterns of cultural differentiation without ever directly observing them. The boundaries of fields are usually unobserved in empirical studies because of the considerable methodological obstacles involved. Apart from Eyal (2009), cultural sociologists have scarcely theorized the outer limits of cultural fields, the spaces between them, or the relationships among multiple fields.[18] This is a significant limitation, since most field theory makes several assumptions that are inherently ecological. For example, many studies assume that relationships between actors or groups of actors within a field produce a polarity that sustains or reproduces uneven power relationships or access to institutions (Bourdieu 1985; Fligstein and McAdam 2011; Wuthnow 1993). Others borrow more directly from ecological or evolutionary theory to explain the competition for attention or resources within fields (Abbott 2001; Kaufman 2004; Lieberson 2000), or the ability of cultural entrepreneurs to exploit niches within such environments (e.g., Mische 2008).

[18] One exception is Evans and Kay's (2008) study of field overlap.
Despite the implicit ecological reasoning of field theory, most applications of this framework rely upon micro- or meso-level measurement strategies. For example, many studies identify key actors or institutions within fields and trace their influence over other parts of the field. Other studies focus upon conflict or "classification struggles" within fields in order to identify such influential actors (Bourdieu 1990). As a result, these types of studies only observe the consequences of field-level processes rather than the meso- or macro-level relationships between social actors and cultural elements that most scholars believe create such social spaces.[19] These micro-level measurement strategies are typically necessary because most discursive fields are so broad that an entire team of researchers working for several years could map only a fraction of all the texts, transcripts, or archives that define them. The size of most cultural fields has become even more daunting with the rise of the Internet. Indeed, a researcher could easily follow links between websites for hours only to forget where, when, or why they shifted focus from one site to another.

[19] Exceptions include Mohr and Guerra-Pearson (2010) and Bail (2012).
The big data movement has made extracting all text from a discursive field easier than ever before.[20] Massive databases already exist that classify texts into meaningful social categories. For example, services such as LEXIS-NEXIS and ProQuest have sophisticated searchable indexes that cover industries, geographical location, time, or different types of text (e.g., newspapers, newswires, or television transcripts). Simple Boolean operators such as "AND" and "OR" can be used to further specify meaningful cultural environments within each of these sub-samples.[21] Yet perhaps the most powerful innovation of the big data movement for the mapping of cultural environments has been screen scraping, or the automated extraction of text from websites. Screen scraping is typically used to mine text or other data from web pages, though it can also be used to extract text from scanned images using Optical Character Recognition (OCR) technologies. A variety of data archives have developed searchable indexes based on such screen-scraping technologies. Google, for example, allows Boolean searches of its archives of books, blogs, government documents, and major US newspapers and magazines.

[20] While automated data extraction methods are particularly useful for mapping the contours of discursive fields, it is important to note that such techniques do not capture the deeper preconscious cultural elements that undergird social fields as Bourdieu and others have theorized them (e.g., Bourdieu 1990; Fligstein and McAdam 2011; Martin 2003). I return to the question of whether big data techniques can be leveraged to classify such cultural elements in the following section as well as in my discussion and conclusion.
[21] For example, one might define a discursive field by identifying all texts with a certain set of keywords or within a certain search index offered by text archives.
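To make this concrete, the following minimal sketch shows how a screen scraper with a Boolean classifying rule might work. It assumes the open-source Python libraries requests and BeautifulSoup; the URL and keywords are placeholders rather than materials from any study discussed here.

```python
import requests
from bs4 import BeautifulSoup

def scrape_page_text(url):
    """Download a web page and strip its HTML markup, returning visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    for tag in soup(["script", "style"]):  # markup with no substantive text
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

def matches_boolean_rule(text, all_of=(), any_of=()):
    """A simple Boolean rule: every `all_of` term (AND) and at least one
    `any_of` term (OR) must appear in the text."""
    lower = text.lower()
    return (all(term.lower() in lower for term in all_of)
            and (not any_of or any(term.lower() in lower for term in any_of)))

# Hypothetical usage: keep only pages within a chosen discursive field
text = scrape_page_text("https://example.org/some-article")  # placeholder URL
if matches_boolean_rule(text, all_of=["Islam"], any_of=["civil society", "media"]):
    print(text[:500])  # inspect the first 500 characters
```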
But new technologies produced by the big data movement have also advanced automated extraction of text far beyond simple indexes, Boolean searches, and screen scraping. In particular, new techniques have been developed to exploit the relational nature of many sources of big data—particularly those from the Internet. For example, Gong (2011) recently introduced new software that fuses snowball sampling methods with screen-scraping technologies. The user simply inputs a starting website and a classifying rule such as a Boolean search term or one of the other classification algorithms described in further detail below. The software then visits each site that is linked to the starting website and uses the classifying rule to decide whether it should be included in the sample. If so, the program extracts all text from the site and repeats the process of "spidering" links across multiple waves that are constrained only by computer memory and processing power. Given a number of different starting sites and a sufficient number of waves, the SnowCrawl software produces a total sample of all websites pertaining to a given topic. Although this tool is currently limited to the Internet, a number of other qualitative data archives store relational data that could potentially be analyzed using similar automated snowball methods. What is more, the majority of newspapers, television stations, journals, or other texts of interest to cultural sociologists are now available on the web.
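The same snowball logic can be sketched in a few dozen lines. This is not Gong's SnowCrawl itself, but a minimal illustration of the wave-by-wave design; the seed URL and rule are hypothetical, and a production crawler would also respect robots.txt and rate limits.

```python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def snowball_crawl(seed_urls, classifying_rule, max_waves=2):
    """Visit seed sites, keep pages satisfying the classifying rule, then
    follow their outgoing links ('spidering') for further waves."""
    sampled = {}
    seen = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)
    while queue:
        url, wave = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            continue  # skip unreachable sites
        soup = BeautifulSoup(response.text, "html.parser")
        text = soup.get_text(separator=" ", strip=True)
        if not classifying_rule(text):
            continue  # the rule decides whether a site enters the sample
        sampled[url] = text
        if wave < max_waves:
            for link in soup.find_all("a", href=True):
                target = urljoin(url, link["href"])
                if target.startswith("http") and target not in seen:
                    seen.add(target)
                    queue.append((target, wave + 1))
    return sampled

# Hypothetical usage with a keyword rule and a placeholder seed site
sample = snowball_crawl(["https://example.org"],
                        lambda text: "islam" in text.lower(), max_waves=2)
```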
A second promising tool for extracting large amounts of data from the web or qualitative data archives is the Application Programming Interface (API). These web-based tools provide an interactive interface with large data archives that is designed to enable targeted data extraction. They were developed primarily for consumer purposes—such as the creation of third-party applications for social media sites such as Facebook, Twitter, or Google—but a number of academics have begun to use them as data collection tools as well (Bail 2013a; Gaby and Caren 2012; Livne et al. 2011). Even conventional media outlets such as the New York Times now offer APIs that enable users to search and download articles or user comments from their websites. APIs are superior to other forms of data extraction not only because they enable more sophisticated targeting of different types of text—such as Twitter messages about the Arab Spring—but also because such sites typically record a vast array of information about their users and their behavior online. For example, Twitter's API enables rapid extraction of information about the online social networks of individual users. Facebook's and Google's APIs enable direct interface with their massive archives of web content, and also include information about the size, geographic location, and demographic characteristics of each site's audience.[22]

[22] Facebook's API requires user authentication to access these data. Therefore, one must either access only publicly available data or obtain an authentication token from a Facebook page's owner. Elsewhere, I argue that app-based technologies are the most promising data collection tools to overcome such challenges. See Bail (2013b).
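The general pattern of API-based collection can be sketched as follows. The endpoint, parameters, token, and field names are placeholders—each provider defines its own interface, and social media APIs change frequently—so this illustrates the workflow rather than any particular service.

```python
import requests

API_ROOT = "https://api.example.com/v1"  # placeholder: provider-specific
ACCESS_TOKEN = "YOUR_TOKEN_HERE"         # most APIs require authentication

def search_messages(query, max_results=100):
    """Query a hypothetical search endpoint and return matching messages
    as a list of dictionaries parsed from the JSON response."""
    response = requests.get(
        f"{API_ROOT}/search",
        params={"q": query, "count": max_results},
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["results"]  # field name varies by provider

# Hypothetical usage: targeted extraction of messages about the Arab Spring
for message in search_messages("Arab Spring"):
    print(message.get("user"), message.get("text"))
```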
Classifying culture
Obtaining total or near-total samples of text on a given topic is a remarkable feat, given that it was nearly unthinkable only a decade ago. Yet such giant samples are of little utility if they cannot be classified in a meaningful manner. Cultural sociology has been fascinated with classification since its inception, because it was largely inspired by the Durkheimian idea of classification struggles (e.g., Barth 1969; Bourdieu 1975; Douglas 1966; Latour 1988). For example, Gieryn (1999) highlights the critical role of social classification in the evolution of scientific fields. Lamont (1992, 2000) explains how class and racial boundaries shape the process of group formation. Finally, Espeland and Stevens (1998) make a broader argument about the key role of commensuration in producing social power.[23] Yet for all the theoretical interest in the process of classification, cultural sociologists seldom discuss the appropriate way to measure social categories (Lamont and White 2009). Most studies rely upon either in-depth interviews or case studies that highlight the social construction of ranking within institutions. The lack of consensus about how to classify data has even prompted some critics to accuse cultural sociologists of reifying social classifications according to their theoretical persuasion (e.g., Biernacki 2012).

[23] For a recent review of this literature, see Lamont (2012).
To date, cultural sociologists have scarcely explored the promise of automated text analysis to classify texts.[24] Where these techniques have been used, they have involved relatively primitive approaches to automation that simply identify keywords or phrases. This approach is severely limited because it requires the researcher to have an a priori sense of which terms are well suited to address the theoretical question of interest. Moreover, it eschews the broader context of words within sentences. One solution to this problem is to evaluate the co-prevalence of words within sentences using Global Regular Expression Print (GREP) commands available in qualitative software analysis programs such as Atlas.ti or WordStat. Yet these approaches nevertheless fail to recognize important nuances in the use of language. For example, a GREP search for sentences with the terms "President" and "hate" would return both "I hate the President" and "I'd hate to be President."

[24] Notable exceptions discussed in further detail below include Mohr (1998), Franzosi (2004), Bearman et al. (1999), Bearman and Stovel (2000), Smith (2007), and Bail (2012).
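The ambiguity is easy to reproduce. The sketch below implements the same co-occurrence rule in Python (rather than an actual GREP command line) and, as noted above, returns both sentences despite their opposite meanings.

```python
import re

SENTENCES = [
    "I hate the President.",
    "I'd hate to be President.",
    "The President gave a speech.",
]

def co_occurs(sentence, *terms):
    """Keyword co-occurrence rule: keep a sentence if every term appears
    in it, regardless of syntax or meaning."""
    return all(re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE)
               for term in terms)

matches = [s for s in SENTENCES if co_occurs(s, "President", "hate")]
print(matches)
# ['I hate the President.', "I'd hate to be President."]
# Both match, although they express opposite stances—precisely the
# nuance that keyword methods miss.
```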
Recent technological advances within the fields of computer science, pattern recognition, and linguistics have produced a variety of superior alternatives. I begin by reviewing "unsupervised" text classification techniques that rely exclusively on computer algorithms to create meaningful groupings of texts. For example, recent studies have invoked a number of different forms of multidimensional scaling or cluster analysis to classify texts (e.g., Grimmer and King 2011; Livne et al. 2011).[25] These techniques replace each unique word in a document with a number and then use various metrics to calculate dissimilarities among all texts in the sample. These measures may be plotted within multidimensional space in order to identify meaningful groupings of documents. A substantial problem with cluster analysis is that the results are highly sensitive to the researcher's assumptions about the number of possible clusters (k), as well as the mathematical distances employed within each algorithm. These idiosyncrasies can be controlled, however, if multiple forms of cluster analysis are used in tandem. Grimmer and King (2011), for example, have developed software that applies all existing variants of cluster analysis to large text corpora. They apply this powerful tool to thousands of political texts by or about US presidents in order to classify their ideological positions on a range of substantive issues.

[25] Mohr (1998) made early calls for cultural sociologists to adopt these methods to classify meaning structures, yet they were mostly ignored even as they became widely used by cognitive anthropologists (e.g., D'Andrade 1995).
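As an illustration—this is not Grimmer and King's software—the sketch below shows one common unsupervised pipeline, TF-IDF weighting followed by k-means clustering, using the scikit-learn library and invented toy documents. Rerunning it with different values of k shows how sensitive the groupings are to that assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "The senator praised arts funding in the budget.",
    "Arts funding faces cuts under the new budget.",
    "The president vetoed the defense spending bill.",
    "Defense spending rose sharply this fiscal year.",
]

# Step 1: replace words with numbers (TF-IDF weights for each document)
matrix = TfidfVectorizer(stop_words="english").fit_transform(documents)

# Step 2: group the documents; results depend heavily on the choice of k
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(matrix)
    print(f"k={k}: {labels.tolist()}")
# Comparing solutions across several k values (and several algorithms),
# as Grimmer and King recommend, is one check on robustness.
```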
Another promising development within the big data movement for cultural sociologists is the burgeoning field of "machine learning," and specifically the field of topic modeling. This new field resulted from collaborations between linguists and computer scientists designed to identify hidden or latent themes within large corpora.[26] Topic models identify such themes using probabilistic models that evaluate the co-occurrence of words. The most popular form of topic modeling is Latent Dirichlet Allocation (LDA), which assumes a random allocation of words across a latent theme or topic and then uses a generative process of classification to analyze the probability of a document containing information about a topic given the distribution of words therein.[27] Dozens of studies have used LDA or related Bayesian approaches to infer latent topics in scientific journals, news articles, or blog posts (e.g., Blei and Lafferty 2007; Hopkins and King 2010; Quinn et al. 2010). Despite these advances, topic models have several considerable limitations. For example, the method assumes that neither the order of words in a document nor the order of documents within the broader sample matters. Most topic models also require that each document be assigned to mutually exclusive categories, and they do not recognize relationships between topics themselves. Basic topic models also do not recognize that topics may shift or combine over time. Finally, topic models—not unlike cluster analysis—must be validated in order to verify the appropriate number of topics within a corpus.[28] This is particularly difficult given that many cultural sociologists are interested in analyzing broad, unstructured samples of text such as those described in the previous section of this article.

[26] For an overview of this field, see Blei (2012).
[27] For a technical overview of LDA, see Blei et al. (2003).
[28] A number of scholars have proposed validity measures for LDA, most recently Blei (2012). Most of these emphasize comparisons of topic models via log-likelihoods or harmonic means, yet most proponents of topic modeling agree that they must also be validated via qualitative inspection of individual topics within subsets of large samples.
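A minimal LDA sketch, using the open-source gensim library and an invented toy corpus, illustrates the basic workflow: build a word-count representation, fit a fixed number of topics, and read off the per-document topic probabilities used to classify texts.

```python
from gensim import corpora, models

# Toy corpus: each document tokenized into lowercase words
texts = [
    "the museum received new arts funding from the state".split(),
    "critics debated the value of public arts funding".split(),
    "the senate passed a defense spending bill".split(),
    "lawmakers argued over military spending in the senate".split(),
]

dictionary = corpora.Dictionary(texts)               # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]  # bag-of-words counts

# Fit LDA; the analyst must still choose—and validate—the number of topics
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=50, random_state=0)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

# Per-document topic probabilities: the quantities used to classify texts
print(lda.get_document_topics(corpus[0]))
```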
Proponents of topic modeling have already begun to develop a number of solutions to these limitations, though they are too technical to discuss here.[29] Among the more promising recent developments in the field is the advent of "supervised" topic modeling (Blei and McAuliffe 2010). In this technique, a human coder identifies topics within a subset of documents, and topic models use these assignments to assess probability instead of assuming that the distribution of topics across documents is random. Supervised text classification was first introduced within social science by Hopkins and King (2010), who used this approach to assess public opinion of presidential candidates expressed on thousands of political blogs during the 2008 election.[30] Given a sufficient number of "training documents" produced through in-depth coding, these authors argue that their technique classifies sentiment about presidential candidates more reliably than human coders themselves.[31] While such claims have not yet been widely validated, supervised learning techniques hold considerable promise for identifying cultural elements within texts and for further improving the snowball sampling methods described above.[32]

[29] For example, see Blei and Lafferty (2006), Wallach (2006), Chang et al. (2009), and Hopkins and King (2010).
[30] See also Grimmer (2010) and Quinn et al. (2010).
[31] In particular, Hopkins and King (2010) argue that coding more than 500 documents produces diminishing returns in the reliability of automated text analysis.
[32] For example, a supervised topic model could be used to determine whether websites should be included in a directed web-crawl such as SnowCrawl, in order to capture sites that discuss a theme or topic without using a single keyword.
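The machinery of supervised topic models is too technical to reproduce here, but the underlying logic—train on hand-coded documents, then classify the remainder at scale—can be sketched with a generic text classifier. The training texts and codes below are invented stand-ins for in-depth qualitative coding.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-coded "training documents": texts paired with human-assigned codes
training_texts = [
    "This candidate inspires real hope for the country",
    "A thoughtful, honest leader with a strong record",
    "Another corrupt politician peddling empty promises",
    "His policies would be a disaster for working families",
]
training_codes = ["positive", "positive", "negative", "negative"]

# TF-IDF features plus a simple classifier stand in for the probabilistic
# machinery of supervised topic models
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(training_texts, training_codes)

# Apply the human coding scheme to new, uncoded documents at scale
new_posts = ["An honest record and a hopeful message",
             "Empty promises from a corrupt insider"]
print(classifier.predict(new_posts))  # e.g., ['positive' 'negative']
```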
Perhaps the most important question for cultural sociologists interested in employing topic models is whether they can be used to classify cultural elements such as frames, symbolic boundaries, or cultural toolkits. A number of current studies suggest topic models may be used to capture such nuanced cultural elements. For example, DiMaggio et al. (forthcoming) argue topic models can be used to identify frames about arts funding. Polletta is currently using topic modeling to identify hidden frames in Internet discussions about cap-and-trade.[33] Hopkins (2013) employs topic models to measure frames about the Affordable Care Act. Yet a key issue remains whether cultural frames—as Goffman (1974) first defined them—can be represented by groups of words. While the face-work that Goffman emphasized is clearly not measurable through text, Goffman himself used texts extensively throughout his work, including biographies, newspaper clippings, and transcripts of interactions.[34] Although Goffman emphasized the absence of certain words as much as the presence of others, such omissions could be modeled effectively because they would shape the probability distributions around groups of words that LDA analyzes to create classifications of texts. Nevertheless, the quality of supervised topic modeling is only as good as the codes developed by human coders themselves. Therefore, cultural elements that are highly nuanced or situation-based are not easily captured via this technique because of low inter-coder reliability.[35]

[33] See Baumer et al. (2013).
[34] Consider, for example, the diaries analyzed in Goffman (1963) or the newspaper clippings in Goffman (1974). Also, textual descriptions of face-work or other unspoken forms of bodily interaction in the form of field notes could potentially be analyzed using topic models.
[35] For a discussion of the challenges of achieving high levels of inter-coder reliability in cultural analysis, see Krippendorff (2003).
On the other hand, the meticulous coding definitions required by topic models may also provide an opportunity for cultural sociologists to contribute new methodologies to the big data movement. Indeed, the use of generative and multi-stage coding schemes has been a key concern of cultural sociology in the form of "thick description" (e.g., Geertz 1973), "middle-range theory" (Merton 1949), "structural hermeneutics" (Alexander and Smith 2001), and "paradigmatic clusters" (Weber 2005). Each of these approaches emphasizes that researchers should move back and forth between different levels of analysis to tune their coding schemes and to assess the scope conditions of a particular finding. To this end, the expertise of cultural sociologists may be applied to repeated stages of supervised topic models—elaborating classification systems as if they were Russian dolls, to borrow Bourdieu's metaphor. Mohr et al. (2014), for example, have advanced this technique in their study of US National Security Strategy statements over a 22-year period. By developing increasingly precise codes from iterative qualitative analysis of small subsets of this large corpus of text, these scholars have developed increasingly promising topic models that can later be applied to the entire sample. We need further empirical validation of such techniques. At the very least, however, such methods provide a systematic way of focusing qualitative microscopes within the increasingly overwhelming world of big data.
Tracing the evolution of cultural environments
One of the most promising elements of the big data movement is that so much of the qualitative data that has been collected is longitudinal. For example, the Library of Congress's archive of all Twitter messages will enable unprecedented analysis of how different issues rise and fall over time. The Internet Archive and screen-scraping technologies could be used to map shifts in the discourses of different types of websites over time. Likewise, the massive newspaper and television transcript archives now available could be used to analyze similar issues over the past century. These longitudinal data are particularly promising because so many of the most pressing questions in cultural sociology concern change over time. While Swidler's (1986) toolkit analogy has received extensive attention in recent decades, for example, her call for future studies to examine the transition from unsettled to settled historical periods has been mostly ignored.[36] While Sewell's (1996) theory of events has inspired considerable interest, few studies place such events in broader historical context.[37] Finally, Lamont's (1992) work reveals considerable cross-national differences in the salience of symbolic boundaries. Yet we urgently need broad historical analyses to identify how such divergent meaning systems evolved over time. Each of these outstanding questions requires methods capable of capturing broad-scale cultural change.

[36] But see Cerulo (1998), Wagner-Pacifici (2010), and Bail (2012).
[37] Still, historical analyses with big data are limited by the availability of texts produced during a given period that were amenable to digitization. This presents a number of important limitations, including pervasive illiteracy during early historical periods as well as the tendency for only elite accounts of historical events to survive the passage of time. Still, comparative-historical sociologists face these problems regardless of whether they are working with big data. Furthermore, primary documents obtained through archival analysis can be easily digitized through photographs, scanning, and text-recognition technologies.
In addition to identifying cultural elements such as frames or symbolic boundaries, automated text analysis can be used to differentiate social actors or key events within large qualitative datasets.[38] Cultural sociologists can make huge strides towards advancing theories of social change simply by mapping the relations among cultural elements, actors, and events over time. The literature on quantitative narrative analysis has already established how analysis of relationships between actors and events can be used to map broad historical sequences (e.g., Bearman et al. 1999; Bearman and Stovel 2000; Franzosi 2004; Smith 2007). Incorporating cultural elements identified via topic modeling into such methods would open exciting new lines of inquiry about the interpenetration of culture and structure. If topic modeling can be used to identify actors and organizations as well as the cultural elements they produce, for example, social network relationships might be mapped onto cultural patterns—or vice versa. At a minimum, mapping the relationships among cultural elements, actors, and events would help focus in-depth qualitative analysis of key historical shifts or "turning points" (Abbott 1997) where meaning structures change.[39]

[38] If key actors or events are already known, simple keyword searches or Global Regular Expression Print (GREP) commands may be used to identify them. If actors or events are not known, they can be identified through keyword counts that remove common words such as "the" or "and." Once actors or events are defined, topic models may be used to identify them as well. A number of computer scripts have also been developed recently to identify names within big data without such intermediary steps, such as the Natural Language Toolkit and the Stanford Parser.
[39] See also Sewell (1996) and Wagner-Pacifici (2010).
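As an illustration of the name-identification tools mentioned in note 38, the sketch below uses the Natural Language Toolkit's off-the-shelf named-entity chunker. The sentence is invented, and the resource names are an assumption that may differ across NLTK versions.

```python
import nltk

# One-time downloads of the models the chunker needs (names may vary
# across NLTK versions)
for resource in ("punkt", "averaged_perceptron_tagger",
                 "maxent_ne_chunker", "words"):
    nltk.download(resource, quiet=True)

sentence = ("Protesters gathered outside the Capitol as Sarah Palin "
            "addressed the Tea Party rally in Washington.")

tokens = nltk.word_tokenize(sentence)  # split into words
tagged = nltk.pos_tag(tokens)          # part-of-speech tags
tree = nltk.ne_chunk(tagged)           # named-entity chunks

# Collect the persons, organizations, and places the parser identifies
for subtree in tree.subtrees():
    if subtree.label() in ("PERSON", "ORGANIZATION", "GPE"):
        print(subtree.label(),
              " ".join(word for word, tag in subtree.leaves()))
```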
One problem, of course, is that cultural elements themselves often change throughout such broad-scale historical transformations. Sewell, for example, argues that the very concept of revolution was still developing at the same time that murderous mobs stormed the Bastille—setting off the French Revolution before they knew precisely what they were doing. Topic models are ill-equipped to capture such nuances unless human coders calibrate them repeatedly across multiple time periods. Even then, slight shifts in cultural elements may be difficult to code automatically because human coders may struggle to achieve high inter-coder reliability. Here again, new tools for automated text analysis may prove useful. For example, several new methods have been developed to identify dissimilarities between pairs of documents. Primitive forms of these techniques simply count the number of words shared between two documents. Yet recent advances in plagiarism detection software employ "word maps" that utilize data from thesauruses in order to identify "near" matches between two documents as well (e.g., Bail 2012). Once again, these document comparison tools will not identify cultural elements by themselves. Yet they may be particularly powerful when combined with topic models and micro-level qualitative analysis of key texts or transitional moments within history.
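A minimal sketch of such pairwise comparison appears below: cosine similarity over TF-IDF vectors, with a toy synonym map standing in for the full thesaurus-based "word maps" described above. The documents are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy synonym map stands in for thesaurus-based "word maps"
SYNONYMS = {"massive": "large", "huge": "large", "rally": "protest"}

def normalize(text):
    """Map synonyms onto a shared canonical term so that 'near' matches
    between documents count as overlaps."""
    return " ".join(SYNONYMS.get(word, word) for word in text.lower().split())

documents = [
    "a massive rally against the government",
    "a huge protest against the government",
    "the senate debated the farm bill",
]

vectors = TfidfVectorizer().fit_transform(normalize(d) for d in documents)
print(cosine_similarity(vectors).round(2))
# After synonym mapping, the first two documents score as near-duplicates;
# the third remains dissimilar.
```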
Another major advantage of big data is that much of it includes detailed information about relationships between social actors. This is particularly true of social media sites such as Twitter or Facebook, but advances in library science are also creating hyperlinks between texts within archival collections. Using Twitter's Application Programming Interface, one can easily extract not only all the messages produced by a single actor, but also the precise location of this actor within a broader social network—including measures of both "in" and "out" degree. Livne et al. (2011), for example, extracted 460,000 tweets from all candidates for US House, Senate, and gubernatorial elections between 2006 and 2010. Their data reveal not only the partisan networks of such social actors, but also—via cluster analysis—patterns in the similarity of the language they post on Twitter. Through this analysis, Livne et al. document the meteoric rise of the Tea Party in recent elections and the realignment of mainstream conservative networks that ensued. These and other datasets could be used to address a number of key questions at the intersection of cultural sociology and network theory—for instance, Pachucki and Breiger's (2010) argument about "cultural holes" within networks, or Vaisey and Lizardo's (2010) theory that cultural worldviews influence network composition.[40]

[40] On the concept of cultural holes, see also Lizardo (in this issue).
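For example, follower ties returned by such an API can be loaded into a network-analysis library to compute exactly these "in" and "out" degree measures. The sketch below uses the open-source networkx library with invented accounts and ties.

```python
import networkx as nx

# Hypothetical follower ties of the kind a social media API might return
ties = [
    ("candidate_a", "candidate_b"),  # a follows b
    ("candidate_c", "candidate_b"),
    ("candidate_b", "candidate_a"),
    ("candidate_d", "candidate_a"),
]

network = nx.DiGraph(ties)

# "In" degree: how many actors follow this one; "out": how many it follows
for actor in network.nodes:
    print(actor, "in:", network.in_degree(actor),
          "out:", network.out_degree(actor))

# Each actor's network position could then be merged with his or her texts—
# for example, frames or topics identified with the methods above
```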
The potential to assemble large datasets that describe cultural elements, actors, events, and social networks over time may also encourage critical advances in field theory. Many of the most pressing questions in this literature concern the evolution of fields over time (Fligstein and McAdam 2011; Padgett and Powell 2012). For instance, a number of recent studies have begun to analyze the emergence of fields (e.g., Armstrong 2002; Bartley 2007). By and large, these case studies are unable to investigate a variety of broad cultural processes that may occur between discursive fields. For example, do most fields emerge out of the dissolution of others? Or do fields develop when the space between any two pre-existing fields is sufficiently broad (Eyal 2009; Medvetz 2012)? Big data may also enable analysis of a number of intriguing questions within individual fields. For example, do discursive fields have carrying capacities for new forms of culture? Do certain actors gain power within discursive fields by exploiting niches between rival factions? And what is the relationship between the core and periphery of discursive fields (e.g., Bail 2012)?
Another exciting feature of big data is that it often includes geo-coded data. For example, Twitter and Facebook record the geographic locations of their users. This information is also often recorded in the comments sections of websites. Finally, analytics or "insights" data often include the latitude and longitude of visitors to different websites via Internet Protocol (IP) addresses or other geographic identifiers such as city names. Political scientists have even mined visual data on ethnic conflict from Google Earth (Agnew et al. 2008). The potential to examine the relationship between Cartesian coordinates and cultural elements could create a new subfield within cultural sociology that analyzes the geography of meaning. Such a field might examine questions such as: 1) Do cultural frames or symbolic boundaries cluster at the national or supranational level? 2) Does physical proximity breed more convergence of worldviews than online interaction? And 3) do the answers to these questions change over time as the forces of globalization push people ever closer together?
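A simple starting point for such a geography of meaning is to compute distances between geo-coded messages and ask whether messages sharing a frame cluster spatially. The sketch below uses the haversine formula; the coordinates and frame labels are invented.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 6371 * 2 * asin(sqrt(a))  # Earth radius ~6371 km

# Hypothetical geo-coded messages: (latitude, longitude, frame label)
messages = [
    (40.7, -74.0, "frame_a"),  # New York
    (41.9, -87.6, "frame_a"),  # Chicago
    (48.9, 2.4, "frame_b"),    # Paris
]

# Do messages that share a frame sit closer together in space?
for i in range(len(messages)):
    for j in range(i + 1, len(messages)):
        lat1, lon1, f1 = messages[i]
        lat2, lon2, f2 = messages[j]
        label = "same frame" if f1 == f2 else "different frames"
        print(f"{haversine_km(lat1, lon1, lat2, lon2):7.0f} km, {label}")
```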
Conclusion
Cultural sociology has long suffered from an imbalance of theory and data (Ghaziani 2009). Yet the big data movement may radically alter this equilibrium. The big data movement began with the Internet and social media, but the future of the field will also entail increasingly ambitious forays into the past. As digitized historical archives continue to expand and social scientists coordinate new ways of organizing qualitative meta-data with rich detail about the evolution of meaning, cultural sociologists can no longer afford to ignore the big data movement. Above, I argued that the in-depth qualitative coding techniques pioneered by cultural sociologists and anthropologists can be leveraged to improve the already powerful automated text analysis techniques produced by computer scientists, linguists, and political scientists. This synthesis will enable cultural sociologists to achieve theoretical progress on questions that were once thought unmeasurable. Proponents of big data may also gain key insights from cultural sociologists about how to further hone their tools to map the contours of cultural fields, classify cultural elements, and trace the evolution of culture over time.
Yet for all of my optimism about the marriage of cultural sociology and big data, formidable obstacles remain. Perhaps the most vexing problem is that big data often does not include information about the social context in which texts are produced (Griswold and Wright 2004). Although we are able to collect millions of blog posts about virtually any issue, these data typically include little or no information about the authors of such posts—or those who comment upon them. Twitter data are publicly available, but provide very little information about the social context in which tweets are produced. Other sites such as Facebook collect massive amounts of data about social context but are often unable to share them with researchers because of concerns about user privacy. Sources of big data outside social media also often lack important information about the social context in which texts are produced. Collecting every newspaper article on a political topic is of marginal utility absent in-depth analysis of the political and institutional processes that lead media to gravitate towards one issue over another.[41] Yet these obstacles are not without solutions that might build upon the progress of cultural sociologists in developing mixed-method research designs. For example, qualitative or quantitative surveys of Twitter users could be conducted to place their online behavior within broader context. Or, large-scale analyses of media data or historical surveys might be used to identify compelling puzzles for comparative-historical analysis. In theory, big data could also be used to guide ethnographic interventions—or at least help place the findings of ethnography within broader cultural fields. In brief, big data methods should be viewed as a complement to—not a replacement for—the tried and tested techniques of cultural sociology.

[41] It is also worth noting that texts that cannot be collected because they are not in the public domain may ultimately have less impact upon the evolution of broader cultural domains precisely because they are hidden from public view. This underlies a broader pragmatist argument about the need to focus attention upon the consequences of social action (e.g., Johnson-Hanks et al. 2011; Tavory and Timmermans 2013). An interesting analogue is the debate about the social construction of ethnicity via the enumeration of different groups by the US Census (cf. Loveman and Muniz 2007). I thank Andy Perrin for bringing this issue to my attention.
A second major challenge is that computer-assisted coding can never be more reliable than the codes themselves. Cultural sociologists seldom discuss coding criteria or inter-coder reliability, in part because the definitions of many of our core concepts are highly contested (Biernacki 2012). One need only read the literature on framing, for example, to witness significant disagreement about whether and how frames should be measured or operationalized.[42] While these debates will not be easily resolved, the integration of big data and cultural sociology will depend critically upon our capacity to converge upon several broadly accepted definitions of these core concepts. Yet big data may actually facilitate such conversations—since conceptual vagueness among cultural sociologists results in part from our paucity of shared datasets. Cultural sociologists are also looking across disciplinary lines for guidance in making core concepts more concrete. For example, Mohr et al. (2013) have fused the literature on narrative from linguistics with studies of social networks and topic modeling from sociology and computer science. Polletta is currently synthesizing linguistics and cultural sociology using new visualization techniques that explore how making people aware of their cultural schemas shapes their behavior during democratic deliberation.[43] Finally, Ignatow and Mihalcea (2013) propose a model for big data analysis that synthesizes neuroscience and Bourdieusian practice theory.

[42] For a detailed analysis of conceptual and methodological ambiguities in the measurement of frames, see Scheufele (1999).
[43] See Baumer et al. (2013).
A final concern for cultural sociologists is the relatively high entry cost for those who wish to develop the technical expertise currently necessary to work with big data.[44] Although these costs are rapidly decreasing thanks to simple web-based tools for big data analyses, formalizing these techniques for cultural sociology will require a new generation of scholars with both technical expertise and theoretical ambition. For now, the big data movement urgently requires the guidance of theoretically and qualitatively oriented cultural sociologists. Little can be learned from big data without big thinking. While data mining may reveal interesting patterns in large text corpora or compelling visualizations, many pieces of hay have come to resemble needles.[45] Therefore, the future of the big data movement hinges upon collaboration among cultural sociologists, computer scientists, and others to teach computers to differentiate different types of meaning and their shifting relationships over time.

[44] Efforts are currently underway to make the collection and analysis of big data possible for those without a computer programming background. Gary King and colleagues are producing a web-based tool named "Consilience" that will enable cluster analysis of unstructured text. Primitive forms of topic modeling and sentiment analysis are available via a variety of web-based software programs as well, such as www.discovertext.com. Finally, there is a variety of high-quality tutorials available online for those who wish to develop basic programming skills for working with big data. For example, see http://nealcaren.web.unc.edu/big-data/ and http://www.chrisbail.net/p/software.html. A complete list of tutorials is available at http://www.chrisbail.net/p/big-data.html.
[45] See Steve Lohr, "The Age of Big Data," The New York Times, February 11, 2012.

Acknowledgments  I thank Elizabeth Armstrong, Alex Hanna, Gabe Ignatow, Charles Kurzman, Brayden King, Jennifer Lena, John Mohr, Terry McDonnell, Andy Perrin, and Steve Vaisey for helpful comments on previous drafts. The Robert Wood Johnson Foundation and the Odum Institute at the University of North Carolina provided financial support for this research.
References
Abbott, A. (1995). Things of boundaries. Social Research, 62(4), 857–882.
Abbott, A. (1997). On the concept of turning point. Comparative Social Research, 16, 85–106.
Abbott, A. (2001). Chaos of disciplines. Chicago: University of Chicago Press.
Agnew, J., Gillespie, T., Gonzalez, J., & Min, B. (2008). Baghdad nights: Evaluating the US military “surge”
using nighttime light signatures.
Alexander, J. (2006). The civil sphere. Oxford: Oxford University Press.
Alexander, J., & Smith, P. (2001). The strong program in cultural theory: Elements of a structural hermeneu-
tics. In J. H. Turner (Ed.), Handbook of sociological theory (pp. 135–150). New York: Springer.
Armstrong, E. A. (2002). Forging gay identities:Organizing sexuality in San Francisco, 1950–1994. Chicago:
University of Chicago Press.
Bail, C. (2012). The fringe effect: civil society organizations and the evolution of media discourse about Islam,
2001–2008. American Sociological Review, 77(7), 855–879.
Bail, C. (2013a). Winning minds through hearts: Organ donation advocacy, emotional feedback, and
social media. Working Paper, Department of Sociology, University of North Carolina at Chapel
Hill.
Bail, C. (2013b). Taming big data: Apps and the future of survey research. Working Paper, Department of
Sociology, University of North Carolina, Chapel Hill.
Bail, C.A. (forthcoming). Terrified: How anti-muslim organizations became mainstream. Princeton University
Press, Princeton, NJ.
Barth, F. (1969). Ethnic groups and boundaries: The social organization of cultural difference. Boston: Little,
Brown.
Bartley, T. (2007). How foundations shape social movements: the construction of an organizational field and
the rise of forest certification. Social Problems, 54(3), 229–255.
Baumer, E. P. S., Polletta, F., Pierski, N., Celaya, C., Rosenblatt, K., & Gay, G. K. (2013, February).
Developing computational supports for frame reflection. Retrieved from http://hdl.handle.net/2142/
38374.
Bearman, P., & Stovel, K. (2000). Becoming a Nazi: a model for narrative networks. Poetics, 27(2), 69–90.
Bearman, P., Faris, R., & Moody, J. (1999). Blocking the future: new solutions for old problems in historical
social science. Social Science History, 23(4), 501–533.
Benford, R., & Snow, D. (2003). Framing processes and social movements: An overview and assessment.
Biernacki, R. (2012). Reinventing evidence in social inquiry: Decoding facts and variables. New York:
Palgrave Macmillan.
Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D., & Lafferty, J. (2006). Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning (pp. 113–120). New York: ACM.
Blei, D., & Lafferty, J. (2007). A correlated topic model of science. The Annals of Applied Statistics, 1(1), 17–35.
Blei, D., & McAuliffe, J. (2010). Supervised topic models. arXiv:1003.0783. Retrieved from http://arxiv.org/abs/1003.0783.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3,
993–1022.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8. doi:10.1016/j.jocs.2010.12.007.
Bourdieu, P. (1975). The specificity of the scientific field and the social conditions of the progress of reason. Social Science Information, 14(6), 1–19.
Bourdieu, P. (1985). The social space and the genesis of groups. Theory and Society, 14(6), 723–744. doi:10.1007/BF00174048.
Bourdieu, P. (1990). Homo Academicus (1st ed.). Stanford: Stanford University Press.
Cerulo, K. A. (1998). Deciphering violence: The cognitive structure of right and wrong (1st ed.). New York:
Routledge.
Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems 22.
Collins, R. (2013). Solving the Mona Lisa smile, and other developments in visual micro-sociology. Working
Paper, Department of Sociology, University of Pennsylvania.
D’Andrade, R. G. (1995). The development of cognitive anthropology. Cambridge: Cambridge University
Press.
DiMaggio, P. (1997). Culture and cognition. Annual Review of Sociology, 23, 263–287.
DiMaggio, P., & Bonikowski, B. (2008). Make money surfing the web? The impact of internet use on the earnings of U.S. workers. American Sociological Review, 73(2), 227–250. doi:10.1177/000312240807300203.
DiMaggio, P., Hargittai, E., Neuman, W. R., & Robinson, J. (2001). Social implications of the internet. Annual Review of Sociology, 27, 307–336.
DiMaggio, P., Nag, M., & Blei, D. (forthcoming). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of government arts funding in the U.S. Poetics.
Douglas, M. (1966). Purity and danger: An analysis of concepts of pollution and taboo. New York: Praeger.
Douglas, M. (1986). How institutions think. Syracuse: Syracuse University Press.
Eliasoph, N., & Lichterman, P. (2003). Culture in interaction. American Journal of Sociology, 108(4), 735–
794.
Espeland, W. N., & Stevens, M. L. (1998). Commensuration as a social process. Annual Review of Sociology, 24, 313–343. doi:10.2307/223484.
Evans, R., & Kay, T. (2008). How environmentalists “Greened” trade policy: strategic action and the architecture of field overlap. American Sociological Review, 73(6), 970–991. doi:10.1177/000312240807300605.
Eyal, G. (2009). The space between fields. Working Paper, Center for Comparative Research, Yale University.
Fligstein, N., & McAdam, D. (2011). Toward a general theory of strategic action fields. Sociological Theory,
29(1), 1–26.
Foucault, M. (1970). The order of things: An archaeology of the human sciences (1st ed.). New York: Vintage.
Franzosi, R. (2004). From words to numbers: Narrative, data, and social science. Cambridge: Cambridge
University Press.
Franzosi, R. (2009). Quantitative narrative analysis (1st ed.). Thousand Oaks: SAGE Publications, Inc.
Gaby, S., & Caren, N. (2012). Occupy online: how cute old men and Malcolm X recruited 400,000 U.S. users to OWS on Facebook. Social Movement Studies, 11, 367–374.
Geertz, C. (1973). The interpretation of cultures: Selected essays. New York: Basic Books.
Ghaziani, A. (2009). An “amorphous mist”? The problem of measurement in the study of culture. Theory and
Society, 38(6), 581–612. doi:10.1007/s11186-009-9096-2.
Ghaziani, A., & Baldassarri, D. (2011). Cultural anchors and the organization of differences. American
Sociological Review, 76(2), 179–206. doi:10.1177/0003122411401252.
Gieryn, T. F. (1999). Cultural boundaries of science: Credibility on the line (1st ed.). Chicago: University of Chicago Press.
Goffman, E. (1963). Stigma: Notes on the management of spoiled identity. New York: Touchstone.
Goffman, E. (1974). Frame analysis. Cambridge: Harvard University Press.
Gold, M. K. (2012). Debates in the digital humanities. Minneapolis: University of Minnesota Press.
Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep, and day length across
diverse cultures. Science, 333(6051), 1878–1881. doi:10.1126/science.1202775.
Gong, A. (2011). An automated snowball census of the political web. SSRN eLibrary. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1832024.
Grimmer, J. (2010). A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Political Analysis, 18(1), 1–35.
Grimmer, J., & King, G. (2011). General purpose computer-assisted clustering and conceptualization. Proceedings of the National Academy of Sciences, 108(7), 2643–2650. doi:10.1073/pnas.1018067108.
Griswold, W., & Wright, N. (2004). Wired and well read. In Society online: The internet in context. New York: Sage.
Hopkins, D. (2013). The exaggerated life of death panels: The limits of framing effects in the 2009–2012
health care debate. Working Paper, SSRN.
Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1), 229–247. doi:10.1111/j.1540-5907.2009.00428.x.
Ignatow, G., & Mihalcea, R. (2013). Text mining for comparative cultural analysis. Working Paper,
Department of Sociology, University of North Texas.
Johnson-Hanks, J., Bachrach, C., Morgan, P., & Kohler, H.-P. (2011). Understanding family change and variation: toward a theory of conjunctural action. Understanding Population Trends and Processes, 5, 1–179.
Kaufman, J. (2004). Endogenous explanation in the sociology of culture. Annual Review of Sociology, 30,
335–357.
King, G. (2011). Ensuring the data rich future of the social sciences. Science, 331(11 February), 719–721.
Krippendorff, K. H. (2003). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks:
Sage Publications, Inc.
Lamont, M. (1992). Money, morals, and manners: The culture of the French and American upper-middle
class. Chicago: University of Chicago Press.
Lamont, M. (2000). The dignity of working men: Morality and the boundaries of race, class, and immigration. New York: Russell Sage.
Lamont, M. (2012). Toward a comparative sociology of valuation and evaluation. Annual Review of Sociology, 38, 201–221.
Lamont, M., & White, P. (2009). The evaluation of systematic qualitative research in the social sciences.
Report of the U.S. National Science Foundation.
Lan, T., & Raptis, M. (2013). From subcategories to visual composites: A multi-level framework for object
recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Latour, B. (1988). Science in action: How to follow scientists and engineers through society. Cambridge: Harvard University Press.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.-L., Brewer, D., et al. (2009). Computational social science. Science, 323(5915), 721–723. doi:10.1126/science.1167742.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: a new social network dataset using Facebook.com. Social Networks, 30(4), 330–342. doi:10.1016/j.socnet.2008.07.002.
Lieberson, S. (2000). A matter of taste: How names, fashions, and culture change. New Haven: Yale
University Press.
Livne, A., Simmons, M. P., Adar, E., & Adamic, L. (2011). The party is over here: Structure and content in the 2010 election. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 201–209.
Loveman, M., & Muniz, J. (2007). How Puerto Rico became white: boundary dynamics and inter-census
racial classification. American Sociological Review, 72, 915–939.
Manning, C. D., & Schuetze, H. (1999). Foundations of statistical natural language processing (1st ed.).
Cambridge: The MIT Press.
Mark, N. P. (2003). Culture and competition: homophily and distancing explanations for cultural niches.
American Sociological Review, 68(3), 319–345. doi:10.2307/1519727.
Martin, J. L. (2003). What is field theory? American Journal of Sociology, 109(1), 1–49.
Medvetz, T. (2012). The rise of think tanks in America: Merchants of policy and power. Chicago: University of Chicago Press.
Merton, R. (1949). Social theory and social structure. New York: The Free Press.
Mische, A. (2008). Partisan publics: Communication and contention across Brazilian youth activist networks. Princeton: Princeton University Press.
Mohr, J. (1998). Measuring meaning structures. Annual Review of Sociology, 24, 345–370.
Mohr, J., & Guerra-Pearson, F. (2010). The duality of niche and form: The differentiation of institutional space
in New York City, 1888–1917. In Categories in markets: Origins and evolution (pp. 321–368). New
York: Emerald Group Publishing.
Mohr, J., Singh, A., & Wagner-Pacifici, R. (2013). CulMINR: Cultural meanings from the interpretation of
narrative and rhetoric: A dynamic network approach to hermeneutic mining of large text corpora.
Working Paper, Department of Sociology, University of California, Santa Barbara.
Mohr, J., Wagner-Pacifici, R., Breiger, R., & Bogdanov, P. (2014). Graphing the grammar of motives in National Security Strategies: cultural interpretation, automated text analysis, and the drama of global politics. Poetics, 41(6), 670–700.
Moretti, F. (2013). Distant reading. London: Verso.
Pachucki, M. A., & Breiger, R. L. (2010). Cultural holes: beyond relationality in social networks and culture.
Annual Review of Sociology, 36(1), 205–224. doi:10.1146/annurev.soc.012809.102615.
Padgett, J. F., & Powell, W. W. (2012). The emergence of organizations and markets. Princeton: Princeton
University Press.
Paul, M. J., & Dredze, M. (2011). You are what you tweet: Analyzing Twitter for public health. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media.
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science, 54(1), 209–228. doi:10.1111/j.1540-5907.2009.00427.x.
Scheufele, D. A. (1999). Framing as a theory of media effects. The Journal of Communication, 49(1), 103–
122. doi:10.1111/j.1460-2466.1999.tb02784.x.
Sewell, W. (1996). Historical events as transformations of structures: inventing revolution at the Bastille.
Theory and Society, 25(6), 841–881. doi:10.1007/BF00159818.
Smith, T. (2007). Narrative boundaries and the dynamics of ethnic conflict and conciliation. Poetics, 35, 22–46.
Swidler, A. (1986). Culture in action: symbols and strategies. American Sociological Review, 51(2), 273–286.
Swidler, A. (1995). Cultural power and social movements. In Social movements and culture. London:
Routledge.
Tangherlini, T. R., & Leonard, P. (2013). Trawling in the Sea of the Great Unread: sub-corpus topic modeling
and Humanities research. Poetics. doi:10.1016/j.poetic.2013.08.002.
Tavory, I., & Timmermans, S. (2013). Consequences in action: A pragmatist approach to causality in ethnography. Working Paper, New School for Social Research.
Vaisey, S., & Lizardo, O. (2010). Can cultural worldviews influence network composition? Social Forces,
88(4), 1595–1618. doi:10.1353/sof.2010.0009.
Wagner-Pacifici, R. (2010). Theorizing the restlessness of events. American Journal of Sociology, 115(5),
1351–1386.
Wallach, H. (2006). Topic modeling: Beyond bag-of-words. Proceedings of the 23rd International Conference on Machine Learning.
Weber, K. (2005). A toolkit for analyzing corporate cultural toolkits. Poetics, 33(3–4), 227–252. doi:10.1016/j.poetic.2005.09.011.
Wuthnow, R. (1993). Communities of discourse: Ideology and social structure in the Reformation, the Enlightenment, and European socialism. Cambridge: Harvard University Press.
Zelizer, V. A. R. (1985). Pricing the priceless child: The changing social value of children. Princeton:
Princeton University Press.
Christopher A. Bail is Assistant Professor of Sociology at the University of North Carolina, Chapel Hill. His research interests include cultural sociology, political sociology, organizations, and mixed-method research designs. He is currently completing a manuscript entitled Terrified: How Anti-Muslim Organizations Became Mainstream. His other work has appeared in the American Sociological Review and the Revue Européenne des Migrations Internationales.