The cultural environment: measuring culture with big data
Christopher A. Bail
© Springer Science+Business Media Dordrecht 2014
Abstract The rise of the Internet, social media, and digitized historical archives has
produced a colossal amount of text-based data in recent years. While computer scientists
have produced powerful new tools for automated analyses of such "big data," they lack the
theoretical direction necessary to extract meaning from them. Meanwhile, cultural sociolo-
gists have produced sophisticated theories of the social origins of meaning, but lack the
methodological capacity to explore them beyond micro-levels of analysis. I propose a
synthesis of these two fields that adjoins conventional qualitative methods and new
techniques for automated analysis of large amounts of text in iterative fashion. First, I
explain how automated text extraction methods may be used to map the contours of cultural
environments. Second, I discuss the potential of automated text-classification methods to
classify different types of culture such as frames, schema, or symbolic boundaries. Finally, I
explain how these new tools can be combined with conventional qualitative methods to trace
the evolution of such cultural elements over time. While my assessment of the integration of
big data and cultural sociology is optimistic, my conclusion highlights several challenges in
implementing this agenda. These include a lack of information about the social context in
which texts are produced, the construction of reliable coding schemes that can be automated
algorithmically, and the relatively high entry costs for cultural sociologists who wish to
develop the technical expertise currently necessary to work with big data.
Keywords Culture · Content analysis · Mixed-methods · Evolutionary theory
More data were accumulated in 2002 than in all previous years of human history.
By 2011, the amount of data collected prior to 2002 was being collected
every 2 days.
This dramatic growth in data spans nearly every part of our lives, from
gene sequencing to consumer behavior.
Theor Soc
DOI 10.1007/s11186-014-9216-5
International Data Corporation, The 2011 Digital Universe Study: Extracting Value from Chaos, June 2011. See also Christopher R. Johnson, "How Big is Big Data?" Lecture at the University of Michigan's Cyber-Infrastructure Conference, November 7th, 2012.
C. A. Bail (*)
University of North Carolina at Chapel Hill, 225 Hamilton Hall, Chapel Hill, NC 27599, USA
While most of these data are binary or
quantitative, text-based data are also being accumulated on an unprecedented scale. In
an era of social science research plagued by declining survey response rates and concerns
about the generalizability of qualitative research, these data hold considerable potential
(Golder and Macy 2011; King 2011; Lazer et al. 2009). Yet social scientists, and cultural
sociologists in particular, have largely ignored the promise of so-called big data.
Instead, cultural sociologists have left this wellspring of information about the arguments,
worldviews, or values of hundreds of millions of people from Internet sites and other
digitized texts to computer scientists who possess the technological expertise to extract
and manage such data but lack the theoretical direction to interpret their meaning.
The most obvious explosion in text-based data coincided with the rise of the
Internet. Between 1995 and 2008 the number of websites expanded by a factor of
more than 66 million, recently surpassing 1 trillion.
Although sociologists were
understandably concerned about digital divides in years past, these inequalities appear
to be steadily decreasing (DiMaggio and Bonikowski 2008; DiMaggio et al. 2001).
According to a 2012 survey, roughly half of all Americans visit a social media site such
as Facebook or Twitter each day, producing billions of lines of text in so doing. These
trends are markedly higher among younger people, suggesting that they may only
continue to grow over time.
Most of the text from social media sites is readily
accessible via simple computer programs.
Yet the outgrowth in text-based data on
the Internet is not limited to social media sites. Screen-scraping technologies can be
used to extract information from any number of Internet sites within time frames that
are only limited by digital storage capacity.
And the potential to collect such data is not
only tied to the future, but also the past. Since 1996, a non-profit organization known as
the Internet Archive has been storing all text from nearly every website on the Internet.
The outgrowth of text-based data is also not confined to the Internet. Thanks to new
digital technologies from fields as diverse as library science and communications, an
unprecedented amount of qualitative data is being archived. Google alone has already
created digital copies of nearly every single book ever written in collaboration with more
than 19 million libraries worldwide.
Academic data warehouses such as LEXIS-NEXIS or
ProQuest now contain digital copies of most of the world's journals, newspapers, and
magazines. The Vanderbilt Television News Archive contains copies of most major
newscasts produced since 1998.
The US National Science Foundation invested more than $15 million in Big Data projects in 2012, and will easily surpass this amount in upcoming years due to the development of new infrastructure for funding big data projects in collaboration with Britain's Economic & Social Research Council, the Netherlands Organization for Scientific Research, and the Canada Foundation for Innovation, among many others.
Jesse Alpert and Nissan Hajaj, "We knew the web was big…" Official Google Blog, July 25th, 2008 (http:// accessed January 2012).
Pew Internet & American Life Project, February 1st, 2012.
"Social Networking Popular Across Globe," Pew Research Global Attitudes Project, December 12, 2012.
Moreover, the US Library of Congress recently announced plans to release a database of every single Twitter message ever made. Current estimates place the total number of tweets that might be archived at more than 170 billion.
Web-scraping technologies have facilitated the collection of remarkably large datasets. Golder and Macy (2011), for example, recently conducted a study of more than 500 million Twitter messages produced in more than 84 countries over a 2-year period.
Though access to the entire Google book archive is limited by pay walls designed to protect copyright privileges, Google has released the entire dataset in "ngram" format, which allows scholars to analyze them via the automated text analysis tools discussed in further detail below.
An unprecedented amount of text-based data that describe
legislative debates, government reports, and other state discourse is also now available on
websites such as the National Archives of the United States and Great Britain. Qualitative
academic research is also being compiled within "meta-data" archives on an unprecedented
scale, from in-depth interview data to field notes. Continuing improvement in digital
speech recognition technologies has also generated even more text-based data, from
historical audio sources to local town hall meetings that are recorded and uploaded to
websites for posterity. Indeed, the remarkable growth in text-based data warrants a brief
thought experiment: what types of text or speech-based data are not currently being archived?
If the answer is that little, or very little, text is not being archived, then cultural sociology
must have a reckoning with big data alongside those in other fields.
Political scientists
are currently exploring the potential of social media to explain political mobilization
(Hopkins and King 2010; Livne et al. 2011). Public health scholars use Twitter to identify
trends in disease (Paul and Dredze 2011), and communications scholars claim it can be
used to predict shifts within the stock market (Bollen et al. 2011). Even humanities
scholars have invented the vibrant new field of digital humanities (e.g. Gold 2012;
Moretti 2013; Tangherlini and Leonard 2013). By comparison, cultural sociologists have
made very few ventures into the universe of big data, even though texts are a central object
of study in the field, in the form of primary documents, interview transcriptions, or field notes.
In this article, I argue that inattention to big data among cultural sociologists is doubly
surprising since such data are "naturally occurring," unlike survey research or cross-sectional
qualitative interviews, and therefore critical to understanding the evolution of meaning
structures in situ. That is, many archived texts are the product of conversations between
individuals, groups, or organizations instead of responses to questions created by
researchers who usually have only post-hoc intuition about the relevant factors in meaning-
making, much less how culture evolves in real time.
For all the promise of big data for cultural sociology, formidable obstacles remain.
First of all, the sheer volume of data can be overwhelming. Large corpora cannot be
coded by hand, and automated data mining techniques are of little utility if they are not
guided by theory. Second, big data is untidy. Although computer-assisted data classi-
fication and data reduction techniques have improved in the past decade, much big data
analysis remains computationally intensive and therefore out of reach for many cultural
sociologists, particularly those without any background in statistics or computer
programming. Third, and perhaps most importantly, there is much that is of interest
to cultural sociologists that is not easily reducible to text. The greatest challenge for
cultural sociologists interested in big data is to develop new techniques to measure the
unspoken or implicit meanings that occur in-between words. The preconscious cultural
scripts or frames that shape how people understand the world (e.g., DiMaggio 1997),
for example, are not always manifest in speech or text. Similarly, most big data eschews
the production of meaning through bodily interaction (e.g., Eliasoph and Lichterman
2003), though the future of big data may include new techniques to analyze the ever-
increasing volume of video on the Internet (Collins 2013; Lan and Raptis 2013).
See, for example, the Dataverse Network, the Interdisciplinary Consortium for Political and Social Research, and the United Kingdom's Qualidata archive.
The neologism "big data" has come to refer to many different types of data. Here, I use the term to refer to the increasingly large volume of text-based data that is often, though not always, produced through digital sources. As the remainder of this manuscript describes, these data are also unique because they are "naturally occurring," unlike survey data, which result from the intrusion of researchers into everyday life.
Exceptions described in additional detail below include Franzosi (2004), Lewis et al. (2008), Bail (2012), Bail (forthcoming), and several other works in progress.
"Real time" refers to the collection, presentation, or analysis of data at or very near the time it is being produced by social actors.
This article does not offer solutions to each of these limitations of big data. Instead, it
provides a critical survey of recent developments within the big data movement and
links them to outstanding theoretical debates and measurement challenges within
cultural sociology. These include the measurement of cultural environments or meaning
systems such as discursive fields; the classification of cultural elements such as frames
or schema within such systems; and tracing cultural processes over long segments of
time. In describing the promise of big data for cultural sociology, I also detail how the
latter field may address some of the most vexing challenges of the former given its
foundational interest in the systematic study of meaning. I provide only limited
discussion of the technical and logistical issues that arise in working with big data
because these issues are currently being addressed within separate literatures referenced
below. I also do not review the promising field of quantitative narrative analysis
because it has been addressed elsewhere.
This article is thus an invitation to cultural
sociologists curious about the potential of big data and a call to shatter the disciplinary
silos that inhibit collaboration between this field and those who lead the big data movement.
Mapping cultural environments
By and large, the central objects of study in cultural sociology have been confined to
micro-levels of analysis. For example, cultural elements such as symbolic boundaries
(e.g., Lamont 1992), cultural toolkits (e.g., Swidler 1986), cognitive schemas (e.g.,
DiMaggio 1997), and cultural frames (e.g., Benford and Snow 2003) have been defined
as judgments, classifications, or pre-conscious decisions that can only be measured
through close readings of texts such as interview transcripts, content analysis of key
texts, or ethnographic field notes. Yet as Swidler (1995) argues, "the greatest unanswered
question in the sociology of culture is whether and how some cultural elements
control, anchor, or organize others."
For example, how are cultural frames ordered
within vast discursive fields? Is there a space between such fields? How do cultural
frames shape the evolution of fields more broadly? Addressing such questions requires
meso- and macro-level analysis of the relationship between multiple cultural elements
or systems of meaning. One of the most promising dimensions of the big data
movement for cultural sociology is to enable new analyses at these larger levels of
analysis. As I describe below, one can now obtain every website, blog, social media
message, newspaper article, or television transcript on a given topic fairly easily.
The capacity to capture all, or nearly all, relevant text on a given topic opens
exciting new lines of meso- and macro-level inquiry into what I call cultural
environments (Bail forthcoming).
For a technical overview of techniques designed for analysis of Big Data, see Manning and Schuetze.
For an overview, see Franzosi (2009).
See also Ghaziani and Baldassarri (2011).
Ecological or functionalist interpretations of culture
have been unpopular with cultural sociologists for some time, most likely because the
subfield defined itself as an alternative to the general theory proposed by Talcott Parsons
(Alexander 2006). Yet many cultural sociologists also draw inspiration from Mary
Douglas (e.g., Alexander 2006; Lamont 1992; Zelizer 1985), who, like Swidler, insists
upon the need for our subfield to engage broader levels of analysis. "For sociology to
accept that no functionalist arguments work," writes Douglas (1986, p. 43), "is like cutting
off one's nose to spite one's face." To be fair, cultural sociologists have recently made
several programmatic statements about the need to engage functional or ecological
theories of culture. Abbott (1995), for example, explains the formation of boundaries
between professional fields as the result of an evolutionary process. Similarly, Lieberson
(2000) presents an ecological model of fashion trends in child-naming practices. In a
review essay, Kaufman (2004) describes such ecological approaches to cultural sociology
as one of the three most promising directions for the future of the subfield.
The concept of discursive fields is perhaps the most promising theoretical construct
to advance an ecological approach to cultural sociology (Bourdieu 1975; Foucault
1970;Martin2003; Wuthnow 1993). Yet field theory is often castigated for being
tautological, or assuming the existence of invisible or intangible social forces that
reproduce structures of inequality or patterns of cultural differentiation without ever
directly observing them. The boundaries of fields are usually unobserved in empirical
studies because of the considerable methodological obstacles involved. Apart from
Eyal (2009), cultural sociologists have scarcely theorized the outer limits of cultural
fields, the spaces between them, or the relationships among multiple fields.
This is a
significant limitation since most field theory makes several assumptions that are
inherently ecological. For example, many studies assume that relationships between
actors or groups of actors within a field produce a polarity that sustains or reproduces
uneven power relationships or access to institutions (Bourdieu 1985; Fligstein and
McAdam 2011; Wuthnow 1993). Others borrow more directly from ecological or
evolutionary theory to explain the competition for attention or resources within fields
(Abbott 2001;Kaufman2004;Lieberson2000), or the ability of cultural entrepreneurs
to exploit niches within such environments (e.g., Mische 2008).
Despite the implicit ecological reasoning of field theory, most applications of this
framework rely upon micro- or meso-level measurement strategies. For example, many
studies identify key actors or institutions within fields and trace their influence over
other parts of the fields. Other studies focus upon conflict or classification struggles
within fields in order to identify such influential actors (Bourdieu 1990). As a result,
these types of studies only observe the consequences of field-level processes rather than
meso- or macro-level relationships between social actors and cultural elements that
most scholars believe create such social spaces.
These micro-level measurement
strategies are typically necessary because most discursive fields are so broad that an
entire team of researchers working for several years could only map a fraction of all the
texts, transcripts, or archives that define them. The size of most cultural fields has
become even more daunting with the rise of the Internet. Indeed, a researcher could
easily follow links between websites for hours only to forget where, when, or why they
shifted focus from one site to another.
See also Mark (2003).
One exception is Evans and Kay's (2008) study of field overlap.
Exceptions include Mohr and Guerra-Pearson (2010) and Bail (2012).
The big data movement has made extracting all text from a discursive field easier
than ever before.
Massive databases already exist that classify texts into meaningful
social categories. For example, services such as LEXIS-NEXIS and Pro-Quest have
sophisticated searchable indexes that cover industries, geographical location, time, or
different types of text (e.g., newspapers, newswires, or television transcripts). Simple
Boolean operators such as "AND" and "OR" can be used to further specify meaningful
cultural environments within each of these sub-samples.
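The logic of such Boolean sub-sampling is simple enough to sketch in a few lines of code. The snippet below is an illustrative Python sketch, not tied to any particular archive's query syntax; the `boolean_filter` helper and the sample corpus are invented for illustration.

```python
def boolean_filter(documents, all_of=(), any_of=()):
    """Keep documents containing every term in `all_of` (AND)
    and, if `any_of` is given, at least one of its terms (OR)."""
    selected = []
    for doc in documents:
        words = set(doc.lower().split())
        if all(t in words for t in all_of) and \
           (not any_of or any(t in words for t in any_of)):
            selected.append(doc)
    return selected

corpus = [
    "senate debates immigration reform",
    "immigration rally draws thousands",
    "senate passes budget bill",
]
# Equivalent of the query: immigration AND (senate OR rally)
hits = boolean_filter(corpus, all_of=("immigration",), any_of=("senate", "rally"))
```

Real archive interfaces apply the same set logic to indexed metadata rather than raw word lists, but the filtering principle is identical.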
Yet perhaps the most
powerful innovation of the big data movement for the mapping of cultural environ-
ments has been screen scraping, or automated extraction of text from websites. Screen
scraping is typically used to mine text or other data from web pages, though it can also
be used to extract text from scanned images using Optical Character Recognition
(OCR) technologies. A variety of data archives have developed searchable indexes
based on such screen-scraping technologies. Google, for example, allows Boolean
searches of its archives of books, blogs, government documents, and major US
newspapers and magazines.
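A minimal sketch of the text-extraction step of screen scraping, using only the Python standard library, is shown below. The `extract_text` helper is hypothetical; in practice the raw HTML would first be fetched with a tool such as `urllib.request.urlopen` before being stripped of markup.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Strip markup from raw HTML, returning only the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Production scrapers add error handling, rate limiting, and site-specific parsing rules, but this is the core operation that turns a web page into analyzable text.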
But new technologies produced by the big data movement have also advanced
automated extraction of text far beyond simple indexes, Boolean searches, and screen
scraping. In particular, new techniques have been developed to exploit the relational
nature of many sources of big dataparticularly those from the Internet. For example,
Gong (2011) recently introduced new software that fuses snowball sampling methods
with screen-scraping technologies. The user simply inputs a starting website and a
classifying rule such as a Boolean search term or one of the other classification
algorithms described in further detail below. The software then visits each site that is
linked to the starting website and uses the classifying rule to decide whether it should
be included in the sample. If so, the program extracts all text from the site and repeats
the process of "spidering" links across multiple waves that are only constrained by
computer memory and processing power. Given a number of different starting sites and a
sufficient number of waves, the Snowcrawl software produces a total sample of all
websites pertaining to a given topic. Although this tool is currently limited to the
Internet, a number of other qualitative data archives store relational data that could
potentially be analyzed using similar automated snowball methods. What is more, the
majority of newspapers, television stations, journals, or other texts of interest to cultural
sociologists are now available on the web.
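The snowball logic described above can be sketched independently of any particular software. The following Python sketch assumes hypothetical `get_links`, `get_text`, and `matches` functions supplied by the researcher; it illustrates the wave-by-wave sampling procedure rather than reimplementing the Snowcrawl software itself.

```python
from collections import deque

def snowball(start_urls, get_links, get_text, matches, max_waves=3):
    """Breadth-first snowball sample: follow links wave by wave,
    keeping (and expanding from) only pages that satisfy the
    classifying rule `matches`."""
    sampled, seen = {}, set(start_urls)
    frontier = deque((url, 0) for url in start_urls)
    while frontier:
        url, wave = frontier.popleft()
        text = get_text(url)
        if not matches(text):
            continue  # excluded pages are not spidered further
        sampled[url] = text
        if wave < max_waves:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, wave + 1))
    return sampled

# Toy example: the pages and links stand in for real scraped data
pages = {"a": "immigration debate", "b": "sports scores", "c": "immigration policy"}
links = {"a": ["b", "c"], "b": [], "c": ["a"]}
sample = snowball(["a"], links.__getitem__, pages.__getitem__,
                  lambda text: "immigration" in text)
```

The classifying rule here is a simple keyword test, but any of the classification algorithms discussed below could be substituted.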
A second promising tool for extracting large amounts of data from the web or
qualitative data archives is the Application Programming Interface (API). These web-
based tools provide an interactive interface with large data archives, designed to
enable targeted data extraction. They were developed primarily for consumer purposes,
such as the creation of third-party applications for social media sites such as
Facebook, Twitter, or Google, but a number of academics have begun to use them as
data collection tools as well (Bail 2013a; Gaby and Caren 2012; Livne et al. 2011).
While automated data extraction methods are particularly useful for mapping the contours of discursive fields, it is important to note that such techniques do not capture the deeper preconscious cultural elements that undergird social fields as Bourdieu and others have theorized them (e.g., Bourdieu 1990; Fligstein and McAdam 2011; Martin 2003). I return to the question of whether big data techniques can be leveraged to classify such cultural elements in the following section as well as in my discussion and conclusion.
For example, one might define a discursive field by identifying all texts with a certain set of keywords or within a certain search index offered by text archives.
Even conventional media outlets such as the New York Times now offer APIs that enable
users to search and download articles or user comments from their website. APIs are
superior to other forms of data extraction not only because they enable more sophisti-
cated targeting of different types of textsuch as Twitter messages about the Arab
Springbut also because such sites typically record a vast array of information about
the users of their sites as well as their behavior online. For example, Twitter's API
enables rapid extraction of information about the online social networks of individual
users. Facebook's and Google's APIs enable direct interface with their massive archives of
web content, and also include information about the size, geographic location,
and demographic characteristics of each site's audience.
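A schematic example of API-based extraction follows. The endpoint, parameter names, and JSON layout are hypothetical (real APIs such as Twitter's or the New York Times's differ in detail and require authentication); the point is simply the request-and-parse pattern that all such tools share.

```python
import json
import urllib.parse
import urllib.request

def search_api(base_url, query, page=0, fetch=None):
    """Query a hypothetical JSON search API and return document texts.

    `fetch` can be swapped out for testing; by default it performs
    an HTTP GET with urllib."""
    params = urllib.parse.urlencode({"q": query, "page": page})
    url = f"{base_url}?{params}"
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read().decode("utf-8")
    payload = json.loads(fetch(url))
    return [doc["text"] for doc in payload.get("docs", [])]

# Simulated response in place of a live request
canned = json.dumps({"docs": [{"text": "First article"},
                              {"text": "Second article"}]})
texts = search_api("https://api.example.org/search", "immigration",
                   fetch=lambda url: canned)
```

Looping over the `page` parameter would retrieve the full result set, which is how researchers assemble near-complete samples from such services.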
Classifying culture
Obtaining total or near total samples of text on a given topic is a remarkable feat given
that it was nearly unthinkable only a decade ago. Yet such giant samples are of little
utility if they cannot be classified in a meaningful manner. Cultural sociology has been
fascinated with classification since its inception because it was largely inspired by the
Durkheimian idea of classification struggles (e.g., Barth 1969; Bourdieu 1975; Douglas
1966; Latour 1988). For example, Gieryn (1999) highlights the critical role of social
classification in the evolution of scientific fields. Lamont (1992, 2000) explains how
class and racial boundaries shape the process of group formation. Finally, Espeland and
Stevens (1998) make a broader argument about the key role of commensuration in
producing social power.
Yet for all the theoretical interest in the process of classifi-
cation, cultural sociologists seldom discuss the appropriate way to measure social
categories (Lamont and White 2009). Most studies either rely upon in-depth interviews
or case studies that highlight the social construction of ranking within institutions. The
lack of consensus about how to classify data has even prompted some critics to accuse
cultural sociologists of the reification of social classifications according to their theo-
retical persuasion (e.g., Biernacki 2012).
To date, cultural sociologists have scarcely explored the promise of automated text
analysis to classify texts.
Where these techniques have been used, they have involved relatively
primitive approaches to automation that simply identify keywords or phrases. This approach
is severely limited because it requires the researcher to have an a priori sense of which terms
are well suited to address the theoretical question of interest. Moreover, it eschews the
broader context of words within sentences. One solution to this problem is to evaluate the
co-prevalence of words within sentences using Global Regular Expression Print (GREP)
commands available in qualitative software analysis programs such as Atlas.TI or WordStat.
Facebook's API requires user authentication to access these data. Therefore, one must either access only publicly available data or obtain an authentication token from a Facebook page's owner. Elsewhere, I argue that app-based technologies are the most promising data collection tools to overcome such challenges. See Bail (2013b).
For a recent review of this literature, see Lamont (2012).
Notable exceptions discussed in further detail below include Mohr (1998), Franzosi (2004), Bearman et al. (1999), Bearman and Stovel (2000), Smith (2007), and Bail (2012).
Yet these approaches nevertheless fail to recognize important nuances in the
use of language. For example, a GREP search for sentences with the terms "President"
and "hate" would reveal both "I hate the President" and "I'd hate to be President."
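The ambiguity in keyword co-occurrence searches can be reproduced with a few lines of Python. The sketch below mimics a GREP-style query for sentences containing both terms; both the hostile and the sympathetic sentence are returned.

```python
import re

def sentences_with_terms(text, terms):
    """Return sentences containing every term, case-insensitively,
    mimicking a GREP-style co-occurrence search."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences
            if all(re.search(rf"\b{re.escape(t)}\b", s, re.IGNORECASE)
                   for t in terms)]

text = "I hate the President. I'd hate to be President. The budget passed."
hits = sentences_with_terms(text, ["president", "hate"])
# Both the hostile and the sympathetic sentence match: keyword
# co-occurrence alone cannot distinguish them.
```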
Recent technological advances within the fields of computer science, pattern identifica-
tion, and linguistics have produced a variety of superior alternatives. I begin by reviewing
"unsupervised" text classification techniques that rely exclusively on computer algorithms to
create meaningful groupings of texts. For example, recent studies have invoked a number of
different forms of multi-dimensional scaling or cluster analysis to classify texts (e.g.,
Grimmer and King 2011; Livne et al. 2011).
These techniques replace each unique word
in a document with a number and then use various metrics to calculate dissimilarities among
all texts in the sample. These measures may be plotted within multidimensional space in
order to identify meaningful groupings of documents. A substantial problem with cluster
analysis is that the results are highly sensitive to the researcher's assumptions about the
number of possible clusters (k), as well as the mathematical distance metrics employed within each
algorithm. These idiosyncrasies can be controlled, however, if multiple forms of cluster
analysis are used in tandem. Grimmer and King (2011), for example, have developed
software that applies all existing variants of cluster analysis to large text corpora. They
apply this powerful tool to thousands of political texts by or about US presidents in order to
classify their ideological position on a range of substantive issues.
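To make the dissimilarity step concrete, the sketch below computes one common text distance, cosine dissimilarity between word-count vectors, in plain Python. It illustrates only the measurement step that clustering algorithms build upon; the choice of metric is exactly the kind of researcher assumption noted above.

```python
import math
from collections import Counter

def cosine_dissimilarity(doc_a, doc_b):
    """Represent each document as a word-count vector and return
    1 minus the cosine similarity between the two vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    # Identical texts score ~0; texts sharing no words score 1
    return 1 - dot / norm if norm else 1.0
```

A clustering routine would compute this distance for every pair of documents and then group the documents that lie closest together in the resulting space.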
Another promising development within the big data movement for cultural sociologists is
the burgeoning field of "machine learning," and specifically the field of topic modeling. This
new field resulted from collaboration between linguists and computer scientists designed to
identify hidden or latent themes within large corpora.
Topic Models identify such themes
using probabilistic models that evaluate the co-occurrence of words. The most popular form
of topic modeling is Latent Dirichlet Allocation (LDA), which models each document as a
mixture of latent topics, with each topic a probability distribution over words, and uses this
generative process to infer the probability that a document contains information about a topic
given the distribution of words therein.
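The generative story behind LDA can be illustrated with a toy simulation. The sketch below draws a document's topic mixture from a Dirichlet distribution (via normalized Gamma draws) and then samples each word by first sampling a topic. The two "topics" are invented for illustration; real LDA software inverts this process, inferring the hidden topics from observed words.

```python
import random

def generate_document(topic_word_dists, alpha, n_words, rng=random.Random(0)):
    """Toy version of LDA's generative story: draw the document's
    topic mixture from a symmetric Dirichlet(alpha), then draw each
    word by first sampling a topic, then a word from that topic."""
    # Dirichlet sample via normalized Gamma draws
    gammas = [rng.gammavariate(alpha, 1) for _ in topic_word_dists]
    total = sum(gammas)
    theta = [g / total for g in gammas]  # per-document topic proportions
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        vocab, probs = zip(*topic_word_dists[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return words

# Two hypothetical topics: one about the arts, one about markets
topics = [{"museum": 0.5, "painting": 0.5},
          {"stock": 0.5, "market": 0.5}]
doc = generate_document(topics, alpha=0.5, n_words=10)
```

A small `alpha` concentrates each document on few topics; a large `alpha` spreads it across many, which is one reason validation of the fitted model matters.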
Dozens of studies have used LDA or related Bayesian
approaches to infer latent topics in scientific journals, news articles, or blog posts (e.g., Blei
and Lafferty 2007; Hopkins and King 2010; Quinn et al. 2010). Despite these advances,
topic models have several considerable limitations. For example, the method assumes that
the order of words in a document does not matter, as well as the order of documents within
the broader sample. Most topic models also require that each document be assigned to
mutually exclusive categories, and do not recognize relationships between topics themselves.
Basic topic models also do not recognize that topics may shift or combine over time.
Finally, topic models, not unlike cluster analysis, must be validated in order to verify the
appropriate number of topics within a corpus.
This is particularly difficult given that many
cultural sociologists are interested in analyzing broad, unstructured samples of text such as
those described in the previous section of this article.
Mohr (1998) made early calls for cultural sociologists to adopt these methods to classify meaning structures, yet
they were mostly ignored even as they became widely used by cognitive anthropologists (e.g., D'Andrade 1995).
For an overview of this field, see Blei (2012).
For a technical overview of LDA, see Blei et al. (2003).
A number of scholars have proposed validity measures for LDA, most recently Blei (2012). Most of these
emphasize comparisons of topic models via log-likelihoods or harmonic means, yet most proponents of topic
modeling agree that they must also be validated via qualitative inspection of individual topics within subsets of
large samples.
Proponents of topic modeling have already begun to develop a number of solutions
to these limitations, though they are too technical to discuss here.
Among the more promising recent developments in the field is the advent of
"supervised" topic modeling (Blei and McAuliffe 2010). In this technique, a human coder
identifies topics within a subset of documents, and topic models use these assignments
to assess probability instead of assuming that the distribution of topics across docu-
ments is random. Supervised text classification was first introduced within social
science by Hopkins and King (2010), who used this approach to assess public opinion
of presidential candidates expressed upon thousands of political blogs during the 2008
election. Given a sufficient number of "training documents" produced through in-
depth coding, these authors argue that their technique classifies sentiment about
presidential candidates more reliably than human coders themselves.
While such
claims have not yet been widely validated, supervised learning techniques hold con-
siderable promise for the purpose of identifying cultural elements within texts and
further improving the snowball sampling methods described above.
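To make the logic of supervised classification concrete, here is a minimal multinomial Naive Bayes classifier in pure Python. This is an illustrative stand-in, not Hopkins and King's actual method (which estimates category proportions nonparametrically rather than labeling individual documents), and the training snippets are invented:

```python
import math
from collections import Counter

class NaiveBayesClassifier:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing."""

    def fit(self, documents, labels):
        self.priors = Counter(labels)  # class frequencies in the training set
        self.words = {label: Counter() for label in self.priors}
        for doc, label in zip(documents, labels):
            self.words[label].update(doc.lower().split())
        self.vocab = {w for counts in self.words.values() for w in counts}
        return self

    def predict(self, document):
        tokens = document.lower().split()

        def log_posterior(label):
            total = sum(self.words[label].values())
            score = math.log(self.priors[label] / sum(self.priors.values()))
            for w in tokens:
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((self.words[label][w] + 1) /
                                  (total + len(self.vocab)))
            return score

        return max(self.priors, key=log_posterior)

# Hand-coded "training documents," standing in for in-depth coding.
train_docs = ["inspiring speech great candidate", "love this candidate",
              "awful policy terrible candidate", "hate this candidate"]
train_labels = ["positive", "positive", "negative", "negative"]
clf = NaiveBayesClassifier().fit(train_docs, train_labels)
clf.predict("what an inspiring and great speech")  # -> "positive"
```

The key design point carries over to the real methods: the human-coded training set does all the conceptual work, and the algorithm merely generalizes those codes to the remaining documents.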
Perhaps the most important question for cultural sociologists interested in employing topic models is whether they can be used to classify cultural elements such as frames, symbolic boundaries, or cultural toolkits. A number of current studies suggest topic models may be used to capture such nuanced cultural elements. For example, DiMaggio et al. (forthcoming) argue that topic models can be used to identify frames about arts funding. Polletta is currently using topic modeling to identify hidden frames in Internet discussions about cap-and-trade. Hopkins (2013) employs topic models to measure frames about the Affordable Care Act. Yet a key issue remains whether cultural frames, as Goffman (1974) first defined them, can be represented by groups of words. While the face-work that Goffman emphasized is clearly not measurable through text, Goffman himself used texts extensively throughout his work, including biographies, newspaper clippings, and transcripts of interactions. Although Goffman emphasized the absence of certain words as much as the presence of others, these omissions could be modeled effectively because they would shape the probability distributions around groups of words that LDA analyzes to create classifications of texts. Nevertheless, the quality of supervised topic modeling is only as good as the codes developed by human coders themselves. Therefore, cultural elements that are highly nuanced or situation-based are not easily captured via this technique because of low inter-coder reliability.
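Inter-coder reliability on such training codes can at least be quantified before they are fed to an algorithm. The following is a minimal sketch of Cohen's kappa for two coders (Krippendorff's alpha, discussed in the note below, generalizes to more coders and missing data); the example codes are invented:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' nominal codes."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if both coders assigned codes independently
    # at their observed marginal rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two coders labeling six documents for the presence of a frame.
a = ["frame", "frame", "no_frame", "frame", "no_frame", "no_frame"]
b = ["frame", "no_frame", "no_frame", "frame", "no_frame", "no_frame"]
kappa = cohens_kappa(a, b)  # one disagreement out of six documents
```

Low kappa on a nuanced concept is an early warning that an automated classifier trained on those codes will reproduce, not resolve, the ambiguity.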
For example, see Blei and Lafferty (2006), Wallach (2006), Chang et al. (2009), and Hopkins and King (2010).
See also Grimmer (2010) and Quinn et al. (2010).
In particular, Hopkins and King (2010) argue that coding more than 500 documents produces diminishing returns in the reliability of automated text analysis.
For example, a supervised topic model can be used to determine whether websites should be included in a directed web-crawl such as SnowCrawl to capture sites that discuss a theme or topic without using a single keyword.
See Baumer et al. (2013).
Consider, for example, the diaries analyzed in Goffman (1963) or the newspaper clippings in Goffman
(1974). Also, textual descriptions of face-work or other unspoken forms of bodily interaction in the form of
field notes could potentially be analyzed using topic models.
For a discussion of the challenges of achieving high levels of inter-coder reliability in cultural analysis, see
Krippendorff (2003).
On the other hand, the meticulous coding definitions required by topic models may also provide an opportunity for cultural sociologists to contribute new methodologies to the big data movement. Indeed, the use of generative and multi-stage coding schemes has been a key concern of cultural sociology in the form of "thick description" (e.g., Geertz 1973), "middle-range theory" (Merton 1949), "structural hermeneutics" (Alexander and Smith 2001), and "paradigmatic clusters" (Weber 2005). Each of these approaches emphasizes that researchers should move back and forth between different levels of analysis to tune their coding schemes and to assess the scope conditions of a particular finding. To this end, the expertise of cultural sociologists may be applied to repeated stages of supervised topic models, elaborating classification systems as if they were Russian dolls, to borrow Bourdieu's metaphor. Mohr et al. (2014), for example, have advanced this technique in their study of US National Security Strategies over a 22-year period. By developing increasingly precise codes from iterative qualitative analysis of small subsets of this large corpus of text, these scholars have developed increasingly promising topic models that can later be applied to the entire sample. We need further empirical validation of such techniques. At the very least, however, such methods provide a systematic means of focusing qualitative microscopes within the increasingly overwhelming world of big data.
Tracing the evolution of cultural environments
One of the most promising elements of the big data movement is that so much of the
qualitative data that has been collected is longitudinal. For example, the Library of Congress's archive of all Twitter messages will enable unprecedented analysis of how
different issues rise and fall over time. The Internet Archive and screen-scraping
technologies could be used to map shifts in the discourses of different types of websites
over time. Likewise, the massive newspaper and television transcript archives now
available could be used to analyze similar issues over the past century. These longitu-
dinal data are particularly promising because so many of the most pressing questions in
cultural sociology concern change over time. While Swidler's (1986) toolkit analogy has received extensive attention in recent decades, for example, her call for future studies to examine the transition from unsettled to settled historical periods has been mostly ignored.
While Sewell's (1996) theory of events has inspired considerable
interest, few studies place such events in broader historical context.
Finally, Lamont's (1992) work reveals considerable cross-national differences in the salience of symbolic
boundaries. Yet we urgently need broad historical analyses to identify how such
divergent meaning systems evolved over time. Each of these outstanding questions
requires methods capable of capturing broad-scale cultural change.
In addition to identifying cultural elements such as frames or symbolic boundaries, automated text analysis can be used to differentiate social actors or key events within large qualitative datasets. Cultural sociologists can make huge strides towards advancing theories of social change simply by mapping the relations among cultural elements, actors, and events over time. The literature on quantitative narrative analysis has already established how analysis of relationships between actors and events can be used to map broad historical sequences (e.g., Bearman et al. 1999; Bearman and Stovel 2000; Franzosi 2004; Smith 2007). Incorporating cultural elements identified via topic modeling into such methods would open exciting new lines of inquiry about the interpenetration of culture and structure. If topic modeling can be used to identify actors and organizations as well as the cultural elements they produce, for example, social network relationships might be mapped onto cultural patterns, or vice versa. At a minimum, mapping the relationships among cultural elements, actors, and events would help focus in-depth qualitative analysis of key historical shifts or "turning points" (Abbott 1997) where meaning structures change.
But see Cerulo (1998), Wagner-Pacifici (2010), and Bail (2012).
Still, historical analyses with big data are limited by the availability of texts produced during earlier periods that were amenable to digitization. This presents a number of important limitations, including pervasive illiteracy during early historical periods as well as the tendency for only elite accounts of historical events to survive the passage of time. Still, comparative-historical sociologists face these problems regardless of whether they are working with big data. Furthermore, primary documents obtained through archival analysis can be easily digitized through photographs, scanning, and text-recognition technologies.
One problem, of course, is that cultural elements themselves often change throughout such broad-scale historical transformations. Sewell, for example, argues the very concept of revolution was developing at the same time that murderous mobs stormed the Bastille, setting off the French Revolution before they knew precisely what they were doing. Topic models are ill-equipped to capture such nuances unless human coders calibrate them repeatedly across multiple time periods. Even then, slight shifts in cultural elements may be difficult to code automatically because human coders may struggle to achieve high inter-coder reliability. Here again, new tools for automated text analysis may prove useful. For example, several new methods have been developed to identify dissimilarities between pairs of documents. Primitive forms of these techniques simply count the number of words shared between the two documents. Yet recent advances in plagiarism detection software employ "word maps" that utilize thesaurus data in order to identify "near" matches between two documents as well (e.g., Bail 2012).
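Both the word-counting and the "word map" approaches can be sketched with cosine similarity over word counts, where a small synonym table stands in for the thesaurus data such software draws on; the `SYNONYMS` table below is an invented toy, not a real thesaurus:

```python
import math
from collections import Counter

# Toy stand-in for thesaurus-derived "word maps."
SYNONYMS = {"slay": "kill", "murder": "kill", "mob": "crowd"}

def normalize(text):
    """Lowercase, tokenize, and collapse synonyms onto one canonical form."""
    return [SYNONYMS.get(w, w) for w in text.lower().split()]

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the two documents' word-count vectors."""
    a, b = Counter(normalize(doc_a)), Counter(normalize(doc_b))
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# A "near" match: different surface words, identical after synonym mapping.
cosine_similarity("the mob did murder", "the crowd did slay")  # -> 1.0
```

Without the synonym mapping these two sentences would share only half their words; with it, the comparison treats them as the same statement, which is the point of the word-map refinement.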
Once again, these document comparison tools will not identify cultural elements by
themselves. Yet they may be particularly powerful when combined with topic models
and micro-level qualitative analysis of key texts or transitional moments within history.
Another major advantage of big data is that much of it includes detailed information about relationships between social actors. This is particularly true of social media sites such as Twitter or Facebook, but advances in library science are also creating hyperlinks between texts within archival collections as well. Using Twitter's Application Programming Interface, one can easily extract not only all the messages produced by a single actor, but also the precise location of this actor within a broader social network, including measures of both "in-" and "out-degree." Livne et al. (2011), for example, extracted 460,000 tweets from all candidates for US House, Senate, and gubernatorial elections between 2006 and 2010. Their data not only reveal the partisan networks of such social actors, but also patterns in the similarity of the language they post on Twitter via cluster analysis. Through this analysis, Livne et al. document the meteoric rise of the Tea Party in recent elections, and the realignment of mainstream conservative networks that ensued. These and other datasets could be used to address a number of key questions at the intersection of cultural sociology and network theory, such as Pachucki and Breiger's (2010) argument about "cultural holes" within networks, or Vaisey and Lizardo's (2010) theory that cultural worldviews influence network composition.
If key actors or events are already known, simple keyword searches or Global Regular Expression Print (GREP) commands may also be used to identify them. If actors or events are not known, they can be identified through keyword counts that remove common words such as "the" or "and." Once actors or events are defined, topic models may be used to identify them as well. A number of computer scripts have also been recently developed to identify names within big data without such intermediary steps, such as the Natural Language Toolkit and the Stanford Parser.
See also Sewell (1996) and Wagner-Pacifici (2010).
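The network side of such data is straightforward to illustrate: in- and out-degree can be computed directly from a list of tweets by treating each @mention as a directed tie. The tweets and the simple tokenizer below are invented for illustration; real Twitter data would arrive via the API rather than as tuples:

```python
from collections import Counter

def degrees_from_mentions(tweets):
    """Directed in/out-degree from (author, text) pairs, treating each
    @mention as a tie from the author to the mentioned user."""
    in_degree, out_degree = Counter(), Counter()
    for author, text in tweets:
        for token in text.split():
            if token.startswith("@") and len(token) > 1:
                target = token[1:].strip(".,:;!?")  # drop trailing punctuation
                out_degree[author] += 1
                in_degree[target] += 1
    return in_degree, out_degree

tweets = [
    ("rep_smith", "Proud to stand with @rep_jones on this bill"),
    ("rep_jones", "Thanks @rep_smith! And thanks @gov_lee."),
    ("gov_lee", "Signing it today."),
]
in_deg, out_deg = degrees_from_mentions(tweets)
in_deg["rep_smith"], out_deg["rep_jones"]  # -> (1, 2)
```

Joining these degree counts to the text of the same tweets is exactly the move that lets language similarity be overlaid on network position.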
The potential to assemble large datasets that describe cultural elements, actors, events, and social networks over time may also encourage critical advances in field theory. Many of the most pressing questions in this literature concern the evolution of fields over time (Fligstein and McAdam 2011; Padgett and Powell 2012). For instance, a number of recent studies have begun to analyze the emergence of fields (e.g., Armstrong 2002; Bartley 2007). By and large, these case studies are unable to investigate a variety of broad cultural processes that may occur between discursive fields. For example, do most fields emerge out of the dissolution of others? Or do fields develop when the space between any two pre-existing fields is sufficiently broad (Eyal 2009; Medvetz 2012)? Big data may also enable analysis of a number of intriguing questions within individual fields as well. For example, do discursive fields have carrying capacities for new forms of culture? Do certain actors gain power within discursive fields by exploiting niches between rival factions? And what is the relationship between the core and periphery of discursive fields (e.g., Bail 2012)?
Another exciting feature of big data is that it often includes geo-coded data. For example, Twitter and Facebook record the geographic location of their users. This information is also often recorded in the comments sections of websites. Finally, analytics or "insights" data often include the latitude and longitude of visitors to different websites via Internet Protocol (IP) addresses or other geographic identifiers such as city names. Political scientists have even mined visual data on ethnic conflict from Google Earth (Agnew et al. 2008). The potential to look at the relationship between Cartesian coordinates and cultural elements could create a new subfield within cultural sociology that analyzes the geography of meaning. Such a field might examine questions such as: 1) Do cultural frames or symbolic boundaries cluster at the national level or supranational levels? 2) Does physical proximity breed more convergence of worldviews than online interaction? 3) Do the answers to these questions change over time as the forces of globalization push people ever closer together?
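Work with such geo-coded data typically begins with a distance measure. The following is a minimal sketch of the haversine great-circle distance, one building block for asking whether physical proximity breeds convergent worldviews; the coordinates below are for New York and Los Angeles:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# Roughly 3,900 km between Manhattan and downtown Los Angeles.
haversine_km(40.7128, -74.0060, 34.0522, -118.2437)
```

Pairing such distances with measures of textual similarity between geo-tagged posts would let proximity and worldview convergence be correlated directly.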
Cultural sociology has long suffered from an imbalance of theory and data (Ghaziani 2009). Yet the big data movement may radically alter this equilibrium. The big data movement began with the Internet and social media, but the future of the field will also entail increasingly ambitious forays into the past. As digitized historical archives continue to expand and social scientists coordinate new ways of organizing qualitative metadata with rich detail about the evolution of meaning, cultural sociologists can no longer afford to ignore the big data movement. Above, I argued that the integration of in-depth qualitative coding techniques pioneered by cultural sociologists and anthropologists can be leveraged to improve already powerful automated text analysis techniques produced by computer scientists, linguists, and political scientists. This synthesis will enable cultural sociologists to achieve theoretical progress on questions that were once thought unmeasurable. Proponents of big data may also gain key insight from cultural sociologists about how to further hone their tools to map the contours of cultural fields, classify cultural elements, and trace the evolution of culture over time.
On the concept of cultural holes, see also Lizardo (in this issue).
Yet for all of my optimism about the marriage of cultural sociology and big data, formidable obstacles remain. Perhaps the most vexing problem is that big data often does not include information about the social context in which texts are produced (Griswold and Wright 2004). Although we are able to collect millions of blog posts about virtually any issue, these data typically include little or no information about the authors of such posts, or those who comment upon them. Twitter data are publicly available, but provide very little information about the social context in which tweets are produced. Other sites such as Facebook collect massive amounts of data about social context but are often unable to share them with researchers because of concerns about user privacy. Sources of big data outside social media also often lack important information about the social context in which texts are produced. Collecting every newspaper article on a political topic is of marginal utility absent in-depth analysis of the political and institutional processes that lead media to gravitate towards one issue over another. Yet these obstacles are not without solutions that might build upon the progress of cultural sociologists in developing mixed-method research designs. For example, qualitative or quantitative surveys of Twitter users could be conducted to place their online behavior within broader context. Or, large-scale analyses of media data or historical surveys might be used to identify compelling puzzles for comparative historical analysis. In theory, big data could also be used to guide ethnographic interventions as well, or at least help place the findings of ethnography within broader cultural fields. In brief, big data methods should be viewed as a complement, not a replacement, for the tried and tested techniques of cultural sociology.
A second major challenge is that computer-assisted coding can never be more reliable than the codes themselves. Cultural sociologists seldom discuss coding criteria or inter-coder reliability, in part because the definition of many of our core concepts is highly contested (Biernacki 2012). One need only read the literature on framing, for example, to witness significant disagreement about whether and how frames should be measured or operationalized. While these debates will not be easily resolved, the integration of big data and cultural sociology will depend critically upon our capacity to converge upon several broadly accepted definitions of these core concepts. Yet big data may actually facilitate such conversations, since conceptual vagueness among cultural sociologists results in part from our paucity of shared datasets. Cultural sociologists are also looking across disciplinary lines for guidance in making core concepts more concrete. For example, Mohr et al. (2013) have fused the literatures on narrative from linguistics with studies of social networks and topic modeling from sociology and computer science. Polletta is currently synthesizing linguistics and cultural sociology using new visualization techniques that enable researchers to explore how making people aware of their cultural schemas shapes their behavior during democratic deliberation. Finally, Ignatow and Mihalcea (2013) propose a model for big data analysis that synthesizes neuroscience and Bourdieusian practice theory.
It is also worth noting that texts that cannot be collected because they are not in the public domain may ultimately have less impact upon the evolution of broader cultural domains precisely because they are hidden from public view. This underlies a broader pragmatist argument about the need to focus attention upon the consequences of social action (e.g., Johnson-Hanks et al. 2011; Tavory and Timmermans 2013). An interesting analogue is the debate about the social construction of ethnicity via the enumeration of different groups by the US Census (cf. Loveman and Muniz 2007). I thank Andy Perrin for bringing this issue to my attention.
For a detailed analysis of conceptual and methodological ambiguities in the measurement of frames, see Scheufele (1999).
A final concern for cultural sociologists is the relatively high entry cost for those who wish to develop the technical expertise currently necessary to work with big data. Although these costs are rapidly decreasing thanks to simple web-based tools for big data analyses, formalizing these techniques for cultural sociology will require a new generation of scholars with both technical expertise and theoretical ambition. For now, the big data movement urgently requires the guidance of theoretically and qualitatively oriented cultural sociologists. Little can be learned from big data without big thinking. While data mining may reveal interesting patterns in large text corpora or compelling visualizations, many pieces of hay have come to resemble needles. Therefore, the future of the big data movement hinges upon collaboration among cultural sociologists, computer scientists, and others to teach computers to differentiate types of meaning and their shifting relationships over time.
Acknowledgments I thank Elizabeth Armstrong, Alex Hanna, Gabe Ignatow, Charles Kurzman, Brayden
King, Jennifer Lena, John Mohr, Terry McDonnell, Andy Perrin, and Steve Vaisey for helpful comments on
previous drafts. The Robert Wood Johnson Foundation and the Odum Institute at the University of North
Carolina provided financial support for this research.
References

Abbott, A. (1995). Things of boundaries. Social Research, 62(4), 857–882.
Abbott, A. (1997). On the concept of turning point. Comparative Social Research, 16, 85–106.
Abbott, A. (2001). Chaos of disciplines. Chicago: University of Chicago Press.
Agnew, J., Gillespie, T., Gonzalez, J., & Min, B. (2008). Baghdad nights: Evaluating the US military surge using nighttime light signatures.
Alexander, J. (2006). The civil sphere. Oxford: Oxford University Press.
Alexander, J., & Smith, P. (2001). The strong program in cultural theory: Elements of a structural hermeneutics. In J. H. Turner (Ed.), Handbook of sociological theory (pp. 135–150). New York: Springer.
Armstrong, E. A. (2002). Forging gay identities: Organizing sexuality in San Francisco, 1950–1994. Chicago: University of Chicago Press.
Bail, C. (2012). The fringe effect: civil society organizations and the evolution of media discourse about Islam, 2001–2008. American Sociological Review, 77(7), 855–879.
See Baumer et al. (2013).
Efforts are currently underway to make the collection and analysis of big data possible for those without a computer programming background. Gary King and colleagues are producing a web-based tool named "Consilience" that will enable cluster analysis of unstructured text. Primitive forms of topic modeling and sentiment analysis are available via a variety of web-based software programs as well. Finally, there is a variety of high-quality tutorials available online for those who wish to develop basic programming skills for working with big data.
See Steve Lohr, "The Age of Big Data," The New York Times, February 11, 2012.
Bail, C. (2013a). Winning minds through hearts: Organ donation advocacy, emotional feedback, and social media. Working Paper, Department of Sociology, University of North Carolina at Chapel Hill.
Bail, C. (2013b). Taming big data: Apps and the future of survey research. Working Paper, Department of Sociology, University of North Carolina, Chapel Hill.
Bail, C. A. (forthcoming). Terrified: How anti-Muslim organizations became mainstream. Princeton: Princeton University Press.
Barth, F. (1969). Ethnic groups and boundaries: The social organization of cultural difference. Boston: Little, Brown.
Bartley, T. (2007). How foundations shape social movements: the construction of an organizational field and the rise of forest certification. Social Problems, 54(3), 229–255.
Baumer, E. P. S., Polletta, F., Pierski, N., Celaya, C., Rosenblatt, K., & Gay, G. K. (2013, February). Developing computational supports for frame reflection.
Bearman, P., & Stovel, K. (2000). Becoming a Nazi: a model for narrative networks. Poetics, 27(2), 69–90.
Bearman, P., Faris, R., & Moody, J. (1999). Blocking the future: new solutions for old problems in historical social science. Social Science History, 23(4), 501–533.
Benford, R., & Snow, D. (2000). Framing processes and social movements: An overview and assessment. Annual Review of Sociology, 26, 611–639.
Biernacki, R. (2012). Reinventing evidence in social inquiry: Decoding facts and variables. New York: Palgrave Macmillan.
Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D., & Lafferty, J. (2006). Dynamic topic models. International Conference on Machine Learning, ACM, New York.
Blei, D., & Lafferty, J. (2007). A correlated topic model of science. The Annals of Applied Statistics, 1(1), 17–35.
Blei, D., & McAuliffe, J. (2010). Supervised topic models. arXiv:1003.0783.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8. doi:10.1016/j.jocs.2010.12.007.
Bourdieu, P. (1975). The specificity of the scientific field and the social conditions of the progress of reason. Social Science Information, 14(6), 19–47.
Bourdieu, P. (1985). The social space and the genesis of groups. Theory and Society, 14(6), 723–744.
Bourdieu, P. (1990). Homo Academicus (1st ed.). Stanford: Stanford University Press.
Cerulo, K. A. (1998). Deciphering violence: The cognitive structure of right and wrong (1st ed.). New York: Routledge.
Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models.
Collins, R. (2013). Solving the Mona Lisa smile, and other developments in visual micro-sociology. Working Paper, Department of Sociology, University of Pennsylvania.
D'Andrade, R. G. (1995). The development of cognitive anthropology. Cambridge: Cambridge University Press.
DiMaggio, P. (1997). Culture and cognition. Annual Review of Sociology, 23, 263–287.
DiMaggio, P., & Bonikowski, B. (2008). Make money surfing the web? The impact of internet use on the earnings of U.S. workers. American Sociological Review, 73(2), 227–250.
DiMaggio, P., Hargittai, E., Neuman, W. R., & Robinson, J. (2001). Social implications of the internet. Annual Review of Sociology, 27, 307–336.
DiMaggio, P., Nag, M., & Blei, D. (forthcoming). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of government arts funding in the U.S. Poetics.
Douglas, M. (1966). Purity and danger: An analysis of concepts of pollution and taboo. New York: Praeger.
Douglas, M. (1986). How institutions think. Syracuse: Syracuse University Press.
Eliasoph, N., & Lichterman, P. (2003). Culture in interaction. American Journal of Sociology, 108(4), 735–794.
Espeland, W. N., & Stevens, M. L. (1998). Commensuration as a social process. Annual Review of Sociology, 24, 313–343. doi:10.2307/223484.
Evans, R., & Kay, T. (2008). How environmentalists "greened" trade policy: strategic action and the architecture of field overlap. American Sociological Review, 73(6), 970–991.
Eyal, G. (2009). The space between fields. Working Paper, Center for Comparative Research, Yale University.
Fligstein, N., & McAdam, D. (2011). Toward a general theory of strategic action fields. Sociological Theory, 29(1), 1–26.
Foucault, M. (1970). The order of things: An archaeology of the human sciences (1st ed.). New York: Vintage.
Franzosi, R. (2004). From words to numbers: Narrative, data, and social science. Cambridge: Cambridge University Press.
Franzosi, R. (2009). Quantitative narrative analysis (1st ed.). Thousand Oaks: SAGE Publications.
Gaby, S., & Caren, N. (2012). Occupy online: how cute old men and Malcolm X recruited 400,000 U.S. users to OWS on Facebook. Social Movement Studies, 11, 367–374.
Geertz, C. (1973). The interpretation of cultures: Selected essays. New York: Basic Books.
Ghaziani, A. (2009). An amorphous mist? The problem of measurement in the study of culture. Theory and Society, 38(6), 581–612. doi:10.1007/s11186-009-9096-2.
Ghaziani, A., & Baldassarri, D. (2011). Cultural anchors and the organization of differences. American Sociological Review, 76(2), 179–206. doi:10.1177/0003122411401252.
Gieryn, T. F. (1999). Cultural boundaries of science: Credibility on the line (1st ed.). Chicago: University of Chicago Press.
Goffman, E. (1963). Stigma: Notes on the management of spoiled identity. New York: Touchstone.
Goffman, E. (1974). Frame analysis. Cambridge: Harvard University Press.
Gold, M. K. (2012). Debates in the digital humanities. Minneapolis: University of Minnesota Press.
Golder, S. A., & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep, and day length across diverse cultures. Science, 333(6051), 1878–1881. doi:10.1126/science.1202775.
Gong, A. (2011). An automated snowball census of the political web. SSRN eLibrary.
Grimmer, J. (2010). A Bayesian hierarchical topic model for political texts: measuring expressed agendas in senate press releases. Political Analysis, 18(1), 1–35.
Grimmer, J., & King, G. (2011). General purpose computer-assisted clustering and conceptualization. Proceedings of the National Academy of Sciences, 108(7), 2643–2650.
Griswold, W., & Wright, N. (2004). Wired and well read. In Society online: The internet in context. New York: Sage.
Hopkins, D. (2013). The exaggerated life of death panels: The limits of framing effects in the 2009–2012 health care debate. Working Paper, SSRN.
Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1), 229–247. doi:10.1111/j.1540-5907.2009.00428.x.
Ignatow, G., & Mihalcea, R. (2013). Text mining for comparative cultural analysis. Working Paper, Department of Sociology, University of North Texas.
Johnson-Hanks, J., Bachrach, C., Morgan, P., & Kohler, H.-P. (2011). Understanding family change and variation: toward a theory of conjunctural action. Understanding Population Trends and Processes, 5, 1–179.
Kaufman, J. (2004). Endogenous explanation in the sociology of culture. Annual Review of Sociology, 30.
King, G. (2011). Ensuring the data rich future of the social sciences. Science, 331(11 February), 719–721.
Krippendorff, K. H. (2003). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks: Sage Publications.
Lamont, M. (1992). Money, morals, and manners: The culture of the French and American upper-middle class. Chicago: University of Chicago Press.
Lamont, M. (2000). The dignity of working men: Morality and the boundaries of race, class, and immigration. New York: Russell Sage.
Lamont, M. (2012). Toward a comparative sociology of valuation and evaluation. Annual Review of Sociology.
Lamont, M., & White, P. (2009). The evaluation of systematic qualitative research in the social sciences. Report of the U.S. National Science Foundation.
Lan, T., & Raptis, M. (2013). From subcategories to visual composites: A multi-level framework for object recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Latour, B. (1988). How to follow scientists and engineers through society. Cambridge: Harvard University Press.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.-L., Brewer, D., et al. (2009). Computational social science. Science, 323(5915), 721–723.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: a new social network dataset using Facebook.com. Social Networks, 30(4), 330–342.
Lieberson, S. (2000). A matter of taste: How names, fashions, and culture change. New Haven: Yale University Press.
Livne, A., Simmons, M. P., Adar, E., & Adamic, L. (2011). The party is over here: Structure and content in the 2010 election. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media.
Loveman, M., & Muniz, J. (2007). How Puerto Rico became white: boundary dynamics and inter-census racial classification. American Sociological Review, 72, 915–939.
Manning, C. D., & Schuetze, H. (1999). Foundations of statistical natural language processing (1st ed.). Cambridge: The MIT Press.
Mark, N. P. (2003). Culture and competition: homophily and distancing explanations for cultural niches. American Sociological Review, 68(3), 319–345. doi:10.2307/1519727.
Martin, J. L. (2003). What is field theory? American Journal of Sociology, 109(1), 149.
Medvetz, T. (2012). The rise of think tanks in America: Merchants of policy and power. Chicago: University of
Merton,R.(1949).Social theory and social structure. New York: The Free Press.
Mische,A.(2008).Partisan publics: Communication and contention across Brazilian youth activist networks.
Princeton: Princeton University Press.
Mohr, J. (1998). Measuring meaning structures. Annual Review of Sociology, 24, 345–370.
Mohr, J., & Guerra-Pearson, F. (2010). The duality of niche and form: The differentiation of institutional space
in New York City, 1888–1917. In Categories in markets: Origins and evolution (pp. 321–368). New
York: Emerald Group Publishing.
Mohr, J., Singh, A., & Wagner-Pacifici, R. (2013). CulMINR: Cultural meanings from the interpretation of
narrative and rhetoric: A dynamic network approach to hermeneutic mining of large text corpora.
Working Paper, Department of Sociology, University of California, Santa Barbara.
Mohr, J., Wagner-Pacifici, R., Breiger, R., Bogdanov, P. (2014). Graphing the grammar of motives in National
Security Strategies: cultural interpretation, automated text analysis, and the drama of global politics.
Poetics, 41(6), 670–700.
Moretti, F. (2013). Distant reading. London: Verso Books.
Pachucki, M. A., & Breiger, R. L. (2010). Cultural holes: beyond relationality in social networks and culture.
Annual Review of Sociology, 36(1), 205–224. doi:10.1146/annurev.soc.012809.102615.
Padgett, J. F., & Powell, W. W. (2012). The emergence of organizations and markets. Princeton: Princeton
University Press.
Paul, M. J., & Dredze, M. (2011). You are what you tweet: Analyzing Twitter for public health. Proceedings of
the Fifth International AAAI Conference on Weblogs and Social Media.
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to analyze political
attention with minimal assumptions and costs. American Journal of Political Science, 54(1), 209–228.
doi:10.1111/j.1540-5907.2009.00427.x.
Scheufele, D. A. (1999). Framing as a theory of media effects. The Journal of Communication, 49(1), 103–
122. doi:10.1111/j.1460-2466.1999.tb02784.x.
Sewell, W. (1996). Historical events as transformations of structures: inventing revolution at the Bastille.
Theory and Society, 25(6), 841–881. doi:10.1007/BF00159818.
Smith, T. (2007). Narrative boundaries and the dynamics of ethnic conflict and conciliation. Poetics, 35, 22
Swidler, A. (1986). Culture in action: symbols and strategies. American Sociological Review, 51(2), 273–286.
Swidler, A. (1995). Cultural power and social movements. In Social movements and culture. London:
Tangherlini, T. R., & Leonard, P. (2013). Trawling in the Sea of the Great Unread: sub-corpus topic modeling
and Humanities research. Poetics. doi:10.1016/j.poetic.2013.08.002.
Tavory, I., & Timmermans, S. (2013). Consequences in Action: A pragmatist approach to causality in
ethnography. Working Paper, New School for Social Research.
Vaisey, S., & Lizardo, O. (2010). Can cultural worldviews influence network composition? Social Forces,
88(4), 1595–1618. doi:10.1353/sof.2010.0009.
Wagner-Pacifici, R. (2010). Theorizing the restlessness of events. American Journal of Sociology, 115(5),
Wallach, H. (2006). Topic modeling: Beyond bag of words. Proceedings of the 23rd International Conference
on Machine Learning.
Weber, K. (2005). A toolkit for analyzing corporate cultural toolkits. Poetics, 33(3–4), 227–252. doi:10.1016/j.
Wuthnow, R. (1993). Communities of discourse: Ideology and social structure in the reformation, the
enlightenment, and European socialism. Cambridge: Harvard University Press.
Zelizer, V. A. R. (1985). Pricing the priceless child: The changing social value of children. Princeton:
Princeton University Press.
Christopher A. Bail is Assistant Professor of Sociology at the University of North Carolina, Chapel Hill. His
research interests include cultural sociology, political sociology, organizations, and mixed-method research
designs. He is currently completing a manuscript entitled Terrified: How Anti-Muslim Organizations Became
Mainstream. His other work has appeared in the American Sociological Review and the Revue Européenne des
Migrations Internationales.
... In the first stage of the research we studied the 50 most followed Media Directors from Spain and analyzed, as a network, the 50 accounts they started to follow from 2017 to 2019 (Israel-Turim and Micó-Sanz, 2021). We categorized these accounts and conducted a quantitative analysis, cross-tabulating variables in the collected data and using visualization tools to search for repetitions that could signal patterns or trends (Mahrt and Scharkow, 2013; Bail, 2014; Batrinca and Treleaven, 2015). We combined computational and manual analysis methods to preserve contextual meaning while extracting as much information and knowledge from the data as possible (Lewis, Zamith and Hermida, 2013). ...
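The pattern-finding step described in this excerpt, tallying which accounts recur across the directors' newly followed lists, can be sketched with Python's standard library. The director and account names below are hypothetical placeholders, not data from the study:

```python
from collections import Counter

# Hypothetical follow lists: which accounts each media director began following.
new_follows = {
    "director_a": ["pol_1", "media_x", "pol_2"],
    "director_b": ["pol_1", "ngo_y"],
    "director_c": ["pol_1", "media_x"],
}

# Tally repetitions across the network: accounts followed by many directors
# are candidate patterns worth closer qualitative inspection.
counts = Counter(a for follows in new_follows.values() for a in follows)
print(counts.most_common(2))  # → [('pol_1', 3), ('media_x', 2)]
```

In practice such counts would feed a visualization step (e.g., a bipartite network of directors and followed accounts) rather than a printed list.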
... With the aim of contributing to a deeper understanding of the dynamics and flows of online influence among power elites, we used machine-learning software to analyze the 50 accounts that the network of the most followed media directors in Spain began following, and compared them with the accounts that the media they direct began following. We categorized them by account type, location, and gender, and analyzed repetitions among the newly followed accounts, then applied data-visualization methods in search of trends and patterns (Bail, 2014; Batrinca and Treleaven, 2015). The results of this research indicate that some behavioral patterns differ between the two networks, such as the gender and types of accounts they began following, while they showed similar tendencies with respect to account location. ...
Digital platforms have transformed the influence streams among media, journalists, politicians and the citizenship, as well as concerning gatekeeping and agenda setting (Guo and Vargo, 2017; Wallace, 2018; Casero-Ripollés, 2021). Nonetheless, homophilic tendencies among power groups continue to be reproduced online (McPherson, Smith-Lovin and Cook, 2001; Maares, Lind and Greussing, 2021). With the objective of contributing to the deepening of the understanding of the dynamics and influence flows online among power elites, we analyzed via a machine learning Software, the 50 accounts that the network of the most followed Media Directors in Spain began following and compared them with the accounts that the Media they manage started following. We categorized them in Types of accounts, Location and Gender, and analyzed the repetitions between the accounts they began to follow to subsequently work with data visualization methods in order to find trends and tendencies (Bail, 2014; Batrinca and Treleaven, 2015). The results of this research indicate that some patterns of behavior differ between both networks, such as the gender and types of accounts they began following, whereas the location presented similar trends. The year where we can see the highest similarities corresponds to 2018, an electoral year in Spain, where both networks started following a majority of Spanish male politicians.
... Understanding culture requires understanding actions, their reasoning, and the surroundings forming that culture, which may involve sub-fields such as emotion recognition, behavior analysis, classification, and categorization. One of the biggest concerns in incorporating big data into cultural sociology is that text-based data rarely contain the social context in which they were produced (Bail, 2014). Hence, Bail proposed synthesizing conventional qualitative analysis methods with automated analysis of text-based data, stressing that automated analysis should complement qualitative analysis, not replace it (Bail, 2014). Some studies, like that of Behnam et al., analyze users' online behavior with big data from social media, including demographic data to provide social context, and preprocess the data with gender extraction and language identification and translation to make the dataset more complete and consistent (Rahdari et al., 2017). ...
The applications of artificial intelligence (AI) in our daily activities have helped us accomplish our tasks more effectively and efficiently. With the advancement of computing power and the Internet, AI systems have been applied in many areas such as business, management, and health. This paper analysed the possibility of integrating AI into the Islamic System of Governance (ISG). This research utilised Amin’s ISG, which provides a set of analytical frameworks for gearing relevant organisations towards achieving their objectives. The ISG was derived from the Prophetic Madinian Polity and thus adopted elements of the polity. For instance, the ISG set the Maqāsid of the Sharī’ah as an organisation’s strategic objectives and constructed the Islamic Governance operational framework with four components from elements of the polity: (1) Tauhid, (2) Juristic, (3) Values, and (4) Cultural. A comprehensive literature review was conducted to look into the integration of AI with this ISG. The research findings indicated that there is not much research conducted in this area. Therefore, we explore the potential of applying AI in the context of an ISG. Several researches have investigated the integration of AI into some of the ISG components like the Juristic, Values, and Cultural, in other areas and contexts. However, none of them is completely suitable for the ISG context. Furthermore, none of the research was done as an integrated whole, thus, ignoring the relational dynamic between the components of ISG. Hence, a more comprehensive study is required to fill the gap in deploying AI with the ISG.
... Research into the activities of a particular individual, combined with the picture provided by open data, could yield a more realistic and objective view. For example, Bail sees in the big data movement an instrument that can radically change cultural sociology's imbalance between theory and data [Bail 2014]. ...
The cultural and creative sector (CCS) is one of the hardest-hit sectors due to COVID-19. The rapid decline in audiences and revenues makes it impossible to create and offer financially demanding cultural products and services. This will be a long-term effect: it will affect the supply and demand of cultural services for at least two to three years to come. A realistic forecast suggests that this period will be characterized by a narrowed audience market, a decline in the purchasing power of the population, and reduced leverage of funding. Decisions about performances, productions, and investments in new content can no longer rely on traditional audience behavioural patterns, historical demand data, or an organization's institutional memory. Cultural institutions need to make decisions about the future without being able to predict it even two weeks ahead. The article examines the open data arrays available in Latvia and the potential of their use in the cultural and creative sector to alleviate the crisis.
... In fact, the codes were created during data analysis, and the authors did not use any predetermined codes. In this way, the key components of the daytime tourism milieu were inductively derived from the data (Bail, 2014; Baumer et al., 2013). Secondly, the authors applied topic modelling, which uses computational techniques to identify similar and differing elements across texts. ...
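The topic-modelling step mentioned in this excerpt can be illustrated with a minimal collapsed-Gibbs sampler for Latent Dirichlet Allocation on a toy corpus. This is a didactic sketch, not the authors' actual pipeline (applied work would more likely use an off-the-shelf library); the documents and words below are invented placeholders echoing the milieu labels in the abstract:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA, toy scale: returns top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic assignment per token
    for d, doc in enumerate(docs):                     # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):                            # resample each token's topic
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return [sorted(nkw[t], key=nkw[t].get, reverse=True)[:3] for t in range(n_topics)]

# Hypothetical photo descriptions reduced to keyword lists.
docs = [
    ["cafe", "food", "wine", "cafe"],
    ["synagogue", "heritage", "memorial"],
    ["food", "restaurant", "wine"],
    ["heritage", "synagogue", "history"],
]
print(lda_gibbs(docs))
```

With realistic corpora the two inferred topics would tend to separate the gastronomic and heritage vocabularies, mirroring the inductively derived milieu profiles.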
In order to propose a repositioning toolkit, this research addresses the essence of the daytime tourism milieu of the Hungarian capital Budapest’s nightly party zone and formulates the following two research questions: (1) What are the available elements of the daytime tourism milieu of Budapest’s party zone? and (2) How can this milieu enhance tourist experience for leveraging a sense of place in a future post-Covid-19 era? The data for this research were collected with the help of 85 undergraduates, who were tasked with taking three photos, as if they were tourists, aiming to capture the best reflection of the daytime tourism milieu of Budapest’s party zone. A database of 255 photos was analysed through visual content analysis. Additionally, each image was assigned a location, five hashtags, and a short description. The descriptions of the photos were analysed using Python-based computational methods. The findings show the most important research outcomes concerning Budapest’s party zone, focusing on the daytime values of the district. The research identified the “creative milieu”, “Jewish heritage milieu” and “gastronomic milieu” as the most important daytime profiles of the party zone. Based on the findings, the authors propose a repositioning toolkit and a strategy which, on the one hand, will develop a stronger sense of place for the tourism milieu of Budapest’s party zone and, on the other hand, will position the party zone not only as a place of nightlife but also as a venue of daytime tourism.
... At this point we enter the domain of automated, or computer-assisted, quantitative text analysis (Bail 2014; Heiberger and Riebling 2016). With the help of such procedures, a first step can, for example, identify recurring topics as well as the sentiments conveyed in digital communication. ...
This open-access book offers beginners guidance on how to choose a suitable qualitative or quantitative content-analysis method depending on (a) the research interest and (b) the scope of the data. Part 1 defines analysis techniques and outlines the possibilities and limits of social-scientific content analysis. Part 2 presents digitally supported and semi-automated techniques; Part 3 presents the automated techniques of correspondence analysis, sentiment analysis, and topic modeling. All introductions include examples and software applications (AntConc, MAXQDA, Python, RStudio, or VosViewer).
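The dictionary-based sentiment analysis mentioned here can be sketched in a few lines of Python. The lexicon below is a hypothetical toy; applied work would use a validated dictionary (e.g., LIWC or SentiWS):

```python
# Hypothetical toy lexicon; real dictionary methods rely on validated word lists.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "poor"}

def sentiment_score(text):
    """Dictionary-based sentiment: (positive hits - negative hits) / total tokens."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

print(sentiment_score("great food but awful service and poor lighting"))  # → -0.125
```

Such scores are crude on their own, which is why the iterative pairing with qualitative reading advocated in the surrounding literature matters.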
Public culture is a powerful source of cognitive socialization; for example, media language is full of meanings about body weight. Yet it remains unclear how individuals process meanings in public culture. We suggest that schema learning is a core mechanism by which public culture becomes personal culture. We propose that a burgeoning approach in computational text analysis – neural word embeddings – can be interpreted as a formal model for cultural learning. Embeddings allow us to empirically model schema learning and activation from natural language data. We illustrate our approach by extracting four lower-order schemas from news articles: the gender, moral, health, and class meanings of body weight. Using these lower-order schemas we quantify how words about body weight “fill in the blanks” about gender, morality, health, and class. Our findings reinforce ongoing concerns that machine-learning models (e.g., of natural language) can encode and reproduce harmful human biases.
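The embedding-based "fill in the blanks" logic described in this abstract can be illustrated by projecting word vectors onto a dimension built from pole-word differences. The toy 3-dimensional vectors below are invented for illustration; the study itself trains embeddings on news text:

```python
import math

# Hypothetical toy embeddings; real work would use vectors trained on a corpus.
vec = {
    "man":    [1.0, 0.2, 0.1],
    "woman":  [-1.0, 0.3, 0.1],
    "thin":   [-0.6, 0.8, 0.2],
    "strong": [0.7, 0.5, 0.3],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# A "gender" schema dimension as the difference between pole words;
# the sign of a word's projection indicates which pole it leans toward.
gender_axis = sub(vec["man"], vec["woman"])
for w in ("thin", "strong"):
    print(w, round(cos(vec[w], gender_axis), 2))
```

The same projection trick generalizes to moral, health, or class axes by swapping in other pole-word pairs.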
Computational power and big data have created new opportunities to explore and understand the social world. A special synergy is possible when social scientists combine human attention to certain aspects of the problem with the power of algorithms to automate other aspects of the problem. We review selected exemplary applications where machine learning amplifies researcher coding, summarizes complex data, relaxes statistical assumptions, and targets researcher attention to further social science research. We aim to reduce perceived barriers to machine learning by summarizing several fundamental building blocks and their grounding in classical statistics. We present a few guiding principles and promising approaches where we see particular potential for machine learning to transform social science inquiry. We conclude that machine learning tools are increasingly accessible, worthy of attention, and ready to yield new discoveries for social research.
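One way machine learning "amplifies researcher coding," as this abstract puts it, is supervised text classification: a model trained on a small set of hand-coded documents labels the rest of the corpus. A minimal Naive Bayes sketch, with hypothetical codes and texts, assuming a simple whitespace tokenizer:

```python
import math
from collections import Counter, defaultdict

# Hand-coded training texts: the "researcher coding" the classifier amplifies.
train = [
    ("budget tax spending economy", "economic"),
    ("tax jobs market economy", "economic"),
    ("faith church values family", "moral"),
    ("family tradition faith", "moral"),
]

def fit(examples):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return the label with highest log-posterior under Laplace smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = fit(train)
print(predict("economy tax growth", model))    # → economic
print(predict("church family values", model))  # → moral
```

The value of the human-in-the-loop pairing is that the hand-coded seed set carries the interpretive work, while the classifier scales it to corpora far beyond manual reach.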
Network perspectives in organizational research have focused primarily on how the embeddedness of actors shapes individual, or nodal, outcomes. Against this backdrop, a growing number of researchers have begun to adopt a wider lens on organizational networks, shifting the focus to collective, or whole network, performance. Yet, efforts to understand the relationship between whole network structure and whole network performance have produced conflicting findings, which suggests that a different approach may be needed. Drawing on macrostructural sociology, we propose a "whole network morphology" framework, which argues the whole network structure-performance relationship is contingent on other fundamental—relational and cultural—whole network dimensions. Subsequently, we undertake an application of our framework, through which we demonstrate how a morphological view helps address conflicting findings on the structure-performance relationship. We study 250 whole interorganizational networks known as Accountable Care Organizations (ACOs), which collectively comprise more than 44,000 healthcare organizations and 250,000 physicians. Consistent with previous work, we do not find a clear association between structural connectedness and performance. However, we find that a more disconnected network structure is associated with negative ACO performance when the relational strength of network ties is high. We also find evidence of better ACO performance in the presence of a physician cultural orientation when the whole network is more connected.
The emergence of big data and computational tools has introduced new possibilities for using large-scale textual sources in sociological research. Recent work in sociology of culture, science, and economic sociology has shown how computational text analysis can be used in theory building and testing. This review starts with an introduction of the history of computer-assisted text analysis in sociology and then proceeds to discuss five families of computational methods used in contemporary research. Using exemplary studies, it shows how dictionary methods, semantic and network analysis tools, language models, unsupervised, and supervised machine learning can assist sociologists with different analytical tasks. After presenting recent methodological developments, this review summarizes several important implications of using large datasets and computational methods to infer complex meaning in texts. Finally, it calls researchers from different methodological traditions to adopt text mining tools while remaining mindful of lessons learned from working with conventional data and methods.
How do women’s business networks help to advance women’s freedom? Drawing on Zerilli’s freedom-centred feminism, our study sets out to answer this question at the intersection of freedom, feminism and work. Critics argue that women’s business networks promote a postfeminist view of freedom focusing on individual self-realisation and thus participate in rolling back collective, feminist efforts to dismantle structural inequalities. We reconceptualise women’s business networks as political arenas and argue that making claims about shared interests and concerns in such an arena constitutes a feminist practice of freedom. With an original, inductive and qualitative research design combining topic modeling and dialectical analysis, we examine the claims made in 1529 posts across four women’s business network blogs. We identify postfeminist claims and new forms of change and transformation that can help to advance women’s freedom across three ‘dialectics of freedom’: conformity and imagination; performative care and relational care; sameness and openness. Our findings show that uncertain and contradictory ways of defining and engaging with women’s freedom can emerge through claim-making in such arenas. The fragility of the process and its outcomes are, then, what can move feminism forward at work and beyond.
Fertility rates vary considerably across and within societies, and over time. Over the last three decades, social demographers have made remarkable progress in documenting these axes of variation, but theoretical models to explain family change and variation have lagged behind. At the same time, our sister disciplines—from cultural anthropology to social psychology to cognitive science and beyond—have made dramatic strides in understanding how social action works, and how bodies, brains, cultural contexts, and structural conditions are coordinated in that process. Understanding Family Change and Variation: Toward a Theory of Conjunctural Action argues that social demography must be reintegrated into the core of theory and research about the processes and mechanisms of social action, and proposes a framework through which that reintegration can occur. This framework posits that material and schematic structures profoundly shape the occurrence, frequency, and context of the vital events that constitute the object of social demography. Fertility and family behaviors are best understood as a function not just of individual traits, but of the structured contexts in which behavior occurs. This approach upends many assumptions in social demography, encouraging demographers to embrace the endogeneity of social life and to move beyond fruitless debates of structure versus culture, of agency versus structure, or of biology versus society.
In several ways, Swidler provides a more developed analysis of the relationship between culture and social movements than does McAdam. First, she focuses on the ways culture shapes individual beliefs and desires. Thus, culture provides a means by which people make sense of the world. Second, Swidler examines the ways culture provides repertoires of public symbols that structure the kinds of expected responses that individuals develop from their social interactions. A handshake on first meeting a person could be seen as such a symbol: Failure to shake hands once another has been extended is a deliberate insult. Thus, once they have offered it, most people expect that their hand will be shaken. Such an expectation represents cultural knowledge that exists even when no handshake is ongoing. Such assumptions may shape how a social movement acts even if its members are ideologically divided and its contention with the broader society sharp. Third, Swidler pays attention to the ways social institutions shape movement activities: If official organizations and others try to integrate or co-opt a group, for example, the movement is likely to behave differently than if it faces aggressive, perhaps violent, repression. Culture, then, is more than just the private beliefs of individual group members, and it is more than a set of broad principles that can be used for group purposes. It involves a dynamic interaction that shapes private and public acts together.