Working Paper
Please cite as: Gheyle, N. & Jacobs, T. (2017). Content Analysis: a short overview. Internal research note.
Keywords: content analysis; discourse analysis; qualitative; quantitative; unitizing; sampling; reliability; validity; coding
In this working paper
What is content analysis?
Are there differences between quantitative and qualitative approaches?
How does content analysis distinguish itself from discourse analysis?
What methodological considerations are common for content analyses?
What are some examples of different content analyses?
Typing “content analysis” in Google Scholar provides an astonishing 5,790,000 results, with
the regular Google search function even reaching 41,500,000 hits (December 2017). Clearly,
content analysis is a term often mentioned, used and searched for. Given that it promises to "analyze" "content", this is not really surprising. Aren't all researchers to some extent analyzing the content of something? This multitude of textbooks, papers, and web excerpts seems daunting to delve into. What is more, several of these contributions, even the most highly cited ones, define content analysis differently, and (even more importantly) deem different approaches worthy of the label 'content analysis'. We think this is unhelpful: for ourselves, first and foremost, but maybe also for other people (e.g. students) who quickly want to digest what content analysis is, what it is not, which concepts and considerations are often associated with it, and what some examples look like. This short overview tries to clear a little bit of the fog. It is not intended as an exhaustive review of how to do a content analysis (there are many great books out there), nor is it a "quick fix" that will relieve the reader of further reading. Rather, we summarize some points we think are worth mentioning about the
books and papers we read, together with our own thoughts on how to think about this
methodology.
Content Analysis: a short overview
Niels Gheyle & Thomas Jacobs
Centre for EU Studies, Ghent University
1. Introduction
Content analysis (CA) is a research methodology to make sense of the (often unstructured) content of messages, be they texts, images, symbols or audio data. In short, it could be said to try to determine textual meaning. It is only one research methodology that promises to do this, as there are numerous other analyses dealing with texts, messages and their content and meaning (such as conversational, rhetorical or discourse analysis). However, content analysis is distinct, for several reasons, as can be noticed in one often-cited definition: it is "a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use" (Krippendorff, 2004). This stresses the inferential nature of content analysis: the fact that
through an inductive, deductive, or abductive process, conclusions are drawn from certain
premises and samples. Content analysts therefore typically use some guidelines for inference
(based on existing theories, previous research, or experience) and strict procedural (coding)
rules to move from unstructured text to answers to their research questions (White & Marsh,
2006). During this process, due attention is given to the context wherein these messages are
embedded: two similar sentences can mean different things in different surroundings.
A common distinction in social sciences, which also applies to content analysis methods, is
between qualitative and quantitative analyses. Problematically though, this dichotomy can be
understood in multiple ways, which in itself can be a source of confusion, but the various ways
of defining what counts as qualitative and quantitative also blur the frontiers of what can be
considered as content analysis. The dichotomy is first of all applicable to approaches within
the container of content analysis itself if we delineate it from other textual analyses. Hence,
even after distinguishing content analysis from other methodologies, it can flexibly be applied
in a quantitative or qualitative setting (White & Marsh, 2006). Section 2 of this paper deals with
what this distinction entails.
Secondly, some authors equate the word “qualitative” with “interpretive”, given that content
analyses in general focus on meaning and context. The dichotomy in this sense is between
content analysis as a systematic, rudimentary, quantitative approach, and other approaches
that are more qualitative or interpretative (Neuendorf, 2001). Content analysis should hence be
contrasted to, for example, discourse analysis (DA), which would then be the more
“qualitative” of the two. But while both deal with text in some way, we argue there are still differences between CA and DA that make them distinct approaches. In section 3, we assess these
differences, as one way to delimit CA from other textual analyses.
In sum, we deem “content analysis” a distinct methodology from “discourse analysis” (or
other types of textual analysis, such as rhetorical or conversational analysis), while
maintaining that within the container-term of content analysis, there is a continuum of quantitative and qualitative approaches to using it. In sections 2 and 3, we explain this in more detail. After this delineation, section 4 discusses some of the methodological considerations that content analysts always take into account. Section 5 provides a range of examples to which content analysis can be applied. Section 6, lastly, looks at some of the contemporary
evolutions that we witness.
2. Quantitative and qualitative approaches
The way content analysis is defined often matches the discipline and time period in which it was developed. The origins of the first analyses explicitly called CA trace back to the 1950s, when the method was developed in communication studies with a very 'list-and-count' approach (Krippendorff, 2004; White & Marsh, 2006). Newspaper data was coded into explicit (a priori)
categories and then described using several statistical tools (e.g. cross-tabulation, correlation
or regression analysis). It is this approach that came to be seen as a more quantitative content
analysis and flows from a positivist tradition (White & Marsh, 2006). The main elements of
such an approach are the generation of hypotheses, the sampling of data, and a clear a priori
coding scheme. It implies a deductive approach, whereby categories are decided upon from
the beginning, and unambiguous coding rules are laid out to know what goes where. After
coding, statistical tools are used to analyze the results, but also to test for their reliability and
validity (cfr. infra).
Several definitions in the literature still reflect this more quantitative approach and some
equate it with ‘content analysis’ as such: CA is “a research technique for the objective, systematic, and quantitative description of manifest¹ content of communications” (Berelson, 1952). The adjective ‘manifest’ refers to information in texts that is visible and obtainable at first sight, as opposed to ‘latent’ content, which is more hidden in the text and requires more
subjective interpretation. Neuendorf (2001) similarly calls it a “summarizing, quantitative analysis of messages that relies on the scientific method (including attention to objectivity-intersubjectivity, a priori design, reliability, validity, generalizability, replicability and hypothesis testing)”. CA defined in this way can best be described by the metaphor of a
‘container’, as if meaning or content is inherent to a text, and it is just waiting there to be picked
up by the content analyst (Krippendorff, 2004).
Qualitative approaches also require an analytical process that implies formulating research
questions, sampling, working with categories, coding, and determining trustworthiness (Kaid
& Wadsworth, 1989). However, it differs most from quantitative approaches with respect to
categorization and coding. Its outset is more inductive, in that it does not start from pre-defined categories based on existing research, but from more open questions that can go different ways. Instead of an a priori coding scheme, coding and analysis alternate: the analyst reads through the text while constructing categories as they appear or as they qualify the research questions (Hsieh & Shannon, 2005). The evidence is as important as the initial
questions guiding the research (White & Marsh, 2006). Krippendorff (2004) calls this process a
hermeneutic loop: constantly re-contextualizing, reinterpreting and redefining the research
(White & Marsh, 2006). Code and question are co-constructed in an abductive research strategy
(Delputte & Orbie, 2017).
¹ “Manifest” points towards information in texts that is visible and easily extracted from it. This is the opposite of “latent” content, which is not directly observable, often more subjective, but also sometimes inferred from manifest content.
Qualitative CA also pays more attention to semantic relationships than to the mere presence of words, and in general to the meaning behind texts. It goes beyond merely counting words or
columns, by categorizing bodies of text that represent similar meanings (Weber, 1990). These
categories also go beyond manifest content by also including inferred communication, or
latent content. Hsieh and Shannon (2005) ultimately describe qualitative content analysis as “a
research method for the subjective interpretation of the content of text data through the
systematic classification process of coding and identifying themes or patterns”. This is in line
with Mayring (2000), who argues that the qualitative approach consists of preserving the benefits of quantitative content analysis, with more attention to the (theory behind the) creation of categories.
Lastly, qualitative approaches are not necessarily recognized by their exclusion of ‘numbers’
or reliability checks. An inductive approach often ends in descriptive statistics (percentages,
cross-tabulation), and several measurement standards (such as reliability and validity) to
verify the trustworthiness of the research are applicable to qualitative approaches as well, such
as transferability (instead of external validity) or confirmability (instead of inter-coder
reliability) (see White & Marsh, 2006, p. 38).
Notwithstanding this rather rigid presentation, some scholars doubt whether a clear dichotomy between qualitative and quantitative content analyses is really helpful. Krippendorff (2004)
argues that “ultimately, all reading of texts is qualitative, even when certain characteristics of
a text are later converted into numbers” (p. 16). Morgan (1993) has even dubbed the more
systematic approach “a quantitative analysis of qualitative data”. In sum, we can think of CA as a range of methods on a continuum that runs from very quantitative approaches, concerned with coding (manifest) data into pre-defined categories and representing those with statistical tools, to more qualitative approaches, which are also concerned with reading and putting content into meaningful classifications but often operate more inductively with respect to coding schemes (irrespective of the presence of statistical representations). In practice, research is
often somewhere on this continuum, with some sense of theory or categories beforehand,
while also being open to and informed by the evidence.
3. Differences compared to discourse analysis
The stipulation that CA necessarily has a qualitative dimension related to reading and meaningfulness brings CA remarkably close to another field of inquiry, namely Discourse Analysis. Just like CA, discourse analysis (DA) is a broad banner under which we find several more specific approaches, which in the case of DA are concerned with the study of communication and meaning-making in context.² Discourse analysis is, generally speaking, interested in how meaning is formed and interpreted in a particular situation. Frequently, this analysis results in a critical and normative evaluation of how these communicative processes affect the social world around us.

² Good starting points for an introduction to the field of discourse analysis and its various inhabitants include Brown & Yule (1983), Jorgensen & Phillips (2002), Blommaert (2005), Gee (2014) and Coulthard (2014).
The definition of CA provided by Krippendorff (2004) above (together with his insistence on
reliability and validity in later chapters) points to a very positivist understanding of handling
texts, even though it wants to be sensitive to meaning and context. However, in his examples
of methods that fall under the CA-umbrella, he includes more interpretivist approaches such
as (critical) discourse analysis, social constructivist analyses or rhetorical analysis. This is
backed up by his epistemological considerations, in which he argues that texts have no reader-independent qualities and no single meaning that can simply be “described”, and that texts are relative to contexts, discourses or purposes (pp. 22-23). These are all principles integral to most forms of discourse analysis. This raises the question: can interpretivist accounts such as DA be seen as part of CA? We go deeper into this question by focusing on what discourse analysis is, in order to more clearly delineate the boundaries where content analysis ends.
It is first of all important to note that discourse analysis is a theory-driven activity. No matter
the variety, DA has specific prescriptions regarding how the process of meaning-making can
be studied that are prior to the analysis itself. The same goes for CA of course, but in DA, these
prescriptions have an ontological status, in the sense that they make specific claims about how
reality is, whereas for CA, these claims remain at the methodological level, and are informed
by an idea of how reality can best be studied. For CA, the idea that “texts have no reader-independent qualities” is a guideline for making valid inferences; for DA, it is an assumption about how meaning and signification work.
This creates a mismatch at the epistemological level, as the idea that there is a “good” or
“appropriate” way to study reality is alien to DA to begin with, given it is concerned with the
study of the construction of reality, rather than with reality itself. This makes the question whether CA and DA match tributary to one of the most famous debates in the philosophy of science: can two approaches be compatible if they have different epistemologies? Both sides
of the argument have enjoyed famous and rousing defenses (including Ruggie in favor of
compatibility, and Geertz against it), but the question remains largely open and a matter of
personal positioning.
Yet even if one does deem the knowledge generated by DA and CA (as epistemologically
contradictory approaches) to be compatible, the point made above still entails a more practical
consideration that is relevant to distinguishing DA and CA. Since DA studies the intersubjective construction of reality, its object of interest is how ideas and concepts are assembled in a discourse. This assembly process can include non-textual components for some forms of DA (such as poststructuralist Discourse Theory), necessarily taking these forms of DA beyond the scope of CA if we narrowly define the latter as textual analysis. Yet even for those forms that are in fact fully based on the study of documents, this at best makes the knowledge generated by CA and DA mutually supplementary, rather than overlapping, since CA focuses on the prevalence of ideas in texts, rather than on their construction. Discourse analyses start where (other) content analyses stop: at the idea or the concept, which is the smallest research unit in the latter, but is itself decomposed in the former. The accompanying diagram clarifies this process.
Finally, it is worth discussing the predominant criticism of discourse analysis, as this does not
apply to the approaches conventionally seen as content analysis, and thus serves as one more
marker of the difference between both. Discursive approaches, and particularly the most
radically constructionist ones, emphasize deconstruction. They show the different puzzle
pieces of which an idea is composed, but in doing so, they often lose the normative ground to
prefer some compositions of the puzzle over others. As the process of construction is revealed, the current instance of the constructed reality loses its necessary character: all options for constructing a particular reality become possible, so what can still motivate us to prefer one version over the others? One could accuse the post-structurally inspired theories
in particular of being more deconstructive than socially constructive. Some theories have
solved this by using a Marxist base (cfr. Fairclough), but they are still far more vulnerable to
this criticism than any method that could conventionally be seen as CA.
4. Methodological considerations
At this point it should be apparent that content analysis is a positivistic, rigorous method to
extract ‘content’ from texts, images or any type of message that has meaning. With such an
approach come certain standards to streamline, conduct and evaluate a research undertaking.
In the following, we discuss four elements that every content analyst should think of before
proceeding with the actual research: unitizing and sampling (pre-coding), the coding itself,
and evaluative tests of the process.
4.1. Units and Unitizing
Although obvious for some research purposes, it is a good idea to explicitly think (and write)
about the types of unit, and especially the way they have been cut (unitizing). A unit is “an identifiable message or message component, which serves as the basis for identifying the
population and drawing a sample, on which variables are measured, or which serves as the
basis for reporting analyses” (Carney, 1971). Krippendorff (2004) argues there are three types
of units: sampling, coding and context units. In other research approaches, such as survey
research, for example, there is no distinction between the sampling and coding units (the respondent is both the unit of sampling and the unit of coding), and context units are irrelevant. For content analysis, though, they can all be different.³
Sampling units / units of selection are “units that are distinguished for selective inclusion (or exclusion) in an analysis”. The easiest example would be “a newspaper” or “a newspaper article”. These units should be strictly bounded, given that any use of
inferential statistics is predicated on them being independent sampling units. One must
therefore define sampling units so that connections across sampling units do not bias
the analyst and all relevant information is contained in individual sampling units (or
if not, that the omission does not impoverish the analysis). Of course, if you analyze all
possible units in a pre-defined population (e.g. all newspaper articles from newspaper
X in country Y that are about the EU), you are analyzing the full population.
Coding units / units of description are “units that are distinguished for separate description, transcription, recording, or coding”. They are typically smaller than sampling units, at most coinciding with them but never exceeding them. Sampling units are often still too complex to be described reliably. Even a “newspaper article” as sampling unit contains a lot of information, which can be broken down. Content analysts have therefore found it convenient to describe smaller units, on which coders can more easily agree, and then to use analytic procedures to obtain descriptions of larger units. For example, a certain selection of
newspaper articles may be the sampling unit, but individual claims made within that
article are the coding units.
Context units / units of delineation are “units of textual matter that set limits on the information to be considered in the description of recording units”. Unlike other units, these are not counted, need not be independent of each other, can overlap, and may be consulted in the description of several recording units. These are parts of the text that give context and broader understanding to the specific coding unit. For example, the sentence “I am against it.” does not make much sense on its own and necessitates reading a bigger block of text. That ‘bigger block of text’ is the context unit. Context units should be defined large enough to be meaningful (adding to validity) and as small as is feasible (adding to reliability). Making them broader means you are more certain that interpretation and coding by someone else will ‘measure what you want to measure’, but it also increases the risk that another coder would code differently (hurting reliability), given that there is more room for interpretation.

³ The underlying analysis is based on Krippendorff (2004).
Besides the types of units, there are several common ways in which these units can be systematically separated, i.e. unitized. Krippendorff (2004) distinguishes five such ways, each with its own basis of partition and typical examples: physical (partitioning by time, length or size), syntactical, categorical, propositional and thematic.
Physical: partition by time, length or size, i.e. the physicality of the unit. Examples: a time period, articles containing keywords, every x-th issue.
Syntactical: partition by syntax. Examples: single words, sentences, quotations.
Categorical: partition by membership in a class or category. Example: everything referring to the president of the United States (he, him, Donald Trump, the guy with weird hair).
Propositional: partition by a particular propositional form, or by units that exhibit certain semantic relations between conceptual components. Example: all sentences that include an actor expressing (in some kind of way) its position on a topic.
Thematic: partition by freely generated narratives. Example: all requests to the European Commission by traditional letter.
These five ways differ in the kinds of cognitive operations coders must go through to identify units within a text. The simpler and more “natural” these operations are, the more efficient and reliable they are, but they may not be the most productive ones analytically. Hence, unitizing always involves compromises.
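To make these notions concrete, the sketch below shows, in Python, one hypothetical way to derive sampling, coding and context units from raw newspaper articles: articles containing a keyword serve as sampling units (a categorical criterion), sentences serve as coding units (syntactical unitizing), and each sentence plus its neighbours serves as its context unit. The corpus, keyword and splitting rule are our own illustrative assumptions, not part of the discussion above.

```python
import re

# Illustrative corpus: each string is one newspaper article (a candidate sampling unit).
articles = [
    "The EU announced new trade measures today. Farmers protested the decision in Brussels.",
    "Local elections dominated the news. Turnout was unexpectedly high.",
    "Critics say the EU moved too slowly. Supporters disagree and point to earlier reforms.",
]

KEYWORD = "EU"  # categorical criterion for inclusion in the sample

def split_sentences(text):
    """Naive syntactical unitizing: split an article into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

units = []
for article_id, article in enumerate(articles):
    if KEYWORD not in article:          # sampling unit: only articles mentioning the keyword
        continue
    sentences = split_sentences(article)
    for i, sentence in enumerate(sentences):
        # Context unit: the coding unit plus one neighbouring sentence on each side.
        context = " ".join(sentences[max(0, i - 1): i + 2])
        units.append({"article": article_id, "coding_unit": sentence, "context_unit": context})

for u in units:
    print(u["article"], "|", u["coding_unit"])
```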
4.2. Sampling
Sampling is the process of selecting a subset of units from the larger population. This can either
be random, meaning that every element has an equal chance of being selected, or non-random.
For random sampling, there are several more tailored approaches: simple random sampling is the best known (with or without replacement of the drawn unit); systematic random sampling selects every x-th element; cluster sampling draws several units together at once, for logistical reasons; stratified sampling segments the sampling frame into categories on some variable of prime interest (e.g. into months) and then samples randomly from every category; and multistage or combination sampling mixes these techniques. Relevance sampling (selecting all textual units that contribute to answering given research questions) is a common non-random alternative. We refer to Neuendorf (2002) or Krippendorff (2004) for a more elaborate account.
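As an illustration of how some of these strategies differ in practice, the sketch below draws a simple random sample, a systematic sample and a month-stratified sample from a hypothetical list of dated newspaper articles. The sampling frame, sample sizes and stratification variable are assumptions made for this example only.

```python
import random
from collections import defaultdict

random.seed(42)

# Hypothetical sampling frame: (article_id, publication month), 30 articles per month.
frame = [(month * 100 + i, f"2017-{month:02d}") for month in range(1, 13) for i in range(30)]

# Simple random sampling: every unit has the same chance of selection.
simple_sample = random.sample(frame, 20)

# Systematic random sampling: a random start, then every k-th unit.
k = 20
start = random.randrange(k)
systematic_sample = frame[start::k]

# Stratified sampling: segment the frame by month, then sample within each stratum.
strata = defaultdict(list)
for unit in frame:
    strata[unit[1]].append(unit)
stratified_sample = [unit for month_units in strata.values()
                     for unit in random.sample(month_units, 2)]

print(len(simple_sample), len(systematic_sample), len(stratified_sample))
```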
There is no universally accepted set of criteria for selecting sample size, but it can best be
calculated using formulas for standard errors and confidence intervals (see Krippendorff,
2004, chapter 6). A general (qualitative) rule is that “when units of text that would make a
difference in answering the research question are rare, the sample size must be larger than is
the case when such units are common”.
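As a rough illustration of what such a calculation can look like (this is the generic sample-size formula for estimating a proportion, offered as an illustration rather than a formula taken from Krippendorff's chapter):

$n = \dfrac{z^2 \, \hat{p}(1-\hat{p})}{e^2}$

where z is the z-value for the desired confidence level (1.96 for 95%), p-hat the expected proportion of units carrying the feature of interest, and e the tolerated margin of error; with p-hat = 0.5 and e = 0.05 this yields roughly 385 sampling units. For rare units, the margin of error has to be set relative to p-hat rather than in absolute terms, which drives the required sample size up and is consistent with the qualitative rule quoted above.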
Sampling problems do not arise when analysts can answer their research question by
examining all texts of a particular population of texts, such as all of a given writer’s works, all
issues of a newspaper etc. If you want to know something about the press coverage of a certain
event and collect all newspaper articles pertaining to that event, that complete set of texts
constitutes a census, or the population. If the set of texts is manageable in size, there is no need to reduce it by using relevance or random sampling.
4.3. Coding process
The process of coding unstructured texts into categories (inductively or deductively) is a laborious effort. The creation of categories alone merits extensive thought. Categories (and the coding rules that assign observations to them) should be crystal clear and exhaustive: for
every coded unit there is a category. These categories should also be mutually exclusive, in
that they cannot overlap, not even to a small degree.
The coding rules, i.e. the procedure by which a unit is categorized as such or such, are also
commonly written down in codebooks. There is often an amazing level of detail in these
codebooks, to the benefit of reliability (cfr. infra). The goal, in any case, is to make coding rules
as unambiguous as possible, so that every individual coder would categorize or label units in
one and the same way. Still, there will often be a period of training before the actual coding,
where scholars interact to make sure they have the same idea and protocol to start with.
A superficial treatment here would not do justice to the complex and detailed work of constructing coding rules (and an overall coding process). Entire textbook chapters help scholars construct codebooks and coding rules step by step, and we therefore refer to Neuendorf (2002, chapter 6), Krippendorff (2004, chapter 7) or Schreier (2014).
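To give a flavour of what such rules can look like once written down, the sketch below encodes a deliberately simple, hypothetical codebook as keyword rules and applies it to coding units, with an explicit 'other' category to keep the scheme exhaustive. Real codebooks are far more detailed; the categories and keywords here are invented for the example.

```python
# A toy codebook: category -> keywords that trigger it (invented for illustration).
CODEBOOK = {
    "economy": ["trade", "jobs", "growth", "budget"],
    "environment": ["climate", "emissions", "pollution"],
    "migration": ["asylum", "border", "refugee"],
}

def code_unit(text):
    """Assign a coding unit to exactly one category.

    Rules are applied in a fixed order and the first match wins, which keeps
    categories mutually exclusive; units matching no rule go to 'other', which
    keeps the scheme exhaustive.
    """
    lowered = text.lower()
    for category, keywords in CODEBOOK.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"

units = [
    "The new budget should create jobs.",
    "Emissions keep rising despite the climate plan.",
    "The council discussed school holidays.",
]
for unit in units:
    print(code_unit(unit), "<-", unit)
```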
4.4. Measurement standards
Content analysis often involves human coding, which is susceptible to errors. If such errors are random, the problem averages out when many observations are taken into account. It gets more serious when such errors are not random and thus imply a bias. For example, systematically coding a variable incorrectly means repeated error, and the result will not approximate the ‘true’ measurement of that variable. To deal with this and other measurement problems, content
analyses should be able to pass the test of different standards, to check whether the results are
trustworthy. Again, different scholars introduce different concepts (sometimes referring to the
same idea with different words), but two of the most mentioned concepts are reliability and
validity.
Reliability
Reliability is probably the most important test in content analysis, especially when human
coding is involved. In general, it implies that coding results should be the same (i.e. replicable) when different persons are given a certain coding scheme. To calculate whether this is actually the case, several assessments of inter-coder reliability (the degree to which different coders arrive at the same results) have been established.⁴
Agreement measures imply the question ‘did both coders code exactly the same?’ If a coding
measure can only be ‘male’ or ‘female’, for example, agreement involves both coding the same
thing. The most used criteria here are ‘percent agreement’ (% of equally coded units, in relation
to total amount of coded units) or ‘range agreement’ (if in the same range of answers, it is
considered equal).
Agreement beyond chance builds on the observation that even random coding results in matching codes part of the time (for a two-category variable, 50% of the time), purely by chance. Several statistical tests hence try to assess reliability ‘beyond chance’. The best known are Scott’s pi, Cohen’s kappa and Krippendorff’s alpha. Hayes & Krippendorff (2007) make a convincing case for using Krippendorff’s alpha, because it generalizes across scales of measurement (nominal, interval, etc.), can be used with any number of coders (others were developed for two coders) and with or without missing data, and satisfies all criteria for a good measure of reliability. Their article includes a macro for import into SPSS or SAS.
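As a minimal sketch of what these reliability computations involve, the code below implements percent agreement and Krippendorff's alpha for the simplest case only: nominal data, two coders and no missing values (the full alpha handles more scales, more coders and missing data, as Hayes & Krippendorff describe). The two coder vectors are invented example data.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units that both coders placed in the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for nominal data, two coders, no missing values."""
    n_units = len(coder_a)
    values = list(coder_a) + list(coder_b)   # all pairable values
    n = len(values)                          # 2 * n_units
    # Observed disagreement: proportion of units on which the coders differ.
    d_observed = sum(a != b for a, b in zip(coder_a, coder_b)) / n_units
    # Expected disagreement, from the marginal distribution of all values.
    counts = Counter(values)
    d_expected = (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))
    return 1.0 - d_observed / d_expected

# Invented example: two coders labelling ten claims as pro, contra or neutral.
coder_1 = ["pro", "pro", "contra", "neutral", "pro", "contra", "contra", "neutral", "pro", "pro"]
coder_2 = ["pro", "contra", "contra", "neutral", "pro", "contra", "pro", "neutral", "pro", "neutral"]

print(f"percent agreement: {percent_agreement(coder_1, coder_2):.2f}")
print(f"Krippendorff's alpha: {krippendorff_alpha_nominal(coder_1, coder_2):.2f}")
```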
Covariation measures are used when dealing with interval- or ratio-level variables. If you code someone’s age in years, it is very difficult for two coders to achieve exactly the same results (67 versus 68 would already count as disagreement). These measures therefore ask: do the coded results vary in the same way? Are high values by one coder matched by high values by the other? Again, several statistical measures have been developed to assess this form of reliability: Spearman’s rho, Pearson’s correlation coefficient, or Lin’s concordance correlation coefficient.
What constitutes an acceptable level of inter-coder reliability? 90% or higher would be acceptable to all authors, 80-90% to most, but beneath that it really depends on the author (Neuendorf, 2001). Best practice is to report, fully and clearly, at least one reliability coefficient for each variable measured in a human-coded content analysis. In any case, a poorly constructed coding scheme, inadequate coder training and coder fatigue are all sources of reliability loss. It will also be more difficult to code latent variables (e.g. aggression, opinion) than manifest ones (e.g. gender). To spot discrepancies or inconsistencies, it is advised to run a pilot reliability test at the beginning of the coding. If a variable does not pass the reliability tests, it is often advised to drop it, or to use a non-CA method for that particular variable.
⁴ If there is only one coder, intra-coder reliability is sometimes used, for example via rate-rerate methods (to see whether a variable is coded the same way some weeks or months later), but these are not deemed very good.
Validity
Validity is the extent to which a measuring procedure represents the intended and only the
intended concept. The main question is “are we really measuring what we want to measure?”
Validity can take the form of triangulation: lending credibility to the findings by incorporating
multiple sources of data, methods or theories. Shapiro and Markoff (1997) assert that content
analysis is only valid and meaningful to the extent that results are related to other measures.
However, validity can be assessed without triangulation as well, and different types are then
used (Neuendorf, 2001).
External validity, for example, is often equated with generalizability. In other words, can the results of a measure be extrapolated to other settings, times, etc.? Internal validity, in contrast, asks whether we are operationalizing our measures in such a way that they measure what we want to measure, and nothing more or less. There are several ways of addressing this, the most basic and obvious being ‘face validity’: what you see is what you get. It is very difficult to define, but it is essentially common sense. Is the measuring procedure tapping into the desired concept “on the face of things”? Take a step back, bring in another person, and ask: does this
indicate what I want to measure? Other examples are criterion validity (does this measure tap an established standard or important behavior that is external to the measure?), content validity (the degree to which the measure reflects the full domain of the concept being measured) and construct validity (the extent to which a measure is related to other measures in a way consistent with hypotheses derived from theory; do the measures and their outcomes relate in the way theory says they should?).
Some of these questions concern the coding scheme itself (do you think this measures what it should measure? Have I included enough variables to be sure that I tap its full content?), while others assess the measurement outcome ex post (is this in line with theory? Does this match an external criterion?). These questions of validity are all, presumably, implicit when constructing the codebook and its categories. The categories should be exhaustive (so include an ‘other’ option!), mutually exclusive (so that every unit falls unambiguously into one category), and coded at an appropriate level of measurement (nominal, interval, etc.).
5. Examples
In the following, we present four types of research that can be labelled as content analysis:
framing, political claims making analysis, automated text analysis, and a content analysis of
pictures.
5.1. Framing analysis: a prototype of qualitative content analysis?
In a short but seminal piece, Entman (1993) once argued for a more theoretical connection
between the disparate use of the word ‘framing’ and several advances in communication
studies that were being made. Regarding the latter, he argued that content analysis in
particular could benefit enormously from a framing paradigm. Content analysts, he argues,
usually code positive and negative positions, simply adding them up and drawing conclusions from the absolute sums; they also fail to relate frames to the audience’s mental maps (schemata). Content analysis informed by a theory of framing would avoid treating all these instances as equally salient and/or influential. In sum, without framing analysis, content analysis may produce data that misrepresent the media messages that are actually being taken up.
This marks the continuum we set out between quantitative and qualitative content analyses,
the latter being more sensitive to emergent coding (rather than a priori coding), context and
meaning, rather than to mere frequencies and positions. As a practical example, Gamson and
Modigliani (1989) argue there is interaction between framing of policy issues (presented as
media ‘packages’) and public opinion on that issue. To understand which frames are being
used, they conduct a qualitative content analysis of newspapers and other media, by looking
for specific elements of a package. This means that expressions in any kind of wording or imagery that fit a certain frame/package are categorized as such. This shows the more qualitative side of this content analysis: it goes beyond mere counting to reading, interpreting and categorizing inductively.
This is also a good case to show the differences between CA and DA, as abstractly laid out
above. Framing analysis is very much agency-directed: one can say and frame whatever he or
she wants. The resonance of these frames depends on larger structures, but the individual is
considered to be free to speak as he prefers. Discourse analysts focus more strongly on the fact
that an articulation acquires meaning by reference to other messages and by the context, and
as such is more structurally founded. What we can say in a sensible and coherent way does
not just depend on the voluntarism of the individual, but also on a wider discursive context,
that is itself constructed through articulations such as the one we are now talking about. As
such, it is fair to say that DA takes a middle-ground position in the structure-agency debate,
with more specific forms leaning both ways, whereas framing analysis is innately more
agency-driven. Furthermore, the theories formulated by Benford and Snow (2000) and other
scholars of framing theory do not have the elaborate ontological groundwork that theories of
discourse possess. In this regard, too, framing is more at home in CA than in DA if one makes the distinction between the two, although among the CA approaches it is one of the closest to DA.
5.2. Quantitative (Relational) Content Analyses: PEA, CSA, PCA
Protest Event Analysis (PEA) is a particular quantitative content analysis method that gained
ground in the 1980s in social movement research. It is used to systematically assess the
frequency, intensity and features of protest across area and time (Hutter, 2014). Systematic
data about these events is usually not available, and we have to rely on secondary sources such
as newspaper reports or police records to infer the occurrence of protest. PEA is hence a
technique of reading and interpreting these unstructured texts and distilling relevant features
and characteristics from them.
The specific PEA method has evolved over time, from an initial interest in mapping as much protest as possible (large numbers of countries over time), to more detailed coding along various criteria, and above all greater sensitivity to biases (such as selection bias, or the systematicity of the media landscape) (Hutter, 2014). Over time, the unit of analysis has
broadened from a narrow description of protest (signature collecting, public rallies,
demonstrations, etc.) to definitions that underscore the relational aspect of protest (Hutter,
2014; Kleinnijenhuis & Pennings, 2001): by (i) unpacking the single protest event and focusing
on action and interaction inside them, and (ii) broadening the unit of analysis to cover other
elements of public debate besides protest, such as discursive claims about an issue. The
methods developed in this era hence use ‘political claims’, ‘nuclear sentences’ or ‘semantic
triplets’ as units of analysis, trying to uncover subject-verb-object-relationships (Hutter, 2014).
They want to capture relationships between political subjects and objects (issues, or other
actors) and qualify this relationship, in order, for example, to map political party positions on
different topics (Helbling & Tresch, 2011).
Core (or nuclear) sentence analysis (CSA) is such an approach (Kleinnijenhuis, De Ridder, &
Rietberg, 1997; Kleinnijenhuis & Pennings, 2001; Kriesi et al., 2008). It builds on the assumption
that the content of a document consists of relationships between political objects: a political
actor has a position on a political issue or on another actor. Every sentence that expresses such
a relationship is then deconstructed to its ‘core’, involving a subject, the issue at hand, and the
relationship between the two (positive, neutral, negative). As a simple example, “The Conservatives have always supported TTIP as they deem it important not to blow up any bridges with the United States” would be stripped to its core as: The Conservatives (subject), support (positive relationship), TTIP (object). Through CSA it is therefore possible to quantify and map positions on policy issues.
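A core sentence can conveniently be represented as a subject-predicate-object triple with a polarity score. The sketch below stores a few manually coded core sentences (including a hypothetical coding of the TTIP example above) and aggregates them into average actor positions per issue; in real CSA studies the extraction itself is done by trained coders following a detailed codebook, not by this kind of toy script.

```python
from collections import defaultdict

# Manually coded core sentences: (subject, object/issue, polarity).
# Polarity: +1 = support, 0 = neutral, -1 = opposition. Invented example data;
# the first triple corresponds to the TTIP sentence discussed above.
core_sentences = [
    ("Conservatives", "TTIP", +1),
    ("Greens", "TTIP", -1),
    ("Conservatives", "TTIP", +1),
    ("Social Democrats", "TTIP", 0),
    ("Greens", "CETA", -1),
]

# Aggregate coded triples into an average position per (actor, issue) pair.
positions = defaultdict(list)
for subject, issue, polarity in core_sentences:
    positions[(subject, issue)].append(polarity)

for (subject, issue), scores in sorted(positions.items()):
    print(f"{subject} on {issue}: {sum(scores) / len(scores):+.2f} ({len(scores)} core sentences)")
```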
Like CSA, political claims analysis (PCA) is another offspring of the older protest event analysis: it (i) also codes discursive forms of protest, expanding the coding unit to every instance of claims-making about an issue or event, and (ii) codes all actors involved, so as to depict a multi-organizational field instead of only looking at the protesters themselves. In itself, PCA is a descriptive method: it merely describes who and what is present in the public sphere (Statham & Trenz, 2013). The (coding) units of analysis are instances of so-called ‘claims-making’: all acts that involve demands, criticism, proposals or evaluations related to a topic, irrespective of their form (violent protest, speech act in parliament, legal action, etc.). For every such claim, the name of the claimant, the action form/size/type, the target and the position are often coded. In addition,
one could code the specific demands, addressees, objects and the type of frame used to justify
a view. As such, it goes beyond more traditional media content analysis (such as core sentence
analysis, see e.g. Kleinnijenhuis & Rietberg, 1995), which mostly restricts its detail to claimants
and their positions.
5.3. Coding images
As an example that content analysis is not confined to text as data, we can consider Corrigall-Brown and Wilkes’s (2012) study on the visual framing of collective action. Protest is often
framed in the media according to the ‘protest paradigm’: “a pattern of reporting found in
articles that tends to marginalize protesters and legitimizes authorities”. They wanted to know
if such a paradigm is also present when analyzing pictures. If so, we should see more officials than challengers in the pictures (representation), officials should be portrayed as more rational and opponents as more emotional (legitimacy), and the camera angle should make us look up at officials rather than the other way around (power).
They analyzed more than 700 pictures in over 2000 newspapers, and for each person in the
pictures they coded the type of actor, gender, age, size of the person, angle of the viewer etc.
The potential for bias was quite high here, so three persons did the coding to increase reliability.
5.4. Automated text analysis
Automated, instead of human, text analysis is the most quantitative version of content analysis. Not only the processing of the results of a content analysis is quantified; so is the actual processing of the data itself. This is done with digital software for text analysis.
Software or tools for computer-assisted text analysis can largely be grouped into ‘supervised’
and ‘unsupervised’ methods, depending on whether the baseline is created by the analyst or
by the tool itself. These tools lighten the burden of the analyst and allow for the processing of
a larger quantity of data than manual coding would with the same amount of labor input. The
key, of course, is to get the tool to create interesting and relevant output. As such, validation
stands central in this type of content analysis, particularly for unsupervised methods
(Grimmer & Stewart, 2013). Yet even more fundamentally, thorough reflection on the research
set-up; on how the particular modelling of language conducted by a tool is useful, on why a
particular tool is chosen, and on what benefit it brings to subjective reflection by the analyst,
is crucial to automated text analysis.
Interesting algorithms include Wordfish (unsupervised) and Wordscores (supervised), which try to situate language use on an ideological scale; topic modelling, which exploits co-occurrence patterns in a corpus to trace the prevalence of ‘topics’ (unsupervised); simpler clustering tools such as concordance and collocation algorithms (supervised); and methods based on a pre-written dictionary (supervised). An elaborate discussion of how these tools work, what their presuppositions are and how they are best employed is beyond the scope of this paper, but needless to say, their use in a content analysis must have a good, explicit motivation and a proper methodological basis.
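As a minimal illustration of the last and simplest family (a supervised, dictionary-based method), the sketch below counts occurrences of pre-defined dictionary terms per document. The dictionary and documents are invented for the example, and tools such as Wordfish or topic models involve considerably more statistical machinery than this.

```python
import re
from collections import Counter

# A hypothetical issue dictionary: each issue maps to a list of indicator terms.
DICTIONARY = {
    "trade": ["tariff", "export", "import", "trade"],
    "climate": ["climate", "emission", "carbon", "warming"],
}

documents = [
    "New tariffs on imports could hurt export-dependent industries.",
    "The summit focused on carbon emissions and climate targets.",
    "Trade talks stalled while climate negotiations moved ahead.",
]

def tokenize(text):
    """Lowercase the text and extract word tokens."""
    return re.findall(r"[a-z]+", text.lower())

for doc_id, doc in enumerate(documents):
    tokens = Counter(tokenize(doc))
    # Score each issue by counting tokens that start with one of its dictionary terms.
    scores = {
        issue: sum(count for token, count in tokens.items()
                   if any(token.startswith(term) for term in terms))
        for issue, terms in DICTIONARY.items()
    }
    print(doc_id, scores)
```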
6. Next steps in content analysis research
This paper has hopefully provided some pointers about what content analysis is (and is not),
and which practical methodological concepts and considerations often pop up in this type
of textual analysis. Writing this anno 2017, however, necessitates reflecting upon technological
and contextual changes that pose exciting opportunities but also challenges to contemporary
content analysis.
The previous section on automated text analysis already outlined one of these trends. The tools available to content analysts have evolved so rapidly that manual content analyses can seem to be nearing extinction. While automated approaches do have a comparative advantage over human coding teams when large data sets are involved (human coders are naturally more limited in the amount of data they can handle), this is not to say that computers will, or should, take over every step of the basic process. Relying solely on
automated procedures often comes at the expense of detailed comprehension, and so a
development in which computer-assisted coding is complemented with human action (at
different stages: category creation, coding interventions, analysis) seems the best way forward.
Not only are the tools becoming more sophisticated, but the environment of “messages” is also rapidly undergoing a metamorphosis. In an interactive, social-media age, content is no longer only available at fixed places, produced by fixed (corporate-driven) sources. With the advent of Web 2.0 and all sorts of interactive media platforms, the number of users generating, spreading and consuming messages has skyrocketed (Skalski, Neuendorf, & Cajigas, 2017).
This opens up possibilities for innovative collection and analysis in hitherto scientifically
difficult-to-reach populations. At the same time, if these environments are to take over (or significantly supplement) traditional content platforms, this poses additional challenges as to how to capture, archive and analyze these new data sets. In conclusion, many of these evolutions might
significantly affect the process of doing content analysis, but the basics of what a content
analysis is and is not, remain important to grasp.
References
Benford, R. D., & Snow, D. A. (2000). Framing Processes and Social Movements: An Overview
and Assessment. Annual Review of Sociology, 26, 611-639. doi:10.2307/223459
Berelson, B. (1952). Content analysis in communications research. Illinois: Free Press.
Carney, T. F. (1971). Content analysis: A review essay. Historical Methods Newsletter, 4(2), 52-
61.
Corrigall-Brown, C., & Wilkes, R. (2012). Picturing protest: The visual framing of collective
action by First Nations in Canada. American Behavioral Scientist, 56(2), 223-243.
Delputte, S. & Orbie, J. (2017) EU Development Policy: Abduction as a Research Strategy. In:
H. Heinelt and S. Münch (eds) Handbook of European Policy: Formulation, Development and
Evaluation. Edward Elgar
Entman, R. M. (1993). Framing: Toward clarification of a fractured paradigm. Journal of
Communication, 43(4), 51-58.
Gamson, W. A., & Modigliani, A. (1989). Media discourse and public opinion on nuclear
power: A constructionist approach. American journal of sociology, 95(1), 1-37.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content
analysis methods for political texts. Political analysis, 21(3), 267-297.
Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure
for coding data. Communication methods and measures, 1(1), 77-89.
Helbling, M., & Tresch, A. (2011). Measuring party positions and issue salience from media
coverage: Discussing and cross-validating new indicators. Electoral Studies, 30(1), 174-183.
Hsieh, H.-F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis.
Qualitative health research, 15(9), 1277-1288.
Hutter, S. (2014). Protest event analysis and its offspring. In D. Della Porta (Ed.),
Methodological practices in social movement research (pp. 335-367). Oxford: Oxford
University Press.
Kaid, L. L., & Wadsworth, A. J. (1989). Content Analysis. Measurement of communication
behavior, 197-217.
Kleinnijenhuis, J., De Ridder, J. A., & Rietberg, E. M. (1997). Reasoning In Economic Discourse:
An Application Of The Network Approach To The Dutch Press.
Kleinnijenhuis, J., & Pennings, P. (2001). Measurement of party positions on the basis of party
programmes, media coverage and voter perceptions. Estimating the policy positions of
political actors, 162.
Kriesi, H., Grande, E., Lachat, R., Dolezal, M., Bornschier, S., & Frey, T. (2008). West European politics in the age of globalization. Cambridge: Cambridge University Press.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology. Sage.
Mayring, P. (2000). Qualitative Content Analysis. Forum Qualitative Sozialforschung, 1(2).
Morgan, D. L. (1993). Qualitative content analysis: a guide to paths not taken. Qualitative
health research, 3(1), 112-121.
Neuendorf, K. A. (2001). The Content Analysis Guidebook. Sage.
Schreier, M. (2014). Qualitative content analysis. The SAGE handbook of qualitative data
analysis, 170-183.
Shapiro, G., & Markoff, G. (1997). In CW Roberts (Ed.), Text analysis for the social sciences:
Methods for drawing statistical inferences from text and transcripts (pp. 931): Mahwah, NJ:
Lawrence Erlbaum Associates.
Skalski, P. D., Neuendorf, K., & Cajigas, J. (2017). Content analysis in the interactive media
age. The content analysis guidebook, 201-242.
Statham, P., & Trenz, H. J. (2013). How European Union politicization can emerge through
contestation: The constitution case. JCMS: Journal of Common Market Studies, 51(5), 965-980.
Weber, R. P. (1990). Basic content analysis. Sage.
White, M. D., & Marsh, E. E. (2006). Content analysis: A flexible methodology. Library trends,
55(1), 22-45.
... The study used content analysis as its method of interpretation to examine the information gathered from documents and interviews. Gheyle (2017) defines content analysis as a methodical technique for identifying the occurrence and connections of specific terms, themes, or ideas in qualitative data. This approach allowed for the evaluation of consistency, the tracking of perceptions across geographical boundaries, and the identification of recurrent patterns. ...
Article
This research examined the connection between the recent currency redesign policy and the conduct of the 2023 general elections in Nigeria. Elections in democratic systems depend, to a large extent, on cash flow for campaigns and logistics and though the Naira had been redesigned several times in the past, the 2022 initiative was unique because the timing of its introduction coincided with the 2023 elections. The purpose of this study was to investigate the widespread notion that the currency redesign policy was a political strategy to limit access to funds by politicians. Based on secondary data sources and results from interviews, the study found that the scarcity of cash led to an increase in bartering and cashless transactions among the citizens. At the same time, it disrupted the financing strategies of political actors involved in the elections. The study recommended, among other things, that monetary policies and reforms should be implemented gradually and deliberately timed to avoid any negative impacts on the electoral process.
... Prior to step-one (Familiarizing yourself with your data), Liebenberg et al. [25] suggest conducting a 'pre-analysis' to reduce and distill the data before it goes to participants due to different factors such as time, literacy, and confidentiality constraints. As our primary data were raw transcripts, a student research assistant conducted preliminary directed content analysis [28] using a qualitative descriptive approach [29] to code data into similar segments prior to the workshop. A portion of data from each code that was considered to represent the larger sum of data was chosen to be reviewed by the youth during the participatory data analysis workshop. ...
Article
Full-text available
Background The United Nations Convention on the Rights of the Child affirms the human right of children to have their voices heard about issues affecting their lives. The One Chance to be a Child (One Chance) report provided an evidence-informed data profile of the well-being of children and youth in Nova Scotia (NS). To promote the report, we engaged youth from across the province in a knowledge mobilization (KMb) project. The purpose of this research is to outline the methods of the project, as well as the priority areas identified by youth. Methods 10 NS youth (grades 7–12) were recruited to take part in a three-phased KMb approach: (1) A sense-making workshop to learn and discuss the report, (2) The planning and delivery of a youth-led forum to engage decision-makers in dialogue around the report, and (3) A participatory data analysis workshop to identify priority areas from the report. Data were collected through audio-recordings, note-taking, and pictures of all materials. Results Five priority areas were identified by youth: (1) Access to Care– high-quality care in a timely manner, (2) Community Care– inclusive community solutions, (3) Open Minded Education– school curricula that reflects their needs, (4) Quality of Life and Basic Needs– living wages and healthy workplace policies, and (5) Youth Empowerment– youth voice embedded throughout all actions. Conclusion Engaging youth around the findings of the One Chance report supported their voices being heard, and their well-being needs to be considered by decision-makers.
... The data were evaluated by content analysis method which is a technique used for studying texts, documents, symbols, audios, and videos (Gheyle and Jacobs, 2017) to form general bilateral themes, requiring coding (Flick, 2014). Firstly, all the forms were read in detail and then entered ATLAS 7.0 software commonly and anonymously. ...
Article
Full-text available
Generally speaking, as the nature of business life some companies can be successful at implementing sustainability, however many others can’t and struggle. The ration behind this may be related with demographic factors and employee's variables. While the benefits of sustainability for the environment and for organizations have been widely documented, there has been relatively little research exploring the influence of sustainability on employee behaviors and outcomes. The research at hands aims to fulfill that need via exploring the link among sustainability and employee commitment, engagement, individual environmental behavior, wellbeing, and performance in the fast-moving consumer goods (FMCG) sector. By understanding the ways in which sustainability can impact employee behaviors and outcomes, FMCG companies may be better equipped to design and implement sustainable practices that benefit both the environment, their employees, and their organizations. The records submitted in this survey emphasized the cases of firms which have favorably applied sustainability practices. The authors know of no other study dedicated to the investigation of the relation among sustainability and employee commitment, engagement, individual environmental behavior, wellbeing, and performance in the FMCG sector firms performing in a developing country, Türkiye in particular. Among qualitative research methods in-depth interview technique was utilized to gather data from 52 employees working at different eight FMCG local and international companies in Türkiye. The companies were selected due to their strong focus on sustainability and their commitment to implementing environmentally friendly practices. In order to ensure the validity and reliability of the interview form, a pilot interview was conducted on 20 workers, who were selected by random sampling method, and accordingly necessary adjustments were made in the form. The data were evaluated by content analysis method via ATLAS 7.0 software to form general bilateral themes. Lincoln and Guba’s criteria were used for establishing the overall trustworthiness of qualitative research results. The findings of this study suggested that sustainability practices positively impact employee's commitment, engagement, individual environmental behavior, wellbeing, and performance. According to these data employee's commitment, engagement, individual environmental behavior, wellbeing, and performance increased after their corporations started adapting sustainability practices, especially at production department and for non-managerial roles. It had also been determined that as years of work experience and number of children increased workers’ commitment, engagement, individual environmental behavior, wellbeing, and performance also enhanced. In addition, international firms’ sustainability practices were seen more supportive than local ones in terms of commitment, individual environmental behavior, and performance.
... Discrepancies, manifesting exclusively during the coding of the emotional tone of the videos, were addressed through rigorous deliberation, ultimately resulting in a consensus on coding decisions. A threshold of 80% or higher is generally regarded as an acceptable inter-coder agreement rate for content analysis (Gheyle & Jacobs, 2017;Miles & Huberman, 1994;Patton, 2002). The elevated level of agreement observed in this study substantiates the reliability and methodological robustness of the results. ...
Article
Full-text available
This study analyzes UNEP's communication strategies on TikTok, contributing to the understanding of the organization's digital communication strategies, and elucidating TikTok's role in the communication strategies of international environmental organizations. Employing a content analysis, the study scrutinized 402 audiovisual content promulgated via UNEP's TikTok interface from June 2, 2020, to January 3, 2025. The analytical dimensions encompassed thematic delineations, linguistic modalities, typological classifications, strategic intents, affective undertones, structural configurations, interactional metrics, spokesperson attributes, and the demographic parameters of spokespersons. The findings revealed that UNEP utilized diverse video formats, characterized by varying emotional tones and spokesperson profiles, to convey messages on topics such as climate change, sustainability, food waste, and biodiversity, aiming to foster audience engagement. Nevertheless, the analysis found that the organization does not sufficiently consider the needs of disadvantaged individuals, represents certain groups in a limited manner, and fails to adequately adhere to the principle of multilingualism. Moreover, notwithstanding the recurrent incorporation of subtitles, captions, music, and hashtags within the content, the organization's engagement metrics remained conspicuously low.
... The data in this study will be categorized into 3 groups, namely low, medium, high categories. More details are in table 2 below: The main idea of qualitative content analysis is to determine the definition criteria of various kinds of underlying theoretical backgrounds, the formulation of research questions, and aspects of the material that have been determined (Gheyle & Jacobs, 2017). Categorization and classification are very important for evaluating student performance (Pallathadka et al., 2021). ...
Article
The purpose of the study was to describe the entrepreneurial attitude and interpersonal communication of teaching students. The study uses a cross-sectional design and a quantitative descriptive approach with an explanatory method. The research sample consisted of 168 respondents selected through purposive sampling. Data were collected with a questionnaire and analyzed by categorizing the mean results against hypothetical statistics and by the Kruskal-Wallis test. The findings show that the entrepreneurial attitude of teacher students falls in the low category at 60.07%, with a mean result from the Kruskal-Wallis test of 31.7321, and that the students' interpersonal communication is also in the low category at 43.45%, with a mean value from the Kruskal-Wallis test of 26.6488. The practical and academic implications of these findings are relevant for policy makers and future researchers in entrepreneurship education seeking effective learning-process formulations to shape entrepreneurial attitudes and improve teacher students' interpersonal communication.
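For readers unfamiliar with the test reported above, here is a minimal sketch of running a Kruskal-Wallis test in Python with scipy.stats.kruskal; the groups and scores are invented and unrelated to the study's data:

```python
from scipy import stats

# Hypothetical attitude scores for three groups of respondents.
group_a = [21, 25, 30, 28, 24]
group_b = [35, 32, 38, 31, 36]
group_c = [27, 29, 26, 33, 30]

# The test compares the rank distributions of the groups without assuming normality.
h_statistic, p_value = stats.kruskal(group_a, group_b, group_c)
print(f"H = {h_statistic:.2f}, p = {p_value:.4f}")
```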
... MAXQDA's auto-code function was used to support the coding process. This division of expertise added value by increasing the reliability and accuracy of the coding system and reducing the risk of overlooking important issues or text segments (Gheyle and Jacobs, 2017). At the end of the process, the two coding systems were compared to produce the final version shown in Table 2 with the ID numbers used for graphical identification, the sub-codes and their descriptions, and literature addressing either pet food or human food. ...
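MAXQDA's auto-code function assigns codes to segments that match predefined search terms. The sketch below re-creates that idea in plain Python with an invented keyword dictionary, purely to illustrate how rule-based pre-coding can support, but not replace, manual coding; it is not MAXQDA's actual implementation:

```python
import re

# Hypothetical keyword dictionary; in MAXQDA the search terms are defined
# inside the software rather than in code.
codebook = {
    "environmental": ["emission", "water", "soil", "recycl"],
    "social": ["communit", "human rights", "employee"],
    "economic": ["profit", "cost", "investment"],
}

def auto_code(segment):
    """Return every code whose keywords appear in a text segment."""
    hits = []
    for code, keywords in codebook.items():
        if any(re.search(kw, segment, flags=re.IGNORECASE) for kw in keywords):
            hits.append(code)
    return hits

segments = [
    "We reduced emissions and water use across all plants.",
    "Our partnerships support local communities and respect human rights.",
]
for s in segments:
    print(auto_code(s), "->", s)
```

Pre-coded segments produced this way still need to be checked by the human coders, which is exactly why the excerpt stresses comparing the two coding systems at the end of the process.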
Article
The role of sustainability communication in the pet food industry has changed, following similar trends observed in the food industry for human consumption, towards a marketing approach that incorporates different elements, including sustainability. This study explores how the biggest pet food companies in the United States (US) and European Union (EU) communicate their sustainability practices, with a focus on the environmental, social and economic dimensions of sustainability. Content analysis of pet food company websites revealed environmental sustainability to be the dominant dimension in both geographic areas, with ecological topics such as water conservation, soil preservation and emissions reduction being emphasized the most. Aspects pertaining to social sustainability, including community support and respect for human rights, also appeared in the communication strategies, although to a lesser extent. The study identified significant differences between the communication strategies used in the two geographic areas. The approach adopted by European companies tended to be more structured and shaped by the regulations in place, reflecting the presence of more rigid non-financial reporting guidelines in this area, whereas companies in the United States displayed greater variability in their communication approaches, probably due to the lack of centralized regulations. Despite this, companies in the United States tended to place greater emphasis on collaborations and partnerships, in particular on energy and emissions management. The findings contribute to furthering our understanding of how sustainability practices are communicated in the pet food industry, offer a comparison of the two markets considered, and highlight the growing need for integrated, transparent communication strategies in the sector.
Article
The position and readiness of pre-service preschool and primary teachers to use artificial intelligence (AI) in the study process is an increasingly widely studied area. It is therefore important to analyse how students perceive the opportunities, strengths, weaknesses, and threats of AI. The qualitative study reported here analysed the position of pre-service preschool and primary teachers on the use of AI in the study process, emphasising strengths, opportunities, weaknesses, and threats. One hundred and twelve first-cycle university students participated in the study, which was conducted during the spring semester of 2024. An instrument of four open-ended questions was applied, and the collected verbal data were analysed using quantitative content analysis. The research revealed that AI is valued as an important tool for improving the efficiency of studies. Respondents highlighted AI's ability to quickly systematise and present information, save time, automate tasks, and promote creativity and the application of innovative teaching/learning methods. Possibilities also emerged to use AI to personalise teaching content and create interactive learning environments. At the same time, problems with using AI were highlighted, including the unreliability of information, the limitations of the technology, and a potential decline in creativity, independence, and critical thinking skills. Ethical issues, such as plagiarism and lack of academic integrity, as well as data privacy, were also of concern to the respondents. The results show that AI teaching/learning elements need to be integrated into study programmes in order to develop students' competencies to use AI tools effectively. The need for clear ethical guidelines to ensure the responsible use of AI is also emphasised. The study showed that AI can be an effective tool for improving educational processes, but its use must be carefully balanced with management of the potential threats.
Chapter
This article analyses the digital communication of Iniciativa Liberal during the 2024 Portuguese General Elections. The party led by Rui Rocha concentrated on discrediting the public policies of the outgoing government, which was led by the Portuguese Socialist Party, attacking its political actors and the ideological positions of the political left, accusing them of wanting to impoverish Portugal. Through a quantitative and qualitative analysis of the content published on the social network X, the study seeks to quantify the frequency with which the party resorted to negative expressions and political attacks, to the detriment of its own merits and proposals for the country. In addition, by analysing different variables, the research explores the relationship between the type of political content and the level of online engagement generated. The results obtained contribute to understanding political communication strategies in the digital age and to the debate on the impact of populism on the Portuguese political landscape.
Article
Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.
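As one concrete illustration of the kind of automated method and validation this abstract describes, here is a minimal sketch of supervised document classification with a bag-of-words representation in scikit-learn, validated on a held-out portion of hand-coded texts; the mini-corpus and labels are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented mini-corpus of hand-coded documents; real applications need far more.
texts = [
    "The minister defended the new tax bill in parliament",
    "Opposition leaders attacked the government's budget plans",
    "The striker scored twice in the championship final",
    "Fans celebrated the team's victory in the stadium",
    "Parliament debated amendments to the electoral law",
    "The coach announced the squad for the next match",
]
labels = ["politics", "politics", "sport", "sport", "politics", "sport"]

# Hold out part of the hand-coded material to validate the automated output,
# rather than trusting the model's classifications at face value.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The held-out accuracy is exactly the kind of problem-specific validation the authors call for: automated output is only trusted to the extent that it reproduces careful human coding on texts the model has not seen.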