The words people use in their daily lives can reveal important aspects of their social and psychological worlds. With advances in computer technology, text analysis allows researchers to reliably and quickly assess features of what people say as well as subtleties in their linguistic styles. Following a brief review of several text analysis programs, we summarize some of the evidence that links natural word use to personality, social and situational fluctuations, and psychological interventions. Of particular interest are findings that point to the psychological value of studying particles: parts of speech that include pronouns, articles, prepositions, conjunctives, and auxiliary verbs. Particles, which serve as the glue that holds nouns and regular verbs together, can serve as markers of emotional state, social identity, and cognitive styles.
LANGUAGE USE: Our Words, Our Selves
The ways people use words convey a great deal of information about themselves,
their audience, and the situations they are in. Individuals’ choice of words can
hint at their social status, age, sex, and motives. We sense if the speaker or writer
is emotionally close or distant, thoughtful or shallow, and possibly extraverted,
neurotic, or open to new experience. Although several Annual Review chapters
have summarized research on language acquisition, production, comprehension,
and its links to brain activity, this is the first to discuss how language and, more
specifically, word use is a meaningful marker and occasional mediator of natural
social and personality processes.
That the words people use are diagnostic of their mental, social, and even
physical state is not a new concept. Freud (1901) provided several compelling
examples of how common errors in speech betray people’s deeper motives or fears. Drawing heavily
on psychoanalysis, Jacques Lacan (1968) extended these ideas by suggesting that
the unconscious asserts itself through language. Indeed, language, in his view, is
the bridge to reality. Philosopher Paul Ricoeur (1976) argued that the ways we
describe events define the meanings of the events and that these meanings help us
keep our grasp on reality. Similar assumptions are implicit in much of the work in
sociolinguistics (e.g., Eckert 1999, Tannen 1994), narrative and discourse analyses
(Schiffrin 1994), and communication research (Robinson & Giles 2001).
This article explores the methods and recent findings on word use rather than
language per se: the styles in which people use words rather than the content of
what they say. The distinction between linguistic style and linguistic content can
be seen in how two people may make a simple request. “Would it be possible for
you to pass me the salt?” and “Pass the salt,” both express the speaker’s desire
for salt and direct the listener’s action. However, the two utterances also reveal
different features of the interactants’ relationship, the speaker’s personality, and
perhaps the way the speaker understands himself.
Because word use is a relatively unstudied phenomenon, this article focuses
on four broad topics. The first deals with ways researchers have tried to study the
ways people naturally use words. By “natural,” we refer to relatively open-ended
responses to questions, natural interactions, and written or spoken text. The most
analyses of language. The second section of this article explores recent findings
linking word use to individual differences. The final two sections consider the
links between word usage and social or situational differences and how we can
use words to mark psychological change.
Although many of the assumptions about language as a psychological marker are
shared, the methods of studying language and word use have often been a battle-
ground. Most narrative researchers assume that language is, by definition, contex-
tual. Consequently, phrases, sentences, or entire texts must be considered within
the context of the goals of the speaker and the relationship between the speaker
and the audience. Because of the complexity of communication, this strategy as-
sumes that the investigator must attend to the meaning of the utterances in context.
However defined, meaning is believed to be sufficiently multilayered to only be
decoded by human judges who then evaluate what is said or written. Qualitative
analyses, then, provide the researcher with broad impressions or agreed-upon de-
(e.g., Schiffrin 1994).
An alternative perspective is that features of language or word use can be
counted and statistically analyzed. Quantitative approaches to text analysis have
gained increasing popularity over the past half century (for reviews see Popping
2000, Smith 1992, Weber 1994, West 2001). The existing approaches can be cat-
egorized into three broad methodologies. Judge-based thematic content analyses
typically involve judges who identify the presence of critical thematic references in
text samples on the basis of empirically developed coding systems (Smith 1992).
Thematic content analyses have been widely applied for studying a variety of
psychological phenomena such as motive imagery (e.g., Atkinson & McClelland
1948, Heckhausen 1963, Winter 1994), explanatory styles (Peterson 1992), cognitive
complexity (Suedfeld et al. 1992), psychiatric syndromes (Gottschalk 1997),
goal structures (Stein et al. 1997), arousal patterns associated with cultural shifts
(Martindale 1990), and levels of thinking (Pennebaker et al. 1990).
A relatively new approach, word pattern analysis, has emerged from the arti-
ficial intelligence community. Rather than exploring text “top down” within the
context of previously defined psychological content dimensions or word cate-
gories, word pattern strategies mathematically detect “bottom-up” how words co-
vary across large samples of text (Foltz 1998, Popping 2000). One particularly
promising strategy is latent semantic analysis (LSA) (e.g., Landauer & Dumais
1997), which is akin to a factor analysis of individual words. By establishing the
factor structure of word use within a large number of writing samples, it is possible
to learn how any new writing samples are similar to one another. Traditionally, this
technique has been used to determine the degree to which two texts are similar in
terms of their content.
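The factor-analytic intuition behind LSA can be illustrated with a small sketch. It is not the implementation of Landauer & Dumais (1997). It assumes a toy corpus of five hypothetical two-word texts, uses raw word counts without any weighting, and extracts the two largest latent dimensions by power iteration on the document-by-document product matrix, which yields the same document coordinates as a truncated singular value decomposition. All function names are illustrative.

```python
import math

def term_document_matrix(docs):
    """Rows are vocabulary words, columns are documents, cells are raw counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.split():
            matrix[index[w]][j] += 1.0
    return matrix

def gram(matrix, n_docs):
    """G[i][j] is the dot product of the count vectors of documents i and j."""
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]

def top_eigenpairs(sym, k, iters=500):
    """Largest k eigenpairs of a symmetric matrix via power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(docs, k=2):
    """Coordinates of each document on the k largest latent dimensions."""
    matrix = term_document_matrix(docs)
    pairs = top_eigenpairs(gram(matrix, len(docs)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(len(docs))]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# Hypothetical toy corpus: three "sadness" texts and two "competition" texts.
docs = ["sad cry", "cry grief", "grief tears", "win team", "team goal"]
vectors = lsa_document_vectors(docs, k=2)
print(cosine(vectors[0], vectors[2]))  # texts 0 and 2 share no words
print(cosine(vectors[0], vectors[3]))  # texts from different clusters
```

In this toy corpus the first and third texts have no word in common, yet both share a word with the second text, so they point in the same direction in the reduced space, whereas texts from the other cluster are close to orthogonal to them.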
The third general methodology prominent in quantitative text analysis focuses
on word count strategies. Psychological word count strategies exist for both the
sometimes require rather complex linguistic analysis (e.g., active versus passive
voice or metaphoric language use), most current approaches involve simple word
counts, such as standard grammatical units (personal pronouns, prepositions) or
psychologically derived linguistic dimensions (e.g., emotion words, achievement-
related words). Word count strategies are based on the assumption that the words
people use convey psychological information over and above their literal meaning
and independent of their semantic context. Although some language researchers
consider this assumption problematic, others see unique potentials in analyzing
word choice because of judges’ readiness to “read” content and their inability to
monitor word choice (e.g., Hart 2001). With only one exception (Weintraub 1989),
the most commonly used approaches presented below are computer based. In this
section we briefly review six widely used methods that have evolved from very
different theoretical perspectives.
The General Inquirer
Developed by Stone and colleagues in the early 1960s, the General Inquirer (Stone
General Inquirer is a compilation of a set of rather complex word count routines.
It was designed as a multipurpose text analysis tool that was strongly informed
by both need-based and psychoanalytic traditions. Historically, three thematic
dictionaries, the Harvard III Psychosociological Dictionary, the Stanford Political
the Need-Achievement Dictionary receiving special attention in psychology. The
Need-Achievement Dictionary was created in an attempt to replace the complex
judge-based scoring of achievement imagery in thematic apperception test (TAT)
stories by computerized content analysis.
The General Inquirer goes beyond counting words. In a two-step process it
first identifies so called homographs (ambiguous words that have different mean-
ings depending on the context). It then applies a series of preprogrammed dis-
ambiguation rules aimed at clarifying their meaning in the text. For example,
human judges score the statement “He is determined to win” as achievement im-
agery. The General Inquirer identifies the word “determined” as an ambiguous
NEED word and “win” as an ambiguous COMPETE word (because they both
can have non-achievement-related meanings) and codes a statement as achieve-
ment imagery only if both aspects are present and occur in the NEED-COMPETE
The General Inquirer is unique in its flexibility. It can be used to study virtu-
ally any topic of interest by creating a user-defined dictionary. Its most critical
advantage, the power to perform context-dependent word counts, is also its most
serious pragmatic drawback. The construction of a custom dictionary with the
specification of disambiguation rules is time consuming and in many cases not
worth the extra effort (as compared with simple word counts). Nevertheless, it is
no overstatement to say that the General Inquirer has given birth to and still continues
to shape the scientific field of computerized text analysis.
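The two-step logic of tagging ambiguous words and then applying a co-occurrence rule can be sketched as follows. This is a minimal illustration, not the General Inquirer's actual rule set: the tag dictionary is hypothetical, and the requirement that a NEED word precede a COMPETE word within the same statement is an assumption made for the example.

```python
import re

# Hypothetical tag dictionary; not the General Inquirer lexicon.
TAGS = {
    "determined": "NEED", "wants": "NEED", "hopes": "NEED",
    "win": "COMPETE", "beat": "COMPETE", "succeed": "COMPETE",
}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def is_achievement_imagery(statement):
    """Code a statement as achievement imagery only if a NEED word
    is followed later in the statement by a COMPETE word."""
    tags = [TAGS[w] for w in tokenize(statement) if w in TAGS]
    if "NEED" not in tags:
        return False
    first_need = tags.index("NEED")
    return "COMPETE" in tags[first_need + 1:]

print(is_achievement_imagery("He is determined to win"))
print(is_achievement_imagery("He is determined to rest"))
```

Neither tag alone triggers the code, which is what distinguishes this kind of rule from a simple word count.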
Analyzing Emotion-Abstraction Patterns: TAS/C
Mergenthaler and his research group realized the need for computer-assisted text
analysis when trying to characterize key moments in psychotherapy sessions.
Based on Bucci’s (1995) referential cycle model, they developed a computer
program called TAS/C that focuses exclusively on two language dimensions,
emotional tone and abstraction. According to the theory, emotion-abstraction pat-
terns occur periodically in psychotherapy sessions with insight processes (ab-
straction) following emotional events (emotion) with a time lag (Mergenthaler
For the analysis of emotional tone, defined as the density (rather than the va-
lence) of emotion words in a given text, a dictionary was developed that contains
more than 2000 items. The final restricted list of emotion words captures the three
dimensions of pleasure, approval, and attachment which account for roughly 5%
of the words of a text (Mergenthaler 1996). Abstraction is defined as the number of
abstract nouns in a given text. Abstract nouns are identified via the use of suffixes
such as -ity, -ness, -ment, -ing or -ion. The abstraction dictionary includes 3900
entries and accounts for about 4% of the text. There is no overlap across the two dictionaries.
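The two TAS/C dimensions can be approximated in a few lines. The sketch below is only an illustration under stated assumptions: the emotion word list is hypothetical, and abstraction is estimated by suffix matching alone, whereas TAS/C relies on a curated dictionary. Pure suffix matching will over-count, since many words ending in -ing or -ion are not abstract nouns.

```python
import re

# Hypothetical emotion word list; TAS/C uses a far larger dictionary.
EMOTION_WORDS = {"happy", "love", "afraid", "angry", "proud", "sad"}
# Suffixes named in the text as markers of abstract nouns.
ABSTRACT_SUFFIXES = ("ity", "ness", "ment", "ing", "ion")

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def emotion_abstraction(text):
    """Return (emotional tone, abstraction) as proportions of all words."""
    words = tokenize(text)
    if not words:
        return 0.0, 0.0
    emotion = sum(1 for w in words if w in EMOTION_WORDS)
    abstract = sum(1 for w in words if w.endswith(ABSTRACT_SUFFIXES))
    return emotion / len(words), abstract / len(words)

tone, abstraction = emotion_abstraction(
    "I was sad and afraid but the realization brought happiness"
)
print(tone, abstraction)
```

Applying the function to successive blocks of a transcript would give the two time series whose lagged relationship the referential cycle model describes.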
TAS/C analysis of emotion-abstraction patterns has been successfully applied
to verbatim therapy protocols (Mergenthaler 1996) and attachment interviews
(Buchheim & Mergenthaler 2000). More recently, TAS/C has been extended to
include a measure of referential activity (Bucci 1995). Referential activity refers to
the ability to verbalize nonverbal experiences, characterized in speech by concreteness,
specificity, clarity, and imagery (Mergenthaler & Bucci 1999). It is captured
by counting third person pronouns and prepositions. The TAS/C approach to analyzing
language is theory driven and limited to a very narrow spectrum of linguistic
Weintraub’s Analysis of Verbal Behavior
At the core of Weintraub’s (1981, 1989) explorations into what he calls verbal
mannerisms lies the clinical observation that individuals speaking under stress
often reveal important information about their degree of psychological adap-
tation. Drawing on his medical experience, Weintraub argued that psychologi-
cal defense mechanisms manifest themselves in speech patterns obtained under
mildly stressful conditions. For the assessment of these defense mechanisms,
he developed a standardized procedure for sampling naturally occurring lan-
guage. Participants are asked to talk into a microphone for 10 minutes on any
The transcripts are then submitted to a linguistic analysis. Unlike other word
count approaches, Weintraub’s linguistic analysis is performed by naïve judges
who “can score...[the transcripts] without extensive knowledge of lexical meaning”
(Weintraub 1989, p. 11). The linguistic features he and his colleagues have
been interested in are largely intuitively derived and are drawn from clinical expe-
riences of how psychopathology surfaces in patients’ language use. Weintraub’s
most recent work has focused on 15 linguistic dimensions including three pronoun
categories (I, we, me), negatives (e.g., not, no, never), qualifiers (kind of, what you
might call), expressions of feelings (e.g., I love, we were disgusted), and adverbial
intensifiers (really, so).
Weintraub has explored people’s verbal behavior in multiple ways. In addition
to his main field of interest, the language of psychopathology, he also analyzed
the Watergate transcripts, characterized speaking styles of post–World War II U.S.
presidents, identified linguistic correlates of intimacy, and related language use to
personality. Overall, Weintraub’s approach can be considered stylistic. A strong
emphasis is put on research that is clinically relevant and can inform psychoana-
lytically oriented psychotherapy.
Analyzing Verbal Tone with DICTION
Researchers in the area of language use in politics generally tend to focus on the
content of political speeches (Winter 1973, Zullow et al. 1988). Roderick Hart
(1984) is a communication researcher concerned with the subtle power of word
choice. Over the past two decades he has developed and refined a computerized
word count program called DICTION (Hart 2001). DICTION is designed to reveal
the verbal tone of political statements by characterizing text on five statistically
independent master variables: activity, optimism, certainty, realism, and common-
ality. The rationale behind these master variables is that “if only five questions
could be asked of a given passage, these five would provide the most robust
understanding” (Hart 2001, p. 45). The five master variables are composed of
35 linguistic subfeatures (e.g., optimism: praise, satisfaction, inspiration, blame,
hardship, denial).
DICTION relies on 10,000 search words that are assigned to the categories
without overlap. The output is either a profile of absolute values or norm scores
based on 20,000 samples of verbal discourse. Special features of DICTION are the
ability to “learn,” i.e., update its database with every processed text, and a statistical
weighting procedure for homographs, words that are spelled the same but have
different meanings. DICTION has been used to analyze presidential and campaign
speeches, political advertising, public debates, and media coverage. The DICTION
approach is style focused and attempts to cover a broad range of linguistic aspects.
Linguistic Inquiry and Word Count
Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al. 2001) was orig-
inally developed within the context of Pennebaker’s work on emotional writing
(Pennebaker & Francis 1996, Pennebaker et al. 1997). It was designed to discover
which features of writing about negative life experiences could predict subse-
quent health improvements. More recently the use of LIWC has been expanded to
tracking language use in text sources spanning classical literature, personal narra-
tives, press conferences, and transcripts of everyday conversations (Pennebaker &
Graybeal 2001).
LIWC uses a word count strategy whereby it searches for over 2300 words or
word stems within any given text file. The search words have previously been categorized
by independent judges into over 70 linguistic dimensions. These dimensions
include standard language categories (e.g., articles, prepositions, and pronouns,
including first person singular, first person plural, etc.), psychological processes
(e.g., positive and negative emotion categories, cognitive processes such as use
of causation words, self-discrepancies), relativity-related words (e.g., time, verb
tense, motion, space), and traditional content dimensions (e.g., sex, death, home,
occupation). The LIWC dimensions are hierarchically organized. For example, the
word “cried” would fall into the categories “sadness,” “negative emotion,” “over-
all affect,” and “past-tense verb.” The program is sufficiently flexible to allow for
user-defined categories as well.
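A dictionary-based word count with hierarchically organized categories can be sketched as follows. The mini-dictionary, the category names, and the use of a trailing asterisk to mark word stems are assumptions made for illustration and do not reproduce the LIWC dictionary; scores are expressed here as a percentage of all words in the text.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; entries ending in "*" are treated as stems.
DICTIONARY = {
    "cried": ["sadness", "past_tense"],
    "sad*": ["sadness"],
    "happ*": ["positive_emotion"],
    "the": ["article"],
    "a": ["article"],
    "i": ["first_person_singular"],
    "my": ["first_person_singular"],
}

# Each category points to its parent, giving the hierarchy.
PARENT = {
    "sadness": "negative_emotion",
    "negative_emotion": "affect",
    "positive_emotion": "affect",
    "first_person_singular": "pronoun",
}

def categories_for(word):
    """All categories a word falls into, including ancestor categories."""
    found = set()
    for entry, cats in DICTIONARY.items():
        if entry.endswith("*"):
            matches = word.startswith(entry[:-1])
        else:
            matches = word == entry
        if matches:
            for cat in cats:
                while cat is not None:
                    found.add(cat)
                    cat = PARENT.get(cat)
    return found

def profile(text):
    """Percentage of words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    counts = Counter()
    for w in words:
        counts.update(categories_for(w))
    return {cat: 100.0 * n / len(words) for cat, n in counts.items()}

print(sorted(categories_for("cried")))
print(profile("I cried when my happiness ended"))
```

Because each category points to its parent, a single dictionary entry such as "cried" contributes to every level of the hierarchy at once, and a user-defined category amounts to adding entries to the dictionary.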
Whereas some of the LIWC categories were initially derived from psycholog-
ical theories (e.g., inhibition words, discrepancy words), most categories try to
capture information at a very basic linguistic (e.g., pronouns, articles, preposi-
tions) as well as psychological level (e.g., positive emotions, negative emotions,
cognitive words). In its current version LIWC has been most effective in tracking
stylistic aspects of language use. However, researchers can use the traditional content
categories (e.g., achievement, religion, sexuality) as well as create their own
user-defined dimensions.
Biber: Factor Analyzing the English Language
Although Biber’s (1988) work on language use was developed as a tool to under-
stand the English language, it has important implications for psychology. Biber,
an English professor, undertook an extensive empirical investigation in which he
studied which linguistic dimensions emerge when discourse function rather than
grammatical function is taken as the organizing principle. The purpose of this
inductive approach was to factor analyze language and identify linguistic dimen-
sions that would constitute a useful framework for describing language variations
across different text types and genres.
Biber’s study comprised two separate steps. The first sampled text from 23
spoken and written genres such as science fiction, humor, and press reports. A total
of 481 texts with almost 1,000,000 words were submitted to a broad computerized
interest. Among the 67 selected variables were pronouns, adjectives, adverbs,
adverbials, tense markers, nominalizations (words with -tion, -ment, -ness, or -ity
suffixes), passive voice, and negations.
In the second step Biber submitted these 67 linguistic variables to a factor anal-
ysis. Generally words are considered to cluster together according to their gram-
matical function (e.g., pronouns, articles, prepositions). Biber’s factor analytic
approach clustered word patterns according to their natural co-occurrence. This
provided useful information of a common discourse function behind certain words.
Passive voice, for example, tends to statistically co-occur with nominalizations
(Biber 1988). This then can help determine the role of words in creating the tone
or character of a specific type of text. Biber found 6 general factors: informational
versus involved production, narrative versus nonnarrative concerns, explicit ver-
sus situation-dependent reference, overt expression of persuasion, abstract versus
nonabstract information, and on-line informational elaboration. He later demon-
strated that the factors could separate the different linguistic genres of writing.
Biber’s analyses are groundbreaking in that they restructure the English language
according to how it is used in text across different written and spoken genres.
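The starting point of such an analysis, the pattern of co-occurrence among linguistic features across texts, can be illustrated with a correlation matrix. The sketch below shows only this first step and does not extract factors. The feature rates are invented for five hypothetical texts and are not Biber's data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical rates per 1,000 words for five texts (invented values).
FEATURE_RATES = {
    "passives": [12, 10, 2, 3, 11],
    "nominalizations": [30, 28, 8, 9, 27],
    "first_person_pronouns": [5, 6, 40, 35, 4],
}

def correlation_matrix(features):
    """Correlation of every feature with every other feature across texts."""
    names = sorted(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}

matrix = correlation_matrix(FEATURE_RATES)
print(matrix[("nominalizations", "passives")])
print(matrix[("first_person_pronouns", "passives")])
```

A factor analysis would take a matrix of this kind, computed over all 67 variables, as its input; features that correlate strongly in the same direction would tend to load on the same factor.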
Summary and Evaluation
Word count strategies count words within a given text sample irrespective of the
context in which the words occur. They have an undisputed advantage of being
able to perform reliably and efficiently with the use of computers. Word count
approaches differ from one another in their specificity, i.e., in their attempt to
either capture a maximum of words in a given text (e.g., Biber’s approach, LIWC)
or concentrate on only some linguistic aspects (e.g., TAS/C, need-achievement). In
a compelling review of word count approaches, Hart (2001) compares judge-based
or discourse approaches with more detached word count strategies by drawing on
a metaphor of two people trying to understand a city by driving on the streets
or viewing it from a helicopter. Both get quite different—but equally valid—
pictures of a city. Whereas the helicopter is likely to miss details at the corner
of a specific street, it is able to pick up differential patterns of light. Whereas
word count approaches sometimes miss what elementary students could see, they
provide linguistic information “from a distance,” a distance that normal readers
do not have because it is virtually impossible to ignore what is being said and
concentrate on how something is said.
Clearly, there have been a variety of theoretical and methodological approaches to
A research approach that takes advantage of linguistic styles has not been a staple
of most current social, personality, or clinical perspectives. In this section we
stand back and summarize psychological features of relatively natural word use.
The psychometrics of word use are examined—with particular attention to words
that tap linguistic styles. Some of the basic dimensions of word use are then
demonstrated to be related to a variety of individual difference variables, such as
demographic markers, traditional personality measures, and differences in mental
and physical health.
Psychometric Properties of Word Use
The first step in exploring the links between word use and various individual
difference markers is to establish the psychometrics of words themselves. That
is, do people’s word usage patterns fulfill the basic psychometric requirements
of stability across time and consistency across context? Several investigators have
begun to address this problem.
Gleser et al. (1959) had people talk for 5 minutes about an interesting life ex-
perience and obtained a measure of internal consistency by calculating split-half
reliabilities. Across 21 language categories (e.g., word count, adjectives, substan-
tives, pronouns, feelings) the average correlation between successive 2-minute
intervals was 0.51, providing the first evidence that word choice is stable within a
very short time frame. Using the General Inquirer approach, Schnurr et al. (1986)
provided further support for the temporal stability of language use by reporting
high within-person rank order correlations for the 83 variables of the Harvard III
dictionary over a period of one week.
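The logic of these reliability estimates can be sketched as follows. The example is an illustration under stated assumptions, not a reconstruction of either study: the transcripts and the pronoun list are hypothetical, and each transcript is split into halves by word count rather than by time interval. A test-retest estimate follows the same pattern, with two sampling occasions in place of two halves.

```python
import math
import re

# Hypothetical pronoun category and transcripts, for illustration only.
PRONOUNS = {"i", "me", "my", "we", "you"}
TRANSCRIPTS = [
    "i love my dog i walk my dog",
    "the dog saw me the cat ran home",
    "the rain fell on the old stone wall",
]

def category_rate(words, category):
    """Proportion of words that belong to the category."""
    return sum(1 for w in words if w in category) / len(words)

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(transcripts, category):
    """Correlate category rates in the first and second half of each transcript."""
    first, second = [], []
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        mid = len(words) // 2
        first.append(category_rate(words[:mid], category))
        second.append(category_rate(words[mid:], category))
    return pearson(first, second)

print(split_half_reliability(TRANSCRIPTS, PRONOUNS))
```

The correlation is computed across speakers, so it indicates whether people who use many words of a category in one segment also do so in the next.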
Pennebaker & King (1999) analyzed a large body of text samples taken from
diaries, college writing assignments, and journal abstracts written across days and
even years and demonstrated good internal consistency (across text type) for 36
language dimensions. The language variables were taken from the LIWC dictio-
nary and comprised standard linguistic dimensions (e.g., articles, prepositions,
pronouns) as well as broader psychological concepts (e.g., emotion words, causa-
tion words, words indicating social processes). Across several studies, word use
in written language emerged as reliable across time, topic, and text source.
In a recent naturalistic field study, Mehl & Pennebaker (2002a) sampled stu-
dents’ everyday conversations twice for two days separated by 4 weeks using a
newly developed minimally intrusive recording device called the electronically activated
recorder (EAR) (Mehl et al. 2001). Again, the linguistic analyses showed
that students’ spontaneous word use is stable over time (average test-retest correlation
for standard linguistic variables: r = 0.41, psychological processes: r = 0.24)
and consistent across social context (e.g., word use at home versus in public places
or in an amusement versus work context). These last two studies provide partic-
ularly promising evidence, as they demonstrate reliability based on an extremely
large body of text samples (Pennebaker & King 1999) and spontaneous word use
sampled from the entire spectrum of participants’ everyday real life conversations
(Mehl & Pennebaker 2002a).
Taken together, existing studies on the psychometrics of word use suggest that
people’s word choice is sufficiently stable over time and consistent across topic
or context to use language as an individual difference measure. This is true for
both basic grammatical categories as well as more psychologically based language
Demographic Variables
With language use fulfilling the psychometric properties of an individual differ-
ence marker, are there basic differences in word use as a function of age and
AGE Whereas a fair amount of research exists on discourse and aging (Coupland
& Coupland 2001), virtually no studies have addressed how word use changes over
the life-span. In two overlapping projects Pennebaker & Stone (2002) explored
the links between language use and age. In a cross-sectional analysis, multiple
written or spoken text samples from disclosure studies from over 3000 research
participants from 45 different studies representing 21 laboratories in 3 countries
were subjected to computer text analyses to determine how people change in their
use of 60 text dimensions as a function of age. A separate longitudinal project ana-
lyzed the collected works of 10 well-known novelists, playwrights, and poets who
lived in the past 500 years. The results of the two projects converged in that both
studies found pronounced differences in language use over the life-span. Whether
famous authors were expressing themselves through their literature, experimental
research participants were writing about traumatic experiences, or control partic-
ipants were writing about their plans for the day, people exhibited remarkably
consistent changes in their linguistic styles. With increasing age, individuals used
more positive emotion words, fewer negative emotion words, fewer first person
singular self-references, more future tense, and fewer past tense verbs. Age was
also positively correlated with an increase in cognitive complexity (e.g., causation
words, insight words, long words). In addition to challenging some of the cultural
stereotypes on aging, these results suggest that language use can serve as a subtle
linguistic age marker.
GENDER In contrast to other demographic variables, the link between word use
and gender has been extensively studied. Differences in women’s and men’s lan-
guage have received widespread attention within the scientific community as well
as in the popular media. Lakoff (1975) published a seminal work relating gender
differences in language use to differential access to social power. She argued that
women’s lack of power in society results in their using a less assertive speech
that manifests itself in a higher degree of politeness, less swearing, more frequent
tag questions (e.g., “It is ..., isn’t it?”), more intensifiers (e.g., really, so), and
more hedges (e.g., sort of, perhaps, maybe; also known as qualifiers or uncertainty
words). Other early literature reviews (Haas 1979, Jay 1980) generally supported
these findings. Overall, men were more directive, precise, and also less emotional
in their language use.
Recently, Mulac et al. (2001) summarized the findings of more than 30 empir-
ical studies and reported relatively unambiguous gender effects for 16 language
features. According to this, typical male language features include references to
quantity, judgmental adjectives (e.g., good, dumb), elliptical sentences (“Great
picture.”), directives (“Write that down.”), and “I” references. Typical female lan-
guage features among others comprise intensive adverbs (e.g., really, so), ref-
erences to emotions, uncertainty verbs (seems to, maybe), negations (e.g., not,
never), and hedges. Contrary to Lakoff (1975), no consistent gender differences
were found in tag questions. Also, this review did not find that men and women
reliably differed in their use of first person plural or second person pronouns as
well as filler words in their natural speech (e.g., you know, like).
In evaluating these results, it is important to consider that, despite the com-
paratively large number of studies that went into the review, some of the findings
are based on only a couple of studies with sometimes rather small language sam-
ples and only one text source. The evidence for men using “I,” “me,” and “my”
at a higher rate than women, for example, comes from two studies conducted
by Mulac and his colleagues that derived language exclusively from nonpersonal
writings such as picture descriptions (Mulac & Lundell 1994) and fourth-grade im-
(2001) review—reported significantly higher first person singular self-references
for women in transcripts of oral narratives about an interesting or dramatic personal
life experience. Similarly, Pennebaker & King (1999) also found a higher use of
“I,” “me,” and “my” in female students’ stream of consciousness and coming-to-
college writings.
Mehl & Pennebaker (2002a) sampled daily conversations of 52 college students
in their natural environment using the EAR technology (Mehl et al. 2001). Overall,
men, compared to women, in their everyday conversations used nearly four
times as many swear words and considerably more big words (consisting of
more than six letters), anger words and articles. Women used more filler words
(e.g., like, well), more discrepancy words (would, should, could), and more refer-
ences to positive emotions, though not more emotion words in general. Again, the
transcripts of women’s spontaneous everyday speech contained more first person
singular references.
Across the various studies women’s and men’s language differs on a variety of
dimensions. Whereas these differences are consistent with a sociological frame-
work of gender differences in access to power, at least some of them are also
open to alternative explanations such as women’s higher social engagement (e.g.,
Maccoby 1990). Despite the comparatively large number of studies available on
gender differences in language use, no clear picture has yet evolved. Future re-
search must more carefully consider and distinguish among different language
sources. Is the data based on written or spoken language? Directed or spontaneous
speech? Were same-sex or opposite-sex interactions sampled? Was the language
derived from personal or nonpersonal, emotional or neutral material? Also, be-
cause language use is an inherently social phenomenon, one has to consider po-
tentially bi-directional effects. In a recent study of e-mail conversations, Thomson
et al. (2001) showed, for example, that participants spontaneously accommodate
to gender-preferential language used by their conversation partners.
Traditional Personality Measures
of personality. Several researchers have echoed this observation (e.g., Furnham
1990, Scherer 1979, Weintraub 1989). Although the empirical support is growing,
the research linking self-reports of personality and word use is still in its early
THE BIG FIVE To our knowledge, only one study has attempted to correlate word
use to the Big Five personality dimensions (self-reports of extraversion, neuroticism,
agreeableness, conscientiousness, and openness to experience). Using multi-
found modest but reliable effects of personality on word choice, with correlations
ranging between 0.10 and 0.16. Overall, neuroticism was positively correlated with
use of negative emotion words and negatively with positive emotion words; ex-
traversion correlated positively with positive emotion words and words indicative
of social processes; agreeableness was positively related to positive emotion and
negatively to negative emotion words. In addition, neuroticism was characterized
by a more frequent use of first person singular, a finding that is consistent with
the idea that excessive use of first person pronouns reflects a high degree of self-
involvement (e.g., Davis & Brock 1975, Ickes et al. 1986, Scherwitz & Canick
1988, Stirman & Pennebaker 2001, Weintraub 1989).
MOODS AND EMOTIONS Only a handful of researchers have looked at how other
personality variables are linked to unique word choices. Weintraub (1981, 1989)
reported that an anxious disposition correlates with the use of first person sin-
gular and a high amount of explainers (e.g., because, since, in order to) and
negatives (e.g., no, not, never). Self-ratings of anger are associated with an ab-
sence of qualifiers and a high use of negatives, rhetorical questions, and direct
references to other objects or people. Weintraub (1989) also found that a domi-
nant personality was associated with a high rate of commands, interruptions, and
NEED STATES Pennebaker & King (1999) examined the linguistic correlates of the
needs for achievement, power, and affiliation. Whereas the language of achieve-
ment motivation assessed with a TAT measure was characterized by a low degree
of immediacy (few first person singular pronouns, frequent use of articles, long
words, and discrepancy words), an orientation towards the social past (frequent
use of past tense and social words, infrequent use of present tense and positive
emotions) and a lack of rationalization (infrequent use of insight and causation
words, frequent use of negative emotion words), no such pattern was obtained
with the Personality Research Form (PRF) measure of achievement orientation.
tors, either with the TAT or with the PRF measure. Somewhat counter-intuitively,
participants high in TAT-based need for affiliation scored low on the social-past
language dimension, which suggests that whereas they frequently used positive
emotion words and present tense, they did not use many social words and past
tense. In the PRF measure the need for affiliation was negatively correlated with
making distinctions.
ship between language and stable aspects of the self? In an attempt to evaluate
different measures of implicit self-esteem, Bosson et al. (2000) had participants
write for 20 minutes about their deepest thoughts and feelings. Participants’ ex-
plicit self-esteem assessed with various self-report scales correlated (sometimes
marginally) with the use of negative emotion words. Use of self-references, however,
was unrelated to both explicit and implicit measures of self-esteem.
Ickes et al. (1986) sought to discriminate between two conceptually related
psychological constructs, Machiavellianism and self-monitoring. Coding partici-
pants’ personal pronoun use in unstructured dyadic interactions, they found that
Machiavellianism was related to an increased self-focus as reflected by more fre-
quent use of first person singular pronouns. In contrast, however, self-monitoring
was characterized by an increased other-focus as indicated by participants’ higher
use of second and third person pronouns. The analysis of spontaneous word choice
in this study was important in identifying a linguistic marker of focus of atten-
tion. This marker helped in clarifying a subtle but critical distinction between two
related impression management strategies.
SUMMARY Although self-reports of personality are often associated with word
use, the magnitudes of the relationships are surprisingly small. One explanation
is that personality self-reports reflect people’s theories of who they are. A self-
theory can often be at odds with the ways people present themselves linguistically.
Indeed, in at least two studies in which people either wrote about emotional topics
(Pennebaker & Francis 1996) or talked about themselves on camera (Berry et al.
1997), judges’ ratings of the emotionality of the text samples were more highly
correlated with language use than with the writers’ or speakers’ self-reports of
emotionality. This raises the traditional question about the “gold standard” of
personality or emotionality: Should we rely on what people say about themselves
or what others say about them (e.g., Hofstee 1994)?
Mental Health and Psychopathology
Does language carry diagnostic information about a person’s mental health? Is
there evidence for distinct psychopathological linguistic styles? The link between
language use and clinical disorders has captured researchers’ interest for more
than 70 years and has resulted in a comparatively large number of clinical case
studies as well as empirical investigations (for reviews see Jeanneau 1991, Rieber
& Vetter 1995).
GENERAL PSYCHIATRIC DISORDERS Oxman, Rosenberg, and their colleagues en-
gaged in an extensive enterprise to use the General Inquirer as a diagnostic tool for
psychiatric disorders. In a series of studies they showed that computerized linguistic
analyses of speech samples are capable of reliably and accurately classifying
patients into diagnostic groups, such as schizophrenia, depression, paranoia, or
somatization disorder (e.g., Tucker & Rosenberg 1975, Oxman et al. 1982). In a
comparison of the computer diagnosis against the diagnosis of professional psy-
chiatrists, the computer diagnosis emerged as superior to clinicians’ unstructured
reading of the transcripts of patients’ speech (Oxman et al. 1988).
DEPRESSION AND SUICIDALITY In a study of the spontaneous speech of five el-
derly depressed individuals, Bucci & Freedman (1981) found depression to be
related to an elevated use of first person singular pronouns and a lack of second
person and third person pronouns. The authors interpret these findings as reflecting
a weakness in connecting to others. Similarly, Weintraub (1981) found that when
depressed people are asked to talk about any personal topic for 10 minutes, they
use “I” at a higher rate than healthy individuals. Rude et al. (2002) confirmed
this linguistic self-focus in depression for written language use. In their study,
currently depressed students compared with never depressed students used signif-
icantly more first person singular pronouns in their personal essays. Interestingly,
the effect was exclusively produced by a higher use of the word “I.” The use of
“me,” “my,” and “mine” was comparable between the two groups.
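The kind of comparison described above can be illustrated with a minimal sketch that reports how often each first person singular form occurs as a percentage of all words. The word list, the `tokenize` and `pronoun_rates` helpers, and the sample essay are hypothetical and chosen only for illustration; they are not the procedure or materials of any study cited here.

```python
import re
from collections import Counter

# Hypothetical word list for illustration; not the dictionary of any cited program.
FIRST_PERSON_SINGULAR = ("i", "me", "my", "mine")


def tokenize(text):
    """Lowercase the text and split it into word tokens (contractions stay whole)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def pronoun_rates(text):
    """Return each pronoun's share of all words in the text, as a percentage."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in FIRST_PERSON_SINGULAR}
    return {word: 100.0 * counts[word] / total for word in FIRST_PERSON_SINGULAR}


# Invented sample text, used only to show the call.
essay = "I think I lost my way. Nobody told me what I should do."
rates = pronoun_rates(essay)
for word in FIRST_PERSON_SINGULAR:
    print(f"{word}: {rates[word]:.1f}% of words")
```

Reporting each form separately, rather than one pooled category, is what would allow an effect carried by "I" alone to be seen. Note that in this sketch a contraction such as "I'm" is kept as a single token and is therefore not counted as "I".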
Stirman & Pennebaker (2001) sought to learn whether suicidal ideation could
be linguistically detected. In an archival study, they compared the language use of
18 suicidal and nonsuicidal poets based on the corpus of their work over their
careers. In line with a social disengagement model of suicide, suicidal poets
were found to use first person plural pronouns at a lower and first person sin-
gular pronouns at a higher rate. They also made fewer references to other people
and used more words associated with death. Finally, Lorenz & Cobb (1952) an-
alyzed the language of 10 manic patients and also found that manics use first
person singular references at a higher rate. Taken together, the convergent re-
sults from studies of depression, suicidal ideation, and mania suggest that affec-
tive disorders are characterized by a high degree of self-preoccupation. Attention
habitually focused on the self linguistically surfaces in a more frequent use of
the first person pronouns such as “I,” “me,” and “mine” (e.g., Davis & Brock
SUMMARY Despite the sometimes conflicting results, language use can be an at-
tractive as well as subtle diagnostic marker. Future clinical studies should be more
rigorous in specifying clear clinical inclusion criteria and must rely on well-defined
or standardized language samples. Better control conditions are also needed that
allow inferences about the uniqueness of word use patterns in clinical versus non-
clinical populations. Finally, it is necessary to shift toward a more theoretically
fueled approach that helps explain the links between psychopathology and lan-
Physical Health and Health Behavior
Can language use inform us about physical health from an individual difference
linking language use to physical health are sparse. However, a small group of
studies hints that features of disease- and/or health-related behaviors may be tied
to language use.
HEART DISEASE PRONENESS In a series of studies, Scherwitz (for reviews, see
Scherwitz et al. 1985, Scherwitz & Canick 1988) has linked self-involvement to
the Type A behavior pattern and coronary heart disease (CHD) outcomes. Self-
involvement is operationalized as the frequency and density with which a person
uses first person singular pronouns in answering the questions during the structured
Type A interview. Results indicate that Type A is positively correlated with the
use of first person singular. Of more clinical importance, however, are the findings
that first person pronoun use in the structured interview is also related to systolic
and diastolic blood pressure, coronary atherosclerosis, and prospectively to CHD
incidence and mortality. Interestingly, the relationship between self-involvement
and CHD outcomes in most cases remained significant even after statistically
controlling for traditional risk factors such as age, cholesterol, cigarette smoking,
and Type A behavior (Scherwitz et al. 1985, Scherwitz & Canick 1988).
MORTALITY Drawing on the growing body of evidence that positive emotional
processes can impact health in a salutary way, Danner et al. (2001) analyzed autobiographical
sketches from 80 nuns written in their early 20s for emotional content.
A strong positive relationship between the number of positive emotion words and
life expectancy emerged from the longitudinal data. Although impressive, it again
raises the question about which kind of language samples predict which kind of
psychological and physical phenomena. Does the fact that nuns used more positive
emotion words in a carefully produced one-page essay mean they approach their
world in a more positive way in general or is this positivity effect restricted to
specific verbal samples only?
Taken together, very few studies have linked word choice, physical health,
and health behaviors. The findings, however, are encouraging considering that
simply knowing how often an individual uses the words “I,” “me,” and “my”
can provide important information about a risk for future CHD or that simply
about that person’s life expectancy—information with substantial real-life social
What we say and how we say it changes depending on the situation we are in.
Piaget (1926) and other early developmentalists (e.g., McCarthy 1929) noted that
young children changed the ways they spoke depending on the context of their
interactions. As adults, we know that we use different words when addressing an
audience of our peers versus when talking with a close friend. Although research
on how language varies as a function of social situations has been systemati-
cally addressed in psychology and sociology, very little has relied on word use
per se.
Perhaps the first in depth discussion of situational and social variations in lan-
guage was by Goffman (1959) in his Presentation of Self in Everyday Life. Draw-
ing on dramaturgical metaphors, Goffman argued that we all play different roles
depending on the situation. In his analyses of groups, for example, Goffman sug-
gested that voice characteristics and other nonverbal and paralinguistic cues shift
depending on the formality of the situation, the nature of the audience, and the
degree to which the speaker is integrated with or excluded from the other ac-
tors. Although he did not focus on the words people used, his work served as an
important foundation.
Later research attempted to define which dimensions within social situations
are most likely to be associated with language and, eventually, word usage. Hymes
(1974), an anthropologist and a founder of sociolinguistics, argued that any speech
act must be considered within eight dimensions ranging from the setting of the
utterances, who the participants were, the goals of the interaction, etc. Other re-
searchers such as Brown & Fraser (1979) and Forgas (1985) expanded on the idea
of developing taxonomic structures of situations to help identify when and how
language shifted. Psychological dimensions of situations related to language and
communication included the situation’s formality, cooperativeness, and involve-
ment. Note that these approaches focused more on the nature of the interactions
than on the word usage (Forgas 1985).
Formal Versus Informal Settings
Perhaps the most studied situational variations in the use of language have been
between formal and informal situations. In addition to some of the early work on
code switching, more recent research on politeness and verbal immediacy mark
word shifts as a function of setting.
Code switching refers to changes in language, dialect, accent, or even forms
of address that occur—often automatically—among interactants. Among U.S.
Spanish-English bilinguals, for example, it is common for individuals to use Spanish
in informal social settings and English in more formal situations. Analyses of
bilingual radio programs suggest that speakers may switch to Spanish when talking
about emotional topics and English when discussing work, finances, or politics.
Parallel findings can be seen in the use of the formal versus informal “you” in
Spanish (Usted versus tu), French (vous versus tu), and German (Sie versus Du)
(Brown & Gilman 1960, Sebeok 1960, Vaes et al. 2002).
Inherent in formal settings are disparities in power among interactants and an
adherence to culturally prescribed norms of behavior. Goffman (1967) suggested
that within such status-discrepant situations, individuals engage in “dramaturgic”
work to sustain and enhance their public face. Brown & Levinson’s (1987) polite-
ness theory takes into account an individual’s efforts to preserve the “face(s)” of
others with whom one communicates. Whereas politeness theory is comprised of
specific linguistic strategies to minimize threat to another’s face, most studies are
concerned with these tactics at the phrase level. Typically, the corpus of language
is independently coded by human judges noting the frequency of each tactic. However,
in many of Brown & Levinson’s tactics word-level markers of politeness can
be parsed out. For example, they propose impersonalizing the speaker and hearer
by avoiding the pronouns I and you, using past tense to create distance and time,
diminishing the force of speech by using hedge words such as perhaps, using slang
to convey ingroup membership, and using inclusive forms (we and let’s) to include
speaker and hearer.
In an interesting application of the language of politeness in organizational
studies, Morand (2000) had participants engage in laboratory role-plays in which
they were required to address a hypothetical other of a given high or low sta-
tus. Morand then independently coded the transcripts for the presence of polite-
ness tactics. At the word level participants used more hedge words, past tense,
subjunctive, formal words, honorifics (sir, Mr.), and apologies. Similar word-
level findings are embedded in the phrases detected in the majority of polite-
ness studies (Ambady et al. 1996, Brown & Gilman 1989, Brown & Levinson
A separate group of studies has found support for the centrality of the for-
mal/informal dimension based on inductive analyses of language use. Wiener &
Mehrabian (1968) and Mehrabian (1971) posited that a basic dimension to language
was verbal immediacy. Individuals who are verbally immediate tend to
use the present tense, are more personal in their interaction, and draw on the
speaker and audience’s shared realities. Markers of verbal immediacy were found
to be more common in informal settings than in formal ones. Interestingly, paral-
lel and independent findings have been reported by two other labs. Biber (1988),
in his factor analysis of words, considered his first factor to be a marker of for-
mality/informality. Words that loaded on the factor included first person singular
and present tense verbs. Indeed, speech samples high on the informality factor
tended to be personal conversations or informal writing samples. Using a much
larger and homogeneous sample of students’ writings, Pennebaker & King (1999)
also found that the first and most robust factor was immediacy, which included
first person singular, present tense verbs, short words, discrepancy words (would,
should, could), and the non-use of articles.
Deception and Honesty
One of the more productive arenas for exploring word use has been in the de-
ception literature. Multiple labs have attempted to discover if people change the
ways they talk when being honest versus deceptive. In general, three classes of
word categories have been implicated in deception: pronoun use, emotion words,
and markers of cognitive complexity. Knapp et al. (1974) found that liars often
owing to a lack of personal experience (see also Buller et al. 1996, Dulaney 1982,
Knapp & Comadena 1979, Mehrabian 1971). Similarly, Wiener & Mehrabian
(1968) reported that liars were more “non-immediate” than truth-tellers, and re-
ferred to themselves less often in their stories. In an analysis of five laboratory
studies wherein participants were induced either to tell the truth or to lie about
their thoughts or behaviors, truth-tellers consistently used a higher rate of first
person singular pronouns (Newman et al. 2002).
Other studies have found that when individuals are made to be self-aware
they are more “honest” with themselves (e.g., Carver & Scheier 1981, Duval &
Wicklund 1972, Vorauer & Ross 1999) and self-references increase (e.g., Davis &
Brock 1975). Finally, individuals who respond defensively (i.e., self-deceptively)
when discussing personal topics tend to distance themselves from their stories
and avoid taking responsibility for their behavior (Feldman Barrett et al. 2002,
Schütz & Baumeister 1999, Shapiro 1989). In short, deceptive communications
are characterized by fewer first person singular pronouns (I, me, and my).
In addition to pronoun use, the act of deception is generally associated with
heightened anxiety and, in some cases, guilt. Several labs have found slight but
consistent elevations in the use of negative emotion words during deception com-
pared with telling the truth (e.g., Knapp & Comadena 1979, Knapp et al. 1974,
Newman et al. 2002, Vrij 2000).
Finally, some promising results suggest that markers of cognitive complexity
are associated with truth-telling. One such word category, referred to as exclusive
exclude. Exclusive words require the speaker to distinguish what is in a category
from what is not in a category. In the Newman et al. (2002) studies, truth-tellers
used far more exclusive words than did liars. In the act of deception, it is far too
complex to invent what was done versus what was not done.
Emotional Upheavals
During periods of stress, trauma, or personal upheavals, people shift in the
ways they think and express themselves. The words people use during stress-
ful times change as well. Several studies of both personal and shared traumatic
experience suggest that pronouns, emotion words, and other parts of speech subtly
PERSONAL UPHEAVALS Capturing people’s word use during times of personal cri-
sis is often difficult and ethically questionable. One strategy is to capture the on-
going speech of public figures during tumultuous and quiescent times. One recent
study examined the way Mayor Rudolph Giuliani spoke during his press confer-
his first 5 years in office he was generally viewed as hostile, uncompromising, and
cold. Indeed, LIWC analyses of his language in 14 press conferences during this
time indicated that he used a very low rate of first person singular pronouns, a relatively
low rate of positive emotion words, and a high rate of big words. In his sixth
year as mayor he was diagnosed with prostate cancer, separated from his wife,
and withdrew from the Senate race against Hillary Clinton, all within the space of
two weeks. In the weeks after these events the press reported that his personality
seemed to have changed and that he was becoming a warm person. Analyses of his
press conferences during this time found that his use of first person singular almost
tripled, his use of positive emotion words increased slightly, and his language became
simpler. A year and a half later, in the aftermath of the September 11 attacks,
his language switched again. His first person singular pronouns dropped slightly
and his use of specific and inclusive first person plural pronouns increased. His
use of both positive and negative emotion words increased and his language remained
simple but with increasing cognitive complexity (as measured by exclusive
SHARED UPHEAVALS A common observation is that during a shared crisis, peo-
ple come together. Several studies have demonstrated that immediately after a
large-scale trauma individuals drop in their use of the word “I” and increase in
their use of “we.” In online chat groups immediately after the announcement of
the death of Princess Diana, for example, use of first person plural increased
by 135% and use of “I” dropped by 12% for approximately a week. By 10
days after the event pronoun use returned to normal levels (Stone & Pennebaker
More striking was an ongoing study of natural conversations that took place in
the weeks surrounding the September 11 attacks. Approximately 15 students wore
the electronically activated recorder (EAR) (Mehl et al. 2001) that recorded for 30
seconds every 12.5 minutes for up to two weeks after the attacks. All participants
had previously worn the EAR for at least 1–2 days within the weeks prior to the
attacks. The language analyses indicated that use of first person plural increased
and first person singular decreased for at least 5 days following September 11.
Interestingly, the use of “we” words was rarely in reference to the participants’
country, ethnic group, or other abstract entity. Rather, the use of “we” generally
referred to people in the participants’ immediate setting (Mehl & Pennebaker
Finally, analyses of the language used in the school newspaper of Texas A&M
in the weeks before and after a tragic bonfire accident showed comparable effects.
That is, first person plural pronouns doubled, as did the use of negative emotions,
and use of big words dropped by over 10% (Gortner & Pennebaker 2002).
Social Interactions
In most cases, when two people interact they use words. Remarkably little research
has been conducted on the ways the interactants use words with each other. An
exception to this is a study by Cegala (1989), who sought to identify linguistic
correlates of conversational engagement and detachment. In the study, 120 par-
ticipants who did not know each other were asked to engage in a brief casual
interaction with a same-sex peer. Participants were preselected on self-reported
dispositional involvement in interactions and high-high, low-low, and high-low
involvement dyads were created. Contrasts between the couple types showed that
highly involved couples used a higher amount of certainty expressions, a higher
degree of verbal immediacy, and more relational pronouns (we, us, our).
Beyond word use, numerous studies have pointed to the coordination of com-
municative behaviors during conversation. Indeed, the development of communi-
cation accommodation theory (Giles & Coupland 1991) has explored how indi-
viduals adapt to each other’s communicative behaviors in order to promote social
approval or communication efficiency. According to communication accommoda-
tion theory, individuals negotiate the social distance between themselves and their
interacting partners, creating, maintaining, or decreasing that distance. This can
be done linguistically, paralinguistically, and nonverbally. Specific accommoda-
tive strategies may include speech styles, speech rate, pitch, accent convergence,
response latency, use of pauses, phonological variations, smiling, or gaze. Most
tests of the theory have not focused on word use.
To our knowledge, only one project has explored linguistic accommodation at
the word level (Niederhoffer & Pennebaker 2002). In two studies from Internet
chat rooms, individuals getting to know one another in dyads exhibited linguistic
style matching at both the conversational level and a turn-by-turn level.
This coordinated use of language occurs at a remarkably low level and includes
word count and use of articles, prepositions, affect words, and cognitive words.
These effects appear to hold up across the perceived quality of an interaction, the
length of the interaction, whether face-to-face or on an internet-like chat, whether
for experimental credit or, in the case of a separate analysis of the Watergate
transcripts, to avoid impeachment and imprisonment.
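One way to picture word-level style matching is to compare how often two speakers use the same categories of words. The sketch below is only an illustration under stated assumptions: the abbreviated category lists, the helper names, the sample utterances, and the similarity formula (one minus the normalized absolute difference in rates) are all chosen for this example and are not taken from the studies described above.

```python
import re

# Hypothetical, heavily abbreviated category lists for illustration only.
CATEGORIES = {
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "on", "at", "to", "of", "with", "for"},
}


def category_rates(text):
    """Return the fraction of words in the text that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    return {
        name: (sum(t in words for t in tokens) / total if total else 0.0)
        for name, words in CATEGORIES.items()
    }


def style_matching(text_a, text_b, eps=1e-4):
    """Per-category similarity: 1.0 for identical rates, close to 0 when only one text uses the category."""
    rates_a = category_rates(text_a)
    rates_b = category_rates(text_b)
    return {
        name: 1.0 - abs(rates_a[name] - rates_b[name])
        / (rates_a[name] + rates_b[name] + eps)  # eps avoids division by zero
        for name in CATEGORIES
    }


# Invented utterances, used only to show the call.
speaker_a = "I went to the store with a friend."
speaker_b = "We walked to the park in the rain."
scores = style_matching(speaker_a, speaker_b)
```

The same function could be applied to whole conversations or to adjacent turns, which corresponds loosely to the conversational and turn-by-turn levels mentioned above.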
Yet another interesting domain in which to consider communication patterns
is within marital interactions. Gottman (1994) created couple typologies on the
basis of communication patterns. Similarly, Ellis & Hamilton (1985) proposed
that married couples can be distinguished by linguistic themes such as elaboration,
complexity, and personal reference (see also Acitelli 1992). However, the majority
of research is on a broader level than word use, per se.
One notable exception is the research of Sillars et al. (1997). Using a large
sample of married couples, these researchers analyzed the first 40 utterances of
discussions about marital problems. They found evidence for linguistic markers of
relational characteristics such as increased usage of “we” pronouns in traditional
(interdependent), satisfied and older married couples as compared to “I” usage in
more autonomous couples. Interestingly, marital relationship subtypes (tradition-
als, separates, or independents) did not vary in linguistic elaboration (words per
utterance, number of nouns and adjectives); however, language use was related to
education. More educated participants had longer utterances and used more qual-
ifiers. Similar research suggests that in less traditional couples there is increased
together these findings support the idea that surface features of language carry relational
meaning. Furthermore, personal pronoun use (I, we) in marital interactions
can reflect differences in the degree to which couples frame their relationship as
inter- or independent.
Since 1986 dozens of studies have demonstrated that writing about emotional
upheavals can affect people’s psychological and physical health. The typical dis-
closure studies require participants to write for 3–5 days for 15–30 minutes per
day about either emotional or superficial topics. The writing intervention has
been found to reduce physician visits for illness (e.g., Pennebaker & Beall 1986,
Smyth 1998), improve medical markers of health (e.g., Smyth et al. 1999), bring
about higher grades among students (Pennebaker & Francis 1996), and result
in higher re-employment rates among adults who have lost their jobs (Spera
et al. 1994). These effects have been found for individuals across multiple cul-
tures, age groups, and instructional sets (for a broad review, see Lepore & Smyth
Why does writing or talking about emotional upheavals affect physical and
psychological health? This question, of course, goes beyond the writing paradigm
and addresses the broader question of why psychotherapy itself is effective. Several
overlapping possibilities exist. One deals with the construction of a narrative. That
is, individuals who write about traumas naturally come to a coherent understanding
of the event. Further, this understanding is thought to be inherent in the cognitive
language of their disclosure. Other possibilities include changes in perspectives
when writing that may influence individuals’ social orientations.
Use of Cognitive and Emotion Words
One of the primary motivations for developing the LIWC program was to learn
if the language individuals use while disclosing emotional topics could predict
long-term health changes. Based on the Pennebaker & Francis (1996) pilot study,
we found that a particular linguistic “fingerprint” was associated with reductions in
physician visits following participation in the disclosive writing. Those who wrote
about traumas were more likely to benefit if, over the 3 days of writing, they used
a high number of positive emotion words, a moderate number of negative emotion
words, and, most important, an increasing number of cognitive (i.e., causal and
insight) words from beginning to the last day of writing.
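The notion of an increasing number of cognitive words across writing days can be made concrete with a small sketch: compute the percentage of cognitive words in each day's essay and fit a straight line through the daily values. The word list, function names, and sample essays below are hypothetical, and the least-squares slope is merely one plausible way to summarize the trend, not necessarily the index used in the studies described here.

```python
import re

# Hypothetical sample of causal and insight words; not the dictionary of any cited program.
COGNITIVE_WORDS = {"because", "cause", "effect", "reason", "realize", "understand", "know"}


def cognitive_rate(text):
    """Percentage of words in the text that appear in COGNITIVE_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * sum(t in COGNITIVE_WORDS for t in tokens) / len(tokens)


def linear_slope(values):
    """Ordinary least-squares slope of values against day index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two days of writing")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Invented essays for three writing days, used only to show the calls.
essays = [
    "It happened and I felt awful all day.",
    "I felt awful because nobody called me.",
    "I understand the reason now because I know what happened.",
]
daily_rates = [cognitive_rate(e) for e in essays]
trend = linear_slope(daily_rates)  # positive when cognitive word use rises over days
```

A positive `trend` corresponds to the pattern described above, in which cognitive words become more frequent from the first to the last day of writing.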
These effects were applied to six writing studies in a more systematic way (see
Pennebaker et al. 1997). Again, the same linguistic pattern predicted improved
health. The implications of these findings are intriguing. First, use of emotion terms
is moderately important. Positive emotion words are linearly related to health,
related (an inverse-U function). These findings support current views on the value
of optimism (e.g., Scheier & Carver 1985, Peterson et al. 1988). At the same time
the negative emotion findings are consistent with the repressive coping literature
(Jamner et al. 1988) in that those people who do not use negative emotion words
in describing traumatic events are at greater risk for subsequent health problems
than those who use at least some negative emotion words.
Most striking, however, are the relative effect sizes for changes in cognitive
words. An increasing use of cognitive words accounted for far more variance in
health improvement than did emotion words. These data, as noted below, suggest
that the construction of a story or narrative concerning an emotional upheaval may
be essential to coping. Particularly exciting is that this pattern of effects has now
been reported by three independent labs. Keough et al. (KA Keough, J Garcia,
CM Steele, unpublished) found that cognitive change over a 2-week diary-writing
period was linked to health improvements. In a lab study with medical students,
Petrie et al. (1999) discovered that the more individuals’ cognitive word counts
increased over the 3 days of writing, the greater their lymphocyte counts after each
day after writing. Klein & Boals (2001) have reported that an increase in cognitive
word use over the days of writing is linked to measures of greater working memory
up to 12 weeks after the study.
Use of Word Analyses in Psychotherapy
A small group of psychoanalytically oriented researchers have been interested
in the ways clients use language in therapy sessions. Bucci (e.g., 1995) and
Mergenthaler (1996) have separately and together (Mergenthaler & Bucci 1999)
the authors identified three categories of words that are easily captured in computer
analyses: emotional tone, abstraction, and referential activity. Using this coding
system, the authors argue that successful therapy requires clients to move from
tion. Indeed, analyses of selected psychoanalytic therapy sessions (Mergenthaler
1996) as well as written disclosure essays (Bucci 1995) support these predictions.
These patterns of effects are remarkably consistent with the LIWC analyses of
Pennebaker et al. (1997).
References to Self and Others: Pronouns and Perspectives
As mentioned above, an alternative computer-based approach to linguistic analysis
such as latent semantic analysis (LSA) relies on more inductive ways of establish-
ing the pattern of word use (e.g., Landauer & Dumais 1997). This technique has
been used to determine the degree to which two texts are similar in terms of their
content. In theory, one might predict that the more similar the content of trauma
essays over the 3–4 days of writing, the more the person’s health would improve.
If one made such a prediction, however, one would be wrong. LSA analyses of
three writing studies failed to uncover any relationship between linguistic content
and health.
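The idea of scoring how similar two essays are in content can be illustrated with a much simpler stand-in for LSA. The sketch below is not LSA itself (it performs no dimensionality reduction) and is not the analysis used in the writing studies; it only computes the cosine similarity of content-word counts between essays from adjacent days. The particle list and all function names are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny particle list; real stop lists are far longer.
PARTICLES = {"i", "me", "my", "we", "he", "she", "it", "they", "the", "a", "an",
             "and", "but", "of", "to", "in", "on", "was", "is", "had"}

def content_vector(text):
    """Count content words only, dropping particles from the token stream."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in PARTICLES)

def cosine_similarity(vec_a, vec_b):
    """Cosine between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * vec_b[word] for word, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_adjacent_similarity(essays):
    """Average content similarity between each essay and the next day's essay."""
    if len(essays) < 2:
        raise ValueError("need at least two essays")
    vectors = [content_vector(e) for e in essays]
    pairs = list(zip(vectors, vectors[1:]))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

A per-person score of this kind could then be correlated with a health outcome, which is the type of content-based relationship that the LSA analyses failed to find.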
An alternative way to think about writing is to focus on writing style as opposed
to writing content. Style is, to a large extent, determined by the most commonly
used words, referred to as particles: pronouns, articles, conjunctions, prepositions,
and auxiliary verbs. Interestingly, most LSA techniques routinely omit particles
because they do not carry the same information as more content-heavy nouns
and verbs. Across a series of style-based LSA analyses, we have found that
particles in general and pronouns in particular correlate strongly with health
improvements. Basically, the more individuals shift in their use
of pronouns from day to day in writing, the more their health improves. Across
three separate studies, pronoun shifts among trauma writers correlated between 0.3
and 0.5 with changes in physician visits (Campbell & Pennebaker 2002). Closer
inspection of these data suggests that healthy writing is associated with a relatively
high number of self-references on some days but not others. Alternatively, people
who always write in a particular voice—such as first person singular—simply do
not improve.
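One simple way to picture a day-to-day pronoun shift is as the change in the rate of first person singular words between consecutive essays. The sketch below is only an illustration under that assumption; it is not the style-based LSA procedure of Campbell & Pennebaker (2002), and the word list and function names are hypothetical.

```python
import re

# Hypothetical category list; contractions such as "i'm" are not matched here.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def category_rate(text, category):
    """Proportion of tokens in the text that belong to the word category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in category for t in tokens) / len(tokens)

def day_to_day_shifts(essays, category):
    """Absolute change in category rate between each essay and the next."""
    rates = [category_rate(e, category) for e in essays]
    return [abs(later - earlier) for earlier, later in zip(rates, rates[1:])]

essays = ["I lost my job", "They said nothing"]
shifts = day_to_day_shifts(essays, FIRST_PERSON_SINGULAR)
```

Under this toy measure, a writer who uses the same voice every day produces shifts near zero, whereas a writer who moves between self-references and references to others produces larger values.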
The ability to change perspective in dealing with an emotional upheaval may be critically
important. The data also indicate that pronouns may be an overlooked linguistic
dimension that could have important meaning for researchers in health and social
psychology. After all, pronouns are markers of self- versus group identity (e.g., I
versus we) as well as of the degree to which people focus on or relate to others.
Pronouns may provide insight into people’s level of social integration as well as
This review is intended to whet researchers’ appetite for the power of words in
natural language. From a methodological perspective, the analysis of word use is
simple, reliable, fast, and relatively inexpensive. In addition, samples of words are
readily available from open-ended questionnaire items, the Internet, emails, banks
of text corpora, and transcripts of spoken text. Despite the practicality of measuring
word use, many of the biggest questions surround their meaning and interpretation.
In this final section we point to some intriguing and vexing questions raised by the
word use approach.
Which Words Should We be Studying?
tent. Markers of linguistic style are generally associated with relatively common
words such as pronouns and articles. Many of the more content-heavy words—
nouns, regular verbs, and modifiers—have not yielded many consistent social or
psychological effects. This may reflect the fact that linguistic content is heavily
dependent on the situation or topic the person is instructed to think or talk about.
Three general topics that are ripe for investigation are the analysis of particles,
emotions, and traditional content dimensions.
PARTICLES OR FUNCTION WORDS Particles (which include pronouns, articles,
prepositions, conjunctives, and auxiliary verbs) are remarkable for several reasons.
In the English language there are fewer than 200 commonly used particles,
yet they account for over half of the words we use. Of particular relevance, research
on brain damage to the language areas suggests that particles are processed in dif-
ferent regions and in different ways than content words. For example, damage to
Broca’s area (a region generally associated with the left frontal lobe) often causes
patients to speak hesitantly using nouns and regular verbs but not particles. Dam-
age to Wernicke’s area (left temporal lobe) has been reported to cause individuals
to speak in a “word salad” wherein they use a high number of particles but with
very little content (Miller 1995).
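The claim that a small set of particles makes up a large share of running text is easy to check on any sample. The following is a minimal sketch, assuming a hypothetical and very incomplete particle list; a real analysis would use a full function-word dictionary.

```python
import re

# Hypothetical, incomplete particle list covering the categories named above.
PARTICLES = {
    "i", "me", "my", "we", "you", "he", "she", "it", "they",   # pronouns
    "a", "an", "the",                                          # articles
    "to", "of", "in", "on", "over", "with", "after",           # prepositions
    "and", "but", "or",                                        # conjunctions
    "is", "was", "were", "be", "have", "had", "do", "did",     # auxiliary verbs
}

def particle_share(text):
    """Fraction of tokens in the text that appear in the particle list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in PARTICLES for t in tokens) / len(tokens)

share = particle_share("The dog ran to the park and it was happy")
```

In this ten-word sentence, six tokens (the, to, the, and, it, was) are on the list, so the share is 0.6 even though only five distinct particles are involved.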
Particles serve as the glue that holds content words together. But particles are
more than mere glue. They are referential words that have tremendous social and
psychological meaning. To use a pronoun requires the speaker and listener to share
a common knowledge of who the referent is. Consider the following: “John went
to the store to buy some bread. After getting it, he drove home.” The pronouns
“it” and “he” are place holders and represent the shared and temporary knowledge
that it = bread and he = John. Pronoun use requires a relatively sophisticated
awareness of the audience’s ability to track who is who. Prepositions are also
referential. To know the meaning of over, on, to, etc. demands that the speaker and
listener have a rudimentary understanding of the relative, real, or symbolic location
of the speaker. Similar arguments can be made about articles (the use of “a” versus
“the”) and conjunctions (but, which). More informal settings presuppose a shared
frame of reference (cf., Brown 1968). Particles, then, can be construed as having
tremendous social implications. From a Grice (1975) perspective, the discerning
particle user must have some degree of social and cognitive skill.
All particles, of course, are not equally interesting from a social or personality
psychology perspective. Of those that have emerged in the word use literature,
pronouns are among the most revealing. Use of first person singular, for example,
is associated with age, sex, depression, illness, and more broadly, self-focus. First
person plural can variously be a marker of group identity and, on occasion, a
sign of emotional distancing (Pennebaker & Lay 2002). Second and third person
pronouns are, by definition, markers to suggest that the speaker is socially engaged
or aware.
Future research must begin exploring the nature of pronouns and other particles
in much greater detail. For example, psychological researchers have naively as-
sumed that all first person singular pronouns are comparable. Even William James
(1890) argued that there were profound differences between the “active” I and the
“passive” me. In fact, factor analyses of individual pronouns often find that all
first person singular pronouns do not always load on the same factor (Campbell
& Pennebaker 2002). Some very basic psychometric work is needed on pronouns
and other particles at the word level to disentangle their mathematical and psycho-
logical meaning.
EMOTION WORDS Virtually every psychologically based text analysis approach
has started from the assumption that we can detect people’s emotional states by
studying the emotion words they use. The reality is that in daily speech, emotional
writing, and even affect-laden poetry, less than 5% of the words people use can
be classified as emotional. In reviewing the various word use studies, it is striking
how weakly emotion words predict people’s emotional state.
From an evolutionary perspective, language did not emerge as a vehicle to
express emotion. In natural speech we generally use intonation, facial expres-
sion, or other nonverbal cues to convey how we feel. Emotional tone is also ex-
pressed through metaphor and other means not directly related to emotion words.
Taken together, it is our sense that emotion researchers should hesitate before
embarking on studies that rely exclusively on the natural production of emotion words.
CONTENT WORDS AND THEMES Although not emphasized in this article, word
count strategies are generally based on experimenter-defined word categories.
These categories are based on people’s beliefs about what words represent. Hence,
they are ultimately subjective and culture bound. Content-based dictionaries that
are aimed at revealing what people are saying have not yielded particularly impressive
results owing in large part to the almost infinite number of topics people may
be dealing with. With the rapidly developing field of artificial intelligence, the most
promising content- or theme-based approaches to text analysis involve word pattern
analyses such as LSA. These purely inductive strategies provide a powerful way
to decode more technical or obscure linguistic topics. For researchers interested
in learning what people say—as opposed to how they say it—we recommend this
new analytic approach.
The adoption of a word use approach to the analysis of naturally occurring written
or spoken language is fraught with problems. Virtually all text analysis programs
that rely on word counts are unable to consider context, irony, sarcasm, or even
the problem of multiple meanings of words. Many of the traditional problems
studied in communication, such as ingroup-outgroup status, formality of settings,
and requests, are not easily detected with word counts (cf., Krauss & Fussell 1996).
In a discussion of the potential shortcomings of a computer program such as the
General Inquirer, Zeldow & McAdams (1993) have questioned whether lower-
level word counts can have true psychological meaning. Although this review
points to the covariation between word counts and meaning, no one has yet devised
a compelling psychological theory of word usage.
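The context problem can be made concrete with a toy counter. The sketch below assumes a hypothetical three-word positive emotion category and is not taken from any of the programs reviewed here; it simply shows that a plain word count assigns the same score to a statement and to its negation.

```python
import re

# Hypothetical positive emotion category, for illustration only.
POSITIVE_EMOTION = {"happy", "good", "love"}

def category_count(text, category):
    """Number of tokens in the text that fall in the word category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in category for t in tokens)

sincere = "I am happy with the result"
negated = "I am not happy with the result"

# Both sentences contain exactly one positive emotion word,
# so the count cannot distinguish them.
same_score = (category_count(sincere, POSITIVE_EMOTION)
              == category_count(negated, POSITIVE_EMOTION))
```

Irony, sarcasm, and words with multiple meanings raise the same difficulty: the token is counted regardless of what the speaker meant by it.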
The words a person uses clearly have an impact on the listener or reader. Just
as the words people choose when talking or writing may betray their thoughts
and feelings, those words may be processed at a low or nonconscious level by the
listener or reader. Indeed, the speed by which we read or hear words like “the” or
“my” in a sentence competes with traditional primes used in experimental or social
psychology. The presumed power of the media or of great speakers or writers may
et al. 1995).
Far more topics surrounding word use have been overlooked than covered in this
review. We have not discussed differences between English and other languages,
the differences between written and spoken language, or the difficulties of second
language learning (where most of us make errors in particle use rather than content
words). We have not mentioned issues such as intelligence, stereotype commu-
nication, language proficiency, or the early development of word knowledge and
Despite these shortcomings, the spotty history of word count approaches points
to their potential value in psychological research. Most of us are adrift in a sea
of words—from the time we awake listening to the radio, to reading the morning
paper, to talking with family, colleagues, and friends. And we are spitting out
words at almost the same rate at which we are taking them in. Words are a central
feature of social, clinical, personality, and cognitive psychology. It is time that we
started taking them a bit more seriously and using them as tools in understanding
who we are and what we do.
