Natural Language Use as a Marker of Personality
Molly E. Ireland
University of Pennsylvania
Matthias R. Mehl
University of Arizona
Address correspondence to Molly E. Ireland, Annenberg
School for Communication, University of Pennsylvania, 3620 Walnut Street, Philadelphia, PA
19104, or Matthias Mehl, Department of Psychology, University of
Arizona, 1503 E University Blvd., P.O. Box 210068, Tucson, AZ 85721.
Natural language has historically been integral to the study of personality. Yet, research on how
personality is revealed through language use has only recently gained momentum. This chapter
reviews research on how different aspects of personality are manifested in the way people use
words. The chapter provides the conceptual foundation for research on linguistic markers of
personality, discusses the psychometric properties of natural word use, and summarizes findings
on how the Big Five personality dimensions (extraversion, agreeableness, neuroticism,
conscientiousness, and openness), trait emotionality (trait negativity, trait anger, and trait
positivity), and psychopathology-related personality traits (Type A, depression, narcissism,
Machiavellianism) are linked to patterns of word use. With progress in stationary and mobile
computing technology and parallel advances in computational linguistics, the field is bound to
experience strong growth over the coming years.
Keywords: Computational Linguistics, Big Five, Linguistic Inquiry and Word Count, LIWC,
Word Use, Text Analysis, Linguistic Markers
Natural Language Use as a Marker of Personality
Natural language has been integral to the study of personality since the field’s inception.
Lexical approaches to personality factor-analyze the words people use to describe others in order
to zero in on a small number of mostly independent personality traits. Early on, this approach
generated the Big Five model that remains a major force in personality research today (Costa &
McCrae, 1992; Goldberg, 1981). More holistic narrative approaches code content themes in the
stories that people tell about their lives in order to understand how individuals construe their own
personality and represent it to others (McAdams & Pals, 2006).
Despite personality researchers’ early recognition that language and stories might
accurately describe personality, the idea that natural language use also reflects personality and
might provide knowledge about individual differences above and beyond self-reports did not
gain traction until the last decade. Before the computer revolution, Walter Weintraub (1981)
spent decades amassing data on natural language use – primarily by training coders to count
words and phrases by hand – with little recognition. Although some accepted his premise that
verbal behavior, like other actions, reflects psychological processes, the psychological
community was slow to adopt the burdens that came with carrying out linguistic research on
sufficiently large samples by hand (Mehl, 2006a).
The recent surge in the popularity of studying language in the social and behavioral
sciences stems in large part from technological advances in computational linguistics. The
internet and computer science more generally have made it possible to easily compile and
analyze natural language corpora with word counts climbing into the millions and billions. As of
2010, Google Books had digitized 12 percent of all books ever published and has made those
data available to interested researchers (Michel et al., 2010). Social networking and publishing
sites like LiveJournal and, more recently, Twitter allow researchers to download their users’
language use for free or for relatively low costs (e.g., Golder & Macy, 2011; Ramirez-Esparza,
Chung, Kacewicz, & Pennebaker, 2008; Yarkoni, 2010). Facebook similarly allows researchers
to access users’ status updates, although it tends to place stricter restrictions on usage than other
sites do (e.g., Kramer, 2010; Schwartz et al., 2013). The near-universal use of the internet has
helped psychology widen its sampling nets for a wide range of purposes. The rise of the internet
has been particularly fruitful for language research: Observing verbal communication or other
language behavior online is simple. Within minutes, online social media users often produce
hundreds of units of objectively and easily quantifiable (i.e., text-based) behavioral data.
Perhaps more importantly, sophisticated text analysis tools allow researchers to analyze
the increasingly large and diverse datasets that technological advances have made possible. One
of the first text analysis tools to enter into common usage is a computationally simple word count
program called the Linguistic Inquiry and Word Count (LIWC; Pennebaker, Francis & Booth,
2007). LIWC is a computerized text analysis program that outputs the percentage of words used
in a text or batch of texts that fall into one or more of over 80 grammatical (e.g., articles, first-
person singular pronouns), psychological (e.g., positive emotion, insight), and topical (e.g.,
social, sex) categories. It does so by comparing each word in a text against a set of internal word
lists or dictionaries. Although the psychometrically-developed content categories have a (minor)
subjective component, the grammatical and linguistic categories are, for the most part, based on
objective and factual information about the established lexical members of that category.
LIWC, along with its predecessors the General Inquirer (Stone, Dunphy, Smith, &
Ogilvie, 1966) and DICTION (Hart, 1984), focuses on word frequencies alone irrespective of
context. For example, the sentences, “I’ve never been less happy” and “I’m the happiest person
alive” would be coded as containing identical proportions of positive emotion words despite the
very different meaning of each sentence. Critics have pointed out that, in examples like these, the
programs’ context blindness can lead to noisy or difficult-to-interpret results.
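The dictionary-matching procedure and its context blindness can be illustrated with a minimal sketch. The word list below is a tiny hypothetical stand-in rather than the actual LIWC dictionary, and the tokenizer and function names are simplifying assumptions made for illustration; LIWC itself covers many more categories.

```python
import re

# Hypothetical mini-dictionary for illustration only; it is NOT the LIWC
# positive emotion dictionary, which is far larger.
POSITIVE_EMOTION = {"happy", "happiest", "nice", "lovely"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (contractions kept whole)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def category_percent(text, category_words):
    """Return the percentage of tokens in `text` that appear in `category_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(token in category_words for token in tokens)
    return 100.0 * hits / len(tokens)


unhappy = "I've never been less happy"
happy = "I'm the happiest person alive"

# Each sentence has five tokens, one of which is a positive emotion word.
print(category_percent(unhappy, POSITIVE_EMOTION))  # 20.0
print(category_percent(happy, POSITIVE_EMOTION))    # 20.0
```

Because the count ignores word order and negation, both sentences receive the same positive emotion score even though they express opposite meanings.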
The role of word count programs, however, is not to obviate self- and observer reports
but to complement them, and to clarify the gaps and inconsistencies they leave behind. A cursory
reading of the two statements about happiness above or a single-item measure of positive
emotion could tell you that the above speakers range from least to most happy, respectively.
However, much of word count programs’ incremental utility comes from the information that
questionnaires and content coding often miss: Programs like LIWC are able to tell us that the
speakers in the examples above, despite their differences, are focusing on happiness to similar
degrees. Studies have borne out the intuition that it’s often speakers’ focus rather than their
conveyed meaning that matters. For example, early expressive writing studies showed that
people who wrote about past traumas using positive emotion words tended to benefit more than
those who used exclusively negative emotion words (Pennebaker, 1997). This finding was
particularly striking because, due to the traumatic nature of the expressive writing topics, uses of
positive emotion words were frequently negated (e.g., “she’ll never forgive me,” “I’m not a good
Measurement and Psychometrics of Natural Language Use
Before serious and widespread work could begin on the links between natural language
use and personality, two questions needed to be addressed: First, how can we measure (i.e.,
analyze) natural language use? And, second, does natural language use fulfill the basic
psychometric requirements for a personality or individual difference variable (i.e., is it
moderately stable over time and across contexts)?
Before computerized text analysis, language analysis in both psychology and linguistics
was by necessity primarily qualitative. Linguists would often subject single utterances or
exchanges to intense scrutiny (see Tanenhaus & Trueswell, 1995). Others would draw inferences
about human behavior in general based on everyday observations of speech patterns of social
relationships (e.g., Lakoff, 1975). Psychologists who sought psychometrically tractable linguistic
data tended to use content coding methods, in which trained judges rated texts according to a set
of criteria that typically focused on writers’ or speakers’ use of content themes, such as anxiety,
hope, and health or sickness (Gottschalk & Gleser, 1969). However, even the most reliable and
statistically sound of these methods were slow, labor-intensive, and subject to human error.
A major benefit of word count approaches is that their reliability is never compromised
by subjective biases or experimenter error. Computer programs output the same results
regardless of the mental state of the experimenter using them, and they will always find exactly
how many times a certain word or group of words occurs in a given text regardless of how easily
overlooked those words might be (e.g., the notoriously invisible of). Indeed, the context-
blindness of word count programs is beneficial in this respect. Whereas a human coder might be
bogged down by shades of meaning or biased assumptions about a speaker or writer,
computerized text analysis programs focus single-mindedly on what language alone can tell us.
Once programs were available that could rapidly and reliably identify language patterns
in large samples of texts, it was possible to establish the basic psychometrics of natural language
use. One of the first investigations into the reliability of language use was conducted by Gleser
and colleagues (1959). In that study, participants told a personal story in monologue for about 5
minutes. Transcripts were divided into two equal halves and coded for several linguistic and
psychological language categories, such as adjectives and feelings. The average correlation
between the two halves of these stories across all categories was moderate-to-high,
approximately r = .50. In other words, as anyone who has either written or read a story could tell
you, there is some variation in stories over time. Authors use different words when laying out the
setting of a story than when describing the climax (Vine & Pennebaker, 2009). Yet despite these
obvious differences, the linguistic fingerprint of an author or speaker tends to remain visible over
the course of a narrative. Similarly, individuals’ spoken language use tends to remain consistent
during hour-long life history interviews, with consistency coefficients (Cronbach’s alpha)
ranging from .41 to .64 for several stylistic and content language categories, despite stark
differences in interviewer questions between the first and second halves (Fast & Funder, 2008).
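The split-half logic described above can be sketched as follows: each transcript is divided into two halves, a category rate is computed for each half, and the two sets of rates are correlated across participants. The transcripts, the word list, and the token-based split are invented for illustration and are not taken from any of the cited studies.

```python
from math import sqrt

# Hypothetical category word list (first-person singular pronouns).
I_WORDS = {"i", "me", "my"}


def category_rate(tokens, category_words):
    """Proportion of tokens that belong to the category."""
    return sum(t in category_words for t in tokens) / len(tokens)


def split_half_rates(text, category_words):
    """Split a transcript into two (near-)equal halves and score each half."""
    tokens = text.lower().split()
    mid = len(tokens) // 2
    return (category_rate(tokens[:mid], category_words),
            category_rate(tokens[mid:], category_words))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy transcripts, one per participant.
transcripts = [
    "i told my friend that i was late and then i called my mother again",
    "the storm hit the town and the river rose over the old stone bridge",
    "my sister met me there and we walked to the market near the station",
]

first, second = zip(*(split_half_rates(t, I_WORDS) for t in transcripts))
r = pearson_r(first, second)
print(round(r, 2))
```

With real data, the same correlation would be computed separately for each language category and then averaged, which is how a single summary figure across categories can be obtained.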
Several studies have now demonstrated that natural language use evidences substantial
consistency not only over the course of a narrative, but over time as well. Schnurr et al. (1986)
asked medical students to describe their experiences coming to medical school in two unscripted
monologues spaced one week apart. They found that language use, as analyzed with the General
Inquirer (Stone et al., 1966), was highly reliable over time. Across 83 content categories
measured in that study, including references to people, work, affect, and evaluations, the average
correlation between the two monologues was .78. Later, around the time that computerized text
analysis approaches were beginning to gain mainstream momentum, Pennebaker and King
(1999) further tested the limits of linguistic stability by comparing individuals’ language use
over longer stretches of time, lasting up to several years, and across diverse contexts, ranging
from scientific articles to students’ stream-of-consciousness essays. They once again found,
using word frequencies calculated by LIWC, that people maintain good linguistic consistency in
most language categories – often despite predictable situational fluctuations in language use.
The same consistency across time and place has been found for naturalistic, everyday
spoken language as well. In one of the first studies using the Electronically Activated Recorder
(EAR; Mehl, Pennebaker, Crow, Dabbs, & Price, 2001) methodology, Mehl and Pennebaker
(2003) recorded college students in their natural environments over the course of two 2-day
periods spaced 4 weeks apart. The EAR sampled 30 seconds of ambient sounds roughly every 12
minutes. All captured talking was transcribed and coded for participants’ location, activity, and
mode of conversation (i.e., telephone or face-to-face). With few exceptions, both linguistic and
psychological categories were substantially correlated across time, activity, and interaction
mode. Interestingly, the authors also found that function word categories, including grammatical
parts of speech such as pronouns and articles, were more consistent than were content-based
psychological categories (average function word r = .41; average psychological processes r =
Taken together, this past research establishes that people’s natural language use is
characterized by a good degree of temporal stability and cross-situational consistency. Therefore,
the ways in which people spontaneously use language – for example, idiosyncrasies in word
choices or speaking styles – satisfy psychometric requirements for personality or individual
difference variables.
Language Style versus Language Content
Early analyses of personality and language tended to base their conclusions on transcripts
that had been coded for content words and phrases (e.g., Smith, 1992). Indeed, in computational
linguistics, the function words that make up language style continue to be referred to as “stop
words” because they are usually ignored during automated language processing. However,
individual differences in language style are often more psychologically telling and
psychometrically parsimonious than are differences in language content. Language style is
defined by a person’s use of function words, including pronouns, articles, conjunctions, and
several other categories that make up the grammatical structure of utterances (Table 1). Function
words tend to be short and frequently used, and they have little meaning outside the context of a
conversation. In part because of these characteristics, they are processed fluently and largely
automatically during both language production and comprehension (Bell, Brenier, Gregory,
Girand, & Jurafsky, 2009; Levelt, 1989). Language content, on the other hand, is defined by a
person’s use of nouns, verbs, adjectives, and most adverbs. In short, content words determine
what a person says, and function words determine how they say it.
Table 1. Function word categories and examples.
LIWC Label                        Examples
First-person singular pronouns    I, me, my
Third-person singular pronouns    she, his, her
Second-person pronouns            you, y’all, yours
First-person plural pronouns      we, our, us
Third-person plural pronouns      they, their, them
Impersonal pronouns               it, those, there
Articles                          a, an, the
Conjunctions                      and, but, because
Prepositions                      in, under, about
Auxiliary verbs                   shall, be, was
High frequency adverbs            quite, highly, very
Negations                         no, not, never
Quantifiers                       much, few, lots
Note. Only basic-level and not superordinate function word categories
were included. Categories are from LIWC2007 (Pennebaker et al., 2007).
To a greater degree than function words, content words are practically constrained by the
topic or context of conversation. For example, group members assigned to solve math problems
together will uniformly use content words related to that task (e.g., calculate, solution) regardless
of whether they are each individually thinking about the problems in different ways. Function
words, on the other hand, are more loosely constrained by the topic of a conversation, allowing
people to discuss the same content in different styles. The versatility of function words allows
researchers to measure differences in language style across contexts rapidly and objectively.
Despite some degree of natural verbal and nonverbal convergence between individuals during
social interaction (Chartrand & van Baaren, 2009; Ireland & Pennebaker, 2010; Pickering &
Garrod, 2004), function words used during conversation reliably reflect differences in social
status, honesty, and leadership styles (Hancock, Curry, Goorha, & Woodworth, 2008;
Pennebaker, 2011; Slatcher, Chung, Pennebaker, & Stone, 2007; see Tausczik & Pennebaker,
2010 for a review).
A second reason that researchers might focus on language style instead of or in addition
to content is that function words tend to more directly reflect social cognition during
conversation. The relationship between function word use and social cognition is primarily a
practical matter: Because function words have little meaning outside of the context of a sentence,
they require common ground or shared social knowledge to be interpreted (Pennebaker et al.,
2003). For example, for the sentence He shut the dog in there to be understood, the speaker must
know that the listener shares his or her knowledge of the man, the dog, and the location in question. This
mutual understanding of a situation, its potential referents, and each conversation partner’s
knowledge of the situation is known as common ground and theoretically forms the foundation
of any successful conversation (Clark & Brennan, 1991). Given that interest in and attention to
others’ thoughts and feelings is an integral aspect of personality (particularly Big Five
extraversion and agreeableness), the ability to automatically extract individuals’ social cognitive
styles from their language use could be a valuable addition to personality researchers’ toolboxes.
A final reason for paying attention to style in addition to content is purely psychometric.
Individual differences in language observed in early language research focused primarily on
phrases that include both language content and style. To take gender as an example, female
speakers tend to use more uncertainty phrases, such as I wonder if, and extra-polite phrases, such
as would you mind, than male speakers do (Holmes, 1995; Lakoff, 1975; Poole, 1979; Rubin &
Green, 1992). However, many of these phrases can be measured more simply by counting the
function words that they commonly include – specifically, in the previous examples, first-person
singular pronouns (e.g., I) and auxiliary verbs (e.g., would). Indeed, an analysis of formal writing
in the British National Corpus found that function words offered the most efficient way to
classify texts as authored by men or women (Koppel, Argamon, & Shimoni, 2003). Similarly, a
corpus analysis of spoken and written language collected in 70 studies revealed that function
words more reliably discriminated between male and female participants than did content words
(Newman, Groom, Handelman, & Pennebaker, 2008). In personality research specifically, direct
comparisons are relatively sparse. However, style appears to provide the best classification
accuracy for neuroticism, providing gains over content alone and even over content and style
combined (Argamon, Koppel, Pennebaker, & Schler, 2009). Whether similar effects will be
found for other personality traits and individual differences has yet to be conclusively determined.
In the end, whether language content or style is a more reliable indicator of personality
and individual differences may depend to a large degree on what personality measures are used
to establish criterion validity. Although individual differences such as age and sex appear to be
more strongly linked to language style, personality traits as measured by Big Five scales are
often more consistently and strongly linked with language content – including both language
categories and individual words – than language style. This may be because research exploring
the link between language use and the Big Five has overwhelmingly used personality self-reports
as the gold standard of true personality, whereas demographic variables can be more objectively measured.
The pattern of personality self-reports matching language content and demographic
individual differences matching language style may be due to a match between the automaticity
of the behavior and the measurement (Eastwick, Eagly, Finkel, & Johnson, 2011). For example,
a neurotic person who realizes that neuroticism is socially undesirable may – deliberately or not –
project a cheerful exterior by using positive emotion words and by downplaying his anxiety in
self-report questionnaires. However, less accessible components of language use, such as
increased use of self-references like I and me, may correlate with less accessible behavioral
indicators of worry such as compulsively checking the status of a loved one’s flight or spending
extensive time on WebMD. Given the abundance of online language use – including e-mails,
blogs, and online chats, which are often archived by default (see Baddeley, 2011) – and the fact
that many nonverbal behaviors can be accessed simply by downloading browser histories, future
personality research may be able to incorporate naturalistic measures of individuals’ online
behavior to triangulate when and where language style and content and behavioral and self-
reported personality converge.
The Big Five Personality Domain
The literature on language and the Big Five is the largest of the subareas within the study
of personality and natural language. To accommodate its size, the sections below first summarize
the samples commonly used in this research and next address major findings for each Big Five
dimension individually.
Language samples. The kind of naturalistic language that has perhaps most frequently
been subjected to computerized text analysis and linked with the Big Five is online or
computerized language use. In the roughly 20 years since the internet was made accessible to the
general public, language has become the most accessible naturalistic behavior available to
behavioral scientists. As the sections below also illustrate, everyday verbal behavior is
carried out online and often automatically saved in blogs, social networking sites, e-mail
accounts, online chats, and text messages. More formal texts abound as well, including a huge
range of academic submissions, ranging from admissions essays to published scholarly work, not
to mention roughly 12 percent of the novels, poetry collections, and nonfiction books
published in recorded history (see Michel et al., 2010).
Considering that this goldmine of information is often free and accessible to anyone with
the necessary web programming or copying-and-pasting skills, it is surprising that only a few
studies linking the Big Five and the kind of quasi-naturalistic language use that occurs in these
formats have been conducted. The studies that have been conducted show great promise,
however, for both understanding naturalistic manifestations of personality and for the
longstanding goal of automatically building personality profiles based on behavioral data (Dodds
& Danforth, 2009; Mairesse & Walker, 2011; Mairesse, Walker, Mehl, & Moore, 2007).
A few studies have gone to the effort of collecting spoken language as it occurs in real
life. These studies were made possible by the advent of the EAR, or Electronically Activated
Recorder, about a decade ago (Mehl et al., 2001). The EAR is a programmable audio recorder
that periodically records snippets of ambient sounds (e.g., 30 seconds every 12 minutes). When
the EAR records, it captures any surrounding noise – including language used by subjects in their
daily interactions with their social networks. Later, trained transcribers listen to the recordings,
type the language they hear, and typically also code basic features of subjects’ momentary
social environments (e.g., location, activity).
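A short calculation shows how sparse this sampling scheme is. The 30-second and 12-minute values come from the text; the 16-hour waking day is an assumption made only for illustration.

```python
# Sampling scheme quoted in the text: 30 seconds of audio every 12 minutes.
RECORD_SECONDS = 30
CYCLE_MINUTES = 12

# Assumption for illustration: the device is worn for a 16-hour waking day.
WAKING_HOURS = 16

snippets_per_hour = 60 / CYCLE_MINUTES                    # recordings per hour
fraction_sampled = RECORD_SECONDS / (CYCLE_MINUTES * 60)  # share of time recorded
snippets_per_day = snippets_per_hour * WAKING_HOURS
audio_minutes_per_day = snippets_per_day * RECORD_SECONDS / 60

print(snippets_per_hour)           # 5.0
print(round(fraction_sampled, 4))  # 0.0417
print(snippets_per_day)            # 80.0
print(audio_minutes_per_day)       # 40.0
```

Under these assumptions the recorder captures about 4 percent of a participant's day, or roughly 40 minutes of audio spread over 80 snippets.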
Within studies that have looked at laboratory writing or dialog tasks, language use largely
falls into two categories: tasks with face-valid relevance to personality, such as asking people to
talk about events that were important in shaping their identity, and those that attempt a more
circumspect route, such as asking students to describe an object (e.g., a water bottle; Pennebaker,
2011). Not surprisingly, considering that the criterion for personality is nearly always responses
to face-valid self-report scales, language used in the former tasks tends to correlate more strongly
with personality dimensions. For example, although Pennebaker and King (1999) found only a small
number of modest significant correlations (rs < .20) between self-reported Big Five traits and
language used in stream-of-consciousness writing and essays about coming to college, Hirsh and
Peterson (2008) and Fast and Funder (2008) found a large number of moderate correlations (rs =
.20-.40) between self-reported personality and language used in separate tasks that asked
participants to describe their life stories.
Extraversion. Among the Big Five, extraversion tends to leave some of the strongest and
most predictable traces in individuals’ online and physical environments (Gosling, 2008) and, to
a slightly lesser degree, language (Mairesse & Walker, 2006). People who are rated by others
and themselves as higher in extraversion use less inhibited (e.g., careful, avoid), tentative (e.g.,
doubt, maybe), and self-focused language; use more positive emotion words (e.g., adorable, nice);
are more talkative; and talk more about social topics, such as friends, people, communication,
and leisure activities (Augustine, Mehl & Larsen, 2011; Dewaele & Furnham, 1999; Fast &
Funder, 2008; Mehl et al., 2006; Mairesse & Walker, 2006; Oberlander & Gill, 2006; Qiu, Lin,
Ramsay, & Yang, 2012; Walker et al., 2007; Yarkoni, 2010). Word- and phrase-level analyses of
large corpora made up of Facebook status updates (Kosinski & Stillwell, 2012) and blog entries
(Yarkoni, 2010) have found that party, bar, and can’t wait are among the best indicators of high
extraversion, and internet, computer, and cats are good indicators of low extraversion.
In terms of language style, extraverts use more immediate first-person plural and second-
person pronouns such as we and you (i.e., pronouns used primarily to talk with rather than about
a person; Dewaele & Furnham, 2000; Holtgraves, 2011; Yarkoni, 2010). Although extraverts do
not appear to use first-person singular pronouns at different rates than their introverted
counterparts overall (Mehl et al., 2006; Pennebaker & King, 1999; Yarkoni, 2010), some
evidence suggests that the link between extraversion and I is moderated by the words that first-
person singular co-occurs with. A study that counted pairs of co-occurring words rather than
individual words found that extraverts used some I-phrases more frequently and others less
frequently than introverts did, leading to null first-person singular correlations overall. For
example, Gill and Oberlander (2002) found that people asked to write e-mails to a close friend
used a greater variety of I-phrases and more bigrams containing negations (I don’t, I’m not) to
the degree that they reported being relatively introverted. In the same study, those higher in
extraversion limited themselves to a small number of less negative and perhaps implicitly more
social first-person phrases such as I’ll be and I was.
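Counting I-phrases rather than individual words can be sketched as a simple bigram tally. This is a minimal illustration under stated assumptions, not the procedure used by Gill and Oberlander (2002): the tokenizer, the rule for identifying first-person forms, and the example e-mail are all invented, and sentence boundaries are ignored.

```python
import re
from collections import Counter


def i_bigrams(text):
    """Count two-word sequences whose first token is a first-person singular form."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(
        f"{first} {second}"
        for first, second in pairs
        # Matches "i" itself and contractions such as "i'm" or "i'll",
        # but not other words that merely begin with the letter i.
        if first == "i" or first.startswith(("i'", "i’"))
    )


# Invented example text.
email = "I don't know if I'll be there. I'm not sure, but I was hoping I don't miss it."
counts = i_bigrams(email)

for phrase, n in counts.most_common():
    print(phrase, n)
```

Tallying phrases in this way separates negated forms such as I don't and I'm not from forms such as I'll be and I was, which a single first-person singular count would merge.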
Agreeableness. A clear and intuitive indicator of agreeableness is linguistic positivity.
Agreeable people use more positive language, including both verbs (e.g., laughing) and
modifiers (e.g., lovely), and fewer negative emotion words (e.g., damn, jerk) in everyday speech
and writing (Augustine et al., 2011; Holtgraves, 2011; Küfner, Back, Nestler, & Egloff, 2010;
Yarkoni, 2010). They have also been found to use more social words (Küfner et al., 2010) and
self-references (Mehl et al., 2006), which, in the context of generally cheerful language, may
suggest polite hedge phrases such as I think rather than the ruminative self-references that
characterize negative affective traits (see Depression, below). First-person singular also signifies
lower social status, relative to other conversation partners (Kacewicz, Pennebaker, Davis, Jeon, &
Graesser, in press), further suggesting that I-words used by agreeable individuals reflect polite
self-effacement rather than neurotic self-consciousness (see Holtgraves, 2010, for a review of
polite language use). Consistent with this interpretation, one of the best classifiers of high
agreeableness in Facebook status updates is the phrase thank you (Schwartz et al., 2013).
A facet-level analysis of agreeableness demonstrated that it may be one of the most
cohesive five-factor dimensions, linguistically and otherwise: Across most or all of its five
facets, people who rank higher in agreeableness talk more about home, family, and communication
and avoid sensitive topics such as death (e.g., coffin, killer; Yarkoni, 2010).
Another face-valid linguistic correlate of agreeableness is swearing. Unsurprisingly,
people swear less to the degree that they report being more agreeable (Holtgraves, 2010; Mehl et
al., 2006; Yarkoni, 2010; see Jay, 2009). Indeed, the five words that best discriminate between
individuals ranking high and low in agreeableness in Facebook status updates are all swear
words (Schwartz et al., 2013). The negative correlation between agreeableness and swearing fits
with lay theories of personality as well and is correctly interpreted by outside observers of
students’ EAR recordings (Mehl et al., 2006). Given the low overall incidence of swearing –
making up about 1/3 of a percent of spoken conversation and 1/10 of a percent of emotional
writing – these results essentially mean that highly agreeable people are unlikely to swear even
once in a given sample, whereas a highly disagreeable person might swear only a few times.
Nevertheless, swearing, like negative emotion words, another low-frequency category, is a
potent and reliable indicator of agreeableness and other key psychological variables (e.g.,
Robbins, Focella, Kasle, Weihs, Lopez, & Mehl, 2011).
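The base rates quoted above translate into very small expected counts per sample, as the following sketch shows. The sample sizes are hypothetical, and the Poisson model used for the chance of observing no swear words is a simplifying assumption, not an analysis reported in the cited studies.

```python
from math import exp

# Base rates quoted in the text, converted from percentages to proportions.
SPOKEN_RATE = (1 / 3) / 100    # about 1/3 of a percent of spoken words
WRITTEN_RATE = (1 / 10) / 100  # about 1/10 of a percent of written words


def expected_swear_words(n_words, rate):
    """Expected number of swear words in a sample of n_words at the given rate."""
    return n_words * rate


def prob_no_swearing(n_words, rate):
    """Chance of zero swear words, ASSUMING counts follow a Poisson distribution."""
    return exp(-expected_swear_words(n_words, rate))


# Hypothetical sample sizes chosen for illustration.
spoken_sample = 1500
written_sample = 500

print(expected_swear_words(spoken_sample, SPOKEN_RATE))
print(expected_swear_words(written_sample, WRITTEN_RATE))
print(prob_no_swearing(written_sample, WRITTEN_RATE))
```

At these average rates, a 1,500-word spoken sample would be expected to contain about five swear words and a 500-word essay about half of one, so under the Poisson assumption roughly 6 in 10 such essays would contain no swearing at all.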
Neuroticism. Low emotional stability, or neuroticism, is characterized by negative
thoughts, anxiety, and ruminative self-focus (Teasdale & Green, 2004). The language use of
individuals who rate themselves as high in neuroticism reflects these characteristics in higher
rates of I-words, negations, and references to sadness, anger, and anxiety in written life stories,
stream-of-consciousness essays (in both Korean and English), blogs, and text messages
(Argamon et al., 2009; Holtgraves, 2011; Hirsh & Peterson, 2008; Lee et al., 2007; Mairesse et
al., 2007; Pennebaker & King, 1999; Qiu et al., 2012; Yarkoni, 2010). In Facebook status
updates, the words basketball and success were the best classifiers of emotionally stable
individuals, whereas the single best classifier of high neuroticism was the swear word fucking,
closely followed by variants of depression and the phrases I hate and sick of (Schwartz et al., 2013).
In contrast with the majority of research cited above, everyday spoken language shows
very few signs of neuroticism (Mehl et al., 2006). The reason behind the discrepancy between
these sets of findings may lie in the extent to which individuals feel pressure to behave in
socially desirable or appropriate ways (Mehl & Holleran, 2008). Like depression, neuroticism is
known to be a socially undesirable trait, and the expression of negative emotion is typically
considered a private event. As such, individuals who are aware that they tend to be dysphoric and
anxious may mask their negative emotions from many of the individuals they interact with
directly in their daily lives. Thus, as with depression, disclosure of negative emotions may turn out
to be moderated by closeness with one’s interaction partners (Baddeley, 2011; see Depression).
In anonymous experimental writing, during which showing obvious signs of neuroticism could
not possibly matter, or text messages and Facebook statuses, which are likely to be read by
friends rather than casual acquaintances or passersby, neurotic individuals may feel free to
express their chronic negativity (Holtgraves, 2011; Schwartz et al., 2013). Consistent with this
interpretation, Mehl and Holleran (2008) found that private but not public negative emotion word
usage reflects increased neuroticism. As a result of this moderation, when language is aggregated
across conversation contexts, ranging from public conversations with professors to private
conversations with relationship partners, the typical markers of neuroticism may be critically
attenuated, resulting in surprisingly modest effect sizes for intuitively valid markers of
neuroticism, such as anxiety words (Pennebaker & King, 1999).
Openness. The trait of openness is an interesting case that strongly recommends the
inclusion of facet level traits in research on language and personality. First, openness is relatively
difficult to capture linguistically. Analyses of naturalistic spoken and text message conversations
have produced a number of null findings and a handful of significant correlations that fail to
paint a cohesive or theory-consistent picture (Holtgraves, 2011; Mehl et al., 2006).
When meaningful patterns do emerge, the linguistic indicators of openness seem to
reflect only its intellectual aspects, ignoring facets related to artistic expression and emotionality.
Bloggers whose self-reports indicated a higher degree of openness used more articles and
prepositions and fewer personal pronouns and references to family and home (Yarkoni, 2010).
On Facebook, highly open individuals are distinguished by more frequent uses of the words
universe, writing, and music (Schwartz et al., 2013). In life history interviews, people who
ranked higher in openness in general and intellectualism specifically used more articles as well
(Fast & Funder, 2008), and in a Korean stream-of-consciousness writing sample, people who
were higher in openness produced more sentences and talked about sleeping and resting less
(Lee, Kim, Seo, & Chung, 2008). In other words, at least when talking about themselves or their
interests at length, people who are high in openness seem to adopt a formal rather than
narrative writing style. This distinction is captured by Biber’s (1995) involved-informative
dimension of language use, which describes a high rate of verbs and pronouns at the involved
end of the spectrum and nouns (implied by the use of articles) and prepositions at the informative
end. Informative language tends to be more characteristic of male language use, scripted speech,
formal writing, and, in the last US presidential elections, liberal political candidates (Koppel et
al., 2003; Pennebaker, 2011).
Intellectualism and liberalism may be salient and central characteristics of openness, but
they are hardly necessary or sufficient for a person to rank relatively highly on a broad five-
factor measure of openness. The other facets of openness concern interest in art, emotionality,
adventurousness, and imagination. In the only study that examined the relationship between
language and the finer-grained facets of the Big Five, Yarkoni (2010) found that several of the
language categories that predicted emotionality and artistic interest were at odds with those that
predicted the remaining facets. People who are emotionally and artistically open use more
personal pronouns, references to physical states, positive emotions, and words related to leisure
or rest; liberal, intellectual, imaginative, and adventurous bloggers used fewer of each category.
In studies that reported language correlates of only the primary Big Five dimensions, this
moderation by facet is likely to have led to null results for categories that show full crossover
effects between facets and weakened results for categories that only occur for one or a few
facets. For example, in the Yarkoni (2010) sample, the modest negative correlation between
references to home and total openness scores was diluted because references to home
are unrelated to the emotionality and artistic interest facets.
Conscientiousness. Like agreeableness, conscientiousness appears to manifest in
everyday language use as polite speech. In blogs as well as students’ EAR recordings and
stream-of-consciousness essays, conscientious individuals swear less and use fewer negative
emotion words regardless of their sex (Lee et al., 2007; Mairesse & Walker, 2006; Mehl et al.,
2006; Pennebaker & King, 1999; Yarkoni, 2010). At the level of individual words as well,
conscientiousness is best defined by the words that people high on the trait do not use. For the
facets of achievement striving and self-discipline in particular, almost every one of the 10
strongest single-word predictors was negatively correlated (e.g., protest, boring), excepting only
a few positively correlated words, including ready and HR [human resources] (Yarkoni, 2010).
Schwartz and colleagues’ (2013) word- and phrase-level Facebook analysis corroborates these
findings: Although highly conscientious people use the phrases ready for, to work, and great day
more than less conscientious people do, the strongest correlates of conscientiousness were
negative, including several swear words as well as the words YouTube and bored. The high
number of negative correlations suggests that, perhaps especially for those high in achievement
striving and self-discipline facets of conscientiousness, increased conscientiousness is associated
with regulation or inhibition of those words that serve as indicators of individuality for other
traits. That behavior pattern would be consistent with findings that people who rank higher in
conscientiousness tend to value conformity more than less conscientious individuals do (Roccas,
Sagiv, Schwartz, & Knafo, 2002).
Surprisingly, women used second-person pronouns, or you, much less to the extent that
they were conscientious, whereas the correlation was the opposite for men – a pattern that
observers listening to participants’ daily recordings accurately (although modestly) observed and
used in their personality ratings (Mehl et al., 2006). Like many pronouns, you is a versatile word.
However, it most notably predicts hostility, and is associated with both state and trait anger
(Simmons, Chambless, & Gordon, 2005; Weintraub, 1981). It may be the case that, in the
context of daily social interactions with friends and strangers, conscientious people strive to
follow societal gender norms that define men as assertive (more you) and women as
nonconfrontational (less you).
Moderation by sex. Sex² appears to significantly moderate the strength of behavioral
signals of personality in some cases and reverse them in others. Moderation appears to be
particularly likely in cases where word categories have relatively low base rates and are taboo or
socially sensitive (characteristics that are obviously not unrelated). In an interesting study that
was designed to capture correlations between Big Five traits and swearing, a taboo linguistic
dimension that is especially elusive in more controlled written language use, Fast and Funder
(2008) had participants complete a life history interview as well as a spontaneous face-to-face
conversation with two strangers also in the experiment. When acquaintances’ ratings of
participants’ personality (i.e., informant reports) were compared with swearing across these two
natural dialog tasks, men but not women who swore were rated by informants as more
extraverted. (In that study, self-report personality measures were not collected.) Mehl et al.
(2006) found a similar moderation by sex: Untrained judges who listened to EAR recordings of
students’ daily lives rated men but not women as more extraverted when they swore. In that
study, self-reported extraversion was unrelated to swearing for either sex. In other words, in
contrast with individuals’ self-ratings, judges tend to interpret swearing as a sign of aggression or
assertiveness which, for men but not women, might be used as a heuristic for extraversion.
Indeed, in Mehl et al. (2006), men who argued more during their EAR recordings were also seen
by judges – but not by themselves – as more extraverted.
As mentioned above, the display of content-based indicators of neuroticism, such as
negative emotion words, is likely moderated by whether language use is public and identifiable
or private and anonymous. Consistent with the trait’s characteristic negative emotionality, worry,
and tendency to over-report health symptoms (Watson & Pennebaker, 1989), women who are
more neurotic use more negative emotion words and references to health in naturalistic text
messages (Holtgraves, 2011). However, men who are more neurotic not only fail to use more
negative emotion words, but they also fail to use any language category that was examined more
or less than those who are more emotionally stable. Mehl et al.’s (2006) EAR recordings showed
a similar effect: whereas for women, neuroticism was reflected in decreased verbosity and
laughter and increased arguing, men’s neuroticism was reflected in more socializing and time
spent outdoors. These results may be due to gender differences in emotional display rules. It may
be that the traditional social sanction against male expressions of strong emotion causes men to
regulate their emotional language (even in private text messages) and, as an alternative, cope
with the negative emotions of neuroticism through social and physical activity to a greater degree
than women do.
Trait Emotionality
Presumably due to the strong presence of the Big Five approach in personality research,
trait positivity and negativity have received relatively little attention in language research. Yet,
there are a few studies that suggest both face-valid and non-obvious relationships between trait
emotionality and natural language. Despite the apparent importance of the trait, however, those
studies that have investigated language and trait affect have not examined its relationship with
stylistic aspects of natural language such as pronoun usage. Rather, the focus tends to be on
improving the ability to unobtrusively gauge individuals’ affect through automated text analysis
(e.g., Cohen, Minor, Najolia, & Hong, 2009; Dodds et al., 2010).
Trait negativity. Unlike more specific measures of depression or neuroticism, general
psychological distress appears to be strongly related to more references to negative emotions and
fewer references to positive emotions. This is what Cohen (2011) found in a sample of students
who were asked to speak in monologue about a recent disagreement. In this study, the author
used custom positive and negative emotion dictionaries that excluded positive emotion words
with more common non-emotional meanings such as like and pretty. Psychological distress in
that sample was unrelated to positive or negative emotion word usage as measured by LIWC, although
LIWC’s negative emotion category was positively correlated with depression symptoms in the
same sample (Cohen, 2011). Part of the success of this study in accurately tracking negative
affect in dialogue may be due to the nature of the task, which likely afforded greater use of
emotional language than more neutral monologues. The study also suggests, however, that
LIWC, the most widely used text analysis program in personality research, has room to improve
its emotion categories (O’Carroll Bantum & Owen, 2009).
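The dictionary-based word counting discussed above can be illustrated with a minimal Python sketch. It computes the percentage of words in a text that fall into a category, once with a full positive emotion list and once after removing words that often carry non-emotional meanings. The word lists, function names, and sample sentence are hypothetical and chosen only for illustration; they are not the LIWC categories or the custom dictionaries used by Cohen (2011).

```python
import re

# Hypothetical mini-dictionaries for illustration only; these are NOT the
# LIWC categories or Cohen's (2011) custom dictionaries.
POSITIVE = {"happy", "love", "good", "like", "pretty", "glad"}
NEGATIVE = {"sad", "angry", "hate", "awful", "bored"}
AMBIGUOUS = {"like", "pretty"}  # often used in non-emotional senses


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def category_rate(tokens, dictionary):
    """Return the percentage of tokens that appear in the dictionary."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return 100.0 * hits / len(tokens)


sample = "I was pretty bored, and someone like you would hate it too."
tokens = tokenize(sample)

# Positive emotion rate with the full list vs. with ambiguous words removed.
full_pos = category_rate(tokens, POSITIVE)
strict_pos = category_rate(tokens, POSITIVE - AMBIGUOUS)
neg = category_rate(tokens, NEGATIVE)

print(f"positive (full list):   {full_pos:.1f}%")
print(f"positive (strict list): {strict_pos:.1f}%")
print(f"negative:               {neg:.1f}%")
```

In this toy sentence, pretty and like are the only matches in the full positive list, so removing ambiguous words changes the positive emotion rate while leaving the negative emotion rate untouched.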
Trait anger. Walter Weintraub (1981), a pioneer in the study of personality and
language, found in a case study of a young man with an explosively angry temperament, that
angry speech is characterized by a high rate of second-person pronouns (e.g., you, you’re). A
follow-up analysis of simulated anger in spontaneous monologues produced by two trained
actors replicated the second-person singular finding and further found that angry speech
contained fewer uses of we, more uses of me, more swear words, and more negations (e.g., no,
not). (Coming before the computer revolution, Weintraub and collaborators counted pronouns by
hand.) The parallels between the chronically angry young man and the actors asked to simulate
anger suggest that these linguistic characteristics reflect both state and trait anger. Despite the
small sample sizes used in Weintraub’s original studies, his results presaged later findings
regarding the roles of decreased we and increased you and me as indicators of negativity during
marital disputes (Simmons et al., 2005) and hostility during interactions between family
members (Simmons, Chambless, & Gordon, 2008).
Trait positivity. Positive emotionality or trait positive affect is another broad and
important aspect of personality. Beyond the fact that being happy is valuable for its own sake,
happier people live longer, a fact that is reflected in language use. In a sample of 180
autobiographical sketches written by Catholic nuns when they first entered their convent, women
who used more positive language had lower mortality risk between the ages of 75 and 95
(Danner, Snowdon, & Friesen, 2002). However, Danner et al. (2002) did not measure trait
positive affectivity, optimism, or other enduring personality dimensions outside of nuns’ positive
emotion references. As such, it is unclear whether their results were, as the longitudinal results
strongly suggest, due to trait rather than state positivity. If we assume that positive language
tapped some chronic underlying trait in that study, it is not clear which trait it specifically tapped.
More recent research sheds some light on what trait or traits positive references in Danner
et al. (2002) may have captured. In unstructured experimental language use, trait positive affect
is, as expected, positively correlated with positive emotion word usage. For example, a sample of
students who were instructed to talk aloud about any topic that they chose for 3 minutes used
more positive emotion words to the degree that they scored higher on measures of trait positive
affect and behavioral approach system sensitivity, and used more negative words to the degree
that they ranked higher in trait negative affect and behavioral inhibition system sensitivity
(Cohen, Minor, Bailie, & Dahir, 2008).
Psychopathology-Related Personality Traits
Type A. Some of the earliest research on language use and personality interestingly came
out of behavioral medicine. In a series of studies, Scherwitz and colleagues (for reviews see
Scherwitz & Canick, 1988; Scherwitz, Graham, & Ornish, 1985) linked self-involvement to the
Type A behavior pattern and Coronary Heart Disease (CHD). Self-involvement was
operationalized as the frequency with which participants used first-person singular pronouns
during the structured Type A interview. Type A emerged as positively related to the use of first-
person singular. Further (and going beyond personality), use of first-person singular pronouns
was related to blood pressure, coronary atherosclerosis, and prospectively to CHD incidence and
mortality. Interestingly, the relationship between first-person singular use and CHD outcome
often remained significant even after statistically controlling for the Type A behavior pattern
(Scherwitz et al., 1985; Scherwitz & Canick, 1988). Despite these promising early findings, links
between pronoun use, Type A, and heart disease have not been pursued in recent years (see
Rohrbaugh, Mehl, Shoham, Reilly, & Ewy, 2008).
Depression. One theory of depression suggests that emotional pain – like physical pain –
forces people to pay attention to themselves (Pyszczynski & Greenberg, 1987). High degrees of
self-focus can be seen in people’s inattention to others and preoccupation with the self. Other
work on self-focus suggests that states of self-awareness are linked to elevated use of first-person
singular pronouns – especially the use of I-words such as I, I’m, I’ll (e.g., Davis & Brock, 1975).
Depressive states have been linked to higher use of I-words across several genres. In
some of the first systematic word count studies, Walter Weintraub (1981) found that stream of
consciousness writing produced much higher use of I-words in depressed patients than patients
dealing with other medical disorders. More recently, Rude, Gortner and Pennebaker (2004)
found that college students diagnosed with depression used more I-words than non-depressed
students when writing essays about their college experiences. Also, in natural speech captured
over several days of tape recordings, use of first-person singular is more frequent among those
with high depression scores than those with low depression scores (Mehl, 2006b).
The link between depression and increased self-focus is found not only in everyday
language use but in published writing as well. Being a published poet is a surprisingly dangerous
job. According to Kay Jamison (1995), published poets are up to 18 times as likely as the general
population to commit suicide. Her research suggests that poets have an extraordinarily high rate
of bipolar disorder which is known to be closely linked to suicidal behaviors. Motivated by this
observation, Stirman and Pennebaker (2001) analyzed the published work of 18 poets – 9 who
committed suicide, and 9 yoked controls. They found that those who eventually committed
suicide used first-person singular pronouns at higher rates than those who did not (Stirman &
Pennebaker, 2001). Ironically, suicidal poets did not use more negative emotion words than other
poets. Overall, suicidal poets’ language use showed that they were more self-focused and less
socially integrated than non-suicidal poets.
Surprisingly, depression diagnosis and sub-clinical symptomatology are only positively
correlated with references to negative emotions in certain contexts. For example, in naturalistic
e-mails, talking, and blog posts, clinically depressed people and those with higher subclinical
depression scores use as many or more references to positive feelings overall as never-depressed
and less depressed individuals do (Baddeley, 2011; Rodriguez, Holleran, & Mehl, 2010). In a
sample of college students with subclinical depression symptomatology, Mehl (2006b) found no
correlation between depression symptom severity and use of positive emotion words, sadness,
and anger words in everyday spoken language. It is clear, however, that people suffering from
depression experience more negative emotions than do unaffected people, and that those who are
vulnerable to or currently experiencing depression preferentially focus on negative over positive
stimuli (Bistricky, Ingram, & Atchley, 2008; Teasdale, 1988). Null results for negative emotion
word use and depression symptoms therefore suggest that depressed individuals inhibit the most
obvious markers of depression – negativity, sadness, irritability – by regulating their emotional
language, perhaps in an attempt to avoid the negative social consequences of depression.
Depressed individuals may mask negative emotions in order to maintain their social
network. Jenna Baddeley and colleagues have conducted two studies of naturalistic language use
among individuals with major depressive disorder (MDD) that may shed light on where and to
whom negative emotions are expressed during depression. In an EAR study that recorded the
daily lives of individuals from the community with and without MDD over 3-4 days, she found
that depressed individuals laugh, socialize, and talk as much as non-depressed people (Baddeley,
Pennebaker, & Beevers, 2012). In contrast with other EAR findings, those participants with
MDD did use negative emotion words more frequently than never-depressed participants; however,
their negative emotion word use was moderated by interaction partner, such that they primarily
expressed negativity only with close friends and relationship partners.
A similar pattern emerged in real-life e-mails. In a recent study, Baddeley (2011)
downloaded e-mails sent and received by individuals with and without MDD over the course of
1 year. For MDD participants, that year included a depressive episode and a period of remission
that each lasted at least 1 month. She found that depressed individuals overall used more positive
language than did matched controls, and that when depressed individuals used negative emotion
words they did so primarily with correspondents they rated as being close to them.
Depressed individuals’ tendency to hide their depression from all but their closest friends
may be an effective method of coping with the negative social consequences of depression, such
as being avoided by non-depressed friends (Schaefer, Kornienko, & Fox, 2011). Content words,
such as the negative emotion terms awful and cry, and their meaning are more salient during
language processing than are function words like I and me; as a result, they are easier to inhibit
and control (Schmauder, Morris, & Poynor, 2000; Townsend & Saltz, 1972). These results suggest
that when depressed individuals attempt to appear normal, content markers of depression decrease
while function word markers of depression remain the same.
The Dark Triad. Another area of psychopathology-related individual differences that
language researchers are increasingly interested in is the Dark Triad (Paulhus & Williams, 2002).
The Dark Triad comprises a group of socially antagonistic personality traits:
Machiavellianism (a tendency to strategically deceive and manipulate others), narcissism (a
tendency to hold inflated, grandiose self-views), and psychopathy (a tendency to show disregard
and a lack of empathy and remorse towards others). Similar to research on subclinical
depression, the Dark Triad personalities are typically considered sub-clinical, non-pathological
versions of “full blown” clinical personality disorders. Machiavellianism, narcissism, and
psychopathy share a “dark” socially malevolent core but also have unique psychological aspects
and correlates (Paulhus & Williams, 2002). Among the Dark Triad personalities, subclinical
narcissism has by far received the most scientific attention (Campbell & Miller, 2011).
The idea that narcissism is characterized by a frequent use of first-person singular
pronouns was straightforward given narcissists’ pervasive self-focus in social interactions. This
hypothesis was therefore also the first to be subjected to empirical tests. Raskin and Shaw (1988)
found that subclinical narcissism correlated positively with the use of first-person singular and
negatively with the use of first-person plural in tape-recorded impromptu monologues. Such a
bias towards linguistic egocentrism (Weintraub, 1981) appeared theoretically meaningful and
suggested that narcissism is distinctly manifested in people’s natural use of self-references.
Building on Raskin and Shaw’s (1988) landmark study, other researchers have taken
first-person singular use as a face-valid linguistic indicator of self-focus. DeWall, Buffardi,
Bonser, and Campbell (2011) found that narcissistic Facebook users who did not draw attention
to themselves in their profile pictures (e.g., by posting a sexy, revealing photo) used first-person
singular self-references at a high rate in their profile self-descriptions. Similarly, narcissistic
participants who did not use antisocial language (i.e., swear or anger words) in personality essays
(explaining why they possess the traits of honesty, trustworthiness, and kindness) used first-
person singular self-references at a high rate in their essays. In both studies, presumably, the use
of first-person singular indexed a compensatory verbal attention-seeking strategy – although,
interestingly, it did not show a (reported) significant zero-order correlation with narcissism.
Most recently, DeWall, Pond, Campbell and Twenge (2011) tested the notion of a
generational increase in narcissism by tracking the use of self-references in popular U.S. song
lyrics. The authors reported a linear trend pointing to an increase in first-person singular
pronouns between 1980 and 2007. Again, presumably, the use of first-person singular in song
lyrics indexes (cultural) egocentrism and the data are meant to speak to the debate on whether or
not the U.S. society has become more narcissistic over time (Trzesniewski & Donnellan, 2010).
Several other studies, though, question the simple logic of I-use equals narcissistic self-
focus. Sampling language representatively from all of participants’ daily conversations (using the
EAR method), Holtzman, Vazire, and Mehl (2010) found no systematic association between use
of first-person singular and narcissism (r = .13, p = .25). Also, Fast and Funder (2010) found no
simple correlation between use of first-person singular and narcissism among participants who
underwent a life history interview (r = .02 for women and r = .11, n.s. for men). Among men but
not women, authority – a facet of narcissism – was significantly positively related to I-use.
However, self-sufficiency, yet another facet, showed a significant negative association with I-use
among men. Among women, I-use was generally more strongly related to reported and observed
anxiety and depressive symptoms than it was to any facet of narcissism – a finding that is
consistent with prior research on the link between first-person singular self-references and
depression (Mehl, 2006b; Rude, Gortner, & Pennebaker, 2004; Weintraub, 1981). The scientific
question of whether or when first-person singular use indexes narcissism is thus still open.
Conceptually, it seems important to distinguish between self-focus and self-awareness, with
narcissists presumably ranging high on the former but low on the latter. Future research should
aim at reconciling the discrepant findings by identifying language contexts that particularly
afford the expression of narcissism and neuroticism (or negative affect or depression).
Going beyond the narcissism-I link, Holtzman and colleagues (2010) found that
narcissism in general, and especially its “toxic” components, Superiority/Arrogance and
Exploitativeness/Entitlement, were correlated with a more frequent use of swear and anger words.
The study further found that narcissists make more sexual references in their daily language use.
Finally, in the only language study on the Dark Triad personality trait of Machiavellianism,
Ickes, Reidhead and Patterson (1986) used text analysis to compare Machiavellianism and self-
monitoring. Following the idea that Machiavellianism is a form of self-oriented, assimilative
impression management, it was positively related to the use of first-person singular (and plural)
pronouns and negatively related to the use of second- and third-person pronouns. Self-
monitoring, on the other hand, as a form of other-oriented, accommodative impression
management, was negatively related to first-person singular use and positively related to second-
person pronouns.
Summary and Future Directions
The aim of this chapter has been to provide a blueprint of research on language and
personality up to this point that depicts both its structural soundness and need for additions and
improvements. In closing, we will provide an overview of the existing studies in the form of a
summary table (see Table 2) and outline a few recommendations for future progress.
Table 2. Summary of Linguistic Indicators of Personality
Big Five
Extraversion: second-person pronouns (+), first-person plural pronouns (+), positive emotion (+), social (+), leisure (+), sex (+), inhibition (-), tentativeness (-)
Agreeableness: positivity (+), first-person singular pronouns (+), social (+), home (+), family (+), communication (+), death (-), money (-), swearing (-)
Openness: articles (+), prepositions (+), personal pronouns (-), family (-), home (-), rest (-)
Conscientiousness: swearing (-), negative emotion (-)
Neuroticism: first-person singular pronouns (+), negative emotion (+)
Psychopathology-Related Personality Traits
Type A: first-person singular pronouns (+)
Narcissism: first-person singular pronouns (+/0), negative emotion (+), anger (+), swear (+), sex (+), first-person plural pronouns (-)
Machiavellianism: first-person singular pronouns (+), first-person plural pronouns (+), second-person pronouns (-), third-person pronouns (-)
Depression: first-person singular pronouns (+), negative emotion (+)
Trait Emotionality
Anger: second-person singular pronouns (+)
Negative emotions: negative emotion (+)
Positive emotions: positive emotion (+)
Note. See text for references. For the Big Five, only the most common and universal correlates
are listed.
Finding consistent threads among studies is sometimes made difficult by differing
methodologies. Even among studies that used the same text analysis tool, some focused only on
linguistic content rather than all categories, and others used different versions of a program that
include several non-overlapping categories. The literature on language and personality would no
doubt benefit from more comprehensive reporting of effects, in papers or in online supplemental
materials. The existing studies suggest that both content and style categories are critical.
Although content words are more susceptible to self-regulation and thus tend to be lower fidelity
indicators of internal states, the degree to which a person’s language use fails to reflect their self-
or informant-reported personality is often a telling indicator of self-regulatory personality
processes and person x situation interactions (Baddeley, 2011; Baddeley & Pennebaker, 2012;
Mehl et al., 2006; Mehl & Holleran, 2008). Style words are often more challenging to interpret,
but are valuable as the mostly automatic, and therefore more psychologically representative,
indicators of attentional focus and thinking styles (see Tausczik & Pennebaker, 2010). Content
and style are two sides of a data-rich coin, and personality psychology has much to gain from
increasingly considering both aspects of language use.
In order to correctly interpret the nature and true magnitude of effects, studies of
language and personality may also need to increasingly measure and consider a range of
potential moderators or modifiers, including facet-level trait measures (Yarkoni, 2010),
individuals’ sex (Mehl et al., 2006), whether language use is public or private (Mehl & Holleran,
2008), the closeness of conversation partners (Baddeley, 2011) and linguistic co-occurrences
(Gill & Oberlander, 2002). Specifically for function words, which are by definition
extraordinarily versatile, research has shown that moderators matter. For example, whether I or
you is said by a man or a woman and in the context of an angry or cheerful communication can
dramatically influence which psychological processes those words reflect (Fast & Funder, 2010;
Mehl et al., 2006; Tausczik & Pennebaker, 2010).
Context effects, such as the types of communication that a situation affords or demands,
are important considerations in any area of behavioral research. Studies of language use are no
exception. Just as a highly extraverted person would not be expected to behave dramatically
differently than an introverted person in a situation lacking the potential for social interaction,
personality traits that are predominantly defined by differences in social interaction are likely to
leave fewer observable traces in solitary writing such as stream-of-consciousness essays.
Furthermore, writing or speaking tasks that resemble criterion measures of personality (e.g., self-
report personality questionnaires and essays describing one’s personality) are bound to be more
highly correlated than naturalistic measures of language (e.g., Hirsh & Peterson, 2008).
However, perhaps in part due to the influence of corpus linguistics, where language from a wide
range of communication media is frequently compiled into a single dataset comprising billions
of words, studies of linguistic indicators of personality have only recently come to seriously
consider communication context. Given that so many personality dimensions hinge on how
people react to and interact with others, it is particularly important – in studies of natural
language use and beyond – for personality research to increasingly study the links between
naturally occurring dialog, self-reports, and observer reports. As naturalistic language research
expands with ongoing advances in audio recording technology and computer science methods, it
should become easier to understand how linguistic signals are attenuated and warped by
contextual influences such as experimental task, communication medium, and motivation.
The accomplishments of computerized text analysis in the last 15 years have been
extraordinary. However, the software designers, programmers, and data analysts behind this
revolution readily admit that there is room for improvement. Cohen and colleagues’ (2009) and
S. Cohen’s (2011) research on the measurement of trait affect points to a possible need to
improve word-count measurements of common positive emotion words, which are often used in
ways that do not reflect positivity (e.g., I was pretty bored, someone like you), by considering
their linguistic contexts. Recent discoveries involving function word categories introduced in the
most recent version of LIWC (Pennebaker, Booth, & Francis, 2007) suggest that finer grained
analyses based on words’ grammatical roles have the potential to clarify mixed results in past
research and shed light on the cognitive mechanisms underlying personality dimensions.
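The point about positive emotion words used in non-positive ways can be illustrated with a minimal Python sketch. It is not LIWC and does not reproduce any published method: the word list, the two hand-written context rules, and the function names (tokenize, counts_as_positive, positive_rate) are all hypothetical and chosen only to show how a plain word count and a context-sensitive count can diverge on the examples above.

```python
import re

# Toy positive-emotion word list (hypothetical; NOT the LIWC dictionary).
POSITIVE = {"pretty", "like", "happy", "good", "love"}
# Hypothetical cue lists used by the context rules below.
SUBJECT_PRONOUNS = {"i", "you", "we", "they"}
COPULAS = {"am", "is", "are", "was", "were", "be", "been"}


def tokenize(text):
    """Return lowercase words plus sentence punctuation, so rules can see clause ends."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;!?]", text.lower())


def counts_as_positive(tokens, i):
    """Crude, illustrative context check for the token at index i."""
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else ""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    if word == "like":
        # Treat "like" as a verb of liking only right after a subject pronoun.
        return prev in SUBJECT_PRONOUNS
    if word == "pretty":
        # "was pretty bored": copula before and a word after -> intensifier.
        is_intensifier = prev in COPULAS and nxt != "" and nxt[0].isalpha()
        return not is_intensifier
    return True


def positive_rate(text, use_context=False):
    """Percentage of words that are positive emotion words."""
    tokens = tokenize(text)
    words = [t for t in tokens if t[0].isalpha()]
    if not words:
        return 0.0
    hits = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE and (not use_context or counts_as_positive(tokens, i)):
            hits += 1
    return 100.0 * hits / len(words)


sample = "I was pretty bored. I want someone like you."
print(positive_rate(sample))                    # plain word count
print(positive_rate(sample, use_context=True))  # context-sensitive count
```

On the sample sentence the plain count treats both pretty and like as positive emotion words, whereas the context-sensitive count treats neither as positive. Real applications would presumably rely on part-of-speech tagging rather than hand-written rules like these.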
Measures of within-text context – and the usability of tools that consider linguistic
context – are bound to improve studies of language and personality as well. A word’s location in
a text or sentence (Vine & Pennebaker, 2009) and its neighboring words (Gill & Oberlander,
2004) clearly matter but are rarely considered in psychological text analyses. Tools such as
Latent Semantic Analysis (Landauer & Dumais, 1997) and WordSmith (Scott, 2008) handle such
variables and, as they become more widely known and user-friendly, stand to greatly enrich
future research.
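As a rough illustration of the two within-text context variables just mentioned, the following Python sketch computes where a target word falls in a text and which words surround it. It is a simplified stand-in, not the procedure used by Latent Semantic Analysis or WordSmith; the function names (tokenize, relative_positions, neighbors) and the window parameter are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def relative_positions(tokens, target):
    """Location of each occurrence of target, scaled from 0 (first word) to 1 (last word)."""
    last_index = max(len(tokens) - 1, 1)
    return [i / last_index for i, token in enumerate(tokens) if token == target]


def neighbors(tokens, target, window=2):
    """Count the words that occur within `window` tokens of each occurrence of target."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts


tokens = tokenize("I was happy then, but I was not happy later")
print(relative_positions(tokens, "happy"))
print(neighbors(tokens, "happy", window=1))
```

In this toy sentence the second occurrence of happy is immediately preceded by not, which a neighbor count exposes and a plain word count would miss.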
In his famous monograph on personality, Allport (1937) wrote “language is a
codification of common human experience, and by analyzing it much may be found that reflects
the nature of human personality” (p. 373). Interestingly, the field of personality and language use
only started gaining serious momentum more than half a century later. As the research reviewed
in this chapter reveals, though, the field is now rich and vibrant and has already produced many
important discoveries. We expect that the immense progress in (stationary and mobile)
computing technology and parallel advances in computational linguistics will create a strong
push for the field over the next years and lead to critical improvements in the complexity with
which naturalistic language can be analyzed (Schwartz et al., 2013; Ranganath, Jurafsky, &
McFarland, 2013). It is our sense that the field will thrive to the extent that it uses these
technologically-driven, “bottom-up”, analytic advances and, at the same time, balances them
with innovative theoretical developments and clarifications from the “top down”. To achieve this, it
will undoubtedly become necessary for researchers from different fields to “cross-talk”. Social
psychologists, personality psychologists, cognitive psychologists, linguists, communication
scholars, computer scientists and other researchers will need to engage in conversations and
collaborations and thereby transcend (and hopefully reduce) traditional disciplinary boundaries to
more fully understand how our words reflect our selves.
1 At some point, you may have received the following test over e-mail: “How many Fs
does the following passage contain? ‘Finished files are the result of years of scientific study
combined with the experience of years.’” Finding only three Fs tends to result from readers
skipping ofs.
2 The term sex is used by default to refer to all differences in personality-language links
between men and women. However, gender may be more appropriate in cases where linguistic
differences seem to be more strongly influenced by gender norms than biology (see Eagly,
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the
author of an anonymous text. Communications of the ACM, 52, 119.
Augustine, A. A., Mehl, M. R., & Larsen, R. J. (2011). A positivity bias in written and spoken
English, and its moderation by personality and gender. Social Psychological and
Personality Science, 2, 508-515.
Baddeley, J. L., Beevers, C. G., & Pennebaker, J. W. (2012). Everyday social behavior during a
major depressive episode. Manuscript under revision, Social Psychological and Personality
Science. University of Texas at Austin, Austin, TX.
Baddeley, J. L. (2011). Email communications among people with and without major depressive
disorder. Unpublished doctoral dissertation. University of Texas at Austin, Austin, TX.
Baddeley, J. L., & Singer, J. A. (2008). Telling losses: Personality correlates and functions of
bereavement narratives. Journal of Research in Personality, 42, 421-438.
Bell, A., Brenier, J., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on
durations of content and function words in conversational English. Journal of Memory
and Language, 60, 92-111. doi:10.1016/j.jml.2008.06.003
Bistricky, S. L., Ingram, R. E., & Atchley, R. A. (2011). Facial affect processing and depression
susceptibility: Cognitive biases and cognitive neuroscience. Psychological Bulletin, 137,
998-1028. doi:10.1037/a0025348
Brown, P. & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge:
Cambridge University Press.
Burke, P. A., & Dollinger, S. J. (2005). A picture’s worth a thousand words: Language use in
autophotographic essay. Personality and Social Psychology Bulletin, 31, 536-548.
Campbell, W. K., & Miller, J. D. (2011). The Handbook of Narcissism and Narcissistic
Personality Disorder: Theoretical Approaches, Empirical Findings, and Treatments.
Hoboken, NJ: John Wiley & Sons.
Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. M.
Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–149).
Washington, DC: APA Books.
Cohen, A. S., Minor, K. S., Baillie, L. E., & Dahir, A. M. (2008). Clarifying the linguistic
signature: Measuring personality from natural speech. Journal of Personality Assessment,
90, 559-563.
Cohen, A. S., Minor, K. S., Najolia, G. M., & Lee Hong, S. (2009). A laboratory-based
procedure for measuring emotional expression from natural speech. Behavior Research
Methods, 41, 204-12. doi:10.3758/BRM.41.1.204
Cohen, S. J. (2011). Measurement of negativity bias in personal narratives using corpus-based
emotion dictionaries. Journal of Psycholinguistic Research, 40, 119-135.
Costa, P. T., Jr., & McCrae, R. R. (1992). Normal personality assessment in clinical practice:
The NEO Personality Inventory. Psychological Assessment, 4, 5-13.
Danner, D. D., Snowdon, D. A., & Friesen, W. V. (2002). Positive emotions in early life and
longevity: Findings from the Nun Study. Journal of Personality and Social Psychology,
80, 804-813.
Davis, D., & Brock, T. C. (1975). Use of first-person pronouns as a function of increased
objective self-awareness and performance feedback. Journal of Experimental Social
Psychology, 11, 381-388.
Dewaele, J-M., & Furnham, A. (2000). Personality and speech production: A pilot study of
second language learners. Personality and Individual Differences, 28, 355-365.
DeWall, C. N., Buffardi, L. E., Bonser, I., & Campbell, W. K. (2011). Narcissism and implicit
attention seeking: Evidence from linguistic analyses of social networking and online
presentation. Personality and Individual Differences, 51, 57-62.
DeWall, C. N., Pond, R. S., Campbell, W. K., & Twenge, J. M. (2011). Tuning in to
psychological change: Linguistic markers of self-focus, loneliness, anger, antisocial
behavior, and misery increase over time in popular U.S. song lyrics. Psychology of
Aesthetics, Art, and Creativity.
Dodds, P. S., & Danforth, C. M. (2009). Measuring the happiness of large-scale written
expression: Songs, blogs, and presidents. Journal of Happiness Studies, 11, 441-456.
Eagly, A. (1995). The science and politics of comparing women and men. American
Psychologist, 50, 145-158.
Eastwick, P. W., Eagly, A. H., Finkel, E. J., & Johnson, S. E. (2011). Implicit and explicit
preferences for physical attractiveness in a romantic partner: A double dissociation in
predictive validity. Journal of Personality and Social Psychology, 101, 993-1011.
Fast, L. A., & Funder, D. C. (2008). Personality as manifest in word use: Correlations with self-report,
acquaintance report, and behavior. Journal of Personality and Social Psychology,
94, 334-346. doi:10.1037/0022-3514.94.2.334
Fast, L. A., & Funder, D. C. (2010). Gender differences in the correlates of self-referent word
use: authority, entitlement, and depressive symptoms. Journal of Personality, 78, 313-38.
Gill, A. J. & Oberlander, J. (2002). Taking care of the linguistic features of extraversion.
Proceedings of the 24th Annual Conference of the Cognitive Science Society, 363-368.
Goldberg, L. R. (1981). Language and individual differences: The search for universals in
personality lexicons. In L. Wheeler (Ed.), Review of personality and social psychology
(pp. 141-165). Beverly Hills: Sage.
Goldenfeld, N., Baron-Cohen, S., & Wheelwright, S. (2007). Empathizing and systemizing in
males, females, and autism: A test of the neural competition theory. Autism, 1-16.
Golder, S. A. & Macy, M. W. (2011). Diurnal and seasonal mood vary with work, sleep and
daylength across diverse cultures. Science, 333, 1878-1881.
Gosling, S. D. (2008). Snoop: What your stuff says about you. New York: Basic Books.
Gosling, S. D., Ko, S. J., Mannarelli, T., & Morris, M. E. (2002). A room with a cue: Judgments
of personality based on offices and bedrooms. Journal of Personality and Social
Psychology, 82, 379-398.
Gottschalk, L. A., & Gleser, G. C. (1969). Measurement of Psychological States Through the
Content Analysis of Verbal Behaviour. Berkeley, CA: University of California Press.
Groom, C. J., & Pennebaker, J. W. (2005). The language of love: Sex, sexual orientation, and
language use in online personal advertisements. Sex Roles, 52, 447-461.
Hancock, J., Curry, L., Goorha, S., & Woodworth, M. (2008). On lying and being lied to: A
linguistic analysis of deception. Discourse Processes, 45, 1-23.
Hart, R. P. (1984). Verbal style and the presidency: A computer-based analysis. New York:
Academic Press.
Hirsh, J. B., & Peterson, J. B. (2009). Personality and language use in self-narratives. Journal of
Research in Personality, 43(3), 524-527. doi:10.1016/j.jrp.2009.01.006
Holtgraves, T. (2011). Text messaging, personality, and the social context. Journal of Research
in Personality, 45(1), 92-99. doi:10.1016/j.jrp.2010.11.015
Holtgraves, T. (2010). Social psychology and language: Words, utterances and
conversations. In S. Fiske, D. Gilbert, & G. Lindzey (Eds.), Handbook of social
psychology, 5th edition.
Holtzman, N. S., Vazire, S., & Mehl, M. R. (2010). Sounds like a narcissist: Behavioral
manifestations of narcissism in everyday life. Journal of Research in Personality, 44,
478-484. doi:10.1016/j.jrp.2010.06.001
Ickes, W., Reidhead, S., & Patterson, M. (1986). Machiavellianism and self-monitoring: As
different as “me” and “you”. Social Cognition, 4, 58-74.
Jay, T. (2009). The utility and ubiquity of taboo words. Perspectives on Psychological Science, 4,
Kacewicz, E., Pennebaker, J. W., Davis, M., Jeon, M., & Graesser, A. C. (in press, pending
minor revision). The language of status hierarchies. Social Psychological and Personality
Koppel, M., Argamon, S., & Shimoni, A. (2003). Automatically categorizing written texts by
author gender. Literary and Linguistic Computing, 17, 401-412.
Kosinski, M. & Stillwell, D. (2012). myPersonality research wiki: myPersonality project. In
Kramer, A. D. I. (2010). An unobtrusive behavioral model of “gross national happiness”.
Proceedings of the 28th International Conference on Human Factors in Computing
Systems - CHI, 287-290.
Küfner, A. C. P., Back, M. D., Nestler, S., & Egloff, B. (2010). Tell me a story and I will tell you
who you are! Lens model analyses of personality and creative writing. Journal of
Research in Personality, 44, 427-435. doi:10.1016/j.jrp.2010.05.003
Lakoff, R. T. (1975). Language and woman's place. New York: Harper & Row.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic
Analysis theory of the acquisition, induction, and representation of knowledge.
Psychological Review, 104, 211-240.
Lee, C. H., Kim, K., Seo, Y. S., & Chung, C. K. (2007). The relations between personality and
language use. Journal of General Psychology, 134, 405-413.
Mairesse, F. & Walker, M. (2011). Controlling user perceptions of linguistic style: Trainable
generation of personality traits. Computational Linguistics, 37, 445-488.
Mairesse, F., & Walker, M. A. (2006). Words mark the nerds: Computational models of
personality recognition through language. Proceedings of the 28th Annual Conference of
the Cognitive Science Society, 543–548.
Mairesse, F., Walker, M. A., Mehl, M. R., & Moore, R. K. (2007). Using linguistic cues for the
automatic recognition of personality in conversation and text. Journal of Artificial
Intelligence Research, 30, 457-500.
McAdams, D. P., & Pals, J. L. (2006). A new Big Five: Fundamental principles for an integrative
science of personality. American Psychologist, 61, 204-17. doi:10.1037/0003-
Mehl, M. R. (2006a). Quantitative text analysis. In M. Eid & E. Diener (Eds.), Handbook of
multimethod measurement in psychology (pp. 141–156). Washington, DC: American
Psychological Association.
Mehl, M. R. (2006b). The lay assessment of sub-clinical depression in daily life. Psychological
Assessment, 18, 340-345.
Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in its natural habitat:
Manifestations and implicit folk theories of personality in daily life. Journal of
Personality and Social Psychology, 90, 862-877.
Mehl, M. R. & Holleran, S. E. (2008). How taking a word for a word can be problematic:
Context-dependent linguistic markers of extraversion and neuroticism. Paper presented at
the 11th Conference of the International Association for Language and Social Psychology,
Tucson, Arizona.
Mehl, M. R., & Pennebaker, J. W. (2003). The sounds of social life: A psychometric analysis of
students’ daily social environments and natural conversations. Journal of Personality and
Social Psychology, 84, 857-870.
Mehl, M. R., Pennebaker, J. W., Crow, D. M., Dabbs, J., & Price, J. (2001). The Electronically
Activated Recorder (EAR): A device for sampling naturalistic daily activities and
conversations. Behavior Research Methods, Instruments, & Computers, 33, 517-523.
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., The Google Books Team,
Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A.,
& Aiden, E. L. (2010). Quantitative analysis of culture using millions of digitized
books. Science. doi: 10.1126/science.1199644
Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender
differences in language use: An analysis of 14,000 text samples. Discourse Processes,
45(3), 211-236. doi:10.1080/01638530802073712
Oberlander, J., & Gill, A. J. (2006). Language with character: A corpus-based study of individual
differences in e-mail communication. Discourse Processes, 42, 239-270.
O’Carroll Bantum, E., & Owen, J. E. (2009). Evaluating the validity of computerized content
analysis programs for identification of emotional expression in cancer narratives.
Psychological Assessment, 21, 79-88.
Paulhus, D. L., & Williams, K. M. (2002). The Dark Triad of personality: Narcissism,
Machiavellianism, and psychopathy. Journal of Research in Personality, 36, 556–563.
Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. New York:
Bloomsbury Press.
Pennebaker, J. W. (1997). Opening up: The healing power of expressing emotion. New York:
Guilford Press.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2007). Linguistic Inquiry and Word Count
(LIWC): LIWC 2007 [Computer program]. Austin, TX:
Pennebaker, J. W. & Ireland, M. E. (2011). Using literature to understand authors: The case for
computerized text analysis. Scientific Study of Literature, 1, 34-48.
Pennebaker, J. W., & Lay, T. C. (2002). Language use and personality during crises: Analyses of
mayor Rudolph Giuliani’s press conferences. Journal of Research in Personality, 36,
Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. G. (2003). Psychological aspects of natural
language use: Our words, our selves. Annual Review of Psychology, 54, 547-577.
Poole, M. E. (1979). Social class, sex, and linguistic coding. Language and Speech, 22, 49–67.
Pyszczynski, T., & Greenberg, J. (1987). Self-regulatory perseveration and the depressive self-
focusing style: A self-awareness theory of reactive depression. Psychological
Bulletin, 102, 122-138.
Qiu, L., Lin, H., Ramsay, J., & Yang, F. (2012). You are what you tweet: Personality expression
and perception on Twitter. Journal of Research in Personality, 46, 710-718.
Ranganath, R., Jurafsky, D., & McFarland, D. A. (2013). Detecting friendly, flirtatious,
awkward, and assertive speech in speed-dates. Computer Speech & Language, 27(1), 89-
Ramírez-Esparza, N., Chung, C., Kacewicz, E., & Pennebaker, J. W. (2008). The Psychology of
word use in depression forums in English and in Spanish: Testing two text analytic
approaches. Proceedings of the International Conference on Weblogs and Social Media
(ICWSM 2008).
Raskin, R., & Shaw, R. (1988). Narcissism and the use of personal pronouns. Journal of
Personality, 56(2), 393-404.
Robbins, M. L., Focella, E. S., Kasle, S., Weihs, K. L., Lopez, A. M., & Mehl, M. R. (2011).
Naturalistically observed swearing, emotional support and depressive symptoms in
women coping with illness. Health Psychology, 30, 789-792.
Roccas, S., Sagiv, L., Schwartz, S. H., & Knafo, A. (2002). The Big Five personality factors and
personal values. Personality and Social Psychology Bulletin, 28, 789-801.
Rodriguez, A. J., Holleran, S. E., & Mehl, M. R. (2010). Reading between the lines: The lay
assessment of subclinical depression from written self-descriptions. Journal of
Personality, 78, 575-98. doi:10.1111/j.1467-6494.2010.00627.x
Rohrbaugh, M. J., Mehl, M. R., Shoham, V., Reilly, E. S., & Ewy, G. A. (2008). Prognostic
significance of spouse we talk in couples coping with heart failure. Journal of Consulting
and Clinical Psychology, 76, 781–9.
Rude, S. S., Gortner, E.-M., & Pennebaker, J. W. (2004). Language use of depressed and
depression-vulnerable college students. Cognition & Emotion, 18, 1121–1133.
Schaefer, D. R., Kornienko, O., & Fox, A. M. (2011). Misery does not love company: Network
selection mechanisms and depression homophily. American Sociological Review, 76,
Scherwitz, L., & Canick, J. (1988). Self-reference and coronary heart disease risk. In K. Houston &
C. R. Snyder (Eds.), Type A behavior pattern: Research, theory, and intervention. New
York: John Wiley & Sons.
Scherwitz, L., Graham, L. E., & Ornish, D. (1985). Self-involvement and the risk factors for
coronary heart disease. Advances, 2, 6-18.
Schmauder, A. R., Morris, R. K., & Poynor, D. V. (2000). Lexical processing and text integration
of function and content words: Evidence from priming and eye fixations. Memory &
Cognition, 7, 1098-1108.
Schnurr, P. P., Rosenberg, S. D., Oxman, T. E., & Tucker, G. J. (1986). A methodological note
on content analysis: Estimates of reliability. Journal of Personality Assessment, 50, 601-
Schwartz, H. A., Eichstaedt, J., Dziurzynski, L., Kern, M., Blanco, E., Kosinski, M., Stillwell, D.,
Seligman, M., & Ungar, L. H. (2013). Toward personality insights from language
exploration in social media. AAAI-2013 Spring Symposium: Analyzing
Microtext. Stanford, California.
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., .
. . Ungar, L. H. (2013). Personality, Gender, and Age in the Language of Social Media:
The Open-Vocabulary Approach. PLoS One, 8(9), e73791.
Scott, M. (2008). WordSmith Tools version 5. Liverpool: Lexical Analysis Software.
Simmons, R. A., Gordon, P. C., & Chambless, D. L. (2005). Pronouns in marital interaction.
Psychological Science, 16, 932-6.
Simmons, R. A., Chambless, D. L., & Gordon, P. C. (2008). How do hostile and emotionally
overinvolved relatives view relationships? What relatives’ pronoun use tells us. Family
Process, 47, 405–19.
Slatcher, R. B., Chung, C. K., Pennebaker, J. W., & Stone, L. D. (2007). Winning words:
Individual differences in linguistic style among U.S. presidential and vice presidential
candidates. Journal of Research in Personality, 41, 63-75.
Smith, C. P. (Ed.). (1992). Motivation and personality: Handbook of thematic content analysis.
Cambridge, MA: Cambridge University Press.
Stirman, S. W., & Pennebaker, J. W. (2001). Word use in the poetry of suicidal and non-suicidal
poets. Psychosomatic Medicine, 63, 517-522.
Stone, P. J., Dunphy, D. C., Smith, M. S., & Ogilvie, D. M. (1966). The general inquirer: A
computer approach to content analysis. Cambridge: MIT Press.
Tanenhaus, M. K., & Trueswell, J. C. (1995). Sentence comprehension. In Eimas & Miller (Eds.),
Handbook in Perception and Cognition, Volume 11: Speech Language and
Communication, pp. 217-262. New York: Academic Press.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and
computerized text analysis methods. Journal of Language and Social Psychology, 29, 24-
54. doi:10.1177/0261927X09351676
Teasdale, J. D. (1988). Cognitive vulnerability to persistent depression. Cognition and Emotion,
2, 247-274.
Teasdale, J. D., & Green, H. A. C. (2004). Ruminative self-focus and autobiographical memory.
Personality and Individual Differences, 36, 1933–1943.
Townsend, D. J. & Saltz, E. (1972). Phrases vs. meaning in the immediate recall of
sentences. Psychonomic Science, 29, 381-384.
Trzesniewski, K. H., & Donnellan, M. B. (2010). Rethinking “Generation Me”: A study of cohort
effects from 1976–2006. Perspectives on Psychological Science, 5, 58–75.
Vazire, S., & Gosling, S. D. (2004). e-Perceptions: Personality impressions based on personal
websites. Journal of Personality and Social Psychology, 87, 123-132.
Vine, V. & Pennebaker, J. W. (2009). [The arc of narrative project]. Unpublished raw data.
University of Texas at Austin, Austin, TX.
Watson, D. & Pennebaker, J. W. (1989). Health complaints, stress, and disease: Exploring the
central role of negative affectivity. Psychological Review, 96, 234-254.
Weintraub, W. (1981). Verbal behavior: Adaptation and psychopathology. New York: Springer.
Yarkoni, T. (2010). Personality in 100,000 Words: A large-scale analysis of personality and
word use among bloggers. Journal of Research in Personality, 44, 363-373.
... LIWC analyzes both content words, which communicate some kind of meaning, like who, what, where, or why (nouns, regular verbs, adjectives, and adverbs), and function or style words (pronouns, prepositions, auxiliary verbs, conjugations, etc.) that are used to link meaningful words together, which are generated from a deep level of the mind and are often automatic and used unconsciously, consequently revealingِanِindividual'sِ psychological state (Boyd, 2017;Tausczik & Pennebaker, 2010). The advantage of LIWC'sِword-counting approach for exploring the psychological processes found in individuals'ِlanguageِisِthatِtheِreliabilityِofِLIWC'sِresultsِisِneverِunderminedِbyِ experimenter error or subjective bias (Ireland & Mehl, 2014). ...
... It can be problematic to rely on self-reportِquestionnairesِasِtheِ"gold standard"ِ scores for personality research because of potential response biases and self-knowledge constraints (Paulhus & Vazire, 2007). Linguistic analysis has become a technique for personality researchers to assess personality in a less biased and more reliable way (Ireland & Mehl, 2014;Kern et al., 2019;Obschonka et al., 2017;Yarkoni, 2010). A moreِ"psychologicallyِtelling"ِandِpsychometricallyِparsimoniousِmethodِofِ determiningِindividualِdifferencesِisِlanguageِstylesِ(anِindividual'sِuseِofِfunctionِorِ "stop"ِwords), how an individual says things, rather than differences in language content (anِindividual'sِuseِofِnouns,ِverbs,ِadjectives,ِandِmostِadverbs),ِwhat an individual says (Ireland & Mehl, 2014;Yarkoni, 2010). ...
... Linguistic analysis has become a technique for personality researchers to assess personality in a less biased and more reliable way (Ireland & Mehl, 2014;Kern et al., 2019;Obschonka et al., 2017;Yarkoni, 2010). A moreِ"psychologicallyِtelling"ِandِpsychometricallyِparsimoniousِmethodِofِ determiningِindividualِdifferencesِisِlanguageِstylesِ(anِindividual'sِuseِofِfunctionِorِ "stop"ِwords), how an individual says things, rather than differences in language content (anِindividual'sِuseِofِnouns,ِverbs,ِadjectives,ِandِmostِadverbs),ِwhat an individual says (Ireland & Mehl, 2014;Yarkoni, 2010). ...
Full-text available
Creativity is most commonly assessed through methods such as questionnaires and specific tasks, the validity of which can be weakened by scorer or experimenter error, subjective and response biases, and self-knowledge constraints. Linguistic analysis provides researchers with an automatic, objective method of assessing creativity, free from human error and bias. This study used 419 creativity text samples from a wide range of creative individuals (Big-C, Pro-C, and Small-c) to investigate whether linguistic analysis can, in fact, distinguish between creativity levels and creativity domains using creativity dictionaries and personality dimension language patterns in the Linguistic Inquiry and Word Count (LIWC) text analysis program. Creative individuals used more words on the creativity dictionaries as well as more Introversion and Openness to Experience Language Pattern words than less creative individuals. Regarding creativity domains, eminent artists used more Introversion and Openness to Experience Language Pattern words than eminent scientists. Text analysis through LIWC was able to successfully distinguish between the three creativity levels, in some cases, and the two creativity domains with statistical significance. These findings lend support to the use of linguistic analysis as a partially valid form of creativity assessment.
... Inguistic inquiry and word count analyzes both content words, which communicate some kind of meaning, like who, what, where, or why (nouns, regular verbs, adjectives, and adverbs), and function or style words (pronouns, prepositions, auxiliary verbs, and conjugations, etc.) that are used to link meaningful words together, which are generated from a deep level of the mind and are often automatic and used unconsciously, consequently revealing an individual's psychological state (Tausczik and Pennebaker, 2010;Boyd, 2017). The advantage of LIWC's wordcounting approach for exploring the psychological processes found in individuals' language is that the reliability of LIWC's results is never undermined by experimenter error or subjective bias (Ireland and Mehl, 2014). ...
... It can be problematic to rely on self-report questionnaires as the "gold standard" scores for personality research because of potential response biases and self-knowledge constraints (Paulhus and Vazire, 2007). Linguistic analysis has become a technique for personality researchers to assess personality in a less biased and more reliable way (Yarkoni, 2010;Ireland and Mehl, 2014;Obschonka et al., 2017;Kern et al., 2019). A more "psychologically telling" and psychometrically parsimonious method of determining individual differences is language styles or how an individual says things, rather than differences in language content or what an individual says (Yarkoni, 2010;Ireland and Mehl, 2014). ...
... Linguistic analysis has become a technique for personality researchers to assess personality in a less biased and more reliable way (Yarkoni, 2010;Ireland and Mehl, 2014;Obschonka et al., 2017;Kern et al., 2019). A more "psychologically telling" and psychometrically parsimonious method of determining individual differences is language styles or how an individual says things, rather than differences in language content or what an individual says (Yarkoni, 2010;Ireland and Mehl, 2014). ...
Full-text available
The purpose of this study was twofold: first, to be among the first attempts to validate linguistic analysis as a method of creativity assessment and second, to differentiate between individuals in varying scientific and artistic creativity levels using personality language patterns. Creativity is most commonly assessed through methods such as questionnaires and specific tasks, the validity of which can be weakened by scorer or experimenter error, subjective and response biases, and self-knowledge constraints. Linguistic analysis may provide researchers with an automatic, objective method of assessing creativity, and free from human error and bias. The current study used 419 creativity text samples from a wide range of creative individuals mostly in science (and some in the arts and humanities) to investigate whether linguistic analysis can, in fact, distinguish between creativity levels and creativity domains using creativity dictionaries and personality dimension language patterns, from the linguistic inquiry and word count (LIWC) text analysis program. Creative individuals tended to use more words on the creativity keyword dictionaries as well as more introversion and openness to experience language pattern words than less creative individuals. Regarding creativity domains, eminent scientists used fewer introversion, and openness to experience language pattern words than eminent artists. Text analysis through LIWC was able to partially distinguish between the three creativity levels, in some cases, and the two creativity domains (science and art). These findings lend support to the use of linguistic analysis as a partially valid assessment of scientific and artistic creative achievement.
... Although research on the intended communication of specific emotions is lacking, there are studies demonstrating the existence of linguistic correlates of specific emotional states, as well as personality traits. Much of this research involves the use of the Linguistic and Inquiry Word Count program (LIWC; Pennebaker et al., 2015) to analyze language differences as a function of a person's emotional state or personality (for a review see Ireland & Mehl, 2014). For example, in terms of emotionality, anger (both state and trait), has been demonstrated to be related to an increased use of second-person pronouns (Simmons et al., 2005;Simmons et al., 2008;Weintraub, 1981). ...
Full-text available
In this research I explored the communication of emotions in digital contexts. Specifically, how well are people able to implicitly communicate discrete emotional states with words alone, and what are some of the correlates of this ability? In two experiments, senders created text messages designed to communicate 22 specific emotions (e.g., disgust), without naming the emotion, and receivers were asked to identify the emotion being conveyed. Senders and receivers indicated their degree of confidence that they successfully conveyed/recognized each emotion, and all participants completed measures of empathy and perspective taking. Emotion recognition (50%) far exceeded chance (5%) when a multiple-choice procedure was used (Experiment 2) and was substantial (20%) when participants were required to generate their own emotion labels (Experiment 1). When receivers failed to recognize the specific emotion, their errors were almost always of the same valence as the conveyed emotion (85% in Experiment 1 and 91% in Experiment 2), a rate that far exceeded chance (50%). Even though implicit emotional communication was relatively successful, the confidence rating of senders (but not receivers) was unrelated to communicative success. Emotion communication was more successful when the receiver was female and higher in empathy and perspective taking. In contrast, the gender and empathy level of senders was unrelated to communicative success. Overall, these results demonstrate that people can, to varying degrees, communicate emotions in digital contexts with words only.
... Chen & Bond, 2010), but also in specific word manipulations (e.g., Ireland & Mehl, 2014; Newman et al., 2008). ...
The study focuses on psychological-linguistic analysis of utterances provided by N = 2522 respondents aged 18-89 years in the period of March-May 2020, for the research of JUPSYCOR (Psychological Impacts of the Coronavirus Epidemic in the Czech Republic). The utterances relate to the interpretation of the state of emergency, the COVID-19 epidemic, and its subjectively perceived impacts. Simultaneously, the study examines the relationship between the analysed texts and the results of the SEHW (Scales of Emotional Habitual Subjective Well-being) questionnaire, which determines the valence of experienced emotions. The aim of the study is to analyse the lexical and morphological layers of the utterances, especially which specific words resonated in the individual questions, what their emotional load is, and which linguistic features of the texts may indicate the respondents' positive/negative emotional response. The quantitative analyses show that the most distinctive words are connected to negative emotions and most frequently relate to social environment, anxiety, and inhibition. Furthermore, the study finds a positive correlation between a fear scale and a higher occurrence of future tense and use of emotionally negatively loaded words, especially in women. Numerous differences among the individual age and gender cohorts were also found. The significance of the study lies predominantly in the combination of the linguistic and psychological levels of the analysis, in the utilization of two mutually complementary utterances, and in the presentation of new insights on how people use words when they face an unexpected and emotionally disturbing situation. Supplementary information: The online version contains supplementary material available at 10.1007/s12646-021-00613-y.
The study is a follow-up to three published English-language studies examining the relation between the use of linguistic categories and personality characteristics as outlined in the Big Five model, with the purpose of replicating them and extending them to the Czech language. The comparative study in Czech analyses both grammatical and semantic variables in six types of text (written and oral), produced by N = 200 participants. Six relations were confirmed; however, these appear only in certain types of text. The results show not only an essential role of the text register, but they also allow us to evaluate the universality of findings of studies in English in comparison with other, especially Slavic, languages.
Narratives about emotions may reflect the mental representation of emotions and influence how these representations are constructed. The way of speaking (including the dynamics of speech) depends on the psychological characteristics of the speaker. This study examined the relationships between the dynamics of utterances about anxiety and contentment and personality traits (measured with the SF IPIP NEO PI-R). The results showed positive relationships between utterance dynamics and neuroticism, and negative relationships with extraversion and openness.
Prior research has shown that gender-based leadership differences can influence organizational functions and behaviour, including strategy, resource allocation, and culture development. It therefore follows that gender-based leadership differences may impact communication. Building on research identifying gendered differences in language use, this study explored Twitter content differences among municipalities in the United States, differentiated by the level of gender-based representation at both the executive (mayoral) and legislative (council/commission) levels. The results uncovered commonly accepted gendered language differences based on mayoral gender, though the influence of council representation partially deviated from accepted differences. The study results form a foundation for larger investigations into gender-based impacts on social media communication and the linguistic styles necessary for maximum engagement.
Environmental organizations use a variety of text-based communication formats to mobilize supporters to take collective action on their behalf. Yet we know little about the characteristics of collective action communication used by environmental organizations, or about its mobilization potential. In this paper we investigate whether environmental organizations' website communication reflects variables focal to the Social Identity Model of Collective Action (SIMCA) that are known to motivate intentions to engage in collective action: identity, negative emotions, and efficacy. We use Linguistic Inquiry and Word Count (LIWC) text analysis software to investigate the extent to which linguistic cues associated with these focal variables are communicated in text obtained from 497 environmental advocacy organizations' websites. Findings demonstrate that environmental groups use identity and efficacy language more than the LIWC comparison text corpus, but show consistently low usage of both negative and positive emotion language. Our analyses also suggest that groups with the greatest financial resources use identity and efficacy language more frequently than groups with fewer financial resources. We suggest that future research investigating the effects of linguistic cues on message receivers is an important next step both for advancing the SIMCA framework and for helping environmental organizations mobilize supporters more effectively through their text-based communication formats.
Online depression communities offer people with depressive symptoms new opportunities to obtain health information and provide social support for each other in fighting depression. We sought to investigate whether use of online communities helps improve depression outcomes and to determine which types of usage behaviors have positive or negative effects on depression. We proposed that two dimensions of the sense of belonging (sense of identity and trust) and three dimensions of the sense of support (informational, emotional, and socializing) have significant effects on depression, and further considered gender differences and their effect on depression. We obtained a dataset consisting of 465,337 posts from 244 members of a popular online depression community to test all 10 proposed hypotheses. The results reveal that (i) the sense of shared identity, trust, informational support, and emotional support have positive effects on depression, while socializing support has negative effects on depression, and (ii) the sense of shared identity and trust have more positive effects on depression for female users than for male users, while socializing support has a more negative effect on depression for female users than for male users. The findings have important practical implications for designers and managers of online depression communities.
Conference Paper
Language in social media reveals a lot about people's personality and mood as they discuss the activities and relationships that constitute their everyday lives. Although social media are widely studied, researchers in computational linguistics have mostly focused on prediction tasks such as sentiment analysis and authorship attribution. In this paper, we show how social media can also be used to gain psychological insights. We demonstrate an exploration of language use as a function of age, gender, and personality from a dataset of Facebook posts from 75,000 people who have also taken personality tests, and we suggest how more sophisticated tools could be brought to bear on such data.
Handwritten autobiographies from 180 Catholic nuns, composed when participants were a mean age of 22 years, were scored for emotional content and related to survival during ages 75 to 95. A strong inverse association was found between positive emotional content in these writings and risk of mortality in late life (p < .001). As the quartile ranking of positive emotion in early life increased, there was a stepwise decrease in risk of mortality resulting in a 2.5-fold difference between the lowest and highest quartiles. Positive emotional content in early-life autobiographies was strongly associated with longevity 6 decades later. Underlying mechanisms of balanced emotional states are discussed.
The focus of many psychologists today is not so much on the traits and long-term characteristics of the people who participate in our research as on their reactions to events and situations. Psychologists are concerned with changing transitory psychological states, but have not yet developed fully effective techniques for their assessment. Content analysis of verbal communications can be helpful in assessing such states. Content analysis is based on the assumption that the language in which people choose to express themselves contains information about the nature of their psychological states. This assumption implies a representational or descriptive model of language, in contrast to the instrumental or functional model preferred, for example, by Mahl [1]. Content analysis can be applied only to verbal, not to nonverbal communications. However, although content analysis cannot be applied to nonverbal communications, inferences can be made about people’s states through objective and systematic identification of specified characteristics of their verbal communications [2, 3]. Content analysis of verbal communications is a way of listening to and interpreting people’s communicated accounts of events. When agreement between independent interpretations is achieved, the essential requirement of scientific endeavor (intersubjective agreement) is met [4].
The Handbook of Narcissism and Narcissistic Personality Disorder is the definitive resource for empirically sound information on narcissism for researchers, students, and clinicians at a time when this personality disorder has become a particularly relevant area of interest. This unique work deepens understanding of how narcissistic behavior influences behavior and impedes progress in the worlds of work, relationships, and politics.
This is a great book that aims to popularize the study of how function words such as pronouns, but also articles, prepositions and auxiliary verbs, reveal personality traits and roles within relationships. James Pennebaker is a social psychologist who has made major contributions in understanding how people who've gone through traumatic experiences may be helped by writing about them. He has also invested a good deal of time in developing techniques for counting function words and interpreting differences in their distributions. And while Pennebaker focuses on interpreting differences in distributions as reflections of different personality types or different roles in relations, he is interested in a wide range of other topics in which the interpretation of the word frequencies might play a role, including psychological health, emotions, honesty vs. deception, corporate and regional identity, literature, authorship attribution, authority in relationships, and political appeal.
Sex, politeness and language: Sex and language, What is politeness?, Why do women and men interact differently?, Analysing linguistic politeness, Social dimensions and linguistic analysis, Cross-cultural contrasts, What's in store.
Who speaks here? Interacting politely: Who's got the floor?, Who's asking questions?, Who's interrupting and why?, Back-channeling - a female speciality?, Agreeable and disagreeable responses, Conclusion.
Soft and low - hedges and boosters as politeness devices: Hedges and boosters, Hedges and boosters and 'women's language', Tag questions, Pragmatic particles - you know, I think, sort of, of course, The pragmatic tag eh, A prosodic hedge - the HRT, How typical are these politeness patterns?, Conclusion.
What a lovely tie! Compliments and positive politeness strategies: Paying compliments, Who pays most compliments?, How do women and men pay compliments?, What do women and men compliment each other about?, Can a compliment be a power play?, Compliment responses, What about other speech acts?, Conclusion.
Sorry! Apologies and negative politeness strategies: Why apologise?, Who apologises most?, How do women and men apologise?, What deserves an apology?, Offending the boss is a serious matter, Friends and forgiveness, How do people respond to an apology?, Why apologise? Some answers, Speech acts and politeness - what next?
Why politeness matters: Patterns of politeness, Being polite in class, Peer interaction in professional contexts, Strategies for change, Conclusion.