Filler words (I mean, you know, like, uh, um) are commonly used in spoken conversation. The authors analyzed these five filler words from transcripts recorded by a device called the Electronically Activated Recorder (EAR), which sampled participants' language use in daily conversations over several days. By examining filler words from 263 transcriptions of natural language from five separate studies, the current research sought to clarify the psychometric properties of filler words. An exploratory factor analysis extracted two factors from the five filler words: filled pauses (uh, um) and discourse markers (I mean, you know, like). Overall, filled pauses were used at comparable rates across genders and ages. Discourse markers, however, were more common among women, younger participants, and more conscientious people. These findings suggest that filler word use can be considered a potential social and personality marker.
Um . . . Who Like Says You
Know: Filler Word Use as a
Function of Age, Gender, and Personality
Charlyn M. Laserna, Yi-Tai Seih,
and James W. Pennebaker
Corresponding Author:
Yi-Tai Seih, Department of Psychology, University of Texas, 1 University Place, 108 East Dean Keeton,
Austin, TX 78712, USA.
The way we use language in natural spoken conversation is revealing. For instance,
certain aspects of language such as dialects and colloquialisms can be used to determine
where a person was raised. How someone speaks can also indicate whether the
listener is a friend or a stranger. Language may even reveal characteristics such as
gender, age, and personality. One widely used but often overlooked feature of language
is filler words, which are speech irregularities used in spoken conversation and
commonly regarded as superfluous language spoken by careless speakers (Strassel,
2004). Surprisingly little is known about whether filler words have psychological
implications with regard to communication. The current study examines filled pauses
and discourse markers, two primary categories of filler words. Unlike traditional lin-
guistic research, which investigates how filler words are used, we examined individual
differences to determine who is using these filler words when they converse. To set the
stage for our research, we review filled pauses and discourse markers in language use.
Filled pauses are short utterances commonly used in spontaneous speech (Brennan
& Williams, 1995; Swerts, 1998), uh and um being two of the most frequently used
filled pauses within the English language (Strassel, 2004). In verbal communication,
filled pauses are hypothesized to either act as an unconscious sign of speech disflu-
ency or serve as a signal sent by speakers to convey a certain message. The content of
this message varies and may inform listeners that the speaker needs a pause to collect
his or her thoughts (Fox Tree, 2007) or block the listener from taking the speaker’s
turn away (Maclay & Osgood, 1959). The use of filled pauses tends to increase when
a speaker is faced with challenging choices (Christenfeld, 1994), yet at the same time,
listeners view speakers as less anxious when the speakers use filled pauses (Christenfeld,
1995). Listeners also tend to view filled pauses as an indication that speakers are
unsure about what is being said, suggesting that filled pauses may be a more deliberate
signal sent from the speaker (Brennan & Williams, 1995; Fox Tree, 2007). In either
theory, filled pauses appear to be associated with the processing of complex thoughts.
Since filled pauses have a linguistic effect on spontaneous speech, is this effect
influenced by any variables? In a study performed by Bortfeld, Leon, Bloom, Schober,
and Brennan (2001) in which transcriptions of conversation pairs were analyzed, higher
rates of uh and um, along with other disfluencies, were associated
with being older, discussing unfamiliar domains, and taking on a directive role during
conversation. Another study by Tottie (2011) analyzed the frequency of uh and um in
two subcorpora from the British National Corpus that consists of transcribed tele-
phone, face-to-face, and interview conversations. The study discovered that older
people, males, and those with higher levels of education used more filled pauses in
speech than younger people, females, and individuals with lower levels of education.
In a sense, filled pauses may act as markers that identify speakers’ gender, age, and
socioeconomic status.
Unlike filled pauses, discourse markers are short phrases that do not contain any
grammatical information yet are prevalent in natural speech (Fox Tree & Schrock,
2002; Fuller, 2003; Matei, 2011; Strassel, 2004). Although they do not serve a grammatical
purpose, laypeople and researchers alike perceive discourse markers as
purposeful signals to a listener rather than as mere signs of disfluency (Fox Tree,
2007). They are generally proposed to act as transitions between different sections of
conversation (Clark, 1996), but discourse marker use seems to heavily depend on the
specific discourse marker. Often, the actual basic meanings of the words that consti-
tute a discourse marker determine its function. For example, the phrase I mean serves
as an indication that a speaker is planning to modify what is said, and you know is used
when the speaker is asking a listener to make inferences about the conversation (e.g.,
Fox Tree, 2007). Other research suggests that another purpose of you know is to con-
firm the understanding of a listener (Erman, 2001). The purpose of the discourse
marker like is more ambiguous, but some studies suggest that speakers use it as a
hedge when they do not want to fully commit to what they say (Fuller, 2003; Sharifian
& Malcolm, 2003). However, Liu and Fox Tree (2012) have countered the suggestion
that like acts as a hedge by showing that this discourse marker exhibits different pat-
terns from other hedges and likely has its own unique function.
Filled pauses and discourse markers are considered to be two categories of filler
words. If the use of filled pauses is affected by certain demographic variables, is the
use of discourse markers also affected by similar variables? One study examined the
frequencies of the discourse markers like and you know in the MICASE corpus (Schleef,
2005). The data set covered 68 speakers (18 instructors and 50 students) and consisted of
8 hours of lectures and 10 hours of seminars from an equal number of male and female
instructors. The results showed that female students used the discourse marker like
more than male students. In addition, students were more likely to use like than profes-
sors. Since professors are generally older than students, this finding concerning con-
versational roles may suggest that age affects discourse marker use.
Although previous research has described the underlying meanings and functions
of the two types of filler words, some limitations still exist in the current literature. For
instance, past research examined discourse markers and filled pauses within one study
and discovered differences between these two categories (e.g., Fox Tree, 2006; Fox
Tree, Mayer, & Betts, 2011), but little research has been conducted on exploring the
personalities of the people who tend to use filler words. Although Mairesse and Walker
(2008) have shown that it is possible to estimate personality by examining certain
language parameters such as filled pauses (e.g., I mean, err, you know), their personal-
ity results were generated by human judges instead of original speakers and did not
show any direct correlation with filler words.
Personality traits can be assessed by self-report measures or judges’ ratings, but
little research examines the correlation between self-report personality traits and filler
words. It may be worthwhile to determine if self-reported personality is comparable to
assessed personality deduced from judges’ ratings on the use of filler words. In addi-
tion, if filler words are found to be reliable personality markers, further research using
self-report personality measures may be able to use filler word frequency to quickly
approximate personality traits in participants. Overall, the purpose of the current
research was to investigate the psychometric properties of filler words and revisit the
relationships between filler words, demographic variables, and personality traits.
The current research aimed to investigate how the frequency of filled pauses and
discourse markers used in the English language varies with two basic demographic
variables (gender and age) and personality traits. The present study focused on three
common discourse markers in the English language (I mean, you know, and like) and
two filled pauses (uh and um). The psychometric properties of these five filler words
and two categories were examined. Because most past research on filler words has
been based on experimental data (Tottie, 2011), the present study focused on transcriptions
that were transcribed from daily conversations recorded by Electronically
Activated Recorders (EARs). The EAR is an electronic device designed for sampling
natural spoken conversation during daily activities (Mehl, Pennebaker, Crow, Dabbs,
& Price, 2001). By using the EAR, the present study could examine filler words within
natural, extended interactions over the course of several days.
Table 1. The Descriptions of Transcriptions From Each Study for Analysis.
Study              Number of participants   Female (%)   Age (SD)      Word count (SD)   Days   Description
Mehl and …         13                       62           21.4 (3.50)   6,750 (2,784)     10     Analysis of participants’ reactions to September 11, …
Mehl and …         50                       54           19.0 (1.31)   1,007 (590)       4      A study on patterns in the natural language of college …
… and Beevers      27                       63           32.5 (13.8)   4,066 (2,950)     4      An examination of linguistic indicators of negative social functioning with depressive …
Fellows (2009)     76                       51           35.2 (5.88)   4,786 (4,469)     1      A study about how preschool-aged children and parents use emotion language
… Gosling, and …   97                       47           18.7 (0.91)   995 (526)         2      A study that examined personality traits by using natural language
Note. The average word count for each participant was 2,692, and the total word count from the participants was …
The transcriptions of 263 participants (137 females) were included in the current study.
The participants of the transcribed conversations ranged in age from 17 to 69 years
(M = 25.1, SD = 9.38). The 263 participants were from five studies whose detailed
information is shown in Table 1.
EAR Corpus and Coding
This study used a corpus of transcriptions obtained through the EAR, which is a device
programmed to automatically take audio recordings after set intervals of time (Mehl et
al., 2001). The EAR was worn by participants for a period of 2 to 3 days while they
went about their daily lives, giving the EAR the ability to collect truly spontaneous
conversation. Any clearly audible conversations between participants and the listener
were then transcribed. Transcribers were instructed not to omit
filled pauses and discourse markers.
The present study used the computerized text analysis program Linguistic Inquiry and
Word Count (Pennebaker, Booth, & Francis, 2007) to calculate the rates of filled
pauses and discourse markers used within each transcription as well as the total num-
ber of words spoken by an individual during a conversation. These calculations were
then used to determine the proportion of conversation devoted to filled pauses and
discourse markers. The proportions for each age and gender were then statistically
analyzed and compared.
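The rate calculation described above can be illustrated with a minimal sketch. It is not the Linguistic Inquiry and Word Count program itself: the tokenizer, the `FILLERS` list, and the function names are hypothetical, and the sketch assumes that a rate is the percentage of all spoken words accounted for by a filler. It also counts every occurrence of like, including verb uses, which a real coding scheme may treat differently.

```python
import re

# Hypothetical lexicon: the five filler words examined in the study.
FILLERS = ["i mean", "you know", "like", "uh", "um"]
DISCOURSE_MARKERS = ["i mean", "you know", "like"]
FILLED_PAUSES = ["uh", "um"]


def filler_rates(transcript):
    """Return each filler's rate as a percentage of total words (assumed definition)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    total = len(words)
    if total == 0:
        return {filler: 0.0 for filler in FILLERS}
    text = " ".join(words)
    rates = {}
    for filler in FILLERS:
        # Word boundaries keep "um" from matching inside "umbrella".
        hits = len(re.findall(rf"\b{re.escape(filler)}\b", text))
        rates[filler] = 100.0 * hits / total
    return rates


def category_rates(rates):
    """Sum individual rates into the two filler word categories."""
    return {
        "discourse markers": sum(rates[f] for f in DISCOURSE_MARKERS),
        "filled pauses": sum(rates[f] for f in FILLED_PAUSES),
    }


sample = "Um I mean it was like you know uh like really good"
print(category_rates(filler_rates(sample)))
```

Because phrases are matched on the joined word stream, a phrase such as you know is also counted when it spans a sentence boundary; a production pipeline would need to handle that case.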
Three of the transcription sets were from studies where participants’ personalities
were determined using the Big Five Inventory (Fellows, 2009; Mehl & Pennebaker,
2003a; Mehl & Pennebaker, 2003b). One study used the Ten-Item Personality Scale on
participants (Baddeley et al., 2013), and one study used the NEO Personality Inventory
on participants (Mehl et al., 2006). Since these three different versions of personality
scales were highly related to each other (Gosling, Rentfrow, & Swann, 2003), all per-
sonality scores were standardized for the current study and examined according to the
Big Five personality traits. Eleven participants did not complete any personality mea-
sure, resulting in a total of 252 participants included in personality analysis.
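One way to place scores from different inventories on a common scale is to convert them to z scores. The sketch below assumes that standardization is done separately within each study, which the text does not specify; the study labels and scores are invented for illustration.

```python
from statistics import mean, stdev


def standardize_within_study(scores_by_study):
    """Convert raw trait scores to z scores separately within each study."""
    standardized = {}
    for study, scores in scores_by_study.items():
        m, s = mean(scores), stdev(scores)
        standardized[study] = [(x - m) / s for x in scores]
    return standardized


# Hypothetical conscientiousness scores on two differently scaled inventories.
raw = {
    "study_a": [3.0, 4.0, 5.0],
    "study_b": [120.0, 150.0, 180.0],
}
print(standardize_within_study(raw))
```

After this step, a participant one standard deviation above the mean of their own study receives the same score regardless of which inventory was administered.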
The current study sought to examine three aspects of filler words. First, the psycho-
metric properties of the five filler words were examined to clarify the associations
between filler words. Second, filled pauses and discourse markers were correlated
with age and gender. Third, the two types of filler words were examined according to
personality traits.
Each of the five filler words was analyzed by its base rates, which are presented in
Table 2. A one-way within-subjects analysis of variance showed that the five filler words
were used at significantly different rates, F(4, 1,048) = 141.8, p < .001. The least significant
difference post hoc comparison indicated that participants used like more than
the other four filler words included in the study (ps < .001). Correlation analysis was
performed to determine any associations between filler words. As shown in Table 2, uh
was not related to the discourse markers I mean, you know, and like, implying that
different mechanisms may underlie different filler words. The
correlations between gender, age, and each filler word are also reported in Table 2 as
additional information.
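The one-way within-subjects analysis of variance reported above can be sketched as follows. This is a textbook sum-of-squares decomposition written for illustration, not the authors' software; the function name and the toy data are hypothetical. With 263 participants and 5 filler words, the degrees of freedom are 5 − 1 = 4 and (5 − 1)(263 − 1) = 1,048, matching the reported F(4, 1,048).

```python
def rm_anova(data):
    """One-way within-subjects ANOVA.

    data: one row per participant, one column per condition
    (here, one rate per filler word).
    Returns (F, df_conditions, df_error).
    """
    n = len(data)      # participants
    k = len(data[0])   # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error is what remains after removing condition and participant effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Toy data: 3 participants x 3 conditions (invented values).
toy = [[1, 2, 4], [2, 3, 6], [3, 5, 7]]
print(rm_anova(toy))
```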
To understand the structure of the filler words, we employed an exploratory factor
analysis with the five filler words. A principal component method with a varimax rota-
tion was used. The Kaiser-Meyer-Olkin measure of sampling adequacy was signifi-
cant (Kaiser-Meyer-Olkin = .65, p < .001), indicating that these five filler words were
factorable. The scree plot suggested two factors, and the two factors together accounted
for 60.8% of the total variance. The factor loading matrix is shown in Table 2. The first
extracted factor included the discourse markers I mean, you know, and like, whereas
the second extracted factor included the filled pauses uh and um, supporting past theories
of filled pauses.
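A principal component extraction followed by a varimax rotation can be sketched with NumPy, which is assumed to be available. The code below uses a common iterative varimax algorithm and simulated data with a two-factor structure; it is not the authors' analysis, and the function names, the random seed, and the simulated rates are all hypothetical.

```python
import numpy as np


def pca_loadings(data, n_components):
    """Unrotated principal component loadings from the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order].sum() / eigvals.sum()
    return loadings, explained


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation


# Simulated rates: three variables driven by one latent factor, two by another.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 300))
noise = rng.normal(scale=0.5, size=(300, 5))
data = np.column_stack([f1, f1, f1, f2, f2]) + noise

loadings, explained = pca_loadings(data, n_components=2)
print(np.round(varimax(loadings), 2), round(float(explained), 3))
```

Because the rotation is orthogonal, it redistributes variance between the two components without changing how much of each variable they jointly explain.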
Thus, the first factor referred to discourse markers, and the second factor referred
to filled pauses. According to the findings of our factor analysis, the rates of I mean,
you know, and like were summed to be the rate of discourse markers (M = 1.43; SD =
1.40), and the rates of uh and um were summed to be the rate of filled pauses (M =
0.64; SD = 0.63). Importantly, the rates of filled pauses and discourse markers were
positively correlated with each other (r = .26, p < .001), strengthening the idea that
both filled pauses and discourse markers belong within the same category. These two
categories were used in the following analyses.
The rate of discourse markers was positively associated with gender (male = 1,
female = 2; r = .20, p < .01) but negatively associated with age (r = −.50, p < .001),
suggesting that female and younger participants were more likely to use discourse markers.
In contrast, the rate of filled pauses was not associated with gender (r = −.04,
p = .50) but was negatively associated with age (r = −.12, p = .05).
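The significance levels attached to these correlations follow from the sample size. As a rough check, a correlation can be converted to a t statistic with N − 2 degrees of freedom. The sketch below uses the rounded correlations from the text with N = 263 and approximates the p value with a normal distribution, which is an assumption made to stay within the Python standard library; an exact test would use the t distribution.

```python
from math import sqrt
from statistics import NormalDist


def r_to_t(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def approx_two_tailed_p(t):
    """Two-tailed p value from a normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Rounded correlations reported in the text, with N = 263 participants.
for label, r in [
    ("discourse markers x gender", 0.20),
    ("discourse markers x age", -0.50),
    ("filled pauses x age", -0.12),
]:
    t = r_to_t(r, 263)
    print(f"{label}: t(261) = {t:.2f}, approximate p = {approx_two_tailed_p(t):.3f}")
```

Under this approximation, the rounded value r = −.12 gives a two-tailed p close to .05, in line with the borderline result reported for filled pauses and age.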
With these correlational findings, we became curious about the developmental
trend of these two types of filler words. We divided participants into four categories:
early college (17-19), late college (20-22), early adulthood (23-34), and adulthood (35
and older). Two 2 (gender) × 4 (age categories) between-subjects analyses of variance
were conducted separately on the two categories of filler words. The mean rates are
presented in Figure 1. With regard to discourse markers, there was a significant inter-
action effect between gender and age, F(3, 255) = 4.08, p < .01. The least significant
difference post hoc comparisons indicated that females used more discourse markers
than males in early and late college (ps < .001). The main effect of gender was significant,
F(1, 255) = 8.71, p < .01, and so was the main effect of age, F(3, 255) = 45.2,
p < .001. In contrast, the interaction effect and the main effect of gender on filled
pause rates were not significant. Only the main effect of age on filled pause rates was
significant, F(3, 255) = 2.67, p = .05. Overall, the use of discourse markers and filled
pauses displayed a developmental trend.
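The grouping behind the 2 × 4 design can be sketched as follows. The age boundaries come from the text; the function names, the gender labels, and the records are hypothetical, and ages are assumed to be whole years. With 263 participants spread over 2 × 4 = 8 cells, the error degrees of freedom are 263 − 8 = 255, as in the reported F tests.

```python
from collections import defaultdict


def age_group(age):
    """Map an age in whole years to the four categories used in the analysis."""
    if age <= 19:
        return "early college"
    if age <= 22:
        return "late college"
    if age <= 34:
        return "early adulthood"
    return "adulthood"


def cell_means(records):
    """Mean rate per (gender, age group) cell.

    records: iterable of (gender, age, rate) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for gender, age, rate in records:
        cell = sums[(gender, age_group(age))]
        cell[0] += rate
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Invented records: (gender, age, discourse marker rate).
records = [("female", 18, 2.0), ("female", 19, 4.0), ("male", 40, 1.0)]
print(cell_means(records))
```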
Last, to examine the relationship between filler word use and personality, we correlated
personality scores with the rates of discourse markers and filled pauses while
controlling for gender and age (df = 248). Only conscientiousness was found to be
related to discourse markers (r = .14, p = .03), which could, in theory, be attributed to
a Type I error given the number of correlations tested. None of the Big Five personality
traits were related to the use of filled pauses.
Table 2. Basic Psychometric Properties, Correlations on Gender and Age, and Component
Loadings for the Five Filler Words.
               Mean (SD)     (1)       (2)      (3)      (4)      (5)   Gender    Age       Factor 1   Factor 2
(1) I mean     0.12 (0.18)   —                                          −.05      −.24***   .67        .18
(2) you know   0.18 (0.28)   −.26***   —                                −.16**    −.11      .74        .10
(3) like       1.13 (1.17)   −.32***   .48***   —                       −.19***   −.54***   .80        −.13
(4) uh         0.35 (0.42)   −.03      .05      .05      —              −.15*     −.01      −.13       .89
(5) um         0.29 (0.40)   −.19**    .22***   .36***   .21***   —     −.09      −.21***   .45        .60
Note. Gender: Male = 1; Female = 2. Factor 1 is discourse markers, whereas Factor 2 is filled pauses.
***p < .001. **p < .01. *p < .05.
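A correlation that controls for covariates, as in the personality analysis above, can be computed with the recursive partial correlation formula. The sketch below is illustrative only: the function names and the data are hypothetical, and the formula assumes no variable is perfectly correlated with a covariate. The degrees of freedom are N − 2 minus the number of covariates, so 252 − 2 − 2 = 248 for the 252 participants with personality data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, covariates):
    """Correlation of x and y after removing linear effects of the covariates."""
    if not covariates:
        return pearson_r(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_r(x, y, rest)
    r_xz = partial_r(x, z, rest)
    r_yz = partial_r(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_df(n, n_covariates):
    """Degrees of freedom for testing a partial correlation."""
    return n - 2 - n_covariates


# Invented data: trait score, filler rate, and gender coded 1 or 2.
trait = [1, 2, 3, 4]
rate = [1, 3, 2, 4]
gender = [1, 1, 2, 2]
print(partial_r(trait, rate, [gender]), partial_df(252, 2))
```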
Figure 1. Mean rates of discourse markers and filled pauses by gender and age per person.
Note. The sample size was 123 for early college, 36 for late college, 59 for early adulthood, and 45 for
adulthood. The discourse marker category included I mean, you know, and like. The filled pause category
included uh and um.
Past research has mainly discussed filled pauses and discourse markers separately and
neglected to examine the relation between these two types of filler words. The current
research sought to look at a bigger picture and analyze filled pauses and discourse
markers in relation to one another. There were several interesting findings regarding
the psychometric properties of the filler words in this study. First, two factors were
extracted from our factor analysis and were found to be related to each other. This
finding suggests that the use of filled pauses and discourse markers is not identical
despite both categories having been discussed together as filler words (e.g., Strassel,
2004). In addition to the factor analysis, the use of filled pauses was found to be asso-
ciated with age but not with gender, whereas the use of discourse markers was found
to be associated with both gender and age. This suggests that people who are young,
female, or both are more likely to use discourse markers. This result
supports previous findings regarding the use of the discourse marker like (Schleef,
2005). Finally, the use of discourse markers was associated with conscientiousness,
indicating that discourse markers can potentially serve as personality markers.
The present research has practical significance because it has shown that filler
words can serve as markers for age and gender. Our results extended previous research
by demonstrating a developmental trend that indicates that the gender effect on the use
of discourse markers only emerges during early and late college. As people become
older, the gender effect disappears. This trend may be indicative of a normative life
transition into adult roles, such as when one graduates from college and enters the job
market. A career role change may be one factor that leads people to change
their use of filler words.
What type of people are more likely to use discourse markers or filled pauses? In
our correlational results, conscientious people used more discourse markers. One possible
explanation for this association is that conscientious people are generally more
thoughtful and aware of themselves and their surroundings. When having conversations
with listeners, conscientious people use discourse markers, such as I mean and
you know, to signal their desire to share or rephrase opinions for recipients. Thus,
discourse marker use may serve as a measure of the degree to which
people have thoughts to express. As for filled pauses, their use has been considered to
be a reflection of anxiety (e.g., Christenfeld & Creager, 1996; Scherer & Scherer,
1981). However, our measure of neuroticism was not related to the use of filled pauses
in this research. The claim that speaker anxiety is related to the use of filled pauses
should be more carefully examined in future research.
Previous research has documented filler words as markers of people’s psychologi-
cal states (Erman, 2001; Fuller, 2003). In the current study, we not only clarified the
psychometric structure of the two types of filler words but also extended the work to
personality traits. When people first meet, they usually estimate strangers’
personalities based on what is said and how it is said. From a methodological
standpoint, the use of discourse markers can provide a quick behavioral
measure of personality traits. More important, we used extended conversations with
speakers to study how filler words function in daily lives. This strategy provides better
ecological validity to investigate filler word use. With an increased understanding of
why and how filler words are used in verbal communication, we anticipate that people
may one day be able to use the active interpretation of filler words to improve the qual-
ity of their communication with others.
Charlyn M. Laserna is a medical student at the University of Texas at Houston Medical
School. She received her undergraduate degree at the University of Texas at Austin in 2012. Her
research interests surround language and verbal social cues.
Journal of Language and Social Psychology 33(3)
Yi-Tai Seih currently works for the Department of Psychology at the University of Texas at
Austin as a research associate. He received his PhD at the University of Texas at Austin in 2013.
His research focuses on the interplay between language and interpersonal relationships. His
most recent research focuses on how recipients perceive complaint language.
James W. Pennebaker is a professor and chair for the Department of Psychology at the
University of Texas at Austin. His most recent research focuses on the nature of language and
social dynamics in the real world. The words people use serve as powerful reflections of their
personality and social worlds.
