Abstract

Women are generally assumed to be more talkative than men. Data were analyzed from 396 participants who wore a voice recorder that sampled ambient sounds for several days. Participants' daily word use was extrapolated from the number of recorded words. Women and men both spoke about 16,000 words per day.
Are Women Really More Talkative Than Men?
Matthias R. Mehl,* Simine Vazire, Nairán Ramírez-Esparza, Richard B. Slatcher, James W. Pennebaker
Sex differences in conversational behavior have long been a topic of public and scientific interest (1, 2). The stereotype of female talkativeness is deeply engrained in Western folklore and often considered a scientific fact. In the first printing of her book, neuropsychiatrist Brizendine reported, "A woman uses about 20,000 words per day while a man uses about 7,000" (3). These numbers have since circulated throughout television, radio, and print media (e.g., CBS, CNN, National Public Radio, Newsweek, the New York Times, and the Washington Post). Indeed, the 20,000-versus-7000 word estimates appear to have achieved the status of a cultural myth in that comparable differences have been cited in the media for the past 15 years (4).
In reality, no study has systematically recorded the natural conversations of large groups of people for extended periods of time. Consequently, there have not been the necessary data for reliably estimating differences in daily word usage among women and men (5). Extrapolating from a reanalysis of tape-recorded daily conversations from 153 participants from the British National Corpus (6), Liberman recently estimated that women speak 8805 words and men 6073 words per day. However, he acknowledged that these estimates may be problematic because no information was available regarding when participants decided to turn off their manual tape recorders (4).
Over the past 8 years, we have developed a method for recording natural language using the electronically activated recorder (EAR) (7). The EAR is a digital voice recorder that unobtrusively tracks people's real-world moment-to-moment interactions. It operates by periodically recording snippets of ambient sounds, including conversations, while participants go about their daily lives. Because of the covert digital recording, it is impossible for participants to control or even to sense when the EAR is on or off. For the purpose of this study, the EAR can be used to track naturally spoken words and to estimate how many words women and men use over the course of a day.
In the default paradigm, participants wear the EAR for several days during their waking hours. The device is programmed to record for 30 s every 12.5 min. All captured words spoken by the participant are transcribed. The number of spoken words per day can then be estimated by extrapolating from a simple word count, the number of sampled sound files, and the recording time per sound file.
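The extrapolation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' published procedure (the details are in the supporting material, ref. 8): it assumes that the speaking rate observed in the sampled sound files holds across the whole waking day, and it borrows the figure of 17 waking hours from the results reported below. The function name, its parameters, and the example counts are hypothetical.

```python
def estimate_words_per_day(word_count, n_sound_files,
                           seconds_per_file=30.0, waking_hours=17.0):
    """Extrapolate a daily word total from sampled recordings.

    Assumption (not confirmed by the article): the words-per-second rate
    in the sampled files is representative of the whole waking day.
    """
    if n_sound_files <= 0 or seconds_per_file <= 0:
        raise ValueError("need a positive number of files and file length")
    recorded_seconds = n_sound_files * seconds_per_file
    words_per_second = word_count / recorded_seconds
    return words_per_second * waking_hours * 3600.0


# Hypothetical participant: 2,000 transcribed words in 250 sound files.
example_estimate = estimate_words_per_day(2000, 250)
print(round(example_estimate))  # prints 16320
```

With these made-up inputs, 250 files of 30 s give 7,500 s of recording, so the observed rate of about 0.27 words per second scales to roughly 16,000 words over 17 waking hours.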
We addressed the question about sex differences in daily word use with data from six samples based on 396 participants (210 women and 186 men) that were conducted between 1998 and 2004. Five of the samples were composed of university students in the United States, and the sixth, university students in Mexico. Table 1 provides background information on the samples along with estimates for the number of words that female and male participants spoke per day (8).
The data suggest that women spoke on average 16,215 (SD = 7301) words and men 15,669 (SD = 8633) words over an assumed period of, on average, 17 waking hours. Expressed in a common effect-size metric (Cohen's d = 0.07), this sex difference in daily word use (546 words) is equal to only 7% of the standardized variability among women and men. Further, the difference does not meet conventional thresholds for statistical significance (P = 0.248, one-sided test). Thus, the data fail to reveal a reliable sex difference in daily word use. Women and men both use on average about 16,000 words per day, with very large individual differences around this mean.
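As a rough check on the effect size, Cohen's d can be recomputed from the summary statistics reported above. The sketch below assumes the common pooled-standard-deviation form of d; the article does not state which variant the authors used, so this is one plausible reading rather than their exact calculation. The function name is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Reported values: women 16,215 (SD 7301, n = 210); men 15,669 (SD 8633, n = 186).
d = cohens_d(16215, 7301, 210, 15669, 8633, 186)
print(f"difference = {16215 - 15669} words, d = {d:.2f}")
# prints: difference = 546 words, d = 0.07
```

Under this assumption the pooled SD is close to 8,000 words, so a gap of 546 words amounts to about 7% of one standard deviation, in line with the reported d = 0.07.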
A potential limitation of our analysis is that all participants were university students. The resulting homogeneity in the samples with regard to sociodemographic characteristics may have affected our estimates of daily word usage. However, none of the samples provided support for the idea that women have substantially larger lexical budgets than men. Further, to the extent that sex differences in daily word use are assumed to be biologically based, evolved adaptations (3), they should be detectable among university students as much as in more diverse samples. We therefore conclude, on the basis of available empirical evidence, that the widespread and highly publicized stereotype about female talkativeness is unfounded.
References and Notes
1. R. Lakoff, Language and Woman's Place (Harper, New York, 1975).
2. L. Litosseliti, Gender and Language: Theory and Practice (Arnold, London, 2006).
3. L. Brizendine, The Female Brain (Morgan Road, New York, 2006).
4. M. Liberman, Sex-Linked Lexical Budgets, http://itre.cis. (first accessed 12 December 2006).
5. D. James, J. Drakich, in Gender and Conversational Interaction, D. Tannen, Ed. (Oxford Univ. Press, New York, 1993), pp. 281-313.
6. P. Rayson, G. Leech, M. Hodges, Int. J. Corpus Linguist. 2, 133 (1997).
7. M. R. Mehl, J. W. Pennebaker, M. Crow, J. Dabbs, J. Price, Behav. Res. Methods Instrum. Comput. 33, 517 (2001).
8. Details on methods and analysis are available on Science
9. This research was supported by a grant from the National Institute of Mental Health (MH 52391). We thank V. Dominguez, J. Greenberg, S. Holleran, C. Mehl, M. Peterson, and T. Schmader for their valuable feedback.
Supporting Online Material
Materials and Methods
Fig. S1
Table S1
16 January 2007; accepted 3 April 2007
Department of Psychology, University of Arizona, Tucson, AZ 85721, USA.
Department of Psychology, Washington University, St. Louis, MO 63130, USA.
Department of Psychology, University of Texas at Austin, Austin, TX 78712, USA.
*To whom correspondence should be addressed. E-mail:
Table 1. Estimated number of words spoken per day for female and male study participants across six samples. N = 396. Year refers to the year when the data collection started; duration refers to the approximate number of days participants wore the EAR; the weighted average weighs the respective sample group mean by the sample size of the group.
Sample            Year  Location  Duration  Age range  Sample size (N)   Estimated average number (SD)
                                            (years)                      of words spoken per day
                                                       Women   Men       Women            Men
1                 2004  USA       7 days    18-29      56      56        18,443 (7460)    16,576 (7871)
2                 2003  USA       4 days    17-23      42      37        14,297 (6441)    14,060 (9065)
3                 2003  Mexico    4 days    17-25      31      20        14,704 (6215)    15,022 (7864)
4                 2001  USA       2 days    17-22      47      49        16,177 (7520)    16,569 (9108)
5                 2001  USA       10 days   18-26      7       4         15,761 (8985)    24,051 (10,211)
6                 1998  USA       4 days    17-23      27      20        16,496 (7914)    12,867 (8343)
Weighted average                                                         16,215 (7301)    15,669 (8633)
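The weighted averages in the last row of Table 1 can be reproduced from the per-sample means and group sizes. The sketch below follows the caption's description (each group mean weighted by that group's sample size); the variable and function names are hypothetical, and the weighted SDs are not recomputed because the article does not say how they were combined.

```python
# Per-sample values from Table 1: (n_women, mean_women, n_men, mean_men)
samples = [
    (56, 18443, 56, 16576),
    (42, 14297, 37, 14060),
    (31, 14704, 20, 15022),
    (47, 16177, 49, 16569),
    (7, 15761, 4, 24051),
    (27, 16496, 20, 12867),
]


def weighted_mean(pairs):
    """Mean of group means, each weighted by its group size."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * mean for n, mean in pairs) / total_n


women = weighted_mean([(nw, mw) for nw, mw, _, _ in samples])
men = weighted_mean([(nm, mm) for _, _, nm, mm in samples])
print(f"women: {women:.1f}, men: {men:.1f}")
# prints: women: 16215.0, men: 15668.5
```

Both values agree with the reported 16,215 and 15,669 to within rounding; the men's figure of 15,668.5 matches the table when rounded half up.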
6 JULY 2007 VOL 317 SCIENCE www.sciencemag.org 82