MANKIND QUARTERLY 2017 58:1 180-194
Sex Distribution, Life Expectancy and Educational
Attainment of Comedians
Emil O. W. Kirkegaard*
Ulster Institute for Social Research, United Kingdom
*Address for correspondence: emil@emilkirkegaard.dk
A dataset of 1408 comedians was created by scraping the
English Wikipedia. Each comedian’s sex was estimated based on
gendered English pronouns and gendered categories. Overall, the
distribution was 79% male. When examined over time, it was evident
that the distribution was changing to become less male dominated.
Among those born 1880-1910, 90% were male compared to 61%
among those born 1980-1990. 17% of comedians were Jewish,
nearly evenly distributed by sex. Comedians born around 1915 were
found to live 7.3-12.3 years longer than matched members of the
general population. Comedians were found to be highly educated,
especially women. Comedians who were members of Scandinavian
comedian clubs (n=235) were examined as an independent
replication. Of these, 83% were male.
Key Words: Comedians, Wikipedia, Sex, Gender, Humor,
Education, Scraping, Life span
Although humor plays a central role in human life (Carroll, 2014; Hurley,
Dennett & Adams, 2011), especially in dating (Campbell, Martin & Ward, 2008;
Cann, Davis & Zapata, 2011), there have been surprisingly few quantitative
studies of comedians. There have, however, been a fair number of non-quantitative
descriptions (e.g. Fisher & Fisher, 1981; Mizejewski, 2014). A few studies have
examined cognitive ability in comedians as well as cognitive ability’s relationship
with humor ability in college students (Greengross, Martin & Miller, 2012; Janus,
1975; Janus, Bess & Janus, 1978). Comedians were found to average very high
IQ scores, of 126-138 (footnote 1).
KIRKEGAARD, E.O.W. EDUCATIONAL ATTAINMENT OF COMEDIANS
It has been noted that comedians tend to be male, and some have proposed
evolutionary reasons for this state of affairs (Hitchens, 2007, 2008; Stanley,
2008). However, no one seems to have systematically compiled data about
comedians, so the sex distribution is not known at present. The purpose of this
study was to examine the sex distribution of comedians in a large, worldwide
sample. Furthermore, there were two secondary goals: a) to briefly examine the
longevity of comedians, and b) to briefly examine whether comedians have high
educational attainment which would corroborate the unusually high ability scores
reported.
Analyses
General approach
All analyses were done in R; see the supplementary materials for details.
Data were either scraped from Wikipedia using the rvest package (Wickham &
RStudio, 2016), or downloaded through Wikimedia’s API by use of the WikipediR
package (Keyes & Tilbert, 2016). The initial list of comedians was from the List of
comedians article (https://en.wikipedia.org/wiki/List_of_comedians). This article
contains only a few comedy groups and writers, which were excluded. This
resulted in a list of 1408 links to pages. Moving averages were fitted with local
regression (loess) (James et al., 2013). The span parameter was chosen by
cross-validation using the bisoreg package (Curtis, 2015).
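The smoothing step can be illustrated with a minimal sketch. It is written in Python rather than R, is not taken from the supplementary materials, and does not reproduce the bisoreg routine: it assumes a local linear fit with tricube weights and leave-one-out cross-validation over a handful of candidate spans. The function names and the data are hypothetical.

```python
import math

def loess_predict(x_train, y_train, x0, span):
    """Local linear fit at x0 with tricube weights over the nearest
    ceil(span * n) training points."""
    n = len(x_train)
    k = min(n, max(2, math.ceil(span * n)))
    dists = [abs(x - x0) for x in x_train]
    h = sorted(dists)[k - 1]  # distance to the k-th nearest neighbour
    if h == 0:
        weights = [1.0 if d == 0 else 0.0 for d in dists]
    else:
        weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
    sw = sum(weights)
    if sw == 0:  # every neighbour sits exactly at distance h: use equal weights
        weights = [1.0 if d <= h else 0.0 for d in dists]
        sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, x_train)) / sw
    my = sum(w * y for w, y in zip(weights, y_train)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, x_train))
    sxy = sum(w * (x - mx) * (y - my)
              for w, x, y in zip(weights, x_train, y_train))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x0 - mx)

def loo_cv_error(x, y, span):
    """Mean squared leave-one-out prediction error for a given span."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        total += (y[i] - loess_predict(x_rest, y_rest, x[i], span)) ** 2
    return total / len(x)

def choose_span(x, y, candidates):
    """Return the candidate span with the lowest cross-validated error."""
    return min(candidates, key=lambda s: loo_cv_error(x, y, s))

# Hypothetical data: year of birth and male % (not the paper's values).
years = [float(v) for v in range(1900, 1990, 5)]
male_pct = [92, 90, 91, 89, 88, 90, 87, 85, 86,
            84, 80, 83, 78, 75, 77, 70, 66, 62]
candidate_spans = [0.3, 0.5, 0.75, 1.0]
best_span = choose_span(years, male_pct, candidate_spans)
```

A smaller span follows local wiggles more closely, while a larger one approaches a single straight line; cross-validation picks the span whose held-out predictions are most accurate.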
There are two general ways to extract information out of the Wikipedia
articles. One can look for keywords in the free text and identify patterns that are
useful for inferring information of interest. This approach can be quite
cumbersome and error-prone due to the messiness of natural language. The
second approach is to rely on Wikipedia’s categories. These are standardized, so
classifying persons is easy. However, less developed pages may miss some
categories and would thus lead to false negatives or missing data depending on
interpretation. Furthermore, there are only a set number of categories, so if no
category covers the desired information, one cannot use this approach.
1 The sample sizes were n=31 (28 male) comedians in Greengross et al., and
n=14 female comedians, n=48 male comedians in the Janus studies.
Greengross et al. did not report a mean IQ, but their mean verbal IQ scores were
1.34 d higher than the college students in the same study. This was a New
Mexico sample of 400 psychology undergraduates, so the mean ability is
perhaps around 110. The standard deviation is a bit suppressed due to ability
selection, so the 1.34 gap is perhaps something like 1.10 in the general
population. This gives an estimate of about 126. No adjustments were made
for Flynn effect, but if done, the estimates are slightly reduced.
Nationality and Jewishness
No attempt was made to estimate detailed ethnicities or nationalities, but a
few were estimated based on categories. These were American (69%), British
(18%), Canadian (6%), Australian (3%), New Zealander (0%, 1 person), and
Scandinavian (0%, 2 persons). Together these summed to 97%
of cases. This is somewhat misleading because 2.98% were assigned two
nationalities and 0.14% were assigned three, while 6% were assigned none and 91%
were assigned one. For the English-speaking countries, the fractions of
comedians in the sample were approximately in line with the countries’ population
size. This is consistent with the null model that no English-speaking country is
particularly better than the others at being funny (footnote 2).
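The arithmetic behind the 97% figure can be checked with a short sketch, given here in Python for illustration only. It uses the rounded percentages quoted above; the variable names are hypothetical. Each person contributes one count per nationality assigned, so the nationality percentages can sum to nearly 100% even though 6% of persons have no nationality.

```python
# Share of comedians (%) by number of nationality categories assigned,
# as quoted in the text (rounded values, so they sum to slightly over 100).
share_by_count = {0: 6.0, 1: 91.0, 2: 2.98, 3: 0.14}

# Each person adds one count per assigned nationality.
total_assignments = sum(n * share for n, share in share_by_count.items())

# Comedian shares relative to the American share, next to the population
# proportions relative to the US given in footnote 2.
comedian_pct = {"UK": 18.0, "CAN": 6.0, "AUS": 3.0}
us_comedian_pct = 69.0
relative_comedians = {k: 100 * v / us_comedian_pct
                      for k, v in comedian_pct.items()}
relative_population = {"UK": 20.0, "CAN": 11.0, "AUS": 7.3}

for country in comedian_pct:
    print(country,
          round(relative_comedians[country], 1),
          relative_population[country])
```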
It has been claimed that (female) comedians tend to be Jewish (Hitchens,
2007), so this was investigated. Jewishness is not a nationality, so it is not
included in the previously reported numbers. In fact, a cross tabulation showed
that 89% of Jews in the sample of comedians were Americans. In total, 17% of
the sample was Jewish. There was no noteworthy effect of sex (17% among
males, 15% among females).
Year of birth and death
It is possible to estimate each comedian’s year of birth/death by either looking for
birth/death information in the free text, or by relying on the categories. Because
of difficulties in extracting this information from the free text, year of birth and
death were extracted solely from the categories. These provided birth year for
1322 cases. There were data points for both time of birth and death for 305 cases.
Figure 1 shows the distribution of year of birth in the sample.
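A minimal sketch of the category-based extraction follows. It is written in Python and is not the author's code; it assumes that the relevant categories follow the English Wikipedia naming pattern "YYYY births" and "YYYY deaths", and the example category lists are hypothetical.

```python
import re

BIRTH_RE = re.compile(r"^(\d{4}) births$")
DEATH_RE = re.compile(r"^(\d{4}) deaths$")

def years_from_categories(categories):
    """Return (birth_year, death_year); either is None if no category matches."""
    birth = death = None
    for cat in categories:
        m = BIRTH_RE.match(cat)
        if m:
            birth = int(m.group(1))
        m = DEATH_RE.match(cat)
        if m:
            death = int(m.group(1))
    return birth, death

def age_at_death(birth, death):
    """Approximate age at death in whole years; None if either year is missing."""
    if birth is None or death is None:
        return None
    return death - birth

# Hypothetical category lists for two pages.
dead_person = ["1915 births", "1992 deaths", "American male comedians"]
living_person = ["1970 births", "Living people"]
```

Because only years are available, the computed age can be off by up to one year for any individual.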
Figure 2 shows the distribution of age at death. The mean/median age at
death was 69/71. These values are lower than the value expected of the present
day general population. However, the numbers are not easily interpretable
because most of the comedians who have already died were born a long time
ago. We should therefore compare them to persons born at the same time. Among
those who are dead, the mean/median year of birth is 1915/1918. Figure 3 shows
the historical trend of age at death.
2 The population proportions are: UK 20%, CAN 11%, AUS 7.3%, NZ 1.4%
relative to the US.
Figure 1. Distribution of year of birth.
Figure 2. Distribution of age at death. The vertical line shows the mean.
Figure 3. Age at death by year of birth. Smoothed fit using LOESS. Shaded
region shows 95% confidence interval.
The large decrease in age at death seen after 1920 is due to the
mathematical impossibility of being born in e.g. 1970 and having died at old age
in the year 2016 (the oldest possible age at death is 46). This is an obvious case
of survivorship bias (Mangel & Samaniego, 1984), also known as censoring in
survival analysis (Harrell, 2015, chapter 17). This bias starts with the cohort born
after about 1917. Very few of those born in 1917 are still alive as of now (they
would be 100 years old), so there is little downward bias in the estimate from that
year. At later years, the bias is easy to detect because it breaks the monotonic
increase in mean age at death seen in earlier cohorts.
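The censoring effect can be made concrete with a small sketch, written in Python for illustration. It assumes a fixed, hypothetical set of lifespans shared by every cohort and an observation year of 2016; only deaths that have already occurred by that year enter the observed mean.

```python
def observed_mean_age_at_death(birth_year, lifespans, observation_year=2016):
    """Mean age at death among cohort members who have died by the
    observation year; None if nobody in the cohort can have died yet."""
    max_observable_age = observation_year - birth_year
    observed = [age for age in lifespans if age <= max_observable_age]
    if not observed:
        return None
    return sum(observed) / len(observed)

# Hypothetical lifespans, identical for every cohort (not real data).
lifespans = [40, 55, 70, 80, 90]

for year in (1900, 1950, 1970, 1990):
    print(year, observed_mean_age_at_death(year, lifespans))
```

Although the underlying lifespans are the same in every cohort, the observed mean falls for later cohorts simply because the longer-lived members have not yet died.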
Life expectancy for men born 1915 in the US was about 52.5 years (Noymer
& Garenne, 2000). However, this includes deaths in childhood. It is not possible
to be included in the list of noted comedians (sufficiently noted to be listed on
Wikipedia) if one died in childhood. Thus, a fairer comparison is looking at the
mean age at death of persons who survived to an age at which they could be
eligible for inclusion. Such an age may be somewhere from 30 to 50. Figure 4
shows life expectancy by year of birth conditional on survival to a given age for
the United Kingdom (footnote 3).
Depending on which assumption we make about the age needed to become a
notable comedian, the expected mean age at death is somewhere around 66-71.
The observed (smoothed) mean age at death for comedians born in 1915 was in
fact 76.9. Thus, comedians had an increased lifespan of approximately 6-11
years.
Sex also biases the comparison. Comedians were 86% male in 1915 but the
numbers given in Figure 4 are based on both sexes, thus biasing them for the
purpose of this comparison because women live longer than men. The sex
difference in life expectancy was 4 years for the American (USA) cohort born in
1910 (Noymer & Garenne, 2000). Thus, to adjust both numbers to extrapolate to
100% male groups, we subtract 2 from the predicted values based on Figure 4,
and subtract .6 from the observed age at death (footnote 4). This yields adjusted predictions
of 64-69 versus an adjusted mean age at death of 76.3. Thus, we observe a life
expectancy 7.3-12.3 years higher than in the general population.
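The sex adjustment can be reproduced step by step. The sketch below is in Python and only restates the arithmetic of the text, with male as the baseline and a 4-year female advantage; the variable and function names are hypothetical.

```python
SEX_GAP_YEARS = 4.0  # female minus male life expectancy, 1910 US cohort

def shift_above_male_baseline(female_share, gap=SEX_GAP_YEARS):
    """Years by which a mixed-sex group exceeds an all-male group."""
    return female_share * gap

# Predicted mean age at death from Figure 4 (both sexes, about 50% female).
predicted_mixed = (66.0, 71.0)
predicted_male = tuple(p - shift_above_male_baseline(0.5)
                       for p in predicted_mixed)

# Observed smoothed mean for comedians born in 1915 (14% female).
observed = 76.9
# The text rounds 0.14 * 4 = 0.56 to 0.6 before subtracting.
observed_male = observed - round(shift_above_male_baseline(0.14), 1)

gap_low = observed_male - predicted_male[1]
gap_high = observed_male - predicted_male[0]
print(predicted_male, round(observed_male, 1),
      round(gap_low, 1), round(gap_high, 1))
```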
A higher than average life expectancy is expected from the above average
cognitive ability because higher cognitive ability is related to lower mortality
(Arden et al., 2016; Deary, 2009; Gottfredson, 2004). Arden et al. found a
correlation of IQ with lifespan of .12. Edwards (2013) reports a standard deviation
for life span of about 15 years. Assuming the average IQ of comedians is 2
standard deviations above the population average (130), the IQ predicted
difference in lifespan is .12 * 2 * 15 = 3.6 years. Thus, 29% to 50% of the difference
in lifespan is accounted for by the difference in cognitive ability. This is a very
tentative estimate because the correlation reported by Arden et al. is based on
samples where not all persons had yet died, meaning that there was some range
restriction (due to censoring) and consequently a bias towards 0 in the estimated
correlation.
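The calculation in the preceding paragraph can be written out as a short sketch, in Python for illustration only. It takes the correlation, the lifespan standard deviation and the 2 SD assumption directly from the text; the variable names are hypothetical.

```python
r_iq_lifespan = 0.12     # Arden et al. (2016)
iq_advantage_sd = 2.0    # assumed: mean IQ of 130, i.e. 2 SD above 100
lifespan_sd_years = 15.0 # Edwards (2013)

# Expected lifespan difference implied by the IQ difference.
predicted_years = r_iq_lifespan * iq_advantage_sd * lifespan_sd_years

# Share of the adjusted observed gap (7.3 to 12.3 years) that this covers.
share_of_large_gap = predicted_years / 12.3
share_of_small_gap = predicted_years / 7.3
print(round(predicted_years, 1),
      round(100 * share_of_large_gap),
      round(100 * share_of_small_gap))
```

With the rounded inputs used here the shares come out at about 29% and 49%, in line with the approximate range given in the text.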
3 I was unable to find a similar figure for Americans (USA). However, given that
the United Kingdom and the USA are similar culturally and genetically, we may
use this as a proxy.
4 This calculation is done by setting male as the baseline (0) and female to 4.
Thus, a group with 14% females would be 0.14 * 4 = 0.56 above the male baseline.
The same reasoning applies to the 50% female estimate, 4 * 0.5 = 2.
Figure 4. Life expectancy by year of birth conditional on survival to specific age
(Roser, 2016).
Sex distribution
Two methods were used to estimate the sex of a given person. First, all
English gendered pronouns were counted for the page contents (footnote 5). Only exactly
matching words (not parts of words) were counted. Then the sum of each gender
was calculated followed by a male proportion. The person was assigned to the
sex with the most pronouns. Almost all cases had ratios close to 1 or 0, and the
cases with fractions close to .5 were examined manually. No errors were found.
Relatively large fractions of opposite gender pronouns were usually due to long
sections on the person’s children or partners.
5 Male: he, his, him, himself. Female: she, her, hers, herself.
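The pronoun method can be sketched as follows. This is a Python illustration, not the author's R code: it assumes that the page text is lowercased and split into runs of letters, so that only whole words are matched. The function names and the example sentence are hypothetical.

```python
import re

MALE_PRONOUNS = {"he", "his", "him", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}

def male_pronoun_proportion(text):
    """Proportion of gendered pronouns that are male; None if there are none."""
    # Split into whole words so that e.g. "the" or "brother" never match "he".
    words = re.findall(r"[^\W\d_]+", text.lower())
    n_male = sum(1 for w in words if w in MALE_PRONOUNS)
    n_female = sum(1 for w in words if w in FEMALE_PRONOUNS)
    if n_male + n_female == 0:
        return None
    return n_male / (n_male + n_female)

def assign_sex(proportion):
    """Assign the sex with the most pronouns; None for ties or missing data."""
    if proportion is None or proportion == 0.5:
        return None
    return "male" if proportion > 0.5 else "female"

# Hypothetical page text.
page_text = ("She began her career in 1990. "
             "Her brother said he was proud of her.")
proportion = male_pronoun_proportion(page_text)
```

Pages whose proportion lies near 0.5 are the ones that would be flagged for manual inspection.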
Second, the gendered categories were used to estimate sex as above (footnote 6).
There was unanimous agreement on gender using the categories except for one
case (Eddie Izzard) because he has the category Male-to-female cross-dressers
which gets counted as both genders. Still, the categories for his page were 8
males and 1 female, and so he was correctly assigned to male by this method.
Data were not entirely complete for either method (the pronoun method was
missing 1 case), but there were overlapping data for n=1297. There was 100%
agreement for these cases (1029 male, 268 female; 79% male). Figure 5 shows
the estimated male% over time.
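The category method can be sketched in the same way. The Python illustration below assumes whole-word matching on category names, so that "male" does not match inside "female"; whether the original analysis matched whole words or substrings is not stated, and the category lists are hypothetical.

```python
import re

MALE_WORDS = {"male"}
FEMALE_WORDS = {"female", "women", "actresses"}

def count_gendered_categories(categories):
    """Return (male_count, female_count) over a page's categories.
    A category containing both kinds of word counts toward both."""
    n_male = n_female = 0
    for cat in categories:
        words = set(re.findall(r"[a-z]+", cat.lower()))
        if words & MALE_WORDS:
            n_male += 1
        if words & FEMALE_WORDS:
            n_female += 1
    return n_male, n_female

def assign_sex_from_categories(categories):
    """Assign the sex with the most gendered categories; None for ties."""
    n_male, n_female = count_gendered_categories(categories)
    if n_male == n_female:
        return None
    return "male" if n_male > n_female else "female"

# Hypothetical category lists.
mixed_page = ["English male comedians", "English male film actors",
              "Male-to-female cross-dressers"]
female_page = ["American women comedians", "American film actresses",
               "Harvard University alumni"]
```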
Figure 5. Historical trend of male % in the sample of comedians. Smoothed fit
using LOESS. Shaded region shows the 95% confidence interval.
The trend is somewhat erratic from about 1960 onward. Since the degree of
smoothness was chosen by cross-validation, this erratic pattern is probably not just
sampling error, but reflects real changes. As we move to the persons born later
than about 1980, the number of cases becomes very small and the confidence
interval correspondingly wide (footnote 7). In general, there is a falling trend such that
comedians are becoming increasingly female. The upwards spike around the
early 1960s co-occurred with the counter culture movement and may be related.
Likewise, the relatively steep decline around 1975-1985 co-occurred with the
second wave feminist movement.
6 The male categories were those that had male in them. The female had female,
women or actresses. Thus, there was more variation in female categories than
male because the word actor could not be assumed to be male only. Across
human languages, it is common for the male variants to be the unmarked forms
of words making their interpretation ambiguous (Newell, 2005).
Sex distribution of comedians in Scandinavia
Many countries have organizations or promoters for stand-up comedians
which showcase their members on their websites. This makes it easy to count the
members and estimate their sex from name or visual inspection. The websites of
such organizations were found for Denmark, Norway and Sweden (footnote 8). Most of these
comedians are not covered by the Wikipedia list due to lack of international fame
(they only produce material in their own language). To make sure, the category
method was used to identify Scandinavian comedians. There were only two, only
one of which was still active. Thus, the data serve as an independent replication.
Table 1 shows the demographics.
Table 1. Stand-up comedians in Scandinavian countries.
Country  Name                Total  Males  Females  Male%  URL
Denmark  Comedy ZOO             41     37        4     90  https://www.fbi.dk/komikere
Denmark  Stand-up.dk            29     27        2     93  http://stand-up.dk/komikere
Sweden   Norra Brunn Comedy    123     95       28     77  http://norrabrunncomedy.se
Sweden   Bokestandup.se         83     61       22     73  http://bokestandup.se
Sweden   Macespeakers           17     12        5     71  http://www.macespeakers.se/talarformedling/komiker/
Norway   Stand up Norge         71     62        9     87  http://www.standup.no
7 The age of persons born in 1990 is about 26, which is slightly below our lower
bound assumption from earlier about the age needed to become a noted
comedian. The lack of persons thus indirectly validates the assumption.
8 These were chosen because the author reads these languages fluently and
they do not seriously overlap with the Wikipedia-based data.
The different organizations within countries overlap to a large extent in the
comedians they have on contract and thus there is a lack of strict independence.
The overall median value is 82% male. If one uses only the largest organization
from each country to avoid the dependency problem, the weighted mean is 83%
(n=235). As a robustness check, the largest Swedish site was scraped for age
information. 64 persons reported year of birth, yielding a mean age of 47. Thus,
the relatively lower male% for Sweden did not seem to be obviously related to
younger comedians. The youngest person was 27 and the oldest 73. Thus, this
again validates the assumption that persons below about age 30 rarely become
noted comedians.
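The summary statistics for Table 1 can be recomputed from the counts. The sketch below is in Python and is only a check of the arithmetic; it takes the Norra Brunn Comedy counts as 95 males and 28 females, which is consistent with that site's total of 123 and 77% male. The variable names are hypothetical.

```python
from statistics import median

# (country, organization, males, females) from Table 1.
ORGS = [
    ("Denmark", "Comedy ZOO", 37, 4),
    ("Denmark", "Stand-up.dk", 27, 2),
    ("Sweden", "Norra Brunn Comedy", 95, 28),
    ("Sweden", "Bokestandup.se", 61, 22),
    ("Sweden", "Macespeakers", 12, 5),
    ("Norway", "Stand up Norge", 62, 9),
]

# Male % for each organization, then the median across all six.
male_pct = [100 * m / (m + f) for _, _, m, f in ORGS]
median_male_pct = median(male_pct)

# Keep only the largest organization per country to limit overlap.
largest = {}
for country, _, males, females in ORGS:
    if country not in largest or males + females > sum(largest[country]):
        largest[country] = (males, females)

n_total = sum(m + f for m, f in largest.values())
n_males = sum(m for m, _ in largest.values())
weighted_male_pct = 100 * n_males / n_total
print(round(median_male_pct), n_total, round(weighted_male_pct))
```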
Education
It is possible to attempt to find details about the degree or institution attended
by careful content mining of the free text and building a large database of
universities, as well as their international ranks (see e.g. Hsu & Wai, 2015).
However, this would require a more sophisticated study. The purpose of the
present study was merely to examine whether the crude educational attainment
of comedians is consistent with the high mean levels of cognitive ability reported
in previous studies.
A simple approach was taken using categories. There are categories for
persons who graduated from different colleges and universities. There are
probably not categories for every college or university, but the coverage is likely
to be very good for the developed world from where virtually all comedians in the
sample came. Thus, one can check whether each person belongs to any alumni
category. If they do, they are classified as having graduated, and if not, they are
not. Thus, false positives are unlikely but false negatives are somewhat likely.
Measurement error would then tend to deflate the figures. Figure 6 shows the
historical trend.
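The alumni classification can be sketched as follows. The Python illustration assumes that alumni categories can be recognized by the word "alumni" in the category name (as in "Harvard University alumni" or "Alumni of the University of Cambridge"); the exact matching rule used in the original analysis is not stated, and the people and category lists below are hypothetical.

```python
def is_graduate(categories):
    """True if any category name contains the word 'alumni'."""
    return any("alumni" in cat.lower() for cat in categories)

def graduate_rate(pages):
    """Share of persons classified as graduates.
    `pages` maps a person to that person's list of categories."""
    flags = [is_graduate(cats) for cats in pages.values()]
    return sum(flags) / len(flags)

# Hypothetical pages.
pages = {
    "Person A": ["Harvard University alumni", "American male comedians"],
    "Person B": ["American male comedians", "1915 births"],
    "Person C": ["Alumni of the University of Cambridge",
                 "English women comedians"],
    "Person D": ["Living people"],
}
rate = graduate_rate(pages)
```

A graduate whose page lacks an alumni category is counted as a non-graduate, which is why the resulting rates are better read as lower bounds.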
As in the general population, there is an upward trend. There are also clear
non-monotonic patterns in the trend, as was seen for the male% in Figure 5.
When the general upward trend is removed by linear regression from both trends,
they correlate .59 [95% CI: .45, .71]. This might just reflect the fact that men used
to be better educated than women, which presumably also means that male
comedians used to be better educated than female comedians. However, Figure
7 shows that this is not the case. The alumni rates are in fact generally higher for
women and are very high for the last reliable data. The up and down trends
replicate within sexes, so they are not (entirely) due to some sex-related
confounding. In general, comedians tend to be well educated, even compared to
contemporary rates.
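The detrending step can be illustrated with a small sketch in Python. It assumes ordinary least squares on year of birth for each series, followed by a Pearson correlation of the residuals; the series are synthetic and constructed so that the two trends move in opposite directions but share the same fluctuations.

```python
def mean(values):
    return sum(values) / len(values)

def detrend(t, y):
    """Residuals of y after removing a least-squares linear trend in t."""
    mt, my = mean(t), mean(y)
    slope = (sum((a - mt) * (b - my) for a, b in zip(t, y))
             / sum((a - mt) ** 2 for a in t))
    return [b - (my + slope * (a - mt)) for a, b in zip(t, y)]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

# Synthetic series (not the paper's data): a falling trend and a rising
# trend that share the same up-and-down fluctuations.
t = list(range(10))
wiggle = [1, -1] * 5
male_pct = [90 - 2 * ti + w for ti, w in zip(t, wiggle)]
grad_pct = [20 + 3 * ti + 2 * w for ti, w in zip(t, wiggle)]

raw_r = pearson(male_pct, grad_pct)
detrended_r = pearson(detrend(t, male_pct), detrend(t, grad_pct))
```

In this constructed example the raw correlation is negative because the trends run in opposite directions, while the detrended correlation isolates the shared fluctuations.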
Figure 6. Percent university/college graduates by year of birth. Smoothed by
LOESS. Shaded region shows 95% confidence interval.
Figure 7. Percent university/college graduates by year of birth and sex.
Smoothed by LOESS. Shaded region shows 95% confidence interval.
Discussion and conclusion
The sex distribution of comedians was found to be male dominated (overall
79% male), as expected based on narrative reports (Hitchens, 2007). However,
it is less male-dominated in recent history, which might have been expected.
Large changes in the sex distribution of professions have happened before. An
example with physicians is shown in Figure 8. There are large (about 1d) sex
differences in the people-things dimension, such that women have stronger
preferences for working with people (Su, Rounds & Armstrong, 2009). For this
reason, and given the observed trend, it is not inconceivable that the comedian
profession becomes more female in the future. On the other hand, Greengross
and Miller (2011) found that male college students were funnier than female ones
by 0.38d. This is likely to be an underestimate because college students are selected
for cognitive ability, which is a strong correlate of humor production ability (β=0.59
in their structural equation model). Thus, a pure ability selection would result in
high proportions of men, depending on the selection ratio. As such, the change
in sex demographics over time probably reflects some complex function of the
relative importance of humor production ability for comedian jobs (and becoming
a noted comedian), the sex distributions of humor production ability, and sex
differences in vocational interests (Su et al., 2009).
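How the male proportion depends on the selection ratio can be illustrated with a short sketch in Python. It rests on simplifying assumptions that go beyond the text: humor production ability is normally distributed with equal variance in both sexes, the two pools are equally large, the male mean is 0.38 SD higher, and selection is by ability alone. The function name is hypothetical.

```python
from statistics import NormalDist

def male_share_above_cutoff(cutoff_sd, d=0.38):
    """Share of males among persons above an ability cutoff.
    Female ability ~ N(0, 1), male ability ~ N(d, 1), equal pool sizes."""
    p_female = 1 - NormalDist(0.0, 1.0).cdf(cutoff_sd)
    p_male = 1 - NormalDist(d, 1.0).cdf(cutoff_sd)
    return p_male / (p_male + p_female)

# Stricter selection (a higher cutoff) yields a more male-skewed group.
for cutoff in (0.0, 1.0, 2.0, 3.0):
    print(cutoff, round(100 * male_share_above_cutoff(cutoff)))
```

Under these assumptions the male share rises from roughly 56% at a cutoff at the female mean to roughly 77% at a cutoff 3 SD above it.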
Figure 8. Sex distribution of physicians by age bracket.
High to very high levels of educational attainment were found, congruent with
the high cognitive ability found in earlier studies (Greengross et al., 2012; Janus,
1975; Janus et al., 1978). In contrast to a small epidemiological study (Stewart &
Thompson, 2015), a large increase in longevity was observed as was expected
due to its association with education and cognitive ability (Arden et al., 2016;
Deary, 2009; Gottfredson, 2004; Lubinski, 2009). However, the observed
increase did not seem to be entirely due to the high level of cognitive ability,
though this conclusion is very tentative. The relatively large proportion of Jews is
congruent with the high cognitive ability of comedians, the high cognitive ability
of (Ashkenazi) Jews (Cochran, Hardy & Harpending, 2006), and the cultural
achievements of Jews in other fields (Lynn, 2011).
Supplementary material and acknowledgments
Supplementary materials including code, high quality figures and data can
be found at https://osf.io/patd3/.
References
Arden, R., Luciano, M., Deary, I.J., Reynolds, C.A., Pedersen, N.L., Plassman, B.L., … &
Visscher, P.M. (2016). The association between intelligence and lifespan is mostly
genetic. International Journal of Epidemiology 45: 178-185.
https://doi.org/10.1093/ije/dyv112
Campbell, L., Martin, R.A. & Ward, J.R. (2008). An observational study of humor use while
resolving conflict in dating couples. Personal Relationships 15: 41-55.
https://doi.org/10.1111/j.1475-6811.2007.00183.x
Cann, A., Davis, H.B. & Zapata, C.L. (2011). Humor styles and relationship satisfaction in
dating couples: Perceived versus self-reported humor styles as predictors of satisfaction.
Humor ― International Journal of Humor Research 24(1): 1-20.
https://doi.org/10.1515/humr.2011.001
Carroll, N. (2014). Humour: A Very Short Introduction, 1st edition. Oxford: Oxford
University Press.
Cochran, G., Hardy, J. & Harpending, H. (2006). Natural history of Ashkenazi intelligence.
Journal of Biosocial Science 38: 659-693. https://doi.org/10.1017/S0021932005027069
Curtis, S.M. (2015). bisoreg: Bayesian Isotonic Regression with Bernstein Polynomials,
version 1.4. Retrieved from https://cran.r-project.org/web/packages/bisoreg/index.html
Deary, I.J. (2009). Introduction to the special issue on cognitive epidemiology. Intelligence
37: 517-519. https://doi.org/10.1016/j.intell.2009.05.001
Edwards, R.D. (2013). The cost of uncertain life span. Journal of Population Economics
26: 1485-1522. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3285408/
Fisher, S. & Fisher, R.L. (1981). Pretend the World Is Funny and Forever: A Psychological
Analysis of Comedians, Clowns, and Actors. Hillsdale, N.J: L. Erlbaum Associates.
Gottfredson, L.S. (2004). Intelligence: Is it the epidemiologists’ elusive “fundamental
cause” of social class inequalities in health? Journal of Personality and Social Psychology
86: 174-199. https://doi.org/10.1037/0022-3514.86.1.174
Greengross, G. & Miller, G. (2011). Humor ability reveals intelligence, predicts mating
success, and is higher in males. Intelligence 39: 188-192.
https://doi.org/10.1016/j.intell.2011.03.006
Greengross, G., Martin, R.A. & Miller, G. (2012). Personality traits, intelligence, humor
styles, and humor production ability of professional stand-up comedians compared to
college students. Psychology of Aesthetics, Creativity, and the Arts 6(1): 74-82.
https://doi.org/10.1037/a0025774
Harrell, F.E. (2015). Regression Modeling Strategies, 2nd edition.
Hitchens, C. (2007, January 1). Why women aren’t funny. Retrieved December 2, 2016,
from http://www.vanityfair.com/culture/2007/01/hitchens200701
Hitchens, C. (2008, March 3). Why women still don’t get it. Retrieved December 2, 2016,
from http://www.vanityfair.com/culture/2008/04/hitchens200804
Hsu, S. & Wai, J. (2015). These 25 schools are responsible for the greatest advances in
science. Quartz.
https://qz.com/498534/these-25-schools-are-responsible-for-the-greatest-advances-in-science/
Hurley, M.M., Dennett, D.C. & Adams, R.B. (2011). Inside Jokes: Using Humor to
Reverse-Engineer the Mind. Cambridge, Mass.: MIT Press.
James, G., Witten, D., Hastie, T. & Tibshirani, R. (eds.) (2013). An Introduction to
Statistical Learning: With Applications in R. New York: Springer.
Janus, S.S. (1975). The great comedians: Personality and other factors. American
Journal of Psychoanalysis 35(2): 169-174. https://doi.org/10.1007/BF01358189
Janus, S.S., Bess, B.E. & Janus, B.R. (1978). The great comediennes: Personality and
other factors. American Journal of Psychoanalysis 38(4): 367-372.
Keyes, O. & Tilbert, B. (2016). WikipediR: A MediaWiki API Wrapper (Version 1.4.0).
Retrieved from https://cran.r-project.org/web/packages/WikipediR/index.html
Lubinski, D. (2009). Cognitive epidemiology: With emphasis on untangling cognitive ability
and socioeconomic status. Intelligence 37: 625-633.
https://doi.org/10.1016/j.intell.2009.09.001
Lynn, R. (2011). The Chosen People: A Study of Jewish Intelligence and Achievement.
Whitefish MT: Washington Summit Publishers.
Mangel, M. & Samaniego, F.J. (1984). Abraham Wald’s work on aircraft survivability.
Journal of the American Statistical Association 79(386): 259-267.
https://doi.org/10.1080/01621459.1984.10478038
Mizejewski, L. (2014). Pretty/Funny: Women Comedians and Body Politics. University of
Texas Press.
Newell, H.C. (2005). A Consideration of Feminine Default Gender. University of
Cincinnati. Retrieved from
https://etd.ohiolink.edu/ap/10?0::NO:10:P10_ACCESSION_NUM:ucin1123162261
Noymer, A. & Garenne, M. (2000). The 1918 influenza epidemic’s effects on sex
differentials in mortality in the United States. Population and Development Review 26:
565-581.
Roser, M. (2016). Life expectancy. Retrieved from
https://ourworldindata.org/life-expectancy/
Stanley, A. (2008, March 3). Who says women aren’t funny? Retrieved December 2,
2016, from http://www.vanityfair.com/news/2008/04/funnygirls200804
Stewart, S. & Thompson, D.R. (2015). Does comedy kill? A retrospective, longitudinal
cohort, nested case–control study of humour and longevity in 53 British comedians.
International Journal of Cardiology 180: 258-261.
https://doi.org/10.1016/j.ijcard.2014.11.152
Su, R., Rounds, J. & Armstrong, P.I. (2009). Men and things, women and people: A meta-
analysis of sex differences in interests. Psychological Bulletin 135: 859-884.
https://doi.org/10.1037/a0017364
Wickham, H. & RStudio (2016). rvest: Easily Harvest (Scrape) Web Pages (Version
0.3.2). Retrieved from https://cran.r-project.org/web/packages/rvest/index.html