All content in this area was uploaded by Ian Tattersall on Jan 17, 2019
June 30, 2014 // from the upcoming issue (Volume 27, No. 2)
Mr. Murray, You Lose the Bet
Nicholas Wade's newest book, A Troublesome Inheritance, suggests a
biological basis for the existence of five distinct human 'races.' Charles
Murray's Wall Street Journal review of the book praises Wade for shunning
political correctness, but misses an important point: It's all based on some
very bad science.
By Rob DeSalle and Ian Tattersall
Nicholas Wade's new book on the biology of human races, A Troublesome Inheritance,
has by now been reviewed in many venues. The book has a simple structure. The first
part argues that scientific orthodoxy can be stifling, and that in order to break from it
there have to be brave purveyors of the truth. The second section argues that there is
indeed genetic evidence for the biological basis of race. The third part suggests that,
because there are races, we can now pinpoint a reason why different peoples purportedly
behave differently. In his Wall Street Journal review of the book, Charles Murray
suggests that this last part will be the target of most criticisms, reasoning that:
"The orthodoxy's clerisy will take that route, ransacking these chapters [the final five
chapters] for material to accuse Mr. Wade of racism, pseudoscience, reliance on tainted
sources, incompetence and evil intent. You can bet on it." (Italics added).
In contrast, our intent here is to examine the science and premises in the first two parts
(the first five chapters) of this book, because the third part can be taken seriously only
if the premises of those chapters have scientific validity.
Our reading of the first half of A Troublesome Inheritance indicates that Wade has made
at least seven mistakes, all of them routinely committed when genomic and genetic
information is used to examine the biological basis for human races and to justify
reifying race as a biological reality. We start with a foundational problem
that all scientists face:
1. Misunderstanding the nature of hypothesis testing.
This first aspect of the "biology of race" controversy gets at the very core of what science
really is, and indeed what the problems really are in understanding human variation. It is
commonly accepted that the hypothetico-deductive approach provides the soundest
and productive way to conduct science. In contrast, inductive approaches are to be
avoided, because induction can only confirm what one already knows. This latter position
might at first sound extreme; if you have an approach that actually confirms a scientific
phenomenon, why not use it? The answer is simple: Science advances at the cost of
hypotheses that are rejected, while inductive processes will always give you a positive
answer. Hence, with respect to racial variation in human populations the proper approach
is to pose hypotheses, and subsequently test them.
Unfortunately, one of the most common methodologies applied in the analysis of human
population genetic information takes an entirely inductive approach. Called
STRUCTURE, it throws data at an algorithm and asks: "How many units do I have?"
This method is approvingly cited by Wade as the ultimate proof that there are five races
of humankind. But while the algorithm itself is an important technical advance, how the
results of such analyses bear on definitions of "race" is an entirely separate question
because, as we have suggested, STRUCTURE is an inherently inductive approach. And
while inductive approaches do a great job of summarizing and displaying information
given a specific set of prior knowledge of a system, and in doing so can encourage the
formulation of new hypotheses and refinement of existing hypotheses, they cannot be
used to test hypotheses.
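The point that an inductive procedure will always return a positive answer can be
illustrated with a toy sketch. The code below is not STRUCTURE; it is a plain
one-dimensional k-means routine, swapped in purely for illustration and applied to a
made-up, perfectly smooth gradient of 100 values. The function name kmeans_1d and the
data are assumptions of this sketch, not anything taken from Wade or from the studies
he cites.

```python
# Toy illustration only: this is NOT the STRUCTURE algorithm, just 1-D k-means.
def kmeans_1d(values, k, iterations=50):
    """Partition numbers into k groups by repeatedly assigning to the nearest centroid."""
    data = sorted(values)
    # Deterministic start: spread the initial centroids evenly across the data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            groups[nearest].append(x)
        # Recompute each centroid; keep the old one if a group is empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups

# A hypothetical, perfectly smooth gradient: 100 evenly spaced values with no gaps.
continuum = [i / 99 for i in range(100)]

cluster_counts = {}
for k in range(2, 8):
    groups = kmeans_1d(continuum, k)
    cluster_counts[k] = sum(1 for g in groups if g)
    print(k, "groups requested ->", cluster_counts[k], "groups returned")
```

Whatever number of groups is requested, that number comes back, even though the data
contain no gaps at all. The count reflects the question that was asked of the algorithm,
not the outcome of a test.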
To make scientific statements about race, then, we need to have hypotheses in hand,
arrived at inductively or otherwise. So what useful hypotheses can we offer up with
respect to human genetics and the existence of human races? The most obvious
hypothesis is:
H0 = There are n "races" of a type of organism (A) that correspond to the n
geographical divisions (often taken to be Africa, Asia and Europe) that we see on
the planet today.
But simply posing our hypothesis in these terms brings us to the second problem with
using biology to "prove" race:
2. Subjectivity in defining race (or a misunderstanding of what a species is).
How can we test a hypothesis of the kind we have just presented? First of all, we need a
definition of "race" that is both objective and operationally testable. Without such a
definition we cannot proceed to test the hypothesis. We cannot ask an algorithm to give
us an idea of the number of races, because that would be inductive. We do have a good
idea of what a species is, but the definitions of the subordinate units of "race" and
"subspecies" are substantially less than objective. In fact, we defy any scientist,
journalist, philosopher or layperson to define race meaningfully in this biological context,
and in such a way that it can be used to test H0 above. And if this can't be done, H0
becomes a useless hypothesis. However if, in contrast, you change the hypothesis to:
H1 = There are n species of a type of organism (A, B and C) corresponding to the
geographical divisions (for the sake of argument, Africa, Asia and Europe) that
we see on the planet today.
Then we do have a testable hypothesis because we do have an operational definition of
species. You might object that this is just semantics. But in fact, objective definitions are
hugely important in hypothesis testing. Without objective criteria to test our hypotheses,
we simply cannot reject them.
But then you might say that "I will objectively define a race as being differentiated from
other closely related entities." This is slightly better, but it is still subjective and
untestable because "differentiated" is an extremely vague term. Putting numbers on it
does not necessarily help, because if, for example, you refine your definition by saying
that "a race is a group of organisms that are 50% divergent from the next most closely
related group of things," you still have two problems. The first is that the 50% figure is
entirely arbitrary, and others might think your "magic" number is not so magical. Most
scientists will agree that genetic or morphological cohesion, or reproductive isolation, lie
at the core of what a species is. But there is no consensus as to what degree of divergence
is significant as entities go their separate ways in nature. For one group the magic number
might be 5%, so that if it achieves over a 5% divergence level the probability of ending
up with complete divergence, and hence becoming a new species, is high. But for another
group of organisms, the magic number might well be 95%.
The second problem is that, whatever percent divergence you choose, it must mean
something biological. The species definition that most taxonomists use (see below)
requires 100% divergence in traits. It is either/or, and there is no subjectivity to it. The
biological meaning of that 100% is that your entity is no longer meaningfully
reproducing or significantly swapping genes with its closest relatives. They are on
separate and historically established evolutionary trajectories. Percent divergence might
mean something if researchers could pin down a magic threshold, but as we have just
pointed out this is a very slippery concept.
Yet this is how Wade described the process of species formation in a recent broadcast
interview:
"Since evolution happens all the time, it's a continuous, unstoppable process that as a
population splits, the two halves will continue to evolve, but now independently. So, over
time they will accumulate differences between each other and eventually they'll become
new species."
While we know from experience that radio interviews can be harrowing, and that it is
difficult to completely explain things in short sound bites, this description of species
formation is pretty close to the portrayal he provides in his book. And what is particularly
enlightening is that, directly prior to offering this definition, he said:
"… regionality underlines the fact of race because the populations on each continent have
been evolving independently since we left our African homeland about 50,000 years
ago."
The subjective perception of species, population evolution and regionality expressed here
leads to unwarranted conclusions about the existence of any entity below the level of the
species Homo sapiens. This appears to reflect a failure on Wade's part to grasp the
subtleties of taxonomic science. This misapprehension has led to the third mistake we see
in his reasoning:
3. A misunderstanding of the rigors of taxonomic science.
Understanding our origins, and indeed the biology of all organisms on the planet, is really
a problem of taxonomy. This vital branch of natural history is sometimes derided as
"stamp collecting," but this claim could hardly be farther from the truth. Taxonomy is a
well-developed and highly scientific endeavor that has been around in some form ever
since humans began to name things. The science of taxonomy combines simple but
rigorous hypothesis-testing approaches with objective definitions of species. It is true
that taxonomists occasionally use the terms "subspecies" and "race" in their descriptions,
but only as conveniences to imply future hypotheses to be tested.
The genomic approach to the existence of races in human beings has usually involved
collecting the frequencies of variants at a large number of locations in the human
genome, from increasingly large numbers of people. Of course, nobody would put much
stock in a test of a hypothesis involving only two individuals from each of the geographic
regions suspected of diverging. If one examines too few individuals there is a danger of
over-diagnosing the number of entities (i.e. of finding purely random evidence for
differentiation). Another caveat is that examining too few populations will also result in
over-diagnosis. Consider the following scenario: populations of a cosmopolitan organism
are examined for their genetic variability by sequencing the genomes of individuals from
Africa and Oceania. Not surprisingly some genetic differences are detected and found to
be significant, in that some are unique to the individuals from Africa while others are
unique to individuals from Oceania. A big hoopla could be made, and species existence
could be claimed, but this would be poor science because the severity of the test is so low
as to make the test meaningless. Why? Because the organism might also exist in Europe,
the Americas and East Asia. By leaving out the populations "in between" one would miss
the connectedness of the two populations initially sequenced. This phenomenon in
widely-distributed populations has led many researchers of human genetics to the words
of Frank Livingstone: "There are no human races, there are only clines."
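The sampling scenario just described can be sketched numerically. The example below
assumes a single variant whose frequency rises linearly along a transect across the
organism's range; the linear gradient, the sampling positions and the function names are
all hypothetical choices made for illustration, not real data.

```python
# Hypothetical cline: the frequency of one variant rises smoothly with
# position along a transect (0.0 = one end of the range, 1.0 = the other).
def variant_frequency(position):
    return 0.1 + 0.8 * position  # assumed linear gradient, for illustration

def largest_gap(frequencies):
    """Largest jump between neighbouring values once they are sorted."""
    ordered = sorted(frequencies)
    return max(b - a for a, b in zip(ordered, ordered[1:]))

# Sparse design: populations sampled only near the two ends of the range.
ends_only = [variant_frequency(p) for p in (0.0, 0.05, 0.95, 1.0)]

# Fuller design: the same ends plus evenly spaced populations "in between".
full_transect = [variant_frequency(i / 20) for i in range(21)]

gap_sparse = largest_gap(ends_only)
gap_full = largest_gap(full_transect)
print(f"largest gap, ends only:     {gap_sparse:.2f}")
print(f"largest gap, full transect: {gap_full:.2f}")
```

With only the ends sampled, the two sets of populations appear to be separated by a large
jump in frequency. Once the intermediate populations are added, the same gradient shows
only small steps between neighbours, which is what a cline looks like.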
Wade understands this. Here is how he describes a genome-level polymorphism study
and how it can be interpreted in a taxonomic context. He first uses a study by Rosenberg
et al. (2002) to suggest that there are five clusters of people on the planet. This important
study used genomic information (nearly 400 markers) from 1,000 people, and employed
the STRUCTURE clustering approach. These 1,000 subjects "clustered naturally into five
groups, corresponding to the five continental races." This study was soon criticized by
several researchers, who objected that intermediate populations needed to be examined to
exclude potential clinal variation. Wade then describes the next study that Rosenberg et
al. did, which was to increase the number of markers to nearly 1,000 (REF). Not
surprisingly, they obtained the same results. Wade uses this second study to suggest that
more data in this case address the "cline" criticism. More data would certainly help – they
always do – but the critical addition in this case would not be more genetic markers, but
more individuals from different geographic areas. These were not supplied, but Wade
nevertheless uses the expanded genomic information (i.e. the doubling of the number of
markers) to state categorically that "They found the clusters are real." (Italics added).
More importantly for our argument about taxonomy, Wade goes on to discuss the
inclusion of new information (using a newer genetic survey technology than in the
Rosenberg et al. study) to address the problem. In this newer study (Li et al., 2008),
1,000 different individuals were surveyed, but from 51 well-defined geographic areas.
And instead of five major groups, the researchers in this study clustered their subjects
into seven major groups. What is more, when even more subjects were added to
Rosenberg's data set, as was done by Sarah Tishkoff and her colleagues (Tishkoff et al.,
2009), 14 clusters were inferred. You might have smelled a rat here. But here is how
Wade handles this new information:
"It might be reasonable to elevate the Indian and Middle Eastern groups (the two new
ones) to the level of major races, making seven in all. But then many more
subpopulations could be declared races, so to keep things simple, the five-race continent
based scheme seems the most practical for most purposes." (Chapter 5, p 102)
Any self-respecting taxonomist would avoid the kind of language used by Wade here. It
is unscientific and circular. We have heard the argument that just because inferences
about the number of races vary, it doesn't mean race doesn't exist. An argument
commonly used to shore up this view is that people disagree on the number of shapes, but
shapes still exist. But this argument merely trivializes the definitions we use in science
generally and taxonomy specifically.
There are 6-7 billion human beings on the planet, and the best test of any hypothesis
about human genomes and populations would include them all. Of course, this is not
possible at present. But if it were possible, and the clustering were performed as in the
two studies we refer to above, we wonder how many groups might fall out. We suspect
that, depending on the markers used, it might be as many as the number of nuclear
families there are on the planet. Certainly the patterns that would emerge from such a
global analysis would not be anywhere near clear with respect to any definition of race
that one could come up with. Clearly, clustering is inadequate on its own to address
problems like this in taxonomy and systematics. Which brings us to the fourth mistake
made by proponents of a biological basis for race.
4. Misunderstanding the meaning of clustering and evolutionary trees.
Wade's "evidence" for the biological basis of races is based purely on clustering. But
clustering is only one way genetic data (or any other kind of discrete data) can be
analyzed to test hypotheses. Perhaps a better way to do this is to use a branching diagram
based on the reconstruction of the evolutionary events that led to the branches.
Significantly, Wade does not present this kind of information or analysis in his book,
possibly because researchers have for a long time realized that branching diagrams
cannot represent the patterns of evolution of individuals that belong to the same species,
something that directly reflects the difficulty and artificiality of sorting individuals into
"races." Branching diagrams can be very useful when used on single genes, and are
extremely informative when used on clonal molecules like the maternally inherited
mitochondrial DNA and the paternally inherited Y chromosome. But, to our knowledge,
no correctly-conceived attempt to build evolutionary trees with a large number of
recombining genetic regions such as those on our autosomes has resulted in a tree with
any resolution. The bottom line here, then, is that hierarchical structuring of humans
using phylogenetic trees based on the entire genome gives an unrecognizable and
unresolved bush. But if that is the case, why do clustering methods appear to recover
"structure"? We suggest that part of the reason is the next mistake in our list: the
cherry picking of markers in genetic studies of geographically separated human
populations, a phenomenon we prefer to call the "Stephen Colbert effect."
5. Cherry picking AIMs: The Stephen Colbert Effect.
Most of the early clustering studies used a modest number of genetic markers (on the
order of 1,000). More modern studies up the ante into the hundreds of thousands of
markers. These markers are chosen because they are believed to be informative about the
ancestry of people, which is why they are known as "Ancestry Informative Markers," or
AIMs. These markers are established using what we like to call the "white swan"
principle. People of different geographic origins have their genomes scanned, and when a
particular variant appears at high frequency for a geographic location, that variant is said
to be a marker for people from the geographic region concerned.
It is safe to say that this procedure introduces a bias into how the data are interpreted.
This bias is so extreme that, when Stephen Colbert was presented with a genetic survey
of his genome on the PBS show Faces of America, he was told he is 100% Caucasian.
Some of the other guests were given similar results: Yo-Yo Ma was told he is 100%
Asian. But some individuals were shocked by their results, among them Eva Longoria
who was given figures that deviated considerably from her prior view of her ancestry.
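How selecting markers for their regional contrast can bias the picture may be seen in a
small simulation. Everything in it is assumed for illustration: the 10,000 hypothetical
sites, the size of the regional frequency shifts, and the cut-off of 100 selected
markers. It is a sketch of the selection effect only, not a model of any commercial
ancestry panel.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 10,000 variable sites, each with a variant frequency in
# two regions. Most sites differ only slightly between the regions.
sites = []
for _ in range(10_000):
    base = random.uniform(0.05, 0.95)
    shift = random.gauss(0.0, 0.05)           # small regional difference
    other = min(1.0, max(0.0, base + shift))  # keep the frequency within [0, 1]
    sites.append((base, other))

differences = [abs(a - b) for a, b in sites]

# "Marker selection": keep only the 100 sites with the largest difference.
selected = sorted(differences, reverse=True)[:100]

mean_all = sum(differences) / len(differences)
mean_selected = sum(selected) / len(selected)
print(f"mean difference, all sites:      {mean_all:.3f}")
print(f"mean difference, selected sites: {mean_selected:.3f}")
```

By construction, the selected markers show a larger average difference between the two
regions than the genome-wide set from which they were drawn. A panel built this way
reports contrast because it was chosen for contrast.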
So what was going on? Currently, there are nearly 30 million places along the
chromosomes of humans (of 3 billion total places) at which we can vary. But between
any two randomly-chosen humans there are only about 3 million places at which
individual people might have different DNA sequences. So if the typical ancestry study
uses 300,000 markers (not too far from the actual number examined by commercial
laboratories nowadays), it will only be looking at 10% of the potential differences
between any two genomes, or about 0.01% of the entire genome. At best, then, these
studies scan less than 1% of a human genome. What about the other 99% or so? Much of
this remainder is not variable, but that part of it which is variable is African in origin.
This means that 99% of the total variation in any human genome should be considered as
African. And what that in turn means is that Stephen Colbert is actually 99% African, and
at most 1% Caucasian.
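The proportions in this paragraph follow from simple division. The short calculation
below uses only the approximate figures quoted in the text; the variable names are
introduced here for convenience.

```python
# Approximate figures quoted in the text.
genome_positions = 3_000_000_000   # total places along the chromosomes
variable_positions = 30_000_000    # places at which humans are known to vary
pairwise_differences = 3_000_000   # differences between two randomly chosen people
markers_tested = 300_000           # markers in a typical ancestry study

share_of_pairwise = markers_tested / pairwise_differences
share_of_variable = markers_tested / variable_positions
share_of_genome = markers_tested / genome_positions

print(f"{share_of_pairwise:.0%} of the differences between two genomes")
print(f"{share_of_variable:.0%} of the known variable positions")
print(f"{share_of_genome:.2%} of the entire genome")
```

On these figures a 300,000-marker panel covers a tenth of the differences between two
people, a hundredth of the known variable positions, and a hundredth of one percent of
the genome as a whole.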
A common argument used against this observation is known as the "Mount Everest
Paradox." The argument goes as follows: The elevation of Mount Everest differs from the
surface of the ocean by an incredibly small fraction (about 0.0008) of the Earth's
diameter. But anyone standing at the foot of Mount Everest can tell the difference, and it
is huge. Again, this is a trivial and unscientific argument: One could just as easily argue
that, to a bacterium, a golf ball looks like Mount Everest (a fraction of about 0.0002,
diameter-wise). And any golfer can tell you how hard it is to find a golf ball in the
rough. It is not the changes or differences that matter, but rather what the differences
mean, and whether or not there is an objective way to interpret them. Some researchers
prefer to interpret this information in the context of ancestry, which brings us to the sixth
major mistake Wade makes.
6. Conflating racially based genetic differences with explanation of ancestry.
The broader availability of genetic ancestry testing has made it something of the norm
amongst people who are interested in their ancestry. But what do ancestry tests tell us?
They basically tell us about the chunks of DNA in our genomes and where they might
have come from. In this context, as some authors have claimed, ancestry testing has
become a proxy for race determination. This is an unfortunate development in the use of
genetics and genomics, mostly because our genomes are mosaics of ancestry, even
including chunks of DNA that show ancestry with other species. But this ancestry
approach is also flawed when it comes to our understanding of race in humans – again,
because there are no definitions as to how many of the variants (and even more
complicated, which variants) can make a difference between groups of people. Because
ancestry can be traced all the way to the related family level, we suggest that the ancestry
approach is not informative to the hypotheses we posed in the first part of this piece. Like
race, ancestry is clinal with respect to any purported higher level, and ancestry simply
connects us with one another. So what, in the end, do genetic ancestry tests tell us?
Perhaps a good way to view the whole ancestry business is to use a term recently
appearing in the literature to describe ancestry tests from companies: "recreational
genomics." Such recreational approaches offer little, if anything, to science. It is arguable
whether they even offer anything to those engaged in the recreation.
It is often argued that, in order to study the movements of humans and their evolutionary
history, we need to speak about races. But this is entirely false, because we already have
an excellent grasp of how humans migrated in the past based on mtDNA and Y
chromosomes and the fossil record. We are not impeded at all in these endeavors by the
lack of formally defined biological races. This is because we use clinal markers that
follow individual haplotypes, and hence no a priori definition of race is needed to
interpret the results of such tree-based analyses. It is also argued that ancestry is an
important component in medicine, and the jump is made then that race is essential to the
health of people. Because we argue that medicine will soon benefit from individualized
genomics – and because, as we point out in our book, race and ancestry have been poor
tools in medicine – we suggest that there are no coherent or permanently cogent reasons
to consider race in medicine. Perhaps ancestry will be important, but a concept of race in
medicine is really barking up the wrong tree.
7. Conflating variation and allele frequency differences with adaptation (and hence
elements of the human condition).
Adaptation and allele frequencies are the focus of Wade's last five chapters, and are
extensively discussed in our book Race? Debunking a Scientific Myth. Wade's apparent
justification for this is that we need to have a notion of races so that we can explain why
some of us look different from others. Yet nearly all of the (remarkably few)
"adaptations" that can be identified appear to be intensely local in their occurrence – for
example, the diverse responses to high-altitude living, and to living under intense solar
radiation – and are not at all usefully illuminated by any concept of major "races."
As noted above, Charles Murray placed the bet that attacks on Wade's book would be
made on more sociological lines, based on scientists' fear of breaking away from
tyrannical orthodoxy. Indeed, Wade addresses this tyranny issue in the first few pages of
A Troublesome Inheritance. The fear from his perspective is that unorthodox thinking
tends to get stifled by orthodoxy so that progress, both scientific and social, is impeded.
We could not disagree more with Murray and Wade on this matter, but have refrained
from going anywhere near that kind of argument. To us, the most important thing is that
when the science itself is examined, and placed under real scrutiny, the thesis of the book
fails miserably and Mr. Murray loses his bet.
We call Wade's insistence that science advances by departure from orthodoxy the Indiana
Jones Fallacy. It is especially important to understand this idea's fallacious nature because
all of the positive reviews of Wade's book (Murray's included) have harped on the far-
reaching importance of Wade's departure from the tyranny of scientific orthodoxy.
As scientists, we recognize how gratifying it would be if every published scientific paper
were earth-shaking and unorthodox. If so, scientific progress would be rapid and
unlimited! But the sad truth is that much of science is rather boring and procedural – just
as rigor demands. Even the hypothesis that there are genetic differences amongst people
from different geographic regions – classifiable or not – is really quite mundane, since of
course there are differences, as there are in any widespread species. We don't need to
spend millions of dollars sequencing genomes to know this. The real questions are
whether or not the differences really are significant, and/or interpretable in a rigorous
scientific context, and whether the classification of people into races helps us to
understand them. In the first case, while there may be minor differences, they do not
seem to sort out on larger scales. And in the second case, the answer is a resounding
"No!"
This last point may seem at first glance a bit counter-intuitive, because on the street it is
often possible to broadly sort a fairly large proportion of your fellow citizens by general
geographic origin. And indeed, for almost all of the past 50,000 years or so since Homo
sapiens has been widely present throughout the Old World, our hunting-gathering
precursors were sparsely spread out across vast landscapes, and constantly buffeted by
rapidly-changing climatic and environmental conditions. This provided optimal
circumstances for the incorporation of minor genetic novelties into local populations, and
explains why, for example, Africans generally tend to resemble each other more closely
than they do Eastern Asians or Europeans. But all of us remained members of one single,
interbreeding species, and we guarantee that the edges between populations were never
sharp. What is more, over the past ten thousand years since the adoption of a more settled
way of life, demographic circumstances have changed entirely as populations have
mingled on a large scale and often over vast distances. This, above all, is why it is
hopeless to look for the boundaries that are necessary if we are to usefully recognize
"races." The central tendencies may be there, but the boundaries aren't. Which means that
"race" is a totally inadequate way of characterizing, or even of helping us to understand,
the glorious variety that is humankind.
Rob DeSalle is a curator at the American Museum of Natural History in the Sackler
Institute for Comparative Genomics, a co-director of its molecular laboratories and a
member of the Board of Directors of the Council for Responsible Genetics. He has
written over 300 peer-reviewed scientific publications and several books.
Ian Tattersall is curator emeritus in the American Museum of Natural History and
author of several books, including Paleontology: A Brief History of Life (2010). Tattersall
and DeSalle co-authored Race? Debunking a Scientific Myth (2012) and Human Origins:
What Bones and Genomes Tell Us about Ourselves (2007).
Literature Cited
Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran, S.,
Cann, H. M., et al. (2008). Worldwide human relationships inferred from genome-wide
patterns of variation. Science, 319(5866), 1100-1104.
Murray, C. (2014). Book review: A Troublesome Inheritance by Nicholas Wade. Wall
Street Journal, May 2, 2014.
Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky,
L. A., & Feldman, M. W. (2002). Genetic structure of human populations. Science,
298(5602), 2381-2385.
Tishkoff, S. A., Reed, F. A., Friedlaender, F. R., Ehret, C., Ranciaro, A., Froment, A., ...
& Williams, S. M. (2009). The genetic structure and history of Africans and African
Americans. Science, 324(5930), 1035-1044.
Wade, N. (2014). A Troublesome Inheritance: Genes, Race and Human History. Penguin.