The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution
ABSTRACT To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
ABSTRACT: Since the completion of the bovine sequencing projects, a substantial number of genetic variations such as single nucleotide polymorphisms have become available across the cattle genome. Recently, cataloguing such genetic variations has been accelerated using massively parallel sequencing technology. However, most of the recent studies have been concentrated on European Bos taurus cattle breeds, resulting in a severe lack of knowledge for valuable native cattle genetic resources worldwide. Here, we present the first whole-genome sequencing results for an endangered Korean native cattle breed, Chikso, using the Illumina HiSeq 2000 sequencing platform. The genome of a Chikso bull was sequenced to approximately 25.3-fold coverage with 98.8% of the bovine reference genome sequence (UMD 3.1) covered. In total, 5,874,026 single nucleotide polymorphisms and 551,363 insertion/deletions were identified across all 29 autosomes and the X-chromosome, of which 45% and 75% were previously unknown, respectively. Most of the variations (92.7% of single nucleotide polymorphisms and 92.9% of insertion/deletions) were located in intergenic and intron regions. A total of 16,273 single nucleotide polymorphisms causing missense mutations were detected in 7,111 genes throughout the genome, which could potentially contribute to variation in economically important traits in Chikso. This study provides a valuable resource for further investigations of the genetic mechanisms underlying traits of interest in cattle, and for the development of improved genomics-based breeding tools. Molecules and Cells 08/2013.
ABSTRACT: Nellore cattle play an important role in beef production in tropical systems and there is great interest in determining if genomic selection can contribute to accelerate genetic improvement of production and fertility in this breed. We present the first results of the implementation of genomic prediction in a Bos indicus (Nellore) population. Influential bulls were genotyped with the Illumina Bovine HD chip in order to assess genomic predictive ability for weight and carcass traits, gestation length, scrotal circumference and two selection indices. 685 samples and 320 238 single nucleotide polymorphisms (SNPs) were used in the analyses. A forward-prediction scheme was adopted to predict the genomic breeding values (DGV). In the training step, the estimated breeding values (EBV) of bulls were deregressed (dEBV) and used as pseudo-phenotypes to estimate marker effects using four methods: genomic BLUP with or without a residual polygenic effect (GBLUP20 and GBLUP0, respectively), a mixture model (Bayes C) and Bayesian LASSO (BLASSO). Empirical accuracies of the resulting genomic predictions were assessed based on the correlation between DGV and dEBV for the testing group. Accuracies of genomic predictions ranged from 0.17 (navel at weaning) to 0.74 (finishing precocity). Across traits, Bayesian regression models (Bayes C and BLASSO) were more accurate than GBLUP. The average empirical accuracies were 0.39 (GBLUP0), 0.40 (GBLUP20) and 0.44 (Bayes C and BLASSO). Bayes C and BLASSO tended to produce deflated predictions (i.e. slope of the regression of dEBV on DGV greater than 1). Further analyses suggested that higher-than-expected accuracies were observed for traits for which EBV means differed significantly between two breeding subgroups that were identified in a principal component analysis based on genomic relationships. 
Bayesian regression models are of interest for future applications of genomic selection in this population, but further improvements are needed to reduce deflation of their predictions. Recurrent updates of the training population would be required to enable accurate prediction of the genetic merit of young animals. The technical feasibility of applying genomic prediction in a Bos indicus (Nellore) population was demonstrated. Further research is needed to permit cost-effective selection decisions using genomic information. Genetics Selection Evolution 02/2014; 46(1):17.
ABSTRACT: Domestication traits that distinguish animal and plant species from closely related wild species are discussed. Data are presented showing that such traits appear not only at the phenotypic level but also in the polymorphism of electrophoretic variants of protein groups with different biochemical functions and of DNA fragments flanked by inverted short repeats. The assumption that retroviral infections can participate in the formation of domestication traits is substantiated. Russian Agricultural Sciences 39(1).
The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution
The Bovine Genome Sequencing and Analysis Consortium,* Christine G. Elsik,1 Ross L. Tellam,2 Kim C. Worley3
24 APRIL 2009 VOL 324
Domesticated cattle (Bos taurus and Bos
of nutrition and livelihood to nearly 6.6
billion humans. Cattle belong to a clade phylogenetically
distant from humans and rodents, the
Cetartiodactyl order of eutherian mammals, which
first appeared ~60 million years ago (1). Cattle
represent the Ruminantia, which occupy diverse
terrestrial environments with their ability to
efficiently convert low-quality forage into energy-
dense fat, muscle, and milk. These biological
processes have been exploited by humans since
domestication, which began in the Near East some
important world heritage and a scientific resource
for understanding the genetics of complex traits.
The cattle genome was assembled with
methods similar to those used for the rat and sea
urchin genomes (3, 4). The most recent assem-
blies, Btau3.1 and Btau4.0, combined bacterial
shotgun (WGS) sequences. Btau3.1 was used for
gene-specific analyses. Btau4.0, which includes
finished sequence data and used different map-
ping methods to place the sequence on chromo-
somes, was used for all global analyses other
than gene prediction. The contig N50 (50% of
the genome is in contigs of this size or greater)
is 48.7 kb for both assemblies; the scaffold N50
for Btau4.0 is 1.9 Mb. In the Btau4.0 assembly,
90% of the total genome sequence was placed
on the 29 autosomes and X chromosome and
tag (EST) sequences, 95.0% were contained in
the assembled contigs. With an equivalent gene
distribution in the remaining 5% of the genome,
the estimated genome size is 2.87 Gbp. Compar-
ison with 73 finished BACs and single-nucleotide
polymorphism (SNP) linkage data (5, 6) con-
firmed this assembly quality with greater than
92% genomic coverage, and fewer than 0.8% of
SNPs were incorrectly positioned at the resolu-
tion of these maps (3, 4).
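The N50 statistic quoted above can be illustrated with a minimal Python sketch. It assumes that N50 is taken relative to the total assembled length (not an independent genome-size estimate), and the function name `n50` and the toy contig lengths are hypothetical; this is not the consortium's assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Made-up contig lengths (in kb), purely for illustration.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In this toy set the two longest contigs (80 + 70 = 150) already hold half of the 300 kb total, so the N50 is 70 kb.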
We used the cattle genome to catalog protein-
coding genes, microRNA (miRNA) genes, and
ruminant-specific interspersed repeats, and we
manually annotated over 4000 genes. The
consensus protein-coding gene set for Btau3.1
(OGSv1), from six predicted gene sets (4),
consists of 26,835 genes with a validation rate
of 82% (4). On this basis, we estimate that the
cattle genome contains at least 22,000 protein-
coding genes. We identified 496 miRNA genes
of which 135 were unpublished miRNAs (4).
About half of the cattle miRNA occur in 60 ge-
nomic miRNA clusters, containing two to seven
miRNA genes separated by less than 10 kbp (fig.
S2). The overall GC content of the cattle genome
is 41.7%, with an observed-to-expected CpG
ratio of 0.234, similar to that of other mammals.
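As an illustration of the two sequence statistics just mentioned, the sketch below computes GC content and an observed-to-expected CpG ratio for a single sequence. The text does not state which formula was used; the code assumes one common definition, (number of CpG × sequence length) / (number of C × number of G), and the function names and toy sequence are hypothetical.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def cpg_obs_exp(seq):
    """Observed-to-expected CpG ratio under the assumed definition:
    (count of 'CG' dinucleotides * length) / (count of C * count of G).
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        raise ValueError("ratio is undefined without both C and G")
    return seq.count("CG") * len(seq) / (c * g)


# Made-up 16-base sequence, purely for illustration.
toy = "ACGTACGTTTAACCGG"
print(gc_content(toy), cpg_obs_exp(toy))
```

A ratio well below 1, such as the 0.234 reported here, indicates that CpG dinucleotides are depleted relative to what the base composition alone would predict.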
The cattle genome has transposable element
classes like those of other mammals, as well as
large numbers of ruminant-specific repeats (table
S4) that compose 27% of its genome. The
1Department of Biology, 406 Reiss, Georgetown University, 37th and O Streets, NW, Washington, DC 20057, USA. E-mail: email@example.com
2CSIRO Livestock Industries, 306 Carmody Road, St. Lucia, QLD 4067, Australia. E-mail: ross.tellam@csiro.au
3Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, MS BCM226, One Baylor Plaza, Houston, TX 77030, USA. E-mail:
*All authors with their affiliations and contributions are listed at the end of this paper.
Fig. 1. Protein orthology comparison among genomes of cattle, dog, human,
mouse, and rat (Bos taurus, Canis familiaris, Homo sapiens, Mus musculus,
Rattus norvegicus, representing placental mammals), opossum (Monodelphis
domestica, marsupial), and platypus (Ornithorhynchus anatinus, monotreme).
(A) The majority of mammalian genes are orthologous, with more than half
preserved as single copies (dark blue); a few thousand have species-specific
duplications (blue); another few thousand have been lost in specific lineages
(orange). We also show those lacking confident orthology assignment (green),
and those that are apparently lineage specific [unique (white)]. Placental-
specific orthologs are shown in pink. Single- or multiple-copy genes were
defined on the basis of representatives in human, bovine, or dog; mouse or
rat; and opossum or platypus. (B) Venn diagram showing shared orthologous
groups (duplicated genes were counted as one) between laurasiatherians
(cattle and dog), human, rodents (mouse and rat), and nonplacental mammals
(opossum and platypus) on the basis of the presence of a representative gene
in at least one of the grouped species [as in (A)]. (C) Distribution of ortholog
protein identities between human and the other species for a subset of strictly
conserved single-copy orthologs. (D) A maximum likelihood phylogenetic tree
using all single-copy orthologs supports the accepted phylogeny and quantifies
the relative rates of molecular evolution expressed as the branch lengths.
nuclear element (LINE) lacked a functional open
inactive (7). However, Bov-B repeats with intact
ORF were identified in the genome, and their
phylogeny (fig. S4) indicates that some are still
actively expanding and evolving. Mapping chro-
mosomal segments of high- and low-density
ancient repeat content, L2/MIR [a LINE/SINE
(short interspersed nuclear element) pair] and
Bov-B, and more recent repeats, Bov-B/ART2A
(Bov-B–derived SINE pair), revealed that the
genome consists of ancient regions enriched for
L2/MIR and recent regions enriched for Bov-B/
ART2A (fig. S7). Exclusion of Bov-B/ART2A
from contiguous blocks of ancient repeats sug-
gests that evolution of the ruminant or cattle ge-
nome experienced invasions of new repeats into
regions lacking ancient repeats. Alternatively,
older repeats may have been destroyed by inser-
tion of ruminant- or cattle-specific repeats. AGC
trinucleotide repeats, the most common simple-
sequence repeat (SSR) in artiodactyls (which
include cattle, pigs, and sheep), are 90- and 142-
fold overrepresented in cattle compared with hu-
man and dog, respectively (fig. S10). Of the
AGC repeats in the cattle genome, 39% were
associated with Bov-A2 SINE elements.
A comparative analysis examined the rate of
protein evolution and the conservation of gene
repertoires among orthologs in the genomes of
cattle, dog, human, mouse, and rat (placental
mammals); opossum (marsupial); and platypus
(monotreme). Orthology was resolved for
>75% of cattle and >80% of human genes (Fig.
1A). There were 14,345 orthologous groups with
representatives in human, cattle, or dog; mouse
or rat; and opossum or platypus, which represent
16,749 cattle and 16,177 human genes, respectively.
We also identified 1217 placental mammal–
specific orthologous groups with genes present in
human, cattle, or dog; mouse or rat; but not opos-
sum or platypus. About 1000 orthologs shared
between rodents and laurasiatherians (cattle and
dog), many of which encode G protein–coupled
receptors, appear to have been lost or may be
misannotated in the human genome (Fig. 1B).
Gene repertoire conservation among these mam-
mals correlates with conservation at the amino
acid–sequence level (Fig. 1C). The elevated rate
(8) was supported by the higher amino acid se-
quence identity between human and dog or cattle
proteins. However, maximum-likelihood analysis
of amino acid substitutions in single-copy ortho-
primates and rodents (1) (Fig. 1D).
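The presence rules used above to define the core and placental-specific orthologous groups can be sketched as follows. This only encodes the stated species logic; the function name, category labels, and example groups are hypothetical, and the actual orthology assignment procedure (4) is not reproduced here.

```python
# Species groupings as described in the text.
HUMAN_CATTLE_DOG = {"human", "cattle", "dog"}
RODENTS = {"mouse", "rat"}
NONPLACENTAL = {"opossum", "platypus"}


def classify_group(species_present):
    """Classify an orthologous group by which species have a member gene."""
    present = set(species_present)
    in_first = bool(present & HUMAN_CATTLE_DOG)
    in_rodents = bool(present & RODENTS)
    in_nonplacental = bool(present & NONPLACENTAL)
    if in_first and in_rodents and in_nonplacental:
        return "core"                # counted among the 14,345 groups
    if in_first and in_rodents:
        return "placental_specific"  # counted among the 1217 groups
    return "other"


print(classify_group(["cattle", "mouse", "platypus"]))  # prints core
print(classify_group(["human", "dog", "rat"]))          # prints placental_specific
```

Because a single representative per grouping is enough, a group can count as core even when a gene is missing from, say, human but present in cattle or dog.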
Alternative splicing is a major mechanism for
transcript diversification (9), yet the extent of its
evolutionary conservation and functional impact
remain unclear. We used the cattle genome to
analyze the conservation of the most common
form of alternative splicing, exon skipping, de-
fined as a triplet of exons in which the middle
exon is absent in some transcripts, in a set of
dog, and cattle (4). We examined 277 cases, with
different conservation patterns between human and
mouse, in 16 different cattle tissues with reverse
transcription polymerase chain reaction (4). These
splicing events were divided into a shared set (163
in both human and mouse) and a nonshared set
(114 in human but not in mouse). Of the 277, we
S5), which suggested that the majority of genes
with exon-skipping in human were present and
regulated in cattle and that, if an event is shared
our data agree with the upper bound from previous
analyses with human and rodents [e.g., (10)].
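The exon-skipping definition given above (a triplet of exons whose middle exon is absent in some transcripts) can be made concrete with a short sketch. It assumes each transcript is represented as an ordered tuple of exon identifiers for one gene; the function name and the toy transcripts are hypothetical and do not reflect the data formats used in (4).

```python
def skipped_exon_triplets(transcripts):
    """Find exon triplets (a, b, c) that are consecutive in one transcript
    while another transcript of the same gene joins a directly to c."""
    adjacent = set()   # pairs of exons spliced directly together
    triplets = set()   # runs of three consecutive exons
    for exons in transcripts:
        for i in range(len(exons) - 1):
            adjacent.add((exons[i], exons[i + 1]))
        for i in range(len(exons) - 2):
            triplets.add((exons[i], exons[i + 1], exons[i + 2]))
    return sorted(t for t in triplets if (t[0], t[2]) in adjacent)


# Two made-up transcripts of one gene; the second skips exon e2.
toy = [("e1", "e2", "e3", "e4"), ("e1", "e3", "e4")]
print(skipped_exon_triplets(toy))  # prints [('e1', 'e2', 'e3')]
```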
We constructed a cattle-human Oxford grid
(fig. S12) (4) to conduct synteny-based chromo-
genome organization is more similar to cattle's
than rodents' because most cattle chromosomes
primarily correspond to part of one human chro-
mosome, albeit with multiple rearrangements
[e.g., (11)]. In contrast, the cattle-mouse Oxford
Lineage-specific evolutionary breakpoints were
(a group encompassing artiodactyls and carni-
coordinates (Fig. 2) (4). Primate, dog, rodent,
mouse, and rat lineage-specific breakpoint posi-
tionary breakpoint regions (EBRs) were identified
in the cattle lineage, of which 100 were cattle- or
ruminant-specific and 24 were artiodactyl-specific
(e.g., Fig. 2). Nine additional EBRs represent pre-
sumptive ferungulate-specific rearrangements. Bos
taurus chromosome 16 (BTA16) is populated with
four ferungulate-specific EBRs, which suggests
that this region was rearranged before the Artio-
dactyla and Carnivora divergence (Fig. 2). Such
conserved regions demonstrate that many inver-
sions that occurred before the divergence of the
carnivores and artiodactyls have probably been
retained in the ancestral form within the human
genome. In contrast to the cattle genome, a pig
physical map identified only 77 lineage-specific
EBRs. Interchromosomal rearrangements and in-
versions characterize most of the lineage-specific
Fig. 2. Examples of EBRs. Ferungulate-, artiodactyl-,
and primate-specific EBRs on HSA1 at 175 to 247
Mbp (other lineage-specific EBRs not shown).
Homologous synteny blocks constructed for the
macaque, chimp, cattle, dog, mouse, rat, and pig
genomes were used for pairwise comparisons (4).
White areas correspond to EBRs. Arrows to the right
of the chromosome ideogram indicate positions of
representative cattle-specific; artiodactyl-specific
(specific to the chromosomes of pigs and cattle);
ferungulate-specific (cattle, dog, and pig); primate-
specific (human, macaque, and chimp); and
hominoid-specific (human and chimp) rearrange-
ments. Opossum is shown as an outgroup to the
eutherian clade, which allows classification of
Table 1. Changes in the number of genes in innate
are present in unassigned scaffolds, i.e., they are not
β-defensin genes is uncertain. Interferon subfamily
pseudogenes predicted on the basis of frame-shift
mutations or stop codons within the first 100 amino
subfamily of IFN and are so named for convenience.
BPI, Bactericidal and/or permeability-increasing;
RNase, ribonuclease; LBP, lipopolysaccharide-binding
protein; ULBP, UL16-binding protein.
An examination of repeat families and in-
dividual transposable elements within cattle-,
a significantly higher density of LINE-L1 ele-
ments and the ruminant-specific LINE-RTE re-
peat family (12) in cattle-specific EBRs relative
to the remainder of the cattle genome (table S6).
In contrast, the SINE-BovA repeat family and
the more ancient tRNAGlu-derived SINE repeats
(13) were present in lower density in cattle-
(table S7). The differences in repeat densities
were generally consistent in cattle-, artiodactyl-
and ferungulate-specific EBRs, with the excep-
tion of the tRNAGlu-derived and LTR-ERVL
repeats, which are at higher densities in artiodac-
tyl EBRs compared with the rest of the genome.
common ancestor of Suina (pigs and peccaries),
Ruminantia, and Cetacea (whales) (13), which
suggests that tRNAGlu-derived SINEs were
involved in ancestral artiodactyl chromosome re-
arrangements. Furthermore, the lower density of
the more ancient repeat families in cattle-specific
EBRs suggests that either more recently arising
repeat elements were inserted into regions lack-
ing ancient repeats or that older repeats were
destroyed by this insertion (table S7). The repeat
elements differing in density in EBRs were also
found in regions of homologous synteny, which
suggests that repeats may promote evolutionary
density in cattle-specific EBRs are thus unlikely
to be caused by the accumulation of repeats in
EBRs after such rearrangements occur. We
identified a cattle-specific EBR associated with
a bidirectional promoter (figs. S14 and S15) that
may affect control of the expression of the
CYB5R4 gene, which has been implicated in
human diabetes and, therefore, may be important
in the regulation of energy flow in cattle (4).
We identified 1020 segmental duplications
(SDs) corresponding to 3.1% (94.4 Mbp) of the
cattle genome (4). Duplications assigned to a
chromosome showed a bipartite distribution with
respect to length and percent identity (fig. S16),
and interchromosomal duplications were shorter
identity) relative to intrachromosomal duplications
(median length 20 kbp, ~97% identity) and tended
to be locally clustered (fig. S17). Twenty-one of
these duplications were >300 kbp and located in
regions enriched for tandem duplications (e.g.,
BTA18) (fig. S18). This pattern is reminiscent of
the duplication pattern of the dog, rat, and mouse
but different from that of primate and great-ape
genomes (14, 15). On average, cattle SDs >10 kbp
represent 11.7% of base pairs in 10-kbp intervals
located within cattle-specific EBRs and 23.0% of
base pairs located within the artiodactyl-specific
EBRs. By contrast, in the remainder of the genome
sequence assigned to chromosomes the fraction of
SDs was 1.7% (P < 1 × 10^−12). These data indicate
that SDs play a role in promoting chromosome
nation [e.g., (16)] and suggest that either a
significant fraction of the SDs observed in cattle
occurred before the Ruminant-Suina split, and/or
that the sites for accumulation of SDs are non-
randomly distributed in artiodactyl genomes.
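The comparison above rests on the fraction of base pairs in a set of regions that is covered by SDs. A minimal sketch of that kind of calculation is given below, assuming half-open (start, end) coordinates on a single chromosome; the function names and toy coordinates are hypothetical, and the sketch does not reproduce the 10-kbp windowing or the statistical test used in (4).

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(regions, features):
    """Fraction of base pairs in `regions` overlapped by `features`."""
    feats = merge(features)
    total = covered = 0
    for r_start, r_end in merge(regions):
        total += r_end - r_start
        for f_start, f_end in feats:
            covered += max(0, min(r_end, f_end) - max(r_start, f_start))
    if total == 0:
        raise ValueError("regions are empty")
    return covered / total


# Made-up coordinates: two regions and three duplications.
regions = [(0, 100), (200, 300)]
duplications = [(50, 120), (110, 130), (250, 260)]
print(covered_fraction(regions, duplications))
```

With the figures reported above, 11.7% in cattle-specific EBRs against 1.7% in the rest of the assigned sequence corresponds to roughly a 6.9-fold difference, and 23.0% against 1.7% to roughly a 13.5-fold difference.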
SDs involving genic regions may give rise to
new functional paralogs. Seventy-six percent
(778 out of 1020) of the cattle SDs correspond
sequence identity (median 98.7%). This suggests
that many of these gene duplications are specific
to either the Artiodactyla or the Bos lineage and
tend to encode proteins that often interface with
the external environment, particularly immune
proteins and sensory and/or olfactory receptors.
Several of these gene duplications are also
duplicated in other mammalian lineages (e.g.,
A, defensins, and pregnancy-associated glyco-
proteins). Paralogs located in segmental duplica-
tions that are present exclusively in cattle may
have functional implications for the unique phys-
iology, environment, and diet of cattle.
An overrepresentation of genes involved in
reproduction in cattle SDs (tables S8 and S9) is
associated with several gene families expressed in
the ruminant placenta. These families encode the
glycoproteins (on BTA29), trophoblast Kunitz
domain proteins (on BTA13), and interferon tau
(IFNT) (on BTA8). A gene family encoding
prolactin-related proteins (on BTA23) was only
identified in the assembly-dependent analysis of
of fetal growth, maternal adaptations to pregnancy,
and the coordination of parturition (17, 18). Al-
though type I interferon (IFN) genes are primarily
sion of the corpus luteum during early pregnancy,
which results in a uterine environment receptive to
early conceptus development (20).
Signatures of positive selection (obtained by
measurement of their rates of synonymous and
(4), including 10 immune-related genes (i.e.,
IFNAR2, IFNG, CD34, TREM1, TREML1,
viously mentioned, immune genes are overrepre-
of genes varying in cattle relative to mouse include
a cluster of β-defensin genes, which encode antimicrobial
peptides; the antimicrobial cathelicidin
genes [which show increased sequence diversity of
the mature cathelicidin peptides (21)]; and changes
ber and organization of genes involved in adaptive
immune responses in cattle compared with human
and mouse (4). This extensive duplication and di-
be because of the substantial load of microorga-
the risk of opportunistic infections at mucosal sur-
stronger and more diversified innate immune re-
sponses at these locations. Another possibility is
that immunity may have been under selection due
to the herd structure, which can promote rapid dis-
ease transmission. Also, immune function–related
duplicated genes have gained nonimmune functions,
e.g., IFNT (see above), and the C-class lysozyme
in the abomasum (see below).
There has been substantial reorganization of
gene families encoding proteins present in milk.
One such rearrangement affecting milk compo-
sition involves the histatherin (HSTN) gene with-
in the casein gene cluster on BTA6 (fig. S21). In
the cattle genome, HSTN is juxtaposed to a
regulatory element (BCE) important (23) for β-casein
(CSN2) expression, and as a probable
consequence, HSTN is regulated like the casein
genes during the lactation cycle. This rearrange-
the BCE is also the probable cause of deletion of
one of the two copies of α-S2–like casein genes
(CSN1S2A) present in other mammalian genomes
(24). The biological implications of this change
in casein gene copy number are not yet clear.
gene cluster arose from both a laurasiatherian
SD and a cattle-specific EBR, which resulted in
two mammary gland–expressed SAA3-like genes,
gene on BTA15 (fig. S21). SAA3.2 has been
shown to inhibit microbial growth (25). Two ad-
ditional milk protein genes were associated with
SDs: cathelicidin (CATHL1) and β2-microglobulin
(B2M)—part of the neonatal Fc receptor (FcRn)
cells of many tissues including the gut and
mammary gland (26, 27). IgG is the predominant
in human milk (28). Unlike humans, who acquire
passive immunity from the mother via placental
transfer of immunoglobulins during pregnancy,
calves acquire passive immunity by ingestion of
IgG in milk (28). B2M is also redistributed in
epithelial cells upon calving, and it protects IgG
negative effects on passive immune transfer (29).
The additional copy of the gene encoding B2M
cows’ milk and an increased capacity for uptake in
fer of immunity to the calf is one of the important
functions of milk, it is striking that lactation-related
genes affected by genomic rearrangements often
encode immune-related proteins in milk.
Cattle metabolic pathways demonstrated a
strong degree of conservation among the compre-
hensive set of genes involved in core mammalian
metabolism (4) and permitted an examination of
unique genetic events that may be related to
ruminant-specific metabolic adaptations. However,
among 1032 genes examined from the human
metabolic pathways, five were deleted or extensively
diverged in cattle: PLA2G4C (phospholipase A2,
group IVC), FAAH2 (fatty acid amide hydrolase 2),
IDI2 (isopentenyl-diphosphate delta isomerase 2),
GSTT2 (glutathione S-transferase
theta 2), and TYMP (thymidine phosphorylase),
which may be adaptations that impact on fatty
acid metabolism (PLA2G4C and FAAH2); the
mins, steroid hormones, and cholesterol) (IDI2);
detoxification (GSTT2); and pyrimidine metabo-
lism (TYMP). Phylogenetic analysis shows that
PLA2G4C was deleted ~87 to 97 million years
ago in the laurasiatherian lineages (fig. S22).
Strikingly, ~20% of the sequences from two
abomasum (last chamber of the cattle stomach)
EST libraries (a total of 2392 sequences) correspond
to three C-type lysozyme genes. Lysozyme
primarily functions in animals as an antibacterial
protein, which suggests that they probably func-
tion in the abomasum (similar to the monogastric
stomach) to degrade the cell walls of bacteria
contains 10 C-type lysozyme genes (table S14
that six of the seven remaining C-type lysozyme
genes are expressed primarily in the intestinal
tract, which suggests additional roles for the
encoded proteins in ruminant digestion.
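The proportions quoted in this paragraph can be turned into approximate counts. The short Python sketch below is only an illustration: the constant and function names are hypothetical (they do not come from the paper), and because the text gives the EST share as "~20%", the resulting count is an estimate rather than a reported value.

```python
# Figures quoted in the text; all names here are illustrative assumptions.
TOTAL_ABOMASUM_ESTS = 2392        # two abomasum EST libraries combined
LYSOZYME_EST_FRACTION = 0.20      # "~20%" in the text, so only approximate
TOTAL_C_TYPE_LYSOZYME_GENES = 10  # C-type lysozyme genes in the cattle genome
ABOMASUM_LYSOZYME_GENES = 3       # genes accounting for the abomasum ESTs


def approx_lysozyme_ests(total=TOTAL_ABOMASUM_ESTS,
                         fraction=LYSOZYME_EST_FRACTION):
    """Estimate how many ESTs correspond to C-type lysozyme genes."""
    return round(total * fraction)


def remaining_lysozyme_genes(total=TOTAL_C_TYPE_LYSOZYME_GENES,
                             abomasum=ABOMASUM_LYSOZYME_GENES):
    """Count the C-type lysozyme genes not represented in the abomasum ESTs."""
    return total - abomasum


if __name__ == "__main__":
    print("Approximate lysozyme ESTs:", approx_lysozyme_ests())
    print("Remaining C-type lysozyme genes:", remaining_lysozyme_genes())
```

Under these assumptions, roughly 478 of the 2392 ESTs would derive from the three abomasum lysozyme genes, and the remaining count of seven matches the "seven remaining" genes mentioned in the text.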
In summary, the biological systems most af-
fected by changes in the number and organization
of genes in the cattle lineage include reproduction,
immunity, lactation, and digestion. We highlighted
the evolutionary activity associated with chromo-
somal breakpoint regions and their propensity for
promoting gene birth and rearrangement. These
changes in the cattle lineage probably reflect meta-
bolic, physiologic, and immune adaptations due to
microbial fermentation in the rumen, the herd
environment and its influence on disease transmis-
sion, and the reproductive strategy of cattle. The
cattle genome and associated resources will
facilitate the identification of novel functions and
regulatory systems of general importance in mam-
mals and may provide an enabling tool for genetic
improvement within the beef and dairy industries.
References and Notes
1. W. J. Murphy, P. A. Pevzner, S. J. O'Brien, Trends Genet.
20, 631 (2004).
2. R. L. Willham, J. Anim. Sci. 62, 1742 (1986).
3. Y. Liu et al., BMC Genomics 10, 180 (2009).
4. Materials, methods, and additional discussion are available
on Science online.
5. H. Nilsen et al., Anim. Genet. 39, 97 (2008).
6. A. Prasad et al., BMC Genomics 8, 310 (2007).
7. H. S. Malik, T. H. Eickbush, Mol. Biol. Evol. 15, 1123 (1998).
8. C. I. Wu, W. H. Li, Proc. Natl. Acad. Sci. U.S.A. 82, 1741
9. B. Modrek, C. J. Lee, Nat. Genet. 34, 177 (2003).
10. R. Sorek, R. Shamir, G. Ast, Trends Genet. 20, 68 (2004).
11. A. Everts-van der Wind et al., Proc. Natl. Acad. Sci. U.S.A.
102, 18526 (2005).
12. D. Kordis, F. Gubensek, Gene 238, 171 (1999).
13. M. Shimamura, H. Abe, M. Nikaido, K. Ohshima, N. Okada,
Mol. Biol. Evol. 16, 1046 (1999).
14. J. A. Bailey, E. E. Eichler, Nat. Rev. Genet. 7, 552 (2006).
15. J. A. Bailey et al., Science 297, 1003 (2002).
16. W. J. Murphy et al., Science 309, 613 (2005).
17. K. Hashizume et al., Reprod. Fertil. Dev. 19, 79 (2007).
18. J. H. Larson et al., Physiol. Genomics 25, 405 (2006).
19. S. Y. Zhang et al., Immunol. Rev. 226, 29 (2008).
20. R. M. Roberts, Y. Chen, T. Ezashi, A. M. Walker, Semin.
Cell Dev. Biol. 19, 170 (2008).
21. M. Scocchi, S. Wang, M. Zanetti, FEBS Lett. 417, 311 (1997).
22. M. G. Katze, Y. He, M. Gale Jr., Nat. Rev. Immunol. 2,
23. C. Schmidhauser et al., Mol. Biol. Cell 3, 699 (1992).
24. M. Rijnkels, L. Elnitski, W. Miller, J. M. Rosen, Genomics
82, 417 (2003).
25. A. J. Molenaar et al., Biomarkers 14, 26 (2009).
26. B. Mayer et al., J. Dairy Res. 72 (suppl. S1), 107 (2005).
28. T. J. Newby, C. R. Stokes, F. J. Bourne, Vet. Immunol.
Immunopathol. 3, 67 (1982).
29. M. L. Clawson et al., Mamm. Genome 15, 227 (2004).
30. D. M. Irwin, J. Mol. Evol. 41, 299 (1995).
31. J. H. Larson et al., BMC Genomics 7, 227 (2006).
32. Funded by the National Human Genome Research
Institute (NHGRI U54 HG003273); the U.S. Department
of Agriculture's Agricultural Research Service (USDA-ARS
agreement no. 59-0790-3-196) and Cooperative State
Research, Education, and Extension Service National
Research Initiative (grant no. 2004-35216-14163); the
state of Texas; Genome Canada through Genome British
Columbia; the Alberta Science and Research Authority;
the Commonwealth Scientific and Industrial Research
Organization of Australia (CSIRO); Agritech Investments
Ltd., Dairy Insight, Inc., and AgResearch Ltd., all of New
Zealand; the Research Council of Norway; the Kleberg
Foundation; and the National, Texas, and South Dakota
Beef Check-off Funds. The master accession for this WGS
sequencing project is AAFC03000000. The individual WGS
sequences are AAFC03000001 to AAFC03131728, and the
scaffold records are CM000177 to CM000206 (chromosomes)
and DS490632 to DS495890 (unplaced scaffolds).
The Bovine Genome Sequencing and Analysis Consortium
Principal Investigator: Richard A. Gibbs1
Analysis project leadership: Christine G. Elsik,2,3Ross L. Tellam4
Sequencing project leadership: Richard A. Gibbs,1Donna M.
Muzny,1George M. Weinstock5,1
Analysis group organization: David L. Adelson,6Evan E. Eichler,7,8
Laura Elnitski,9Christine G. Elsik,2,3Roderic Guigó,10Debora L.
Hamernik,11Steve M. Kappes,12Harris A. Lewin,13,14David J. Lynn,15
Frank W. Nicholas,16Alexandre Reymond,17Monique Rijnkels,18
Loren C. Skow,19Ross L. Tellam,4Kim C. Worley,1Evgeny M.
Sequencing project white paper: Richard A. Gibbs,1Steve M.
Kappes,12Lawrence Schook,13Loren C. Skow,19George M.
Gene prediction and consensus gene set: Tyler Alioto,10
Stylianos E. Antonarakis,20Alex Astashyn,24Charles E. Chapple,10
Hsiu-Chuan Chen,24Jacqueline Chrast,17Francisco Câmara,10
Christine G. Elsik2,3(leader), Olga Ermolaeva,24Roderic Guigó,10
Charlotte N. Henrichsen,17Wratko Hlavina,24Yuri Kapustin,24Boris
Kiryutin,24Paul Kitts,24Felix Kokocinski,25Melissa Landrum,24
Donna Maglott,24Kim Pruitt,24Alexandre Reymond,17Victor
Sapojnikov,24Stephen M. Searle,25Victor Solovyev,26Alexandre
Souvorov,24Catherine Ucla,20George M. Weinstock,5,1Carine Wyss20
Experimental validation of gene set: Tyler Alioto,10Stylianos E.
Antonarakis,20Charles E. Chapple,10Jacqueline Chrast,17Francisco
Câmara,10Roderic Guigó10(leader), Charlotte N. Henrichsen,17
Alexandre Reymond,17Catherine Ucla,20Carine Wyss20
MicroRNA analysis: Juan M. Anzola,3Daniel Gerlach,20,21Evgeny M.
GC composition analysis: Eran Elhaik,27,28Christine G. Elsik2,3
(leader), Dan Graur,27Justin T. Reese2
Repeat analysis: David L. Adelson6(leader), Robert C. Edgar,29
John C. McEwan,30Gemma M. Payne,30Joy M. Raison31
Protein ortholog analysis: Thomas Junier,19,20Evgenia V.
Kriventseva,32Evgeny M. Zdobnov20,21,22(leader)
Exon-skipping analysis: Jacqueline Chrast,17Eduardo Eyras,33,34
Charlotte N. Henrichsen,17Mireya Plass,34Alexandre Reymond17
Evolutionary breakpoint analysis and Oxford grid: Ravikiran
Donthu,13Denis M. Larkin,13,14Harris A. Lewin13,14(leader), Frank
Bidirectional promoter analysis: Laura Elnitski9(leader), Denis M.
Larkin,13,14Harris A. Lewin,13,14James Reecy,35Mary Q. Yang9
Segmental duplication analysis: David L. Adelson,6Lin Chen,7Ze
Cheng,7Carol G. Chitko-McKown,36Evan E. Eichler7,8(leader),
Laura Elnitski,9Christine G. Elsik,2,3George E. Liu,37Lakshmi K.
Matukumalli,38,37Jiuzhou Song,39Bin Zhu39
Analysis of gene ontology in segmental duplications: Christine G.
Elsik,2,3David J. Lynn15(leader), Justin T. Reese2
Adaptive evolution: Daniel G. Bradley,40Fiona S.L. Brinkman,15
Lilian P.L. Lau,40David J. Lynn15(leader), Matthew D. Whiteside15
Innate immunity: Ross L. Tellam4(leader), Angela Walker,41
Thomas T. Wheeler42
Lactation: Theresa Casey,43J. Bruce German,44,45Danielle G. Lemay,45
David J. Lynn,15Nauman J. Maqbool,46Adrian J. Molenaar,42
Metabolism: Harris A. Lewin13,14(leader), Seongwon Seo,47Paul
Adaptive immunity: Cynthia L. Baldwin,49Rebecca Baxter,50
Candice L. Brinkmeyer-Langford,19Wendy C. Brown,51Christopher
P. Childers,2Timothy Connelley,52Shirley A. Ellis,53Krista Fritz,19
Elizabeth J. Glass,50Carolyn T.A. Herzig,49Antti Iivanainen,54
Kevin K. Lahmers,51Loren C. Skow19(leader)
Annotation data management: Anna K. Bennett,2Christopher P.
Childers,2C. Michael Dickens,3Christine G. Elsik2,3(leader), James
G.R. Gilbert,25Darren E. Hagen,2Justin T. Reese,2Hanni Salih3
Manual annotation organization: Jan Aerts,55Alexandre R.
Caetano,56Brian Dalrymple,4Christine G. Elsik,2,3Jose Fernando
Garcia,57Richard A. Gibbs,1Clare A. Gill,3,58Debora L. Hamernik,11
Stefan G. Hiendleder,59Erdogan Memili,60Frank W. Nicholas,16
James Reecy,35Monique Rijnkels,18Loren C. Skow,19Diane
Spurlock,35Paul Stothard,48Ross L. Tellam,4George M. Weinstock,5,1
John L. Williams,61Kim C. Worley1
cDNA tissues, libraries, and sequencing: Lee Alexander,62
Michael J. Brownstein,63Leluo Guan,48Robert A. Holt64(leader),
Steven J.M. Jones64(leader), Marco A. Marra64(leader), Richard
Moore,64Stephen S. Moore48(leader), Andy Roberts,62Masaaki
Taniguchi,65,48Richard C. Waterman62
Genome sequence production: Joseph Chacko,1Mimi M.
Chandrabose,1Andy Cree1(leader), Marvin Diep Dao,1Huyen H.
Dinh1(leader), Ramatu Ayiesha Gabisi,1Sandra Hines,1Jennifer
Hume1(leader), Shalini N. Jhangiani,1Vandita Joshi,1Christie L.
Kovar1(leader), Lora R. Lewis,1Yih-shin Liu,1John Lopez,1
Margaret B. Morgan,1Donna M. Muzny1(leader), Ngoc Bich
Nguyen,1Geoffrey O. Okwuonu,1San Juana Ruiz,1Jireh
Santibanez,1Rita A. Wright1
Sequence finishing: Christian Buhay1(leader), Yan Ding,1
Shannon Dugan-Rocha1(leader), Judith Herdandez,1Michael
Automated BAC assembly: Amy Egan,1Jason Goodell,1Katarzyna
Sequence production informatics: Gerald R. Fowler1(leader),
Matthew Edward Hitchens,1Ryan J. Lozado,1Charles Moen,1David
Steffen,66,1James T. Warren,1Jingkun Zhang1
BAC mapping: Readman Chiu,64Steven J.M. Jones,64Marco A.
Marra64(leader), Jacqueline E. Schein64
Genome assembly: K. James Durbin,67,1Paul Havlak,68,1Huaiyang
Jiang,1Yue Liu,1Xiang Qin,1Yanru Ren,1Yufeng Shen,1,69Henry
Song,1George M. Weinstock,5,1Kim C. Worley1(leader)
Sequence library production: Stephanie Nicole Bell,1Clay Davis,1
Angela Jolivet Johnson,1Sandra Lee,1Lynne V. Nazareth1(leader),
Bella Mayurkumar Patel,1Ling-Ling Pu,1Selina Vattathil,1Rex Lee
BAC production: Stacey Curry,1Cerissa Hamilton,1Erica
Sequence variation detection: Lynne V. Nazareth,1David A.
Markers and mapping: David L. Adelson,6Jan Aerts,55Wes Barris,4
Gary L. Bennett,36Brian Dalrymple,4André Eggen,70Clare A. Gill,3,58
Ronnie D. Green,71Gregory P. Harhay,36Matthew Hobbs,72Oliver
Jann,50Steve M. Kappes12(leader), John W. Keele,36Matthew P.
Kent,73Denis M. Larkin,13,14Harris A. Lewin,13,14Sigbjørn Lien,73
John C. McEwan,30Stephanie D. McKay,74Sean McWilliam,4
Stephen S. Moore,48Frank W. Nicholas,16Gemma M. Payne,30
Abhirami Ratnakumar,75,4Hanni Salih,3Robert D. Schnabel,74
Timothy Smith,36Warren M. Snelling,36Tad S. Sonstegard,37
Roger T. Stone,36Yoshikazu Sugimoto,76Akiko Takasuga,76Jeremy
F. Taylor,74Ross L. Tellam,4Curtis P. Van Tassell,37John L. Williams61
Genomic DNA: Michael D. MacNeil62
Manual annotation: Antonio R.R. Abatepaulo,77Colette A. Abbey,3
Jan Aerts,55Virpi Ahola,78Iassudara G. Almeida,57Ariel F. Amadio,79
Elen Anatriello,77Suria M. Bahadue,2Cynthia L. Baldwin,49Rebecca
Baxter,50Anna K. Bennett,2Fernando H. Biase,13Clayton R. Boldt,3
Candice L. Brinkmeyer-Langford,19Wendy C. Brown,51Alexandre R.
Caetano,56Jeffery A. Carroll,80Wanessa A. Carvalho,77Theresa
Casey,43Eliane P. Cervelatti,57Elsa Chacko,81Jennifer E. Chapin,3Ye
Cheng,35Christopher P. Childers,2Jungwoo Choi,3Adam J. Colley,82
Timothy Connelley,52Tatiana A. de Campos,56Marcos De Donato,83
Isabel K.F. de Miranda Santos,56,77Carlo J.F. de Oliveira,77Heather
Deobald,84Eve Devinoy,85C. Michael Dickens,3Kaitlin E. Donohue,2
Peter Dovc,86Annett Eberlein,87Shirley A. Ellis,53Carolyn J.
Fitzsimmons,59Alessandra M. Franzin,77Krista Fritz,19Gustavo R.
Garcia,77Jose Fernando Garcia,57Sem Genini,61J. Bruce German,44,45
James G.R. Gilbert,25Clare A. Gill,3,58Cody J. Gladney,3Elizabeth J.
Glass,50Jason R. Grant,48Marion L. Greaser,88Jonathan A. Green,74
Darryl L. Hadsell,18Darren E. Hagen,2Hatam A. Hakimov,89Rob
Halgren,43Jennifer L. Harrow,25Elizabeth A. Hart,25Nicola
Hastings,90,50Marta Hernandez,91Carolyn T.A. Herzig,49Stefan G.
Hiendleder,59Matthew Hobbs,72Zhi-Liang Hu,35Antti Iivanainen,54
Aaron Ingham,4Terhi Iso-Touru,78Catherine Jamis,2Oliver Jann,50
Kirsty Jensen,50Dimos Kapetis,61Tovah Kerr,51Sari S. Khalil,2Hasan
Khatib,92Davood Kolbehdari,48,93Charu G. Kumar,13Dinesh
Kumar,94,35Richard Leach,50Justin C-M Lee,2Danielle G. Lemay,45
Changxi Li,95,48George E. Liu,37Krystin M. Logan,96Roberto
Malinverni,61Nauman J. Maqbool,46Elisa Marques,48William F.
Martin,45Natalia F. Martins,56Sandra R. Maruyama,77Raffaele
Mazza,97Kim L. McLean,84Juan F. Medrano,98Erdogan Memili,60
Adrian J. Molenaar,42Barbara T. Moreno,57Daniela D. Moré,77Carl T.
Muntean,3Hari P. Nandakumar,19Marcelo F.G. Nogueira,99Ingrid
Olsaker,100Sameer D. Pant,82Francesca Panzitta,61Rosemeire C.P.
Pastor,57Mario A. Poli,101Nathan Poslusny,2Satyanarayana
Rachagani,35Shoba Ranganathan,81,102Andrej Razpet,86James
Reecy,35Penny K. Riggs,3,58Monique Rijnkels,18Gonzalo Rincon,98
Nelida Rodriguez-Osorio,60,103Sandra L. Rodriguez-Zas,13Natasha
E. Romero,3Anne Rosenwald,2Lillian Sando,4Sheila M. Schmutz,84
Seongwon Seo,47Libing Shen,2Laura Sherman,48Loren C. Skow,19
Bruce R. Southey,104Diane Spurlock,35Ylva Strandberg Lutzow,4
Jonathan V. Sweedler,104Imke Tammen,72Masaaki Taniguchi,65,48
Ross L. Tellam,4Bhanu Prakash V.L. Telugu,74Jennifer M. Urbanski,2
Yuri T. Utsunomiya,57Chris P. Verschoor,82Ashley J. Waardenberg,4,105
Angela Walker,41Zhiquan Wang,48Robert Ward,106Rosemarie
Weikard,87Thomas H. Welsh Jr.,3,58Thomas T. Wheeler,42Stephen
N. White,51,107John L. Williams,61Laurens G. Wilming,25Kris R.
Wunderlich,3Jianqi Yang,108Feng-Qi Zhao109
and Human Genetics, Baylor College of Medicine, One Baylor
Plaza, Houston, TX 77030, USA.2Department of Biology, 406
Reiss, Georgetown University, 37th & O Streets NW,
Washington, DC 20057,USA.3Department of Animal Science,
2471, USA.4Livestock Industries, Commonwealth Scientific
and Industrial Research Organization (CSIRO), 306 Carmody
Road, St. Lucia, Queensland, 4067, Australia.5The Genome
of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63108,
USA.6School of Molecular and Biomedical Science, School of
Agriculture, Food and Wine, The University of Adelaide,
Adelaide, SA, 5005, Australia.
7Department of Genome Sciences, University of Washington, 1705 NE Pacific Street,
Seattle, WA 98195–5065, USA.8Howard Hughes Medical
Institute, Seattle, WA 98195, USA.9National Human Genome
Research Institute, National Institutes of Health, 5625 Fishers
Lane, Rockville, MD 20878, USA.10Center for Genomic
Regulation and Grup de Recerca en Informática Biomédica,
Institut Municipal d’Investigació Mèdica, Universitat Pompeu
Fabra, 08003 Barcelona, Catalonia, Spain.11U.S. Department
of Agriculture (USDA), Cooperative State Research, Education,
& Extension Service, 1400 Independence Avenue SW, Stop
2220, Washington, DC 20250–2220, USA.
Program Staff, USDA–Agricultural Research Service, 5601
Animal Sciences, University of Illinois at Urbana–Champaign,
1201 West Gregory Drive, Urbana, IL 61801, USA.14Institute
1201 West Gregory Drive, Urbana, IL 61801, USA.15Department
of Molecular Biology and Biochemistry, Simon Fraser
University, 8888 University Drive, Burnaby, BC, V5A 1S6,
Canada.16Faculty of Veterinary Science, University of Sydney,
Sydney, NSW, 2006, Australia.17Center for Integrative
Genomics, University of Lausanne, Lausanne, 1015, Switzerland.
18Children's Nutrition Research Center, USDA–Agricultural
Research Service, Department of Pediatrics–Nutrition, Baylor
College of Medicine, 1100 Bates Street, Houston, TX 77030–
19Department of Veterinary Integrative Biosciences,
Texas A&M University, College Station, TX 77843,
USA.20Department of Genetic Medicine and Development,
University of Geneva Medical School, 1 rue Michel-Servet,
Geneva, 1211, Switzerland.21Swiss Institute of Bioinformatics,
1 rue Michel-Servet, Geneva, 1211, Switzerland.22Division of
Molecular Biosciences, Imperial College London, South
Kensington Campus, London, SW7 2AZ, UK.23Department of
Veterinary Pathobiology, Texas A&M University, College
Station, TX 77843, USA.24National Center for Biotechnology
Information, National Library of Medicine, National Institutes
of Health, Bethesda, MD 20892, USA.25Informatics Depart-
ment, Wellcome Trust Sanger Institute, Hinxton, Cambridge,
CB10 1HH,UK.26Department of Computer Science, University
of London, Royal Holloway, Egham, Surrey, TW20 0EX, UK.
27Department of Biology and Biochemistry, University of
Houston, Houston, TX 77204, USA.
Institute of Genetic Medicine, BRB 579, Johns Hopkins
MD 21205, USA.2945 Monterey Drive, Tiburon, CA 94920,
USA.30Animal Genomics, AgResearch, Invermay, PB 50034,
Mosgiel, 9053, New Zealand.31eResearch SA, University of
Adelaide, North Terrace, Adelaide, SA, 5005, Australia.
32Department of Structural Biology and Bioinformatics,
University of Geneva Medical School, 1 rue Michel-Servet,
Geneva, 1211, Switzerland.33Catalan Institution for Research
and Advanced Studies, 08010 Barcelona, Catalonia, Spain.
34Computational Genomics,Universitat Pompeu Fabra, 08003
Barcelona, Catalonia, Spain.35Department of Animal Science,
Iowa State University, 2255 Kildee Hall, Ames, IA 50011–
3150, USA.36Meat Animal Research Center, USDA–Agricultural
Research Service, Clay Center, NE 68933, USA.37Bovine Func-
Beltsville Agricultural Research Center (BARC)–East, Beltsville,
Manassas, VA 20110, USA.39Department of Bioengineering,
Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland.
41Department of Veterinary Pathobiology, 245 Bond Life Sci-
42Dairy Science and Technology Section, AgResearch, Ruakura
New Zealand.43Department of Animal Science, Michigan State
University, East Lansing, MI 48824–1225, USA.44Nestlé Re-
search Centre, Vers chez les Blanc CH, Lausanne 26, 1000,
Switzerland.45Department of Food Science and Technology,
University of California–Davis, Davis, CA 95616, USA.46Bio-
informatics, Mathematics and Statistics, AgResearch, Ruakura
47Division of Animal Science and Resource,
Chungnam National University, Daejeon, 305-764, Korea.
48Department of Agricultural, Food and Nutritional Science, University
of Alberta, 410 AgFor Centre, Edmonton, AL, T6G 2P5,
49Department of Veterinary and Animal Sciences,
University of Massachusetts, Amherst, MA 01003, USA.50The
Roslin Institute and Royal (Dick) School of Veterinary Studies,
University of Edinburgh, Roslin, Midlothian, EH25 9PS, UK.
51Department of Veterinary Microbiology and Pathology, Washington
State University, Pullman, WA 99164, USA.52Division of
Infection and Immunity, The Roslin Institute, Royal (Dick)
School of Veterinary Science, University of Edinburgh, Roslin,
Midlothian, EH25 9RG, UK.53Immunology Division, Institute
for Animal Health, Compton, RG20 7NN, UK.54Department of
Basic Veterinary Sciences, University of Helsinki, Post Office
Box 66, Helsinki, FIN-00014, Finland.55Genome Dynamics
and Evolution, Wellcome Trust Sanger Institute, Hinxton,
Cambridge, CB10 1SA, UK.56Embrapa Recursos Genéticos e
Biotecnologia, Final Avenida W/5 Norte, Brasilia, DF, 70770-
57Animal Production and Health Department,
UNESP–Sao Paulo State University, Aracatuba, SP, 16050-
58Texas AgriLife Research, College Station, TX
77843, USA.59JS Davies Epigenetics and Genetics Group,
School of Agriculture, Food & Wine and Research Centre for
Reproductive Health, The University of Adelaide, Roseworthy
Campus, Roseworthy, SA, 5371, Australia.60Department of
Animal and Dairy Sciences, Mississippi Agricultural and Forestry
Experiment Station, Mississippi State University, Mississippi
State, MS 39762, USA.61Parco Tecnologico Padano,
Via Einstein, Polo Universitario, Lodi, 26900, Italy.
Keogh Livestock and Range Research Laboratory, USDA-
Agricultural Research Service, Miles City, MT 59301, USA.
63Laboratory of Genetics, National Institute of Mental Health,
NIH, Building 49, B1EE16, 49 Convent Drive, Bethesda, MD
64Genome Sciences Centre, British Columbia
Cancer Agency, 675 West 10th Avenue, Vancouver, BC, V5Z
1L3, Canada.65Division of Animal Sciences, National Institute
66Bioinformatics Research Center, Baylor College of Medicine,
One Baylor Plaza, Houston, TX 77030, USA.67Department of
Biomolecular Engineering, University of California at Santa
Cruz, Santa Cruz, CA 95064, USA. 68Department of Computer
Science, University of Houston, Houston, TX 77204–3010,
69Department of Computer Science and Center for
Computational Biology and Bioinformatics, Columbia University,
New York, NY 10032, USA.70INRA, Animal Genetics
Jouy-en-Josas, France. 71Pfizer Animal Genetics, Pfizer Animal
Health, New York, NY 10017, USA.72Faculty of Veterinary
Science, University of Sydney, Camden, NSW, 2570, Australia.
73Centre for Integrative Genetics and Department of Animal
and Aquacultural Sciences, Norwegian University of Life
Sciences, Arboretveien 6, Ås, 1432, Norway.
Animal Sciences, University of Missouri, 920 East Campus
Drive, Columbia, MO 65211, USA.75Department of Medical
Biochemistry and Microbiology, Uppsala University, Uppsala
Biomedical Centre Husargatan 3, Uppsala, 75 123, Sweden.
76Shirakawa Institute of Animal Genetics, Nishigo, Fukushima
961-8061, Japan.77Department of Biochemistry and Immu-
Av Bandeirantes 3900, Ribeirão Preto, SP, 14049-900, Brazil.
78Biotechnology and Food Research, MTT Agrifood Research
Finland, Jokioinen, FI-31600, Finland.79EEA Rafaela, Instituto
Unit, USDA–Agricultural Research Service, Lubbock, TX 79403,
USA.81Department of Chemistry and Biomolecular Sciences
& ARC Centre of Excellence in Bioinformatics, Macquarie
University, Sydney, 2109, NSW, Australia.82Department of
N1G2W1, Canada.83Instituto de Investigaciones en Biomedicina
y Ciencias Aplicadas, Universidad de Oriente, Avenida
of Animal and Poultry Science, University of Saskatchewan,
Saskatoon, SK, S7N 5A8, Canada.85INRA–UR1196, Génomique
et Physiologie de la Lactation, F78352 Jouy-en-Josas, France.
86Department of Animal Science, University of Ljubljana, Groblje
Research Institute for the Biology of Farm Animals (FBN),
Dummerstorf, 18196, Germany.
88Department of Animal
Sciences, University of Wisconsin–Madison, 1805 Linden Drive,
Biology, University of Guelph, Guelph, ON, N1G 2W1, Canada.
90Cell Biology and Biophysics, European Molecular Biology
Laboratory (EMBL)–Heidelberg, Meyerhofstrasse 1, Heidelberg,
Agrario de Castilla y Leon (ITACyL), Carretera de Burgos km 119,
Valladolid, 47071, Spain.
92Department of Dairy Science,
University of Wisconsin, Madison, WI 53706, USA.93Monsanto
Company, 3302 SE Convenience Blvd, Ankeny, IA 50021, USA.
94Genes & Genetic Resources Molecular Analysis Lab, National
Bureau of Animal Genetic Resources, Baldi Bye Pass, Karnal,
Haryana, 132001, India.95Lacombe Research Centre, Agri-
2W6, Canada.97Zootechnics Institute, Università Cattolica del
Sacro Cuore, via Emilia Parmense 84, Piacenza, 29100, Italy.
98Department of Animal Science, University of California at
Davis, One Shields Avenue, Davis, CA 95616, USA.99Departamento
de Ciências Biológicas, Faculdade de Ciências e Letras,
UNESP–São Paulo State University, Av Dom Antônio 2100,
Vila Tênis Clube, Assis, SP, 19806-900, Brazil.100Department
of Basic Sciences and Aquatic Medicine, Norwegian
School of Veterinary Science, Post Office Box 8146 Dep,
Oslo, NO-0033, Norway.
101Instituto de Genética Ewald
Favret, Instituto Nacional de Tecnología Agropecuaria (INTA),
Las Cabañas y de Los Reseros s/n CC25, Castelar, Buenos Aires,
B1712WAA, Argentina.102Department of Biochemistry, Yong
Loo Lin School of Medicine, National University of Singapore,
8 Medical Drive, Singapore, 117597, Singapore.103Grupo
CENTAURO, Universidad de Antioquia, Medellín, Colombia.
104Department of Chemistry, University of Illinois, Urbana,
IL 61801, USA.105Eskitis Institute for Cell and Molecular
Therapies, Griffith University, Nathan, QLD, 4111, Australia.
106Nutrition and Food Sciences, Utah State University,
Logan, UT 84322, USA.107Animal Disease Research Unit,
USDA–Agricultural Research Service, Pullman, WA 99164,
108Department of Pharmacology, 2-344 BSB, University
of Iowa, 51 Newton Road, Iowa City, IA 52242,
USA.109Department of Animal Science, 211 Terrill, University
of Vermont, 570 Main Street, Burlington, VT
Supporting Online Material
Materials and Methods
Figs. S1 to S23
Tables S1 to S14
10 December 2008; accepted 16 March 2009
Genome-Wide Survey of SNP
Variation Uncovers the Genetic
Structure of Cattle Breeds
The Bovine HapMap Consortium*
The imprints of domestication and breed development on the genomes of livestock likely differ
from those of companion animals. A deep draft sequence assembly of shotgun reads from a single
Hereford female and comparative sequences sampled from six additional breeds were used to
develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from
19 geographically and biologically diverse breeds. These data show that cattle have undergone a
rapid recent decrease in effective population size from a very large ancestral population, possibly
due to bottlenecks associated with domestication, selection, and breed formation. Domestication
and artificial selection appear to have left detectable signatures of selection within the cattle genome,
yet the current levels of diversity within breeds are at least as great as exists within humans.
The emergence of modern civilization was
accompanied by adaptation, assimilation,
and interbreeding of captive animals. In
cattle (Bos taurus), this resulted in the development
of individual breeds differing in, for example,
milk yield, meat quality, draft ability, and
tolerance or resistance to disease and pests. How-
ever, despite mapping and diversity studies (1–5)
and the identification of mutations affecting some
structure and history of cattle are not known.
Cattle occur as two major geographic types,
the taurine (humpless—European, African, and
Asian) and indicine (humped—South Asian, and
East African), which diverged >250 thousand
years ago (Kya) (3). We sampled individuals
representing 14 taurine (n = 376), three indicine
(n = 73) (table S1), and two hybrid breeds (n =
48), as well as two individuals each of Bubalus
quarlesi and Bubalus bubalis, which diverged
from Bos taurus ~1.25 to 2.0 Mya (9, 10). All
breeds except Red Angus (n = 12) were rep-
resented by at least 24 individuals. We preferred
individuals that were unrelated for ≥4 genera-
tions; however, each breed had one or two sire,
dam, and progeny trios to allow assessment of
Single-nucleotide polymorphisms (SNPs) that
were polymorphic in many populations were pri-
marily derived by comparing whole-genome se-
quence reads representing five taurine and one
indicine breed to the reference genome assembly
obtained from a Hereford cow (10) (table S2).
This led to the ascertainment of SNPs with high
minor allele frequencies (MAFs) within the dis-
covery breeds (table S5). Thus, as expected, with
trio progeny removed, SNPs discovered within
the taurine breeds had higher average MAFs
*The full list of authors with their contributions and affiliations
is included at the end of the manuscript.
[Fig. 1B axis labels: Principal Component 1, Principal Component 2]
Fig. 1. (A) Population structure assessed by InStruct. Bar plot, generated
by DISTRUCT, depicts classifications with the highest probability under
the model that assumes independent allele frequencies and inbreeding
coefficients among assumed clusters. Each individual is represented by a
vertical bar, often partitioned into colored segments with the length of
each segment representing the proportion of the individual’s genome
from K = 2, 3, or 9 ancestral populations. Breeds are separated by black
lines. NDA, N'Dama; SHK, Sheko; NEL, Nelore; BRM, Brahman; GIR, Gir;
SGT, Santa Gertrudis; BMA, Beefmaster; ANG, Angus; RGU, Red Angus;
HFD, Hereford; NRC, Norwegian Red; HOL, Holstein; LMS, Limousin; CHL,
Charolais; BSW, Brown Swiss; JER, Jersey; GNS, Guernsey; PMT, Piedmontese;
RMG, Romagnola. (B) Principal components PC1 and PC2 from all SNPs.
Taurine breeds remain separated from indicine breeds, and admixed breeds
24 APRIL 2009 VOL 324
on April 23, 2009