The zebrafish reference genome sequence and its
relationship to the human genome
Kerstin Howe1*, Matthew D. Clark1,2*, Carlos F. Torroja1,3, James Torrance1, Camille Berthelot4,5,6, Matthieu Muffato7,
John E. Collins1, Sean Humphray1,8, Karen McLaren1, Lucy Matthews1, Stuart McLaren1, Ian Sealy1, Mario Caccamo2,
Carol Churcher1, Carol Scott1, Jeffrey C. Barrett1, Romke Koch9, Gerd-Jörg Rauch10, Simon White1, William Chow1, Britt Kilian1,
Leonor T. Quintais7, José A. Guerra-Assunção7, Yi Zhou11, Yong Gu1, Jennifer Yen1, Jan-Hinnerk Vogel1, Tina Eyre1,
Seth Redmond1, Ruby Banerjee1, Jianxiang Chi1, Beiyuan Fu1, Elizabeth Langley1, Sean F. Maguire1, Gavin K. Laird1, David Lloyd1,
Emma Kenyon1, Sarah Donaldson1, Harminder Sehra1, Jeff Almeida-King1, Jane Loveland1, Stephen Trevanion1, Matt Jones1,
Mike Quail1, Dave Willey1, Adrienne Hunt1, John Burton1, Sarah Sims1, Kirsten McLay1, Bob Plumb1, Joy Davis1, Chris Clee1,
Karen Oliver1, Richard Clark1, Clare Riddle1, David Eliott1, Glen Threadgold1, Glenn Harden1, Darren Ware1, Beverly Mortimer1,
Giselle Kerry1, Paul Heath1, Benjamin Phillimore1, Alan Tracey1, Nicole Corby1, Matthew Dunn1, Christopher Johnson1,
Christopher Stevens1, Joanna Harley1, Karen Holt1, Georgios Panagiotidis1, Jamieson Lovell1, Helen Beasley1, Carl Henderson1,
Daria Gordon1, Katherine Auger1, Deborah Wright1, Joanna Collins1, Claire Raisen1, Lauren Dyer1, Kenric Leung1,
Lauren Robertson1, Kirsty Ambridge1, Daniel Leongamornlert1, Sarah McGuire1, Ruth Gilderthorp1, Coline Griffiths1,
Deepa Manthravadi1, Sarah Nichol1, Gary Barker1, Siobhan Whitehead1, Michael Kay1, Jacqueline Brown1, Clare Murnane1,
Emma Gray1, Matthew Humphries1, Neil Sycamore1, Darren Barker1, David Saunders1, Justene Wallis1, Anne Babbage1,
Sian Hammond1, Maryam Mashreghi-Mohammadi1, Lucy Barr1, Sancha Martin1, Paul Wray1, Andrew Ellington1,
Nicholas Matthews1, Matthew Ellwood1, Rebecca Woodmansey1, Graham Clark1, James Cooper1, Anthony Tromans1,
Darren Grafham1, Carl Skuce1, Richard Pandian1, Robert Andrews1, Elliot Harrison1, Andrew Kimberley1, Jane Garnett1,
Nigel Fosker1, Rebekah Hall1, Patrick Garner1, Daniel Kelly1, Christine Bird1, Sophie Palmer1, Ines Gehring10, Andrea Berger10,
Christopher M. Dooley1,10, Zübeyde Ersan-Ürün10, Cigdem Eser10, Horst Geiger10, Maria Geisler10, Lena Karotki10, Anette Kirn10,
Judith Konantz10, Martina Konantz10, Martina Oberländer10, Silke Rudolph-Geiger10, Mathias Teucke10, Kazutoyo Osoegawa12,
Baoli Zhu12, Amanda Rapp13, Sara Widaa1, Cordelia Langford1, Fengtang Yang1, Nigel P. Carter1, Jennifer Harrow1, Zemin Ning1,
Pieter J. de Jong12, Leonard I. Zon11, John H. Postlethwait13, Christiane Nüsslein-Volhard10, Tim J. P. Hubbard1,
Hugues Roest Crollius4,5,6, Jane Rogers1,2 & Derek L. Stemple1
Zebrafish have become a popular organism for the study of verte-
brate gene function1,2. The virtually transparent embryos of this
species, and the ability to accelerate genetic studies by gene knockdown
or overexpression, have led to the widespread use of zebrafish
ingly, the study of human genetic disease3–5. However, for effective
modelling of human genetic disease it is important to understand
orthologous human genes. To examine this, we generated a high-
quality sequence assembly of the zebrafish genome, made up of an
ordered and oriented using a high-resolution high-density meiotic
map. Detailed automatic and manual annotation provides evidence
of more than 26,000 protein-coding genes6, the largest gene set of
one obvious zebrafish orthologue. In addition, the high quality of
mic features such as a unique repeat content, a scarcity of pseudo-
genes, an enrichment of zebrafish-specific genes on chromosome 4
and chromosomal regions that influence sex determination.
able organism in the 1980s. The systematic application of genetic
screens led to the phenotypic characterization of a large collection of
duce defects in a variety of organ systems with pathologies similar to
human disease. Such investigations have also contributed notably to
our understanding of basic vertebrate biology and vertebrate deve-
lopment. In addition to enabling the systematic definition of a large
range of early developmental phenotypes, screens in zebrafish have
contributed more generally to our understanding of the factors con-
trolling the specification of cell types, organ systems and body axes of
Although its contributions have already been substantial, zebrafish
research holds further promise to enhance our understanding of the
detailed roles of specific genes in human diseases, both rare and com-
mon. Increasingly, zebrafish experiments are included in studies of
human genetic disease, often providing independent verification of
the activity of a gene implicated in a human disease3,5,10. Essential to
this enterprise is a high-quality genome sequence and complete annotation
of zebrafish protein-coding genes with identification of their
*These authors contributed equally to this work.
d’Ulm, Paris F-75005, France.6CNRS, UMR 8197, 46 rue d’Ulm, Paris F-75005, France.7EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
8Illumina Cambridge, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK.9Hubrecht Laboratory, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands.10Max Planck Institute for
Developmental Biology, Spemannstraße 35, 72076 Tübingen, Germany.11Stem Cell Program and Division of Hematology and Oncology, Children’s Hospital and Dana Farber Cancer Institute, 1 Blackfan
Circle, Karp 7, Boston, Massachusetts 02115, USA.12Children’s Hospital Oakland, 747 52nd Street, Oakland, California 94609, USA.13Institute of Neuroscience, University of Oregon, 1254 University of
Oregon, 222 Huestis Hall, Eugene, Oregon 97403-1254, USA.14Karlsruhe Institute of Technology (KIT), Campus North, Institute of Toxicology and Genetics (ITG), Hermann-von-Helmholtz-Platz 1, 76344
Eggenstein-Leopoldshafen, Germany.15Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA.
0 0 M O N T H 2 0 1 3 | V O L 0 0 0 | N A T U R E | 1
Macmillan Publishers Limited. All rights reserved
The zebrafish genome-sequencing project was initiated at the
Wellcome Trust Sanger Institute in 2001. We chose Tübingen as the
zebrafish reference strain as it had been used extensively to identify
and mouse genome projects. The Zv9 assembly is a hybrid of high-
sequence (17%), with a total size of 1.412 gigabases (Gb) (Table 1). The
map called the Sanger AB Tübingen map (SATmap), named after the
strains of zebrafish used to make the map (Supplementary Information).
Zebrafish are members of the teleostei infraclass, a monophyletic
group that is thought to have arisen approximately 340 million years
ago from a common ancestor11. Compared to other vertebrate species,
this ancestor underwent an additional round of whole-genome dupli-
cation (WGD) called the teleost-specific genome duplication (TSD)12.
Gene duplicates that result from this process are called ohnologues
(after Susumu Ohno who suggested this mechanism of gene duplica-
tion)13. Zebrafish possess 26,206 protein-coding genes6, more than
any previously sequenced vertebrate, and they have a higher number
of species-specific genes in their genome than do human, mouse or
chicken. Some of this increased gene number is likely to be a con-
sequence of the TSD.
A direct comparison of the zebrafish and human protein-coding
genes reveals a number of interesting features. First, 71.4% of human
genes have at least one zebrafish orthologue, as defined by Ensembl
genes have a one-to-one relationship with a zebrafish orthologue. The
second largest orthology class contains human genes that are assoc-
iated with many zebrafish genes (the ‘one-human-to-many-zebrafish’
class), with an average of 2.28 zebrafish genes for each human gene,
clearly identifiable zebrafish orthologue; for example, the leukaemia
inhibitory factor (LIF), oncostatin M (OSM) or interleukin-6 (IL6)
genes, although the receptors lifra, lifrb, osmr and il6r are clearly
present in the zebrafish genome. It is possible that zebrafish proteins
with functionally similar activities to LIF, OSM and IL-6 exist, but that
their sequence divergence is so great that they cannot be recognized as
but does have an orthologue of the BRCA1-associated BARD1 gene,
which encodes an associated and functionally similar protein and a
brca2 gene, which plays an important role in oocyte development,
probably reflecting its role in DNA damage repair15.
Zebrafish have been used successfully to understand the biological
detail3–5. To investigate the number of potential disease-related genes,
we compared the list of human genes possessing at least one zebrafish
orthologue with the 3,176 genes bearing morbidity descriptions that
are listed in the Online Mendelian Inheritance in Man (OMIM) database.
Of these morbid genes, 2,601 (82%) can be related to at least one
zebrafish orthologue. A similar comparison identified at least one
zebrafish orthologue for 3,075 (76%) of the 4,023 human genes implicated
in genome-wide association studies (GWAS).
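The two percentages quoted above follow directly from the stated counts. The short Python sketch below reproduces the arithmetic; the variable and function names are illustrative only and do not come from the original analysis.

```python
# Counts quoted in the text: disease-related human genes and the subset
# with at least one zebrafish orthologue.
omim_morbid_total = 3176
omim_with_orthologue = 2601
gwas_total = 4023
gwas_with_orthologue = 3075


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


omim_pct = percent(omim_with_orthologue, omim_morbid_total)
gwas_pct = percent(gwas_with_orthologue, gwas_total)

print(f"OMIM morbid genes with a zebrafish orthologue: {omim_pct:.1f}%")
print(f"GWAS genes with a zebrafish orthologue: {gwas_pct:.1f}%")
```

Rounded to whole numbers, these values agree with the 82% and 76% reported in the text.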
Zv9 shows an overall repeat content of 52.2%, the highest reported
so far in a vertebrate. All other sequenced teleost fish exhibit a much
lower repeat content, with an average of less than 30%. This result
Alternatively, the repeat content of the other sequenced teleost species
may be under-represented, as these assemblies are mostly WGS16.
are type I (retrotransposable elements), with more than 4.3 million
placements covering 44% of the sequence, whereas only 11% of the
zebrafish genome sequence is covered by type I elements in less than
excess of type II DNA transposable elements. Indeed, 2.3 million
instances of type II DNA transposable elements cover 39% of the
zebrafish genome sequence (Supplementary Table 12), whereas type
II repeats cover only 3.2% of the human genome.
This pronounced abundance of type II transposable elements is
unique among the sequenced vertebrate genomes, and the genome
sequence shows evidence of recently active type II transposable ele-
II transposable elements is Xenopus tropicalis (25% type II transpos-
able elements), whereas the sequenced and annotated teleost fish (the
pufferfish Takifugu and Tetraodon, the three-spined stickleback (Gasterosteus
aculeatus) and the medaka (Oryzias latipes)) each possess
type II transposable element coverage of less than 10%, which may
relate to the fact that the zebrafish genome diverges basally from the
other sequenced and annotated teleost genomes17. Zebrafish type II
transposable elements are divided into 14 superfamilies with 401
repeat families in total (Supplementary Table 12). The DNA and
hAT superfamilies are the most abundant and diverse in the zebrafish
genome, together covering 28% of the sequence. The type II transpos-
able element abundance of zebrafish, or lack of retrotransposable ele-
ments, may provide an explanation for the low zebrafish pseudogene
content (Supplementary Table 14).
sive heterochromatin. Chromosome 4 is known to be late-replicating
and hybridization studies suggest that genomic copies of 5S ribosomal
DNA (rDNA), which are not notably present on any other chro-
mosome, are scattered along the long arm at high redundancy18.
Table 1 | Assembly and annotation statistics for the Zv9 assembly
Total clone length (bp)
Total WGS31 contig length (bp)
Placed scaffold length (bp)
Unplaced scaffold length (bp)
Maximum scaffold length (bp)
No. of clones
No. of WGS31 contigs
No. of placed scaffolds
No. of unplaced scaffolds
Immunoglobulin/T-cell receptor gene segments
Data are based on Ensembl version 67. N50, the scaffold size above which 50% of the total length of the sequence assembly can be found.
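The N50 defined in the footnote above can be computed from a list of scaffold lengths. The following Python sketch shows the usual calculation; the function name and the toy lengths are hypothetical and are not Zv9 values.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the scaffold at which the running
    total first reaches half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), total 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

For the toy input, the two longest scaffolds (80 + 70) already make up half of the total length, so the N50 is 70.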
Table 2 | Comparison of human and zebrafish protein-coding genes
and their orthology relationships
Relationship type    Human    Core relationship    Zebrafish    Ratio
One to one
One to many
Many to one
Many to many
Data and orthology relationship definitions are based on Ensembl Compara version 67 (http://
shows a remarkable increase in repeat content, which continues
through to the telomere of the long arm. At approximately 27 Mb,
the otherwise uniform presence of the satellite repeat SAT-2 on the
long arm ends abruptly. This location is also the starting point of
uniform MOSAT-2 distribution, a satellite repeat that is nearly absent
from all other chromosomes but highly enriched on the long arm of
chromosome 4. The subtelomeric region of the long arm shows a
distinct distribution of repeat elements, with relatively fewer inter-
spersed elements and an increased content of satellite, simple and
tandem repeats that do not harbour 5S rDNA sequences. Moreover,
the gene content is reduced on the long arm and the guanine–cytosine
content is slightly increased.
[Figure graphics omitted. Recoverable labels: ~7 million SNPs; 430 F2s genotyped at 140,000 SNPs; axis labels: chromosome number, –log10 P value.]
Figure 2 | Sex determination signal on
sequenced to approximately 40× depth using
Illumina GAII technology. We found
approximately 7million SNPs between the two
SATmap founders. This number of SNPs between
just two homozygous zebrafish individuals is far in
excess of that seen between any two humans and is
human diploid genomes29. Genetically identical,
heterozygous F1 fish of both sexes resulted from
crossing the founders. The F1 individuals were
crossed to generate a panel of F2 individuals, each
with its own unique set of meiotic recombinations
between AB and Tübingen (Tü) chromosomes,
which were uncovered by dense genotyping with a
set of 140,306 SNPs covering most of the genome.
b, Genome-wide P values for tests of genotype
difference between sexes, arranged by
chromosome. The dotted line corresponds to
differences that are expected once in 100 random
genome scans, and the dashed line corresponds to
scans. The only locus that is statistically significant
at these levels is on chromosome 16. c, Genotype
frequencies for males and females on chromosome
16. The grey line at 0.5 corresponds to expectation
for heterozygotes (solid lines) and the grey line at
0.25 corresponds to expectation for homozygotes
(dashed and dotted lines). The light grey shaded
box corresponds to the region in which empirical
P < 0.01, the dark grey shaded box corresponds to
the region in which P < 0.001.
Figure 1 | Landscape of chromosome 4. a, Exon coverage (blue), stacked with
I transposable elements (red), type II transposable elements (grey) and other
repeat types (blue), including dust, tandem and satellite repeats. c, Sequence
composition (grey bars, clones; blue bars, WGS contigs). d, Genetic marker
placements (red, SATmap markers; blue, heat shock meiotic map markers;
black, Massachusetts General Hospital meiotic map markers). Marker
placements have been normalized so that the maps can be compared. Near-
24.4 Mb (Z20450)28. The x axis shows the chromosomal position in Mb. a and
b were calculated as percentage coverage over 1-Mb overlapping windows
100-kb windows. The y axis for d shows the normalization of marker positions
relative to the span of the individual map. Similar graphs for the other
chromosomes are provided in the Supplementary Information.
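The Figure 1 caption describes feature coverage calculated as a percentage over overlapping windows. A minimal Python sketch of such a calculation is given below. It assumes half-open (start, end) coordinates and takes the step between windows as a parameter, because the step used for the figure is not stated in the caption as preserved here. The function name and the toy coordinates are hypothetical and this is not the authors' pipeline.

```python
def window_coverage(intervals, chrom_length, window, step):
    """Percentage of each window covered by the given features.

    intervals: iterable of (start, end) half-open coordinates; overlapping
    features are merged first so that no base is counted twice.
    Returns a list of (window_start, percent_covered) tuples. The last
    windows are truncated at the chromosome end.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    results = []
    for w_start in range(0, chrom_length, step):
        w_end = min(w_start + window, chrom_length)
        covered = sum(
            max(0, min(end, w_end) - max(start, w_start))
            for start, end in merged
        )
        results.append((w_start, 100.0 * covered / (w_end - w_start)))
    return results


# Hypothetical features on a 300-unit chromosome; windows of 100 units
# that overlap by half (step of 50 units).
toy_features = [(0, 50), (40, 100), (150, 200)]
for w_start, pct in window_coverage(toy_features, 300, window=100, step=50):
    print(w_start, f"{pct:.0f}%")
```

With 1-Mb windows, `window` would be 1,000,000; a step smaller than the window size produces the overlap.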
The long arm of chromosome 4 also has a special structure with
respect to gene orthology and synteny. Approximately 80% of the
genes present have no identifiable orthologues in human. In fact,
110 genes (out of 663) have no identifiable orthologues in any other
sequenced teleost genome and indeed seem to be zebrafish-specific
gene families alone providing 77.5% of the genes, the largest of which
contains no less than 109 duplicates in this region. The largest of these
families correspond to NOD-like receptor proteins19 with putative
roles in innate immunity and zinc finger proteins. We also observed
a very high density of small nuclear RNAs (snRNAs) on chromosome
4, and in particular those that encode spliceosome components. The
cohort of snRNAs carried on the long arm of chromosome 4 accounts
for 53.2% of all snRNAs in the zebrafish genome. In addition, in a
specific group of zebrafish derived recently from a natural population,
the subtelomeric region of the long arm of chromosome 4 has been
found to contain a major sex determinant with alleles that are 100%
ment, suggesting that this chromosome may be, might have been, or
may be becoming, a sex chromosome in this particular population20.
rate genomic regions have been identified as influencing sex deter-
mination, and these vary between the strains and even within the
families studied20,21. Our meiotic map, SATmap, which was generated
to anchor the genomic sequence, provided an opportunity to examine
SATmap we took advantage of the fact that it is possible to create
double haploid individuals that contain only maternally derived
DNA, that are homozygous at every locus and that can be raised until
they are fertile22 (Fig. 2a). To investigate the interesting finding that
SATmap F1 fish could be either male or female while being genetically
identical and heterozygous at every polymorphic locus, we sought a
genetic signal for sex determination in the F2 generation, in which
these polymorphisms segregate. Using morphological secondary sex-
Although most chromosomes showed no significant genetic bias for a
particular sex, we found that most of chromosome 16 carried a strong
signal (P = 9.1 × 10⁻⁷) with a broad peak around the centromere
(Fig. 2b, c). Homozygotes for the Tübingen (grandmaternal) allele
had a very high probability of being female, whereas homozygotes for
the AB (grandpaternal) allele were very unlikely to be female (Fig. 2).
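A per-marker test of genotype difference between the sexes can be illustrated with a Pearson chi-squared test on a 2 × 3 table of sex by genotype. The Python sketch below is an illustration under that assumption only: the statistical test actually used for SATmap is not specified in this section, and the counts shown are invented.

```python
import math


def chi2_genotype_by_sex(females, males):
    """Pearson chi-squared test on a 2 x 3 sex-by-genotype table.

    females, males: counts of the three genotypes at one marker, in the
    same order (for example AB/AB, AB/Tu, Tu/Tu). Every genotype must be
    observed at least once overall, otherwise an expected count is zero.
    Returns (statistic, p_value). With 2 degrees of freedom the chi-squared
    survival function has the closed form exp(-x / 2).
    """
    table = [list(females), list(males)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.exp(-statistic / 2)


# Invented counts: Tu/Tu homozygotes mostly female, AB/AB mostly male.
stat, p_value = chi2_genotype_by_sex(females=(10, 50, 40), males=(40, 50, 10))
print(f"chi2 = {stat:.1f}, P = {p_value:.2e}")
```

Repeating such a test at every genotyped marker and plotting –log10 P by chromosome gives a genome-wide scan of the kind summarized in Fig. 2b; thresholds for significance would still need to account for the number of markers tested.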
The number of protein-coding genes among vertebrates is relatively
stable, although even closely related species may show great disparities
in the nature of their protein-coding gene content. We carried
out a four-way comparison between the proteome of two mammals
fraction of shared and species-specific genes present in each genome
(Fig. 3a). A core group of 10,660 genes is found in all four species and
probably approximates an essential set of vertebrate protein-coding
genes. This number is somewhat less than the core set of 11,809 vertebrate
genes identified previously as being common to three fish
genomes (Tetraodon, medaka, zebrafish) and three amniotes (human,
mouse, chicken)16, but the discrepancy probably reflects the improved
annotation of these genomes that often results in fusing fragmented
gene structures. Each taxon has between 2,596 and 3,634 species-
specific genes. The notable excess observed in zebrafish may be a
from the WGD, but with no orthologue in amniotes, are counted as
two specific genes. Furthermore, 2,059 genes are found in human,
mouse and zebrafish but not in chicken, and this number is two times
higher than the number of genes that are found in all amniotes but
not in zebrafish (892). It is unclear whether these genes have been
lost along the evolutionary branch leading to the chicken, or whether
this is due to annotation or orthology assignation errors in the chi-
We identified double-conserved synteny (DCS) blocks between all
back and Tetraodon). DCS blocks are defined as runs of genes in the
non-duplicated species that are found on two different chromosomes
adjacent in the duplicated species24. The DCS between zebrafish and
human are represented on either side of each human chromosome
(Supplementary Fig. 15). Using DCS blocks, we identified zebrafish
paralogous genes that are part of DCS blocks and consistent with the
identified 3,440 pairs of such ohnologues (26% of all genes), for a
total of 8,083 genes when subsequent duplications are taken into
account. It is notable that although true pairs of ohnologues may exist
within the same chromosome owing to post-TSD rearrangements,
we excluded such cases as we cannot reliably distinguish them from
segmental duplications. This number of ancestral genes retained as
Figure 3 | Evolutionary aspects of the zebrafish genome. a, Orthologue
orthology relationships from Ensembl Compara 63. Genes shared across
species are considered in terms of copies at the time of the split. For example, a
gene that exists in one copy in zebrafish but has been duplicated in the human
relationships between zebrafish chromosomes. Chromosomes are represented
as coloured blocks. The positions of ohnologous genes between chromosomes
are linked in grey (for clarity, links between chromosomes that share fewer than
20 ohnologues have been omitted). The image was produced using Circos30.
duplicates in zebrafish is higher, both in absolute number and
in proportion, than in other fish genomes (chi-squared test, all
We compared the 8,083 zebrafish TSD ohnologues with human
ohnologues originating from the two rounds of WGD that are com-
mon to all vertebrates and find that the two sets overlap strongly (chi-
enriched in specific functions (neural activity, transcription factors)
constraint than genes that have lost their second copy.
A circular representation of ohnologue pairs (Fig. 3b) highlights
chromosomes, or parts of chromosomes, that descended from the
same pre-duplication ancestral chromosome (for example, chromo-
chromosome 16 and chromosome 19 are unique in their one-to-one
conservation of synteny. Consistent with the conservation of synteny,
chromosome 16 and chromosome 19 possess clusters of orthologues
of genes associated with the mammalian major histocompatibility
complex (MHC) as well as the hoxab and hoxaa clusters, respectively,
which are each orthologous to the human HOXA cluster25.
Since the earliest whole-genome shotgun-only assembly became
public in 2002, the zebrafish reference genome sequence has enabled
many new discoveries to be made, in particular the positional cloning
of hundreds of genes from mutations affecting embryogenesis, beha-
viour, physiology, and health and disease. Moreover, the annotated
enrichment reagents, which are accelerating both positional cloning
the zebrafish reference genome sequencing is complete, a few poorly
assembled regions remain, which are being resolved by the Genome
Reference Consortium (http://genomereference.org).
We generated cloned libraries of large fragments of genomic DNA, assembled a
physical map of large-insert clones and completely sequenced a set of minimally
overlapping clones. In addition, we generated WGS sequences by end-sequencing
a mixture of large- and short-insert libraries. Overlapping clone sequences were
combined with WGS sequences and tied to the meiotic map, SATmap, which
The sequence data can be found in the BioProject database, under accession
used high-throughput short-read complementary DNA sequencing and obtained
a deep-coverage data set for messenger RNAs expressed in zebrafish at various
incorporating filtered elements from the complementary DNA sequencing gene
build, was merged with the manually curated gene models to produce a compre-
hensive annotation in Ensembl version 67 (http://may2012.archive.ensembl.org/
Danio_rerio/Info/Index). Detailed descriptions of all the methods used for this
project are available in the Supplementary Information.
Received 23 August 2012; accepted 21 March 2013.
Published online 17 April 2013.
1. Driever, W. et al. A genetic screen for mutations affecting embryogenesis in
zebrafish. Development 123, 37–46 (1996).
2. Haffter, P. et al. The identification of genes with unique and essential functions in
the development of the zebrafish, Danio rerio. Development 123, 1–36 (1996).
of the 16p11.2 copy number variant. Nature 485, 363–367 (2012).
4. Panizzi, J. R. et al. CCDC103 mutations cause primary ciliary dyskinesia by
disrupting assembly of ciliary dynein arms. Nature Genet. 44, 714–719 (2012).
glycosylation of alpha-dystroglycan. Nature Genet. 44, 581–585 (2012).
6. Collins, J. E., White, S., Searle, S. M. & Stemple, D. L. Incorporating RNA-seq data
into the zebrafish Ensembl genebuild. Genome Res. 22, 2067–2078 (2012).
7. Talbot, W. S. et al. A homeobox gene essential for zebrafish notochord
development. Nature 378, 150–157 (1995).
8. Gritsman, K. et al. The EGF-CFC protein one-eyed pinhead is essential for nodal
signaling. Cell 97, 121–132 (1999).
9. Ober, E. A., Verkade, H., Field, H. A. & Stainier, D. Y. Mesodermal Wnt2b signalling
positively regulates liver specification. Nature 442, 688–691 (2006).
response to mycobacterial infections. Cell 148, 434–446 (2012).
11. Amores, A., Catchen, J., Ferrara, A., Fontenot, Q. & Postlethwait, J. H. Genome
evolution and meiotic maps by massively parallel DNA sequencing: spotted gar,
an outgroup for the teleost genome duplication. Genetics 188, 799–808
12. Meyer, A. & Schartl, M. Gene and genome duplications in vertebrates: the one-to-
Biol. 11, 699–704 (1999).
13. Wolfe, K. Robustness–it’s not where you think it is. Nature Genet. 25, 3–4 (2000).
14. Vilella, A. J. et al. EnsemblCompara GeneTrees: complete, duplication-aware
phylogenetic trees in vertebrates. Genome Res. 19, 327–335 (2009).
15. Rodrı ´guez-Mari, A. et al. Roles of brca2 (fancd1) in oocyte nuclear architecture,
gametogenesis, gonad tumors, and genome stability in zebrafish. PLoS Genet. 7,
evolution. Nature 447, 714–719 (2007).
Zool. B 308, 563–577 (2007).
18. Sola, L. & Gornung, E. Classical and molecular cytogenetics of the zebrafish, Danio
rerio (Cyprinidae, Cypriniformes): an overview. Genetica 111, 397–412 (2001).
19. Stein, C., Caccamo, M., Laird, G. & Leptin, M. Conservation and divergence of gene
families encoding components of innate immune response systems in zebrafish.
Genome Biol. 8, R251 (2007).
20. Anderson, J. L. et al. Multiple sex-associated regions and a putative sex
chromosome in zebrafish revealed by RAD mapping and population genomics.
PLoS ONE 7, e40701 (2012).
21. Bradley, K. M. et al. An SNP-based linkage map for zebrafish reveals sex
determination loci. G3 (Bethesda) 1, 3–9 (2011).
22. Streisinger, G., Walker, C., Dower, N., Knauber, D. & Singer, F. Production of clones
of homozygous diploid zebra fish (Brachydanio rerio). Nature 291, 293–296
23. Kellis, M., Birren, B. W. & Lander, E. S. Proof and evolutionary analysis of ancient
genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624
the early vertebrate proto-karyotype. Nature 431, 946–957 (2004).
25. Amores, A. et al. Developmental roles of pufferfish Hox clusters and genome
evolution in ray-fin fish. Genome Res. 14, 1–10 (2004).
26. Kettleborough, R. N. W. et al. A systematic genome-wide analysis of zebrafish
protein-coding gene function. Nature (in the press).
27. Varshney, G. K. et al. A large-scale zebrafish gene knockout resource for the
genome-wide study of gene function. Genome Res. 23, 727–735 (2013).
28. Freeman, J. L. et al. Definition of the zebrafish genome using flow cytometry and
cytogenetic mapping. BMC Genomics 8, 195 (2007).
29. The 1000 Genomes Project Consortium An integrated map of genetic variation
from 1,092 human genomes. Nature 491, 56–65 (2012).
30. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics.
Genome Res. 19, 1639–1645 (2009).
Supplementary Information is available in the online version of the paper.
Acknowledgements We wish to thank R. Durbin, E. Birney, A. Scally, C. P. Ponting,
zebrafish information network (ZFIN) for funding part of the manual annotation of the
zebrafish genome and the ZFIN staff for support with gene nomenclature and other
and improvement of the zebrafish genome assembly. We are indebted to the Ensembl
team for providing a browser and database that greatly facilitated the use and the
analyses of the zebrafish genome. We thank A. Pirani at Affymetrix for genotyping
GM085318 (to J.H.P.), NIH grant P01 HD22486 (to J.H.P.) and R01 OD011116 (later
changed to R01 RR020833) (to J.H.P.). We would like to acknowledge the support
of the European Commission’s Sixth Framework Programme (contract no.
HEALTH-F4-2010-242048, ZF-HEALTH). R.G. was supported by the German Human
Genome Project (DHGP Grant 01 KW 9627 and 01 KW 9919). C.N.-V., G.-J.R. and R.G.
Project at the Wellcome Trust Sanger Institute was funded by Wellcome Trust grant
Author Contributions K.H., M.D.C., D.L.S., C.B., H.R.C., A.E. and K.M. wrote the
produced the SATmap. Z.N. and Y.G. produced the WGS31 assembly. J.T., W.C. and
C.F.T. generated the Zv9 assembly. Previous assemblies were produced by M.C., who
developed the first assembly integration process, and by S.R., T.E. and I.S. coordinated
by K.H. The analyses and figures for the manuscript were produced by J.T., K.H., C.B.,
M.M., J.H., L.T.Q., J.A.G.-A. and J.Y. K.A., J.W., S.P., J.C., G.T., G.H., G.G., P.H. and B.K. are
involved in the ongoing improvement of the zebrafish genome assembly. Manual
annotation was produced by G.K.L., D.L., E.K., S.D., H.S., J.A.-K. and J.L. and coordinated
by J.H. and M.W. Automated annotation (Ensembl) was provided by J.E.C., S.W., J.-H.V.,
S.T. and S.M.J.S. The genome sequencing was carried out by C.C., K.M., S.M., C.S., J.C.,
B.F., E.L., S.F.M., M.J., M.Q., D.W., A.H., J.B., S.S., K.M., B.P., J.D., C.C., K.O., B.M., G.K., B.P.,
L.D., K.L., L.R., K.A., D.L., S.M., R.G., C.G., D.M., S.N., G.B., S.W., M.K., J.B., C.M., E.G., M.H.,
A.T., D.G., C.S., R.P., R.A., E.H., A.K., J.G., N.F., R.H., P.G., D.K., C.B. and S.P. The generation
of maps used in the initial assemblies and the production of clone tiling paths were
K.O., B.Z. and P.J.d.J. generated and provided clone libraries. The Zebrafish Genome
Project was coordinated by L.I.Z., J.H.P., C.N.-V., T.J.P.H., J.R. and D.L.S.
Author Information Sequence data have been submitted to the BioProject database
under accession PRJNA11776. Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial interests.
Readers are welcome to comment on the online version of the paper.
Correspondence and requests for materials should be addressed to D.L.S.
This work is licensed under a Creative Commons Attribution-
NonCommercial-Share Alike 3.0 Unported licence. To view a copy of this
licence, visit http://creativecommons.org/licenses/by-nc-sa/3.0