A 360-kb interchromosomal duplication of the human HYDIN locus
Norman A. Doggetta,⁎, Gary Xiea, Linda J. Meinckea, Robert D. Sutherlanda, Mark O. Mundta,
Nicolas S. Berbarib, Brian E. Davyb, Michael L. Robinsonb,1, M. Katharine Ruddc,
James L. Weberd, Raymond L. Stallingse, Cliff Hana
aDOE Joint Genome Institute and Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
bDivision of Molecular and Human Genetics, Children's Research Institute, Ohio State University, 700 Children's Drive, Columbus, OH 43205, USA
cDivision of Human Biology, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, C3-168, Seattle, WA 98109, USA
dCenter for Medical Genetics, Marshfield Medical Research Foundation, 1000 North Oak Avenue, Marshfield, WI 54449, USA
eChildren's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78229, USA
Received 13 April 2006; accepted 19 July 2006
Available online 30 August 2006
The HYDIN gene located in human chromosome band 16q22.2 is a large gene encompassing 423 kb of genomic DNA that has been suggested
as a candidate for an autosomal recessive form of congenital hydrocephalus. We have found that the human HYDIN locus has been very recently
duplicated, with a nearly identical 360-kb paralogous segment inserted on chromosome 1q21.1. The duplication, among the largest
interchromosomal segmental duplications described in humans, is not accounted for in the current human genome assembly and appears to be part
of a greater than 550-kb contig that must lie within 1 of the 11 sequence gaps currently remaining in 1q21.1. Both copies of the HYDIN gene are
expressed in alternatively spliced transcripts. Elucidation of the role of HYDIN in human disease susceptibility will require careful discrimination
among the paralogous copies.
© 2006 Elsevier Inc. All rights reserved.
Keywords: Segmental duplication; Interchromosomal duplication; Gene duplication; HYDIN gene; Human chromosome 16; Human chromosome 1
The human HYDIN gene was suggested as a candidate for
congenital hydrocephalus (an accumulation of cerebrospinal
fluid within the ventricles of the brain) based on the finding of
mutations in the mouse Hydin gene, which were found to be
causative for congenital hydrocephalus in hy3 mice . The
human homologue of Hydin, located in chromosome band
16q22.2, contains conserved exons spanning 423 kb of genomic
DNA. While mutations in human HYDIN have not yet been
established for congenital hydrocephalus, there has been one
case of hydrocephalus associated with a reciprocal translocation
very near the cytogenetic location of the HYDIN gene in 16q22,
t(4;16) (q35;q22.1) . Pulsed-field gel analysis of this region
further refined the translocation breakpoint to an altered 1.2-Mb
NotI fragment with a CALB2 probe , which can now be
deduced to encompass the HYDIN locus. CALB2 is 160 kb
distal to HYDIN in the finished sequence spanning this region
and both are contained in a single 1,145,723-bp NotI fragment
(chromosome 16: 69,291,728–70,437,450; NCBI build 35).
Congenital hydrocephalus, an accumulation of cerebrospinal
fluid within the ventricles of the brain, affects an estimated 1 in
1000 live births . The X-linked congenital hydrocephalus
conditions, congenital stenosis of the aqueduct of Sylvius
(HSAS, OMIM 307000) and MASA syndrome (MASA,
OMIM 303350), have been shown to be caused by mutations
common inherited forms of the disease. Evidence exists for
autosomal recessive forms of the disease but this has not been
the identification of HYDIN as a candidate for human congenital
duplicated onto chromosome 1, which complicates the identifi-
cation of mutations that could be associated with disease.
Genomics 88 (2006) 762–771
⁎Corresponding author. Fax: +1 505 665 3024.
E-mail address: firstname.lastname@example.org (N.A. Doggett).
1Current address: Zoology Department, 256 Pearson Hall, Miami University,
Oxford, OH 45056, USA.
Identification of 16q22.2–1q21.1 paralogy
The duplication of the HYDIN locus from 16q22.2 to 1q21.1
was identified during finishing of chromosome 16 . Initially
we noticed that in the BAC contig (AC027281→AC109135→AC130459→AC099495)
we had assembled at
16q22.2, the percentage sequence identity in the overlap
between some clones (∼99.5%) was a little lower than is
usually found between clones of different haplotypes
(∼99.9%). We then generated additional clone coverage from
the RP11 library to recover clones with the same haplotype from
this region. Two clone paths, representing each haplotype, were
assembled after sequencing several additional clones from the
RP11 library, with sequence identity between overlapping
clones nearly 100% (Fig. 1). Surprisingly, two ends of one path
(AC092369→AC109135→AC136634) appeared to extend
into the middle of a sequenced clone, AL049742, belonging to
chromosome 1q21.1. The other clone path (AC027281→AC138625→AC099495)
fit perfectly into the chromosome 16
map and sequence. The paralogous region between the two
paths is 357.3 kb in length with an average sequence identity of
99.2%. Comparison with mouse chromosome sequences that
are syntenic to human 1q21.1 and 16q22.2 regions revealed
only sequences homologous to the chromosome 16 location,
consistent with this being the original copy of the duplication.
The alignment between the paralogues has one major
interruption due to the insertion of a long interspersed nuclear
element-1 (LINE-1) of 6089 bp in the chromosome 1 paralogue
(Fig. 2). This insertion was accompanied by a duplication of
15 bp, “AAGAAGGGAAAACAC,” which flanks the insertion
site on the chromosome 1 paralogue. There are also two minor
microdeletions of mostly repetitive DNA that have occurred
within the duplicons, one of 1679 bp on chromosome 16 (bases
10,543–12,222 within the chromosome 1 duplicon) and another
of 2092 bp on chromosome 1 (bases 132,520–134,612 within
the chromosome 16 duplicon). Most other differences between
paralogues are small indel changes occurring in simple
sequence repeats. The high level of sequence identity (99.2%)
and minimal amount of rearrangement between the paralogous
copies suggested a very recent duplication event.
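The 15-bp direct repeat flanking the LINE-1 insertion is the kind of signal a target-site duplication search looks for. The following is a minimal Python sketch of such a search, not the method used in this study: the function name `target_site_duplication`, the toy flanks, and the short stand-in insert are hypothetical, and only the 15-bp repeat is taken from the text.

```python
def target_site_duplication(seq: str, ins_start: int, ins_end: int) -> str:
    """Return the longest direct repeat flanking the insertion seq[ins_start:ins_end].

    The repeat is the longest string that is both a suffix of the sequence
    upstream of the insertion and a prefix of the sequence downstream of it.
    """
    upstream = seq[:ins_start]
    downstream = seq[ins_end:]
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# Toy locus. Only TSD comes from the text; the flanks and the short
# stand-in for the 6089-bp LINE-1 element are invented for illustration.
TSD = "AAGAAGGGAAAACAC"
LEFT_FLANK = "GGCCTT"
RIGHT_FLANK = "CTGACT"
INSERT = "TTTTGGGGCCCC"

locus = LEFT_FLANK + TSD + INSERT + TSD + RIGHT_FLANK
ins_start = len(LEFT_FLANK) + len(TSD)
ins_end = ins_start + len(INSERT)

found = target_site_duplication(locus, ins_start, ins_end)
print(found, len(found))
```

On this toy locus the search recovers the 15-bp repeat. A real analysis would work from an alignment of the two paralogues and would need to tolerate mismatches within the repeat.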
To determine how recently the duplication occurred in
primate evolution we designed a pair of primers to detect the
Fig. 1. Physical map of human chromosome 16 and human chromosome 1 encompassing the HYDIN duplication. Accession numbers of sequenced BAC clones and
aligned mRNAs are indicated on the left. Horizontal arrows illustrate transcripts produced primarily from chromosome 16 (red) or chromosome 1 (blue). Six
transcripts from chromosome 16 traverse the boundary of the duplicon, causing their truncation in the chromosome 1 copy. Note that the extent of the duplication
(chromosome 16:69,402,781–69,760,074, NCBI build 35) does not include the entire coding region of the HYDIN gene model. Dashed gray box defines the
boundaries of paralogy between the 16q22.2 and the 1q21.1 contigs. Chromosome band 1q21.1 is shown at the bottom aligned to its base pair position. Black boxes
within this band denote locations of sequence and clone gaps. Possible sites for insertion of the HYDIN duplication contig into gaps within chromosome 1q21.1 are
shown by dashed lines extending to gaps. Solid lines extending to two gaps represent the most likely sites for insertion of the duplication, extending off of contig
NT_034398 in the current assembly.
presence of the duplication with PCR by targeting the insertion
site. One pair of primers (FLK) flanked the duplication insertion
site on chromosome 1 and the second pair of primers (JCT)
performed against a panel of primate DNAs including lemur,
spider monkey, macaque, orangutan, gorilla, chimpanzee, and
human (Fig. 3). PCR amplification with the JCT primers,
detecting the HYDIN insertion on chromosome 1q21.1,
occurred only in the human DNA, indicating that the HYDIN
duplication is specific to humans. Amplification with the FLK
primers, detecting the flanking sequence without the duplica-
tion, occurred in all primates tested. We note that amplification
that are shared in clone AL049742 (also within 1q21.1 but not
part of the HYDIN contig). In fact, sequence identity between
clone AL049742 and sequences flanking the HYDIN duplication
is extensive, indicating that additional segmental duplications
are involved. We found a similar sequence extending 41 kb at
99.1% identity between AL049742 and drafted clone
AC092369 immediately adjacent to the 5′ end of the HYDIN
duplication sequence and another 42 kb of sequence shared at
99.6% identity between AL049742 and AC137783 immediately
adjacent to the 3′ end of the HYDIN duplication. These
additional duplications on either side of the HYDIN duplication
made it impossible to design flanking primers that would be
unique to the HYDIN duplication contig.
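The reasoning behind the two primer pairs can be illustrated with a small in-silico PCR sketch. This is an illustration under stated assumptions, not the assay itself: the primer sequences are not given in this text, so every sequence below is a made-up toy, and `amplicon_size`, `max_size`, and the variable names are hypothetical. Primers must match exactly, and products longer than `max_size` are treated as not amplifiable, which stands in for a 360-kb insertion being far too long to amplify.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_size(template: str, forward: str, reverse: str,
                  max_size: int = 40) -> Optional[int]:
    """Product size for an exact-match primer pair, or None if no product."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding site
    # on the template strand is its reverse complement.
    site = reverse_complement(reverse)
    site_start = template.find(site, start + len(forward))
    if site_start == -1:
        return None
    size = site_start + len(site) - start
    return size if size <= max_size else None


# Toy sequences (all hypothetical).
LEFT = "ACGTTGCAAGGCTTAC"                      # flank upstream of insertion site
RIGHT = "GATTCCAGTGACCTGA"                     # flank downstream of insertion site
DUP = "TTGGCCAATCGATCGGCTAGCTAGGATCCTTAAGG"    # stand-in for the inserted duplicon

ancestral = LEFT + RIGHT          # empty insertion site
human = LEFT + DUP + RIGHT        # site with the duplication inserted

flk = (LEFT[:8], reverse_complement(RIGHT[-8:]))     # flanks the site
jct = (LEFT[-8:], reverse_complement(DUP[10:18]))    # spans one junction

for name, template in (("ancestral", ancestral), ("human", human)):
    print(name, "FLK:", amplicon_size(template, *flk),
          "JCT:", amplicon_size(template, *jct))
```

In this toy model the flanking pair yields a product only from the empty site and the junction pair only from the inserted allele. The model omits the additional duplicated copies of the flanking sequence discussed above, which is why flanking primers can still amplify from human DNA in practice.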
To confirm the PCR results, which indicated that nonhuman
primates did not contain the HYDIN duplication, we performed
Fig. 3. Detection of the HYDIN duplication in primates. (A) PCR primer design flanking the insertion duplication site on chromosome 1 (FLK) and spanning a
junction of the duplication site on chromosome 1 (JCT). The shaded region represents the site of the HYDIN duplication insertion. FLK primers should produce a
fragment of 248 bp in the absence of the inserted duplication or from additional duplicated flanking sequence present in AL049742. JCT primers produce a fragment of
685 bp only when the duplication is present. (B) Agarose gel image of PCR products using FLK and JCT primers against a panel of primate DNAs: lanes 1, lemur; 2,
spider monkey; 3, macaque; 4, orangutan; 5, gorilla; 6, chimpanzee; and 7, human. Marker is 100 bp ladder.
Fig. 2. Schematic representation of the HYDIN duplicon insertion into chromosome 1. The duplicon spans 357,302 bp of genomic DNA on chromosome 16 flanked by
Alu and LTR repeats and 364,611 bp of genomic DNA on chromosome 1 flanked by LINE-1 and LTR repeats. The orientations of repeats around the boundaries of the
duplicon are indicated by horizontal arrows. Average sequence identity between the paralogues is 99.2%. Alignment is interrupted by insertion of 6.1 kb of a LINE-1
into the chromosome 1 duplicon and two small microdeletions, one of 1.7 kb within the chromosome 16 duplicon and the other of 2.1 kb within the chromosome 1 duplicon.
FISH with BAC clone RP11-424M24 (AC027281) labeled with
biotin and detected with FITC–avidin against a normal human
cell line, GM08729, and two Pan troglodytes cell lines, PT5
and TANK (Supplemental Fig. 2). The human cell line had
signal on both 16q22 and 1q21. Both chimpanzee cell lines had
FITC signals on 16q22 but no signal on 1q21, confirming that
the HYDIN duplication is confined to humans.
To ascertain the frequency of the HYDIN duplication in
humans we performed FISH and additional PCR typing
experiments. FISH was performed with DNA from BAC
clone RP11-424M24 (AC027281) on chromosomes from 26
individuals, all of whom were found to contain the duplication
homozygously on both 1q21 and 16q22 (Fig. 4). To perform
typing within a larger panel of humans, we designed two pairs
of PCR primers from within the HYDIN duplication that would
detect length differences that would differentiate between the
paralogous copies. These primer pairs are approximately 40 kb
apart and present in the finished sequence of two different
clones from chromosome 16 (AC027281 and AC138625) and
each within two different clones covering the chromosome 1
copy (Table 1). Primer pair HYDIN-indel-61 detects a 61-bp
difference between paralogous copies and primer pair HYDIN-
indel-12 detects a 12-bp difference. These markers were typed
through about 190 individuals from the Human Diversity Panel
(part of the Human Variation Collections of the NIGMS
Repository at the Coriell Cell Repositories, Camden, NJ, USA).
These individuals were Pakistani, African, East Asian, and
Native American. All individuals carried both fragment sizes
with each marker. Also, the intensity ratio for the two bands for
each marker, by eye, did not seem to vary at all among these
individuals. These results showed that among a reasonably large
and diverse sample of humans all contained both chromosome
16 and chromosome 1 copies of the HYDIN duplication and,
further, that there was no evidence for a polymorphic presence
of the chromosome 1 copy.
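The interpretation of the two length-difference markers can be written out as a short check. The fragment sizes are those listed in Table 1; the names `MARKERS`, `size_difference`, and `copies_detected` are hypothetical, and the sketch assumes that fragment sizes are read exactly, which real gel or capillary sizing only approximates.

```python
# Expected fragment sizes in bp for each marker, as listed in Table 1.
MARKERS = {
    "HYDIN-indel-61": {"chr16": 153, "chr1": 214},
    "HYDIN-indel-12": {"chr16": 175, "chr1": 163},
}


def size_difference(sizes: dict) -> int:
    """Length difference between the two paralogous fragments."""
    return abs(sizes["chr16"] - sizes["chr1"])


def copies_detected(observed: set, sizes: dict) -> dict:
    """Report which paralogous copy is represented among observed fragment sizes."""
    return {chrom: size in observed for chrom, size in sizes.items()}


for name, sizes in MARKERS.items():
    print(name, "difference:", size_difference(sizes), "bp")

# An individual showing both bands for HYDIN-indel-61 carries both copies.
print(copies_detected({153, 214}, MARKERS["HYDIN-indel-61"]))
```

The differences of 61 bp and 12 bp agree with the marker names, and an individual lacking the chromosome 1 copy would be expected to show only the chromosome 16 fragment for each marker.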
Genes expressed from the duplication regions
Overall GC content of 41.57% across the duplicon is
relatively low, suggesting it is located in a gene-poor region,
consistent with its 16q22.2 cytogenetic location. The insertion
site in chromosome 1 occurred within a 20-kb window of 38.7%
GC content. To annotate the duplication region for gene content
we first searched the UniGene database from NCBI with the
Blastn program. Thirteen entries were found in the duplication
region (Fig. 1). One mouse gene, Hydin, AY173049, was found
located on mouse chromosome 8, in the region that is
homologous with the chromosome 16 location of the duplica-
tion. Alignments generated using Pipmaker indicated that the
AB058767, AK027571, AL834340, AL833826, AK057467,
AL122038/AL133042, and CR611620) and includes most of
AK074472, AL137259, BC043273, AK022933, BC028351) as
well as most exons of the predicted human HYDIN gene.
Sufficient sequence divergence exists between the paralogous
copies of the HYDIN gene to enable specification of cDNA
origin as being from chromosome 16 or chromosome 1.
Sequences of four mRNAs (AB058767, AK027571,
AL834340, and AL833826) align more closely to the chromo-
some 1 copy of the duplication, suggesting these are expressed
from this site, while nine mRNAs (AK026688, AK074472,
AL137259, BC043273, AK022933, AK057467, AL122038/
AL133042, BC028351, and CR611620) appear expressed from
the chromosome 16 locus based on sequence identity. The
mRNA transcripts appear to represent alternative splicing forms
or partial sequences of a larger HYDIN gene. A list of the
Fig. 4. FISH results on human chromosomes with DNA of RP11-424M24 (AC027281). All 26 individuals contained signals on both chromosome arms at 16q22 and
1q21. A representative FISH image is shown.
Table 1
PCR primers detecting length differences within the HYDIN duplication
Primer pair       Chromosome 16: size, clone(s)     Chromosome 1: size, clone(s)
HYDIN-indel-61    153 bp, AC027281                  214 bp, AC092369, AC109135
HYDIN-indel-12    175 bp, AC138625                  163 bp, AC109135, AC130459
chromosomal origin and tissue source for each of the HYDIN
transcripts is shown in Table 2. Transcripts arising from the
chromosome 16 HYDIN locus were isolated from lung, testis,
NT2 neuronal precursor cells, and Jurkat leukemic T cells, while
transcripts arising from the chromosome 1 paralogue were
derived only from brain or neuronal cells (brain, amygdala, and
NT2 neuronal precursor cells), suggesting a more limited and
perhaps specialized pattern of expression of the chromosome 1 paralogue.
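Assigning a cDNA to one paralogue on the basis of sequence identity can be sketched as a comparison of mismatch counts. This is a simplified illustration, not the Sim4/Blastn procedure used for the annotation: it assumes the cDNA segment and both genomic copies are already aligned without gaps and have equal length, and the sequences and function names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_origin(cdna: str, chr16_copy: str, chr1_copy: str) -> str:
    """Assign a cDNA segment to the paralogue it matches more closely."""
    d16 = mismatches(cdna, chr16_copy)
    d1 = mismatches(cdna, chr1_copy)
    if d16 < d1:
        return "chr16"
    if d1 < d16:
        return "chr1"
    return "ambiguous"


# Toy paralogues differing at two positions (hypothetical sequences).
CHR16 = "ACGTACGTACGTACGTACGT"
CHR1 = CHR16[:7] + "A" + CHR16[8:16] + "T" + CHR16[17:]

print(assign_origin(CHR1, CHR16, CHR1))    # cDNA identical to the chr1 copy
print(assign_origin(CHR16, CHR16, CHR1))   # cDNA identical to the chr16 copy
```

With paralogues that are about 99.2% identical, only a few positions per kilobase are informative, so short or low-quality reads would often fall into the ambiguous class.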
One of the transcripts expressed from chromosome 1,
AB058767, contains a 15-bp deletion (ACGGAGAAGGAGCGC)
compared to the chromosome 16 gene structure
(deletion is within exon 12 of the chromosome 16 HYDIN gene
that occur immediately upstream of the deletion, suggesting this
occurred by a local DNA recombination error. This variation
permitted us to test whether differential expression patterns may
exist between paralogous transcripts from within the HYDIN
duplication. The chromosome 1 transcript AB058767 (mRNA
for KIAA1864 protein) was isolated from brain tissue. To
from chromosome 16, we aligned this mRNA sequence to the
chromosome 16 copy of the duplication and designed two pairs
of PCR primers, each with one primer overlapping the 15 bp
deleted from chromosome 1 (and thus unique to chromosome
this site. PCR was performed against a human cDNA panel,
including tissue from heart, brain, placenta, lung, liver, skeletal
muscle, kidney, and pancreas and cDNA from testis. PCR
amplification occurred only in human testis cDNA and in none
of the other cDNA libraries, suggesting that the paralogue to
AB058767 may have a pattern of expression different from that
of its chromosome 1 counterpart or is poorly represented in the
cDNA libraries tested (data not shown).
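The idea of placing one primer over the 15 bp that are absent from the chromosome 1 transcript can be illustrated as follows. This is a sketch under assumptions: only the 15-bp sequence comes from the text, while the flanking exon sequence, the primer, and the function `primer_matches` are hypothetical, and exact string matching is used in place of real annealing behaviour.

```python
# The 15 bp present in the chromosome 16 copy and deleted in AB058767.
DELETED = "ACGGAGAAGGAGCGC"

# Hypothetical flanking exon sequence shared by both paralogues.
UPSTREAM = "TTGCAGGTCA"
DOWNSTREAM = "CATGGTTCAG"

chr16_exon = UPSTREAM + DELETED + DOWNSTREAM   # contains the 15 bp
chr1_exon = UPSTREAM + DOWNSTREAM              # lacks the 15 bp

# A primer spanning the last 5 shared bases and the first 10 deleted bases.
primer = chr16_exon[len(UPSTREAM) - 5:len(UPSTREAM) + 10]


def primer_matches(template: str, primer_seq: str) -> bool:
    """Exact-match test for a primer site on the given strand."""
    return primer_seq in template


print(primer)
print("chr16:", primer_matches(chr16_exon, primer))
print("chr1: ", primer_matches(chr1_exon, primer))
```

Because part of the primer lies in sequence that exists only in the chromosome 16 copy, a product is expected only from chromosome 16 transcripts. In practice the 3′ end of such a primer would be positioned within the deleted bases to maximize discrimination.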
HYDIN gene model and expression
We established a predicted genomic structure of the human
HYDIN gene by sequence comparison between the mouse
Hydin sequence and the genomic sequence of chromosome 16.
This human HYDIN gene model spans over 423 kb of genomic
DNA and contains 86 exons. Seventy-two exons are identical
with exons from existing mRNAs (Supplemental Fig. 1) and 14
exons are supported with only mouse–human sequence identity.
The resulting full-length transcript is 15,732 bp long and
encodes a putative protein of 5120 amino acids (HYDIN
Supplemental Gene Model). This predicted human HYDIN
shows 80% DNA sequence identity and 77% protein sequence
identity with mouse Hydin. The 357.3-kb duplicon on chromo-
some 1 encompasses 79 exons (6 through 84) of the predicted
HYDIN gene and thus contains only a partial sequence.
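The figures quoted for the gene model can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are hypothetical, and the implied untranslated length rests on the assumption that the coding sequence is three bases per residue plus one stop codon.

```python
TOTAL_EXONS = 86
EXONS_WITH_MRNA_SUPPORT = 72
EXONS_MOUSE_SUPPORT_ONLY = 14
FIRST_DUPLICATED_EXON, LAST_DUPLICATED_EXON = 6, 84
TRANSCRIPT_BP = 15_732
PROTEIN_AA = 5_120

# Exons inside the duplicon and those left unique to chromosome 16.
exons_in_duplicon = LAST_DUPLICATED_EXON - FIRST_DUPLICATED_EXON + 1
unique_5prime = FIRST_DUPLICATED_EXON - 1
unique_3prime = TOTAL_EXONS - LAST_DUPLICATED_EXON

# Coding length, assuming 3 bases per residue plus one stop codon.
cds_bp = 3 * (PROTEIN_AA + 1)
implied_utr_bp = TRANSCRIPT_BP - cds_bp

print("exons in duplicon:", exons_in_duplicon)
print("unique exons (5', 3'):", unique_5prime, unique_3prime)
print("coding bp:", cds_bp, "implied untranslated bp:", implied_utr_bp)
```

The counts are internally consistent: 72 + 14 = 86 exons, 79 of which lie in the duplicon, leaving 5 exons at the 5′ end and 2 at the 3′ end specific to chromosome 16.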
Sequence alignments reveal that the mRNAs are clustered in
roughly three locations along the predicted human HYDIN gene
(Fig. 1 and Supplemental Fig. 1). Transcripts located near the 5′
end of HYDIN contain poly(A) tails that indicate ends of genes.
a single transcript expressed as human HYDIN—three primer
pairs from within the clusters of expressed transcripts and two
pairs of primers bridging these three clusters (Fig. 5A). We first
evaluated a panel of commercially available cDNAs prepared
from heart, brain, placenta, lung, liver, skeletal muscle, kidney,
and pancreas. Positive PCR results were observed from within
the clustered transcripts in heart, brain, and lung but only very
faint bands were detected with bridging primers in brain,
placenta, and liver (Fig. 5B). We were concerned that the
faintness of bands with the bridging primers might be due to the
quality of the cDNA libraries and repeated these experiments
using human total testicular RNA in RT-PCR (Fig. 5C). PCR
amplifications were observed with all primers indicating that
transcripts exist in testis that span the three clusters of mRNAs,
confirming the existence of full-length HYDIN transcripts in
human. These results in combination with the various mRNAs
that exist for partial HYDIN transcripts indicate that HYDIN
exhibits a complex expression of various isoforms, which are
subject to alternative splicing. Northern blot analysis of human
larger than 9 kb but smaller than the observed mouse Hydin
even larger HYDIN transcripts in brain or other tissues.
HYDIN gene conservation
While the predicted HYDIN gene product shares significant
similarity (77%) to the mouse Hydin protein, it is not similar to
any known proteins. The function of the mouse Hydin gene
product is still unknown; however, it was reported to contain a
314-amino-acid domain that is similar to the actin binding
protein caldesmon and, due to its expression pattern within
ciliated ependymal cells lining the third and fourth ventricles in
newborn mice, was suggested to have a role associated with
cilia or ciliated structures . Despite its large size, only a few
conserved SCOP domains have been identified in the putative
human HYDIN protein. Amino acids 193–272, 4319–4417,
and 4527–4602 share similarity with PapD, a fimbrial cha-
perone protein (E values of 7.7E−10, 3.7E−10, and 1.4E−5,
respectively), and residues 2270–2421 share similarity with
Table 2
Chromosomal origin and tissue source of HYDIN transcripts
Retinoic acid-induced NT2 neuronal precursor cells
Retinoic acid-induced NT2 neuronal precursor cells
Jurkat leukemic T cell line
apolipophorin-III (E value of 8E−6). Weak but significant
similarity to a major sperm protein (Motile_Sperm) and
putative metallopeptidase (DUF335) for residues 198–292
and 2681–2816, respectively, was found using Pfam (E value of
8.20E−2 and 8.60E−2, respectively). Major sperm proteins are
involved in sperm motility and oligomerize to form filaments.
Since HYDIN lacked similarity to previously known
proteins, we used tBlastn to compare the HYDIN protein to
six frame translations of a variety of recently drafted genomes.
Conserved homologues were found in Fugu rubripes (42.6%
identity over 3157 residues), Ciona intestinalis (49.6%
identity over 4106 residues), and Chlamydomonas reinhardtii
(36.0% identity over 3374 residues) (Supplemental Figs. 3–5).
Putative homologues were also found in genome sequences of
Giardia lamblia, Plasmodium falciparum, and Plasmodium
yoelii yoelii. Ci. intestinalis is an Ascidian (sea squirt) that
possesses a flagellar radial spoke, sperm flagella, and abundant
stigmatal cilia in the branchial basket that generate the feeding
current and abundant frontal cilia in the pharynx that transport
mucus over the walls. Ch. reinhardtii is a unicellular green
alga (Chlorophyta) that swims with two flagella. G. lamblia
(intestinalis) is a protozoan that moves with the aid of five
flagella. Pl. falciparum and Pl. yoelii yoelii, the causative
agents of malaria in humans and mice, are sporozoans that
lack cilia or flagella but may contain inactive genes for cilia or
flagella from their closely related protozoans. Thus, the
HYDIN protein is remarkably well conserved in the primitive
chordate, Ci. intestinalis, and among flagellated Protists and
The duplication of the HYDIN locus is among the largest
interchromosomal duplications in the human genome. The
largest segmental duplications occur on the sex chromosomes
and predominately involve intrachromosomal duplications
within the Y chromosome. Among the autosomes, there are
only three interchromosomal duplications larger than 300 kb
described in the current genome assembly (Supplemental
Table 1). Only the largest of these, a 372-kb segmental
duplication between chromosomes 18 and 21, is bigger than the
HYDIN duplication. There is only one intrachromosomal
duplication among the autosomes that is larger than the
HYDIN duplication, which is a 431-kb intraduplication on
chromosome 10. Excluding the HYDIN duplication itself, the
chromosomal band 16q22.2 containing the HYDIN gene is
quite low in segmental duplications, with only 1.8% (36.6 kb)
of its 2-Mb length identified as segmental duplications. In
contrast the paralogous site of the HYDIN duplication on
chromosome 1q21.1 is among the most duplicated regions in
the genome, with 63.6% of its finished sequence contained in
segmental duplications. Chromosome band 1q21.1 is estimated
to be 4.9 Mb and contains 3,998,164 bp of finished sequence, of
which 2,543,532 bp is contained in segmental duplications. Of
that, 1,917,429 bp are exclusively intrachromosomal duplications,
485,494 bp are both intra- and interchromosomal
duplications, and 140,609 bp are exclusively interchromosomal
duplications. Eleven clone gaps account for the approximately
902 kb of missing sequence in 1q21.1. Inclusion of the HYDIN
Fig. 5. PCR results against a human cDNA panel with primers designed from the predicted human HYDIN gene. (A) Primer design. Two pairs of primers, 1 and 2,
cover regions of the predicted HYDIN gene between existing mRNA transcripts, and three pairs of primers, 3, 4, and 5, are designed within the sequences covered by
mRNA entries in GenBank. (B) PCR results with primer pairs 1, 2, 3, 4, and 5. Templates: lanes A, heart; B, brain; C, placenta; D, lung; E, liver; F, skeletal muscle; G,
kidney; and H, pancreas. (C) PCR results with primer pairs 1, 2, 3, 4, and 5 using human total testicular RNA. Marker lanes are 100 bp ladder. (D) Northern blot of
human testis RNA with a Hydin probe. The largest human transcript is well over 9 kb, but shorter than the mouse Hydin transcript. The human sample also shows a
prominent smaller transcript around 6 kb.
duplication and its contig increases the amount of segmental
duplications in this band to approximately 68%. An abundance
of interchromosomal duplications within pericentromeric
regions is well documented (reviewed in ). However,
chromosome band 1q21.1 is adjacent to the constitutive
heterochromatin band 1q12, which is composed of classical
satellite DNA and not the centromere. Similar large constitutive
heterochromatic bands consisting predominately of satellite II
or III DNA are located at 9q12, 16q11.2, and Yq12. Of these,
9q12 and Yq12 contain abundant segmental duplications in
adjacent bands 9q13 and Yq11.23, respectively.
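The duplication statistics quoted for band 1q21.1 can be reproduced with a few lines of arithmetic. The base counts are taken from the text. The last step is an assumption: the text does not say how the figure of approximately 68% was derived, so the sketch adds a 550-kb contig (the minimum contig size given in the abstract) to both the duplicated and the finished totals as one plausible reading.

```python
FINISHED_BP = 3_998_164          # finished sequence in 1q21.1
SEGDUP_BP = 2_543_532            # finished sequence in segmental duplications
INTRA_ONLY_BP = 1_917_429
INTRA_AND_INTER_BP = 485_494
INTER_ONLY_BP = 140_609
BAND_BP = 4_900_000              # estimated size of the band (4.9 Mb)

# The three duplication classes should sum to the total.
classes_sum = INTRA_ONLY_BP + INTRA_AND_INTER_BP + INTER_ONLY_BP

fraction_duplicated = SEGDUP_BP / FINISHED_BP
missing_bp = BAND_BP - FINISHED_BP

# ASSUMPTION: the HYDIN contig is 550 kb and entirely duplicated sequence.
ASSUMED_CONTIG_BP = 550_000
fraction_with_contig = (SEGDUP_BP + ASSUMED_CONTIG_BP) / (FINISHED_BP + ASSUMED_CONTIG_BP)

print(f"classes sum to total: {classes_sum == SEGDUP_BP}")
print(f"duplicated fraction: {fraction_duplicated:.1%}")
print(f"missing sequence: {missing_bp / 1000:.0f} kb")
print(f"with assumed 550-kb contig: {fraction_with_contig:.1%}")
```

Under these assumptions the figures agree with the text: 63.6% of finished sequence in duplications, about 902 kb missing, and roughly 68% once the contig is included.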
The sequence of the chromosome 1 clone contig in which the
paralogous copy of the HYDIN gene resides is not entirely
finished. The end containing the insertion site near the 5′ start of
the paralogous HYDIN transcripts is finished, with two finished
clones (AC136634 and AC137783) and an additional drafted
clone (AC138879) confirming the sequence of this insertion
junction (Fig. 1). Beyond the paralogous HYDIN sequence at
this end of the contig are numerous other copies of previously
identified segmental duplications from 1q21.1. The largest of
these are Segmental Duplications Nos. 361, 447, 449, 455, 552,
572, 574, and 1211, most of which contain numerous other
smaller duplications (see Segmental Dups Track in the UCSC
Genome Browser). The sequence at this end cannot be
consistently aligned with other finished sequences from
chromosome 1q21.1 and therefore must extend into 1 of the
11 sequence gaps of this band. The other end of the contig,
containing the 3′ end of the HYDIN paralogous copy, is
represented by two drafted clones (BX546456 and AC092369).
Beyond the HYDIN duplication at this end are also numerous
other copies of previously identified segmental duplications
from 1q21.1, including Segmental Duplications 361, 363, 456,
472, 486, 491, 522, 523, 551, 970, and 13399 and smaller
duplications contained within most of these (see Segmental
Dups Track in the UCSC Genome Browser). This end of the
contig appears to overlap with the genomic contig NT_034398
from 1q21.1, but confirmation of this possibility will require
finishing of clones BX546456 and AC092369.
of recurrent chromosomal rearrangements [10,11]. While repeti-
1q21.1 paralogy—Alu (Jo)/LTR repetitive elements at the
chromosome 16 duplicon junction and LINE-1/LTR repetitive
elements at the chromosome 1 junction—these did not share
sequence similarity, and thus the molecular mechanism by which
recombination and remains, for the moment, unclear.
Duplications within chromosomes (intrachromosomal) may
subsequently lead to deletions or rearrangements due to unequal
recombination events during meiosis between nearly identical
copies of the duplicon. This is observed in genomic mutation
disorders such as Charcot–Marie–Tooth 1A, DiGeorge/velo-
cardiofacial syndrome, Prader–Willi and Angelman syndromes,
Smith–Magenis syndrome, Williams–Beuren syndrome, and
neurofibromatosis type 1 (reviewed in ). A polymorphic
interstitial duplication on human chromosome 15q24–q26 has
been shown to be a susceptibility factor for panic and phobic
disorders . Polymorphic interchromosomal duplications of
members of the olfactory receptor gene family occur in the
subtelomeric region of several human chromosomes , and
polymorphic segmental duplications of the defensin gene
cluster have been reported at 8p23.1 .
Unequal crossing-over events (either intrachromosomal or
interchromosomal) occurring between repeat arrays during
mitosis can lead to mosaicism as has been observed for the
15q24–q26 duplicon, which is flanked by chromosome 15-
specific low-copy repeat sequences , and for subtelomeric
interchromosomal repeats on chromosomes 4 and 10 . At
the present time, we do not know whether the duplicated copy
of the HYDIN locus on chromosome 1 can contribute toward
altered susceptibility to hydrocephalus or other conditions. Due
to the high level of sequence identity between the two loci,
however, it will remain a challenge to differentiate positively
between the chromosome 16 and the chromosome 1 copies in
any mutational or expression analysis. Only the first 5 and the
last 2 exons of the predicted 86-exon HYDIN gene model are
unique to chromosome 16 and it is not likely possible to
generate primers that enable specific amplification of the
remaining chromosome 16 HYDIN exons for mutation screen-
ing of patients with autosomal congenital hydrocephalus.
Amplified gene fragments would need to be cloned and several
individual clone isolates sequenced to generate sequencing
reads representing both paralogues.
polymorphic duplications within the genomes of normal
individuals. Sebat et al.  found evidence for 76 large-scale
copy number polymorphisms in the human genome using
representational oligonucleotide microarray analysis of 20
individuals. They reported large-scale polymorphic losses of
862,373 and 209,899 bp in 1q21. The largest of these, which is
linked to Accession Nos. AC138775 and AL583842 in NCBI
chromosome 1, 142,374,952–142,584,851, within NT_004434.
Iafrate et al.  identified 255 large-scale polymorphisms by
and none of these involved the HYDIN locus, agreeing with our
findings that the HYDIN duplication shows no evidence of
structural polymorphism (in the absence of disease).
Recent segmental duplications often serve as “nurseries” for
the evolution of human-specific genes . The duplicated
copies of genes escape the evolutionary pressure of their parent
gene and are more freely able to acquire new functions. The
HYDIN segmental duplication described here provides evi-
dence for gene evolution through gene duplication, point
mutations, alternate splicing, and gene splitting. While a full-
length HYDIN gene is not possible from the chromosome 1 site
due to truncation at the 5′ and 3′ ends relative to chromosome
16, at least four transcripts are expressed from the chromosome
1 paralogue of the HYDIN duplication. These transcripts may
have a more limited range of tissue expression than transcripts
arising from chromosome 16; however, elucidation of any novel
or specialized functions of chromosome 1 HYDIN transcripts
will require further analysis. By comparison, in the mouse,
9+2 cilia. The incidence of congenital hydrocephalus appears to
vary among different populations (reviewed in [20,21]). Further
population studies including genomic and gene expression level
analysis are needed to determine any role of the chromosome 1
duplication of the HYDIN locus in disease susceptibility.
On the basis of the June 2002 (NCBI build 30) draft human
genome assembly, a total of 107.4 Mb (3.53%) of the human
genome content was found to be involved in recent segmental
duplications . Subsequent analysis of the finished genome
sequence (NCBI build 35) revealed that segmental duplications
duplication is not represented in the most recent genome
assembly for chromosome 1 (NCBI build 35), its duplication
was detected in Accession Nos. AC027281, AC092369,
AC109135, and AC099495 in the Segmental Duplication
Database (http://humanparalogy.gs.washington.edu/) based on
comparison of Celera whole-genome shotgun sequence data
versus BAC clone sequences (described in ) (see also
WSSD Duplication Track in the UCSC Genome Browser).
and it remained unknown where the additional copy of the
duplicon existed. Upon further analysis of the Whole Genome
with the build 35 human genome assembly, we can account for a
total of approximately 5 Mb of genomic duplications, including
the one in this paper, detected by a high depth of Celera whole-
genome shotgun sequence reads but not identified as segmental
duplications (i.e., present in the WSSD Duplication Track but
absent from the Segmental Dups Track). The missing copies of
the current genome assembly or possibly represent structural
polymorphisms. Additional studies are needed to resolve these
remaining genomic questions.
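As a rough consistency check on the figures quoted above for the build 30 draft assembly, the total genome content implied by 107.4 Mb corresponding to 3.53% can be back-calculated. The Python sketch below performs only this arithmetic; the function name is illustrative and is not part of any analysis pipeline used in this study.

```python
def implied_total_mb(part_mb: float, percent: float) -> float:
    """Return the total size (Mb) implied by a part and its percentage of the whole."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return part_mb / (percent / 100.0)


# Values quoted in the text for the June 2002 (NCBI build 30) draft assembly.
total_mb = implied_total_mb(107.4, 3.53)
print(f"Implied genome content: about {total_mb:.0f} Mb")
```

The result is close to 3,000 Mb, which is in line with the usual size estimate for the human genome and suggests the two quoted numbers are mutually consistent.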
De novo gene structure prediction
Genscan (http://genes.mit.edu/GENSCAN.html) and GrailEXP (http://grail.lsd.ornl.gov/grailexp/)
were used for ab initio gene finding.
Disparities were observed in the results of gene prediction programs, including
different use of splicing sites or exons between different gene models. These ab
initio prediction programs did not reveal any new genes compared with
previously known human or mouse genes, and neither was successful in
predicting the human HYDIN gene, because of its giant size.
Genes identified during annotation
For previously known human mRNA sequences, the genomic structures were
established by aligning the human mRNA sequences against the genomic DNA
sequence of chromosome 16. The positions of putative exons in the sequence
were determined with the Sim4 program (http://globin.cse.psu.edu/globin/html/).
The primary location of each mRNA was determined by using the Sim4 and Blastn
programs to align it and compare its identity with the chromosome 16 and
chromosome 1 paralogues.
For human HYDIN gene annotation, a tBlastn search was performed by
using the mouse Hydin protein sequence against the human genomic DNA. The
putative human HYDIN protein sequence was extracted after parsing the Blast
result. Further sequence analysis using tBlastn (with the putative human HYDIN
protein sequence as the query) allowed identification of the predicted HYDIN gene
structure on chromosome 16. Its splice sites were confirmed using the Splice
Site Prediction program (http://www.fruitfly.org/seq_tools/splice.html), and all
86 exons were computationally validated using the Genewise program (http://
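For readers unfamiliar with tBlastn, the search compares a protein query against a nucleotide sequence translated in all six reading frames. The following is a minimal Python sketch of six-frame translation using the standard genetic code. It illustrates the principle only and is not the BLAST implementation; the function names and the toy sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

Each translated frame is what a protein query is effectively aligned against, which is why a protein-level search can recover exons of a gene from unannotated genomic DNA.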
Duplicon junction identification
Blast2 alignment (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html) 
was used to compare the 360 kb of paralogous sequence between chromosome
16 and chromosome 1. Sequence alignments were performed with ClustalW
software . After all repetitive elements were determined using the
RepeatMasker program (http://www.repeatmasker.org/), Pipmaker analysis
(http://www.bx.psu.edu/miller_lab/)  was performed to plot 360 kb of
chromosome 16 sequence against its paralogous sequence at chromosome 1.
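The degree of identity between paralogous segments can be summarized from a pairwise alignment. The Python sketch below computes percent identity over aligned, non-gap columns. Excluding gap columns is one common convention and is an assumption here, not necessarily the one applied by Blast2 or ClustalW; the function name and the sequences are toy examples.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are skipped, not counted as mismatches
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 8 gap-free columns, 7 of them identical.
    print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```

Counting gap columns as mismatches instead would lower the reported identity, so the convention used should be stated alongside any identity figure.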
Sequence analysis of predicted genes
Deduced amino acid sequences were analyzed for N-terminal signal
sequences and transmembrane domains using Psort (http://psort.ims.u-tokyo.ac.jp/).
InterPRO (http://www.ebi.ac.uk/interpro/index.html) was used for profile,
pattern, and motif searching. SMART software (http://smart.embl-heidelberg.de/)
was used for signaling domain identification.
Divergent homologous sequences were determined by Psi-Blast searching.
PCR primer sequences are listed in Table 3. A human multiple-tissue cDNA
panel was purchased from Clontech. Primate and human diversity DNA panels
were acquired from the Coriell Cell Repositories. PCRs with FLK and JCT
primers were performed in 10-μl reactions, with 10 mM Tris–HCl, pH 8.3,
50 mM KCl, 2 mM MgCl2, 480 μM dNTP's, 0.7 μM primers, 1.5 U AmpliTaq
Gold, and 500 ng DNA. Thermal cycling was performed in an ABI 9700 under
the following conditions: 95°C for 5 min initial denaturation, 35 cycles (95°C
for 30 s, 58°C for 30 s, 68°C for 1 min), and 68°C for 10 min final elongation.
PCRs of the multiple-tissue cDNA panel were performed in 50 μl containing
20 mM Tris–HCl, pH 8.5, 50 mM KCl, 1.5 mM MgCl2, 200 μM dNTP's,
0.2 μM primers, 0.75 μl Taq DNA polymerase (Clontech). These reactions were
run in an MJ thermocycler under the following conditions: 94°C for 2 min initial
denaturation, 30 cycles (94°C for 10 s, 60°C for 30 s, 68°C for 1 min), and
68°C for 7 min final elongation. The methods for DNA typing using the human
diversity panel were performed as described .
Table 3. PCR primers used in this study
Primer name | Sequence (5′ to 3′)
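The two thermal cycling programs given in the PCR conditions above can be compared by their total programmed hold time. The Python sketch below sums the hold times stated in the text. It ignores ramp times between temperatures, which depend on the instrument, so actual run times will be longer; the function name is illustrative.

```python
def program_minutes(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time in minutes, excluding temperature ramping."""
    total_s = initial_s + cycles * sum(step_seconds) + final_s
    return total_s / 60.0


# FLK and JCT primer reactions: 5 min, 35 x (30 s, 30 s, 1 min), 10 min.
genomic = program_minutes(5 * 60, 35, [30, 30, 60], 10 * 60)

# Multiple-tissue cDNA panel reactions: 2 min, 30 x (10 s, 30 s, 1 min), 7 min.
cdna = program_minutes(2 * 60, 30, [10, 30, 60], 7 * 60)

print(f"FLK/JCT program: {genomic:.0f} min; cDNA panel program: {cdna:.0f} min")
```

Under these assumptions the genomic DNA program holds for 85 min and the cDNA panel program for 59 min.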
Fluorescence in situ hybridization
BAC DNA was isolated using the Plasmid Maxi Prep Kit (Qiagen Ltd.,
Ireland) method and quantified using a fluorimeter with known standards.
Exactly 1 μg of purified DNA was then fluorescently labeled using either
spectrum-green–dUTP (Vysis, UK) or biotin-14–dATP (Invitrogen, USA) by
standard nick-translation reaction mixtures as recommended by the supplier. The
DNA was then ethanol precipitated in the presence of 50× human Cot-1 DNA
(to block highly repetitive sequences) and resuspended in 22.5 μl of
hybridization buffer (50% formamide, 1× SSC, 10% dextran sulfate) overnight
at 37°C. The hybridization protocol recommended by Vysis, UK, was then
carried out on normal human chromosome preparations from individuals within
the Irish population as well as on the human cell line GM08729 (Coriell,) and
two chimpanzee cell lines, PT5 and TANK (ATCC, Manassas, VA, USA). Slides
hybridized with the biotin-labeled BAC, RPCI-11 424M24, were hybridized,
washed, and detected with avidin–FITC as described . Following the
hybridization and washing steps, the slides were counterstained with 4′,6-
diamidino-2-phenylindole dihydrochloride (DAPI-Antifade from Vysis).
Images were acquired with a CCD camera coupled to an Olympus BX60
microscope. Image processing was carried out using Cytovision 3.52 software
from Applied Imaging.
We thank Judy Tesmer and Meghan Doyal, Los Alamos
to Barbara Trask, Fred Hutchinson Cancer Research Center, for
providing the support and facilities for M.K.R. to perform FISH
on primate cell lines in her laboratory. This work was supported
by the U.S. Department of Energy Office of Biological and
Environmental Research under Contract W-7405-ENG-36.
Appendix A. Supplementary data
Supplementary data associated with this article can be found,
in the online version, at doi:10.1016/j.ygeno.2006.07.012.
 B.E. Davy, M.L. Robinson, Congenital hydrocephalus in hy3 mice is
caused by a frameshift mutation in Hydin, a large novel gene, Hum. Mol.
Genet. 12 (2003) 1163–1170.
 D.F. Callen, E.G. Baker, S.A. Lane, Re-evaluation of GM2346 from del
(16) (q22) to t(4;16) (q35;q22.1), Clin. Genet. 38 (1990) 466–468.
 N. Saguragawa, Y. Yokoyama, Clinical and molecular genetics of inherited
hydrocephalus, Congenit. Anom. 34 (1994) 303–310.
 P.H. Schurr, C.E. Polkey (Eds.), Hydrocephalus, Oxford Univ. Press,
New York, 1993.
 A. Rosenthal, M. Jouet, S. Kenwrick, Aberrant splicing of neural cell
adhesion molecule L1 mRNA in a family with X-linked hydrocephalus,
Nat. Genet. 2 (1992) 107–112.
 M. Jouet, et al., X-linked spastic paraplegia (SPG1), MASA syndrome and
X-linked hydrocephalus result from mutations in the neural cell adhesion
gene L1CAM, Nat. Genet. 7 (1994) 402–407.
 L. Vits, et al., MASA syndrome is due to mutations in the neural cell
adhesion gene L1CAM, Nat. Genet. 7 (1994) 408–413.
 J. Martin, et al., The sequence and analysis of duplication-rich human
chromosome 16, Nature 432 (2004) 988–994.
 E.E. Eichler, Recent duplication, domain accretion and the dynamic
mutation of the human genome, Trends Genet. 17 (2001) 661–669.
 E. Kolomietz, M.S. Meyn, A. Pandita, J.A. Squire, The role of Alu repeat
clusters as mediators of recurrent chromosomal aberrations in tumors,
Genes Chromosomes Cancer 35 (2002) 97–112.
 J.A. Bailey, G. Liu, E.E. Eichler, An Alu transposition model for the origin
and expansion of human segmental duplications, Am. J. Hum. Genet. 73
 J.R. Lupski, Genomic disorders: structural features of the genome can lead
to DNA rearrangements and human disease traits, Trends Genet. 14 (1998)
 M. Gratacòs, et al., A polymorphic genomic duplication on human
chromosome 15 is a susceptibility factor for panic and phobic disorders,
Cell 106 (2001) 367–379.
 B.J. Trask, et al., Members of the olfactory receptor gene family are
contained in large blocks of DNA duplicated polymorphically near the
ends of human chromosomes, Hum. Mol. Genet. 7 (1998) 13–26.
 E.J. Hollox, J.A.L. Armour, J.C.K. Barber, Extensive normal copy number
variation of a beta-defensin antimicrobial-gene cluster, Am. J. Hum. Genet.
73 (2003) 591–600.
 P.G. van Overveld, et al., Interchromosomal repeat array interactions
between chromosomes 4 and 10: a model for subtelomeric plasticity, Hum.
Mol. Genet. 9 (2000) 2879–2884.
 J. Sebat, et al., Large-scale copy number polymorphism in the human
genome, Science 305 (2004) 525–528.
 A.J. Iafrate, et al., Detection of large-scale variation in the human genome,
Nat. Genet. 36 (2004) 949–951.
 J.L. Nahon, Birth of ‘human-specific’ genes during primate evolution,
Genetica 118 (2003) 193–208.
 C. Schrander-Stumpel, J.-P. Fryns, Congenital hydrocephalus: nosology
and guidelines for clinical approach and genetic counseling, Eur. J. Pediatr.
157 (1998) 355–362.
 F. Haverkamp, et al., Congenital hydrocephalus internus and aqueduct
stenosis: aetiology and implications for genetic counseling, Eur. J. Pediatr.
158 (1999) 474–478.
 J. Cheung, et al., Genome-wide detection of segmental duplications and
potential assembly errors in the human genome sequence, Genome Biol. 4
 International Human Genome Sequence Consortium, Finishing the
euchromatic sequence of the human genome, Nature 431 (2004) 931–945.
 J.A. Bailey, et al., Recent segmental duplications in the human genome,
Science 297 (2002) 1003–1007.
 C. Burge, S. Karlin, Prediction of complete gene structures in human
genomic DNA, J. Mol. Biol. 268 (1997) 78–94.
 Y. Xu, R.J. Mural, E.C. Uberbacher, Inferring gene structures in
genomic sequences using pattern recognition and expressed sequence tags,
in: T. Gaasterland, P. Karp, K. Karplus, C. Ouzounis, C. Sander, A. Valencia
(Eds.), Proceedings of the Fifth International Conference on Intelligent
Systems for Molecular Biology, AAAI Press, Menlo Park, CA, 1997,
 L. Florea, G. Hartzell, Z. Zhang, G.M. Rubin, W. Miller, A computer
program for aligning a cDNA sequence with a genomic DNA sequence,
Genome Res. 8 (1998) 967–974.
 S.F. Altschul, et al., Gapped BLAST and PSI-BLAST: a new generation of
 E. Birney, M. Clamp, R. Durbin, GeneWise and genomewise, Genome
Res. 14 (2004) 988–995.
 T.A. Tatusova, T.L. Madden, BLAST 2 sequences, a new tool for
comparing protein and nucleotide sequences, FEMS Microbiol. Lett. 174
 J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice,
Nucleic Acids Res. 22 (1994) 4673–4680.
 S. Schwartz, et al., PipMaker—A Web server for aligning two genomic
DNA sequences, Genome Res. 10 (2000) 577–586.
 K. Nakai, M. Kanehisa, Expert system for predicting protein localization sites
in gram-negative bacteria, Proteins Struct. Funct. Genet. 11 (1991) 95–110.
 N.J. Mulder, et al., The InterPro database, brings increased coverage and
new features, Nucleic Acids Res. 31 (2003) 315–318.
 J. Schultz, F. Milpetz, P. Bork, C.P. Ponting, SMART, a simple modular
architecture research tool: identification of signaling domains, Proc. Natl.
Acad. Sci. U. S. A. 95 (1998) 5857–5864.
 J.L. Weber, et al., Human diallelic insertion/deletion polymorphisms, Am.
J. Hum. Genet. 71 (2002) 854–862.
 B.J. Trask, B. Birren, E. Green, P. Hieter, R. Myers (Eds.), Genome
Analysis: A Laboratory Manual, vol. 4, Cold Spring Harbor Laboratory
Press, New York, 1998, pp. 303–413.
Web site references
http://bioweb.pasteur.fr/seqanal/interfaces/clustalw.html: ClustalW multi-
http://www.ebi.ac.uk/Wise2/: Genewise protein to genomic DNA sequence
http://genes.mit.edu/GENSCAN.html: Genscan gene structure prediction.
http://grail.lsd.ornl.gov/grailexp/: GRAILEXP gene discovery suite.
http://www.ebi.ac.uk/interpro/index.html: InterPRO database.
http://www.ncbi.nlm.nih.gov/blast/: NCBI BLAST pages.
http://www.bx.psu.edu/miller_lab/: PipMaker alignments program.
http://psort.ims.u-tokyo.ac.jp/: PSORT protein localization prediction.
http://www.repeatmasker.org/: RepeatMasker program.
http://globin.cse.psu.edu/globin/html/: Sim4 program.
http://smart.embl-heidelberg.de/: SMART Simple Modular Architecture
http://www.fruitfly.org/seq_tools/splice.html: Splice site prediction by
http://genome.cse.ucsc.edu/cgi-bin/hgGateway: UCSC Genome Browser.