A 360-kb interchromosomal duplication of the human HYDIN locus

DOE Joint Genome Institute and Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
Genomics (Impact Factor: 2.28). 01/2007; 88(6):762-71. DOI: 10.1016/j.ygeno.2006.07.012


The HYDIN gene, located in human chromosome band 16q22.2, is a large gene encompassing 423 kb of genomic DNA that has been suggested as a candidate for an autosomal recessive form of congenital hydrocephalus. We have found that the human HYDIN locus has been very recently duplicated, with a nearly identical 360-kb paralogous segment inserted on chromosome 1q21.1. The duplication, among the largest interchromosomal segmental duplications described in humans, is not accounted for in the current human genome assembly and appears to be part of a greater than 550-kb contig that must lie within one of the 11 sequence gaps currently remaining in 1q21.1. Both copies of the HYDIN gene are expressed in alternatively spliced transcripts. Elucidation of the role of HYDIN in human disease susceptibility will require careful discrimination among the paralogous copies.

Available from: Norman A Doggett
  • Source
    • "reference assembly, many groups have described shortcomings of this resource, including remaining gaps, single-nucleotide errors, or gross misassembly due to complex haplotypic variation (Eichler et al. 2004; Doggett et al. 2006; Kidd et al. 2010; Chen and Butte 2011; The 1000 Genomes Project Consortium 2012). Both gaps and misassembled regions often arise because the DNA sequence used for the assembly was from multiple diploid sources containing complex structural variation. Because such loci often contain medically relevant gene families, it is imp"
    ABSTRACT: A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. To overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.
    Preview · Article · Nov 2014 · Genome Research
  • Source
    • "proteins found in the anchoring filaments of skin keratinocytes that strengthen attachment to the dermis (Gerecke et al., 1994). The predicted intracellular portion of H04D03.1 also has similarity to human hydin (E value 1.7), a hydrocephalus-inducing protein of 5120 amino acids (Davy and Robinson, 2003; Doggett et al., 2006) (Fig. S4). "
    ABSTRACT: During the development of the nervous system, the migration of many cells and axons is guided by extracellular molecules. These molecules bind to receptors at the tips of the growth cones of migrating axons and trigger intracellular signaling to steer the axons along the correct trajectories. We have identified a novel mutant, enu-3 (enhancer of Unc), that enhances the motor neuron axon outgrowth defects observed in strains of Caenorhabditis elegans that lack either the UNC-5 receptor or its ligand UNC-6/Netrin. Specifically, the double-mutant strains have enhanced axonal outgrowth defects mainly in DB4, DB5 and DB6 motor neurons. enu-3 single mutants have weak motor neuron axon migration defects. Both outgrowth defects of double mutants and axon migration defects of enu-3 mutants were rescued by expression of the H04D03.1 gene product. ENU-3/H04D03.1 encodes a novel predicted putative trans-membrane protein of 204 amino acids. It is a member of a family of highly homologous proteins of previously unknown function in the C. elegans genome. ENU-3 is expressed in the PVT interneuron and is weakly expressed in many cell bodies along the ventral cord, including those of the DA and DB motor neurons. We conclude that ENU-3 is a novel C. elegans protein that affects both motor axon outgrowth and guidance.
    Full-text · Article · Feb 2011 · Developmental Biology
  • Source
    • "We divided the genome into nonoverlapping regions of 100 kb. We excluded 37 regions containing sites where all individuals in both populations were heterozygous (Supplemental Table 1), as these may represent variation between undocumented paralogs (Doggett et al. 2006). Thus, our final data set consisted of 10,497 regions of 100 kb, each with over 500 bp of exon sequence, comprising 25,769 kb of exome sequence in each of 10 individuals. "
    ABSTRACT: Exome sequences, which comprise all protein-coding regions, are promising data sets for studies of natural selection because they offer unbiased genome-wide estimates of polymorphism while focusing on the portions of the genome that are most likely to be functionally important. We examine genomic patterns of polymorphism within 10 diploid autosomal exomes of European and African descent. Using coalescent simulations, we show how polymorphism, site frequency spectra, and intercontinental divergence in these samples would be influenced by different modes of positive selection. We examine putatively selected loci from four previous genome-wide scans of SNP genotypes and demonstrate that these regions indeed show unusual population genetic patterns in the exome data. Using a series of conservative criteria based on exome polymorphism, we are able to fine-scale map signatures of selection, in many cases pinpointing a single candidate SNP. We also identify and evaluate novel candidate selection genes that show unusual patterns of polymorphism. We sequence a portion of one novel candidate locus, IVL, in 74 individuals from multiple continents and examine global genetic diversity. Thus, we confirm, narrow, and supplement existing catalogs of putative targets of selection, and show that exome data sets, which are likely to soon become common, will be powerful tools for identifying adaptive genetic variation.
    Full-text · Article · Oct 2010 · Genome Research