Arian Smit

Publications (13) · 288.67 Total impact

  • Article: DupMasker: a tool for annotating primate segmental duplications.
    ABSTRACT: Segmental duplications (SDs) play an important role in genome rearrangement, evolution, and the copy-number variation (CNV) of primate genomes. Such sequences are difficult to detect, a priori, because they share no defining sequence features that distinguish them from unique portions of the genome. Current sequence annotation of segmental duplications requires computationally intensive, genome-wide self-comparisons that cannot be easily implemented on new data sets. Based on the successful implementation of RepeatMasker, we developed a new genome annotation tool, DupMasker. The program uses a library of nonredundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and nonhuman primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. We predict this tool will be valuable in the annotation of large-insert sequence clones, allowing putative unique and duplicated regions of the genomes to be annotated prior to whole genome assembly comparisons.
    Genome Research 08/2008; 18(8):1362-8. · 13.61 Impact Factor
  • Article: Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates.
    ABSTRACT: We investigated the evolution of the families of LINE-1 (L1) retrotransposons that have amplified in the human lineage since the origin of primates. We identified two phases in the evolution of L1. From approximately 70 million years ago (Mya) until approximately 40 Mya, three distinct L1 lineages were simultaneously active in the genome of ancestral primates. In contrast, during the last 40 million years (Myr), i.e., during the evolution of anthropoid primates, a single lineage of families has evolved and amplified. We found that novel (i.e., unrelated) regulatory regions (5'UTR) have been frequently recruited during the evolution of L1, whereas the two open-reading frames (ORF1 and ORF2) have remained relatively conserved. We found that L1 families coexisted and formed independently evolving L1 lineages only when they had different 5'UTRs. We propose that L1 families with different 5'UTR can coexist because they don't rely on the same host-encoded factors for their transcription and therefore do not compete with each other. The most prolific L1 families (families L1PA8 to L1PA3) amplified between 40 and 12 Mya. This period of high activity corresponds to an episode of adaptive evolution in a segment of ORF1. The correlation between the high activity of L1 families and adaptive evolution could result from the coevolution of L1 and a host-encoded repressor of L1 activity.
    Genome Research 02/2006; 16(1):78-87. · 13.61 Impact Factor
  • Article: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution
    ABSTRACT: We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome—composed of approximately one billion base pairs of sequence and an estimated 20,000–23,000 genes—provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
    Nature 12/2004; 432(7018):695-716. · 36.28 Impact Factor
  • Article: Genome sequence of the Brown Norway rat yields insights into mammalian evolution.
    ABSTRACT: The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
    Nature 05/2004; 428(6982):493-521. · 36.28 Impact Factor
  • Article: MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences.
    ABSTRACT: Analysis of multiple sequence alignments can generate important, testable hypotheses about the phylogenetic history and cellular function of genomic sequences. We describe the MultiPipMaker server, which aligns multiple, long genomic DNA sequences quickly and with good sensitivity (available at http://bio.cse.psu.edu/ since May 2001). Alignments are computed between a contiguous reference sequence and one or more secondary sequences, which can be finished or draft sequence. The outputs include a stacked set of percent identity plots, called a MultiPip, comparing the reference sequence with subsequent sequences, and a nucleotide-level multiple alignment. New tools are provided to search MultiPipMaker output for conserved matches to a user-specified pattern and for conserved matches to position weight matrices that describe transcription factor binding sites (singly and in clusters). We illustrate the use of MultiPipMaker to identify candidate regulatory regions in WNT2 and then demonstrate by transfection assays that they are functional. Analysis of the alignments also confirms the phylogenetic inference that horses are more closely related to cats than to cows.
    Nucleic Acids Research 08/2003; 31(13):3518-24. · 8.03 Impact Factor
  • Article: Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution.
    ABSTRACT: Six measures of evolutionary change in the human genome were studied, three derived from the aligned human and mouse genomes in conjunction with the Mouse Genome Sequencing Consortium, consisting of (1) nucleotide substitution per fourfold degenerate site in coding regions, (2) nucleotide substitution per site in relics of transposable elements active only before the human-mouse speciation, and (3) the nonaligning fraction of human DNA that is nonrepetitive or in ancestral repeats; and three derived from human genome data alone, consisting of (4) SNP density, (5) frequency of insertion of transposable elements, and (6) rate of recombination. Features 1 and 2 are measures of nucleotide substitutions at two classes of "neutral" sites, whereas 4 is a measure of recent mutations. Feature 3 is a measure dominated by deletions in mouse, whereas 5 represents insertions in human. It was found that all six vary significantly in megabase-sized regions genome-wide, and many vary together. This indicates that some regions of a genome change slowly by all processes that alter DNA, and others change faster. Regional variation in all processes is correlated with, but not completely accounted for, by GC content in human and the difference between GC content in human and mouse.
    Genome Research 02/2003; 13(1):13-26. · 13.61 Impact Factor
  • Article: Human-mouse alignments with BLASTZ.
    ABSTRACT: The Mouse Genome Analysis Consortium aligned the human and mouse genome sequences for a variety of purposes, using alignment programs that suited the various needs. For investigating issues regarding genome evolution, a particularly sensitive method was needed to permit alignment of a large proportion of the neutrally evolving regions. We selected a program called BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences. BLASTZ was subsequently modified, both to attain efficiency adequate for aligning entire mammalian genomes and to increase its sensitivity. This work describes BLASTZ, its modifications, the hardware environment on which we run it, and several empirical studies to validate its results.
    Genome Research 02/2003; 13(1):103-7. · 13.61 Impact Factor
  • Article: Initial sequencing and comparative analysis of the mouse genome.
    ABSTRACT: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
    Nature 01/2003; 420(6915):520-62. · 36.28 Impact Factor
  • Article: Initial sequencing and comparative analysis of the mouse genome
    ABSTRACT: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
    Nature 12/2002; 420(6915):520-562. · 36.28 Impact Factor
  • Article: Initial sequencing and comparative analysis of the mouse genome
    ABSTRACT: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
    Nature 11/2002; 420:520-562. · 36.28 Impact Factor
  • Article: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes.
    ABSTRACT: The compact genome of Fugu rubripes has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds. In this 365-megabase vertebrate genome, repetitive DNA accounts for less than one-sixth of the sequence, and gene loci occupy about one-third of the genome. As with the human genome, gene loci are not evenly distributed, but are clustered into sparse and dense regions. Some "giant" genes were observed that had average coding sequence sizes but were spread over genomic lengths significantly larger than those of their human orthologs. Although three-quarters of predicted human proteins have a strong match to Fugu, approximately a quarter of the human proteins had highly diverged from or had no pufferfish homologs, highlighting the extent of protein evolution in the 450 million years since teleosts and mammals diverged. Conserved linkages between Fugu and human genes indicate the preservation of chromosomal segments from the common vertebrate ancestor, but with considerable scrambling of gene order.
    Science 09/2002; 297(5585):1301-10. · 31.20 Impact Factor
  • Article: Genomic analysis of the olfactory receptor region of the mouse and human T-cell receptor alpha/delta loci.
    ABSTRACT: We have conducted a comparative genomic analysis of several olfactory receptor (OR) genes that lie immediately 5' to the V-alpha gene segments at the mouse and human T-cell receptor (TCR) alpha/delta loci. Five OR genes are identified in the human cluster. The murine cluster has at least six OR genes; the first five are orthologous to the human genes. The sixth mouse gene has arisen since mouse-human divergence by a duplication of an approximately 10-kb block. One pair of OR paralogs found at the mouse and human loci are more similar to each other than to their corresponding orthologs. This paralogous "twinning" appears to be under selection, perhaps to increase sensitivity to particular odorants or to resolve structurally-similar odorants. The promoter regions of the mouse OR genes were identified by RACE-PCR. Orthologs share extensive 5' UTR homology, but we find no significant similarity among paralogs. These findings extend previous observations that suggest that OR genes do not share local significant regulatory homology despite having a common regulatory agenda. We also identified a diverged TCR-alpha gene segment that uses a divergent recombination signal sequence (RSS) to initiate recombination in T-cells from within the OR region. We explored the hypothesis that OR genes may use DNA recombination in expressing neurons, e.g., to recombine ORs into a transcriptionally active locus. We searched the mouse sequence for OR-flanking RSS motifs, but did not find evidence to suggest that these OR genes use TCR-like recombination target sequences.
    Genome Research 02/2002; 12(1):81-7. · 13.61 Impact Factor
  • Article: PipMaker - A Web Server for Aligning Two Genomic DNA Sequences
    ABSTRACT: In this paper we describe an automated server for generating alignments and pips. A pip shows the position in one sequence of each aligning gap-free segment and plots its percent identity. As a complementary display, we also provide a plot of the position of each aligning segment in both species. We refer to these as dot plots, even though matches shown in conventional dot plots need not be contained within a statistically significant alignment and those in our plots are. Both displays allow rich annotation to be plotted along the appropriate axes to aid in correlating aligning segments with functional or structural features of the sequence. We provide examples of the application of PipMaker for finding exons and candidate regulatory elements in mammalian, nematode, and bacterial sequences. The server is able to compare a completed sequence from one species with an incomplete sequence from a second.
    02/2001;