Scott E. Devine

University of Maryland, Baltimore | UMB · Institute for Genome Sciences

Ph.D.

Publications: 99
Reads: 24,864
Citations: 15,064

Publications (99)
Article
Several large-scale Illumina whole-genome sequencing (WGS) and whole-exome sequencing (WES) projects have emerged recently that have provided exceptional opportunities to discover mobile element insertions (MEIs) and study the impact of these MEIs on human genomes. However, these projects also have presented major challenges with respect to the sca...
Article
Full-text available
Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of ma...
Article
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV...
Article
Full-text available
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing...
Article
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of sr...
Article
Full-text available
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic var...
Preprint
Full-text available
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic var...
Article
Somatic LINE-1 (L1) retrotransposition has been detected in early embryos, adult brains, and the gastrointestinal (GI) tract, and many cancers, including epithelial GI tumors. We previously found numerous somatic L1 insertions in paired normal and GI cancerous tissues. Here, using a modified method of single-cell analysis for somatic L1 insertions,...
Preprint
Full-text available
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and emerging clinical diagnostic approaches utilize short-reads (srWGS), which present constraints for genome-wide discovery of structural variants (SVs). Alternative long-read single molecule technologies (lrWGS) offer significant advantages for g...
Article
Full-text available
Retrotransposable elements (RTEs) have actively multiplied over the past 80 million years of primate evolution, and as a consequence, such elements collectively occupy ~40% of the human genome. As RTE activity can have detrimental effects on the human genome and transcriptome, silencing mechanisms have evolved to restrict retrotransposition. The b...
Preprint
Full-text available
The prevailing genome assembly paradigm is to produce consensus sequences that "collapse" parental haplotypes into a consensus sequence. Here, we leverage the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing (Strand-seq) and combine them with high-fidelity (HiFi) long sequencing reads, in a novel reference-free...
Article
Full-text available
The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios...
Preprint
Full-text available
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–...
Article
Full-text available
Glioma is a unique neoplastic disease that develops exclusively in the central nervous system (CNS) and rarely metastasizes to other tissues. This feature strongly implicates the tumor-host CNS microenvironment in gliomagenesis and tumor progression. We investigated the differences and similarities in glioma biology as conveyed by transcriptomic pa...
Article
Full-text available
Mobile element insertions (MEIs) represent ~25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases,...
Article
Full-text available
The human LINE-1 (or L1) element is a non-LTR retrotransposon that is mobilized through an RNA intermediate by an L1-encoded reverse transcriptase and other L1-encoded proteins. L1 elements remain actively mobile today and continue to mutagenize human genomes. Importantly, when new insertions disrupt gene function, they can cause diseases. Historic...
Article
Full-text available
Although human LINE-1 (L1) elements are actively mobilized in many cancers, a role for somatic L1 retrotransposition in tumor initiation has not been conclusively demonstrated. Here, we identify a novel somatic L1 insertion in the APC tumor suppressor gene that provided us with a unique opportunity to determine whether such insertions can actually...
Article
Full-text available
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-covera...
Article
Full-text available
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks...
Article
Full-text available
The clinical features of patients with pulmonary nontuberculous mycobacterial (PNTM) infection are well described, but the genetic components of infection susceptibility are not. The objective was to examine genetic variants in PNTM-affected patients, PNTM-unaffected family members, and controls. Whole exome sequencing was done on 69 white PNTM-affected patients and...
Article
Full-text available
The developing mammalian brain is destined for a female phenotype unless exposed to gonadal hormones during a perinatal sensitive period. It has been assumed that the undifferentiated brain is masculinized by direct induction of transcription by ligand-activated nuclear steroid receptors. We found that a primary effect of gonadal steroids in the hi...
Article
Full-text available
A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to bui...
Article
Full-text available
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By...
Article
Full-text available
Human genetic variation is expected to play a central role in personalized medicine. Yet only a fraction of the natural genetic variation that is harbored by humans has been discovered to date. Here we report almost 2 million small insertions and deletions (INDELs) that range from 1 bp to 10,000 bp in length in the genomes of 79 diverse humans. The...
Article
Full-text available
In this review, we focus on progress that has been made with detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs...
Article
Two abundant classes of mobile elements, namely Alu and L1 elements, continue to generate new retrotransposon insertions in human genomes. Estimates suggest that these elements have generated millions of new germline insertions in individual human genomes worldwide. Unfortunately, current technologies are not capable of detecting most of these youn...
Article
Nuclear localization signals (NLSs) are amino acid sequences that target cargo proteins into the nucleus. Rigorous characterization of NLS motifs is essential to understanding and predicting pathways for nuclear import. The best-characterized NLS is the classical NLS (cNLS), which is recognized by the cNLS receptor, importin-alpha. cNLSs are conven...
Article
Full-text available
Alu retrotransposons evolved from 7SL RNA approximately 65 million years ago and underwent several rounds of massive expansion in primate genomes. Consequently, the human genome currently harbors 1.1 million Alu copies. Some of these copies remain actively mobile and continue to produce both genetic variation and diseases by "jumping" to new genomi...
Article
Full-text available
Like its retroviral relatives, the long terminal repeat retrotransposon Ty1 in the yeast Saccharomyces cerevisiae must traverse a permanently intact nuclear membrane for successful transposition and replication. For retrotransposition to occur, at least a subset of Ty1 proteins, including the Ty1 integrase, must enter the nucleus. Nuclear localizat...
Article
Full-text available
Proteins destined for import into the nucleus contain nuclear localization signals (NLSs) that are recognized by import receptors termed karyopherins or importins. Until recently, the only nuclear import sequence that had been well defined and characterized was the classical NLS (cNLS), which is recognized by importin alpha. However, Chook and cowo...
Article
Full-text available
Although a large proportion (44%) of the human genome is occupied by transposons and transposon-like repetitive elements, only a small proportion (<0.05%) of these elements remain active today. Recent evidence indicates that approximately 35-40 subfamilies of Alu, L1 and SVA elements (and possibly HERV-K elements) remain actively mobile in the huma...