
Scott E. Devine
University of Maryland, Baltimore | UMB · Institute for Genome Sciences
Ph.D.
99 Publications
24,864 Reads
15,064 Citations
Publications (99)
Several large-scale Illumina whole-genome sequencing (WGS) and whole-exome sequencing (WES) projects have emerged recently that have provided exceptional opportunities to discover mobile element insertions (MEIs) and study the impact of these MEIs on human genomes. However, these projects also have presented major challenges with respect to the sca...
Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of ma...
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV...
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing...
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of sr...
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic var...
Somatic LINE-1 (L1) retrotransposition has been detected in early embryos, adult brains, and the gastrointestinal (GI) tract, and many cancers, including epithelial GI tumors. We previously found numerous somatic L1 insertions in paired normal and GI cancerous tissues. Here, using a modified method of single-cell analysis for somatic L1 insertions,...
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and emerging clinical diagnostic approaches utilize short-reads (srWGS), which present constraints for genome-wide discovery of structural variants (SVs). Alternative long-read single molecule technologies (lrWGS) offer significant advantages for g...
Retrotransposable elements (RTEs) have actively multiplied over the past 80 million years of primate evolution, and as a consequence, such elements collectively occupy ∼ 40% of the human genome. As RTE activity can have detrimental effects on the human genome and transcriptome, silencing mechanisms have evolved to restrict retrotransposition. The b...
The prevailing genome assembly paradigm is to produce consensus sequences that "collapse" parental haplotypes into a consensus sequence. Here, we leverage the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing (Strand-seq) and combine them with high-fidelity (HiFi) long sequencing reads, in a novel reference-free...
The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios...
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–...
Glioma is a unique neoplastic disease that develops exclusively in the central nervous system (CNS) and rarely metastasizes to other tissues. This feature strongly implicates the tumor-host CNS microenvironment in gliomagenesis and tumor progression. We investigated the differences and similarities in glioma biology as conveyed by transcriptomic pa...
Mobile element insertions (MEIs) represent ~25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases,...
The human LINE-1 (or L1) element is a non-LTR retrotransposon that is mobilized through an RNA intermediate by an L1-encoded reverse transcriptase and other L1-encoded proteins. L1 elements remain actively mobile today and continue to mutagenize human genomes. Importantly, when new insertions disrupt gene function, they can cause diseases. Historic...
Although human LINE-1 (L1) elements are actively mobilized in many cancers, a role for somatic L1 retrotransposition in tumor initiation has not been conclusively demonstrated. Here, we identify a novel somatic L1 insertion in the APC tumor suppressor gene that provided us with a unique opportunity to determine whether such insertions can actually...
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-covera...
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks...
The clinical features of patients with pulmonary nontuberculous mycobacterial (PNTM) infection are well described, but the genetic components of infection susceptibility are not. To examine genetic variants in PNTM-affected patients, PNTM-unaffected family members and controls. Whole exome sequencing was done on 69 white PNTM-affected patients and...
The developing mammalian brain is destined for a female phenotype unless exposed to gonadal hormones during a perinatal sensitive period. It has been assumed that the undifferentiated brain is masculinized by direct induction of transcription by ligand-activated nuclear steroid receptors. We found that a primary effect of gonadal steroids in the hi...
A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to bui...
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By...
Human genetic variation is expected to play a central role in personalized medicine. Yet only a fraction of the natural genetic variation that is harbored by humans has been discovered to date. Here we report almost 2 million small insertions and deletions (INDELs) that range from 1 bp to 10,000 bp in length in the genomes of 79 diverse humans. The...
In this review, we focus on progress that has been made with detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs...
Two abundant classes of mobile elements, namely Alu and L1 elements, continue to generate new retrotransposon insertions in human genomes. Estimates suggest that these elements have generated millions of new germline insertions in individual human genomes worldwide. Unfortunately, current technologies are not capable of detecting most of these youn...
Nuclear localization signals (NLSs) are amino acid sequences that target cargo proteins into the nucleus. Rigorous characterization of NLS motifs is essential to understanding and predicting pathways for nuclear import. The best-characterized NLS is the classical NLS (cNLS), which is recognized by the cNLS receptor, importin-alpha. cNLSs are conven...
Alu retrotransposons evolved from 7SL RNA approximately 65 million years ago and underwent several rounds of massive expansion in primate genomes. Consequently, the human genome currently harbors 1.1 million Alu copies. Some of these copies remain actively mobile and continue to produce both genetic variation and diseases by "jumping" to new genomi...
Like its retroviral relatives, the long terminal repeat retrotransposon Ty1 in the yeast Saccharomyces cerevisiae must traverse a permanently intact nuclear membrane for successful transposition and replication. For retrotransposition to occur, at least a subset of Ty1 proteins, including the Ty1 integrase, must enter the nucleus. Nuclear localizat...
Proteins destined for import into the nucleus contain nuclear localization signals (NLSs) that are recognized by import receptors termed karyopherins or importins. Until recently, the only nuclear import sequence that had been well defined and characterized was the classical NLS (cNLS), which is recognized by importin alpha. However, Chook and cowo...
Although a large proportion (44%) of the human genome is occupied by transposons and transposon-like repetitive elements, only a small proportion (<0.05%) of these elements remain active today. Recent evidence indicates that approximately 35-40 subfamilies of Alu, L1 and SVA elements (and possibly HERV-K elements) remain actively mobile in the huma...