Thomas G Doak

Indiana University Bloomington | IUB · Department of Biology / National Center for Genome Analysis Support (NCGAS)

PhD Oncological Sci., U. of Utah

About

Publications: 222
Reads: 19,313
Citations: 3,951


Publications (222)
Article
Full-text available
Whole-genome duplications (WGDs) have occurred in many eukaryotic lineages. However, the underlying evolutionary forces and molecular mechanisms responsible for the long-term retention of gene duplicates created by WGDs are not well understood. We employ a population-genomic approach to understand the selective forces acting on paralogs and investi...
Article
Full-text available
Because errors at the DNA level power pathogen evolution, a systematic understanding of the rate and molecular spectrum of mutations could guide the avoidance and treatment of infectious diseases. We thus accumulated tens of thousands of spontaneous mutations in 768 repeatedly bottlenecked lineages of 18 strains from various geographical sites, tempora...
Article
Full-text available
It is a critical time to reflect on the National Ecological Observatory Network (NEON) science to date as well as envision what research can be done right now with NEON (and other) data and what training is needed to enable a diverse user community. NEON became fully operational in May 2019 and has pivoted from planning and construction to operatio...
Article
Full-text available
Chromosomes are well-organized carriers of genetic information in eukaryotes and are usually quite long, carrying hundreds and thousands of genes. Intriguingly, a clade of single-celled ciliates, Spirotrichea, feature nanochromosomes—also called "gene-sized chromosomes". These chromosomes predominantly carry only one gene, flanked by short telomere...
Preprint
Full-text available
Background: Recent advances in genome and metagenome sequencing have dramatically enriched the collection of genomes of bacterial species related to human health and diseases. In metagenomic studies, phylogenetic trees are commonly used to depict, describe, and compare the bacterial members of the community under study. The most accurate tree-buil...
Article
Full-text available
We describe the rate and spectrum of spontaneous mutations for the social amoeba Dictyostelium discoideum, a key model organism in molecular, cellular, evolutionary, and developmental biology. Whole-genome sequencing of 37 mutation accumulation lines of D. discoideum after an average of 1,500 cell divisions yields a base-substitution mutation rate...
Article
Organisms adapted to life in extreme habitats (extremophiles) can further our understanding of the mechanisms of genetic stability, particularly replication and repair. Despite the harsh environmental conditions they endure, these extremophiles represent a great deal of the earth's biodiversity. Here, for the first time in a member of the archaeal...
Article
Full-text available
Motivation: Programmed DNA elimination plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specific to the detection of DNA splicing events is scarce. In this paper, we describe ADFinder, an efficient detector of programmed...
Conference Paper
National Center for Genome Analysis Support (NCGAS) is an NSF-funded center whose goal is to help biologists optimize their bioinformatic pipelines to best address their research questions. To help reach these goals, NCGAS provides a range of services based on the researchers' needs. We currently host gateways, Trinity Galaxy and GenePattern, with...
Article
Full-text available
The evolution of mitochondrial genomes and their population-genetic environment among unicellular eukaryotes are understudied. Ciliate mitochondrial genomes exhibit a unique combination of characteristics, including a linear organization and the presence of multiple genes with no known function or detectable homologs in other eukaryotes. Here we st...
Preprint
Full-text available
Whole-Genome Duplications (WGDs) have shaped the gene repertoire of many eukaryotic lineages. The redundancy created by WGDs typically results in a phase of massive gene loss. However, some WGD-derived paralogs are maintained over long evolutionary periods and the relative contributions of different selective pressures to their maintenance are still...
Preprint
Full-text available
The evolution of mitochondrial genomes and their population-genetic environment among unicellular eukaryotes are understudied. Ciliate mitochondrial genomes exhibit a unique combination of characteristics, including a linear organization and the presence of multiple genes with no known function or detectable homologs in other eukaryotes. Here we st...
Article
Full-text available
Mutation is one of the most fundamental evolutionary forces. Studying variation in the mutation rate within and among closely-related species can help reveal mechanisms of genome divergence, but such variation is unstudied in the vast majority of organisms. Previous studies on ciliated protozoa have found extremely low mutation rates. In this study...
Article
Full-text available
How genetic variation is generated and maintained remains a central question in evolutionary biology. When presented with a complex environment, microbes can take advantage of genetic variation to exploit new niches. Here we present a massively parallel experiment where WT and repair-deficient (∆mutL) Escherichia coli populations have evolved over...
Article
Full-text available
Ciliated protists are a large group of single-celled eukaryotes with separate germline and somatic nuclei in each cell. The somatic genome is developed from the zygotic nucleus through a series of chromosomal rearrangements, including fragmentation, DNA elimination, de novo telomere addition, and DNA amplification. This unique feature makes them pe...
Conference Paper
One of the challenges to adoption of HPC is the disjunction between those who need it and those who know it. Biology (specifically, genomics) is a growing field for computational use, but the typical biologist does not have an established informatics background. The National Center for Genome Analysis Support (NCGAS) aids users in getting past the...
Article
Full-text available
Population structure can be described by genotypic correlation coefficients between groups of individuals, the most basic of which are the pair-wise relatedness coefficients between any two individuals. There are nine pair-wise relatedness coefficients in the most general model, and we show that these can be reduced to seven coefficients for bialle...
Preprint
Full-text available
Motivation: Fusion genes created by genomic rearrangements can be potent drivers of tumorigenesis. However, accurate identification of functional fusion genes from genomic sequencing requires whole genome sequencing, since exonic sequencing alone is often insufficient. Transcriptome sequencing provides a direct, highly effective alternative for ca...
Article
Full-text available
Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologica...
Article
Full-text available
Although the presence of endosymbiotic rickettsial bacteria, specifically Candidatus Megaira, has been reported in diverse habitats and a wide range of eukaryotic hosts, it remains unclear how broadly Ca. Megaira are distributed in a single host species. In this study we seek to address whether Ca. Megaira are present in most, if not all isolates,...
Article
Full-text available
In the past 10 years, the number of endosymbionts described within the bacterial order Rickettsiales has constantly grown. Since 2006, 18 novel Rickettsiales genera inhabiting protists, such as ciliates and amoebae, have been described. In this work, we characterize two novel bacterial endosymbionts from Paramecium collected near Bloomington, IN. B...
Article
Full-text available
A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome...
Article
Full-text available
Diversity-generating retroelements (DGRs) are genetic cassettes that can produce massive protein sequence variation in prokaryotes. Presumably DGRs confer selective advantages to their hosts (bacteria or viruses) by generating variants of target genes—typically resulting in target proteins with altered ligand-binding specificity—through a specializ...
Article
Full-text available
Mycobacterium smegmatis is a bacterium that is naturally devoid of known post-replicative DNA mismatch repair (MMR) homologs, mutS and mutL, providing an opportunity to investigate how the mutation rate and spectrum have evolved in the absence of a highly-conserved primary repair pathway. Mutation accumulation experiments of M. smegmatis yielded a b...
Preprint
Full-text available
Population structure can be described by genotypic correlation coefficients between groups of individuals, the most basic of which are the pair-wise relatedness coefficients between any two individuals. There are nine pair-wise relatedness coefficients in the most general model, and we show that these can be reduced to seven coefficients for bialle...
Article
Full-text available
Caedibacter varicaedens is a kappa killer endosymbiont bacterium of the ciliate Paramecium biaurelia. Here, we present the draft genome sequence of C. varicaedens.
Article
Full-text available
Comparative metagenomics remains challenging due to the size and complexity of metagenomic datasets. Here we introduce subtractive assembly, a de novo assembly approach for comparative metagenomics that directly assembles only the differential reads that distinguish between two groups of metagenomes. Using simulated datasets, we show it improves bo...
Article
Full-text available
Metagenomics and other meta-omics approaches (including metatranscriptomics) provide insights into the composition and function of microbial communities living in different environments or animal hosts. Metatranscriptomics research provides an unprecedented opportunity to examine gene regulation for many microbial species simultaneously, and more i...
Article
Full-text available
The rate at which new mutations arise in the genome is a key factor in the evolution and adaptation of species. Here we describe the rate and spectrum of spontaneous mutations for the fission yeast Schizosaccharomyces pombe, a key model organism with many similarities to higher eukaryotes. We undertook an ~1700 generation mutation accumulation (MA)...
Conference Paper
Today's genomics technologies generate more sequence data than ever before possible, and at substantially lower costs, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has held a central role in biological...
Article
Full-text available
Deinococcus bacteria are extremely resistant to radiation, oxidation, and desiccation. Resilience to these factors has been suggested to be due to enhanced damage prevention and repair mechanisms, as well as highly efficient antioxidant protection systems. Here, using mutation-accumulation experiments we find that the GC-rich Deinococcus radioduran...
Article
Full-text available
High levels of genetic diversity exist among natural isolates of the bacterium Pseudomonas fluorescens, and are especially elevated around the replication terminus of the genome, where strain-specific genes are found. In an effort to understand the role of genetic variation in the evolution of Pseudomonas, we analyzed 31,106 base substitutions from...
Article
Programmed DNA rearrangements in the single-celled eukaryote Oxytricha trifallax completely rewire its germline into a somatic nucleus during development. This elaborate, RNA-mediated pathway eliminates noncoding DNA sequences that interrupt gene loci and reorganizes the remaining fragments by inversions and permutations to produce functional genes...
Article
Full-text available
The Paramecium aurelia complex is a group of 15 species that share at least three past whole-genome duplications (WGDs). The macronuclear genome sequences of P. biaurelia and P. sexaurelia are presented and compared to the published sequence of P. tetraurelia. Levels of duplicate-gene retention from the recent WGD differ by >10% across species, wit...
Article
Full-text available
Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequence of Paramecium biaurelia and Paramecium sexaurelia suggests that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequen...
Article
Full-text available
The CRISPR (clustered regularly interspaced short palindromic repeats)–Cas adaptive immune system is an important defense system in bacteria, providing targeted defense against invasions of foreign nucleic acids. CRISPR–Cas systems consist of CRISPR loci and cas (CRISPR-associated) genes: sequence segments of invaders are incorporated into host g...
Article
Full-text available
We present the complete genomic sequence of the essential symbiont Polynucleobacter necessarius (Betaproteobacteria), which is a valuable case study for several reasons. First, it is hosted by a ciliated protist, Euplotes; bacterial symbionts of ciliates are still poorly known because of a lack of extensive molecular data. Second, the single specie...
Conference Paper
Full-text available
The National Center for Genome Analysis Support (NCGAS) is a response to the concern that NSF-funded life scientists were underutilizing the national cyberinfrastructure. NCGAS is a multi-institutional service center that provides computational resources, specialized systems support to both the end-user and systems administrators, and most importan...
Conference Paper
Full-text available
An increasing number of biologists' computational demands have outgrown the capacity of desktop workstations and they are turning to supercomputers to run their simulations and calculations. Many of today's computational problems, however, require larger resource commitments than even individual universities can provide. XSEDE is one of the first p...