ABSTRACT: Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs ("vFams") to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format. We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).
PLoS ONE 08/2014; 9(8):e105067. · 3.53 Impact Factor
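The position-specific scoring idea mentioned in the vFams abstract above can be illustrated with a deliberately simplified sketch. It is not HMMER3 and not the authors' pipeline: it builds a toy per-column log-odds profile from an ungapped alignment, with no insert or delete states, and the sequences, alphabet, pseudocount, and function names (`build_profile`, `score`) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def build_profile(aligned, alphabet, pseudocount=1.0):
    """Build per-column log-odds scores from an ungapped alignment (toy profile).

    Assumes a uniform background distribution over `alphabet`.
    """
    background = 1.0 / len(alphabet)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return profile


def score(profile, seq):
    """Sum the column-specific log-odds scores for a sequence of profile length."""
    return sum(column[residue] for column, residue in zip(profile, seq))


# Hypothetical four-sequence "family" over a reduced five-letter alphabet.
family = ["ACDA", "ACEA", "ACDA", "GCDA"]
profile = build_profile(family, alphabet="ACDEG")

member_like = score(profile, "ACEA")  # resembles the family column by column
unrelated = score(profile, "EGGE")    # uses residues rare at every column
print(member_like > unrelated)
```

Because each column carries its own scores, a residue that is tolerated at one position can still be penalized at another, which is the kind of information a single substitution matrix does not capture.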
ABSTRACT: Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
ABSTRACT: A 14-year-old boy with severe combined immunodeficiency presented three times to a medical facility over a period of 4 months with fever and headache that progressed to hydrocephalus and status epilepticus necessitating a medically induced coma. Diagnostic workup including brain biopsy was unrevealing. Unbiased next-generation sequencing of the cerebrospinal fluid identified 475 of 3,063,784 sequence reads (0.016%) corresponding to leptospira infection. Clinical assays for leptospirosis were negative. Targeted antimicrobial agents were administered, and the patient was discharged home 32 days later with a status close to his premorbid condition. Polymerase-chain-reaction (PCR) and serologic testing at the Centers for Disease Control and Prevention (CDC) subsequently confirmed evidence of Leptospira santarosai infection.
New England Journal of Medicine 06/2014; · 54.42 Impact Factor
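The read fraction quoted in the abstract above (475 of 3,063,784 sequence reads, 0.016%) can be checked with a short calculation. The helper name `read_fraction_percent` is hypothetical and is not part of any pipeline described here.

```python
def read_fraction_percent(hits: int, total: int) -> float:
    """Return the percentage of reads assigned to a target organism."""
    return 100.0 * hits / total


# Values taken from the abstract: 475 matching reads out of 3,063,784.
pct = read_fraction_percent(475, 3_063_784)
print(f"{pct:.3f}%")  # prints 0.016%
```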
ABSTRACT: The duplication of transcription regulators can elicit major regulatory network rearrangements over evolutionary timescales. However, few examples of duplications resulting in gene network expansions are understood in molecular detail. Here we show that four Candida albicans transcription regulators that arose by successive duplications have differentiated from one another by acquiring different intrinsic DNA-binding specificities, different preferences for half-site spacing, and different associations with cofactors. The combination of these three mechanisms resulted in each of the four regulators controlling a distinct set of target genes, which likely contributed to the adaptation of this fungus to its human host. Our results illustrate how successive duplications and diversification of an ancestral transcription regulator can underlie major changes in an organism's regulatory circuitry.
ABSTRACT: Ion channel gene expression can vary substantially among neurons of a given type, even though neuron-type-specific firing properties remain stable and reproducible. The mechanisms that modulate ion channel gene expression and stabilize neural firing properties are unknown. In Drosophila, we demonstrate that loss of the Shal potassium channel induces the compensatory rebalancing of ion channel expression including, but not limited to, the enhanced expression and function of Shaker and slowpoke. Using genomic and network modeling approaches combined with genetic and electrophysiological assays, we demonstrate that the transcription factor Krüppel is necessary for the homeostatic modulation of Shaker and slowpoke expression. Remarkably, Krüppel induction is specific to the loss of Shal, not being observed in five other potassium channel mutants that cause enhanced neuronal excitability. Thus, homeostatic signaling systems responsible for rebalancing ion channel expression can be selectively induced after the loss or impairment of a specific ion channel.
ABSTRACT: Morphogenesis and pattern formation are vital processes in any organism, whether unicellular or multicellular. But in contrast to the developmental biology of plants and animals, the principles of morphogenesis and pattern formation in single cells remain largely unknown. Although all cells develop patterns, they are most obvious in ciliates; hence, we have turned to a classical unicellular model system, the giant ciliate Stentor coeruleus. Here we show that the RNA interference (RNAi) machinery is conserved in Stentor. Using RNAi, we identify the kinase coactivator Mob1, with conserved functions in cell division and morphogenesis from plants to humans, as an asymmetrically localized patterning protein required for global patterning during development and regeneration in Stentor. Our studies reopen the door for Stentor as a model regeneration system.
ABSTRACT: This report describes three possibly related incidences of encephalitis, two of them lethal, in captive polar bears (Ursus maritimus). Standard diagnostic methods failed to identify pathogens in any of these cases. A comprehensive, three-stage diagnostic ‘pipeline’, employing both standard serological methods and new DNA microarray and next-generation sequencing-based diagnostics, was developed, in part as a consequence of this initial failure. This pipeline approach illustrates the strengths, weaknesses and limitations of these tools in determining pathogen-caused deaths in non-model organisms such as wildlife species, and why the use of a limited number of diagnostic tools may fail to uncover important wildlife pathogens.
Journal of Comparative Pathology 05/2014; 150(4):474–488. · 1.38 Impact Factor
ABSTRACT: Since 2006, honey bee colonies in North America and Europe have experienced increased annual mortality. These losses correlate with increased pathogen incidence and abundance, though no single etiologic agent has been identified. Crithidia mellificae is a unicellular eukaryotic honey bee parasite that has been associated with colony losses in the USA and Belgium. C. mellificae is a member of the family Trypanosomatidae, which primarily includes other insect-infecting species (e.g., the bumble bee pathogen Crithidia bombi), as well as species that infect both invertebrate and vertebrate hosts including human pathogens (e.g., Trypanosoma cruzi, T. brucei, and Leishmania spp.). To better characterize C. mellificae, we sequenced the genome and transcriptome of strain SF, which was isolated and cultured in 2010. The 32 megabase draft genome, presented herein, shares a high degree of conservation with the related species Leishmania major. We estimate that C. mellificae encodes over 8,300 genes, the majority of which are orthologs of genes encoded by L. major and other Leishmania or Trypanosoma species. Genes unique to C. mellificae, including those of possible bacterial origin, were annotated based on function and include genes putatively involved in carbohydrate metabolism. This draft genome will facilitate additional investigations of the impact of C. mellificae infection on honey bee health and provide insight into the evolution of this unique family.
PLoS ONE 01/2014; 9(4):e95057. · 3.53 Impact Factor
ABSTRACT: Bornaviruses are known to infect mammals and birds, and they have been associated with disease in both groups of animals. Here, we report the genome sequence of a bornavirus identified in a wild-caught Loveridge's garter snake (Elapsoidea loveridgei).
ABSTRACT: A severe, sometimes fatal respiratory disease has been observed in captive ball pythons (Python regius) since the late 1990s. In order to better understand this disease and its etiology, we collected case and control samples and performed pathological and diagnostic analyses. Electron micrographs revealed filamentous virus-like particles in lung epithelial cells of sick animals. Diagnostic testing for known pathogens did not identify an etiologic agent, so unbiased metagenomic sequencing was performed. Abundant nidovirus-like sequences were identified in cases and were used to assemble the genome of a previously unknown virus in the order Nidovirales. The nidoviruses, which were not previously known to infect nonavian reptiles, are a diverse order that includes important human and veterinary pathogens. The presence of the viral RNA was confirmed in all diseased animals (n = 8) but was not detected in healthy pythons or other snakes (n = 57). Viral RNA levels were generally highest in the lung and other respiratory tract tissues. The 33.5-kb viral genome is the largest RNA genome yet described and shares canonical characteristics with other nidovirus genomes, although several features distinguish this from related viruses. This virus, which we named ball python nidovirus (BPNV), will likely establish a new genus in the Torovirinae subfamily. The identification of a novel nidovirus in reptiles contributes to our understanding of the biology and evolution of related viruses, and its association with lung disease in pythons is a promising step toward elucidating an etiology for this long-standing veterinary disease.
ABSTRACT: Fusion of the viral and host cell membranes is a necessary first step for infection by enveloped viruses, and is mediated by the envelope glycoprotein. The transmembrane subunits from the structurally defined "class I" glycoproteins adopt an α-helical "trimer-of-hairpins" conformation during the fusion pathway. Here we present our studies on the envelope glycoprotein transmembrane subunit, GP2, of the CAS virus (CASV). CASV was recently identified from annulated tree boas (Corallus annulatus) with inclusion body disease and is implicated in the disease etiology. We have generated and characterized two protein constructs consisting of the predicted CASV GP2 core domain. The crystal structure of the CASV GP2 post-fusion conformation indicates a trimeric α-helical bundle that is highly similar to those of Ebola Virus (EBOV) and Marburg Virus (MARV) GP2, despite CASV genome homology to arenaviruses. Denaturation studies demonstrate that the stability of CASV GP2 is pH-dependent with higher stability at lower pH; we propose that this behavior is due to a network of interactions among acidic residues that would destabilize the α-helical bundle under conditions where the side chains are deprotonated. The pH-dependent stability of the post-fusion structure has been observed in EBOV and MARV GP2, as well as other viruses that enter via the endosome. Infection experiments with CASV and the related Golden Gate Virus (GGV) support a mechanism of entry that requires endosomal acidification. Our results suggest that despite being primarily arenavirus-like, the transmembrane subunit of CASV is extremely similar to the filoviruses.
Journal of Molecular Biology 12/2013; · 3.91 Impact Factor
ABSTRACT: The human fungal pathogen Candida albicans can switch between two phenotypic cell types, termed "white" and "opaque." Both cell types are heritable for many generations, and the switch between the two types occurs epigenetically, that is, without a change in the primary DNA sequence of the genome. Previous work identified six key transcriptional regulators important for white-opaque switching: Wor1, Wor2, Wor3, Czf1, Efg1, and Ahr1. In this work, we describe the structure of the transcriptional network that specifies the white and opaque cell types and governs the ability to switch between them. In particular, we use a combination of genome-wide chromatin immunoprecipitation, gene expression profiling, and microfluidics-based DNA binding experiments to determine the direct and indirect regulatory interactions that form the switch network. The six regulators are arranged together in a complex, interlocking network with many seemingly redundant and overlapping connections. We propose that the structure (or topology) of this network is responsible for the epigenetic maintenance of the white and opaque states, the switching between them, and the specialized properties of each state.
ABSTRACT: Malaria drug resistance contributes to up to a million annual deaths. Judicious deployment of new antimalarials and vaccines could benefit from an understanding of early molecular events that promote the evolution of parasites. Continuous in vitro challenge of Plasmodium falciparum parasites with a novel dihydroorotate dehydrogenase (DHODH) inhibitor reproducibly selected for resistant parasites. Genome-wide analysis of independently-derived resistant clones revealed a two-step strategy to evolutionary success. Some haploid blood-stage parasites first survive antimalarial pressure through fortuitous DNA duplications that always included the DHODH gene. Independently-selected parasites had different sized amplification units but they were always flanked by distant A/T tracks. Higher level amplification and resistance was attained using a second, more efficient and more accurate, mechanism for head-to-tail expansion of the founder unit. This second homology-based process could faithfully tune DNA copy numbers in either direction, always retaining the unique DNA amplification sequence from the original A/T-mediated duplication for that parasite line. Pseudo-polyploidy at relevant genomic loci sets the stage for gaining additional mutations at the locus of interest. Overall, we reveal a population-based genomic strategy for mutagenesis that operates in human stages of P. falciparum to efficiently yield resistance-causing genetic changes at the correct locus in a successful parasite. Importantly, these founding events arise with precision; no other new amplifications are seen in the resistant haploid blood stage parasite. This minimizes the need for meiotic genetic cleansing that can only occur in sexual stage development of the parasite in mosquitoes.
ABSTRACT: The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Nucleic Acids Research 04/2013; · 8.81 Impact Factor
ABSTRACT: Sequence-specific DNA-binding proteins are among the most important classes of gene regulatory proteins, controlling changes in transcription that underlie many aspects of biology. In this work, we identify a transcriptional regulator from the human fungal pathogen Candida albicans that binds DNA specifically but has no detectable homology with any previously described DNA- or RNA-binding protein. This protein, named White-Opaque Regulator 3 (Wor3), regulates white-opaque switching, the ability of C. albicans to switch between two heritable cell types. We demonstrate that ectopic overexpression of WOR3 results in mass conversion of white cells to opaque cells and that deletion of WOR3 affects the stability of opaque cells at physiological temperatures. Genome-wide chromatin immunoprecipitation of Wor3 and gene expression profiling of a wor3 deletion mutant strain indicate that Wor3 is highly integrated into the previously described circuit regulating white-opaque switching and that it controls a subset of the opaque transcriptional program. We show by biochemical, genetic, and microfluidic experiments that Wor3 binds directly to DNA in a sequence-specific manner, and we identify the set of cis-regulatory sequences recognized by Wor3. Bioinformatic analyses indicate that the Wor3 family arose more recently in evolutionary time than most previously described DNA-binding domains; it is restricted to a small number of fungi that include the major fungal pathogens of humans. These observations show that new families of sequence-specific DNA-binding proteins may be restricted to small clades and suggest that current annotations, which rely on deep conservation, underestimate the fraction of genes coding for transcriptional regulators.
Proceedings of the National Academy of Sciences 04/2013; · 9.81 Impact Factor
ABSTRACT: Low-cost DNA sequencing technologies have expanded the role for direct nucleic acid sequencing in the analysis of genomes, transcriptomes, and the metagenomes of whole ecosystems. Human and machine comprehension of such large datasets can be simplified via synthesis of sequence fragments into long, contiguous blocks of sequence (contigs), but most of the progress in the field of assembly has focused on genomes in isolation rather than metagenomes. Here, we present software for paired-read iterative contig extension (PRICE), a strategy for focused assembly of particular nucleic acid species using complex metagenomic data as input. We describe the assembly strategy implemented by PRICE and provide examples of its application to the sequence of particular genes, transcripts, and virus genomes from complex multi-component datasets, including an assembly of the BCBL-1 strain of Kaposi's sarcoma-associated herpesvirus. PRICE is open-source and available for free download (derisilab.ucsf.edu/software/price/ or sourceforge.net/projects/pricedenovo/).
ABSTRACT: The bacterial pathogen Bartonella quintana is passed between humans by body lice. B. quintana has adapted to both the human host and body louse vector niches, producing persistent infection with high titer bacterial loads in both the host (up to 10^5 colony-forming units [CFU]/ml) and vector (more than 10^8 CFU/ml). Using a novel custom microarray platform, we analyzed bacterial transcription at temperatures corresponding to the host (37°C) and vector (28°C), to probe for temperature-specific and growth phase-specific transcriptomes. We observed that transcription of 7% (93 genes) of the B. quintana genome is modified in response to change in growth phase, and that 5% (68 genes) of the genome is temperature-responsive. Among these transcriptional changes in response to temperature shift and growth phase was the induction of known B. quintana virulence genes and several previously unannotated genes. Hemin binding proteins, secretion systems, response regulators, and genes for invasion and cell attachment were prominent among the differentially-regulated B. quintana genes. This study represents the first analysis of global transcriptional responses by B. quintana. In addition, the in vivo experiments provide novel insight into the B. quintana transcriptional program within the body louse environment. These data and approaches will facilitate study of the adaptation mechanisms employed by Bartonella during the transition between human host and arthropod vector.
ABSTRACT: The control and prevention of communicable disease is directly impacted by the genetic mutability of the underlying etiological agents. In the case of RNA viruses, genetic recombination may impact public health by facilitating the generation of new viral strains with altered phenotypes and by compromising the genetic stability of live attenuated vaccines. The landscape of homologous recombination within a given RNA viral genome is thought to be influenced by several factors; however, a complete understanding of the genetic determinants of recombination is lacking. Here, we utilize gene synthesis and deep sequencing to create a detailed recombination map of the poliovirus 1 coding region. We identified over 50 thousand breakpoints throughout the genome, and we show the majority of breakpoints to be concentrated in a small number of specific "hotspots," including those associated with known or predicted RNA secondary structures. Nucleotide base composition was also found to be associated with recombination frequency, suggesting that recombination is modulated across the genome by predictable and alterable motifs. We tested the predictive utility of the nucleotide base composition association by generating an artificial hotspot in the poliovirus genome. Our results imply that modification of these motifs could be extended to whole genome re-designs for the development of recombination-deficient, genetically stable live vaccine strains.
ABSTRACT: BACKGROUND: Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. RESULTS: Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. CONCLUSIONS: While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.
ABSTRACT: Despite wide sequence divergence, multiple picornaviruses use the Golgi adaptor acyl coenzyme A (acyl-CoA) binding domain protein 3 (ACBD3/GCP60) to recruit phosphatidylinositol 4-kinase class III beta (PI4KIIIβ/PI4KB), a factor required for viral replication. The molecular basis of this convergent interaction and the cellular function of ACBD3 are not fully understood. Using affinity purification-mass spectrometry, we identified the putative Rab33 GTPase-activating proteins TBC1D22A and TBC1D22B as ACBD3-interacting factors. Fine-scale mapping of binding determinants within ACBD3 revealed that the interaction domains for TBC1D22A/B and PI4KB are identical. Affinity purification confirmed that PI4KB and TBC1D22A/B interactions with ACBD3 are mutually exclusive, suggesting a possible regulatory mechanism for recruitment of PI4KB. The C-terminal Golgi dynamics (GOLD) domain of ACBD3 has been previously shown to bind the 3A replication protein from Aichi virus. We find that the 3A proteins from several additional picornaviruses, including hepatitis A virus, human parechovirus 1, and human klassevirus, demonstrate an interaction with ACBD3 by mammalian two-hybrid assay; however, we also find that the enterovirus and kobuvirus 3A interactions with ACBD3 are functionally distinct with respect to TBC1D22A/B and PI4KB recruitment. These data reinforce the notion that ACBD3 organizes numerous cellular functionalities and that RNA virus replication proteins likely modulate these interactions by more than one mechanism. IMPORTANCE: Multiple viruses use the same Golgi protein (ACBD3) to recruit the lipid kinase phosphatidylinositol 4-kinase class III beta (PI4KB) in order to replicate. We identify a new binding partner of ACBD3 in the evolutionarily conserved Rab GTPase-activating proteins (RabGAPs) TBC1D22A and -B. Interestingly, TBC1D22A directly competes with PI4KB for binding to the same location of ACBD3 by utilizing a similar binding domain. Different viruses are able to influence this interaction through distinct mechanisms to promote the association of PI4KB with ACBD3. This work informs our knowledge of both the physical interactions of the proteins that help maintain metazoan Golgi structure and how viruses subvert these evolutionarily conserved interactions for their own purposes.