A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing.

Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia.
Nucleic Acids Research (Impact Factor: 8.81). 09/2010; 38(17):e171. DOI: 10.1093/nar/gkq667
Source: PubMed

ABSTRACT Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.

  • ABSTRACT: Ascaris suum (large roundworm of pigs) is a parasitic nematode that causes substantial losses to the meat industry. This nematode is suitable for biochemical studies because, unlike C. elegans, homogeneous tissue samples can be obtained by dissection. It has large sperm, produced in great numbers that permit biochemical studies of sperm motility. Widespread study of A. suum would be facilitated by more comprehensive genome resources and, to this end, we have produced a gonad transcriptome of A. suum. Two 454 pyrosequencing runs generated 572,982 and 588,651 reads for germline (TES) and somatic (VAS) tissues of the A. suum gonad, respectively. 86% of the high-quality (HQ) reads were assembled into 9,955 contigs and 69,791 HQ reads remained as singletons. 2.4 million bp of unique sequences were obtained with a coverage that reached 16.1-fold. 4,877 contigs and 14,339 singletons were annotated according to the C. elegans protein and the Kyoto Encyclopedia of Genes and Genomes (KEGG) protein databases. Comparison of TES and VAS transcriptomes demonstrated that genes participating in DNA replication, RNA transcription and ubiquitin-proteasome pathways are expressed at significantly higher levels in TES tissues than in VAS tissues. Comparison of the A. suum TES transcriptome with the C. elegans microarray dataset identified 165 A. suum germline-enriched genes (83% are spermatogenesis-enriched). Many of these genes encode serine/threonine kinases and phosphatases (KPs) as well as tyrosine KPs. Immunoblot analysis further suggested a critical role of phosphorylation in both testis development and spermatogenesis. A total of 2,681 A. suum genes were identified to have associated RNAi phenotypes in C. elegans, the majority of which display embryonic lethality, slow growth, larval arrest or sterility. Using deep sequencing technology, this study has produced a gonad transcriptome of A. suum. By comparison with C. elegans datasets, we identified sets of genes associated with spermatogenesis and gonad development in A. suum. The newly identified genes encoding KPs may help determine signaling pathways that operate during spermatogenesis. A large portion of A. suum gonadal genes have related RNAi phenotypes in C. elegans and, thus, might be RNAi targets for parasite control.
    BMC Genomics 10/2011; 12:481. · 4.40 Impact Factor
  • ABSTRACT: BACKGROUND: Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal host. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. METHODS: In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of Ascaris suum (roundworm), N. americanus and Schistosoma haematobium which infect the mammalian host. RESULTS: A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. CONCLUSIONS: Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate host, with a view towards the development of novel approaches for the control of neglected helminthiases.
    Parasites & Vectors 05/2013; 6(1):156. · 3.25 Impact Factor
  • ABSTRACT: The rapid advancements in recent years of high-throughput technologies in the life sciences are facilitating the generation and storage of huge amounts of data in different databases. Despite significant developments in computing capacity and performance, analysing these large-scale data in search of biomedically relevant patterns remains a challenging task. Scientific workflow applications are deemed to support data-mining in more complex scenarios that include many data sources and computational tools, as commonly found in bioinformatics. A scientific workflow application is a holistic unit that defines, executes, and manages scientific applications using different software tools. Existing workflow applications are process- or data- rather than resource-oriented. Thus, they lack efficient computational resource management capabilities, such as those provided by Cloud computing environments. Insufficient computational resources disrupt the execution of workflow applications, wasting time and money. To address this issue, advanced resource monitoring and management strategies are required to determine the resource consumption behaviours of workflow applications and to enable dynamic allocation and deallocation of resources. In this paper, we present a novel Cloud management infrastructure consisting of resource-level and application-level monitoring techniques, and a knowledge management strategy to manage computational resources for supporting workflow application executions in order to guarantee their performance goals and their successful completion. We present the design description of these techniques, demonstrate how they can be applied to scientific workflow applications, and present detailed evaluation results as a proof of concept.
    Journal of Grid Computing 01/2013; · 1.60 Impact Factor
