Article

A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing.

Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia.
Nucleic Acids Research (Impact Factor: 8.81). 09/2010; 38(17):e171. DOI: 10.1093/nar/gkq667
Source: PubMed

ABSTRACT: Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science. These range from understanding basic cellular function in model organisms, to elucidating the biological events that govern the development and progression of human diseases, to exploring the mechanisms of survival, drug resistance and virulence in pathogens. Next-generation sequencing (NGS) technologies are driving a massive expansion of transcriptomics in all of these fields and are lowering the cost, time and performance barriers posed by conventional approaches. However, the bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no bioinformatic expertise. Here, we constructed a semi-automated bioinformatic workflow system and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility in two ways. First, we explored transcriptomic differences among various developmental stages and between the sexes of an economically important parasitic worm (Oesophagostomum dentatum). Second, we predicted and prioritized essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. The system is now used routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomic data sets from any organism.
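The abstract does not describe the workflow's internals. The Python below is a minimal sketch under stated assumptions. It assumes that per-contig read counts and text annotations (for example, BLAST-style hit descriptions) are already available from the assembly and annotation steps. It then shows one simple way to compare transcription between two samples, such as females and males, and to shortlist contigs annotated as GTPases, protein kinases or phosphatases. The function names, the reads-per-million normalization, the pseudocount, the keyword list and the fold-change threshold are all hypothetical illustrations. They are not the authors' scripts or selection criteria.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical keyword classes, taken from the examples of essential
# molecules named in the abstract; not the authors' actual criteria.
TARGET_CLASSES = ("gtpase", "kinase", "phosphatase")


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts per contig by total library size (reads per million)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {contig: n * 1e6 / total for contig, n in counts.items()}


def log2_fold_change(a: Dict[str, int], b: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """Return log2((RPM_a + pc) / (RPM_b + pc)) for every contig seen in either sample.

    The pseudocount avoids division by zero for contigs absent from one sample.
    """
    rpm_a, rpm_b = reads_per_million(a), reads_per_million(b)
    contigs = set(rpm_a) | set(rpm_b)
    return {
        c: math.log2((rpm_a.get(c, 0.0) + pseudocount) /
                     (rpm_b.get(c, 0.0) + pseudocount))
        for c in contigs
    }


def prioritize(fold: Dict[str, float], annotation: Dict[str, str],
               min_abs_lfc: float = 1.0) -> List[Tuple[str, float, str]]:
    """Keep contigs whose annotation mentions a target class and whose
    |log2 fold change| meets the threshold, sorted by |LFC| (largest first)."""
    hits = []
    for contig, lfc in fold.items():
        desc = annotation.get(contig, "").lower()
        if abs(lfc) >= min_abs_lfc and any(k in desc for k in TARGET_CLASSES):
            hits.append((contig, lfc, annotation[contig]))
    # Ties in |LFC| are broken by contig name so the output is deterministic.
    return sorted(hits, key=lambda h: (-abs(h[1]), h[0]))


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only (not data from the study).
    female = {"c1": 900, "c2": 50, "c3": 50}
    male = {"c1": 100, "c2": 450, "c3": 450}
    annotation = {
        "c1": "serine/threonine protein kinase",
        "c2": "Ras-like GTPase",
        "c3": "collagen",
    }
    for contig, lfc, desc in prioritize(log2_fold_change(female, male), annotation):
        print(f"{contig}\t{lfc:+.2f}\t{desc}")
```

Keyword matching on annotation text is deliberately crude here, and so is a single-sample fold change without replicates. Both choices only keep the sketch self-contained. A real analysis would substitute a statistically grounded test and a curated annotation source.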

