A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia.
Nucleic Acids Research (Impact Factor: 8.81). 09/2010; 38(17):e171. DOI: 10.1093/nar/gkq667
Source: PubMed

ABSTRACT Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms to elucidating the biological events that govern the development and progression of human diseases and exploring the mechanisms of survival, drug resistance and virulence in pathogens. Next-generation sequencing (NGS) technologies are driving a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated bioinformatic workflow system and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for exploring transcriptomic differences among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum), and for predicting and prioritizing essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. The system is now used routinely to analyse data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomic data sets from any organism.
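The abstract does not describe the internals of the scripts. As a rough illustration of what "semi-automated" chaining of stages can mean, the Python sketch below (an assumption, not the authors' Perl/Python/shell code) passes a shared context through ordered stages and checks each stage's declared inputs before running it. The `Stage` and `run_workflow` names, the stage names `quality_filter`, `collapse` and `annotate`, and the toy logic inside them are all hypothetical stand-ins for real read filtering, assembly and annotation tools.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Stage:
    """One workflow step: a name, a function, and the context keys it needs."""
    name: str
    run: Callable[[Context], Context]
    requires: List[str] = field(default_factory=list)


def run_workflow(stages: List[Stage], context: Context) -> List[str]:
    """Run stages in order, merging each stage's outputs into the shared context.

    Raises KeyError before running a stage whose declared inputs are missing.
    Returns the names of the stages that ran, in order.
    """
    completed = []
    for stage in stages:
        missing = [key for key in stage.requires if key not in context]
        if missing:
            raise KeyError(f"stage {stage.name!r} is missing inputs: {missing}")
        context.update(stage.run(context))
        completed.append(stage.name)
    return completed


# Hypothetical toy stages standing in for real tools.
def quality_filter(ctx: Context) -> Context:
    # Keep reads at least min_len bases long.
    return {"filtered": [r for r in ctx["reads"] if len(r) >= ctx["min_len"]]}


def collapse(ctx: Context) -> Context:
    # Toy "assembly": merge identical reads and count how often each occurs.
    return {"clusters": dict(Counter(ctx["filtered"]))}


def annotate(ctx: Context) -> Context:
    # Look each cluster up in a toy annotation table.
    db = ctx["db"]
    return {"annotations": {seq: db.get(seq, "unannotated") for seq in ctx["clusters"]}}


STAGES = [
    Stage("quality_filter", quality_filter, ["reads", "min_len"]),
    Stage("collapse", collapse, ["filtered"]),
    Stage("annotate", annotate, ["clusters", "db"]),
]

if __name__ == "__main__":
    ctx = {"reads": ["ACGTACGT", "ACG", "ACGTACGT", "TTGGCCAA"], "min_len": 5,
           "db": {"ACGTACGT": "GTPase"}}
    print(run_workflow(STAGES, ctx))
    print(ctx["annotations"])
```

In a real workflow each stage would more likely wrap an external program (for example, through `subprocess.run`), but the pattern of declaring inputs and failing early when one is missing stays the same.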

  • ABSTRACT: MicroRNAs (miRNAs) are small molecules that aid in post-transcriptional silencing, and their discovery and characterization have benefited vastly from the recent development and widespread application of next-generation sequencing (NGS) technologies. Several miRNAs were identified through sequencing of constructed small RNA libraries, whereas others were predicted by in silico methods using the recently accumulating sequence data. NGS was a major breakthrough in efforts to sequence and dissect the genomes of plants, including bread wheat and its progenitors, which have large, repetitive and complex genomes. The availability of survey sequences of the wheat whole genome and its individual chromosomes has enabled researchers to predict and assess wheat miRNAs at both the subgenomic and whole-genome levels. Moreover, studies based on small RNA library construction and sequencing identified several putative development- and stress-related wheat miRNAs, revealing their differential expression patterns in specific developmental stages and/or in response to stress conditions. With the vast number of wheat miRNAs identified in recent years, we are approaching a comprehensive picture of the wheat miRNA repertoire. In the following years, more comprehensive research on miRNA conservation or divergence across wheat and its close relatives or progenitors should be performed. The results may prove valuable for understanding the significant roles of species-specific miRNAs and may also provide information on the dynamics between miRNAs and evolution in wheat. Furthermore, the putative development- or stress-related miRNAs identified should be subjected to further functional analysis, which may be valuable in efforts to develop wheat with better resistance and/or yield.
    Briefings in functional genomics 06/2014; DOI:10.1093/bfgp/elu021 · 3.43 Impact Factor
  • ABSTRACT: Nitazoxanide (NTZ) is an anti-protozoal drug that has recently been repurposed as an anti-viral drug against hepatitis C virus (HCV) and some other viral infections. The mode of action of NTZ against HCV is not well understood. NTZ causes hyperphosphorylation of the viral non-structural protein 5A (NS5A), which in turn interferes with viral replication. However, the protein kinase that mediates this process is unknown. The aim of this work is to determine the cellular protein kinase target of NTZ that mediates its anti-HCV effect. The chemical structures of NTZ and its active metabolite tizoxanide (TIZ) were analyzed using the similarity ensemble approach (SEA) and a pharmacophore mapping approach. Accordingly, four protein kinases were identified as possible NTZ targets. Further investigation by molecular docking and computational affinity assessment revealed glycogen synthase kinase 3β (GSK3β) as the protein kinase most likely to mediate the anti-HCV effect of NTZ. This result does not exclude the possibility that the other three predicted kinases are also cellular targets of NTZ.
    11/2014, Degree: Master, Supervisors: Alaa Hemeida, Amal Mahmoud, Medhat Hashem
  • ABSTRACT: To explore gonad-specific gene transcription in the red abalone Haliotis rufescens, cDNA from mature reproductive tissues was 454-pyrosequenced. A total of 79 877 and 133 850 high-quality reads were generated for females and males, respectively, with an average length of 600 bp. Clustering and assembly of these reads produced a non-redundant set of unique sequences, comprising 2793 and 10 354 contigs, 8581 and 32 175 singletons, respectively, for males and females. In silico gene transcription analysis comparing the sexes showed that 20% of the differentially expressed transcripts are involved in sex-specific patterns. Gene ontology analysis revealed a higher percentage of metabolic processes associated with females, whereas binding processes and biological regulation were mainly related to male transcriptomes. Single nucleotide polymorphisms (SNPs) associated with sex-related genes, such as lysin (SNP102), PF (SNP1254) and VTG (SNP876), were discovered and validated through high-resolution melting analysis. This study generated relevant genomic sequence data that might contribute to a better understanding of the various reproductive biological processes occurring in abalone. Once the underlying biological processes are understood, biotechnological methods to control maturation, identify sex and produce monosex lines for abalone aquaculture can be envisioned.
    Aquaculture Research 10/2012; 45(6). DOI:10.1111/are.12044 · 1.32 Impact Factor
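The abalone abstract does not say how its in silico comparison of the sexes was computed. One common, simple screen for digital expression data is a library-size-normalized log2 fold change per contig; the sketch below assumes that approach, adds a pseudocount (assumed to be 0.5) so that contigs absent from one library do not cause division by zero, and uses the female and male read totals quoted in that abstract (79 877 and 133 850) as library sizes. The per-contig counts and the function names `log2_fold_change` and `differential_contigs` are hypothetical, and a fold-change cutoff is only a screen, not a statistical test.

```python
import math
from typing import Dict


def log2_fold_change(count_a: int, size_a: int, count_b: int, size_b: int,
                     pseudocount: float = 0.5) -> float:
    """log2 of (normalized abundance in library A) / (normalized abundance in library B)."""
    rate_a = (count_a + pseudocount) / size_a
    rate_b = (count_b + pseudocount) / size_b
    return math.log2(rate_a / rate_b)


def differential_contigs(counts_a: Dict[str, int], size_a: int,
                         counts_b: Dict[str, int], size_b: int,
                         min_abs_lfc: float = 1.0) -> Dict[str, float]:
    """Return contigs whose |log2 fold change| meets the cutoff.

    Contigs missing from one library are treated as having zero reads there.
    Positive values mean higher relative abundance in library A.
    """
    result = {}
    for contig in set(counts_a) | set(counts_b):
        lfc = log2_fold_change(counts_a.get(contig, 0), size_a,
                               counts_b.get(contig, 0), size_b)
        if abs(lfc) >= min_abs_lfc:
            result[contig] = lfc
    return result


if __name__ == "__main__":
    FEMALE_READS, MALE_READS = 79_877, 133_850  # library sizes from the abstract
    female = {"contig_1": 120, "contig_2": 15, "contig_3": 0}  # hypothetical counts
    male = {"contig_1": 40, "contig_2": 30, "contig_3": 25}
    hits = differential_contigs(female, FEMALE_READS, male, MALE_READS)
    for contig, lfc in sorted(hits.items()):
        print(f"{contig}\t{lfc:+.2f}")
```

Normalizing by library size matters here because the male library is roughly 1.7 times larger, so raw counts alone would bias the comparison toward male-enriched contigs.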
