Detecting distant homologies on protozoans metabolic pathways using scientific workflows

Computer Science Department, COPPE, Federal University of Rio de Janeiro (UFRJ), P.O. Box 68511, 21941-972 Rio de Janeiro, RJ, Brazil.
International Journal of Data Mining and Bioinformatics (Impact Factor: 0.66). 01/2010; 4(3):256-80. DOI: 10.1504/IJDMB.2010.033520
Source: PubMed

ABSTRACT Bioinformatics experiments are typically composed of pipelines of programs that manipulate enormous quantities of data. Workflow management systems (WfMS) offer an interesting approach to managing such experiments. In this work we discuss the WfMS features needed to support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We present a case study detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges for WfMS in distributed environments.

  • ABSTRACT: The partial nucleotide sequence of a putative Trypanosoma brucei rhodesiense oligosaccharyl transferase gene was previously reported. Here, we describe the determination of its full-length nucleotide sequence by Inverse PCR (IPCR), followed by biological sequence analysis and transmembrane topology modelling. The full-length DNA sequence has an Open Reading Frame (ORF) of 2406 bp and encodes a polypeptide of 801 amino acid residues. Protein and DNA sequence analyses revealed that homologues exist in the genomes of other kinetoplastids and of organisms of various other origins. Protein topology analysis predicted that Trypanosoma brucei rhodesiense putative oligosaccharyl transferase clone II (TbOST II) is a transmembrane protein whose transmembrane helices are probably in an N(cytosol)-C(cytosol) orientation. Data from the GenBank database assembly and the sequence analyses in general clearly indicate that TbOST II is the STT3 subunit of OST in T. b. rhodesiense, which requires further characterisation and functional studies with RNAi. The TbOST II sequence has been deposited in GenBank (accession number GU245937).
    International Journal of Data Mining and Bioinformatics 01/2011; 5(5):574-92. DOI:10.1504/IJDMB.2011.043035 · 0.66 Impact Factor
  • ABSTRACT: A key focus in 21st century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods to identify orthologous groups in pathogenic organisms, with a view to evaluating similarities and differences among species and thus allowing the transfer of annotation from known/curated proteins to new/non-annotated ones. Here we used a sensitive profile-based methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was applied to five protozoan proteomes, showing that 3901 and 7473 orthologs can be identified by comparison with the COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) using COG, 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis) and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones); (ii) using KOG, 21.45% (151/704) belong to category J and 13.92% (98/704) to category O. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
    Omics A Journal of Integrative Biology 06/2014; DOI:10.1089/omi.2013.0172 · 2.73 Impact Factor
  • ABSTRACT: Nowadays e-Science experiments require computational and storage resources that most research centres cannot afford. Fortunately, researchers have at their disposal many distributed computing platforms on which to run their experiments: clusters, supercomputers, grids and clouds. None of these platforms has proved to be the ideal choice; each has its own advantages and disadvantages, depending on various factors. However, the use of these powerful systems poses a challenge for scientists who do not have a computer science background. For that reason, this paper describes the design and implementation of an intuitive workflow system capable of executing any kind of application on a mix of computing platforms. Moreover, a bioinformatics use case called Ortosearch that will exploit the workflow system is detailed.
    8th Iberian Grid Infrastructure Conference: IBERGRID 2014, Aveiro; 09/2014