Detecting distant homologies on protozoans metabolic pathways using scientific workflows

Computer Science Department, COPPE, Federal University of Rio de Janeiro (UFRJ), P.O. Box 68511, 21941-972 Rio de Janeiro, RJ, Brazil.
International Journal of Data Mining and Bioinformatics (Impact Factor: 0.5). 06/2010; 4(3):256-80. DOI: 10.1504/IJDMB.2010.033520
Source: PubMed


Bioinformatics experiments are typically composed of programs arranged in pipelines that manipulate enormous quantities of data. Workflow management systems (WfMS) offer an interesting approach to managing those experiments. In this work we discuss WfMS features that support genome homology workflows and present some relevant issues for typical genomic experiments. Our evaluation used the Kepler WfMS to manage a real genomic pipeline, named OrthoSearch, originally defined as a Perl script. We present a case study detecting distant homologies in trypanosomatid metabolic pathways. Our results reinforce the benefits of WfMS over scripting languages and point out challenges to WfMS in distributed environments.

  • ABSTRACT: The partial nucleotide sequence of putative Trypanosoma brucei rhodesiense oligosaccharyl transferase gene was previously reported. Here, we describe the determination of its full-length nucleotide sequence by Inverse PCR (IPCR), subsequent biological sequence analysis and transmembrane topology modelling. The full-length DNA sequence has an Open Reading Frame (ORF) of 2406 bp and encodes a polypeptide of 801 amino acid residues. Protein and DNA sequence analyses revealed that homologues within the genome of other kinetoplastid and various origins exist. Protein topology analysis predicted that Trypanosoma brucei rhodesiense putative oligosaccharyl transferase clone II (TbOST II) is a transmembrane protein with transmembrane helices in probably an N(cytosol)-C(cytosol) orientation. Data from the GenBank database assembly and sequence analyses in general clearly state that TbOST II is the STT3 subunit of OST in T.b. rhodesiense that necessitates further characterisation and functional studies with RNAi. TbOST II sequence had been deposited in the GenBank (accession number GU245937).
    International Journal of Data Mining and Bioinformatics 10/2011; 5(5):574-92. DOI:10.1504/IJDMB.2011.043035 · 0.50 Impact Factor
  • ABSTRACT: Scientific Workflows are abstractions used to model in silico scientific experiments. Cloud environments are still incipient in collecting and recording prospective and retrospective provenance. This paper presents an approach to support collecting metadata provenance of in silico scientific experiments executed in public clouds. The strategy was implemented as a distributed and modular architecture named Matriohska. This paper also presents a provenance data model compatible with PROV specification. We also show preliminary results that describe how provenance metadata was captured from the components running in the cloud.
  • ABSTRACT: A key focus in 21st-century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluate similarities and differences among species, and thus allow the transfer of annotation from known/curated proteins to new/non-annotated ones. We used here a profile-based sensitive methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was used in five protozoan proteomes showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) 31.58% (132/418) belong to category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to category O (post-translational modification, protein turnover, chaperones) using COG; (ii) 21.45% (151/704) belong to category J, and 13.92% (98/704) to category O using KOG. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself to applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
    Omics A Journal of Integrative Biology 06/2014; 18(8). DOI:10.1089/omi.2013.0172 · 2.36 Impact Factor