Oswaldo Trelles

Universidad de Málaga · computer architecture

Publications

  • The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications.

    Toshiaki Katayama, Mark D Wilkinson, Rutger Vos, Takeshi Kawashima, Shuichi Kawashima, Mitsuteru Nakao, Yasunori Yamamoto, Hong-Woo Chun, Atsuko Yamaguchi, Shin Kawano, [......], Martin Senger, Jessica Severin, Yasumasa Shigemoto, Hideaki Sugawara, James Taylor, Oswaldo Trelles, Chisato Yamasaki, Riu Yamashita, Noriyuki Satoh, Toshihisa Takagi

    Journal of biomedical semantics. 08/2011; 2:4.

    ABSTRACT: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency.
    We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.
  • Big data, but are we ready?
    Impact points: 27.82

    Oswaldo Trelles, Pjotr Prins, Marc Snir, Ritsert C Jansen

    Nature reviews. Genetics. 03/2011; 12(3):224.

  • Using Graphics Processors for a High Performance Normalization of Gene Expressions.

    Andrés Rodríguez, Oswaldo Trelles, Manuel Ujaldon

    13th IEEE International Conference on High Performance Computing & Communication, HPCC 2011, Banff, Alberta, Canada, September 2-4, 2011; 01/2011

  • MAPI: towards the integrated exploitation of bioinformatics Web Services.
    Impact points: 3.43

    Sergio Ramirez, Johan Karlsson, Oswaldo Trelles

    BMC bioinformatics. 01/2011; 12:419.

    Bioinformatics is commonly featured as a well assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools, their dispersion and heterogeneity complicate the integrated exploitation of such data processing capacity. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the necessary functionality for uniform representation of Web Services metadata descriptors, including their management and the invocation protocols of the services which they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the modular organization of the functionality into different modules associated with specific tasks. This means that only the modules needed for the client have to be installed, and that the module functionality can be extended without the need for re-writing the software client. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation with advanced features such as workflow composition and asynchronous service calls to multiple types of Web Services including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
  • MOWServ: a web client for integration of bioinformatic resources.
    Impact points: 7.48

    Sergio Ramírez, Antonio Muñoz-Mérida, Johan Karlsson, Maximiliano García, Antonio J Pérez-Pulido, M Gonzalo Claros, Oswaldo Trelles

    Nucleic acids research. 07/2010; 38(Web Server issue):W671-6.

    The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user's tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/.
  • jORCA: easily integrating bioinformatics Web Services.
    Impact points: 4.93

    Victoria Martín-Requena, Javier Ríos, Maximiliano García, Sergio Ramírez, Oswaldo Trelles

    Bioinformatics (Oxford, England). 02/2010; 26(4):553-9.

    MOTIVATION: Web services technology is becoming the option of choice to deploy bioinformatics tools that are universally available. One of the major strengths of this approach is that it supports machine-to-machine interoperability over a network. However, a weakness of this approach is that various Web Services differ in their definition and invocation protocols, as well as their communication and data formats, and this presents a barrier to service interoperability. RESULTS: jORCA is a desktop client aimed at facilitating seamless integration of Web Services. It does so by making a uniform representation of the different web resources, supporting scalable service discovery, and automatic composition of workflows. Usability is at the top of the jORCA agenda; thus it is a highly customizable and extensible application that accommodates a broad range of user skills featuring double-click invocation of services in conjunction with advanced execution-control, on the fly data standardization, extensibility of viewer plug-ins, drag-and-drop editing capabilities, plus a file-based browsing style and organization of favourite tools. The integration of bioinformatics Web Services is made easier to support a wider range of users.
  • Workflow Composition and Enactment Using jORCA.

    Johan Karlsson, Victoria Martin-Requena, Javier Ríos, Oswaldo Trelles

    Leveraging Applications of Formal Methods, Verification, and Validation - 4th International Symposium on Leveraging Applications, ISoLA 2010, Heraklion, Crete, Greece, October 18-21, 2010, Proceedings, Part I; 01/2010

  • The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*.

    Toshiaki Katayama, Kazuharu Arakawa, Mitsuteru Nakao, Keiichiro Ono, Kiyoko F Aoki-Kinoshita, Yasunori Yamamoto, Atsuko Yamaguchi, Shuichi Kawashima, Hong-Woo Chun, Jan Aerts, [......], Daron M Standley, Hideaki Sugawara, Toshiyuki Tashiro, Oswaldo Trelles, Rutger A Vos, Mark D Wilkinson, William York, Christian M Zmasek, Kiyoshi Asai, Toshihisa Takagi

    Journal of biomedical semantics. 01/2010; 1(1):8.

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems without the need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed.
    Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.
  • PreP+07: improvements of a user friendly tool to pre-process and analyse microarray data.
    Impact points: 3.43

    Victoria Martin-Requena, Antonio Munoz-Merida, M Gonzalo Claros, Oswaldo Trelles

    BMC bioinformatics. 02/2009; 10(1):16.

    ABSTRACT: BACKGROUND: Nowadays, microarray gene expression analysis is a widely used technology that scientists handle but whose final interpretation usually requires the participation of a specialist. The need for this participation is due to the requirement of some background in statistics that most users lack or have a very vague notion of. Moreover, programming skills could also be essential to analyse these data. An interactive, easy to use application seems therefore necessary to help researchers to extract full information from data and analyse them in a simple, powerful and confident way. RESULTS: PreP+07 is a standalone Windows XP application that presents a friendly interface for spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error removal and statistical analyses. Additionally, it contains two unique implementations of the procedures (double scan and Supervised Lowess), a complete set of graphical representations (MA plot, RG plot, QQ plot, PP plot, PN plot) and can deal with many data formats, such as tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the equivalent functions in Bioconductor using a tomato chip with 13056 spots. The numbers of differentially expressed genes considering p-values coming from the PreP+07 and Bioconductor Limma packages were statistically identical when the data set was only normalized; however, a slight variability was appreciated when the data was both normalized and scaled.
    CONCLUSIONS: PreP+07 implementation provides a high degree of freedom in selecting and organizing a small set of widely used data processing protocols, and can handle many data formats. Its reliability has been proven so that a laboratory researcher can afford a statistical pre-processing of his/her microarray results and obtain a list of differentially expressed genes using PreP+07 without any programming skills. All of this gives support to scientists that have been using previous PreP releases since its first version in 2003.
  • Magallanes: a web services discovery and automatic workflow composition tool.
    Impact points: 3.43

    Javier Ríos, Johan Karlsson, Oswaldo Trelles

    BMC bioinformatics. 01/2009; 10:334.

    BACKGROUND: To aid in bioinformatics data processing and analysis, an increasing number of web-based applications are being deployed. Although this is a positive circumstance in general, the proliferation of tools makes it difficult to find the right tool, or more importantly, the right set of tools that can work together to solve real complex problems. RESULTS: Magallanes (Magellan) is a versatile, platform-independent Java library of algorithms aimed at discovering bioinformatics web services and associated data types. A second important feature of Magallanes is its ability to connect available and compatible web services into workflows that can process data sequentially to reach a desired output given a particular input. Magallanes' capabilities can be exploited either through an API or directly through a graphical user interface. The Magallanes API is freely available for academic use and, together with the Magallanes application, has been tested in MS-Windows XP and Unix-like operating systems. Detailed implementation information, including user manuals and tutorials, is available at http://www.bitlab-es.com/magallanes. CONCLUSION: Different implementations of the same client (web page, desktop applications, web services, etc.) have been deployed and are currently in use in real installations such as the National Institute of Bioinformatics (Spain) and the ACGT-EU project. This shows the potential utility and versatility of the software library, including the integration of novel tools in the domain, and provides strong evidence that it facilitates the automatic discovery and composition of workflows.
  • Interoperability with Moby 1.0--it's better than sharing your toothbrush!
    Impact points: 7.33

    Mark D Wilkinson, Martin Senger, Edward Kawas, Richard Bruskiewich, Jerome Gouzy, Celine Noirot, Philippe Bardou, Ambrose Ng, Dirk Haase, Enrique de Andres Saiz, [......], Antonio J Pérez, Jose Aldana, M Mar Rojano, Raul Fernandez-Santa Cruz, Ismael Navas, Gary Schiltz, Andrew Farmer, Damian Gessler, Heiko Schoof, Andreas Groscurth

    Briefings in bioinformatics. 06/2008; 9(3):220-31.

    The BioMoby project was initiated in 2001 from within the model organism database community. It aimed to standardize methodologies to facilitate information exchange and access to analytical resources, using a consensus driven approach. Six years later, the BioMoby development community is pleased to announce the release of the 1.0 version of the interoperability framework, registry Application Programming Interface and supporting Perl and Java code-bases. Together, these provide interoperable access to over 1400 bioinformatics resources worldwide through the BioMoby platform, and this number continues to grow. Here we highlight and discuss the features of BioMoby that make it distinct from other Semantic Web Service and interoperability initiatives, and that have been instrumental to its deployment and use by a wide community of bioinformatics service providers. The standard, client software, and supporting code libraries are all freely available at http://www.biomoby.org/.
  • Supervised Lowess normalization of comparative genome hybridization data--application to lactococcal strain comparisons.
    Impact points: 3.43

    Sacha A F T van Hijum, Richard J S Baerends, Aldert L Zomer, Harma A Karsens, Victoria Martin-Requena, Oswaldo Trelles, Jan Kok, Oscar P Kuipers

    BMC bioinformatics. 02/2008; 9:93.

    BACKGROUND: Array-based comparative genome hybridization (aCGH) is commonly used to determine the genomic content of bacterial strains. Since prokaryotes in general have less conserved genome sequences than eukaryotes, sequence divergences between the genes in the genomes used for an aCGH experiment obstruct determination of genome variations (e.g. deletions). Current normalization methods do not take into consideration sequence divergence between target and microarray features and therefore cannot distinguish a difference in signal due to systematic errors in the data from one due to sequence divergence. RESULTS: We present supervised Lowess, or S-Lowess, an application of the subset Lowess normalization method. By using a predicted subset of array features with minimal sequence divergence between the analyzed strains for the normalization procedure we remove systematic errors from dual-dye aCGH data in two steps: (1) determination of a subset of conserved genes (i.e. likely conserved genes, LCG); and (2) using the LCG for subset Lowess normalization. Subset Lowess determines the correction factors for systematic errors in the subset of array features and normalizes all array features using these correction factors. The performance of S-Lowess was assessed on aCGH experiments in which differentially labeled genomic DNA fragments of Lactococcus lactis IL1403 and L. lactis MG1363 strains were hybridized to IL1403 DNA microarrays. Since both genomes are sequenced and gene deletions identified, the success rate of different aCGH normalization methods in detecting these deletions in the MG1363 genome was determined.
    S-Lowess detects 97% of the deletions, whereas other aCGH normalization methods detect up to only 60% of the deletions. CONCLUSION: S-Lowess is implemented in a user-friendly web-tool accessible from http://bioinformatics.biol.rug.nl/websoftware/s-lowess. We demonstrate that it outperforms existing normalization methods and maximizes detection of genomic variation (e.g. deletions) from microbial aCGH data.
  • Saturation and Quantization Reduction in Microarray Experiments using Two Scans at Different Sensitivities

    Jorge García de la Nava, Sacha van Hijum, Oswaldo Trelles

    Statistical Applications in Genetics and Molecular Biology. 02/2007; 3(1).

    We present a mathematical model to extend the dynamic range of gene expression data measured by laser scanners. The strategy is based on the rather simple but novel idea of producing two images with different scanner sensitivities, obtaining two different sets of expression values: the first is a low-sensitivity measure to obtain high expression values which would be saturated in a high-sensitivity measure; the second, by the converse strategy, obtains additional information about the low-expression levels. Two mathematical models based on linear and gamma curves are presented for relating the two measurements to each other and producing a coherent and extended range of values. The procedure minimizes the quantization relative error and avoids the collateral effects of saturation. Since most of the current scanner devices are able to adjust the saturation level, the strategy can be considered as a universal solution, and not dependent on the image processing software used for reading the DNA chip. Various tests have been performed, on both proprietary and public domain data sets, showing a reduction of the saturation and quantization effects, not achievable by other methods, with a more complete description of gene-expression data and with a reasonable computational complexity.
  • A new user-friendly software platform for systematic classification of skin lesions to aid in their diagnosis and prognosis.
    Impact points: 2.57

    Manuel J Martín-Vázquez, Mario A Trelles, Alejandro Sola, R Glen Calderhead, Oswaldo Trelles

    Lasers in medical science. 05/2006; 21(1):54-60.

    BACKGROUND AND AIMS: The field of much less invasive nonablative aesthetic surgery continues to grow, but consistent and truly objective evaluation of the sometimes comparatively small improvements in the treated skin remains a problem for both clinicians and patients. In this work, we present the development of a generic, modular and expandable platform to allow user-friendly image manipulation, sampling extraction and computer-assisted evaluation of tissue features in the dermatological/aesthetic field of clinical medicine. MATERIALS AND METHODS: The unique characteristic of the platform is the modular extension of the algorithm gallery by the use of extended value added services, which enables the easy incorporation of new image processing procedures to customise the gallery for specific concerns. A novel algorithm to evaluate skin wrinkles is also presented as a demonstration of this integration process. The software platform is designed to evaluate image-tissue indices and to identify individual or combined descriptors which will more accurately represent differences in skin quality. It is based on a set of indices correlating clinical expert and computer classifications, which build up a constantly expanding tissue catalogue. By means of this catalogue, the different tissue qualities of photographic samples can be assessed according to the different positions of the samples in the catalogue.
    CONCLUSIONS: This new platform can be used to generate sensitive and objective comparative measurement not only for diagnostic reports on the pre-treatment condition of samples but also for demonstrating the improvement and efficacy of the prescribed treatment to both the clinician and colleagues and the patient, thereby helping to increase the patient satisfaction index.
  • Intelligent client for integrating bioinformatics services.
    Impact points: 4.93

    Ismael Navas-Delgado, Maria del Mar Rojano-Muñoz, Sergio Ramírez, Antonio J Pérez, Eduardo Andrés León, Jose F Aldana Montes, Oswaldo Trelles

    Bioinformatics (Oxford, England). 02/2006; 22(1):106-11.

    MOTIVATION: In addition to existing bioinformatics software, a lot of new tools are being developed worldwide to supply services for an ever growing, widely dispersed and heterogeneous collection of biological data. The integration of these resources under a common platform is a challenging task. To this end, several groups are developing integration technologies, in which services are usually registered in some sort of catalogue to allow novel discovering and accessing mechanisms to be implemented. However, each service demands specific interfaces to accommodate its parameters, and linking the different service inputs and outputs to solve a biological problem is a complicated task. RESULTS: In this work we address the design and implementation of a versatile web client to access BioMOBY compatible services (a system by which a client can interact with multiple sources of biological data regardless of the underlying format or schema) using the service description stored in the BioMOBY catalogue. The automatic interface generator significantly reduces developing time and produces uniform service access mechanisms. The design and proof of concept (for such a client) including the generic interface generator have been developed and implemented in the National Institute for Bioinformatics in Spain. AVAILABILITY: The INB (National Institute for Bioinformatics, Spain) platform is available at www.inab.org/MOWServ
  • Integrated analysis of gene expression by Association Rules Discovery.
    Impact points: 3.43

    Pedro Carmona-Saez, Monica Chagoyen, Andres Rodriguez, Oswaldo Trelles, Jose M Carazo, Alberto Pascual-Montano

    BMC bioinformatics. 02/2006; 7:54.

    BACKGROUND: Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most of the current approaches to analyze microarray datasets are mainly focused on the analysis of experimental data, and external biological information is incorporated as a posterior process. RESULTS: In this study we present a method for the integrative analysis of microarray data based on the Association Rules Discovery data mining technique. The approach integrates gene annotations and expression data to discover intrinsic associations among both data sources based on co-occurrence patterns. We applied the proposed methodology to the analysis of gene expression datasets in which genes were annotated with metabolic pathways, transcriptional regulators and Gene Ontology categories. Automatically extracted associations revealed significant relationships among these gene attributes and expression patterns, where many of them are clearly supported by recently reported work. CONCLUSION: The integration of external biological information and gene expression data can provide insights about the biological processes associated with gene expression programs. In this paper we show that the proposed methodology is able to integrate multiple gene annotations and expression data in the same analytic framework and extract meaningful associations among heterogeneous sources of data.
    An implementation of the method is included in the Engene software package.
  • Distributed Execution of Workflows.

    Ismael Navas Delgado, José Francisco Aldana Montes, Oswaldo Trelles

    Computational Science - ICCS 2006, 6th International Conference, Reading, UK, May 28-31, 2006, Proceedings, Part III; 01/2006

  • Distributed Execution of Workflows in the INB.

    Ismael Navas Delgado, Antonio Jesús Pérez, José Francisco Aldana Montes, Oswaldo Trelles

    Data Integration in the Life Sciences, Third International Workshop, DILS 2006, Hinxton, UK, July 20-22, 2006, Proceedings; 01/2006
