Javier Herrero

Centro de Investigación Príncipe Felipe, Valencia, Valencia, Spain

Publications (24) · 79.43 Total impact

  • Source
    ABSTRACT: In this paper we compare various applications of supervised and unsupervised neural networks to the analysis of the gene expression profiles produced using DNA microarrays. In particular we are interested in the classification of samples or conditions. We have found that if gene expression profiles are clustered at the optimal level, the classification of conditions obtained using the average gene expression profile of each cluster is better than that obtained directly using all the gene expression profiles. If a supervised method (a back propagation neural network) is used instead of an unsupervised method, the efficiency of the classification of conditions increases. We studied the relative efficiencies of different clustering methods for reducing the dimensionality of the gene expression profile data set and found that the Self-Organising Tree Algorithm (SOTA) is a good choice for this task.
    05/2007: pages 91-103;
  • Source
    ABSTRACT: The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive yet powerful web-based interface that offers diverse analysis options, from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options) to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers in many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.
    Nucleic Acids Research 08/2006; 34(Web Server issue):W486-91. · 8.81 Impact Factor
  • Source
    ABSTRACT: The Gene Expression Profile Analysis Suite, GEPAS, has been running for more than three years. With >76,000 experiments analysed during the last year and a daily average of almost 300 analyses, GEPAS can be considered a well-established and widely used platform for gene expression microarray data analysis. GEPAS is oriented to the analysis of whole series of experiments. Its design and development have been driven by the demands of the biomedical community, probably the most active collective in the field of microarray users. Although clustering methods have obviously been implemented in GEPAS, our interest has focused more on methods for finding genes differentially expressed among distinct classes of experiments or correlated to diverse clinical outcomes, as well as on building predictors. There is also a great interest in CGH-arrays which fostered the development of the corresponding tool in GEPAS: InSilicoCGH. Much effort has been invested in GEPAS for developing and implementing efficient methods for functional annotation of experiments in the proper statistical framework. Thus, the popular FatiGO has expanded to a suite of programs for functional annotation of experiments, including information on transcription factor binding sites, chromosomal location and tissues. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.
    Nucleic Acids Research 07/2005; 33(Web Server issue):W616-20. · 8.81 Impact Factor
  • 12/2004: pages 255-266;
  • Source
    ABSTRACT: Since the first papers published in the late nineties, including, for the first time, a comprehensive analysis of microarray data, the number of questions that have been addressed through this technique has both increased and diversified. Initially, interest focussed on genes coexpressing across sets of experimental conditions, implying, essentially, the use of clustering techniques. Recently, however, interest has focussed more on finding genes differentially expressed among distinct classes of experiments, or correlated to diverse clinical outcomes, as well as on building predictors. In addition to this, the availability of accurate genomic data and the recent implementation of CGH arrays have made mapping expression and genomic data on the chromosomes possible. There is also a clear demand for methods that allow the automatic transfer of biological information to the results of microarray experiments. Different initiatives, such as the Gene Ontology (GO) consortium, pathways databases, protein functional motifs, etc., provide curated annotations for genes. Whereas many resources on the web focus mainly on clustering methods, GEPAS has evolved to cope with the aforementioned new challenges that have recently arisen in the field of microarray data analysis. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://gepas.bioinfo.cnio.es.
    Nucleic Acids Research 08/2004; 32(Web Server issue):W485-91. · 8.81 Impact Factor
  • Source
    ABSTRACT: The analysis of genome-scale data from different high throughput techniques usually involves the grouping of genes based on experimental criteria. These groups are a consequence of the biological roles the genes are playing within the cell. Establishing which of these groups are functionally important is essential. Gene ontology terms provide a specialised vocabulary to describe the relevant biological properties of genes. We used a simple procedure to extract terms that are significantly over or under-represented in sets of genes within the context of a genome-scale experiment. Said procedure, which takes the multiple-testing nature of the statistical contrast into account, has been implemented as a Web application, FatiGO, allowing for easy and interactive querying. Several examples demonstrate its application and the type of information that can be extracted. Although a number of genes still lack gene ontology annotations, the results were informative enough to characterise the biological processes in the systems analysed.
    Neural Networks for Signal Processing, 2003. NNSP'03. 2003 IEEE 13th Workshop on; 10/2003
  • Source
    ABSTRACT: We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-conditions comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es).
    Nucleic Acids Research 08/2003; 31(13):3461-7. · 8.81 Impact Factor
  • Source
    ABSTRACT: We constructed two-dimensional representations of profiles of gene conservation across different genomes using the genome of Escherichia coli as a model. These profiles permit both the visualization at the genome level of different traits in the organism studied and, at the same time, reveal features related to the genomes analyzed (such as defective genomes or genomes that lack a particular system). Conserved genes are not uniformly distributed along the E. coli genome but tend to cluster together. The study of gene distribution patterns across genomes is important for the understanding of how sets of genes seem to be dependent on each other, probably having some functional link. This provides additional evidence that can be used for the elucidation of the function of unannotated genes. Clustering these patterns produces families of genes which can be arranged in a hierarchy of closeness. In this way, functions can be defined at different levels of generality depending on the level of the hierarchy that is studied. The combined study of conservation and phenotypic traits opens up the possibility of defining phenotype/genotype associations, and ultimately inferring the gene or genes responsible for a particular trait.
    Genome Research 06/2003; 13(5):991-8. · 14.40 Impact Factor
  • Source
    ABSTRACT: We present an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardization and they are required before performing any pattern analysis. The processed data set can be sent to other pattern analysis tools.
    Bioinformatics 04/2003; 19(5):655-6. · 5.32 Impact Factor
  • ABSTRACT: Class prediction and feature selection are key in the context of diagnostic applications of DNA microarrays. Microarray data is noisy and typically composed of a low number of samples and a large number of genes. Perceptrons can constitute an efficient tool for accurate classification of microarray data. Nevertheless, the large input layers necessary for the direct application of perceptrons and the few samples available for the training process hamper their use. Two strategies can be taken for an optimal use of a perceptron with a favourable balance between samples for training and the size of the input layer: (a) reducing the dimensionality of the data set from thousands to no more than one hundred, highly informative average values, and using the weights of the perceptron for feature selection or (b) using a selection of only few genes that produce an optimal classification with the perceptron. In this case, feature selection is carried out first. Obviously, a combined approach is also possible. In this manuscript we explore and compare both alternatives. We study the informative contents of the data at different levels of compression with a very efficient clustering algorithm (Self Organizing Tree Algorithm). We show how a simple genetic algorithm selects a subset of gene expression values with 100% accuracy in the classification of samples with maximum efficiency. Finally, the importance of dimensionality reduction is discussed in light of its capacity for reducing noise and redundancies in microarray data.
    Artificial Intelligence Review 01/2003; 20:39-51. · 1.57 Impact Factor
  • Source
    ABSTRACT: We present an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardization and they are required before performing any pattern analysis. The processed data set can be sent to other pattern analysis tools. Availability: The web interface is accessible through http:
    Bioinformatics. 01/2003; 19:655-656.
  • Source
    ABSTRACT: The use of DNA microarrays opens up the possibility of measuring the expression levels of thousands of genes simultaneously under different conditions. Time-course experiments allow researchers to study the dynamics of gene interactions. The inference of genetic networks from such measures can give important insights for the understanding of a variety of biological problems. Most of the existing methods for genetic network reconstruction require many experimental data points, or can only be applied to the reconstruction of small subnetworks. Here we present a method that reduces the dimensionality of the dataset and then extracts the significant dynamic correlations among genes. The method requires a number of points achievable in common time-course experiments.
    Comparative and Functional Genomics 01/2003; 4(1):148-54. · 0.92 Impact Factor
  • ABSTRACT: This manuscript describes a combined approach of unsupervised clustering followed by supervised learning that provides an efficient classification of conditions in DNA array gene expression experiments (different cell lines including some cancer types, in the cases shown). Firstly the dimensionality of the dataset of gene expression profiles is reduced to a number of non-redundant clusters of co-expressing genes using an unsupervised clustering algorithm, the Self Organizing Tree Algorithm (SOTA), a hierarchical version of Self Organizing Maps (SOM). Then, the average values of these clusters are used for training a perceptron that produces a very efficient classification of the conditions. This way of reducing the dimensionality of the data set seems to perform better than other ones previously proposed such as principal component analysis (PCA). In addition, the weights that connect the gene clusters to the different experimental conditions can be used to assess the relative importance of the genes in the definition of these classes. Finally, Gene Ontology (GO) terms are used to infer a possible biological role for these groups of genes and to assess the validity of the classification from a biological point of view.
    Journal of VLSI Signal Processing 01/2003; 35:245-253. · 0.73 Impact Factor
  • Source
    ABSTRACT: Interferon-alpha therapy has been shown to be active in the treatment of mycosis fungoides although the individual response to this therapy is unpredictable and dependent on essentially unknown factors. In an effort to better understand the molecular mechanisms of interferon-alpha resistance we have developed an interferon-alpha resistant variant from a sensitive cutaneous T-cell lymphoma cell line. We have performed expression analysis to detect genes differentially expressed between both variants using a cDNA microarray including 6386 cancer-implicated genes. The experiments showed that resistance to interferon-alpha is consistently associated with changes in the expression of a set of 39 genes, involved in signal transduction, apoptosis, transcription regulation, and cell growth. Additional studies performed confirm that STAT1 and STAT3 expression and interferon-alpha induction and activation are not altered between both variants. The gene MAL, highly overexpressed by resistant cells, was also found to be expressed by tumoral cells in a series of cutaneous T-cell lymphoma patients treated with interferon-alpha and/or photochemotherapy. MAL expression was associated with longer time to complete remission. Time-course experiments of the sensitive and resistant cells showed a differential expression of a subset of genes involved in interferon-response (1 to 4 hours), cell growth and apoptosis (24 to 48 hours), and signal transduction.
    American Journal Of Pathology 12/2002; 161(5):1825-37. · 4.60 Impact Factor
  • Source
    ABSTRACT: Expression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups.
    Journal of Biotechnology 10/2002; 98(2-3):269-83. · 3.18 Impact Factor
  • Source
    L. Conde, A. Mateos, J. Herrero, J. Dopazo
    ABSTRACT: This manuscript describes a combined approach of unsupervised clustering followed by supervised learning that provides an efficient classification of conditions in DNA array gene expression experiments (different cell lines including some cancer types, in the cases shown). Firstly the dimensionality of the dataset of gene expression profiles is reduced to a number of non-redundant clusters of co-expressing genes using an unsupervised clustering algorithm, the Self Organizing Tree Algorithm (SOTA), a hierarchical version of Self Organizing Maps (SOM). Then, the average values of these clusters are used for the training of a perceptron that produces a very efficient classification of the conditions. This way of reducing the dimensionality of the data set seems to perform better than other ones previously proposed such as PCA. In addition, the weights that connect the gene clusters to the different experimental conditions can be used to assess the relative importance of the genes in the definition of these classes. Finally, Gene Ontology (GO) terms are used to infer a possible biological role for these groups of genes and to assess the validity of the classification from a biological point of view.
    Neural Networks for Signal Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on; 02/2002
  • Source
    Álvaro Mateos, Javier Herrero, Joaquín Dopazo
    ABSTRACT: The success of the application of neural networks to DNA microarray data comes from their efficiency in dealing with noisy data. Here we describe a combined approach that provides, at the same time, an accurate classification of samples in DNA microarray gene expression experiments (different cancer cell lines, in this case) and allows the extraction of the gene, or clusters of co-expressing genes, that account for these differences. Firstly we reduce the dataset of gene expression profiles to a number of non-redundant clusters of co-expressing genes. Then, the clusters' average values are used for training a perceptron that produces an accurate classification of different classes of cell lines. The weights that connect the gene clusters to the cell lines are used to assess the relative importance of the genes in the definition of these classes. Finally, the biological role for these groups of genes is discussed.
    Artificial Neural Networks - ICANN 2002, International Conference, Madrid, Spain, August 28-30, 2002, Proceedings; 01/2002
  • Source
    Javier Herrero, Joaquín Dopazo
    ABSTRACT: Self-organizing maps (SOM) constitute an alternative to classical clustering methods because of their linear run times and superior performance in dealing with noisy data. Nevertheless, the clustering obtained with SOM is dependent on the relative sizes of the clusters. Here, we show how the combination of SOM with hierarchical clustering methods constitutes an excellent tool for exploratory analysis of massive data like DNA microarray expression patterns.
    Journal of Proteome Research 01/2002; 1(5):467-70. · 5.06 Impact Factor
  • ABSTRACT: Interferon-α therapy has been shown to be active in the treatment of mycosis fungoides although the individual response to this therapy is unpredictable and dependent on essentially unknown factors. In an effort to better understand the molecular mechanisms of interferon-α resistance we have developed an interferon-α resistant variant from a sensitive cutaneous T-cell lymphoma cell line. We have performed expression analysis to detect genes differentially expressed between both variants using a cDNA microarray including 6386 cancer-implicated genes. The experiments showed that resistance to interferon-α is consistently associated with changes in the expression of a set of 39 genes, involved in signal transduction, apoptosis, transcription regulation, and cell growth. Additional studies performed confirm that STAT1 and STAT3 expression and interferon-α induction and activation are not altered between both variants. The gene MAL, highly overexpressed by resistant cells, was also found to be expressed by tumoral cells in a series of cutaneous T-cell lymphoma patients treated with interferon-α and/or photochemotherapy. MAL expression was associated with longer time to complete remission. Time-course experiments of the sensitive and resistant cells showed a differential expression of a subset of genes involved in interferon-response (1 to 4 hours), cell growth and apoptosis (24 to 48 hours), and signal transduction.
    American Journal of Pathology 01/2002; 161(5):1825-1837.
  • Source
    ABSTRACT: Motivation: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm, (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol. , 44 , 226–233), is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. Results: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data providing that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used. 
    Availability: A server running the program can be found at: http://bioinfo.cnio.es/sotarray Contact: jdopazo@cnio.es
    Bioinformatics 03/2001; · 5.32 Impact Factor

Publication Stats

1k Citations
79.43 Total Impact Points

Institutions

  • 2006
    • Centro de Investigación Príncipe Felipe
      • Department of Bioinformatics and Genomics
      Valencia, Valencia, Spain
  • 2000–2005
    • Centro Nacional de Investigaciones Oncológicas
      • Molecular Pathology Programme
      Madrid, Madrid, Spain
  • 2002
    • Hospital Universitario de La Princesa
      Madrid, Madrid, Spain