Publications (76) · 354.07 Total impact
Article: Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles.
ABSTRACT: Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
PLoS ONE 01/2013; 8(1):e52854. · 4.09 Impact Factor
Article: Experimental evidence validating the computational inference of functional associations from gene fusion events: a critical survey.
ABSTRACT: More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility to detect functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected from citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of the 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.
Briefings in Bioinformatics 12/2012; · 5.20 Impact Factor
Article: Transcriptome classification reveals molecular subtypes in psoriasis.
ABSTRACT: BACKGROUND: Psoriasis is an immune-mediated disease characterised by chronically elevated pro-inflammatory cytokine levels, leading to aberrant keratinocyte proliferation and differentiation. Although certain clinical phenotypes, such as plaque psoriasis, are well defined, it is currently unclear whether there are molecular subtypes that might impact on prognosis or treatment outcomes. RESULTS: We present a pipeline for patient stratification through a comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls, to establish differences in RNA expression patterns across all tissue types. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes. This multi-stage procedure was applied to several published psoriasis studies and a comparison of gene expression patterns across datasets was performed. CONCLUSION: Overall, classification of psoriasis gene expression patterns revealed distinct molecular sub-groups within the clinical phenotype of plaque psoriasis. Enrichment for TGFb and ErbB signaling pathways, noted in one of the two psoriasis subgroups, suggested that this group may be more amenable to therapies targeting these pathways. Our study highlights the potential biological relevance of using ensemble decision tree predictors to determine molecular disease subtypes, in what may initially appear to be a homogenous clinical group. The R code used in this paper is available upon request.
BMC Genomics 09/2012; 13(1):472. · 4.07 Impact Factor
Article: The Chlamydiales Pangenome Revisited: Structural Stability and Functional Coherence
Genes. 05/2012; 3:291-319.
Article: BioTextQuest: a web-based biomedical text mining suite for concept discovery.
ABSTRACT: BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. The terms labeling each document cluster are shown as a tag cloud and semantically annotated according to the biological entity, and a list of document titles enables users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy. Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr. Supplementary data are available at Bioinformatics online.
Bioinformatics 12/2011; 27(23):3327-8. · 5.47 Impact Factor
Article: BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Bioinformatics. 01/2011; 27:3327-3328.
Article: Protein coalitions in a core mammalian biochemical network linked by rapidly evolving proteins.
ABSTRACT: Cellular ATP levels are generated by glucose-stimulated mitochondrial metabolism and determine metabolic responses, such as glucose-stimulated insulin secretion (GSIS) from the β-cells of pancreatic islets. We describe an analysis of the evolutionary processes affecting the core enzymes involved in glucose-stimulated insulin secretion in mammals. The proteins involved in this system belong to ancient enzymatic pathways: glycolysis, the TCA cycle and oxidative phosphorylation. We identify two sets of proteins, or protein coalitions, in this group of 77 enzymes with distinct evolutionary patterns. Members of the glycolysis, TCA cycle, metabolite transport, pyruvate and NADH shuttles have low rates of protein sequence evolution, as inferred from a human-mouse comparison, and relatively high rates of evolutionary gene duplication. Respiratory chain and glutathione pathway proteins evolve faster, exhibiting lower rates of gene duplication. A small number of proteins in the system evolve significantly faster than co-pathway members and may serve as rapidly evolving adapters, linking groups of co-evolving genes. Our results provide insights into the evolution of the involved proteins. We find evidence for two coalitions of proteins and the role of co-adaptation in protein evolution is identified and could be used in future research within a functional context.
BMC Evolutionary Biology 01/2011; 11:142. · 3.52 Impact Factor
Article: Functional Genomics Assistant (FUGA): a toolbox for the analysis of complex biological networks.
ABSTRACT: Cellular constituents such as proteins, DNA, and RNA form a complex web of interactions that regulate biochemical homeostasis and determine the dynamic cellular response to external stimuli. It follows that detailed understanding of these patterns is critical for the assessment of fundamental processes in cell biology and pathology. Representation and analysis of cellular constituents through network principles is a promising and popular analytical avenue towards a deeper understanding of molecular mechanisms in a system-wide context. We present Functional Genomics Assistant (FUGA) - an extensible and portable MATLAB toolbox for the inference of biological relationships, graph topology analysis, random network simulation, network clustering, and functional enrichment statistics. In contrast to conventional differential expression analysis of individual genes, FUGA offers a framework for the study of system-wide properties of biological networks and highlights putative molecular targets using concepts of systems biology. FUGA offers a simple and customizable framework for network analysis in a variety of systems biology applications. It is freely available for individual or academic use at http://code.google.com/p/fuga.
BMC Research Notes 01/2011; 4:462.
Article: Genome-wide expression patterns in physiological cardiac hypertrophy.
ABSTRACT: In this study, the first large-scale analysis of publicly available genome-wide expression data of several in vivo murine models of physiological left ventricular hypertrophy (LVH) was carried out using network analysis. On evaluating 3 million gene co-expression patterns across 141 relevant microarray experiments, it was found that physiological adaptation is an evolutionarily conserved process involving preservation of the function of cytochrome c oxidase, induction of autophagy compatible with cell survival, and coordinated regulation of angiogenesis. This analysis not only identifies known biological pathways involved in physiological LVH, but also offers novel insights into the molecular basis of this phenotype by identifying key networks of co-expressed genes, as well as their topological and functional properties, using relevant high-quality microarray experiments and network inference.
BMC Genomics 10/2010; 11:557. · 4.07 Impact Factor
Conference Proceeding: Clustering of discrete and fuzzy phylogenetic profiles
5th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '10; 10/2010
Article: Promoter complexity and tissue-specific expression of stress response components in Mytilus galloprovincialis, a sessile marine invertebrate species.
ABSTRACT: The mechanisms of stress tolerance in sessile animals, such as molluscs, can offer fundamental insights into the adaptation of organisms for a wide range of environmental challenges. One of the best studied processes at the molecular level relevant to stress tolerance is the heat shock response in the genus Mytilus. We focus on the upstream region of Mytilus galloprovincialis Hsp90 genes and their structural and functional associations, using comparative genomics and network inference. Sequence comparison of this region provides novel evidence that the transcription of Hsp90 is regulated via a dense region of transcription factor binding sites, also containing a region with similarity to the Gamera family of LINE-like repetitive sequences and a genus-specific element of unknown function. Furthermore, we infer a set of gene networks from tissue-specific expression data, and specifically extract an Hsp class-associated network, with 174 genes and 2,226 associations, exhibiting a complex pattern of expression across multiple tissue types. Our results (i) suggest that the heat shock response in the genus Mytilus is regulated by an unexpectedly complex upstream region, and (ii) provide new directions for the use of the heat shock process as a biosensor system for environmental monitoring.
PLoS Computational Biology 01/2010; 6(7):e1000847. · 5.22 Impact Factor
Article: Stratification of co-evolving genomic groups using ranked phylogenetic profiles.
ABSTRACT: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
BMC Bioinformatics 01/2009; 10:355. · 2.75 Impact Factor
Article: PuReD-MCL: a graph-based PubMed document clustering methodology.
Bioinformatics. 01/2008; 24:1935-1941.
Article: Science communication media for scientists and the public
EMBO Reports 09/2007; 8(10):886-887. · 7.36 Impact Factor
Article: Lineage-specific partitions in archaeal transcription.
ABSTRACT: The phylogenetic distribution of the components comprising the transcriptional machinery in the crenarchaeal and euryarchaeal lineages of the Archaea was analyzed in a systematic manner by genome-wide profiling of transcription complements in fifteen complete archaeal genome sequences. Initially, a reference set of transcription-associated proteins (TAPs) consisting of sequences functioning in all aspects of the transcriptional process, and originating from the three domains of life, was used to query the genomes. TAP-families were detected by sequence clustering of the TAPs and their archaeal homologues, and through extensive database searching, these families were assigned a function. The phylogenetic origins of archaeal genes matching hidden Markov model profiles of protein domains associated with transcription, and those encoding the TAP-homologues, showed there is extensive lineage-specificity of proteins that function as regulators of transcription: most of these sequences are present solely in the Euryarchaeota, with nearly all of them homologous to bacterial DNA-binding proteins. Strikingly, the hidden Markov model profile searches revealed that archaeal chromatin and histone-modifying enzymes also display extensive taxon-restrictedness, both across and within the two phyla.
Archaea (Vancouver, B.C.) 06/2007; 2(2):117-25.
Article: Denoising inferred functional association networks obtained by gene fusion analysis.
ABSTRACT: Gene fusion detection - also known as the 'Rosetta Stone' method - involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between their un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes. In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions. We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function.
BMC Genomics 02/2007; 8:460. · 4.07 Impact Factor
Article: CORRIE: enzyme sequence annotation with confidence estimates.
ABSTRACT: Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/.
BMC Bioinformatics 02/2007; 8 Suppl 4:S3. · 2.75 Impact Factor
Article: Highly consistent patterns for inherited human diseases at the molecular level.
ABSTRACT: Over 1600 mammalian genes are known to cause an inherited disorder, when subjected to one or more mutations. These disease genes represent a unique resource for the identification and quantification of relationships between phenotypic attributes of a disease and the molecular features of the associated disease genes, including their ascribed annotated functional classes and expression patterns. Such analyses can provide a more global perspective and a deeper understanding of the probable causes underlying human hereditary diseases. In this perspective and critical view of disease genomics, we present a comparative analysis of genes reported to cause inherited diseases in humans in terms of their causative effects on physiology, their genetics and inheritance modes, the functional processes they are involved in and their expression profiles across a wide spectrum of tissues. Our analysis reveals that there are more extensive correlations between these attributes of genetic disease genes than previously appreciated. For instance, the functional pattern of genes causing dominant and recessive diseases is markedly different. Also, the function of the genes and their expression correlate with the type of disease they cause when mutated. The results further indicate that a comparative genomics approach for the analysis of genes linked to human genetic diseases will facilitate the elucidation of the underlying molecular and cellular mechanisms.
Bioinformatics 03/2006; 22(3):269-77. · 5.47 Impact Factor
Article: Structural and functional properties of genes involved in human cancer.
ABSTRACT: One of the main goals of cancer genetics is to identify the causative elements at the molecular level leading to cancer. We have conducted an analysis of a set of genes known to be involved in cancer in order to unveil their unique features that can assist towards the identification of new candidate cancer genes. We have detected key patterns in this group of genes in terms of the molecular function or the biological process in which they are involved as well as sequence properties. Based on these features we have developed an accurate Bayesian classification model with which human genes have been scored for their likelihood of involvement in cancer.
BMC Genomics 02/2006; 7:3. · 4.07 Impact Factor
Article: Ancestral state reconstructions for genomes.
ABSTRACT: The recent expansion of phylogenetic analysis from the traditional field of molecular evolution, analyzing histories of genes, to the nascent field of "genomic evolution", analyzing histories of entire genomes, enables the construction of trees based on genome information, the quantification of the key processes that shape genome content and, ultimately, plausible parsimony reconstructions of ancestral genomes. Thus, when genomes are considered as phylogenetic characters, it is possible to reconstruct not only the history of species but also the ancestral states in terms of genome structure or function. In the future, we might be able to accurately reconstruct--or retrodict--a chain of events that led to the emergence of a specific genome sequence and, ultimately, to synthesize ancestral genomes at will, creating a "Jurassic database" of genomes.
Current Opinion in Genetics & Development 01/2006; 15(6):595-600. · 8.09 Impact Factor
Top Journals
- Bioinformatics (11)
- Nucleic Acids Research (5)
- Genome Research (4)
- BMC Bioinformatics (4)
- BMC Genomics (4)
Institutions
2013
- Aristotle University of Thessaloniki, Division of Electronics and Computer Engineering, Thessaloníki, Kentriki Makedonia, Greece
2009
- Tel Aviv University, Tel Aviv, Tel Aviv, Israel
- King's College London, London, ENG, United Kingdom
2007
- Ecole normale supérieure de Lyon, Lyon, Rhone-Alpes, France
2002–2007
- EMBL-EBI, Cambridge, ENG, United Kingdom
- SRI International, Menlo Park, CA, USA
- Universitat Rovira i Virgili, Department of Biochemistry and Biotechnology, Tarragona, Catalonia, Spain
2006
- University Pompeu Fabra, Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain
2005
- Belgian Nuclear Research Centre, Mol, VLG, Belgium
2003–2005
- University of Cambridge, Cambridge Institute of Public Health, Cambridge, ENG, United Kingdom
- Medical Research Council (UK), London, ENG, United Kingdom
2000
- Harokopion University of Athens, Athens, Attiki, Greece