Research experience
-
Jul 2003–Jun 2005
Research: Royal Holloway, University of London
Royal Holloway, University of London
United Kingdom · London
-
Dec 2000–Dec 2012
Research: Helsingin yliopisto
Helsingin yliopisto · Department of Computer Science
Finland · Helsinki
Publications (65)
-
Article: Biomarker discovery by sparse canonical correlation analysis of complex clinical phenotypes of tuberculosis and malaria.
ABSTRACT: Biomarker discovery aims to find small subsets of relevant variables in 'omics data that correlate with the clinical syndromes of interest. Although clinical phenotypes are usually characterized by a complex set of clinical parameters, current computational approaches assume univariate targets, e.g. diagnostic classes, against which associations are sought. We propose an approach based on asymmetrical sparse canonical correlation analysis (SCCA) that finds multivariate correlations between the 'omics measurements and the complex clinical phenotypes. We correlated plasma proteomics data to multivariate overlapping complex clinical phenotypes from tuberculosis and malaria datasets. We discovered relevant 'omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5-3% of all 'omic variables. We show that using clinical view projections we obtain remarkable improvements in diagnostic class prediction, up to 11% in tuberculosis and up to 5% in malaria. Our approach finds proteomic biomarkers that correlate with complex combinations of clinical biomarkers. Using the clinical biomarkers improves the accuracy of diagnostic class prediction while not requiring the measurement of plasma proteomic profiles of each subject. Our approach makes it feasible to use 'omics data to build accurate diagnostic algorithms that can be deployed to community health centres lacking the expensive 'omics measurement capabilities.
PLoS Computational Biology 04/2013; 9(4):e1003018. · 5.22 Impact Factor
-
Article: Metabolite identification and molecular fingerprint prediction through machine learning.
ABSTRACT: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: markus.heinonen@cs.helsinki.fi
Bioinformatics 07/2012; 28(18):2333-41. · 5.47 Impact Factor
-
Conference Proceeding: Minimum Mutation Algorithm for Gapless Metabolic Network Evolution.
Esa Pitkänen, Juho Rousu, Mikko Arvas
BIOINFORMATICS 2011 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms, Rome, Italy, 26-29 January, 2011; 01/2011
-
Article: Computing atom mappings for biochemical reactions without subgraph isomorphism.
ABSTRACT: The ability to trace the fate of individual atoms through the metabolic pathways is needed in many applications of systems biology and drug discovery. However, this information is not immediately available from the most common metabolome studies and needs to be separately acquired. Automatic discovery of correspondence of atoms in biochemical reactions is called the "atom mapping problem." We suggest an efficient approach for solving the atom mapping problem exactly, that is, finding mappings of minimum edge edit distance. The algorithm is based on A* search equipped with sophisticated heuristics for pruning the search space. This approach has clear advantages over the commonly used heuristic approach of the iterative maximum common subgraph (MCS) algorithm: we explicitly minimize an objective function, and we produce solutions that typically require less manual curation. The two methods are similar in computational resource demands. We compare the performance of the proposed algorithm against several alternatives on data obtained from the KEGG LIGAND and RPAIR databases: greedy search, bipartite graph matching, and the MCS approach. Our experiments show that the alternative approaches often fail to find mappings with minimum edit distance.
Journal of computational biology: a journal of computational molecular cell biology 01/2011; 18(1):43-58. · 1.69 Impact Factor
-
Article: Computational methods for metabolic reconstruction.
Esa Pitkänen, Juho Rousu, Esko Ukkonen
ABSTRACT: In the wake of numerous sequenced genomes becoming available, computational methods for the reconstruction of metabolic networks have received considerable attention. Here, we review recent methods and software tools useful along the reconstruction workflow, from sequence annotation and network assembly to model verification and testing against experimental data. Reconstruction methods can be divided into three categories, depending on how much network context is taken into account when assembling the metabolic model. First, each enzyme may be predicted independently by annotation transfer or machine learning methods. Second, the presence of a metabolic pathway may be detected from genome and experimental evidence, often utilizing a reference pathway database. Third, the method may attempt to directly reconstruct a consistent metabolic network without relying on predefined reference pathways. Regardless of the chosen context, all methods strive to produce genome-scale metabolic reconstructions. Currently a gap exists between software platforms dedicated to genome annotation and computational tools for automatically repairing network inconsistencies and validating against measurement data. We argue that to accelerate the reconstruction efforts, computational tools need to be developed that bridge the phases of the reconstruction workflow. In particular, the goal of finding consistent metabolic models suitable for computational analysis should be taken into account from the earliest phases of reconstruction.
Current opinion in biotechnology 02/2010; 21(1):70-7. · 7.82 Impact Factor