ABSTRACT: Background
Applying machine learning methods to microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches, in which the expression of each gene is treated independently, suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address such limitations by proposing novel methodologies.
Results
A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of the expression of its constituent genes. Gene weights are determined by the optimisation model in such a way that the resulting pathway activity has optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile.
Conclusions
The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
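The composite feature described in this abstract, a weighted linear summation of the expression of a pathway's constituent genes, can be illustrated with a minimal Python sketch. The gene names, expression values and weights below are hypothetical; in the paper the weights are obtained from the optimisation model, which is not reproduced here. Genes left out of the weight table are treated as having zero weight, which loosely mirrors the idea of limiting how many genes contribute to the pathway activity.

```python
from typing import Dict, List


def pathway_activity(expression: Dict[str, List[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Return one activity value per sample: sum over genes of weight * expression."""
    n_samples = len(next(iter(expression.values())))
    activity = [0.0] * n_samples
    for gene, weight in weights.items():
        for sample_index, value in enumerate(expression[gene]):
            activity[sample_index] += weight * value
    return activity


# Hypothetical toy data: three genes of one pathway measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 0.5, 0.0],
    "geneB": [0.0, 1.0, 1.5, 2.0],
    "geneC": [3.0, 3.0, 3.0, 3.0],
}

# Hypothetical weights (assumption): the paper derives these by optimisation.
# geneC is absent, i.e. it has zero weight and does not contribute.
weights = {"geneA": 0.5, "geneB": -1.0}

activity = pathway_activity(expression, weights)
print(activity)  # one composite value per sample
```

Each sample is thereby reduced from one value per gene to one value per pathway, and a classifier would then be trained on these low-dimensional activity profiles.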
ABSTRACT: In microarray data analysis, traditional methods that focus on single genes are increasingly replaced by methods that analyse functional units corresponding to biochemical pathways, as these are considered to offer more insight into gene expression and disease associations. However, the development of robust pipelines to relate genotypic functional modules to disease phenotypes through known molecular interactions is still at its early stages.
In this article we first discuss methodologies that employ groups of genes in disease classification tasks that aim to link gene expression patterns with disease outcome. Then we present a pathway-based approach for disease classification through a mathematical programming model based on hyper-box principles. Association rules derived from the model are extracted and discussed with respect to pathway-specific molecular patterns related to the disease. Overall, we argue that the use of gene sets corresponding to disease-relevant pathways is a promising route to uncover expression-to-phenotype relations in disease classification, and we illustrate the potential of hyper-box classification in assessing the predictive power of functional pathways and in uncovering the effect of specific genes on the prediction of disease phenotypes.
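To make the hyper-box idea concrete, the following Python sketch shows only the prediction and rule-extraction side: each class is represented by an axis-aligned box of per-gene expression bounds, a sample is assigned to the box it falls into (or the nearest one), and each box reads directly as an if-then rule. The box bounds, gene names and class labels are hypothetical, and the distance measure (sum of per-gene distances to the box) is an assumption made for illustration; the mathematical programming model that actually determines the boxes is not reproduced here.

```python
from typing import Dict, Tuple

# A box maps each gene to a (lower bound, upper bound) pair.
Box = Dict[str, Tuple[float, float]]


def distance_to_box(sample: Dict[str, float], box: Box) -> float:
    """Sum of per-gene distances from the sample to the box; 0.0 if inside."""
    total = 0.0
    for gene, (lower, upper) in box.items():
        value = sample[gene]
        total += max(lower - value, 0.0, value - upper)
    return total


def classify(sample: Dict[str, float], boxes: Dict[str, Box]) -> str:
    """Assign the sample to the class whose box is closest (distance 0 = enclosed)."""
    return min(boxes, key=lambda label: distance_to_box(sample, boxes[label]))


def rule(label: str, box: Box) -> str:
    """Render a box as a human-readable if-then rule."""
    conditions = " AND ".join(
        f"{lower} <= {gene} <= {upper}" for gene, (lower, upper) in box.items()
    )
    return f"IF {conditions} THEN {label}"


# Hypothetical, non-overlapping boxes for two phenotypes over two pathway genes.
boxes = {
    "tumour": {"geneA": (2.0, 5.0), "geneB": (0.0, 1.0)},
    "normal": {"geneA": (0.0, 1.5), "geneB": (1.5, 4.0)},
}

print(classify({"geneA": 3.0, "geneB": 0.5}, boxes))  # sample inside one box
print(classify({"geneA": 1.0, "geneB": 1.2}, boxes))  # sample outside both boxes
print(rule("tumour", boxes["tumour"]))
```

Because every box is defined by explicit bounds on individual genes, the genes whose bounds separate the classes can be read off directly, which is what makes this kind of classifier convenient for interpretation.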
ABSTRACT: Cytokines are critical checkpoints of inflammation. The treatment of human autoimmune disease has been revolutionized by targeting inflammatory cytokines as key drivers of disease pathogenesis. Despite this, there exist numerous pitfalls when translating preclinical data into the clinic. We developed an integrative biology approach combining human disease transcriptome data sets with clinically relevant in vivo models in an attempt to bridge this translational gap. We chose interleukin-22 (IL-22) as a model cytokine because of its potentially important proinflammatory role in epithelial tissues. Injection of IL-22 into normal human skin grafts produced marked inflammatory skin changes resembling human psoriasis. Injection of anti-IL-22 monoclonal antibody in a human xenotransplant model of psoriasis, developed specifically to test potential therapeutic candidates, efficiently blocked skin inflammation. Bioinformatic analysis integrating both the IL-22 and anti-IL-22 cytokine transcriptomes and mapping them onto a psoriasis disease gene coexpression network identified key cytokine-dependent hub genes. Using knockout mice and small-molecule blockade, we show that one of these hub genes, the so far unexplored serine/threonine kinase PIM1, is a critical checkpoint for human skin inflammation and potential future therapeutic target in psoriasis. Using in silico integration of human data sets and biological models, we were able to identify a new target in the treatment of psoriasis.
Science translational medicine 02/2014; 6(223):223ra22. DOI:10.1126/scitranslmed.3007217 · 15.84 Impact Factor
ABSTRACT: Malignant melanoma, the most lethal skin cancer, is considered a representative model for cross talk between immune responses and malignancy. Efforts to elucidate the nature of these interactions have translated into immunotherapeutic strategies. Adjuvant therapeutics such as IL-2 and IFNα2b have reached clinical application, and emerging therapies targeting key immunomodulatory molecules such as CTLA-4 have renewed excitement in the field, highlighting the potential of manipulating immune responses in the clinical setting, but also the merits of further elucidating complex underlying immunological pathways. Screening technologies have yielded new insights leading to identification of biomarkers for disease prognosis and applied clinical immunotherapies. The promise of systems biology is to integrate diverse biomedical characterizations into detailed models of underlying mechanisms and therapies through suitable computational and mathematical formalisms. In this review, we discuss recent developments in dissecting the complex and diverse immune responses associated with melanoma through both computational and experimental means. We show the significance of devising new, improved approaches that can better serve as models of immune interactions and therapies. We propose that efforts in this direction may realize the potential of personalized medicine and facilitate development of the next generation of efficacious tools to treat patients.
Critical Reviews in Biomedical Engineering 11/2012; 40(4):279-294. DOI:10.1615/CritRevBiomedEng.v40.i4.40
ABSTRACT: Background
Psoriasis is an immune-mediated disease characterised by chronically elevated pro-inflammatory cytokine levels, leading to aberrant keratinocyte proliferation and differentiation. Although certain clinical phenotypes, such as plaque psoriasis, are well defined, it is currently unclear whether there are molecular subtypes that might impact on prognosis or treatment outcomes.
We present a pipeline for patient stratification through a comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls, to establish differences in RNA expression patterns across all tissue types. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes. This multi-stage procedure was applied to several published psoriasis studies and a comparison of gene expression patterns across datasets was performed.
Overall, classification of psoriasis gene expression patterns revealed distinct molecular sub-groups within the clinical phenotype of plaque psoriasis. Enrichment for TGFb and ErbB signaling pathways, noted in one of the two psoriasis subgroups, suggested that this group may be more amenable to therapies targeting these pathways. Our study highlights the potential biological relevance of using ensemble decision tree predictors to determine molecular disease subtypes in what may initially appear to be a homogeneous clinical group. The R code used in this paper is available upon request.
ABSTRACT: Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.
ABSTRACT: In microarray data analysis, traditional methods focusing either on all the genes or on a single gene at a time are being replaced by methods based on sets of genes that correspond to biochemical pathways, which offer more informative strategies for studying disease associations. However, the development of robust pipelines to relate the genotype to disease phenotypes through known molecular interactions is still in its early stages. We report the use of a mathematical optimisation approach based on hyper-box principles to classify cancer samples within pathways into appropriate disease phenotypes. The most informative genes were identified based on the non-overlapping constraints of the classification procedure, and the algorithm showed good performance compared with established classification protocols.
Computer Aided Chemical Engineering 12/2011; 29. DOI:10.1016/B978-0-444-54298-4.50088-X
ABSTRACT: Cellular ATP levels are generated by glucose-stimulated mitochondrial metabolism and determine metabolic responses, such as glucose-stimulated insulin secretion (GSIS) from the β-cells of pancreatic islets. We describe an analysis of the evolutionary processes affecting the core enzymes involved in glucose-stimulated insulin secretion in mammals. The proteins involved in this system belong to ancient enzymatic pathways: glycolysis, the TCA cycle and oxidative phosphorylation.
We identify two sets of proteins, or protein coalitions, in this group of 77 enzymes with distinct evolutionary patterns. Members of the glycolysis, TCA cycle, metabolite transport, pyruvate and NADH shuttles have low rates of protein sequence evolution, as inferred from a human-mouse comparison, and relatively high rates of evolutionary gene duplication. Respiratory chain and glutathione pathway proteins evolve faster, exhibiting lower rates of gene duplication. A small number of proteins in the system evolve significantly faster than co-pathway members and may serve as rapidly evolving adapters, linking groups of co-evolving genes.
Our results provide insights into the evolution of the involved proteins. We find evidence for two coalitions of proteins and identify a role for co-adaptation in protein evolution, which could be used in future research within a functional context.
ABSTRACT: A transgenic mouse model for conditional induction of long-term hibernation via myocardium-specific expression of a VEGF-sequestering soluble receptor allowed the dissection of the hibernation process into an initiation and a maintenance phase. The hypoxic initiation phase was characterized by peak levels of K(ATP) channel and glucose transporter 1 (GLUT1) expression. Glibenclamide, an inhibitor of K(ATP) channels, blocked GLUT1 induction. In the maintenance phase, tissue hypoxia and GLUT1 expression were reduced. Thus, we employed a combined "-omics" approach to resolve this cardioprotective adaptation process. Unguided bioinformatics analysis on the transcriptomic, proteomic and metabolomic datasets confirmed that anaerobic glycolysis was affected and that the observed enzymatic changes in cardiac metabolism were directly linked to hypoxia-inducible factor (HIF)-1 activation. Although metabolite concentrations were kept relatively constant, the combination of the proteomic and transcriptomic dataset improved the statistical confidence of the pathway analysis by 2 orders of magnitude. Importantly, proteomics revealed a reduced phosphorylation state of myosin light chain 2 and cardiac troponin I within the contractile apparatus of hibernating hearts in the absence of changes in protein abundance. Our study demonstrates how combining different "-omics" datasets aids in the identification of key biological pathways: chronic hypoxia resulted in a pronounced adaptive response at the transcript and the protein level to keep metabolite levels steady. This preservation of metabolic homeostasis is likely to contribute to the long-term survival of the hibernating myocardium.
Journal of Molecular and Cellular Cardiology 02/2011; 50(6):982-90. DOI:10.1016/j.yjmcc.2011.02.010 · 4.66 Impact Factor
ABSTRACT: The recent explosion of biological data and the concomitant proliferation of distributed databases make it challenging for biologists and bioinformaticians to discover the best data resources for their needs, and the most efficient way to access and use them. Despite a rapid acceleration in uptake of syntactic and semantic standards for interoperability, it is still difficult for users to find which databases support the standards and interfaces that they need. To solve these problems, several groups are developing registries of databases that capture key metadata describing the biological scope, utility, accessibility, ease-of-use and existence of web services allowing interoperability between resources. Here, we describe some of these initiatives including a novel formalism, the Database Description Framework, for describing database operations and functionality and encouraging good database practice. We expect such approaches will result in improved discovery, uptake and utilization of data resources.
Database URL: http://www.casimir.org.uk/casimir_ddf
Database The Journal of Biological Databases and Curation 01/2010; 2010:baq014. DOI:10.1093/database/baq014 · 3.37 Impact Factor