ABSTRACT: Background
Applying machine learning methods to microarray gene expression profiles for disease classification is a popular way to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches, in which the expression of each gene was treated independently, suffer from low prediction accuracy and are difficult to interpret biologically. Current research efforts focus on integrating information on protein interactions, through biochemical pathway datasets, with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address such limitations by proposing novel methodologies.
Results
A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of the expression of its constituent genes. Gene weights are determined by the optimisation model in such a way that the resulting pathway activity has the optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile.
Conclusions
The model was evaluated on a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it also performs well on multiclass disease datasets, a limitation of other approaches in the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis.
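The pathway activity described in this abstract is a weighted linear sum of the expression values of a pathway's genes. The sketch below illustrates only that summation step and the count of genes that take part in it. The gene names, expression values and weights are hypothetical and hand-picked for illustration; in the paper the weights come from a mathematical programming model, which is not reproduced here.

```python
def pathway_activity(expression, weights):
    """Return one activity value per sample as sum_g weights[g] * expression[g][s].

    expression: dict mapping gene name -> list of per-sample expression values
    weights:    dict mapping gene name -> weight (hypothetical, not optimised)
    """
    n_samples = len(next(iter(expression.values())))
    activity = [0.0] * n_samples
    for gene, w in weights.items():
        for s, value in enumerate(expression[gene]):
            activity[s] += w * value
    return activity


def count_active_genes(weights, tol=1e-12):
    """Count genes with a non-zero weight, i.e. genes that contribute to the activity."""
    return sum(1 for w in weights.values() if abs(w) > tol)


# Hypothetical pathway with three genes measured in three samples.
expression = {
    "geneA": [1.0, 2.0, 0.5],
    "geneB": [0.0, 1.0, 3.0],
    "geneC": [2.0, 2.0, 2.0],
}
# Hypothetical weights; geneC is excluded by a zero weight.
weights = {"geneA": 0.5, "geneB": -1.0, "geneC": 0.0}

activity = pathway_activity(expression, weights)
print(activity)
print(count_active_genes(weights))
```

A cap on the number of participating genes, as mentioned in the Conclusions, would correspond here to bounding `count_active_genes(weights)`; how the paper's model enforces that bound is not shown in this listing.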
ABSTRACT: In microarray data analysis, traditional methods that focus on single genes are increasingly being replaced by methods that analyse functional units corresponding to biochemical pathways, as these are considered to offer more insight into gene expression and disease associations. However, the development of robust pipelines to relate genotypic functional modules to disease phenotypes through known molecular interactions is still in its early stages.
In this article we first discuss methodologies that employ groups of genes in disease classification tasks that aim to link gene expression patterns with disease outcome. Then we present a pathway-based approach for disease classification through a mathematical programming model based on hyper-box principles. Association rules derived from the model are extracted and discussed with respect to pathway-specific molecular patterns related to the disease. Overall, we argue that the use of gene sets corresponding to disease-relevant pathways is a promising route to uncover expression-to-phenotype relations in disease classification, and we illustrate the potential of hyper-box classification in assessing the predictive power of functional pathways and in uncovering the effect of specific genes on the prediction of disease phenotypes.
No preview · Article · Sep 2014 · Mathematical Biosciences
ABSTRACT: Cytokines are critical checkpoints of inflammation. The treatment of human autoimmune disease has been revolutionized by targeting inflammatory cytokines as key drivers of disease pathogenesis. Despite this, there exist numerous pitfalls when translating preclinical data into the clinic. We developed an integrative biology approach combining human disease transcriptome data sets with clinically relevant in vivo models in an attempt to bridge this translational gap. We chose interleukin-22 (IL-22) as a model cytokine because of its potentially important proinflammatory role in epithelial tissues. Injection of IL-22 into normal human skin grafts produced marked inflammatory skin changes resembling human psoriasis. Injection of anti-IL-22 monoclonal antibody in a human xenotransplant model of psoriasis, developed specifically to test potential therapeutic candidates, efficiently blocked skin inflammation. Bioinformatic analysis integrating both the IL-22 and anti-IL-22 cytokine transcriptomes and mapping them onto a psoriasis disease gene coexpression network identified key cytokine-dependent hub genes. Using knockout mice and small-molecule blockade, we show that one of these hub genes, the so far unexplored serine/threonine kinase PIM1, is a critical checkpoint for human skin inflammation and potential future therapeutic target in psoriasis. Using in silico integration of human data sets and biological models, we were able to identify a new target in the treatment of psoriasis.
Full-text · Article · Feb 2014 · Science Translational Medicine
ABSTRACT: Malignant melanoma, the most lethal skin cancer, is considered a representative model for cross talk between immune responses and malignancy. Efforts to elucidate the nature of these interactions have translated into immunotherapeutic strategies. Adjuvant therapeutics such as IL-2 and IFNα2b have reached clinical application, and emerging therapies targeting key immunomodulatory molecules such as CTLA-4 have renewed excitement in the field, highlighting the potential of manipulating immune responses in the clinical setting, but also the merits of further elucidating complex underlying immunological pathways. Screening technologies have yielded new insights leading to identification of biomarkers for disease prognosis and applied clinical immunotherapies. The promise of systems biology is to integrate diverse biomedical characterizations into detailed models of underlying mechanisms and therapies through suitable computational and mathematical formalisms. In this review, we discuss recent developments in dissecting the complex and diverse immune responses associated with melanoma through both computational and experimental means. We show the significance of devising new, improved approaches that can better serve as models of immune interactions and therapies. We propose that efforts in this direction may realize the potential of personalized medicine and facilitate development of the next generation of efficacious tools to treat patients.
No preview · Article · Nov 2012 · Critical Reviews in Biomedical Engineering
ABSTRACT: Background
Psoriasis is an immune-mediated disease characterised by chronically elevated pro-inflammatory cytokine levels, leading to aberrant keratinocyte proliferation and differentiation. Although certain clinical phenotypes, such as plaque psoriasis, are well defined, it is currently unclear whether there are molecular subtypes that might impact on prognosis or treatment outcomes.
We present a pipeline for patient stratification through a comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls, to establish differences in RNA expression patterns across all tissue types. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes. This multi-stage procedure was applied to several published psoriasis studies and a comparison of gene expression patterns across datasets was performed.
Overall, classification of psoriasis gene expression patterns revealed distinct molecular sub-groups within the clinical phenotype of plaque psoriasis. Enrichment for TGFb and ErbB signaling pathways, noted in one of the two psoriasis subgroups, suggested that this group may be more amenable to therapies targeting these pathways. Our study highlights the potential biological relevance of using ensemble decision tree predictors to determine molecular disease subtypes in what may initially appear to be a homogeneous clinical group. The R code used in this paper is available upon request.
ABSTRACT: Graphical representation to illustrate the relationship between 27 highly discriminative genes and disease sub-groups according to the Gini Index calculated from RF for the Yao dataset. Light blue to green rectangular bands represent the four skin types (PP01: light blue, PP02: blue, NN: light green, PN: green) and are followed by purple to orange rectangular bands representing relevant genes (arranged clockwise). Genes and skin groups are ordered according to shared pairing links. An overview of patterns of informative genes for prediction of each disease class can be visualised.
ABSTRACT: The ‘core’ set of genes defined through differential expression analysis: positively (130) and negatively (76) differentially expressed genes in psoriatic samples of the GAIN dataset.
ABSTRACT: Genes identified as most informative after classification of skin disease phenotypes. The Gini Index (GI) was used as the variable importance measure and was estimated for each gene per group from random forest classification, so as to prioritise genes in terms of their ability to discriminate distinct molecular patterns. After training of the random forest classifier, GI is derived for each gene across all trees, and the ranking of genes with GI >= 0.02 is shown here for each skin group.
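The Gini Index used for ranking genes in this caption builds on Gini impurity: a gene is informative when splitting samples on its expression makes the resulting groups purer. The sketch below computes the impurity decrease for a single split on a single gene. It is a simplified illustration, not the paper's R pipeline: a random forest importance aggregates such decreases over every split on the gene across all trees, and the expression values, class labels and threshold here are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting samples at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical toy data: four samples, two classes.
labels = ["PP01", "PP01", "NN", "NN"]
informative_gene = [5.0, 6.0, 1.0, 2.0]    # separates the classes cleanly
uninformative_gene = [1.0, 5.0, 2.0, 6.0]  # mixes the classes on both sides

print(gini_decrease(informative_gene, labels, threshold=3.0))
print(gini_decrease(uninformative_gene, labels, threshold=3.0))
```

The first gene removes all impurity at this threshold, while the second removes none, which is the kind of contrast a GI-based ranking is meant to capture.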
ABSTRACT: Example of a decision tree for classification of tissue samples into appropriate disease classes. The heatmap illustrates expression values for 25 genes across 108 tissue samples and represents part of the heatmap shown in figure 2. A decision tree is a tree-like structure that relates gene expression measurements to sample phenotype class, with a view to deriving a predictive model. Nodes (rectangles) in the tree represent a test on gene expression to derive a decision on a sample’s class, edges (arrows) indicate the expression level of the variable that can best distinguish the samples, and leaves (or terminal nodes, shown as circles) represent class predictions. The path from the root to each terminal node equates to a list of conditions in the form of gene expression rules that can relate tissue samples to disease phenotype class.
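The correspondence between root-to-leaf paths and gene expression rules described in this caption can be sketched with a small hand-built tree. The tree structure, gene names and thresholds below are hypothetical and are not taken from the figure; the class labels reuse the NN, PN and PP sample groups named elsewhere in this listing.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, predicted_class) pair per leaf of the tree."""
    if "label" in node:  # terminal node: emit the conditions collected so far
        return [(list(conditions), node["label"])]
    gene, thr = node["gene"], node["threshold"]
    rules = extract_rules(node["low"], conditions + (f"{gene} <= {thr}",))
    rules += extract_rules(node["high"], conditions + (f"{gene} > {thr}",))
    return rules


def classify(node, sample):
    """Follow the tests from the root until a leaf is reached."""
    while "label" not in node:
        branch = "low" if sample[node["gene"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["label"]


# Hypothetical tree: internal nodes test one gene, leaves carry a class label.
tree = {
    "gene": "gene1", "threshold": 2.0,
    "low": {"label": "NN"},
    "high": {
        "gene": "gene2", "threshold": 1.5,
        "low": {"label": "PN"},
        "high": {"label": "PP"},
    },
}

rules = extract_rules(tree)
for conditions, label in rules:
    print(" AND ".join(conditions), "->", label)

print(classify(tree, {"gene1": 3.0, "gene2": 2.0}))
```

Each printed line is one rule of the kind the caption describes: a conjunction of expression tests ending in a class prediction.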
ABSTRACT: Graphical representation to illustrate the relationship between 19 highly discriminative genes and disease sub-groups according to the Gini Index calculated from the decision tree forest for the Gudjonsson dataset. The green band represents the first psoriatic sub-group (PP01), light blue corresponds to the second psoriatic sub-group (PP02), yellow corresponds to healthy individuals (NN) and light green represents the non-lesional cases (PN). These bands are arranged clockwise and are followed by purple to orange rectangular bands that represent relevant genes. Genes and skin groups are ordered according to shared pairing links, as described previously.
ABSTRACT: A multidimensional scaling plot of psoriasis datasets from Gudjonsson et al. 2010 (A) and Yao et al. 2008 (B) to illustrate grouping of samples according to random forest clustering. Two distinct psoriatic groups are identified in involved tissue (PP01 green and PP02 purple), while NN and PN samples largely co-localise. Overall, clustering is comparable to that of the GAIN data shown in figure 4.
ABSTRACT: A multidimensional scaling (MDS) plot showing the distinction of psoriatic cases into two groups, PP01 (red) and PP02 (black), as obtained after RF clustering and classification.
ABSTRACT: Markov Cluster Algorithm (MCL) applied to the psoriatic sub-group tissue sample networks to extract clusters of gene expression. Both networks consisted of 36 clusters, and the largest clusters (number of nodes > 8) for both networks are shown and denoted by colour. Pathway enrichment for these clusters is shown in tables 3 and 4 for the PP01 and PP02 networks respectively.
ABSTRACT: Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many expression quantitative trait locus (eQTL) studies, typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis effect on expression cannot be accounted for by common cis variants, a finding that reveals the contribution of low-frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene, and we identify several replicating trans variants that act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.