GenExSt: A Tool to Identify Correlation of Gene Expression After Normalization with Housekeeping Genes

Abstract

Interactions between genes are one driving force that can influence a biological outcome. In a genetic disease such as cancer, understanding genetic interactions may help elucidate the mechanisms that sustain cancer growth. A computational approach is one way to detect genetic interactions in the context of cancer. In this article, we introduce a tool, GenExSt, and its underlying method for studying gene interactions. We applied our method to discover gene pairs whose expression levels show patterns of correlation. For this demonstration, we selected ten breast cancer gene expression data sets from the Genomic Data Commons Data Portal of the National Cancer Institute. We focused on genome instability suppressing (GIS) genes, that is, genes that suppress genome instability, many of which play an important role in cancer. We applied our method to an inter-comparison across data sets. Here we tested statistical normalization approaches derived from the combined expressions of randomly selected, single housekeeping (HK) genes, and from the calculated mean of three expressions. In addition, our method derives R² values from linear models fitted to every possible pair of GIS genes and displays them as heatmaps that indicate probable correlations. We show that results from our method are better suited to data normalized against multiple genes simultaneously than to data normalized against a single gene's expression values. GenExSt may be used to study gene expression data in other settings, provided that the concept of gene interactions is appropriate in that context.
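The workflow described in the abstract (normalize against housekeeping genes, fit a linear model to every gene pair, collect R² values for a heatmap) can be illustrated with a minimal sketch. This is not the GenExSt implementation: the function names, the ratio-based normalization, and the synthetic data are assumptions made for illustration only. It relies on the fact that, for a simple linear regression with an intercept, R² equals the squared Pearson correlation.

```python
import numpy as np


def normalize_by_housekeeping(expr, hk_expr):
    """Divide each gene's expression by a per-sample housekeeping reference.

    expr:    array of shape (n_genes, n_samples)
    hk_expr: array of shape (n_samples,) for a single HK gene, or
             (n_hk, n_samples) for several HK genes, which are averaged.
    Ratio normalization is an assumption of this sketch.
    """
    expr = np.asarray(expr, dtype=float)
    hk_expr = np.atleast_2d(np.asarray(hk_expr, dtype=float))
    reference = hk_expr.mean(axis=0)  # one reference value per sample
    return expr / reference


def pairwise_r2(expr):
    """Return the (n_genes, n_genes) matrix of R^2 values.

    For a simple linear model y = a + b*x, R^2 is the squared Pearson
    correlation, so np.corrcoef covers all gene pairs at once.
    """
    return np.corrcoef(expr) ** 2


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_samples = 50
hk = rng.uniform(5.0, 10.0, size=(3, n_samples))    # three HK genes
base = rng.uniform(1.0, 2.0, size=n_samples)
unrelated = rng.uniform(1.0, 2.0, size=n_samples)

# Genes 0 and 1 are linearly related; gene 2 is independent of both.
# Multiplying by the HK mean mimics a sample-level effect that
# normalization is meant to remove.
gis_raw = np.vstack([base, 2.0 * base + 0.5, unrelated]) * hk.mean(axis=0)

normalized = normalize_by_housekeeping(gis_raw, hk)
r2 = pairwise_r2(normalized)
```

The resulting `r2` matrix is symmetric with ones on the diagonal and can be passed to any heatmap plotting routine. Passing a single row of `hk` instead of all three rows gives the single-gene normalization variant for comparison.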
Preprint
Yeasts can be engineered into “living foundries” for non-natural chemical production by reprogramming their genome using a synthetic biology “design-build-test” cycle. While methods for “design” and “build” are scalable and efficient, “test” remains a labor-intensive bottleneck, limiting the effectiveness of the genetic reprogramming results. Here we describe Isogenic Colony Sequencing (ICO-seq), a massively-parallel strategy to assess the gene expression, and thus engineered pathway efficacy, of large numbers of genetically distinct yeast colonies. We use the approach to characterize opaque-white switching in 658 C. albicans colonies. By profiling transcriptomes of 1642 engineered S. cerevisiae strains, we use it to assess gene expression heterogeneity in a protein mutagenesis library. Our approach will accelerate synthetic biology by allowing facile and cost-effective transcriptional profiling of large numbers of genetically distinct yeast strains.
Article
Endosome biogenesis in eukaryotic cells is critical for nutrient uptake and plasma membrane integrity. Early endosomes initially contain Rab5, which is replaced by Rab7 on late endosomes prior to their fusion with lysosomes. Recruitment of Rab7 to endosomes requires the Mon1-Ccz1 guanosine exchange factor (GEF). Here, we show that full function of the Drosophila Mon1-Ccz1 complex requires a third stoichiometric subunit, termed Bulli. Bulli localises to Rab7-positive endosomes, in agreement with its function in the GEF complex. Using Drosophila nephrocytes as a model system, we observe that absence of Bulli results in (i) reduced endocytosis, (ii) Rab5 accumulation within non-acidified enlarged endosomes, (iii) defective Rab7 localisation, and (iv) impaired endosomal maturation. Moreover, longevity of animals lacking bulli is affected. Both the Mon1-Ccz1 dimer and a Bulli-containing trimer display Rab7 GEF activity. In summary, this suggests a key role of Bulli in the Rab5 to Rab7 transition during endosomal maturation rather than a direct influence on the GEF activity of Mon1-Ccz1.
Article
Diseases involve complex modifications to the cellular machinery. The gene expression profile of the affected cells contains characteristic patterns linked to a disease. Hence, new biological knowledge about a disease can be extracted from these profiles, improving our ability to diagnose and assess disease risks. This knowledge can be used for drug re-purposing, or by physicians to evaluate a patient’s condition and co-morbidity risk. Here, we consider differential gene expressions obtained by microarray technology for patients diagnosed with various diseases. Based on these data and cellular multi-scale organization, we aim at uncovering disease–disease, disease–gene and disease–pathway associations. We propose a neural network with structure based on the multi-scale organization of proteins in a cell into biological pathways. We show that this model is able to correctly predict the diagnosis for the majority of patients. Through the analysis of the trained model, we predict disease–disease, disease–pathway, and disease–gene associations and validate the predictions by comparisons to known interactions and literature search, proposing putative explanations for the predictions.
Article
Background and aim of study: qPCR is a robust technique that quantifies the expression of target genes relative to reference genes. Stresses such as virus infection or heat shock change the expression of many cellular genes, including the reference genes, so the aim was to introduce a constant calibrator against which to normalize the data. Methodology: A constructed glyceraldehyde 3-phosphate dehydrogenase (GAPDH) plasmid was transcribed to GAPDH RNA and used as spike RNA. Spiked RNA samples were subjected to qPCR under different conditions such as virus infection, IFN treatment, or mild heat shock. Results: Adenovirus hexon in interferon-deficient cells showed different expression levels when data were normalized to GAPDH or 18S. Consistently, hexon expression levels were different in untreated cells under the control or heat-shocked conditions when data were normalized to GAPDH or 18S. The promyelocytic leukemia protein II (PML-II) expression level was lower in HeLa PML-II-deficient cells (PML-II-Kd) than in the control, both when the data were normalized to GAPDH as a reference gene and when they were normalized to the GAPDH RNA spike, which showed reasonable consistency. More consistent data were obtained when the GAPDH normalizer was added before the extracted RNA was treated with DNase than when it was added after the treatment or directly to the qPCR reaction. Conclusion: The internal controls chosen for this study completely changed the experimental results, since they were affected by the experimental conditions. However, the GAPDH spike RNA level was stable in its amplification under different kinds of stress, so it can be an alternative to a housekeeping gene because of its stability under these different conditions.
Article
Despite advances in high-throughput sequencing that have revolutionized the discovery of gene defects in rare Mendelian diseases, there are still gaps in translating individual genome variation to observed phenotypic outcomes. While we continue to improve genomics approaches to identify primary disease-causing variants, it is evident that no genetic variant acts alone. In other words, some other variants in the genome (genetic modifiers) may alleviate (suppress) or exacerbate (enhance) the severity of the disease, resulting in the variability of phenotypic outcomes. Thus, to truly understand the disease, we need to consider how the disease-causing variants interact with the rest of the genome in an individual. Here, we review the current state-of-the-field in the identification of genetic modifiers in rare Mendelian diseases and discuss the potential for future approaches that could bridge the existing gap.
Article
Physcomitrella patens is a bryophyte model plant that is often used to study plant evolution and development. Its resources are of great importance for comparative genomics and evo‐devo approaches. However, expression data from Physcomitrella patens were so far generated using different gene annotation versions and three different platforms: CombiMatrix and NimbleGen expression microarrays and RNA sequencing. The currently available P. patens expression data are distributed across three tools with different visualization methods to access the data. Here, we introduce an interactive expression atlas, Physcomitrella Expression Atlas Tool (PEATmoss), that unifies publicly available expression data for P. patens and provides multiple visualization methods to query the data in a single web‐based tool. Moreover, PEATmoss includes 35 expression experiments not previously available in any other expression atlas. To facilitate gene expression queries across different gene annotation versions, and to access P. patens annotations and related resources, a lookup database and web tool linked to PEATmoss was implemented. PEATmoss can be accessed at https://peatmoss.online.uni-marburg.de
Article
Motivation: Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effect typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalisation, imputation and batch effect correction. Results: Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method's likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data. Availability: The R package "bayNorm" is available at https://github.com/WT215/bayNorm. The code for analysing data in this paper is available at https://github.com/WT215/bayNorm_papercode. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Centrosome amplification (CA) is a common feature of human tumours and a promising target for cancer therapy. However, CA’s pan-cancer prevalence, molecular role in tumourigenesis and therapeutic value in the clinical setting are still largely unexplored. Here, we used a transcriptomic signature (CA20) to characterise the landscape of CA-associated gene expression in 9,721 tumours from The Cancer Genome Atlas (TCGA). CA20 is upregulated in cancer and associated with distinct clinical and molecular features of breast cancer, consistent with our experimental CA quantification in patient samples. Moreover, we show that CA20 upregulation is positively associated with genomic instability, alteration of specific chromosomal arms and C>T mutations, and we propose novel molecular players associated with CA in cancer. Finally, high CA20 is associated with poor prognosis and, by integrating drug sensitivity with drug perturbation profiles in cell lines, we identify candidate compounds for selectively targeting cancer cells exhibiting transcriptomic evidence for CA.
Article
Inhibitors of nuclear poly(ADP-ribose) polymerase (PARP) enzymes (e.g., PARP-1) have improved clinical outcomes in ovarian cancer, especially in patients with BRCA1/2 gene mutations or additional homologous recombination (HR) DNA repair pathway deficiencies. These defects serve as biomarkers for response to PARP inhibitors (PARPi). We sought to identify an additional biomarker that could predict responses to both conventional chemotherapy and PARPi in ovarian cancers. We focused on cellular ADP-ribosylation (ADPRylation), which is catalyzed by PARP enzymes and detected by detection reagents we developed previously. We determined molecular phenotypes of 34 high-grade serous ovarian cancers and associated them with clinical outcomes. We used the levels and patterns of ADPRylation and PARP-1 to distribute ovarian cancers into distinct molecular phenotypes, which exhibit dramatically different gene expression profiles. In addition, the levels and patterns of ADPRylation, PARP-1 protein, and gene expression correlated with clinical outcomes in response to platinum-based chemotherapy, with cancers exhibiting the highest levels of ADPRylation having the best outcomes independent of BRCA1/2 status. Finally, in cell culture-based assays using patient-derived ovarian cancer cell lines, ADPRylation levels correlated with sensitivity to the PARPi, Olaparib, with cell lines exhibiting high levels of ADPRylation having greater sensitivity to Olaparib. Collectively, our study demonstrates that ovarian cancers exhibit a wide range of ADP-ribosylation levels, which correlate with therapeutic responses and clinical outcomes. These results suggest ADP-ribosylation may be a useful biomarker for PARPi sensitivity in ovarian cancers, independent of BRCA1/2 or homologous recombination deficiency (HRD) status.