Identifying hypermethylated CpG islands using a quantile regression model

Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio, USA.
BMC Bioinformatics (Impact Factor: 2.58). 02/2011; 12(1):54. DOI: 10.1186/1471-2105-12-54
Source: PubMed


DNA methylation plays an important role in silencing tumor suppressor genes in various tumor types. To gain a system-wide understanding of the methylation changes that occur in tumors, we developed a differential methylation hybridization (DMH) protocol that can simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large percentage of the signal obtained from microarrays can be attributed to measurable and unmeasurable confounding factors unrelated to the biological question at hand. To correct the bias due to this noise, in earlier work we implemented a quantile regression model, with the quantile level set to 75%, to identify hypermethylated CGIs. As a proof of concept, we applied this model to methylation microarray data generated from breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify hypermethylated CGIs and their associated genes.
We introduce three statistical measurements to compare the performance of the proposed quantile regression model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.
In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify hypermethylated CGIs in both breast and ovarian cancer data.
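
To make the role of the quantile level concrete, the following is a minimal sketch and not the model from the paper. It fits an intercept-only quantile regression by minimizing the standard check (pinball) loss, then flags the observations above the fitted quantile at each level listed in the abstract. It omits the probe effects the authors incorporate. The function names and the `signals` values are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: intercept-only quantile regression on made-up numbers.
# The paper's model also includes probe effects, which are not modelled here.


def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_intercept(values, tau):
    """Return a constant that minimizes the total check loss.

    For an intercept-only model a minimizer always lies on a data point,
    so an exhaustive search over the observed values is enough.
    """
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


def flag_above(values, tau):
    """Indices of observations strictly above the fitted tau-quantile."""
    threshold = fit_intercept(values, tau)
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical log-ratio signals for ten probes (assumption, not data from the paper).
signals = [0.1, -0.3, 0.2, 1.9, 0.0, 0.4, 2.5, -0.1, 0.3, 1.2]

# The eight quantile levels compared in the abstract.
for tau in (0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60):
    print(f"tau={tau:.2f} threshold={fit_intercept(signals, tau)} "
          f"flagged={flag_above(signals, tau)}")
```

In this toy setting, a higher quantile level raises the threshold and flags fewer observations, while a lower level flags more. This is the kind of trade-off that comparing levels against known methylated and unmethylated genes would be expected to expose.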



Available from: Zhengyi Chen
  • Source
    ABSTRACT: Methylation change plays an important role in many cellular systems, including cancer development. During recent years, genome-wide or large-scale methylation data has become available thanks to rapid advances in high-throughput biotechnologies. So far, researchers have always used gene expression profiling to study disease subtypes and related therapies. In this study, we investigated methylation profiles in 30 breast cancer cell lines using methylation data generated by microarray technologies. Strong variation of the number of methylation peaks was found among these 30 cell lines; however, more peaks were found in the upstream regions than in downstream regions of genes. We further grouped the methylation profiles of these cell lines into three consensus clusters. Finally, we performed an integrative analysis of breast cancer cell lines using both methylation and gene-expression profiling data. There was no significant correlation between methylation-profiling subtypes and gene-expression profiling subtypes, suggesting the complex nature of methylation in the regulation of gene expression. However, we found basal B cell lines appeared exclusively in two methylation clusters. Although these results are preliminary, this study suggests that methylation profiling might be promising in disease subtype classification and the development of therapeutic strategies.
    Preview · Article · May 2012 · Chemistry & Biodiversity
  • Source
    ABSTRACT: DNA methylation is an epigenetic event that adds a methyl-group to the 5' cytosine. This epigenetic modification can significantly affect gene expression in both normal and diseased cells. Hence, it is important to study methylation signals at the single cytosine site level, which is now possible utilizing bisulfite conversion technique (i.e., converting unmethylated Cs to Us and then to Ts after PCR amplification) and next generation sequencing (NGS) technologies. Despite the advances of NGS technologies, certain quality issues remain. Some of the more prevalent quality issues involve low per-base sequencing quality at the 3' end, PCR amplification bias, and bisulfite conversion rates. Therefore, it is important to conduct quality assessment before downstream analysis. To the best of our knowledge, no existing software packages can generally assess the quality of methylation sequencing data generated based on different bisulfite-treated protocols. To conduct the quality assessment of bisulfite methylation sequencing data, we have developed a pipeline named MethyQA. MethyQA combines currently available open-source software packages with our own custom programs written in Perl and R. The pipeline can provide quality assessment results for tens of millions of reads in under an hour. The novelty of our pipeline lies in its examination of bisulfite conversion rates and of the DNA sequence structure of regions that have different conversion rates or coverage. MethyQA is a new software package that provides users with a unique insight into the methylation sequencing data they are researching. It allows the users to determine the quality of their data and better prepares them to address the research questions that lie ahead. Due to the speed and efficiency at which MethyQA operates, it will become an important tool for studies dealing with bisulfite methylation sequencing data.
    Full-text · Article · Aug 2013 · BMC Bioinformatics
  • Source
    ABSTRACT: This paper provides a review of recent applications of quantile regression to genetic and emerging -omic studies. It begins with a general background on this statistical approach, following the seminal paper of Koenker and Bassett (Econometrica 46:33-50, 1978). The applications described are as diverse as genetic association studies, penetrance estimation, gene expression, CGH array experiments, RNAseq experiments, methylation data and proteomics. This paper also introduces recent extensions of quantile regression, with a particular focus on Copula-quantile regression, an approach we recently proposed for sib-pair analysis. A real data example from eQTL analysis is then presented, and the [Formula: see text] codes that run the analyses are provided. Finally, we conclude with a presentation of statistical software and some general statements about the potential and interest of quantile regression in modern biological experiments.
    Full-text · Article · Apr 2014 · Human Genetics