Page 1

RESEARCH ARTICLEOpen Access

Identifying hypermethylated CpG islands using a

quantile regression model

Shuying Sun1,2*, Zhengyi Chen3, Pearlly S Yan4, Yi-Wen Huang4, Tim HM Huang4, Shili Lin5

Abstract

Background: DNA methylation has been shown to play an important role in the silencing of tumor suppressor

genes in various tumor types. In order to have a system-wide understanding of the methylation changes that

occur in tumors, we have developed a differential methylation hybridization (DMH) protocol that can

simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large

percentage of signals obtained from microarrays can be attributed to various measurable and unmeasurable

confounding factors unrelated to the biological question at hand. In order to correct the bias due to noise, we first

implemented a quantile regression model, with a quantile level equal to 75%, to identify hypermethylated CGIs in

an earlier work. As a proof of concept, we applied this model to methylation microarray data generated from

breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying

hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify

hypermethylated CGIs and their associated genes.

Results: We introduce three statistical measurements to compare the performance of the proposed quantile regression

model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and

unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show

that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.

Conclusions: In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by

incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify

hypermethylated CGIs in both breast and ovarian cancer data.

Background

Epigenetic changes are one of the most common molecu-

lar modifications in cells [1-4]. Among different epige-

netic changes, DNA methylation, the addition of a

methyl group (CH3) to the 5’s cytosine (C) at a CpG site,

plays an important role in gene expression regulation,

transposons silencing, and transcription factor binding

inhibition [5-13]. Therefore, DNA methylation has signif-

icant implications in both normal biology and complex

diseases, such as cancer. In fact, DNA methylation pat-

terns change during tumor growth. These changes may

include regional or genome-wide gain or loss of methyla-

tion [14]. The gain of methylation in cancer is called

hypermethylation, that is, there are more methylation

signals in cancerous cells than in normal cells. On the

other hand, the loss of methylation in cancer is called

hypomethylation, that is, there are fewer methylation sig-

nals in cancerous cells than in normal cells. Numerous

studies have reported that DNA hypermethylation may

cause tumor suppressor gene silencing [15,16]. These

abnormal DNA methylations usually occur at CGIs,

genomic regions rich in CpG sites.

In order to gain an understanding of how genome-wide

(especially CGI) methylation changes affect tumor growth,

numerous microarray protocols have been developed to

simultaneously assay the methylation status of all or partial

regions in the whole genome. Most of these microarray

protocols are developed based upon one of the following

three methods of methylation-dependent treatment of

DNA, each with its advantages and disadvantages [16]:

(1) using methylation sensitive enzymes (such as HpaII

and HinpI) to digest DNA, (2) using specific antibodies or

* Correspondence: shuying.sun@case.edu

1Case Comprehensive Cancer Center, Case Western Reserve University,

Cleveland, Ohio, USA

Full list of author information is available at the end of the article

Sun et al. BMC Bioinformatics 2011, 12:54

http://www.biomedcentral.com/1471-2105/12/54

© 2011 Sun et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons

Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in

any medium, provided the original work is properly cited.

Page 2

methyl-binding proteins to obtain DNA fragments

enriched with methylation signals, and (3) using sodium

bisulfite to treat denatured DNA to convert unmethylated

cytosine (C) to thymine (T).

In our group, the DMH protocol has been developed

to simultaneously assay the methylation status of all

known CGIs [17,18] using methylation sensitive

enzymes HpaII and HinpI to digest DNA. As opposed

to the earlier DMH protocol in which interrogated sam-

ples were hybridized onto CGI clone arrays with printed

probes averaging 870 bp in length, the current DMH

method assays the sample using CGI tiling arrays with

much shorter probes (45 - 60 bp). Probe affinity, PCR

effects, and many other measurable and unmeasurable

confounding factors due to shorter probe length affect

the observed methylation signals [19].

Previous DMH methylation microarray data analysis

methods either propose an arbitrary log ratio cut off of

1.5 to detect differential methylation [20] or focus on

modeling differential methylation at the probe level [21].

Due to the large impact of probe affinity and many con-

founding factors, a single high log ratio probe may not

represent true biological signals. Furthermore, it can be

misleading to select differentially methylated promoter

regions based on independent probe signals. In addition,

we are more interested in identifying hypermethylated

regions as opposed to local changes detected by a differ-

ence in a single probe. To meet these biological interests

and needs, we propose the use of a quantile regression

model [22] in order to aggregate CGI probe signals for

the identification of hypermethylated regions. Probe

effects are directly incorporated into this proposed

model. Genes with hypermethylated promoters can be

easily selected according to their associated CGIs. The

idea of using a quantile regression model to identify

methylated CGIs was originally presented by our group

as a poster at the 4th International Symposium on

Bioinformatics Research and Applications. In that pos-

ter, we used a quantile regression model at 75% quan-

tile. Although known methylated and unmethylated

genes can be identified, we were unsure whether 75%

would turn out to be the best quantile level.

In the following sections, we first give a brief intro-

duction to our breast cancer cell line and ovarian cancer

microarray data. We then explain how to use a quantile

regression model to identify hypermethylated CGIs.

Finally, we implement quantile regression models at dif-

ferent quantile levels and compare the performance of

these models using three statistical measurements.

Methods

We use two DMH microarray datasets generated from

40 breast cancer cell lines and 26 ovarian cancer

patients. In particular, we use the 2-color 244K Agilent

arrays hybridized with the test samples (e.g. the breast

cancer cell lines) dye coupled with Cy5 (red) and a com-

mon normal reference dye coupled with Cy3 (green).

The base two log ratio of red over green intensity, log2

(Cy5/Cy3), is used as the observed methylation signal at

each probe. For each array, dye effects are corrected

using the standard within array LOESS normalization in

the Bioconductor package “limma” [23]. We have

explored several normalization methods and found that

the standard LOESS normalization produces more con-

sistent and reliable results than the others (data not

shown).

In a common DMH experiment, it is desirable to

identify CGIs that are hypermethylated in a large per-

centage of the total N samples (e.g., N cancer patients

or N cancer cell lines). Therefore, one important goal of

our DMH microarray study is to identify the CGIs that

are commonly methylated in N samples (N = 40 for

breast cancer data and N = 26 for ovarian cancer data).

In order to control the noise due to measured and

unmeasured factors such as GC content, scanner effects,

and PCR effects that may affect the signals, we apply the

following quantile regression model to each CGI:

Qsample probe

s

sampleprobe

psp

Ysp( |,)( )

( )

=+

where QYsp(τ|samples, probep) is the τ-th conditional

quantile of the observed probe log ratio of sample s at

probe p, samplesrepresents the expected signal from the

sample, and probepdenotes the probe effect. In the

above quantile regression model, error terms are

assumed to be independent and distribution-free. The

regression coefficients, especially samplesand probep, are

estimated by formulating the quantile regression pro-

blem as a linear program [22]. In fact, both parameter

estimation and inference are conducted using the R

package “quantreg” [22]. An example of using this pack-

age to fit a quantile regression model for one CpG

island has been provided (see Additional file 1).

In the above regression model, we let τ = 95%, 90%,

85%, 80%, 75%, 70%, 65% and 60%. We choose quantile

levels over 50% because we are interested in identifying

hypermethylated regions. In particular, for each sample

(or cell line) effect from the quantile regression output,

there is a p-value indicating whether a sample (or cell

line) shows significant methylation signals at one parti-

cular CGI under the null hypothesis that samples(τ) = 0.

The methylation level at each CGI is taken as the num-

ber of samples for which their associated p-values are

less than a certain cutoff value p0where we let p0=

0.05, 0.04, 0.03, 0.02, and 0.01. For example, if a CGI

has p-values less than 0.01 in 38 out of 40 breast cancer

arrays, this indicates that this CGI may have very strong

methylation signals across many samples.

Sun et al. BMC Bioinformatics 2011, 12:54

http://www.biomedcentral.com/1471-2105/12/54

Page 2 of 6

Page 3

In order to verify that our quantile regression model

can identify the real methylation signals and to compare

the results of our regression models at different quantile

levels, we use known methylated and housekeeping

genes as “positive” and “negative” controls respectively.

In fact, 30 known hypermethylated genes [24-27] have

been reported for breast cancer, and 32 known hyper-

methylated genes have been reported for ovarian cancer

[28]. For both breast and ovarian cancer, 47 housekeep-

ing genes [29] are selected as “negative” control (i.e.,

known unmethylated genes) due to their low methyla-

tion signals. Recall that the methylation score given to

each CGI is the count of samples with p-value less than

a cutoff point. At each p-value cutoff point p0, we have

a methylation score for each CGI. Then, there are Nm

and NHKmethylation scores with Nm= 30 for breast

cancer data, Nm= 32 for ovarian cancer data, and NHK

= 47 for unmethylated housekeeping genes. We choose

these Nmand NHKgenes because each of them is asso-

ciated with at least one CGI. Therefore, this paper

will refer to these genes as Nmmethylated and NHK

unmethylated CGIs.

In order to determine if known methylated and

unmethylated CGIs are identified correctly, we use three

different statistical measurements for known methylated

and unmethylated CGIs. The first measurement is the

area under a Receiver Operating Characteristic (ROC)

curve, which we call “AUC” (Area Under Curve). A ROC

is a graphical plot of the sensitivity vs. (1 - specificity) for

a binary classifier system as its discrimination threshold

varies. The ROC can also be represented equivalently by

plotting true positive rates (TPR) vs. false positive rates

(FPR). In this paper, the TPR is the fraction of known

methylated CGIs that are correctly classified as methy-

lated CGIs at a specific methylation score level C0(0 ≤

C0≤ N). The FPR is the fraction of known unmethylated

housekeeping CGIs that are incorrectly classified as

methylated CGIs at a specific methylation score level C0.

The second measurement is the mean difference of

methylation scores of two groups. We call this measure-

ment mean.diff, that is x

m

are mean methylation scores for known methylated and

unmethylated housekeeping CGIs. The third measure-

ment is the mean difference of methylation scores of two

groups of CGIs divided by their standard deviation. That

xx

sNsN

mm HKHK

+

//

sHk

are the mean and variance of methylation scores for

known methylated and housekeeping CGIs respectively,

we call this measurement “T.stat”. At each quantile level

τ, the larger a statistical measurement is, the more evi-

dent that this quantile level is better at identifying methy-

lated and unmethylated CGIs.

x

HK

−

, where xmand xHK

is,

mHK

2

−

2

, where xm, xHK, sm

2

and

2

Results

Using both breast and ovarian cancer data sets, we com-

pare the performance of the proposed quantile regres-

sions using three different measurements: AUC, mean.

diff and T.stat. All comparison results are listed in

Tables 1, 2, 3, 4, 5 and 6 with Tables 1, 2 and 3 for

breast cancer data and Tables 4, 5 and 6 for ovarian

cancer data. To have a clear view of these tables, we

have plotted the summary result for each table in Figure

1, where the top three plots are for the three measure-

ment results based on breast cancer data, and the bot-

tom three plots are for the three measurements based

on ovarian cancer data. For all three measurements, the

larger a statistical measurement is, the better that a

quantile regression model is at identifying the two dif-

ferent groups of CGIs (methylated and unmethylated).

In Figure 1, we can see consistent patterns in all three

measurements for both breast and ovarian cancer data.

That is, 90% (cyan), the 85% (dark green), and 80% (red)

are the top 3 lines and these three lines have relatively

small variations across different p-values. Therefore, we

can conclude that any τ between 80% and 90% could

serve well to identify two different groups of CGIs

(methylated and unmethylated). We recommend 85%

for convenience.

In order to determine if our quantile regression model

is better than other available methods, we compare our

method with the previous one that uses a 1.5 cutoff

value at a probe level [20] using our breast cancer data.

A single probe with a large log ratio is not reliable, so

we consider the following cases in each CGI: (1) at least

30% of probes with log ratios greater than 1.5, (2) at

least 50% of probes with log ratios greater than 1.5, and

(3) 100% probes (that is, all probes) with log ratios

greater than 1.5. For the above three cases, the AUC is

0.51, 0.52, and 0.50. These small AUCs are mainly due

to the fact that some methylated CGIs or genes do not

necessarily have many probes with log ratios greater

Table 1 Breast cancer AUC measurement table

τ P < 0.01p < 0.02 p < 0.03 p < 0.04p < 0.05

95%

90%

85%

80%

75%

70%

65%

60%

0.865

0.846

0.849

0.868

0.837

0.797

0.753

0.645

0.843

0.843

0.847

0.851

0.825

0.779

0.747

0.669

0.839

0.840

0.846

0.849

0.832

0.788

0.746

0.670

0.839

0.849

0.848

0.849

0.833

0.821

0.754

0.676

0.845

0.850

0.855

0.849

0.843

0.835

0.762

0.681

The first column contains the quantile levels. The second column contains a

sub-table with each sub-column corresponding to the AUC measurement

based on a specific p value and each sub-row corresponding to one quantile

level. Bold numbers are the five largest AUC values in this table.

Sun et al. BMC Bioinformatics 2011, 12:54

http://www.biomedcentral.com/1471-2105/12/54

Page 3 of 6

Page 4

than 1.5. In fact, they are more likely to have several

probes with relatively large but less than 1.5 log ratios.

We see this pattern very often in our data. As for the

first case, only 3 out of 30 known methylated genes and

4 out of 47 HK genes have at least one cell line with

more than 30% probes that have log ratios greater than

1.5. As for the second case, only 2 out of 30 known

methylated genes and 1 out of 47 HK genes have at

least one cell line with more than 50% probes that have

log ratios greater than 1.5. As for the third case, 0 out

of 30 known methylated genes and 0 out of 47 HK

genes have at least one cell line with 100% (i.e., all)

probes that have log ratios larger than 1.5. Therefore,

our quantile regression method is certainly much better

than the one that uses 1.5 as a cutoff. In addition, the

1.5 cutoff method may work well for our previous ver-

sion DMH protocol that has longer printed probes

(about 870 bp). However, this arbitrary cutoff method

does not work for the current protocol that uses much

shorter probes (45 ~60 bp).

Discussion

The three measurement plots of breast and ovarian can-

cer data have slightly different patterns. This may be

due to the sample differences between the two datasets.

Breast cancer data are generated from cell lines while

ovarian cancer data are generated from patients. The

breast cancer cell line samples are more homogeneous

than ovarian patient samples. In addition, cancer cell

lines appear to have more methylation than cancer

patients. Furthermore, breast cancer data have 40 arrays

and ovarian cancer data have 26 arrays. This sample size

difference may also explain some inconsistencies

between breast and ovarian cancer data at different

quantile levels due to random variability.

We also observe that the results of the three proposed

measurements show slightly inconsistent patterns. This

may be due to the definition of the three measurements.

AUC and T.stat both consider the variations of methyla-

tion scores. However, mean.diff only considers the dif-

ference of mean methylation scores between methylated

CGIs and unmethylated housekeeping CGIs. Therefore,

the result of AUC and T.stat may be more reliable.

Conclusions

In this paper, we have proposed to use a quantile

regression model to identify hypermethylated CGIs. In

particular, we have incorporated probe effects to take

Table 2 Breast cancer mean.diff measurement table

τ P < 0.01 p < 0.02 p < 0.03p < 0.04p < 0.05

95%

90%

85%

80%

75%

70%

65%

60%

13.550

17.497

16.030

13.114

9.386

5.732

3.548

1.682

11.197

16.955

16.936

14.165

10.770

6.981

4.589

2.156

10.187

15.899

17.191

14.837

11.798

7.860

5.187

2.618

9.277

15.312

17.357

15.312

12.317

9.204

5.528

2.972

8.788

14.515

17.309

15.694

13.143

9.818

5.765

3.395

The first column contains the quantile levels. The second column contains a

sub-table with each sub-column corresponding to the mean.diff measurement

based on a specific p value and each sub-row corresponding to one quantile

level. Bold numbers are the five largest mean.diff values in this table.

Table 3 Breast cancer T.stat measurement table

τ P < 0.01 p < 0.02p < 0.03p < 0.04 p < 0.05

95%

90%

85%

80%

75%

70%

65%

60%

6.939

6.680

6.110

5.463

4.810

4.254

3.780

2.686

5.946

6.816

6.443

5.749

5.037

4.313

3.637

2.774

5.549

6.786

6.586

5.982

5.259

4.472

3.796

2.787

5.311

6.869

6.826

6.172

5.482

4.938

3.938

2.872

5.213

6.790

7.097

6.241

5.740

5.147

4.005

2.941

The first column contains the quantile levels. The second column contains a

sub-table with each sub-column corresponding to the T.stat measurement

based on a specific p value and each sub-row corresponding to one quantile

level. Bold numbers are the five largest T.stat values in this table.

Table 4 Ovarian cancer AUC measurement table

τ P < 0.01 p < 0.02p < 0.03 p < 0.04 p < 0.05

95%

90%

85%

80%

75%

70%

65%

60%

0.808

0.822

0.823

0.826

0.815

0.774

0.754

0.712

0.800

0.821

0.821

0.846

0.818

0.769

0.759

0.714

0.807

0.815

0.824

0.836

0.802

0.784

0.749

0.672

0.808

0.815

0.823

0.819

0.791

0.780

0.754

0.687

0.796

0.809

0.820

0.812

0.796

0.774

0.766

0.671

The first column contains the quantile levels. The second column contains a

sub-table with each sub-column corresponding to the AUC measurement

based on a specific p value and each sub-row corresponding to one quantile

level. Bold numbers are the five largest AUC values in this table.

Table 5 Ovarian cancer mean.diff measurement table

τP < 0.01p < 0.02p < 0.03p < 0.04p < 0.05

95%

90%

85%

80%

75%

70%

65%

60%

8.122

11.339

11.022

9.226

6.922

5.320

2.991

1.745

7.071

10.331

11.119

9.966

8.138

6.301

3.579

2.407

6.564

9.459

11.099

10.166

8.270

6.908

4.092

2.369

5.870

8.942

10.734

9.909

8.589

7.119

4.326

2.521

5.414

8.229

10.453

9.969

8.878

7.310

4.745

2.682

The first column contains the quantile levels. The second column contains a

sub-table with each sub-column corresponding to the mean.diff measurement

based on a specific p value and each sub-row corresponding to one quantile

level. Bold numbers are the five largest mean.diff values in this table.

Sun et al. BMC Bioinformatics 2011, 12:54

http://www.biomedcentral.com/1471-2105/12/54

Page 4 of 6

Page 5

into consideration the noises from unmeasurable factors.

In order to find out at which quantile levels (95%, 90%,

85%, 80%, 75%, 70%, 65%, and 60%) the proposed quan-

tile regression model is better at identifying known

methylated and unmethylated CGIs, we have introduced

three statistical measurements: AUC, mean.diff, and T.

stat. These measurements show that the quantile level

between 80% and 90% might serve well for identifying

methylated and unmethylated CGIs. Although this paper

has only demonstrated how to identify hypermethylated

CGIs by setting quantiles larger than 50%, our quantile

regression model can also be used to identify hypo-

methylated CGIs with quantiles smaller than 50%, if

desired.

Additional material

Additional file 1: R code for fitting a quantile regression model. This

file gives an example of using the R package “quantreg” to fit a quantile

regression model to identify methylation signals in one CpG island.

Acknowledgements and Funding

This work was supported by the National Science Foundation 0112050 while

SS was a postdoctoral researcher at the Mathematical Biosciences Institute,

The Ohio State University. The authors thank Drs. Terry Speed and Dustin

Potter for valuable suggestions and discussions.

Table 6 Ovarian cancer T.stat measurement table

τ P < 0.01p < 0.02 p < 0.03p < 0.04 p < 0.05

95%

90%

85%

80%

75%

70%

65%

60%

5.673

6.378

6.253

5.772

4.906

4.400

3.712

3.302

5.295

6.134

6.258

6.194

5.451

4.671

3.949

3.395

5.177

5.953

6.290

6.272

5.523

4.962

4.134

2.967

4.889

5.958

6.293

6.044

5.616

5.002

4.075

3.065

4.758

5.656

6.308

6.010

5.699

5.132

4.307

3.024

The first column contains the quantile levels. The second column contains a

sub-table with each sub-column corresponding to the T.stat measurement

based on a specific p value and each sub-row corresponding to one quantile

level. Bold numbers are the five largest T.stat values in this table.

0.01 0.02 0.030.04 0.05

0.65

0.70

0.75

0.80

0.85

AUC of Breast

p value

Area under Curve

0.010.020.03 0.04 0.05

5

10

15

mean.diff of Breast

p value

mean methyl difference

0.01 0.020.030.04 0.05

3

4

5

6

7

T.stat of Breast

p value

T−like statistics

0.010.02 0.03 0.040.05

0.65

0.70

0.75

0.80

0.85

AUC of Ovarian

p value

Area under Curve

0.01 0.020.030.040.05

5

10

15

mean.diff of Ovarian

p value

mean methyl difference

0.01 0.02 0.030.04 0.05

3

4

5

6

7

T.stat of Ovarian

p value

T−like statistics

Figure 1 Comparisons of quantile regression models at different quantile levels. We compare results of different quantiles by studying their

performances on identifying two different groups of CGIs (methylated and unmethylated). The legend is: “brown” for τ = 95%, “cyan” for τ = 90%,

“dark green” for τ = 85%, “red” for τ = 80%, “green” for τ = 75%, “blue” for τ = 70%, “black” for τ = 65%, and “purple” for τ = 60%. The top panel

contains three plots for breast cancer data while the bottom panel contains three plots for ovarian cancer data.

Sun et al. BMC Bioinformatics 2011, 12:54

http://www.biomedcentral.com/1471-2105/12/54

Page 5 of 6

Page 6

Author details

1Case Comprehensive Cancer Center, Case Western Reserve University,

Cleveland, Ohio, USA.2Department of Epidemiology and Biostatistics, Case

Western Reserve University, Cleveland, Ohio, USA.3Department of Statistics,

Case Western Reserve University, Cleveland, Ohio, USA.4Human Cancer

Genetics Program, The Ohio State University, Columbus, Ohio, USA.

5Department of Statistics, The Ohio State University, Columbus, Ohio, USA.

Authors’ contributions

SS and ZC developed and implemented the models, performed all statistical

analyses, and drafted and revised the manuscript. PSY and YWH were

involved in the data collection and helped in preparation of the manuscript.

THMH oversaw the project and revised the manuscript. SL provided

suggestions on the project and revised the manuscript. All authors have

read and approved the final document.

Received: 15 October 2010 Accepted: 15 February 2011

Published: 15 February 2011

References

1. Kim MS, Lee J, Sidransky D: DNA methylation markers in colorectal

cancer. Cancer Metastasis Rev 2010, 29(1):181-206.

2. Lin J, Lai M, Huang Q, Ma Y, Cui J, Ruan W: Methylation patterns of

IGFBP7 in colon cancer cell lines are associated with levels of gene

expression. J Pathol 2007, 212(1):83-90.

3.Toyota M, Ahuja N, Ohe-Toyota M, Herman JG, Baylin SB, Issa JP: CpG

island methylator phenotype in colorectal cancer. Proc Natl Acad Sci USA

1999, 96(15):8681-8686.

4. Zitt M, Muller HM: DNA methylation in colorectal cancer–impact on

screening and therapy monitoring modalities? Dis Markers 2007, 23(1-

2):51-71.

5.Bell AC, Felsenfeld G: Methylation of a CTCF-dependent boundary

controls imprinted expression of the Igf2 gene. Nature 2000,

405(6785):482-485.

6. Clark SJ, Harrison J, Molloy PL: Sp1 binding is inhibited by (m)Cp(m)CpG

methylation. Gene 1997, 195(1):67-71.

7. Douet V, Heller MB, Le Saux O: DNA methylation and Sp1 binding

determine the tissue-specific transcriptional activity of the mouse Abcc6

promoter. Biochem Biophys Res Commun 2007, 354(1):66-71.

8.Goll MG, Bestor TH: Eukaryotic cytosine methyltransferases. Annu Rev

Biochem 2005, 74:481-514.

9.Hark AT, Schoenherr CJ, Katz DJ, Ingram RS, Levorse JM, Tilghman SM: CTCF

mediates methylation-sensitive enhancer-blocking activity at the H19/

Igf2 locus. Nature 2000, 405(6785):486-489.

10. Henderson IR, Jacobsen SE: Epigenetic inheritance in plants. Nature 2007,

447(7143):418-424.

11.Inoue S, Oishi M: Effects of methylation of non-CpG sequence in the

promoter region on the expression of human synaptotagmin XI (syt11).

Gene 2005, 348:123-134.

12.Kitazawa S, Kitazawa R, Maeda S: Transcriptional regulation of rat cyclin

D1 gene by CpG methylation status in promoter region. J Biol Chem

1999, 274(40):28787-28793.

13. Mancini DN, Singh SM, Archer TK, Rodenhiser DI: Site-specific DNA

methylation in the neurofibromatosis (NF1) promoter interferes with

binding of CREB and SP1 transcription factors. Oncogene 1999,

18(28):4108-4119.

14. Jones PA, Baylin SB: The epigenomics of cancer. Cell 2007, 128(4):683-692.

15.Cortez CC, Jones PA: Chromatin, cancer and drug therapies. Mutat Res

2008, 647(1-2):44-51.

16. Laird PW: Principles and challenges of genome-wide DNA methylation

analysis. Nat Rev Genet 2010, 11(3):191-203.

17. Huang TH, Perry MR, Laux DE: Methylation profiling of CpG islands in

human breast cancer cells. Hum Mol Genet 1999, 8(3):459-470.

18. Yan PS, Potter D, Deatherage DE, Huang TH, Lin S: Differential methylation

hybridization: profiling DNA methylation with a high-density CpG island

microarray. Methods Mol Biol 2009, 507:89-106.

19.Sun S, Yan PS, Huang TH, Lin S: Identifying differentially methylated

genes using mixed effect and generalized least square models. BMC

Bioinformatics 2009, 10:404.

20.Zighelboim I, Goodfellow PJ, Schmidt AP, Walls KC, Mallon MA, Mutch DG,

Yan PS, Huang TH, Powell MA: Differential methylation hybridization array

of endometrial cancers reveals two novel cancer-specific methylation

markers. Clin Cancer Res 2007, 13(10):2882-2889.

Khalili A, Potter D, Yan P, Li L, Gray J, Huang T, Lin S: Gamma-normal-

gamma mixture model for detecting differentially methylated Loci in

three breast cancer cell lines. Cancer Inform 2007, 3:43-54.

Koenker R: Quantile Rregression. New York: Cambridge University Press;

2005.

Smyth GK, Speed T: Normalization of cDNA microarray data. Methods

2003, 31(4):265-273.

Fiegl H, Millinger S, Goebel G, Muller-Holzner E, Marth C, Laird PW,

Widschwendter M: Breast cancer DNA methylation profiles in cancer cells

and tumor stroma: association with HER-2/neu status in primary breast

cancer. Cancer Res 2006, 66(1):29-33.

Tan LW, Bianco T, Dobrovic A: Variable promoter region CpG island

methylation of the putative tumor suppressor gene Connexin 26 in

breast cancer. Carcinogenesis 2002, 23(2):231-236.

Widschwendter M, Jones PA: DNA methylation and breast carcinogenesis.

Oncogene 2002, 21(35):5462-5482.

Yang X, Yan L, Davidson NE: DNA methylation in breast cancer. Endocr

Relat Cancer 2001, 8(2):115-127.

Barton CA, Hacker NF, Clark SJ, O’Brien PM: DNA methylation changes in

ovarian cancer: implications for early diagnosis, prognosis and

treatment. Gynecol Oncol 2008, 109(1):129-139.

Eisenberg E, Levanon EY: Human housekeeping genes are compact.

Trends Genet 2003, 19(7):362-365.

21.

22.

23.

24.

25.

26.

27.

28.

29.

doi:10.1186/1471-2105-12-54

Cite this article as: Sun et al.: Identifying hypermethylated CpG islands

using a quantile regression model. BMC Bioinformatics 2011 12:54.

Submit your next manuscript to BioMed Central

and take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at

www.biomedcentral.com/submit

Sun et al. BMC Bioinformatics 2011, 12:54

http://www.biomedcentral.com/1471-2105/12/54

Page 6 of 6