PROCEEDINGS Open Access

Enhancing the discovery of rare disease variants through hierarchical modeling

Gary K Chen

From Genetic Analysis Workshop 17

Boston, MA, USA. 13-16 October 2010

Abstract

Advances in next-generation sequencing technology are enabling researchers to capture a comprehensive picture

of genomic variation across large numbers of individuals with unprecedented levels of efficiency. The main analytic

challenge in disease mapping is how to mine the data for rare causal variants among a sea of neutral variation. To

achieve this goal, investigators have proposed a number of methods that exploit biological knowledge. In this

paper, I propose applying a Bayesian stochastic search variable selection algorithm in this context. My multivariate

method is inspired by the combined multivariate and collapsing method. In this proposed method, however, I

allow an arbitrary number of different sources of biological knowledge to inform the model as prior distributions in

a two-level hierarchical model. This allows rare variants with similar prior distributions to share evidence of

association. Using the 1000 Genomes Project single-nucleotide polymorphism data provided by Genetic Analysis

Workshop 17, I show that through biologically informative prior distributions, some power can be gained over

noninformative prior distributions.

Background

Genome-wide association studies (GWAS) have been a

powerful method for revealing common variants that

confer a modest increase in disease risk in carriers. In

general, the single-nucleotide polymorphisms (SNPs)

that show the strongest evidence for association in

GWAS do not perfectly tag the putative causal variant(s)

nearby because of ancestral recombination events;

therefore resequencing in these regions is necessary to

resolve the precise location of the causal variant(s).

Dickson et al. [1] postulated one possible explanation

for why many fine-mapping efforts have failed to map a

single causal SNP in the region tagged by the original

genome-wide association signal: multiple rare variants

(MRVs) residing on multiple haplotypes at the region of

the genome-wide association signal are generating a

“synthetic” association when these haplotypes share a

common allele that is observed more in case subjects

than in control subjects. In support of the MRV

hypothesis, several investigators have recently developed

a number of popular burden-type methods [2-4]. These

methods are predicated on the notion that presence of

or an increase in the number of mutations for a person

at a particular pathway, region, gene, or any other biological

unit can serve as a reasonable proxy for his/her

risk of developing disease. The common theme among

these methods is that the genotypes for MRVs that map

to these biological units, called bins, are collapsed into a

single vector of scores, a technique that can potentially

improve statistical power to detect disease association.

For example, in the combined multivariate and collapsing

(CMC) method of Li and Leal [2], a score for an

individual is assigned 1 if at least one mutation is

observed across all SNPs within a bin, or 0 otherwise.

The significance of a gene, for example, can then be

tested by jointly modeling all bins that map within the

gene using a multivariate method such as Hotelling’s

multivariate T-test, logistic regression, or linear

regression.
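
As a minimal sketch of the CMC collapsing rule just described, assuming genotypes are coded as
minor-allele counts (0, 1, or 2), the per-individual score can be computed as follows. The
function name collapse_bin and the toy genotypes are illustrative only and do not come from the
GAW17 data or from the released implementation.

```python
def collapse_bin(genotypes):
    """Collapse one bin into a 0/1 score per individual.

    genotypes: list of rows, one per individual; each row holds the
    minor-allele counts (0, 1, or 2) at every rare SNP in the bin.
    """
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


# Hypothetical toy data: 4 individuals, 3 rare SNPs in one bin.
toy_bin = [
    [0, 0, 0],  # no rare allele
    [1, 0, 0],  # one rare allele
    [0, 2, 1],  # several rare alleles still give a score of 1
    [0, 0, 0],
]
scores = collapse_bin(toy_bin)
print(scores)  # [0, 1, 1, 0]
```

The resulting vector of scores is what would enter the multivariate test in place of the
individual rare SNPs.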

In this paper, I describe how I adapted the concept of

the CMC method into a Bayesian variable selection

algorithm with the notion that common SNPs may also

Correspondence: gary.k.chen@usc.edu

Division of Biostatistics, Department of Preventive Medicine, University of

Southern California, 2001 North Soto Street, SSB 202Q, MC 9234, Los

Angeles, CA 90089-9234, USA

Chen BMC Proceedings 2011, 5(Suppl 9):S16

http://www.biomedcentral.com/1753-6561/5/S9/S16

© 2011 Chen; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons

Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in

any medium, provided the original work is properly cited.

contribute valuable information to nearby causal rare

variants, assuming that the shared haplotype model [1]

is true. The exon resequencing data set provided by the

organizers of Genetic Analysis Workshop 17 (GAW17)

provides an ideal opportunity for evaluating the perfor-

mance of this new approach.

Methods

Details of the simulated GAW17 data set can be found in this same issue [5]. I defined variants
that had a minor allele frequency (MAF) less than 0.01 to be rare but potentially the most
biologically interesting, because extremely rare mutations are expected to have the greatest
deleterious effects on phenotype. Of all the SNPs provided in the data set, 73% (18,131) fall
within this MAF range. For each gene, I applied the collapsing procedure, as described in the
CMC method [2], by grouping rare SNPs into one of two bins defined by their predicted impact on
protein (i.e., synonymous or nonsynonymous variant). Any bin with a MAF less than 0.01 after the
collapsing procedure was excluded from further analysis. Common SNPs, defined as those having a
MAF ≥ 0.01, were not collapsed with any other SNPs in the gene. For conciseness, I use the term
variable to denote either a single common SNP or a SNP bin. The final marker panel included
7,385 variables: 1,029 bins containing collapsed rare variants and 6,356 individual common SNPs.
I experimented with higher threshold values for bin definition (e.g., MAF = 0.05), but this
strategy did not recover an appreciable number of bins from the filtering step because most
genes in the data set were small and harbored private mutations. True log relative risks
(denoted β) for each SNP are provided in the simulation answer key, which quantifies each SNP’s
effect on the quantitative traits Q1 and Q2. Thus, to assess how accurately my method can
recover the true values of β at each SNP, I constrained the analyses to models where the
outcome phenotype was either trait Q1 or Q2.
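
The construction of the marker panel can be sketched in Python as below. This is an
illustration under stated assumptions rather than the pipeline I actually ran: the dictionary
keys, the helper names, and the toy SNPs are hypothetical, and the frequency of a collapsed bin
is assumed here to be the share of individuals with a score of 1, which the text above does not
specify.

```python
from collections import defaultdict

RARE_MAF = 0.01  # threshold stated in the text


def minor_allele_frequency(counts):
    """MAF from per-individual minor-allele counts (0, 1, or 2)."""
    return sum(counts) / (2 * len(counts))


def build_variables(snps, min_bin_freq=RARE_MAF):
    """Return {variable name: per-individual values}.

    snps: list of dicts with hypothetical keys 'id', 'gene',
    'mutation_class', and 'counts' (minor-allele count per individual).
    """
    variables = {}
    bins = defaultdict(list)
    for snp in snps:
        if minor_allele_frequency(snp["counts"]) >= RARE_MAF:
            variables[snp["id"]] = list(snp["counts"])  # common SNP kept as is
        else:
            bins[(snp["gene"], snp["mutation_class"])].append(snp["counts"])
    for (gene, mutation_class), members in bins.items():
        # 1 if the individual carries any rare allele in the bin, else 0
        score = [1 if any(c > 0 for c in column) else 0 for column in zip(*members)]
        # Assumption: bin frequency = share of individuals scoring 1
        if sum(score) / len(score) >= min_bin_freq:
            variables[f"{gene}, {mutation_class}"] = score
    return variables


def make_counts(n, carriers):
    """Toy helper: heterozygous carriers at the given individual indices."""
    return [1 if i in carriers else 0 for i in range(n)]


n = 200
toy_snps = [
    {"id": "s1", "gene": "GENE_A", "mutation_class": "nonsynonymous",
     "counts": make_counts(n, set(range(10)))},   # MAF 0.025, common
    {"id": "r1", "gene": "GENE_A", "mutation_class": "nonsynonymous",
     "counts": make_counts(n, {0, 1})},           # rare
    {"id": "r2", "gene": "GENE_A", "mutation_class": "nonsynonymous",
     "counts": make_counts(n, {2})},              # rare
    {"id": "r3", "gene": "GENE_B", "mutation_class": "synonymous",
     "counts": make_counts(n, {5})},              # rare, bin too sparse
]
panel = build_variables(toy_snps)
print(sorted(panel))
```

In this toy panel the GENE_B bin is dropped because only one of 200 individuals carries a rare
allele, which mirrors the filtering step described above.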

The statistical model I used was a two-level hierarchical model, described in detail by Chen
and Thomas [6]. One property of a hierarchical model that is appealing when analyzing variants
of low frequency, where maximum-likelihood estimates (MLEs) of association $\hat{\beta}$ can be
highly unstable, is the ability to smooth these point estimates (and their variances) toward
prior distributions defined in a second level. At the first level, I apply ordinary
least-squares regression, which produces MLEs of association between a continuous trait (i.e.,
either Q1 or Q2) and a random set of m model variables. A design matrix X stores the variable
values, and the vector Y stores values of Q1 or Q2 across all individuals:

$$Y = \beta_0 + X\beta_{1,\ldots,m}. \qquad (1)$$
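
A minimal pure-Python sketch of the first-level fit in Eq. (1) is given below, assuming the
estimates are obtained from the normal equations. The function names solve and ols and the toy
data are illustrative; the released C++ code may organize this step differently.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(x_rows, y):
    """Least-squares estimates [beta_0, beta_1, ..., beta_m] for Eq. (1)."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: two variables, trait generated without noise.
x_rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 0], [0, 2]]
y = [1.0 + 0.5 * x1 - 0.25 * x2 for x1, x2 in x_rows]
beta_hat = ols(x_rows, y)
print(beta_hat)
```

The explicit singularity check is a design choice for the sketch: a variable that is constant
across individuals, as a very rare variant can be in a small sample, makes the normal equations
unsolvable.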

I define a prior distribution on β in Eq. (1) using the annotation information provided by
GAW17. For variable k, $\beta_k$ is distributed as a mixed-effects model, originally defined by
Besag et al. [7] as:

$$\beta_k = Z_k^{T}\pi + \theta_k + \varphi_k, \qquad (2)$$

where the latent fixed effect is π and the random effects components are:

$$\theta_k \sim N(0, \sigma^2), \qquad (3a)$$

$$\varphi_k \sim N\!\left(\bar{\varphi}_{-k}, \frac{\tau^2}{\nu_k}\right). \qquad (3b)$$

The Z matrix stores external knowledge about each of the m variables currently in the model. To
encode my belief that deleterious mutations would have higher or lower values of β relative to
other types of mutations, I assigned a value of 1 to nonsynonymous mutations in the second
column (after reserving the first column as the intercept) of the m × 2 design matrix Z and a
value of 0 for any other SNP category. The term π, estimated using ordinary linear regression,
relates the magnitude of β in Eq. (1) to values in Z. Furthermore, to encode my belief that
mutations within the same gene should have similar effects on disease, I specified an indicator
encoding whether predictor k and any other model variable are in the same gene by means of an
m × m adjacency matrix A. Specifically, the parameter $\bar{\varphi}_{-k}$ stores the mean of
the MLE $\hat{\beta}$ from the first level, taken across neighbors of variable k (i.e., all
other variables that are in the same gene) defined by means of A. The variance term τ² is
inversely scaled by $\nu_k$, the number of neighbors of k, to weight the uncertainty about τ².
Finally, $\theta_k$ soaks up any remaining variation in the second level of the model through
the variance term σ².
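
To make the second level concrete, the sketch below builds Z and A from annotation, computes
the prior mean and variance implied for each variable, and then pulls a first-level estimate
toward that prior. Several parts are assumptions of mine for illustration: the function names,
the toy genes and estimates, the treatment of a variable with no neighbors (neighbor mean 0 and
a neighbor count of 1), the independence of the two random effects, and the closing step, which
is the textbook normal-normal posterior mean rather than the MCMC sampler used in this paper.

```python
def second_level_prior(genes, nonsynonymous, beta_hat, pi, sigma2, tau2):
    """Prior mean and variance of each beta_k implied by the second level.

    genes: gene label per variable; nonsynonymous: 0/1 flag per variable;
    beta_hat: first-level estimates; pi: [intercept, nonsynonymous effect].
    """
    m = len(genes)
    # Z: intercept column plus nonsynonymous indicator (m x 2)
    z = [[1, flag] for flag in nonsynonymous]
    # A: 1 when two distinct variables share a gene (m x m)
    a = [[1 if i != j and genes[i] == genes[j] else 0 for j in range(m)]
         for i in range(m)]
    priors = []
    for k in range(m):
        neighbours = [j for j in range(m) if a[k][j]]
        nu_k = len(neighbours)
        # Assumption: with no neighbours, use neighbour mean 0 and nu_k = 1
        phi_bar = sum(beta_hat[j] for j in neighbours) / nu_k if nu_k else 0.0
        mean = z[k][0] * pi[0] + z[k][1] * pi[1] + phi_bar
        # Assumption: theta_k and phi_k are independent, so variances add
        variance = sigma2 + tau2 / max(nu_k, 1)
        priors.append((mean, variance))
    return priors


def shrink(beta_hat_k, se2_k, prior_mean, prior_var):
    """Normal-normal posterior mean: a precision-weighted average."""
    weight = prior_var / (prior_var + se2_k)
    return weight * beta_hat_k + (1 - weight) * prior_mean


# Hypothetical toy inputs: three variables in one gene, one in another.
genes = ["FLT1", "FLT1", "FLT1", "HFE"]
nonsynonymous = [1, 0, 1, 1]
beta_hat = [0.6, 0.4, 0.2, 1.5]
priors = second_level_prior(genes, nonsynonymous, beta_hat,
                            pi=[0.0, 0.03], sigma2=0.01, tau2=0.006)
# An unstable estimate (large squared standard error) moves most of the way
# to its prior mean.
shrunk = shrink(beta_hat[3], 0.144, *priors[3])
print(priors[0], shrunk)
```

The weight in shrink shows why unstable estimates are smoothed the most: the larger the squared
standard error relative to the prior variance, the closer the result sits to the prior mean.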

A posterior density is defined on the basis of the likelihood and normal density function
corresponding to the first (Eq. (1)) and second (Eq. (2)) levels of the hierarchical model. I
use the product of this density function and a model transition function as the objective
function of a reversible jump Markov chain Monte Carlo (MCMC) algorithm that stochastically
explores the search space, fitting candidate sets of model variables to the data. The model
transition kernel itself is informed through empirical Bayes estimates of the hyperparameters
(e.g., π), so that regions of the search space that have strong empirical support and prior
evidence are prioritized. Further details on how the variable selection algorithm works can be
found in Chen and Thomas [6].
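
The flavor of a stochastic search over sets of variables can be conveyed with a much simpler
stand-in than the algorithm of Chen and Thomas [6]. The sketch below is a plain Metropolis
sampler that flips one variable in or out per step and scores each model with a BIC
approximation plus independent prior inclusion probabilities. It does not use the hierarchical
prior, the empirical Bayes transition kernel, or reversible jump moves; all names and the toy
data (balanced 0/1 patterns with a deterministic disturbance) are hypothetical.

```python
import math
import random


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations, solved by Gauss-Jordan elimination
    aug = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            return None  # singular design: treat the model as inadmissible
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def log_score(included, x_cols, y, prior_incl):
    """BIC approximation to the log marginal likelihood plus log model prior."""
    n = len(y)
    r = rss([x_cols[k] for k in sorted(included)], y)
    if r is None or r <= 0:
        return -math.inf
    bic = n * math.log(r / n) + (len(included) + 1) * math.log(n)
    log_prior = sum(math.log(prior_incl[k]) if k in included
                    else math.log(1 - prior_incl[k]) for k in range(len(x_cols)))
    return -0.5 * bic + log_prior


def stochastic_search(x_cols, y, prior_incl, n_iter=2000, seed=1):
    """Metropolis sampler over inclusion sets; returns inclusion frequencies."""
    rng = random.Random(seed)
    m = len(x_cols)
    current, visits = set(), [0] * m
    current_score = log_score(current, x_cols, y, prior_incl)
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(m)}  # flip one variable in or out
        score = log_score(proposal, x_cols, y, prior_incl)
        if rng.random() < math.exp(min(0.0, score - current_score)):
            current, current_score = proposal, score
        for k in current:
            visits[k] += 1
    return [v / n_iter for v in visits]


# Hypothetical toy data: only variable 0 affects the trait.
n = 32
x_cols = [[(i >> bit) & 1 for i in range(n)] for bit in range(3)]
y = [1.0 * x_cols[0][i] + (0.2 if (i >> 3) & 1 else -0.2) for i in range(n)]
freq = stochastic_search(x_cols, y, prior_incl=[0.1, 0.1, 0.1])
print(freq)
```

The returned inclusion frequencies are the kind of summary from which posterior odds, and hence
Bayes factors, can be estimated.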

In the next section I compare results from a more

conventional method with those from my proposed method. The

first method is an ordinary least-squares regression between the quantitative trait (i.e., Q1
or Q2) and each vector of variable scores, which I denote as the MLE method. This approach is
equivalent to a conventional genome-wide association scan, testing for marginal effects. I
compared this to four variations of the multivariate MCMC method. Specifically, I varied the
degree of informativeness in the prior distribution by modifying the definition of the matrices
A and Z. The most informative prior distribution (denoted FULL) stores both gene membership and
SNP mutation type information in the A and Z matrices, respectively. In the second variant of
the prior distribution (denoted Z only), I removed gene membership information so that matrix A
was simply the identity matrix. Conversely, in the third variant of the prior distribution
(denoted A only), I removed mutation class information so that the Z matrix included only the
intercept. The last variant of the prior distribution (denoted UNINF) includes both the
uninformative Z and A matrices and is equivalent to a ridge-style prior distribution (i.e.,
β ~ N(0, σ²)).

For each of the MCMC analyses, I sampled 2 million realizations from the posterior
distribution, retaining statistics on only the last million realizations to minimize any
correlation with the initial parameter values. Run time on a 2-GHz Xeon processor was
approximately 8 h. I verified that the retained statistics had converged by comparing their
distributions across multiple chains with the Kolmogorov-Smirnov test and taking a
nonsignificant p-value as evidence of convergence.
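
A two-chain version of this check can be sketched as follows. The sketch computes the
two-sample Kolmogorov-Smirnov statistic directly and compares it with the usual large-sample
critical value instead of reporting a p-value; the function names and toy chains are
hypothetical, and the critical value assumes roughly independent draws, so in practice the
retained draws would need thinning.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (largest ECDF gap)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def chains_agree(sample_a, sample_b, alpha=0.05):
    """True when D stays below the large-sample critical value at level alpha."""
    n, m = len(sample_a), len(sample_b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_two_sample(sample_a, sample_b) < critical


# Hypothetical retained draws of one statistic from three chains.
chain_a = [i / 100 for i in range(100)]
chain_b = [i / 100 + 0.005 for i in range(100)]  # nearly the same distribution
chain_c = [i / 100 + 0.5 for i in range(100)]    # shifted: not converged
print(chains_agree(chain_a, chain_b), chains_agree(chain_a, chain_c))
```

A result of True corresponds to the nonsignificant outcome taken here as evidence of
convergence.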

To quantify evidence for any specific variable (either

common SNP or SNP bin), I empirically estimated

Bayes factors (BFs) for each variable by dividing the posterior

odds by the prior odds, as described by Chen and

Thomas [6]. BFs quantify the increase in evidence for a

hypothesis (in this case, inclusion of a variable into the

model) in light of observed data relative to a prior

hypothesis [8].
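
The ratio of posterior to prior odds can be written out as a short Python function. In this
sketch the posterior inclusion probability is assumed to be the share of retained MCMC
realizations whose model contains the variable; the function name, the prior inclusion
probability, and the counts are illustrative.

```python
def bayes_factor(posterior_inclusion, prior_inclusion):
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    for p in (posterior_inclusion, prior_inclusion):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_inclusion / (1 - posterior_inclusion)
    prior_odds = prior_inclusion / (1 - prior_inclusion)
    return posterior_odds / prior_odds


# Hypothetical example: the variable appears in 500,000 of 1,000,000 retained
# realizations, and its prior inclusion probability is assumed to be 0.1.
bf = bayes_factor(500_000 / 1_000_000, 0.1)
print(round(bf, 6))  # 9.0
```

A variable that is included in every retained realization has undefined posterior odds, which
is why the sketch rejects probabilities of exactly 0 or 1.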

Results

Table 1 lists the posterior estimates of the various hyperparameters of the hierarchical model
under the FULL prior distribution specification. For either of the two quantitative traits, the
residual variance (τ²) in the random effects component was smaller than the residual variance
from the fixed effects component (σ²), indicating a good fit between the gene-membership prior
distribution and the observed data. The posterior estimates for the prior mean (π) indicate a
slightly positive correlation (0.03) between disease risk and presence of a nonsynonymous
mutation, although the evidence is weak considering the large standard errors (0.06).

As alluded to earlier, hierarchical modeling shrinks unstable MLEs toward means informed
through either informative or noninformative prior distributions. I considered two metrics that
measure the accuracy of a method’s estimation of the true effect size: the mean coverage rate
(MCR) and the root mean-square error (RMSE). I defined the MCR as the proportion across all
causal SNPs and simulation replicates where the true value of β falls within the 95% confidence
interval of the estimator. Thus a perfect estimator would have a value of 1. Hierarchical
modeling achieved an MCR of 0.91 under the Q1 disease model, in contrast to an MCR of 0.56 when
applying maximum likelihood. The second metric I considered, RMSE, is calculated by taking the
square root of the average squared difference (also taken across all markers and replicates)
between the estimated and true values of β. A smaller value of the RMSE indicates a more
precise estimation of the true effect size. Under the Q1 disease model, the RMSE for the
hierarchical model was 0.17, whereas for the maximum-likelihood model it was 0.38. When Q2 was
the disease model, the RMSE and MCR were similar (within ±0.01), approximately 0.17 and 0.94,
respectively, regardless of which method was used. Table 2 presents a list of causal variables
under the Q1 disease model, indicating that several SNPs at the FLT1 gene were poorly estimated
using maximum likelihood.
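
Both accuracy metrics are simple to compute once estimates, standard errors, and true effects
are pooled across causal variables and replicates. The sketch below assumes the 95% interval is
the estimate plus or minus 1.96 standard errors; the function names and the toy numbers are
illustrative and are not values from the GAW17 analysis.

```python
import math


def rmse(estimates, truths):
    """Root mean-square error between estimated and true effects."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


def mean_coverage_rate(estimates, std_errors, truths, z=1.96):
    """Share of intervals estimate +/- z * SE that contain the true effect."""
    hits = sum(1 for e, s, t in zip(estimates, std_errors, truths)
               if e - z * s <= t <= e + z * s)
    return hits / len(truths)


# Hypothetical pooled results for three (variable, replicate) pairs.
estimates = [0.5, 0.1, 0.9]
std_errors = [0.1, 0.1, 0.1]
truths = [0.4, 0.4, 0.9]
print(rmse(estimates, truths), mean_coverage_rate(estimates, std_errors, truths))
```

In the toy example the second interval misses the true value, so the coverage is two out of
three, while the RMSE is dominated by that same poorly estimated effect.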

I next evaluated the ability of the MCMC sampler to

perform variable selection by comparing sensitivity and

specificity across the four variants of the prior distribu-

tion. The receiver operating characteristic (ROC) curves

in Figures 1 and 2 illustrate power across various false

discovery rates for traits Q1 and Q2, respectively. As

Table 1 Posterior estimates of hyperparameters

Parameter    Trait Q1       Trait Q2
τ²           0.006          0.006
σ²           0.01           0.01
π (SE)       0.03 (0.06)    0.02 (0.06)

Table 2 Accuracy of estimates of β for trait Q1 between maximum-likelihood estimate (MLE) and
hierarchical modeling (HM) estimates

                          Mean square error    Mean coverage rate (a)
Variable                  MLE      HM          MLE      HM
C1S6521                   0.196    0.046       0.68     0.94
C13S398                   0.069    0.036       0.90     0.93
C13S515                   0.300    0.029       0.07     0.92
C13S522                   0.126    0.037       0.06     0.71
HFE, nonsynonymous        0.218    0.033       0.63     0.98
KCTD14, nonsynonymous     0.007    0.009       0.91     0.84
C4S1878                   0.102    0.024       0.7      0.95

(a) Proportion of replicates where true β falls within the 95% confidence interval.

one might expect, introducing informative prior distributions into the model improves power to
detect causal variants. Gene membership information as encoded in matrix A proved to be the
most critical component for power overall. When Q2 was used as the outcome phenotype, the
method showed greater sensitivity than the MLE method across all false discovery rate (FDR)
values, regardless of the prior distribution specification. Under Q1, the method performed
slightly worse than the MLE method at low FDRs when gene membership information was omitted
from the prior specification. Table 3 summarizes the relative differences in power at
FDR = 0.05 between the MLE and my approach.
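
The curves in Figures 1 and 2 plot the proportion of causal variants detected against the
proportion of noncausal variants detected. A minimal sketch of how such points can be traced
from a ranking of variables is shown below; the function name, the scores, and the causal
labels are hypothetical, and ties between scores are not given any special treatment.

```python
def roc_points(scores, is_causal):
    """Trace (noncausal proportion, causal proportion) as the cutoff is lowered.

    scores: evidence per variable (e.g., a Bayes factor), larger = stronger;
    is_causal: 1 for truly causal variables, 0 otherwise.
    """
    ranked = sorted(zip(scores, is_causal), key=lambda pair: -pair[0])
    n_causal = sum(is_causal)
    n_null = len(is_causal) - n_causal
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, causal in ranked:
        if causal:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_null, tp / n_causal))
    return points


# Hypothetical evidence for five variables, three of them causal.
scores = [50.0, 20.0, 5.0, 2.0, 0.5]
is_causal = [1, 1, 0, 1, 0]
points = roc_points(scores, is_causal)
print(points)
```

Averaging such curves over simulation replicates would give a summary comparable in spirit to
the figures, although the exact averaging used for them is not restated here.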

I noted a wide range of evidence across the variables considered. Tables 4 and 5 present a
comparison of BFs across the various prior specifications in the variable selection algorithm
for each causal variable that was included in the analysis. Although guidelines for BF
interpretation [8] deem several variables to be “barely worth mentioning” (BF range, 1 to 3),
others could be considered “decisive” (BF > 100). Under Q1, evidence of association was
strongest for the C13S431, C13S522, and C13S523 SNPs in the FLT1 gene, which had more “common”
MAFs of 0.02, 0.03, and 0.07, respectively. These same SNPs also had fairly large simulated
odds ratios (2.1, 1.9, and 1.9, respectively), which most likely explains why all the methods
performed better overall under the Q1 model than under the more challenging Q2 model, as shown
in Figures 1 and 2. The only SNP under the Q1 model that was more common than these three SNPs
was C4S1878 (MAF = 0.16). A relatively moderate BF of 107 at this SNP reflects its modest
simulated odds ratio of 1.1. The A matrix information, which helps distribute evidence of
association across a gene, was advantageous for SNPs within FLT1. In contrast, for SNPs in
other genes, the Z matrix, which enables variables of the same mutation type to share a common
mean, improved the method’s power to detect causal variants, as seen in the higher BFs in the
Z only column than in the A only column in Tables 4 and 5. This observation was not too
surprising, considering that the simulation model treated only nonsynonymous mutations as
causal.

Discussion

In response to the missing heritability mystery plaguing

the field of complex trait genetics, there is understand-

ably massive interest in developing methods that can

effectively investigate the relationship between rare var-

iants and disease. In the methods described by Madsen

Figure 1 Receiver operating characteristic curve under

polygenic disease model for trait Q1. The proportion of causal

variants is plotted as a function of the proportion of noncausal

variants, taken across 200 replicates.

Figure 2 Receiver operating characteristic curve under

polygenic disease model for trait Q2. The proportion of causal

variants is plotted as a function of the proportion of noncausal

variants, taken across 200 replicates.

Table 3 Relative power (in relation to the maximum-likelihood estimate) of hierarchical
modeling method at FDR = 0.05

Variation    Trait Q1    Trait Q2
UNINF        0.94        1.14
Z only       0.98        1.17
A only       1.04        1.17
FULL         1.05        1.19

FDR, false discovery rate.

and Browning [3] and a more recent refinement described by Price et al. [4], common SNPs are
down-weighted on the assumption that their effect sizes are expected to be smaller than those
of their rarer neighbors. Details on these approaches are found in Dering et al. [9]. A one
degree of freedom test is carried out at the gene level or other biological unit rather than at
the SNP level. These methods are appealing because power can increase when fewer hypotheses
need to be adjusted for. I took a somewhat different approach that was closer in spirit to the
CMC method [2]. Like the CMC approach, my method operates within a multivariate framework so
that multiple bins within a gene can be considered; this allows one to test multiple hypotheses
and to refine the signal, albeit at a statistical cost resulting from multiple comparisons. In
contrast to the Madsen and Browning [3] and Price et al. [4] methods, I do not down-weight SNPs
of higher frequency. In fact, I believe that if there is a shared haplotype effect among case
subjects, then these common SNPs can aid in discovery of rarer neighbors through an appropriate
prior specification (e.g., the A matrix in the hierarchical model). With any type of collapsing
strategy, including mine, the choice of how bins are defined is arbitrary and some type of
permutation procedure is necessary to alleviate an increase in type I error from overfitting
the data. My Bayesian method, while also computationally expensive, does not involve
permutation. Through Bayesian model averaging and reporting of BFs, the problem of model
overfitting is handled naturally. I previously demonstrated through simulations that the model
is robust in light of multiple comparisons within the context of discovering interactions [6].

The results from the analyses show that in certain cases, such as when Q1 is modeled as the
outcome, rare variants can make accurate estimates of effect size difficult when operating
under a conventional MLE framework. Hierarchical modeling can be particularly helpful here,
even if the prior distributions are not particularly informative. However, I must provide an
important caveat: the method, which still operates under a standard multivariate regression
framework at the first level of the model, does not appear to work particularly well when rare
variants, such as singleton mutations, are tested directly (i.e., without a collapsing
strategy); convergence problems usually emerge when the design matrix becomes numerically
singular. Thus I was unable to directly evaluate the method’s performance on any one specific
SNP among the extremely rare causal variants. The LASSO (least absolute shrinkage and selection
operator) method [10], another flavor of penalized

regression that provides variable selection, has recently

been extended to allow one to directly test any rare var-

iant by defining bins (e.g., genes) that relax the global

penalization parameter [11]. Although my approach is

more limited in this sense, my model allows the investi-

gator to include an arbitrary number of prior knowledge

sources through columns in a Z matrix, as demon-

strated in the sensitivity analyses presented in Figures 1

and 2. I found that defining a richer prior distribution

on b based on biology could indeed improve power to

detect variants. On closer inspection, I learned that

mutation type (synonymous vs. nonsynonymous) infor-

mation was more beneficial than gene-membership

information for most of the SNP bins, but the opposite

was true for the FLT1 gene. Thus I recommend provid-

ing as much external knowledge as possible in the

model (e.g., adding additional columns in Z). Because

my method is based on empirical Bayes estimates, it is

robust to poor specification of the prior distribution,

because this only leads to increased uncertainty

Table 4 Bayes factors for each causal variable under the Q1 trait model

Causal variable (a)       UNINF     Z only    A only    FULL
ARNT, nonsynonymous       1.14      1.95      1.85      3.99
C4S1884                   36.33     43.31     36.12     47.65
HIF1A, nonsynonymous      0.58      1.14      0.57      1.22
C13S522                   527.13    600.4     764.92    773.35
C1S6533                   109.38    149.45    119.39    162.42
C4S1878                   57.23     92.15     69.16     107.15
C14S1734                  1.47      2.42      1.47      2.58
C13S431                   299.20    327.49    572.87    551.06
FLT1, nonsynonymous       17.25     27.80     81        103.17
C13S523                   998.33    999.07    999.87    999.7

(a) Defined as either a bin of SNPs (shown with conventional gene name and mutation class) or a
single SNP. Only variables with MAF ≥ 0.01 were included for analyses.

Table 5 Bayes factors for each causal variable under the Q2 trait model

Causal variable (a)       UNINF     Z only    A only    FULL
C6S5441                   53.96     77.22     65.32     88.21
SIRT1, nonsynonymous      22.36     34.60     23.92     34.82
C2S354                    11.17     15.29     11.04     16.46
C8S442                    62.14     88.72     64.60     87.26
C6S5449                   64.43     85.17     73.89     94.77
SREBF1, nonsynonymous     60.71     83.47     60.57     85.76
PDGFD, nonsynonymous      49.64     71.03     49.12     70.54
C6S5426                   0.87      1.48      1.25      2.10
C6S5380                   212.1     264.3     210.2     263.0
PLAT, nonsynonymous       9.28      14.54     8.89      14.44
VLDLR, nonsynonymous      17.51     26.85     17.15     26.93
BCHE, nonsynonymous       38.11     55.60     37.50     55.62
LPL, nonsynonymous        1.34      2.43      1.62      2.63

(a) Defined as either a bin of SNPs (shown with conventional gene name and mutation class) or a
single SNP. Only variables with MAF ≥ 0.01 were included for analyses.

(modeled in the prior variances τ² and σ²), asymptotically

reducing the prior distribution on β to a ridge prior distribution.

Clearly, there is a need to develop methods to effec-

tively mine the data for rare variants that confer disease

risk. I am optimistic that my approach is more effective

than other methods in many cases, but it does have the

same limitations as those shared by collapsing-style

methods, particularly the strong assumption that effect

sizes will point in the same direction among SNPs inside

a bin. I am considering other variations of the hierarchi-

cal model that might more flexibly accommodate this

type of heterogeneity. One appealing idea is to include a

new stochastic layer into the algorithm that randomly

groups SNPs into bins (and consequently compresses

the A and Z matrices accordingly). My method cur-

rently permits one to perform a global test of associa-

tion (i.e., are any rare variants associated?) by testing

fixed bins. An important property of enabling flexibility

in bin assignment is that one can additionally perform

local tests of association (i.e., how often does this SNP

appear in any bin?).

Conclusions

I have presented a computationally efficient Bayesian

method that simultaneously provides additional power

to discover rare disease variants and enhances estima-

tion of true effect sizes. Users interested in the algo-

rithm can download C++ source code and binaries from

my website (http://www-hsc.usc.edu/~garykche/).

Acknowledgments

I would like to thank the organizers of Genetic Analysis Workshop 17, the

anonymous manuscript reviewers, and the copy editor Mimi Braverman for

improving the paper. The Workshop is supported by National Institutes of

Health grant R01 GM031575.

This article has been published as part of BMC Proceedings Volume 5

Supplement 9, 2011: Genetic Analysis Workshop 17. The full contents of the

supplement are available online at http://www.biomedcentral.com/1753-

6561/5?issue=S9.

Authors’ contributions

GKC conceived of the study, carried out the statistical analyses, and drafted

the manuscript.

Competing interests

I have no competing interests to declare.

Published: 29 November 2011

References

1. Dickson SP, Wang K, Krantz I, Hakonarson H, Goldstein DB: Rare variants create synthetic
genome-wide associations. PLoS Biol 2010, 8:e1000294.

2. Li B, Leal SM: Methods for detecting associations with rare variants for common diseases:
application to analysis of sequence data. Am J Hum Genet 2008, 83:311-321.

3. Madsen BE, Browning SR: A groupwise association test for rare mutations using a weighted sum
statistic. PLoS Genet 2009, 5:e1000384.

4. Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR: Pooled
association tests for rare variants in exon-resequencing studies. Am J Hum Genet 2010,
86:832-838.

5. Almasy LA, Dyer TD, Peralta JM, Kent JW Jr, Charlesworth JC, Curran JE, Blangero J: Genetic
Analysis Workshop 17 mini-exome simulation. BMC Proc 2011, 5(suppl 9):S2.

6. Chen GK, Thomas DC: Using biological knowledge to discover higher order interactions in
genetic association studies. Genet Epidemiol 2010, 34:863-878.

7. Besag J, York J, Mollie A: Bayesian image restoration, with two applications in spatial
statistics. Ann Inst Stat Math 1991, 43:1-20.

8. Kass RE, Raftery AE: Bayes factors. J Am Stat Assoc 1995, 90:773-795.

9. Dering C, Pugh E, Ziegler A: Statistical analysis of rare sequence variants: an overview of
collapsing methods. Genet Epidemiol 2011, X(suppl X):X-X.

10. Tibshirani R: Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Stat
Methodol 1996, 58:267-288.

11. Zhou H, Sehl ME, Sinsheimer JS, Sobel EM, Lange K: Association screening of common and rare
genetic variants by penalized regression. Bioinformatics 2010, 26:2375-2382.

doi:10.1186/1753-6561-5-S9-S16

Cite this article as: Chen: Enhancing the discovery of rare disease

variants through hierarchical modeling. BMC Proceedings 2011 5(Suppl 9):

S16.
