# Lower-order effects adjustment in quantitative traits model-based multifactor dimensionality reduction.

**ABSTRACT** Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that correcting for lower-order effects reduces empirical power estimates, as well as familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on "top" findings, or SNP selection based on a p-value criterion for interesting main effects result in reduced power but also in almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait. We advocate "on-the-fly" adjustment for lower-order effects when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis.


Jestinah M. Mahachie John (1,2,*), Tom Cattaert (1,2), François Van Lishout (1,2), Elena S. Gusareva (1,2), Kristel Van Steen (1,2)

1 Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium; 2 Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium


Citation: Mahachie John JM, Cattaert T, Van Lishout F, Gusareva ES, Van Steen K (2012) Lower-Order Effects Adjustment in Quantitative Traits Model-Based Multifactor Dimensionality Reduction. PLoS ONE 7(1): e29594. doi:10.1371/journal.pone.0029594

Editor: Yun Li, University of North Carolina, United States of America

Received July 15, 2011; Accepted December 1, 2011; Published January 5, 2012

Copyright: © 2012 Mahachie John et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: All authors acknowledge research opportunities offered by the Belgian Network BioMAGNet (Bioinformatics and Modelling: From Genomes to Networks), funded by the Interuniversity Attraction Poles Programme (Phase VI/4), initiated by the Belgian State, Science Policy Office. Their work was also supported in part by the IST Programme of the European Community, under the PASCAL2 Network of Excellence (Pattern Analysis, Statistical Modelling and Computational Learning), IST-2007-216886. In addition, F. Van Lishout acknowledges support by Alma in Silico, funded by the European Commission and Walloon Region through the Interreg IV Program. Tom Cattaert is a Postdoctoral Researcher of the Funds for Scientific Research – FNRS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: jmahachie@ulg.ac.be

Introduction

Complex diseases are common in the population and are a major source of discomfort, disability and death worldwide. They are believed to arise from multiple predisposing factors, both genetic and non-genetic, each factor potentially modifying the effect of the others. Detecting gene-gene interactions, or epistasis, in studies of human complex diseases is a big challenge in genetic epidemiology. An additional challenge is to account for important lower-order genetic effects in order to reduce false positive epistasis results. To date, several strategies that specifically aim to identify and characterize gene-gene interactions are available within the context of genetic association studies. Among these strategies is Model-Based Multifactor Dimensionality Reduction (MB-MDR), which was first introduced by Calle et al. [1]. MB-MDR tackles the dimensionality problem in interaction detection by reducing a potentially high-dimensional problem to a one-dimensional one: multi-locus genotypes are pooled into three groups based on association testing or modeling. Multi-locus genotypes exhibiting significant evidence of an increased or decreased phenotypic mean are labeled High and Low, respectively. Multi-locus genotypes that either show no evidence of association or have insufficient sample size form a third MB-MDR category, 'No Evidence for association'. It has been suggested that MB-MDR is a useful method for identifying gene-gene interactions in case-control or family-based designs, for both dichotomous and quantitative traits [1,2,3,4,5,6]. For more details on MB-MDR, we refer to the aforementioned articles. Although power studies of MB-MDR with and without main effects adjustment have been performed before [4,6], these studies only involved adjusting for the known functional SNPs contributing to an epistasis effect. Their preliminary results emphasized the importance of lower-order effects adjustment when searching for gene-gene interactions and warranted a more detailed investigation.

PLoS ONE | www.plosone.org1 January 2012 | Volume 7 | Issue 1 | e29594


In this study, we perform a thorough simulation-based investigation of the power of quantitative trait MB-MDR to identify gene-gene interactions, using different strategies to adjust for lower-order genetic effects that may or may not be part of the (functional) SNP-SNP interaction under investigation. The performance criteria used are power and familywise error rate. We perform MB-MDR epistasis analyses first without any adjustment for main effects and then with adjustments using several strategies. The proposed main effects corrections can be grouped into two categories: 1) main effects screening followed by MB-MDR applied to an adjusted trait, and 2) main effects adjustment integrated in steps 1 and 2 of MB-MDR. These are depicted in Figure 1 and described in more detail in the Methods section.

Methods

MB-MDR

We apply quantitative trait MB-MDR as described in Mahachie John et al. [6] and its generalization to main effects corrections. For a sufficiently frequent bi-allelic marker, there are 3 theoretically possible genotypes. Hence, 2 bi-allelic markers give rise to 9 multi-locus genotype cells. Each of the 9 cells in turn constitutes group 1, while the remaining 8 multi-locus genotypes constitute group 2. The key MB-MDR steps are summarized in Figure 2. In MB-MDR step 1, we use a Student t-test at significance level 0.1 to compare the mean trait values in these 2 groups of multi-locus genotypes. In step 2, we use the cell-based results of step 1 to label significant cells as H(igh) or L(ow) and non-significant ones as 'no evidence', O. The sign of the Student t-test statistic distinguishes between H and L: a positive (negative) sign refers to H (L). The result is a new categorical variable with labels H, L and O. A new association test is then performed between this newly created construct and the trait Y. In particular, we consider the maximum of the Student t-tests comparing H-cells versus {L,O}-cells and L-cells versus {H,O}-cells. In step 3, we assess overall significance by adopting a permutation-based maxT correction [7] with 999 replicates. Although in this study we focus on 2-locus interactions, the principle of MB-MDR can be extended to single-SNP analysis (hereafter referred to as MB-MDR1D) and to higher-order (>2) interactions (under construction).
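To make steps 1 and 2 concrete, the following is a minimal Python sketch for a single SNP pair, assuming NumPy and SciPy and genotypes coded as 0/1/2 minor-allele counts. The function name `mbmdr_pair`, the `min_cell_size` threshold used to decide that a cell has "insufficient sample size", and taking the larger of the two squared t-statistics as "the maximum" are illustrative assumptions rather than the authors' implementation; the step 3 permutation correction is not included.

```python
import numpy as np
from scipy import stats


def mbmdr_pair(y, snp1, snp2, alpha=0.1, min_cell_size=10):
    """Sketch of quantitative-trait MB-MDR steps 1-2 for one SNP pair.

    y: trait values; snp1, snp2: genotypes coded as 0/1/2 minor-allele counts.
    Returns the H/L/O label of each of the 9 multi-locus cells and the
    step-2 statistic (larger of the two squared Student t-statistics).
    """
    y = np.asarray(y, dtype=float)
    snp1, snp2 = np.asarray(snp1), np.asarray(snp2)
    labels = {}
    person_label = np.full(y.shape, "O")  # every individual starts in group O
    for g1 in range(3):
        for g2 in range(3):
            in_cell = (snp1 == g1) & (snp2 == g2)
            n_in = int(in_cell.sum())
            lab = "O"
            # Hypothetical sparsity rule: small cells are not tested and stay in O.
            if min_cell_size <= n_in <= y.size - 2:
                # Step 1: Student t-test, this cell (group 1) vs the other 8 (group 2).
                t, p = stats.ttest_ind(y[in_cell], y[~in_cell])
                if p < alpha:
                    # Step 2: the sign of t decides between High and Low.
                    lab = "H" if t > 0 else "L"
            labels[(g1, g2)] = lab
            person_label[in_cell] = lab
    # New association test on the H/L/O construct: H vs {L,O} and L vs {H,O}.
    stat = 0.0
    for lab in ("H", "L"):
        grp = person_label == lab
        if 2 <= grp.sum() <= y.size - 2:
            t, _ = stats.ttest_ind(y[grp], y[~grp])
            stat = max(stat, float(t) ** 2)
    return labels, stat
```

With a simulated trait that is raised only when both SNPs carry at least one minor allele (an M27-like pattern), the (2, 2) cell is labelled H and the (0, 0) cell L, and the step-2 statistic is large; in a real screening this statistic would then be compared with its permutation distribution.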

Several methods exist to correct for lower-order effects in the context of quantitative MB-MDR epistasis screening. An overview of the methods considered in this study is given in Figure 1. A first strategy is to search extensively for potentially confounding main effects, use them to transform the original trait into an adjusted trait, and submit this newly defined trait to MB-MDR for epistasis screening.

When correcting for main effects, a note on how best to code lower-order effects is warranted. In a GWA study, SNPs are often coded additively [8]. This coding works well in practice, although power can be gained by acknowledging the true underlying genetic models [9]. For instance, if the two homozygote genotypes at a locus exhibit the same risk, different from the heterozygote risk (over-dominance), then additive coding will have reduced power irrespective of the sample size [10]. Alternatively, several coding schemes may be investigated and the maximum statistic over the screened main effects models may be selected [11]. The differing, unknown modes of inheritance throughout the genome make it hard to acknowledge this complex inheritance spectrum flexibly and automatically.

Therefore, the route chosen in this paper, now in an epistasis context, is to correct for main effects by assuming either an additive or a co-dominant coding scheme, in scenarios that involve different contributions of additive and dominance variance to main effects variance. Although some of these scenarios may be better captured by codings that are neither additive nor co-dominant, the interest is in finding an all-purpose way, acceptable in terms of power and type I error, to remove the main effects signals that influence epistasis signals. Choosing between additive and co-dominant coding schemes implies choosing between the least and the most severe such removal of effects.

Figure 1. Different approaches to adjust for lower-order effects in MB-MDR epistasis screening.

doi:10.1371/journal.pone.0029594.g001

MB-MDR with Lower-Order Effects Correction

Figure 2. Summary of the steps involved in MB-MDR analysis.

doi:10.1371/journal.pone.0029594.g002
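The difference between the two coding schemes can be illustrated with a small simulation. The sketch below, assuming NumPy and genotypes coded as 0/1/2 minor-allele counts, builds additive (one allele-count column) and co-dominant (heterozygote and minor-homozygote indicator columns) design matrices and compares the trait variance each explains for an over-dominant SNP. The helper names `design` and `r_squared` are hypothetical.

```python
import numpy as np


def design(snp, coding):
    """Main-effect columns (no intercept) for a SNP coded as 0/1/2 minor-allele counts."""
    snp = np.asarray(snp)
    if coding == "additive":
        return snp[:, None].astype(float)  # one column: allele count
    if coding == "codominant":
        # two indicator columns: heterozygote and minor-allele homozygote
        return np.column_stack([snp == 1, snp == 2]).astype(float)
    raise ValueError(f"unknown coding: {coding}")


def r_squared(y, snp, coding):
    """Proportion of trait variance explained by an OLS fit on intercept + SNP columns."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), design(snp, coding)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.binomial(2, 0.5, size=20000)          # minor allele frequency 0.5
    y = 0.5 * (g == 1) + rng.normal(size=g.size)  # heterozygote advantage only
    print("additive R^2:   ", round(r_squared(y, g, "additive"), 4))
    print("co-dominant R^2:", round(r_squared(y, g, "codominant"), 4))
```

With a minor allele frequency of 0.5 and a purely heterozygote-advantage effect, the additive fit explains essentially none of the main effect, whereas the co-dominant fit recovers it; this is the sense in which co-dominant correction is the more severe removal of main effects.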

Main effects screening prior to MB-MDR

This screening procedure involves first adjusting for a chosen subset of main effects via parametric (linear) regression models and then using the residuals from the fitted models as a new trait for MB-MDR. For the adjustment methods involving significance assessments, we remark that whenever none of the SNPs is significant, the original trait is submitted to MB-MDR.
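A minimal sketch of this residual-trait approach is given below, assuming NumPy, genotypes stored in an (n, m) array of 0/1/2 counts, and a list of already selected SNP columns. `adjusted_trait` is a hypothetical helper; returning the original trait when nothing is selected mirrors the rule stated above.

```python
import numpy as np


def adjusted_trait(y, snps, selected, coding="codominant"):
    """Residuals of y after OLS adjustment for the selected SNP main effects.

    snps: (n, m) array of 0/1/2 genotypes; selected: column indices to adjust for.
    If nothing is selected, a copy of the original trait is returned.
    """
    y = np.asarray(y, dtype=float)
    if len(selected) == 0:
        return y.copy()
    cols = [np.ones(len(y))]  # intercept
    for j in selected:
        g = snps[:, j]
        if coding == "additive":
            cols.append(g.astype(float))
        else:  # co-dominant: heterozygote and minor-homozygote indicators
            cols.extend([(g == 1).astype(float), (g == 2).astype(float)])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta  # the new trait submitted to MB-MDR
```

Because the residuals are orthogonal to every adjustment column, any main effect that the chosen coding can represent is removed from the new trait; a main effect the coding cannot represent (for example the dominance part of an over-dominant SNP under additive coding) remains in it.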

Single (univariate) regression-based searches.

Important main effects can be identified via single-SNP regression models, as is done in a classical GWA setting. SNPs that meet a stringent criterion (such as a Bonferroni criterion) are labelled as "important" and are therefore good candidates to correct for in an epistasis screening. In this study, we prefer a less conservative route, namely a selection based on step-down maxT adjusted p-values with 999 replicates (Figure 1; SRperm). However, targeting effects that stand out in a GWA main effects screening while maintaining the overall type I error is quite different from targeting main effects to adjust for in an epistasis screening. Therefore, we also consider selecting "optimal" SNPs for main effects correction in the quantitative MB-MDR screening on the basis of their significance without correction for multiple testing (Figure 1; SR0.05) or on the basis of a ranking of the corresponding raw p-values (Figure 1; SRtop5, SRtop10, SRtop15).
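One standard way to obtain step-down maxT adjusted p-values is the Westfall-Young permutation algorithm. The sketch below assumes NumPy and SciPy, additively coded SNPs, the absolute t-statistic of a single-SNP linear regression as test statistic, the (count + 1)/(B + 1) p-value convention, and a 0.05 threshold for SRperm and SR0.05; these choices and all function names are assumptions for illustration. Raw p-values for the SR0.05 and SRtop-k rules are included for comparison.

```python
import numpy as np
from scipy import stats


def single_snp_t(y, G):
    """|t| statistics of simple linear regressions of y on each column of G."""
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    r = (Gc * yc[:, None]).sum(axis=0) / np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    n = len(y)
    return np.abs(r) * np.sqrt((n - 2) / (1 - r ** 2))


def raw_pvalues(y, G):
    """Unadjusted two-sided p-values (basis for SR0.05 and the SRtop-k rankings)."""
    y, G = np.asarray(y, float), np.asarray(G, float)
    return 2 * stats.t.sf(single_snp_t(y, G), df=len(y) - 2)


def stepdown_maxT(y, G, n_perm=999, seed=0):
    """Westfall-Young step-down maxT adjusted p-values, permuting the trait."""
    rng = np.random.default_rng(seed)
    y, G = np.asarray(y, float), np.asarray(G, float)
    t_obs = single_snp_t(y, G)
    order = np.argsort(-t_obs)  # most significant SNP first
    counts = np.zeros(len(order))
    for _ in range(n_perm):
        t_perm = single_snp_t(rng.permutation(y), G)[order]
        # u[k] = max of permuted statistics over the SNPs ranked k, k+1, ..., m
        u = np.maximum.accumulate(t_perm[::-1])[::-1]
        counts += u >= t_obs[order]
    p_sorted = np.maximum.accumulate((counts + 1) / (n_perm + 1))  # enforce monotonicity
    p_adj = np.empty_like(p_sorted)
    p_adj[order] = p_sorted
    return p_adj


def select_snps(y, G, rule, k=5, n_perm=999):
    """Indices of SNPs to correct for under the SRperm, SR0.05 or SRtop-k rules."""
    if rule == "SRperm":
        return np.flatnonzero(stepdown_maxT(y, G, n_perm) < 0.05)
    raw = raw_pvalues(y, G)
    if rule == "SR0.05":
        return np.flatnonzero(raw < 0.05)
    if rule == "SRtop":
        return np.argsort(raw)[:k]
    raise ValueError(f"unknown rule: {rule}")
```

The step-down structure compares the k-th most significant observed statistic only with the permutation maximum over SNPs ranked k and below, which makes it less conservative than a single-step maxT (or Bonferroni) correction while still controlling the familywise error rate.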

Multiple regression-based searches.

Due to the large number of SNPs involved in a genome-wide main effects analysis, multiple regression-based searches are often automated. One such automated approach uses stepwise selection based on AIC (stepAIC in R package MASS, R 2.10.0). This procedure iteratively adds and/or drops variables to seek the lowest AIC score. The final model generates the list of main effects to correct for in the quantitative trait MB-MDR analysis (Figure 1; MRAIC).
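In R, this search is carried out by `MASS::stepAIC`. Purely as an illustration of the idea, the following Python sketch implements bidirectional stepwise selection by AIC for a Gaussian linear model, assuming NumPy, additively coded SNPs, and a start from the intercept-only model. It is not a reimplementation of `stepAIC`; the AIC is computed up to an additive constant and the function names are hypothetical.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC of an OLS fit up to an additive constant: n*log(RSS/n) + 2*k."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]


def stepwise_aic(y, G, max_steps=100):
    """Bidirectional stepwise selection of SNP columns by AIC.

    Starts from the intercept-only model; at each step the single addition or
    deletion that lowers AIC most is applied. Returns selected column indices.
    """
    y, G = np.asarray(y, float), np.asarray(G, float)
    n, m = G.shape

    def aic_of(sel):
        X = np.column_stack([np.ones(n)] + [G[:, j] for j in sorted(sel)])
        return gaussian_aic(y, X)

    selected = set()
    current = aic_of(selected)
    for _ in range(max_steps):
        moves = [selected | {j} for j in range(m) if j not in selected]
        moves += [selected - {j} for j in selected]
        if not moves:
            break
        scores = [aic_of(s) for s in moves]
        best = int(np.argmin(scores))
        if scores[best] >= current:
            break  # no single addition or deletion lowers AIC
        selected, current = moves[best], scores[best]
    return sorted(selected)
```

Because AIC penalizes each extra term by only 2, such a search readily admits weak or spurious main effects, which fits with the observation that automatic search procedures can over-correct.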

Main effects adjustment as an integral part of MB-MDR

In this scenario, main effects are adjusted for "on-the-fly", i.e. SNPs are adjusted for during the first 2 MB-MDR epistasis screening steps. Three types of adjustment are considered. A first type is to always adjust for the SNPs in the pair under investigation (Figure 1; MB-MDRadjust). Hence, the adjustment is done irrespective of whether a main effect is truly present. A second type is to only adjust for SNPs that are identified as significant by MB-MDR1D. Here, MB-MDR1D is run first and a list of genome-wide significant SNPs is identified (based on step-down maxT with 999 permutation replicates). MB-MDR epistasis screening is then performed, adjusting only for those identified SNPs that belong to the pair under investigation (Figure 1; MB-MDR1D). A third type is to only adjust for significant SNPs obtained via single regression models and maxT significance assessment (Figure 1; MB-MDRlist). Thus, for MB-MDR1D and MB-MDRlist, any of the following 3 situations can arise: a) neither of the 2 SNPs is significant and no correction is performed; b) one of the 2 SNPs is significant and is adjusted for; c) both SNPs are significant and both are adjusted for.

In order to account for potentially important SNPs as an integral part of MB-MDR, the Student t-test in MB-MDR steps 1–2 (Figure 2) is replaced by the Wald test for the interaction effect in a regression framework.
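The on-the-fly variant can be sketched as an ordinary least-squares regression of the trait on an intercept, the indicator of the multi-locus cell under test, and the main-effect columns of whichever SNPs of the pair are to be adjusted for, followed by a Wald test on the cell coefficient. In the sketch below (NumPy and SciPy assumed; function names hypothetical), `adjust=(True, True)` corresponds to MB-MDRadjust, while MB-MDR1D and MB-MDRlist would switch adjustment on only for SNPs flagged in a prior screen, giving situations a) to c). Using a normal reference distribution for the Wald statistic is an assumption.

```python
import numpy as np
from scipy import stats


def main_effect_columns(g, coding):
    """Design columns for one SNP coded as 0/1/2 minor-allele counts."""
    if coding == "additive":
        return [g.astype(float)]
    # co-dominant: heterozygote and minor-allele homozygote indicators
    return [(g == 1).astype(float), (g == 2).astype(float)]


def adjusted_cell_test(y, snp1, snp2, cell, adjust=(True, True), coding="codominant"):
    """Wald test of one (non-empty) multi-locus cell versus all other cells,
    adjusted for the main effects of the SNPs switched on in `adjust`.

    Returns the Wald z statistic and a two-sided p-value (normal reference).
    """
    y = np.asarray(y, dtype=float)
    snp1, snp2 = np.asarray(snp1), np.asarray(snp2)
    in_cell = ((snp1 == cell[0]) & (snp2 == cell[1])).astype(float)
    cols = [np.ones(len(y)), in_cell]  # coefficient of interest has index 1
    for g, use in zip((snp1, snp2), adjust):
        if use:
            cols += main_effect_columns(g, coding)
    X = np.column_stack(cols)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (len(y) - X.shape[1])
    z = beta[1] / np.sqrt(sigma2 * xtx_inv[1, 1])
    return z, 2 * stats.norm.sf(abs(z))
```

In a full screening, this test would replace the Student t-test of step 1 for each of the 9 cells, with the sign of z deciding between H and L exactly as before. When the trait carries only additive main effects, the unadjusted test flags the (2, 2) cell strongly, while the adjusted test does not.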


Data Simulation

Simulated data as generated in Mahachie John et al. [6] are based on two epistasis models for SNP1 and SNP2 that incorporate varying degrees of epistasis: Model M27 and Model M170 of [12]. To have an increased phenotypic mean, M27 requires an individual to carry at least one copy of the minor allele at both loci, whereas M170 requires an individual to be heterozygous at one locus and homozygous at the other. The phenotypic means for these epistasis models take only two values, $\mu_L$ (low phenotypic mean) and $\mu_H$ (high phenotypic mean). The total phenotypic variance $\sigma^2_{tot}$, i.e. the sum of the genetic variance at both loci, $2\sigma^2_1 = \sigma^2_{main}$ (the minor allele frequencies of the functional SNPs are taken to be the same), the epistasis variance $\sigma^2_{epi}$, and the environmental variance $\sigma^2_{env}$, is fixed at 1. As a consequence, the total genetic variance $\sigma^2_g$ of the two-locus model, consisting of main effects variance and epistasis variance, can be interpreted as a broad-sense heritability measure.

SNP1 and SNP2 have MAF equal to p, with $p \in \{0.1, 0.25, 0.5\}$. The MAFs of the other 98 markers are generated from a uniform distribution U(0.05, 0.5). MB-MDR screening is performed on 100 SNPs in Hardy-Weinberg equilibrium and linkage equilibrium. The total genetic variance $\sigma^2_g$ is varied as $\sigma^2_g \in \{0.01, 0.02, 0.03, 0.05, 0.1\}$. The main effects variance $\sigma^2_{main}$ consists of additive variance $\sigma^2_{add}$ and dominance variance $\sigma^2_{dom}$. As p increases, the contribution of epistasis variance to the total genetic variance, relative to main effects variance, increases for M170 and decreases for M27; the contributions of additive and dominance variance to the total main effects variance also change with p (Table 1).

For SNP3 and SNP4, main effects are imposed with associated variances $\sigma^2_3$ and $\sigma^2_4$, selected from a uniform distribution U(0, 0.06), such that the total main effects variance of the 4 loci (SNP1, SNP2, SNP3, SNP4) is $\sigma^2_{main} = 2\sigma^2_1 + \sigma^2_3 + \sigma^2_4$. The respective modes of inheritance for SNP3 and SNP4 are additive and advantageous heterozygote. Note that SNP4 will therefore contribute to both the additive and the dominance components of the main effects variance. This scenario allows us to investigate the effect of global main effects correction approaches for functional SNPs that are not part of a two-locus interaction.

In addition, data are simulated under the null model for the functional pair (i.e. $\sigma^2_g = 0$) in two ways, giving rise to two null hypotheses, H01 and H02. H01: no genetic contribution apart from SNP3 and SNP4 as main effects; H02: no genetic contribution from any of the SNPs whatsoever.

In summary, a total of 36 simulation settings are considered. For each parameter setting, we consider 500 simulation replicates, each involving 2000 unrelated individuals.

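Table 1. Theoretically derived proportions of the genetic variance due to main effects (additive and dominance) or epistasis.

| Model | p | $\sigma^2_{main}/\sigma^2_{gen}$ | $\sigma^2_{add}/\sigma^2_{main}$ | $\sigma^2_{dom}/\sigma^2_{main}$ | $\sigma^2_{epi}/\sigma^2_{gen}$ |
|---|---|---|---|---|---|
| M27 | 0.1 | 0.319 | 0.947 | 0.053 | 0.681 |
| M27 | 0.25 | 0.609 | 0.857 | 0.143 | 0.391 |
| M27 | 0.5 | 0.857 | 0.667 | 0.333 | 0.143 |
| M170 | 0.1 | 0.581 | 0.780 | 0.220 | 0.419 |
| M170 | 0.25 | 0.118 | 0.400 | 0.600 | 0.882 |
| M170 | 0.5 | 0.000 | 0.947 | 0.053 | 1.000 |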

doi:10.1371/journal.pone.0029594.t001
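The entries of Table 1 follow from a standard variance decomposition of the two-valued genetic models under Hardy-Weinberg and linkage equilibrium. The sketch below (NumPy assumed; names hypothetical) codes each model as a 3×3 table of 0 ($\mu_L$) and 1 ($\mu_H$) over minor-allele counts; because every quantity is a ratio, the actual values of $\mu_L$ and $\mu_H$ do not matter. Main effects variance is the variance of the marginal genotype means, its additive part comes from a frequency-weighted regression of those means on allele count, and epistasis variance is the remainder. Ratios with a zero denominator are returned as NaN.

```python
import numpy as np

# Two-valued epistasis models over minor-allele counts (rows: SNP1, columns: SNP2);
# 1 marks the high phenotypic mean mu_H, 0 the low mean mu_L.
M27 = np.array([[0, 0, 0],
                [0, 1, 1],
                [0, 1, 1]], dtype=float)   # >= 1 minor allele at both loci
M170 = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)  # heterozygous at one locus, homozygous at the other


def hwe(p):
    """Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])


def variance_proportions(model, p):
    w = hwe(p)
    joint = np.outer(w, w)                    # linkage equilibrium between the loci
    mu = (joint * model).sum()
    var_gen = (joint * (model - mu) ** 2).sum()
    g = np.arange(3.0)
    g_c = g - (w * g).sum()                   # centred allele count
    var_main = var_add = 0.0
    for marginal in (model @ w, w @ model):   # E[f | SNP1 genotype], E[f | SNP2 genotype]
        var_main += (w * (marginal - mu) ** 2).sum()
        beta = (w * g_c * marginal).sum() / (w * g_c ** 2).sum()
        var_add += beta ** 2 * (w * g_c ** 2).sum()
    var_dom = var_main - var_add

    def ratio(a, b):
        return a / b if b > 1e-12 else float("nan")

    return {"main/gen": ratio(var_main, var_gen),
            "add/main": ratio(var_add, var_main),
            "dom/main": ratio(var_dom, var_main),
            "epi/gen": ratio(var_gen - var_main, var_gen)}
```

For example, `variance_proportions(M27, 0.1)` reproduces the first row of Table 1 to three decimals. For M170 with p = 0.5 the marginal genotype means are all equal, so the main effects variance is exactly zero and all genetic variance is epistatic.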


Results

Familywise error rates and false positive rates

Table 2 shows results for settings simulated under the null hypotheses H01 and H02 of no genetic association of the interacting pair with the trait, in the presence (H01) or absence (H02) of additional main effects (SNP3 and SNP4).

We observe that MB-MDR type I error percentages are close to the nominal type I error rate of 5% when no correction for main effects is performed, under settings where no additional main effects act on the quantitative trait. Type I error rates are also kept under control when correction for main effects is integrated in MB-MDR epistasis screening (MB-MDRadjust, MB-MDR1D and MB-MDRlist), as well as when it is performed prior to MB-MDR with the permutation-based single regression approach (SRperm). This holds for additive correction under H02 and for co-dominant correction under both H01 and H02. When additional main effects are present in the data, adjusting for their effects using additive correction gives rise to inflated type I error rates ranging from 55 to 74%. In contrast, when adopting a co-dominant correction, type I error is under control, with MRAIC and the single regression-based correction methods (except SRperm) being extremely conservative (Table 2: type I error rates are close to zero).

Table 2. Type I error percentages for data generated under the null hypothesis of no genetic association of the interacting pair.

Without correction ("present" and "absent" refer to the additional main effects of SNP3 and SNP4):

| p | Present | Absent |
|---|---|---|
| 0.1 | 0.982 | **0.046** |
| 0.25 | 0.984 | **0.050** |
| 0.5 | 0.982 | **0.050** |

With correction:

| Way of correction | p | Additive, present | Additive, absent | Co-dominant, present | Co-dominant, absent |
|---|---|---|---|---|---|
| MB-MDRadjust | 0.1 | 0.676 | **0.048** | **0.040** | **0.052** |
| | 0.25 | 0.710 | **0.034** | **0.054** | **0.038** |
| | 0.5 | 0.740 | **0.044** | **0.036** | **0.050** |
| MB-MDR1D | 0.1 | 0.676 | **0.036** | **0.058** | **0.030** |
| | 0.25 | 0.722 | **0.040** | **0.042** | **0.036** |
| | 0.5 | 0.746 | **0.040** | **0.038** | **0.030** |
| MB-MDRlist | 0.1 | 0.682 | **0.038** | **0.056** | **0.030** |
| | 0.25 | 0.726 | **0.036** | **0.044** | **0.032** |
| | 0.5 | 0.748 | **0.046** | **0.040** | **0.032** |
| SRperm | 0.1 | 0.628 | **0.038** | **0.048** | **0.030** |
| | 0.25 | 0.660 | **0.036** | **0.058** | **0.030** |
| | 0.5 | 0.678 | **0.046** | **0.044** | **0.032** |
| SR0.05 | 0.1 | 0.576 | 0.014 | 0.006 | 0.010 |
| | 0.25 | 0.604 | 0.006 | 0.012 | 0.000 |
| | 0.5 | 0.636 | 0.022 | 0.008 | 0.008 |
| MRAIC | 0.1 | 0.552 | 0.008 | 0.000 | 0.002 |
| | 0.25 | 0.578 | 0.004 | 0.002 | 0.000 |
| | 0.5 | 0.616 | 0.012 | 0.000 | 0.006 |
| SRtop5 | 0.1 | 0.582 | 0.014 | 0.008 | 0.008 |
| | 0.25 | 0.616 | 0.002 | 0.020 | 0.000 |
| | 0.5 | 0.638 | **0.026** | 0.010 | 0.010 |
| SRtop10 | 0.1 | 0.560 | 0.012 | 0.000 | 0.008 |
| | 0.25 | 0.592 | 0.006 | 0.004 | 0.000 |
| | 0.5 | 0.626 | 0.022 | 0.006 | 0.006 |
| SRtop15 | 0.1 | 0.556 | 0.010 | 0.000 | 0.006 |
| | 0.25 | 0.588 | 0.006 | 0.002 | 0.000 |
| | 0.5 | 0.618 | 0.016 | 0.004 | 0.006 |

Results are for scenarios with and without additional main effects (SNP3 and SNP4) contributing to the genetic variance. In bold are values within Bradley's liberal criterion of robustness.

doi:10.1371/journal.pone.0029594.t002

False positive rate estimates generated by MB-MDR (i.e., the proportion of simulation replicates in which one or more interacting pairs other than the causal SNP pair (SNP1, SNP2) are significant), using no correction or an additive or co-dominant correction of main effects, are shown in Figure 3. When no correction is performed, false positive rate estimates are around 100% under both the M170 and M27 epistasis models. In general, false positive rate estimates range from 53 to 100% for additive correction, whereas for co-dominant correction they are lower and range from 0 to 19%. In particular, false positive rates for MB-MDRadjust (always adjusting for the main effects of the SNPs in the pair) in a co-dominant way range from 4 to 7%, rates that lie within the interval (0.025, 0.075) and thus satisfy Bradley's [13] liberal criterion of robustness. This criterion regards the type I error rate as controlled at significance level $\alpha$ if the empirical type I error rate $\hat{\alpha}$ satisfies $0.5\alpha \le \hat{\alpha} \le 1.5\alpha$. For MB-MDR1D and MB-MDRlist, false positive rates are not kept under control. The numerical results behind the false positive profiles plotted in Figure 3 are presented in Table S1 for M170 and Table S2 for M27. The main reason for the higher false positive rates under additive correction is that SNP4 contributes to both the additive and the dominance components of the main effects variance, so additive correction leaves part of its effect in the trait. Hence, there is also a higher chance of identifying 'significant' interactions for pairs involving SNP4. False positive rates are reduced when co-dominant correction is performed. Table 3 shows observed false positive rates for pairs involving SNP3 and/or SNP4 under additive and co-dominant correction; only MB-MDRadjust ("on-the-fly" adjustment) results are shown. From Table 3, for $\sigma^2_g > 0$, we observe that under additive coding, false positive rates range from 51 to 61% for the interaction between SNP3 and SNP4. However, for interactions with SNP3 (excluding the SNP3-SNP4 interaction), false positive rates range from 0 to 6%, except for M27 with p=0.5 and $\sigma^2_g$ of 0.05 and 0.1, where false positive rates are 27 and 92%, respectively. As observed in Table 1, model M27 with p=0.5 has the highest relative contribution of dominance variance; hence, additive correction does not fully account for SNP1 and SNP2.
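The performance measures used here can be computed directly from per-replicate lists of significant pairs. The sketch below assumes each replicate's result is a list of significant SNP-name pairs after maxT correction; power is the proportion of replicates in which the causal pair (SNP1, SNP2) is significant, the false positive rate is the proportion in which any other pair is significant (as defined for Figure 3), and Bradley's liberal criterion checks $0.5\alpha \le \hat{\alpha} \le 1.5\alpha$. The function names and the input format are hypothetical.

```python
def power_and_false_positive_rate(replicates, causal=("SNP1", "SNP2")):
    """Empirical power and false positive rate from per-replicate significant pairs.

    replicates: one list per simulation replicate, holding the SNP pairs that
    were significant after multiple-testing correction. Pair order is ignored.
    """
    causal = frozenset(causal)
    n = len(replicates)
    n_found = sum(any(frozenset(p) == causal for p in reps) for reps in replicates)
    n_false = sum(any(frozenset(p) != causal for p in reps) for reps in replicates)
    return n_found / n, n_false / n


def within_bradley_liberal(alpha_hat, alpha=0.05):
    """Bradley's liberal robustness criterion: 0.5*alpha <= alpha_hat <= 1.5*alpha."""
    return 0.5 * alpha <= alpha_hat <= 1.5 * alpha
```

Under a null scenario such as H01 or H02 there is no causal pair, so every significant pair counts as a false positive and the false positive rate equals the empirical familywise error rate reported in Table 2; with alpha = 0.05, for instance, 0.040 passes the criterion while 0.676 does not.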

Empirical power estimates

Power profiles of MB-MDR to detect the correct interacting pair (SNP1, SNP2), without and with different ways of adjusting for main effects, are shown in Figure 4. Empirical power estimates are presented in Table S1 for M170 and Table S2 for M27. In this section, we focus on scenarios where main effects contribute substantially to the genetic variance (M170: p=0.1; M27: p=0.25 and 0.5). For a detailed view on the decomposition of variance into main and epistatic effects, we refer to [6]. Under these scenarios, the profile without correction always has the highest power. Under M170, the empirical power estimates for this profile range from 33 to 100% for p=0.1. Under M27, they range from 27 to 100% and from 15 to 100% for p=0.25 and 0.5, respectively. Irrespective of whether main effects are corrected for using additive or co-dominant coding, the profiles for the multiple regression-based method (MRAIC) and for the single regression-based methods that do not involve multiple testing correction (SR0.05, SRtop5, SRtop10 and SRtop15) tend to follow the same trajectory, giving rise to the lowest empirical power estimates. With additive adjustments, empirical power estimates for these corrective methods range from 0 to 100% for both models M27 and M170. With co-dominant adjustments, power estimates range from 0 to 93% for M170 with p=0.1, and from 0 to 100% and from 0 to 18% for M27 with p=0.25 and p=0.5, respectively. Estimates for MB-MDRadjust (a corrective method integrated as part of MB-MDR) range from 6 to 100% for M170 with p=0.1, from 3 to 100% for M27 with p=0.25, and from 1 to 100% for M27 with p=0.5 when additive corrections are performed. Under co-dominant corrections, the estimates range from 4 to 100% for M170 (p=0.1), and from 4 to 100% and from 0 to 68% for M27 with p=0.25 and p=0.5, respectively.

Discussion

The identification of genetic susceptibility loci for human

complex diseases has been rather successful due to the ability to

Figure 3. False positive percentages of MB-MDR based on additive (A) and co-dominant (B) correction. False positive percentage is

defined as the proportion of simulation samples for which pairs other than the causal pair (SNP1, SNP2) are significant.

doi:10.1371/journal.pone.0029594.g003

MB-MDR with Lower-Order Effects Correction

PLoS ONE | www.plosone.org6January 2012 | Volume 7 | Issue 1 | e29594

Page 7

combine different genome-wide association studies via meta-

analyses. In the quest for the missing heritability, genome-wide

association interaction studies have become increasingly popular

and the field shows a boost in methodological developments [14].

When lower-order effects are not appropriately accounted for in epistasis screening, the derived results may not be trustworthy and conclusions about genuine epistasis may be unfounded.

Indeed, the challenge is to find epistatic effects above and beyond the contributions of individual markers, should there be any. In this work, we investigated the power of MB-MDR for quantitative traits in unrelated individuals when targeting gene–gene interactions while accounting for potential main effects.

As already observed in [6], MB-MDR adequately controls the type I error rate at 5% when no association is present (null data). Under additive corrections, type I error and false positive rates are high irrespective of the adjustment method considered, but they are controlled under co-dominant corrections. This is due to the presence of SNP4, which was simulated with both additive and dominance effects (heterozygote advantage). Hence, additive adjustment does not fully remove the effect of SNP4. As shown in Table 3, the consequence is that a number of SNPs appear to interact significantly with SNP4. Not surprisingly, this occurs more often under additive correction than under co-dominant correction: when we correct for main effects using the co-dominant model, we remove the entire effect of SNP4, and hence false positive results arise only by chance (5% nominal error rate). When no main effects adjustment is implemented, MB-MDR gives even higher false positive rates.
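Why additive adjustment leaves part of a heterozygote-advantage effect behind can be seen on a single simulated SNP. The minimal sketch below, assuming genotypes coded as 0/1/2 minor alleles and arbitrary effect sizes, allele frequency and seed (not the simulation settings of this study), residualizes a trait once on the additive 0/1/2 coding and once on a co-dominant coding with two genotype indicators, then reports the mean residual per genotype class.

```python
import numpy as np

rng = np.random.default_rng(2012)
n = 2000
maf = 0.3  # hypothetical minor allele frequency
# Genotypes coded as the number of minor alleles (0, 1, 2).
g = rng.binomial(2, maf, size=n)
het = (g == 1).astype(float)
# Hypothetical trait: small additive effect plus heterozygote advantage.
y = 0.2 * g + 0.8 * het + rng.normal(size=n)


def residualize(y, X):
    """Residuals of an ordinary least-squares fit of y on X (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


ones = np.ones(n)
X_additive = np.column_stack([ones, g])
X_codominant = np.column_stack([ones, (g == 1).astype(float), (g == 2).astype(float)])

r_add = residualize(y, X_additive)
r_cod = residualize(y, X_codominant)

# Mean residual per genotype class (0, 1, 2).
for name, r in [("additive", r_add), ("co-dominant", r_cod)]:
    means = [r[g == k].mean() for k in range(3)]
    print(name, np.round(means, 3))
```

After additive adjustment the heterozygotes keep a positive mean residual (about 0.8 × 0.42 ≈ 0.34 in expectation for these settings), which a subsequent interaction screen can pick up. After co-dominant adjustment the residual mean is numerically zero in every genotype class, because the intercept and the two indicators span all three genotype means.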

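Table 3. False positive percentages of MB-MDRadjust involving SNP3 and/or SNP4.

| Model | p | g2 | Additive: SNP3_any other than SNP4 | Additive: SNP3_SNP4 | Additive: SNP4_any other than SNP3 | Co-dominant: SNP3_any other than SNP4 | Co-dominant: SNP3_SNP4 | Co-dominant: SNP4_any other than SNP3 |
|---|---|---|---|---|---|---|---|---|
| H01 | 0.1 | | 0.002 | 0.520 | 0.660 | 0.000 | 0.000 | 0.000 |
| H01 | 0.25 | | 0.000 | 0.556 | 0.688 | 0.000 | 0.000 | 0.000 |
| H01 | 0.5 | | 0.002 | 0.608 | 0.722 | 0.004 | 0.000 | 0.002 |
| M170 | 0.1 | 0.01 | 0.002 | 0.584 | 0.704 | 0.004 | 0.000 | 0.000 |
| M170 | 0.1 | 0.02 | 0.008 | 0.582 | 0.724 | 0.002 | 0.000 | 0.000 |
| M170 | 0.1 | 0.03 | 0.000 | 0.572 | 0.690 | 0.000 | 0.000 | 0.000 |
| M170 | 0.1 | 0.05 | 0.008 | 0.534 | 0.676 | 0.002 | 0.000 | 0.000 |
| M170 | 0.1 | 0.1 | 0.072 | 0.540 | 0.752 | 0.000 | 0.000 | 0.000 |
| M170 | 0.25 | 0.01 | 0.002 | 0.598 | 0.714 | 0.000 | 0.000 | 0.004 |
| M170 | 0.25 | 0.02 | 0.000 | 0.558 | 0.712 | 0.002 | 0.000 | 0.002 |
| M170 | 0.25 | 0.03 | 0.000 | 0.544 | 0.686 | 0.000 | 0.000 | 0.000 |
| M170 | 0.25 | 0.05 | 0.004 | 0.536 | 0.706 | 0.002 | 0.000 | 0.000 |
| M170 | 0.25 | 0.1 | 0.032 | 0.566 | 0.738 | 0.000 | 0.000 | 0.000 |
| M170 | 0.5 | 0.01 | 0.000 | 0.526 | 0.664 | 0.000 | 0.000 | 0.002 |
| M170 | 0.5 | 0.02 | 0.000 | 0.588 | 0.708 | 0.000 | 0.000 | 0.002 |
| M170 | 0.5 | 0.03 | 0.002 | 0.544 | 0.692 | 0.002 | 0.000 | 0.002 |
| M170 | 0.5 | 0.05 | 0.002 | 0.550 | 0.666 | 0.000 | 0.000 | 0.000 |
| M170 | 0.5 | 0.1 | 0.002 | 0.528 | 0.662 | 0.002 | 0.000 | 0.000 |
| M27 | 0.1 | 0.01 | 0.002 | 0.532 | 0.662 | 0.000 | 0.000 | 0.000 |
| M27 | 0.1 | 0.02 | 0.000 | 0.564 | 0.690 | 0.000 | 0.000 | 0.000 |
| M27 | 0.1 | 0.03 | 0.000 | 0.554 | 0.680 | 0.002 | 0.000 | 0.000 |
| M27 | 0.1 | 0.05 | 0.000 | 0.562 | 0.704 | 0.002 | 0.000 | 0.000 |
| M27 | 0.1 | 0.1 | 0.000 | 0.518 | 0.638 | 0.000 | 0.000 | 0.000 |
| M27 | 0.25 | 0.01 | 0.002 | 0.512 | 0.652 | 0.000 | 0.000 | 0.002 |
| M27 | 0.25 | 0.02 | 0.004 | 0.520 | 0.682 | 0.004 | 0.000 | 0.000 |
| M27 | 0.25 | 0.03 | 0.000 | 0.562 | 0.700 | 0.002 | 0.000 | 0.000 |
| M27 | 0.25 | 0.05 | 0.000 | 0.546 | 0.700 | 0.000 | 0.000 | 0.002 |
| M27 | 0.25 | 0.1 | 0.042 | 0.564 | 0.734 | 0.002 | 0.000 | 0.000 |
| M27 | 0.5 | 0.01 | 0.000 | 0.546 | 0.672 | 0.000 | 0.000 | 0.002 |
| M27 | 0.5 | 0.02 | 0.020 | 0.508 | 0.684 | 0.000 | 0.000 | 0.000 |
| M27 | 0.5 | 0.03 | 0.060 | 0.518 | 0.706 | 0.000 | 0.000 | 0.002 |
| M27 | 0.5 | 0.05 | 0.272 | 0.536 | 0.806 | 0.000 | 0.000 | 0.000 |
| M27 | 0.5 | 0.1 | 0.912 | 0.590 | 0.974 | 0.000 | 0.000 | 0.000 |

False positive percentages shown are for identifying an interaction between SNP3 and SNP4, and for interactions between SNP3 or SNP4 and at least one other SNP, for the null data scenario under H01 and for models M170 and M27.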

doi:10.1371/journal.pone.0029594.t003


The lower power profiles under co-dominant corrections in Figure 4 are explained by the different contributions of additive and dominance effects to the total main effects variance, as shown in Table 1. As mentioned before, when the dominance effect contributes substantially, additive coding does not fully remove the main effect contribution of the interacting SNPs. For instance, under M27, when the contribution of main effects is maximal (p = 0.5), almost 33% of the main effects variance is due to dominance, hence the large difference in power profiles between additive and co-dominant codings.
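The split of a locus's main effects variance into additive and dominance parts follows the standard quantitative-genetics decomposition. In Falconer's parameterization, with genotypic values +a, d and −a for A1A1, A1A2 and A2A2, allele frequency p for A1 and q = 1 − p, the average effect of an allele substitution is α = a + d(q − p), the additive variance is V_A = 2pqα² and the dominance variance is V_D = (2pqd)². The sketch below, assuming Hardy-Weinberg equilibrium, evaluates these textbook formulas; the parameter values are arbitrary and are not those of models M27 or M170, which are given in Table 1.

```python
def variance_components(a, d, p):
    """Additive and dominance variance of a bi-allelic locus (Falconer's parameterization).

    Genotypic values: A1A1 = +a, A1A2 = d, A2A2 = -a; p is the frequency of A1.
    Assumes Hardy-Weinberg equilibrium.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)          # average effect of an allele substitution
    v_add = 2 * p * q * alpha ** 2   # additive genetic variance V_A
    v_dom = (2 * p * q * d) ** 2     # dominance variance V_D
    return v_add, v_dom


def dominance_fraction(a, d, p):
    """Share of the locus's genotypic variance that is due to dominance."""
    v_add, v_dom = variance_components(a, d, p)
    return v_dom / (v_add + v_dom)


# Illustrative values only: complete dominance (d = a) at p = 0.5.
print(dominance_fraction(1.0, 1.0, 0.5))
print(dominance_fraction(1.0, 0.0, 0.3))  # purely additive locus
```

With d = a and p = 0.5, V_A = 0.5 and V_D = 0.25, so one third of the locus variance is dominance variance that an additive (0/1/2) adjustment cannot absorb; with d = 0 the fraction is zero and additive adjustment suffices.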

Interestingly, easy-to-use automatic subset selection procedures (MRAIC) and single-regression-based identification of important main effects prior to MB-MDR screening result in lower power and almost zero false positive rates. Often, a list of top SNPs is generated to derive genetic risk scores for a disease. Some of these SNPs may reach user-defined significance, some may even reach genome-wide significance, and some may not be significant at all. Hence, correcting for the SNPs in such a list (e.g. the top 5, 10 or 15) may remove more of the trait's variability than necessary, especially when no correction for multiple testing is performed. Note that we considered a minimum of 5 top findings since at least 4 SNPs were allowed to contribute to the main effects variance.
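To make the "top-k" residual strategy discussed above concrete, the sketch below ranks SNPs by single-SNP regression evidence, regresses the trait jointly on the k highest-ranked SNPs, and returns the residuals as the new trait for subsequent epistasis screening. It is a hypothetical illustration in the spirit of SRtop5/10/15, not the exact procedure used in this study: the function name is ours, genotypes are assumed coded 0/1/2 with additive coding only, SNPs are assumed polymorphic, and ranking by absolute correlation is used because, for a fixed sample size, it orders SNPs identically to the simple-regression p-value.

```python
import numpy as np


def srtop_k_residuals(y, G, k):
    """Residualize y on the k SNPs with the strongest single-SNP association.

    y: (n,) quantitative trait; G: (n, m) genotypes coded 0/1/2 (additive coding).
    Assumes no monomorphic SNP columns (zero variance would divide by zero).
    """
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    # Pearson correlation of each SNP with the trait.
    r = (Gc * yc[:, None]).sum(axis=0) / np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    top = np.argsort(-np.abs(r))[:k]
    # Joint multiple regression on the selected SNPs (plus intercept).
    X = np.column_stack([np.ones(len(y)), G[:, top]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta, top


# Hypothetical data: 50 SNPs, only SNP 0 and SNP 1 have main effects.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(500, 50)).astype(float)
y = 0.5 * G[:, 0] + 0.4 * G[:, 1] + rng.normal(size=500)
resid, top = srtop_k_residuals(y, G, k=5)
print(sorted(top.tolist()))
```

In this example only two SNPs carry a main effect, yet three further SNPs enter the adjustment simply because k = 5, which illustrates how a fixed-length list can remove more of the trait's variability than necessary.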

To attain sufficient power, any main effects corrective method that leads to over-correction during epistasis screening should be avoided. All residual-based approaches considered (MRAIC, SR0.05, SRperm, SRtop5, SRtop10, SRtop15) led to uncontrolled false positive rates. This can be explained either by the way the residuals were obtained (inappropriate main effects coding) or by the non-exhaustive list of markers considered in the residual computation.

Only co-dominant correction for significant SNPs as an integral part of MB-MDR screening performs much better. However, the poor performance of MB-MDR1D and MB-MDRlist and the excellent performance of MB-MDRadjust in terms of controlling false positive epistasis rates support the intuition that it (only) matters to correct for those SNPs that are involved in the SNP pair under investigation, when no other SNPs are expected to modify the effect of that pair.

The aforementioned discussion clearly raises questions about how best to correct for lower-order effects when higher-order (>2) interactions are targeted. In any case, to aid the interpretation of results, it is always good practice to assess the joint information of clusters of SNPs that contribute to the trait variability [15].

Finally, we emphasize that most statistical epistasis detection methods can be decomposed into a core component and a multiple testing correction component. Keeping the core component but using a more refined multiple testing correction can generally enhance a method's performance. For instance, the assumptions underlying the maxT procedure of [7], which is implemented in MB-MDR, are likely to be violated for MB-MDR1D and MB-MDRlist. Indeed, the null and alternative hypotheses per pair of SNPs under investigation are no longer the same for all interaction tests.
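The maxT procedure of [7] is only summarized in words above. As an illustration, the sketch below computes single-step maxT adjusted p-values from permutation replicates. It is a generic textbook version written for this discussion rather than MB-MDR's implementation; the function name, the convention that larger statistics are more significant, and the +1 added to numerator and denominator are our assumptions. Because every pair's observed statistic is compared with one common permutation distribution of the maximum statistic, the procedure implicitly treats all pairwise tests as comparable, which is the property that pair-specific adjustment sets in MB-MDR1D and MB-MDRlist may undermine.

```python
import numpy as np


def maxt_single_step(observed, permuted):
    """Single-step maxT adjusted p-values (sketch).

    observed: (m,) test statistics, larger = more significant.
    permuted: (B, m) statistics recomputed on B permuted data sets.
    Each observed statistic is compared with the permutation distribution
    of the maximum statistic over all m tests, controlling the familywise error rate.
    """
    observed = np.asarray(observed, dtype=float)
    max_null = np.asarray(permuted, dtype=float).max(axis=1)       # (B,) maxima
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)  # (m,) counts
    return (exceed + 1) / (len(max_null) + 1)


# Toy example: two tests, three permutation replicates.
adjusted = maxt_single_step([5.0, 1.0], [[1.0, 2.0], [3.0, 0.5], [0.2, 6.0]])
print(adjusted)  # [0.5 1. ]
```

Here the permutation maxima are 2, 3 and 6, so the first statistic (5.0) is exceeded once, giving (1 + 1)/(3 + 1) = 0.5, and the second (1.0) is exceeded three times, giving 1.0.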

In conclusion, rather than adjusting for lower-order effects prior to MB-MDR and using residuals as the new trait, or adjusting only for significant SNP(s), we advocate an “on-the-fly” main effects adjustment (MB-MDRadjust). This type of adjustment only removes potential main effects contributions in the pair under investigation, but keeps the null and alternative hypotheses similar from one pair of SNPs to another. We have shown that the commonly used additive coding in the “on-the-fly” adjustment (MB-MDRadjust) is not sufficient and leads to overly optimistic results, and that co-dominant adjustments are to be preferred. This ensures an acceptable balance between type I error and power to identify the interactions.
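To make the idea of an “on-the-fly” co-dominant adjustment concrete, the sketch below residualizes the trait on co-dominant main effects of only the two SNPs in the pair under test; the residuals would then be passed to the interaction test for that pair. It is a simplified stand-in written for illustration: the function names are hypothetical, genotypes are assumed coded 0/1/2, and MB-MDR's own association testing and permutation steps are not reproduced.

```python
import numpy as np


def codominant_design(g):
    """Two indicator columns (heterozygote, minor-allele homozygote) for genotypes 0/1/2."""
    g = np.asarray(g)
    return np.column_stack([(g == 1), (g == 2)]).astype(float)


def adjust_pair_codominant(y, g1, g2):
    """Residualize y on co-dominant main effects of the two SNPs in the pair only.

    The adjustment set is always the pair under test, so every pair is examined
    against the same kind of null hypothesis: no effect beyond the two SNPs'
    own (additive and dominance) main effects.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), codominant_design(g1), codominant_design(g2)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Hypothetical pair with main effects only (a dominance effect for SNP A).
rng = np.random.default_rng(7)
g1 = rng.binomial(2, 0.25, size=1000)
g2 = rng.binomial(2, 0.4, size=1000)
y = 0.5 * (g1 == 1) + 0.3 * g2 + rng.normal(size=1000)
resid = adjust_pair_codominant(y, g1, g2)
```

Because the intercept and the two indicators of each SNP span all three genotype means of that SNP, the residuals sum to zero within every genotype class of either SNP; any remaining signal in the joint genotype cells is therefore not attributable to the pair's main effects.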

Realistic settings often involve both additive and dominance genetic effects on the trait under investigation. Equivalent to our co-dominant coding, a perhaps biologically more meaningful coding involves introducing 2 variables X1 and X2 with values −1, 0, 1 and −1/2, 1/2, −1/2, respectively, for the homozygous wild type, heterozygote and homozygous mutant genotypes. In such a coding scheme, both the additive and the dominance scales are represented. This 2-parameter coding is statistically attractive since it is invariant to allele coding (i.e. to whether the homozygous wild type or the homozygous mutant genotype is coded as 1 for X1) [16]. The utility of this coding as a way to adjust for lower-order effects in MB-MDR higher-order epistasis screening will be the subject of future research.

Figure 4. Power to identify the pair (SNP1, SNP2) as significant for additive (A) and co-dominant (B) correction.

doi:10.1371/journal.pone.0029594.g004
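The equivalence with co-dominant coding and the invariance to allele coding can be checked numerically. The sketch below, assuming genotypes coded 0/1/2 for homozygous wild type, heterozygote and homozygous mutant and arbitrary simulated data, builds X1 (−1, 0, 1) and X2 (−1/2, 1/2, −1/2), then compares the fitted values of a linear model under this coding, under the same coding with the alleles swapped, and under two genotype indicators. Function names and simulation settings are hypothetical.

```python
import numpy as np


def x1_x2_coding(g):
    """Additive (X1) and dominance (X2) covariates for genotypes coded 0, 1, 2.

    0 = homozygous wild type, 1 = heterozygote, 2 = homozygous mutant.
    X1 takes -1, 0, 1 and X2 takes -1/2, 1/2, -1/2.
    """
    g = np.asarray(g)
    x1 = g - 1.0
    x2 = np.where(g == 1, 0.5, -0.5)
    return np.column_stack([x1, x2])


def fitted(y, X):
    """Fitted values of an OLS regression of y on X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


rng = np.random.default_rng(3)
g = rng.binomial(2, 0.35, size=300)
y = 0.4 * g + 0.6 * (g == 1) + rng.normal(size=300)

fit_x = fitted(y, x1_x2_coding(g))
fit_flipped = fitted(y, x1_x2_coding(2 - g))  # swap which allele is called "mutant"
fit_dummy = fitted(y, np.column_stack([(g == 1), (g == 2)]).astype(float))
```

Swapping the alleles only changes the sign of X1 and leaves X2 unchanged, so all three designs span the same column space (given that all three genotypes occur) and yield identical fitted values and residuals; this is why the X1/X2 coding removes exactly the same main-effect signal as the co-dominant coding.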

Software

The MB-MDR software with the MB-MDRadjust option is available upon request from the first author (jmahachie@ulg.ac.be).

Supporting Information

Table S1

MB-MDR power and false positives under the epistasis model M170. The false positive percentage is defined as the proportion of simulation samples for which at least one pair other than the causal pair (SNP1, SNP2) is significant. Power is defined as the proportion of simulated samples in which the causal pair (SNP1, SNP2) is significant. Results are for correction of main effects and for different ways of main effect correction. Values in bold are within Bradley's liberal criterion of robustness.

(DOC)

Table S2

MB-MDR power and false positives under the epistasis model M27. The false positive percentage is defined as the proportion of simulation samples for which at least one pair other than the causal pair (SNP1, SNP2) is significant. Power is defined as the proportion of simulated samples in which the causal pair (SNP1, SNP2) is significant. Results are for correction of main effects and for different ways of main effect correction. Values in bold are within Bradley's liberal criterion of robustness.

(DOC)

Acknowledgments

We acknowledge the valuable discussions with Andreas Ziegler.

Author Contributions

Conceived and designed the experiments: JMMJ TC ESG KVS. Performed the experiments: JMMJ KVS. Analyzed the data: JMMJ KVS. Contributed reagents/materials/analysis tools: JMMJ TC ESG FVL KVS. Wrote the paper: JMMJ TC ESG FVL KVS.

References

1. Calle ML, Urrea V, Vellalta G, Malats N, Van Steen K (2008a) Model-Based Multifactor Dimensionality Reduction for detecting interactions in high-dimensional genomic data. Department of Systems Biology, University of Vic, Spain website. Available: http://www.recercat.net/handle/2072/5001. Accessed 2011 Jun 30.

2. Calle ML, Urrea V, Vellalta G, Malats N, Van Steen K (2008b) Improving strategies for detecting genetic patterns of disease susceptibility in association studies. Stat Med 27: 6532–6546.

3. Cattaert T, Calle ML, Dudek SM, Mahachie John JM, Van Lishout F, et al. (2011) Model-Based Multifactor Dimensionality Reduction for detecting epistasis in case–control data in the presence of noise. Annals of Human Genetics 75: 78–89.

4. Cattaert T, Urrea V, Naj AC, De Lobel L, De Wit V, et al. (2010) FAM-MDR: a flexible family-based multifactor dimensionality reduction technique to detect epistasis using related individuals. PLoS ONE.

5. Mahachie John JM, Baurecht H, Rodriguez E, Naumann A, Wagenpfeil S, et al. (2010) Analysis of the high affinity IgE receptor genes reveals epistatic effects of FCER1A variants on eczema risk. Allergy 65: 875–882.

6. Mahachie John JM, Van Lishout F, Van Steen K (2011) Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data. Eur J Hum Genet 19: 696–703.

7. Westfall PH, Young SS (1993) Resampling-based multiple testing. New York: Wiley.

8. Gauderman WJ, Thomas DC, Murcray CE, Conti D, Li D, et al. (2010) Efficient Genome-Wide Association Testing of Gene-Environment Interaction in Case-Parent Trios. American Journal of Epidemiology 172: 116–122.

9. Slager SL, Schaid DJ (2001) Case-Control Studies of Genetic Markers: Power and Sample Size Approximations for Armitage's Test for Trend. Human Heredity 52: 149–153.

10. Balding DJ (2006) A tutorial on statistical methods for population association studies. Nat Rev Genet 7: 781–791.

11. Hothorn LA, Hothorn T (2009) Order-restricted Scores Test for the Evaluation of Population-based Case–control Studies when the Genetic Model is Unknown. Biometrical Journal 51: 659–669.

12. Evans DM, Marchini J, Morris AP, Cardon LR (2006) Two-Stage Two-Locus Models in Genome-Wide Association. PLoS Genet 2: e157.

13. Bradley JV (1978) Robustness? British Journal of Mathematical and Statistical Psychology 31: 144–152.

14. Van Steen K (In Press) Travelling the world of gene–gene interactions. Briefings in Bioinformatics.

15. Chanda P, Zhang A, Brazeau D, Sucheston L, Freudenheim JL, et al. (2007) Information-Theoretic Metrics for Visualizing Gene-Environment Interactions. American Journal of Human Genetics 81: 939–963.

16. Ma S, Yang L, Romero R, Cui Y (2011) Varying coefficient model for gene-environment interaction: a non-linear look. Bioinformatics.
