VOL. 4 № 3 (14) 2012 | ActA nAturAe | 59
A Polygenic Approach to the Study
of Polygenic Diseases
D. Lvovs1*, O.O. Favorova2,3, A.V. Favorov1,4,5
1 Scientific Center of Russian Federation Research Institute for Genetics and Selection of Industrial
Microorganisms “Genetika”, 1-st Dorozny proezd, 1, Moscow, Russia, 113545
2 N.I. Pirogov Russian National Research Medical University, Ostrovityanova Str., 1, Moscow,
3 Russian Cardiology Research and Production Complex, 3-rd Cherepkovskaya Str., 15a, Moscow,
4 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Gubkin Str., 3,
Moscow, Russia, 117809
5 Oncology Biostatistics and Bioinformatics, Johns Hopkins School of Medicine, 550 North
Broadway, Baltimore, MD 21205, US
Copyright © 2012 Park-media, Ltd. This is an open access article distributed under the Creative Commons Attribution License,which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT Polygenic diseases are caused by the joint contribution of a number of independently acting or in-
teracting polymorphic genes; the individual contribution of each gene may be small or even unnoticeable. The
carriage of certain combinations of genes can determine the occurrence of clinically heterogeneous forms of the
disease and treatment efficacy. This review describes the approaches used in a polygenic analysis of data in medi-
cal genomics, in particular, pharmacogenomics, aimed at identifying the cumulative effect of genes. This effect
may result from the summation of gains of different genes or be caused by the epistatic interaction between the
genes. Both cases are undoubtedly of great interest in investigating the nature of polygenic diseases. The means
that allow one to discriminate between these two possibilities are discussed. The methods for searching for com-
binations of alleles of different genes associated with the polygenic phenotypic traits of the disease, as well as the
methods for presenting and validating the results, are described and compared. An attempt is made to evaluate
the applicability of the existing methods to an epistasis analysis. The results obtained by the authors using the
APSampler software are described and summarized.
KEYWORDS medical genomics; pharmacogenomics; polygenic analysis; epistasis
ABBREVIATIONS CDCV – common disease / common variant; CDRV – common disease / rare variant; CI – confi-
dence interval; CMC – combined multivariate and collapsing; FDR – false discovery rate; GWAS – genome-wide
association study; MCMC – Markov Chain Monte Carlo; MDR – multifactor dimensionality reduction; MS – mul-
tiple sclerosis; IS – ischemic stroke; OR – odds ratio; ORR – odds ratios ratio; RR – relative risk; SF – synergy
factor; TDT – transmission disequilibrium test.
the concepts of modern genetics subdivide hereditary
diseases into Mendelian and complex disorders. the
Mendelian disorders are determined by carriage of a
mutant variant of a single gene, whereas complex dis-
eases depend both on a genetic component determined
by the joint contribution of a large number of inde-
pendent or interacting polymorphic genes and on other
factors. Meanwhile, the individual contribution of each
gene to the development of a polygenic disease can be
small or modest. the carriage of certain allelic combi-
nations of genes can also determine the emergence of
clinically heterogeneous forms of diseases and the ther-
apeutic efficacy of certain pharmaceutical agents.
In humans, polygenic disorders occur much more
frequently than monogenic ones; they have a great
social and economic impact. However, their molecular
genetic nature has not been elucidated thus far. the
search for the genes that are involved in the develop-
ment of polygenic diseases is carried out with the use
of two major strategies, namely, establishing the role
of a certain candidate gene selected relying on the ten-
tative role of its protein product in the etiopathogen-
esis of the disease and whole genome sequencing using
the panel of genetic markers that are more or less uni-
formly distributed across the genome. the experimen-
tal approaches to determine the role of certain genes
or the function of particular genomic regions consist
60 | ActA nAturAe | VOL. 4 № 3 (14) 2012
in the analysis of their linkage or association with the
Linkage analysis is carried out in families with sev-
eral individuals affected; the role of the gene in the
formation of the susceptibility to the disease can be
considered to be confirmed if allelic variants that are
shared between the affected individuals are revealed.
Low sensitivity is a drawback of this method; there-
fore, methods with greater statistical power that are
based on the association analysis have recently taken
An association study is an attempt to find new sta-
tistical relationships between different events or verify
the already known ones. the actual causes of these re-
lationships are often beyond the knowledge or the ex-
perimental facilities of a researcher. However, once one
has collected the statistics of occurrence of combina-
tions of different observed outcomes, a conclusion can
be made regarding the significance (which is assessed
based on the probability of randomly obtaining the re-
sult observed) and intensity of these relationships. the
association between a certain polymorphic genome re-
gion and a phenotypic trait is analyzed by comparing
the distributions of its alleles and genotypes in the rep-
resentative samples of individuals, which are formed
with respect to the presence/absence of this trait and
need to match in terms of sex, age, and ethnicity. the
allelic variants under analysis can be localized in any
DnA region, including the coding sequences (exons),
introns, and promoter regions of the genes, where the
transcriptional regulatory regions are frequently lo-
cated, as well as the other DnA regions. In exon analy-
sis, not only the nonsynonymous substitutions deter-
mining the changes in the amino acid sequence of the
protein molecule being encoded are of interest, but also
the synonymous substitutions, since they can affect the
mrnA structure and stability, as well as the translation
kinetics due to the use of different isoacceptor trnAs..
However, it should be remembered that in addition to
the direct relation between the investigated locus and
the hereditary trait, the association may be based on
linkage disequilibrium between the marker locus and
the true locus of the disease, if these loci are located
sufficiently close to one another.
the aim of association studies is to link the pheno-
typic traits that are significant for medicine with such
characteristics as allelic variations in the genome, epi-
genetic modifications, effects of environmental factors,
lifestyle, etc. the phenotypic traits that are of signifi-
cance for personalized medicine typically include the
onset of a disease, its course (clinical presentation, ex-
tent of injury in the systems of the organism, etc.) or
the efficacy of therapy with a certain drug (the area of
interest of phamacogenomics). In this review, we will
focus on the association between the individual traits
and the carriage of allelic variants of the genome. Iden-
tification of these associations enables one to assess the
risk of disease development (susceptibility), predict the
character of its course, and give a preference to certain
methods of prevention, diagnosis and therapy based on
the features of the individual genome.
the analysis of the associations between polygenic
diseases and the combined occurrence of alleles of dif-
ferent genes remains a relatively poorly developed re-
search area. this can be mainly attributed to the fact
that any increase in the number of genes being ana-
lyzed results in an exponential growth in the number of
combinations of their allelic variants, which makes any
analysis using conventional exhaustive search tech-
niques almost infeasible.
the present review is devoted to bioinformatic
methods that search for such allelic combinations of
different genes that are associated with the phenotyp-
ic traits of a polygenic disease, as well as to the meth-
ods for presenting and validating the results obtained.
these methods (for the sake of brevity, they will be re-
ferred to as the polygenic analysis methods) are used to
understand the cumulative effect of the genes and the
nature of this effect. the association with the combina-
tion may be caused by the interplay of the phenotypic
effects of the alleles on the phenotype; i.e., by nonlinear
(epistatic) interaction between the genes. Alternatively,
an allelic combination with a significant impact on the
development of the trait can occur due to the summa-
tion of small independent subthreshold contributions of
the alleles composing the combination. Both these cases
will be discussed in the review.
the two major types of association studies (namely,
cohort studies and case–control studies) differ in
terms of the time sequences in which data is collected;
therefore, they also differ in terms of the parameters
that can be assessed based on monitoring. In cohort
studies, a selected group of individuals is divided into
two subgroups; individuals who have and those who
do not have a certain indicator trait (e.g., subgroups of
carriers and noncarriers of a certain genotype; smok-
er and nonsmoker subgroups). these subgroups are
monitored during a certain time interval for the de-
velopment of a trait that is of interest in terms of its
prediction (the target trait); e.g., a disorder. this ap-
proach enables one to numerically assess the intensity
of the contribution of an indicator trait to the devel-
opment of the target trait via the ratio of probabilities
of disease occurrence in the carriers and noncarriers
of an indicator trait. this value is assessed using the
relative risk (rr).
VOL. 4 № 3 (14) 2012 | ActA nAturAe | 61
the case–control studies are a more common type
of association studies. the sample here is divided into
two subgroups: the individuals who possess and those
who do not possess a target trait at an instance of study
(e.g., affected and healthy individuals). the presence of
indicator traits that possibly affect the emergence of
the disease is assessed in each group. nothing is known
about the individuals who died before the launch of the
study, thus the higher the disease mortality, the less
accurate the estimation of the level of association in
terms of rr. the odds ratio (Or) is typically used as a
criterion for the degree of difference between the carri-
ers and noncarriers of an indicator trait in case–control
studies . If absolute risk of the disease in noncarriers
is low, the Or and rr values are close. the higher the
risk, the larger the difference between Or and rr. Or
is always higher compared to rr.
the results obtained using the case–control method
can be distorted because of the ethnic heterogeneity
of the groups being compared or due to the environ-
mental factors that have not been taken into account
. the family-based methods (e.g., comparison of the
affected and healthy brothers and sisters ) are less
susceptible to distortion. However, there are require-
ments for the input data (pairs of affected and healthy
immediate relatives, preferably siblings, are needed)
that limit their applicability for obtaining reliable de-
pendences. the transmission disequilibrium test (tDt)
 imposes less strict requirements on the input sam-
ple. tDt is based on the analysis of the transfer of a
marker allele from heterozygous healthy parents to an
affected child. the data obtained are compared with
the ones expected upon Mendelian inheritance; in the
case of disequilibrium of the transfer of an allele, as-
sociation between the allele and the disease is inferred.
the AFBAc (affected family-based control) is another
family-based method of association analysis in which
the control group consists of a combination of the alleles
of healthy parents that have not been inherited by the
affected child (one allele from each parent) .
In the association analysis, both the predicted (de-
pendent) and predicting (independent) traits are the
categories that divide the sample into two classes (e.g.,
“affected” and “healthy” or “carrier” and “noncarri-
er”). It is convenient to present the intersections of the
classes as a 2×2 table (contingency table). Its values are
used to characterize the strength of association (Or)
and its significance (p-value). the p-value is calculated
using the Fisher’s exact test that was proposed in 1922
and is still widely applicable .
If a trait is represented by more than two classes that
can be ranked (e.g., using the disease severity scale as-
signed by the medical community), 2n-field contingency
tables (where n is the number of gradations of a trait) are
compiled; the Goodman-Kruskal gamma test is used to
assess the strength and significance level of an associa-
tion . If ranking makes no sense, either the Freeman–
Halton test that extends the Fisher’s test to more than
two categories  or the χ2 test  can be used.
METHODS FOR POLYGENIC ANALYSIS
All the approaches to multivariate analysis and to poly-
genic association studies in particular can be divided
into two fundamentally different types: 1) the use of a
reduced amount of input variables based on some a pri-
ori data and 2) complete analysis of all available vari-
ables. the reduction of the amount of possible variables
in polygenic studies involves selection of several candi-
date genes to carry out the association analysis . this
approach allows one to considerably reduce genotyping
costs and the space of analysis, thus reducing its com-
plexity and the time required for computations. On the
other hand, if a gene effect manifests itself only in com-
bination with other genes and is not observed upon its
individual consideration (i.e., there is no marginal effect
, ), the probability that this gene will be selected
as a candidate gene is extremely low, although its role
may be significant. Genome-wide association studies
(GWAS) [13–16] are currently gaining popularity due to
the development of both computation and genotyping
technologies. GWAS belongs to the second type of poly-
genic analyses, i.e., the analysis of all available variables.
When analyzing genome-wide data, one inevitably
encounters many extremely rare alleles. Individual
consideration of these alleles does not allow one to ar-
rive at a conclusion regarding the impact of each allele
on the disease. However, when considering the effect
of several alleles altogether, the observed data can be
sufficient to validate the assumption that they have a
combined effect. In other words, data on each of the
rare alleles is insufficient; however, that data should
not be neglected, since association can be reliably es-
tablished when data on several rare alleles is accu-
mulated. this effect is known as the additive effect;
it can also be observed for objects other than rare al-
leles. However, in the case of rare alleles, additive ef-
fect detection is one of the most promising methods for
an association study. correspondingly, the theory at-
tributing the emergence of a large number of common
diseases to the carriage of rare alleles is named cDrV
(common disease / rare variant). this theory, which is
currently gaining common acceptance, is an alterna-
tive to the cDcV (common disease / common variant)
theory. A set of methods have been specially designed
for the assessment of the additive contribution of rare
alleles, e.g., the combined multivariate and collapsing
(cMc) method , weighted sum statistics , and
the gene burden test .
62 | ActA nAturAe | VOL. 4 № 3 (14) 2012
the problem of correcting for multiple hypothe-
sis testing becomes especially urgent upon polygenic
analysis. this problem can be briefly formulated in
the following way: an increasing number of tested hy-
potheses results in an increase in the probability of a
random (including unlikely) outcome, which reduces
the significance of the postulate that the statistical re-
lationships observed represent specific non-random
If a number of comparisons used for studying the
association of a phenotypic trait with several alleles of
one highly polymorphic gene or upon simultaneous as-
sessment of the role of several biallelic candidate genes
is small (although not equal to 1), such an increase in
significance is taken into account using the Bonferroni
correction , which simply multiplies the correspond-
ing p-values by the number of tests carried out. How-
ever, the Bonferroni correction turns out to be too con-
servative because of the underlying assumption that
the tests are independent. A more accurate correction
can be obtained using the Westfall–Young method ,
which does not imply independency and compares the
best observation with the best results for the permuted
samples. Another approach to this problem consists in
assessing the false discovery rate (FDr) instead of the
family-wise error rate (FWer) [24, 25].
Gene-gene interaction (epistasis) has recently turned
into a widely discussed theme. this interest is to a sig-
nificant extent due to the poor reproducibility of the
results in assessment of the role of individual genes in
the formation of susceptibility to polygenic diseases; in
particular, in GWAS. there is a certain ambiguity in
the terms “epistasis” and “epistatic interaction.” they
were originally used to denote complete masking of the
effect of a polymorphism in one locus by the polymor-
phism of another locus; later, it was extended to refer to
any other type of influence that certain polymorphisms
have on the manifestation of other polymorphisms in
the phenotype. the differences in interpretations of the
term “epistasis,” as well as the problems arising due to
these discrepancies, have been thoroughly described
in [26, 27].
the results of an analysis of the contribution of car-
riage of the HLA class II allele DRB1*04 (A), an allele
with a 32 nucleotide deletion in the chemokine receptor
gene CCR5 (CCR5*d32) (B), and their combination (C)
to the development of multiple sclerosis (MS), a typi-
cal polygenic disease, is shown in Fig. 1 in the form of
visualized 2x2 contingency tables (the experimental
data were taken from ). In all the cases, MS patients
and healthy individuals were divided into two classes
based on carriage/noncarriage of the allele (homo- and
heterozygotes with respect to this allele were not dis-
tinguished). the polymorphism of the CCR5 gene was
indeed biallelic (the deletion allele and the wild-type
allele), whereas 18 groups of alleles of the DRB1 gene
were analyzed for this highly polymorphic gene. the
group of noncarriers of the DRB1*04 allele was made up
of the carriers of the remaining alleles of this gene. It is
clear from Fig. 1C that the carriage of the combination
of DRB1*04 and ССR5*d32 is associated with the dis-
ease to a higher extent than might be expected based on
the additive contribution of the constituting alleles. this
fact can be construed as resulting from the epistatic in-
teraction between the genes under consideration. this
example is an illustration of the simplest type of poly-
genic analyses, when only the joint contribution of two
alleles to phenotype formation is taken into account.
We have proposed the use of the odds ratios ratio
(Orr) as a numeric measure of epistasis . It is based
on the concept that if at least two alleles within a com-
bination do not interact with each other, the Or value
for carriers of this combination will be made up of the
product of the Ors of individual alleles within the com-
bination. the product is regarded as the expected Or
and compared with the observed Or. the more this ra-
tio differs from unity, the stronger the predicted epi-
static interaction between the genes.
the Orr value  can be used to analyze the in-
teraction between two or more alleles. However, the
А B C
+ – + – + –
controls MS patients
Fig. 1. Visualization of 2x2 tables of carriage by the MS
patients and control group individuals of: alleles of the
major histocompatibility complex HLA-DRB1 (A), the
chemokine receptor CCR5 (B), and their combination (C)
(based on data from  for ethnic Russians). Red areas
correspond to the case; blue – to the control group. The
ratio of the vertical fields reflects the distribution of carriers
(+) and noncarriers (-) of DRB1*04 (A), CCR5*d32 (B)
and a combination of DRB1*04 and CCR5*d32 (C). The
horizontal dashed line in (C) corresponds to the expected
ratio of the number of patients and controls among the
carriers of the allele combination calculated under the as-
sumption that the allele effects are independent.
VOL. 4 № 3 (14) 2012 | ActA nAturAe | 63
absence of a method to assess the confidence interval
(cI) is a significant drawback here. the Synergy Fac-
tor (SF), a measure of epistasis described in , has
contrasting advantages and drawbacks. the method
for cI calculation has been designed for it; however,
this value can be used for the analysis of the interaction
between two alleles (or other binary indicator traits).
Both values are the ratios between the Or observed
for the allelic combination and the product of the Or
observed individually for its components; however, the
Or values are calculated using different methods. Orr
compares the number of carriers and noncarriers of the
indicator trait (whether this is an allelic combination or
an individual allele) in patients and the control group,
as is shown in Fig. 1. In the case of SF, the carriers of an
allelic pair are compared with the carriers of neither al-
lele, as well as the carriers of each allele constituting the
combination that are also noncarriers of another allele.
Identically to the situation with Orr, SF > 1 attests to
a positive (mutually enhancing) interaction, whereas
SF < 1 attests to a negative (compensatory) interaction.
the SF value can actually be determined for more than
two alleles; however, the result will depend on their
order of combination to form complex traits. thus, it is
reasonable to use both of these assessments.
the available tools for an analysis of the cumulative
effect of several genetic variables use various algo-
rithms for data mining and are discussed below.
the conventional logistic regression, in which the co-
efficients of model terms at the second order and high-
er correspond to the interaction, is the most popular
method . Iterative simulation is required to use this
method to search for the most closely interacting allelic
combination, which weakens the statistical power of the
method. the two-step variant implemented in GenA-
BeL [31, 32] allows one to solve the problem of iterative
testing by using the data on dispersion in individual loci
to select the ones with a higher interaction probability.
Various heuristic approaches, such as genetic program-
ming , neural networks , pattern mining ,
dimensionality reduction techniques , and Markov
chain Monte carlo (McMc) methods (which include
APSampler [37, 38], BeAM [39, 40], and logic regression
Logicreg [41–43]) are used.
the association between carriage of any combination
of alleles (or another indicator trait) with the pheno-
type can be assessed in the same manner as is done for
one allele (trait). In other words, each combination can
be regarded as a compound trait and can be charac-
terized by the significance level of association and the
rr or Or values. numerous combinations are possible;
therefore, the task of searching for the combinations
characterized by the most significant association moves
to the fore.
the multifactor association analysis can also be car-
ried out using family-based data. there are multiallele
and multilocus versions of tDt  (the method that is
based on Mcnemar’s test and was originally designed
for biallelic single loci). Methods extending tDt to sev-
eral allelic variants have been proposed by a number of
authors. these methods include calculating the mar-
ginal homogeneity ; iterative grouping of alleles
into two groups: the “allele under study” and “the re-
maining alleles,” followed by the Mcnemar’s test 
and multiple testing correction; and calculation of the
disequilibrium in the allele transfer using logistic re-
gression , which is best suited for highly polymor-
phic loci. When carrying out the analysis simultane-
ously at several loci, methods involving the comparison
of the actual child’s genotype with all the theoretical
genotypes that are possible for his parents are used [45,
47, 48]. Linkage disequilibrium between the loci under
analysis is either calculated from the sample or taken
from known data (e.g., from HAPMAP  in the FAM-
HAP [48, 50] software).
Some commonly used tools for polygenic associa-
tion analysis are thoroughly discussed and compared
PLInK freeware was developed at Harvard university
[30, 51]; it is a large interrelated collection of various
algorithms for the analysis of genotypic and phenotyp-
ic data, including the methods for polygenic analysis.
PLInK has been used in a number of studies focused
on genetic interaction (e.g., [52–55]).
One of the methods for the analysis of gene inter-
action in PLInK is based on the consideration of re-
gression models . the logistic regression model as-
suming that the probability of an event (in our case,
disease) is described as a logistic function of a linear
combination of independent variables (predictors) is
used upon a binary outcome (e.g., “healthy–affect-
ed”) . the common linear regression of the same
predictors is used for quantitative phenotypes (such
as three degrees of arterial hypertension). In this
case, independent variables are indicator functions
that can assume either a 1 or 0 value, depending on
whether a certain allele or genotype is present in the
genome (or upon the presence of any other indicator
trait). the analysis yields a set of regression coeffi-
cients for the indicator functions of the alleles and
their combinations, and the levels of the statistical
significance of the values by which these coefficients
differ from zero. High significance of the difference
of the coefficient corresponding to a certain combi-
nation of alleles from zero attests to their association.
that is how the “PLInK -epistasis” test proceeds.
64 | ActA nAturAe | VOL. 4 № 3 (14) 2012
the “PLInK -case-only” is a simpler test on interac-
tion; it verifies the correlation between carriage of sev-
eral genotypes by patients. If the correlation between a
genotype pair is high and their linkage can be excluded
from consideration, it means that they interact. this
test is based on an a priori assumption that the revealed
correlation is typical only of the affected individuals.
the two-step procedure verifying the presence of the
correlation in the total sample does not include this
assumption; however, the results provided by it may
still be biased . the key advantages of the PLInK
software include its applicability for GWAS and a wide
set of analysis tools, whereas its drawback consists in
the limitations on the data format, since only biallelic
markers can be used for work using this software.
the multifactor dimensionality reduction (MDr) algo-
rithm has been widely used for mining polygenic as-
sociations in case-control studies [59–62].
At the first step, all data is randomly divided into
two sets: the training set (e.g., 9/10 of the data) and the
testing set (e.g., 1/10 of the data). A parameter char-
acterizing the ratio between the number of affected
and healthy individuals carrying the combination of
alleles and genotypes is determined for each combina-
tion. the combinations are classified into categories
(e.g., low-risk and high-risk combinations) based on the
value of this parameter. thus, a transition is made from
the n-dimensional space of all single polymorphic loci
and phenotypes to a two-dimensional space, where the
risk level is one dimension, and the carriage of a cer-
tain allelic combination is another dimension. Among
all combinations, there will be one having the lowest
classification error in the training (training Accuracy)
and testing (testing Accuracy) sets. the division into
groups is repeated 10 times with the parameters of the
random number generator varied. the cross-validation
consistency is defined as the number of cross-validation
replicates out of 10 in which that same model was cho-
sen as the best model. the model is considered to be
valid if its cross-validation consistency is at least 9/10.
In addition to a text representation of the results, the
MDr software package includes dendrograms showing
a pairwise interaction analysis, where the type of inter-
locus interaction is shown with different colors (from
epistasis to independence); the bond length shows the
the exhaustive search of all combinations (e.g., that
used in MDr by default) loses its efficiency when the
number of alleles under analysis increases because of
the large number of possible combinations. the so-
called combinatorial explosion occurs. Moreover, the
statistical significance of the combinations obtained us-
ing this procedure becomes less obvious due to the mul-
tiple testing problem. On the other hand, simple gradi-
ent (“greedy”) methods, which refine the intermediate
result in a stepwise manner, frequently yield no ad-
equate results at all, since they are prone to trapping in
local optima rather than reaching the global ones.
there are different heuristic methods enabling one
to mine the global optimum without using the exhaus-
tive search procedure. the Markov chain Monte carlo
(McMc) algorithm is one such method [38, 40, 63–65].
the main idea in this method is that, as with the
gradient search, it strives for a better solution than
the already existing one. However, unlike the gradi-
ent search, it can also proceed to a worse solution with
some probability; this probability decreases as the fit to
the data for the proposed solution becomes poorer.
In the search for associations, the BeAM (Bayesian
epistasis Association Mapping) algorithm [40, 66] is
based on the fact that the distribution of genotypes in
patients with disease-associated loci differs from that
in the control group. the algorithm is aimed at classify-
ing all the loci into loci that are not associated with the
disease, loci individually associated with the disease,
and the associated and epistatically interacting loci. the
software uses the McMc method to find the partition
of the loci set into these three classes, which is the most
probable one for the given genotypes and disease de-
grees. the loci are regarded as epistatically interacting
if the joint distribution of their alleles/genotypes fits
the data better than the distribution derived from the
independent model (product of the allelic/genotypic
distributions). BeAM can account for haplotype data in
order to differentiate them from epistatic interactions.
the Logic regression algorithm uses McMc to opti-
mize the models of regression search for polygenic as-
sociations [43, 65]. the name of the method refers to
the well-known logistic regression that solves a similar
problem in a different way. the indicator functions of
logic combinations (logic functions) of the presence of
different alleles are used as predictors of logic regres-
sion; the combination of the optimal functions is deter-
mined using McMc. the logic functions obtained show
the type of allelic interaction.
the logic of the analysis of polygenic data using the
APSampler software  differs considerably from
the previously described software packages, where
VOL. 4 № 3 (14) 2012 | ActA nAturAe | 65
the predicted phenotypic trait can possess only two
values (e.g., “affected” and “healthy”). the use of the
nonparametric Wilcoxon test in APSampler software
permits the analysis of data with more than two val-
ues of the target trait if ranking of the outcome is pos-
sible, allowing the use of a number of internationally
recognized scales to define the groups for analysis. For
example, in the case of stroke, such scales could be the
degree of depression of consciousness, the initial sever-
ity of the disease, stroke outcome, which all have their
own values and the number of values of at least three
levels. the genetic pattern (i.e., the combinations of al-
leles and genotypes of different loci associated with a
phenotypic trait) is the major object in the APSampler
software for predicting an indicator trait. the pattern
search is carried out using McMc; several patterns be-
ing considered at each step simultaneously. the set of
patterns is optimized from step to step in terms of the
probability of all the patterns within the set being in-
dependently and simultaneously associated with a trait.
the nonparametric Wilcoxon test is used to assess the
probability of association of each pattern; the subsets
being compared differ in carriage of only one pattern
within the set. the algorithm includes two steps. the
first step yields a list of patterns that have been en-
countered during the McMc search and validates the
findings by determining the significance of association
for each pattern from the list using the Fisher’s test (in
the case of a dichotomous outcome) or the Goodman’s
and Kruskal’s test , if there are more than two cate-
gories. At the second step, the software then repeatedly
mixes the labels of the phenotypic trait and runs the
search for associated patterns again. the reliabilities
of association based on the results of these permuted
runs provide the distribution of the reliabilities of the
findings on the assumption of the null hypothesis of no
association. this null distribution is used to validate the
combinations obtained in the first step.
the Table summarizes data pertaining to the func-
tional possibilities of the described software for poly-
genic association analysis. the data presented in the
Table attest to the fact that the software proposed for
polygenic analysis have considerably different func-
tions. the software being compared can be used in dif-
ferent instances, depending on the available genetic
and phenotypic data, on the content and format of the
desired results, as well as the ability of a user to run
the software at the level of the command line. One also
needs to make allowance for the fact that the target
result notably differs for different programs.
MDr is very convenient due to the presence of a user
interface and graphical visualization of the results, in-
cluding epistasis. It provides the obtained phenotype-
associated loci and their combinations, whereas AP-
Sampler takes into account the direction of association,
which is determined by carriage of alleles of the loci
and their combinations. Both APSampler and MDr op-
erate with polyvariant input traits, whereas the rest
of the programs operate only with binary indicator
traits. these two algorithms are also similar in the fact
that they allow one to analyze epistatic interaction af-
ter association has been determined, whereas BeAM
a priori divides all alleles into three groups: the ones
with the marginal effect, the ones with epistasis, and
those with no effects. the characteristics of combina-
tions of loci, which are given by MDr, are statistically
reasonable. However, their correlation with the asso-
ciation strength is not obvious. Logicreg provides no
conventionally interpretable association values at all.
APSampler and BeAM solve this problem by perform-
ing the Fisher’s exact test for the association between
the resulting indicator traits and the phenotype. In
general, BeAM, PLInK, MDr, and Logicreg can be
applied in basic research, including in studies devoted
to gene interaction or for operation within a larger in-
tegrated software environment. However, they a priori
do not have the necessary set of functions to solve such
applied medical and genetic tasks as searching for the
markers of susceptibility or searching for pharmaco-
genetic markers, for which the APSampler software
can be used.
these five programs were used with the data taken
from  in the user mode (i.e., with all default settings).
BeAM found no associations with p < 0.05; Logicreg
required additional data processing. the results of us-
ing APSampler, MDr, and PLInK are given in Fig. 2.
It is clear that APSampler found both the combinations
found by MDr and those found by PLInK; moreover,
all the validated findings of APSampler have also been
validated by at least one of these programs.
STUDIES PERFORMED USING APSAMPLER
A large number of studies using the APSampler soft-
ware have been carried out since the first publication
; the authors participated in most studies due to
the fact that at the initial stages of development, the
software was relatively difficult to operate. this fact
allowed them to upgrade the software according to
the users’ requests, supplement it with new features
broadening the potential of validation , data man-
agement, visualization of the results and the help files
elucidating the use and structure of the APSampler
software. At the moment of writing, the software is
open-source and can be used free of charge .
the authors used the APSampler software to ana-
lyze the cumulative effect of the alleles of a number of
candidate genes on the development of multiple scle-
rosis (MS) , different forms of arterial hypertension
66 | ActA nAturAe | VOL. 4 № 3 (14) 2012
[69–71], myocardial infarction , ischemic stroke (IS)
[73, 74], and hemorrhagic stroke . the studies were
carried out in compliance with the principle of ethnic
homogeneity in russians or Yakuts. the Yakut popula-
tion is of particular interest in terms of ethnogenomics,
since the founder effect and a certain geographic and
cultural isolation are observed in it . APSampler
was also used in pharmacogenetic studies of MS for the
investigation of the relationship between the genetic
status in patients and the efficacy of treatment with
immunomodulatory drugs, interferon beta (in Irish pa-
tients, ) and glatiramer acetate (in russian patients,
In most of the aforementioned studies, the group
of nonrelative patients was compared pairwise with
the control group of unaffected nonrelative individu-
als, which was similar to the affected sample in terms
of their ethnicity, sex ratio, and the mean age. two
groups of patients with clinically heterogeneous forms
of the same diseases (e.g., arterial hypertension with
Table. Brief comparison of the potential of different software for polygenic association analysis
Graphical user interface--1
Quantitative rank phenotype+-3
Working with missing data+++-4
Statistical mining of combinations of particular
alleles associated with phenotype
Assessment of the association for the established
combinations using the Fisher’s exact test
Graphical representation of epistasis-9
Possibility of carrying out the association analysis for the
allelic combination specified by the user
Possibility of using the command line to run software
(e.g., on a server).
Available for unIX+++++
Available for Windows+++++
1 There is a version of the BEAM software integrated into the GALAXY server application .
2 The algorithm has been used in the software environment for statistical computing and graphics R .
3 The software automatically divides the data into two categories using the mean value.
4 The authors propose specialized software, MDR Data Tool , for filling in the missing values.
5 The software finds the interacting and phenotype-associated loci rather than their alleles.
6 Only pairwise mining is available.
7 The number of alleles in each locus has to be equal.
8 Despite the fact that mining of epistatically interacting alleles has not been claimed to be a specific function of the AP-
Sampler software, the experience of practical use of the software attests to the possibility of using it for mining epistasis.
9 Perl software for graphical representation of epistasis has been designed .
10 The haplotype-association analysis is proposed.
11 A specialized software has been provided for this purpose .
12 Specialized software PBEAM for parallel computing .
VOL. 4 № 3 (14) 2012 | ActA nAturAe | 67
and without hyperaldosteronism ) were compared
in some cases. When studying the genetic susceptibil-
ity to arterial hypertension preceding the develop-
ment of IS, the patients were at first divided into two
subgroups in accordance with the hypertension level.
the 2×4 contingency table was subsequently used to
find such allelic combinations among identified ones
carriage of which is characterized by monotonous in-
crease from normotonics to third degree hypertensive
patients . In pharmacogenetic studies, the patients
responding and not responding to treatment were also
compared pairwise using the “comparison of extremes”
the candidate genes were selected based on the
existing conceptions of participation of their protein
products in the processes involved in the disease patho-
genesis. When analyzing the genetic susceptibility to
cardiovascular diseases, the following genes were se-
lected: the ones whose protein products participate in
inflammation, the genes of hemostasis, transport, and
lipid metabolism systems, the genes of the renin-an-
giotensin-aldosterone system, and some other genes.
For MS, the candidate gene products participate in
the development of the immune response and chronic
inflammatory process. Polymorphic regions (mostly,
single nucleotide polymorphisms, or SnPs, being of
interest in terms of their function; i.e., knowingly af-
fecting the amount or property of the encoded protein
product) were usually typed in these genes. the joint
contributions of 10 or more polymorphic markers have
been analyzed in relatively small samples consisting
of no more than 500 individuals. Although this sam-
ple size, which is typical of russian studies, cannot be
compared to the size of the groups formed by the inter-
national consortiums, we have found highly significant
associations between allelic/genotype combinations
and the phenotype under study using the APSampler
software. this can be illustrated by the data pertain-
ing to an association between the triallelic combination
FGB*–249c + APOE*ε4 + CMA*–1903A and the level
of arterial hypertension preceding the development
of IS in the Yakut population (Fig. 3). A monotonous
rise in the carriage frequency of the named triallelic
combination was observed in a sample consisting of
115 patients: from 0 in normotonics to 47% of the to-
tal number of individuals in the subgroup of third de-
gree hypertensive patients; the p-value assessed based
on the Fisher’s test in the 2x4 contingency table was
0.0003. In this case, a vivid example of the effect of the
joint contribution of the genes encoding components of
three different key systems of the homeostasis, namely,
hemostasis system (FGB), the lipid metabolism system
(APOE), and the renin-angiotensin-aldosterone system
(CMA), to the development of arterial hypertension
has been observed. thus, the disease is most likely to
emerge with the summation of the independent contri-
butions of individual genes.
the reason for such a high information value upon a
rather modest amount of experimental data can be at-
tributed to the advantages provided by the ethnic and
clinical homogeneity of the groups used in the analysis,
whereas groups consisting of tens of thousands of pa-
tients from different countries and patient care institu-
tions, which are formed within the framework of con-
sortiums, usually fulfil the homogeneity requirements
in terms of neither ethnicity nor clinical presentation.
this fact may smooth their genetic deviations from
the control group. However, the major reason for the
Fig. 2 Search for biallelic
combinations of immune
response genes associated
with the response to treat-
ment of MS with glatiramer
acetate (based on data from
 for the ethnic Russians)
with APSampler, MDR, and
PLINK. APSampler  finds
all biallelic markers found
by the other software as
well as identifies additional
combinations. The red color
marks the findings that have
been validated by permuta-
tions in APSampler (p < 0.1)
or MDR cross-validation
(CVC > 8/10).
68 | ActA nAturAe | VOL. 4 № 3 (14) 2012
0 1 2 3
Fig. 3. Carriage of the triallelic combination FGB*–249C +
APOE*ε4 + CMA*–1903A, which was found with AP-
Sampler in Yakut ischemic stroke patients with different
levels of the preceding blood pressure . 0 – normot-
onics, 1–3 – first, second and third degree hypertensive
patients, respectively, according to the criteria of 2003
ESH/ESC . Carriage is represented as a percentage
of the total number of patients in the subgroup.
high information value of the results obtained using the
APSampler software can presumably be attributed to
the high statistical power of the analysis. Without go-
ing into the details of what underlies this phenomenon,
one can summarize by saying that identification of the
association of the alleles/genotypes of individual genes
during the analysis of any studied disease was a rare
event, whereas phenotype-associated combinations of
two-four alleles were found in almost all cases. It is ap-
propriate to note here that both positive and negative
associations could be observed; a differently directed
effect of the alternative alleles has been successfully
revealed for most, but not all, cases.
the association of MS with the DRB1*15 allele of the
major histocompatibility complex [78, 79], with the mi-
crosatellite marker tnFa9  and with the biallelic
combination of DRB1*04 and CCR5*d32  (see Fig. 1)
in the russian population was previously demonstrated
without the use of the APSampler algorithm and re-
produced in an independent sample using the APSam-
pler algorithm . the replication of the data pertain-
ing to the association of these genetic factors with the
development of MS complies with the criteria widely
accepted across the world’s scientific community for
validation of the results and attests to the software’s
Based on the aforementioned observations, the con-
cept of the minimal set (combination) of alleles as a ge-
netic risk factor that is revealed in a certain study has
been formulated . this means that any subset of this
set is characterized by a lower significance of associa-
tion. thus, we have identified  two MS-associated
triallelic combinations comprising alleles of the poly-
morphic regions of the DRB1, TGFB1, CTLA4, and TNF
genes. the differences between the groups of affected
and healthy individuals in the carriage frequencies of
the biallelic combinations and of the individual alleles
within the triallelic combinations 1 and 2 did not reach
the significance level (p < 0.01). It is important to note
that the subgroups of individuals carrying the MS-pre-
disposing combinations 1 and 2 did not overlap and cor-
responded to approximately 5 and 9% of MS patients,
respectively, whereas they were not present in the
control group. thus, identically to the case of classical
monogenic dominant disorders, all the carriers of either
combination in our sample turned out to be affected.
Identical results were obtained in our other studies. In
either case, the minimum set of alleles is a compound
genetic marker of the polygenic disease or of another
We attempted to solve the question pertaining to
the type of interaction between the alleles within the
gene combination (epistatic or additive) in a pharmo-
cogenetic study where the association between the
efficacy of treatment of MS patients with the im-
modulatory drug glatiramer acetate and the allelic
polymorphisms in a number of the immune response
genes was analyzed . the carriage of allelic com-
binations of four genes (DRB1*15 + TGFB1*–509t +
CCR5*d + IFNAR1*16725G) exhibited a 14-fold in-
crease in the risk of ineffective response to glatiram-
er acetate therapy (Or = 0.072 [cI = 0.02–0.28]; р =
0.00018); the association withstood permutation test-
ing (рperm = 0.0056), which had been included into the
software by the time the study was conducted. the
triallelic combination (DRB1*15 + CCR5*d + TGFB1*–
509t) differed negligibly from the tetra-allelic combi-
nation as a marker of treatment inefficacy, whereas
the association between all the other components of
the tetra-allelic combination and treatment ineffica-
cy was considerably weaker. Graphical visualization
(the Venn diagram) of the character of the interac-
tion between different components of the “unfavora-
ble” allelic combination (DRB1*15 + TGFB1*–509t +
VOL. 4 № 3 (14) 2012 | ActA nAturAe | 69
CCR5*d + IFNAR1*16725G) is given in Fig. 4. For the
triallelic combination (DRB1*15 + TGFB1*–509t +
CCR5*d) Orr was 0.2 (i.e. it was fivefold lower than
1) and remained unchanged after the addition of the
IFNAR1*16725G allele. We regard these data as evi-
dence of the epistatic interaction between the alleles
of the DRB1, CCR5, and TGFB1 genes.
unexpected data on epistatic interactions upon for-
mation of genetic susceptibility to IS in the russian
population were obtained in . the analysis using the
APSampler algorithm has revealed the protective bial-
lelic combinations (IL6*–174c/c + FGA*4266A) and
(IL6*–174c/c + FGB*–249С), which were associated
with IS slightly more significantly than the protective
genotype IL6*−174c/c by itself and had practically the
same Or value (0.32–0.35). each of the alleles within
these combinations (FGA*4266A or FGB*–249С) upon
joint carriage of the IL6 G allele, which is the alterna-
tive to the IL6*–174c/c genotype, “neutralized” its
significance as a risk allele by reducing both the signifi-
cance levels and the Or values (from 2.9 to 1.9–2.1). In
other words, association between IS and combinations
of the alleles/genotypes of IL6, FGA and FGB has been
observed; IL6 played a key role, whereas the FGA and
FGB genes had a modulating function. this observation
presumably attests to the fact that the FGA and FGB
genes contain interleukin-6-sensitive elements, which
are capable of binding to StAt3 (the major transcrip-
tion factor transmitting signals from the interleukin-6
receptor to the nucleus) .
Searching for polygenic combinations associated with a
phenotypic trait (i.e., composite genetic markers) is an
adequate analysis tool for studying polygenic diseases.
the statistical methods enabling this type of analysis
are currently rapidly being developed.
In accordance with all the aforementioned facts,
composite genetic markers can result from epistatic
interaction between components or be of additive na-
ture. taking into account the complexity of various cu-
mulative effects and their direction, one can claim that
identification of a reliable composite marker (even if
it carries a small number of components) is an impor-
tant step in understanding the etiopathogenesis of the
disease. Indeed, such a marker may attest to the key
link in a complex regulatory network of interactions
between biological macromolecules.
The authors are grateful to O.G. Kulakova and E.Yu.
Tsareva (N.I. Pirogov Russian National Research
Medical University, Moscow), M.F. Ochs and I.
Ruczinski (Johns Hopkins University, Baltimore, MD)
for helpful comments and advice.
This work was supported by the Russian Foundation
for Basic Research (projects № 11-04-01644a and
11-04-02016a), the Scientific-Technical Program of
the Government of Moscow ((№ 8/3-280n-10), grant
issued by Johns Hopkins University Framework for the
Future, grant issued by the Commonwealth Foundation
and the SKCCC Center for Personalized Cancer
Medicine, and the European Community’s Seventh
Framework Programme [FP7/2007-2013] № 212877
0.2 1 1.2
Fig. 4. Venn diagram describing the possible interaction
between the components of the DRB1*15 + TGFB1*–
509T + CCR5*d + IFNAR1*16725G combination, which
is negatively associated with the efficiency of the treat-
ment of MS with glatiramer acetate, as identified using the
APSampler software . Each of the four ellipses in the
diagram corresponds to one of the four alleles in this com-
bination. The intersections of the ellipses correspond to
all possible combinations of the four alleles, color intensity
reflects the ratio of the observed OR to the expected OR
(ORR), in accordance with the gradient scale provided
below. The gray areas corresponding to individual alleles,
as well as the small circles, correspond to the reference
ORR, which is equal to 1. The more the color of an area
differs from gray, the stronger the epistatic interaction
of the alleles represented by the area. The values of the
expected OR are calculated for each combination as a
product of the ORs of the individual alleles corresponding
to the overlapping areas.
70 | ActA nAturAe | VOL. 4 № 3 (14) 2012
1. Bland J.M., Altman D.G. // BMJ. 2000. V. 320. № 7247.
2. Hattersley A.t., Mccarthy M.I. // the Lancet. 2005. V. 366.
№ 9493. P. 1315–1323.
3. Laird n.M., Lange c. // nat. rev. Genet. 2006. V. 7. № 5.
4. Spielman r.S., McGinnis r.e., ewens W.J. // Am. J. Hum.
Gen. 1993. V. 52. № 3. P. 506–516.
5. thomson G. // Am. J. Hum. Gen. 1995. V. 57. № 2. P. 474–
6. Fisher r.A. // J. roy. Statistical Society. 1922. V. 85. № 1.
7. Sheskin D. Handbook of parametric and nonparametric
statistical procedures. // crc Press. 2004. 752 p.
8. Freeman G.H., Halton J.H. // Biometrika. 1951. V. 38.
№ 1–2. P. 141–149.
9. Mantel n. // JASA. 1963. V. 58. № 303. P. 690–700.
10. Kwon J.M., Goate A.M. // Alcohol res Health. 2000. V. 24.
№ 3. P. 164–168.
11. cortina-Borja M., Smith A.D., combarros O., Lehmann
D.J. // BMc research notes. 2009. V. 2. № 1. P. 105.
12. cordell H.J. // nat. rev. Genet. 2009. V. 10. № 6. P. 392–
13. Ahn J., Yu K., Stolzenberg-Solomon r., Simon K.c.,
Mccullough M.L., Gallicchio L., Jacobs e.J., Ascherio A.,
Helzlsouer K., Jacobs K.B., et al. // Hum. Mol. Genet. 2010.
V. 19. № 13. P. 2739–2745.
14. Jakkula e., Leppä V., Sulonen A.-M., Varilo t., Kallio S.,
Kemppinen A., Purcell S., Koivisto K., tienari P., Sume-
lahti M.-L., et al. // Am. J. Hum. Genet. 2010. V. 86. № 2.
15. Kemppinen A., Sawcer S., compston A. // Brief Funct
Genomics. 2011. V. 10. № 2. P. 61–70.
16. Wang J.H., Pappas D., Jager P.L.D., Pelletier D., Bakker
P.I. de, Kappos L., Polman c.H., Australian and new Zea-
land Multiple Sclerosis Genetics consortium (AnZgene),
chibnik L.B., Hafler D.A., et al. // Genome Medicine. 2011.
V. 3. № 1. P. 3.
17. Hirschhorn J.n., Daly M.J. // nat. rev. Genet. 2005. V. 6.
№ 2. P. 95–108.
18. Schork n.J., Murray S.S., Frazer K.A., topol e.J. // curr.
Opin. Genet. Dev. 2009. V. 19. № 3. P. 212–219.
19. Li B., Leal S.M. // the American Journal of Human Ge-
netics. 2008. V. 83. № 3. P. 311–321.
20. Madsen B.e., Browning S.r. // PLoS genetics. 2009. V. 5.
№ 2. P. e1000384.
21. neale B.M., rivas M.A., Voight B.F., Altshuler D., Devlin
B., Orho-Melander M., Kathiresan S., Purcell S.M., roeder
K., Daly M.J. // PLoS Genet. 2011. V. 7. № 3. P. e1001322.
22. Bland J.M., Altman D.G. // BMJ. 1995. V. 310. № 6973.
23. Westfall P.H., Young S.S. resampling-based multiple
testing: examples and methods for p-value adjustment.
nY: Wiley-Interscience. 1993. 316 p.
24. Benjamini Y., Hochberg Y. // J. roy. Statistical. Society.
Ser. B Stat. Methodol. 1995. P. 289–300.
25. Storey J.D., tibshirani r. // Proc. natl. Acad. Sci. 2003.
V. 100. № 16. P. 9440–9445.
26. cordell H.J. // Hum. Mol. Gen. 2002. V. 11. № 20. P. 2463–
27. Phillips P.c. // nat. rev. Genet. 2008. V. 9. № 11. P. 855–867.
28. Favorova O.O., Andreewski t.V., Boiko A.n., Sudomoina
M.A., Alekseenkov A.D., Kulakova O.G., Slanova A.V., Gu-
sev e.I. // neurology. 2002. V. 59. № 10. P. 1652.
29. tsareva e.Y., Kulakova O.G., Boyko A.n., Shchur S.G.,
Lvovs D., Favorov A.V., Gusev e.I., Vandenbroeck K., Favo-
rova O.O. // Pharmacogenomics. 2012. V. 13. № 1. P. 43–53.
30. Purcell S., neale B., todd-Brown K., thomas L., Fer-
reira M.A., Bender D., Maller J., Sklar P., De Bakker P.I.,
Daly M.J., et al. // Am. J. Hum. Genet. 2007. V. 81. № 3.
32. Aulchenko Y.S., ripke S., Isaacs A., van Duijn c.M. //
Bioinformatics. 2007. V. 23. № 10. P. 1294–1296.
33. nunkesser r., Bernholt t., Schwender H., Ickstadt K.,
Wegener I. // Bioinformatics. 2007. V. 23. № 24. P. 3280–
34. Motsinger-reif A.A., Dudek S.M., Hahn L.W., ritchie
M.D. // Genet. epidemiol. 2008. V. 32. № 4. P. 325–340.
35. Long Q., Zhang Q., Ott J. // BMc bioinformatics. 2009.
V. 10. № Suppl 1. P. S75.
36. ritchie M.D., Hahn L.W., roodi n., Bailey L.r., Dupont
W.D., Parl F.F., Moore J.H. // Am. J. Hum. Genet. 2001. V. 69.
№ 1. P. 138–147.
38. Favorov A.V., Andreewski t.V., Sudomoina M.A., Favo-
rova O.O., Parmigiani G., Ochs M.F. // Genetics. 2005. V. 171.
№ 4. P. 2113–2121.
40. Zhang Y., Liu J.S. // nat. Genet. 2007. V. 39. № 9. P. 1167–
42. ruczinski c.K.I., LeBlanc M.L., Hsu L. // Genet. epide-
miol. 2001. V. 21. № 1. P. S626–S631.
43. Kooperberg c., ruczinski I. // Genet. epidemiol. 2005.
V. 28. № 2. P. 157–170.
44. cleves M.A., Olson J.M., Jacobs K.B. // Genet. epidemiol.
1997. V. 14. № 4. P. 337–347.
45. Sham P.c., curtis D. // Ann. Hum. Genet. 1995. V. 59. №
Pt 3. P. 323–336.
46. cordell H.J., Barratt B.J., clayton D.G. // Genet. epide-
miol. 2004. V. 26. № 3. P. 167–185.
47. Becker t., Knapp M. // Genet. epidemiol. 2004. V. 27. № 1.
48. Herold c., Becker t. // Bioinformatics. 2009. V. 25. № 1.
49. Gibbs r.A., Belmont J.W., Hardenbol P., Willis t.D., Yu
F., Yang H., ch’ang L.Y., Huang W., Liu B., Shen Y., et al. //
nature. 2003. V. 426. № 6968. P. 789–796.
52. Julià A., Ballina J., cañete J.D., Balsa A., tornero-Molina
J., naranjo A., Alperi-López M., erra A., Pascual-Salcedo
D., Barceló P., et al. // Arthritis rheum. 2008. V. 58. № 8.
53. Shen L., Kim S., risacher S.L., nho K., Swaminathan S.,
West J.D., Foroud t., Pankratz n., Moore J.H., Sloan c.D., et
al. // neuroimage. 2010. V. 53. № 3. P. 1051–1063.
54. turton J.c., Bullock J., Medway c., Shi H., Brown K.,
Belbin O., Kalsheker n., carrasquillo M.M., Dickson D.W.,
Graff-radford n.r., et al. // J. Alzheimers Dis. 2011. V. 25.
№ 4. P. 635–644.
55. Orton S.M., ramagopalan S.V., Para A.e., Lincoln M.r.,
Handunnetthi L., chao M.J., Morahan J., Morrison K.M.,
Sadovnick A.D., ebers G.c. // J. neurol. Sci. 2011. V. 305.
№ 1–2. P. 116–120.
VOL. 4 № 3 (14) 2012 | ActA nAturAe | 71
56. Schüpbach t., Xenarios I., Bergmann S., Kapur K. //
Bioinformatics. 2010. V. 26. № 11. P. 1468–1469.
57. Hosmer D.W., Lemeshow S. Applied logistic regression.
John Wiley & Sons. 2000. 373 p.
58. Mukherjee B., Ahn J., Gruber S.B., rennert G., Moreno
V., chatterjee n. // Genet. epidemiol. 2008. V. 32. № 7.
60. ritchie M.D., Motsinger A.A. // Pharmacogenomics. 2005.
V. 6. № 8. P. 823–834.
61. Brassat D., Motsinger A.A., caillier S.J., erlich H.A.,
Walker K., Steiner L.L., cree B.A.c., Barcellos L.F., Pericak-
Vance M.A., Schmidt S., et al. // Genes Immun. 2006. V. 7.
№ 4. P. 310–315.
62. Greene c.S., Sinnott-Armstrong n.A., Himmelstein D.S.,
Park P.J., Moore J.H., Harris B.t. // Bioinformatics. 2010.
V. 26. № 5. P. 694–695.
63. Ma J., Amos c.I., Warwick Daw e. // Genet. epidemiol.
2007. V. 31. № 6. P. 594–604.
64. Albrechtsen A., castella S., Andersen G., Hansen t.,
Pedersen O., nielsen r. // Genetics. 2007. V. 176. № 2.
65. Schwender H., ruczinski I. // Adv. Genet. 2010. V. 72.
66. Zhang Y., Jiang B., Zhu J., Liu J.S. // Ann. Hum. Genet.
2011. V. 75. № 1. P. 183–193.
67. O’Doherty c., Favorov A., Heggarty S., Graham c., Favo-
rova O., Ochs M., Hawkins S., Hutchinson M., O’rourke K.,
Vandenbroeck K. // Pharmacogenomics. 2009. V. 10. № 7.
68. Favorova O.O., Favorov A.V., Boiko A.n., Andreewski t.V.,
Sudomoina M.A., Alekseenkov A.D., Kulakova O.G., Gusev
e.I., Parmigiani G., Ochs M.F. // BMc Med. Genet. 2006.
V. 7. P. 63–72.
69. chikhladze n.M., Samedova Kh.F., Sudomoina M.A,
thant M., Htut Z.M., Litonova G.n., Favorov A.V., chazova
I.e., Favorova O.O. // Kardiology. 2008. V. 48. № 1. P. 37–42
70. Sudomoina M.A., nikolaeva t.Y., Parfenov M.G., Ale-
kseenkov A.D., Favorov A.V., Gekht A.B., Gusev e.I.,
Favorova O.O. // Kardiologicheskij Vestnik. 2007. V. 2. № 1.
71. Parfenov M.G., chugunova S.A., nikolaeva t.Y., Kobylina
OV, Sudomoina M.A, Koliadina Iu.A., Gekht A.B., Gusev
e.I., Favorova O.O. // Molekular. meditsina. 2008. P. 55–59.
72. Sudomoina M.A., Sukhinina t.S., Barsova r.M., Favorov
A.V., Shakhnovich r.M., titov B.V., Matveeva n.A., ry-
balkin I.n., Vlasik t.n., Ochs M.F., ruda M.Ia., et al. // Mol
Biol (Mosk). 2010. V. 44. № 3. P. 463–471.
73. titov B.V., Barsova r.M., Martynov M.Y., nikonova A.A.,
Favorov A.V., Gusev e.I.,Favorova O.O., // Mol Biol (Mosk).
2012. V. 46. № 1. P. 93–102.
74. Parfenov M.G., titov B.V., Sudomoina M.A., Martynov
M.Y., Favorov A.V., Ochs M.F., Gusev e.I., Favorova O.O. //
Mol Biol (Mosk). 2009. V. 43. № 5. P. 873–880.
75. chugunova S.A. Sudomoina M.A., nikolaeva t.Y., Parfen-
ov M.G., Makarycheva O. Iu., Gusev e.I., Favorova O.O. //
Yakutskij Medicinskij Journal. 2009. V.2. № 26. P. 105–107.
76. Fedorova S.A., Bermisheva M.A., Villems r., Maksimova
n.r., Kononova S.K., Stepanova S.K., Khusnutdinova e.K.
// Yakutskij Medicinskij Journal. 2003. № 1. P. 16–21.
77. tsareva e.Y., Kulakova O.G., Makarycheva O. Iu., Boyko
A.n., Schur S.G., Lasch n. Iy.,Popova n.F., Gusev e.I.,
Bashinskaya V.V., Lvovs D., et al. // Mol Biol (Mosk). 2011.
V. 45. № 6. P. 963–972.
78. Sudomoina M.A., Boyko A.n., Demina t.L., Gusev e.I.,
Boldyreva M.n., trofimov D.Iu., Alekseev L.P., Favorova
O.O. // Mol Biol (Mosk). 1998. V.32. № 2. P. 291–296.
79. Boiko A.n., Gusev e.I., Sudomoina M.A., Alekseenkov
A.D., Kulakova O.G., Bikova O.V., Maslova O.I., Guseva
M.r., Boiko S.Y., Guseva M.e., et al. // neurology. 2002.
V. 58. № 4. P. 658.
80. Gusev e., Sudomoina M., Boiko A., Deomina t., Favorova
O. In: Frontiers in multiple sclerosis. // Martin Dunitz. 1997.
81. Fuller G.M., Zhang Z. // Ann. nY Acad. Sci. 2001. V. 936.
№ 1. P. 469–479.
82. O’Brien e., Asmar r., Beilin L., Imai Y., Mallion J.M.,
Mancia G., Mengden t., Myers M., Padfield P., Palatini P., et
al. // J. Hypertens. 2003. V. 21. № 5. P. 821–848.
83. Goecks J., nekrutenko A., taylor J., team t.G. // Ge-
nome Biol. 2010. V. 11. № 8. P. r86.
84. Ihaka r., Gentleman r. // J. comput. Graph. Statist. 1996.
85. Moore J.H. http://compgen.blogspot.com/2005/05/mdr-
86. Bush W.S., Dudek S.M., ritchie M.D. // Bioinformatics.
2006. V. 22. № 17. P. 2173–2174.
87. Peng t., Du P., Li Y. // Bioinformation. 2009. V. 3. № 8.