Assessing the Contribution Family Data Can Make to Case‐Control Studies of Rare Variants

Annals of Human Genetics 75(5):630–638 · June 2011 · DOI: 10.1111/j.1469-1809.2011.00660.x
David Curtis
Centre for Psychiatry, Barts and The London School of Medicine and Dentistry, London, UK
Summary
When pathogenic variants are rare, even among cases the proportion of subjects possessing a variant might be low,
meaning that very large samples might be required to conclusively demonstrate evidence of an effect. Relatives of subjects
within a case-control sample might provide useful additional information.
The method of model-free linkage analysis implemented in MFLINK was adapted to incorporate linkage disequilibrium
(LD) parameters in order to test for an effect of a putative pathogenic variant in complete LD with a disease locus. The
effect of adding to the analysis the relatives of cases and controls found to carry the variant was investigated.
When affected siblings or cousins of cases possessing the variant were incorporated, they had a large effect on the results
obtained. The evidence for involvement increased or reduced as expected, depending on whether or not the relatives
themselves were found to possess the variant. The size of the effect was large relative to that expected from just increasing
the size of a standard case-control sample.
Affected relatives offer a valuable resource to assist the interpretation of case-control studies of rare variants. The method
is capable of including other relative types and can deal with complex pedigrees.
Keywords: Association, model-free linkage, linkage disequilibrium
Introduction
It is plausible that rare variants of individually large effect could contribute to susceptibility to diseases which are relatively common, and it has been pointed out that such rare variants might account for some of the weak associations which have been observed with markers which themselves have a relatively high minor allele frequency (Goldstein, 2009; Dwyer et al., 2010). An obvious example is schizophrenia, which has a lifetime prevalence in the region of 1% and high heritability. The fact that rare variants can have a major effect on susceptibility to schizophrenia is demonstrated by the role of copy number variants (International Schizophrenia Consortium, 2008; Murphy et al., 1999), but attempts to identify simple coding variants have to date been less successful. A study of polymorphisms within PCM1 identified three variants with possible functional effects which were significant at P values ranging from 0.015 to 0.002, although linkage disequilibrium (LD) relationships between them made interpretation difficult (Datta et al., 2008). Repeated genome-wide association studies have directed attention at loci of interest but have failed definitively to identify common variants with moderate effect size. However, such studies might well not pick up effects of rare variants. In fact, if there were variants with large effect size it would not be surprising if they each had low population frequencies. Taking as an example early-onset Alzheimer's disease, which is a rare Mendelian-dominant disorder, there are three separate genes involved and for each of these a number of different pathogenic mutations have been documented (Jayadev et al., 2010). There is no reason to suppose that a more common non-Mendelian disease would have fewer susceptibility genes or that each would harbour a smaller number of potential variants. To take a very simplistic view, we could easily imagine that in the case of schizophrenia there could be 10 genes and that major disruption of any one of them could lead to a substantially increased risk. Such disruption could in practice occur at any one of 10 sites within each gene. Thus, with 100 possible risk sites, even if every case carried a variant at one of them, the frequency of any particular variant among cases might well be as low as 1% and quite possibly a good deal less.
If one sequenced one candidate gene in 1000 cases then one would come across only a proportion of the variants within that gene that actually existed in the population, because by chance some would be absent from the sample. Of the ones that were detected, one might expect only a handful of each to be present among the cases tested. Carrying out further testing for these variants in replication samples could provide further evidence for their involvement, but even so the numbers involved might be so small as to render the findings not entirely convincing. Thus, even if one were to accurately identify a variant which conferred a major effect on risk, one might struggle to find a large enough sample of cases to demonstrate its role unequivocally. This problem could be exacerbated by the fact that in order to increase the sample size one might need to widen the geographic and ethnic origins of the cases, possibly into populations where the variant was even less common than it had been in the original sample. The issue of how best to detect such rare variants is topical and has recently been discussed, along with the suggestion that variants in the same gene could be subjected to a pooled analysis (Morris & Zeggini, 2009; Lawrence et al., 2010).
Corresponding author: David Curtis, Centre for Psychiatry, Barts and The London School of Medicine and Dentistry, London E1 1BB, UK. Tel: +44 20 7377 7729; Fax: +44 20 7377 7316; Email: david.curtis@qmul.ac.uk
Annals of Human Genetics (2011) 75, 630–638 © 2011 The Authors. Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London
An obvious approach that might be helpful in this situation would be to genotype additional subjects related to members of the original case-control sample. In particular, if a variant really does have a major effect on risk then one could potentially gain considerable additional information by typing affected relatives of cases in whom the variant has been found. This might provide a relatively quick and easy method of identifying further subjects carrying the variant and might require fewer resources than simply increasing the total number of cases tested. Of course, any such affected relatives could not simply be added to the pool of cases possessing the variant. The original case-control sample provides evidence for association, whereas additional related subjects could support linkage. A sequential approach of testing association and then linkage has been reported in a study of ABCA13 in schizophrenia (Knight et al., 2009). However, there might be an advantage in carrying out combined association and linkage analysis. In this paper we seek to quantify the effect that including additional related subjects can have when they are used to enhance a case-control study.
Methods have previously been described to carry out combined association and linkage analysis (Goring & Terwilliger, 2000; Horvath et al., 2001; Thompson et al., 2003; Li et al., 2005; Li et al., 2006), but they have not aimed specifically to test the hypothesis that a single genotyped rare variant directly affects susceptibility. In order to assess the effect of adding relatives to a case-control sample, we thought it would be helpful to devise a method which was designed to detect a rare dominant effect, which could be applied to unrelated case-control subjects while simultaneously incorporating information from any type of relative, including extended pedigrees, and which could perform full likelihood calculations. By focusing narrowly on testing only this hypothesis it was possible to derive a test with just a single degree of freedom. To do this, we adapted the previously described method of model-free linkage analysis implemented in MFLINK (Curtis & Sham, 1995). We checked the performance of the method against standard methods for case-control analysis and then assessed the effect of adding different types of relative to the sample.
Materials and Methods
The method used to assess the evidence for association between the variant allele and disease is based on a modification of the previously described "model-free" test for linkage with locus heterogeneity implemented in the MFLINK program. This carries out full likelihood-based linkage analysis which can incorporate genotype information from multiple markers and uses both affected and unaffected pedigree members. In summary, this method models disease susceptibility using a biallelic locus parameterised with three penetrance values, $f_0$, $f_1$ and $f_2$ (the probabilities of disease given zero, one or two copies of the disease allele), and a disease allele frequency of $q$. This disease locus is presumed to lie at some map position $x$ relative to the marker loci. The hypothesis of linkage with admixture due to locus heterogeneity assumes that a certain proportion, $\alpha$, of families have disease due to the actions of a locus at this position, whereas a proportion $(1 - \alpha)$ of families manifest the effects of a locus with the same disease model parameters but at an unlinked position (Risch, 1989). The null hypothesis of no linkage assumes $\alpha = 0$. The model-free test for linkage with admixture involves obtaining the maximum log likelihood for all the disease and marker data over a range of values for the disease model parameters, either assuming $\alpha = 0$ (the null hypothesis) or $\alpha > 0$ (at least some families manifest the effects of a disease locus at this map position). A variety of disease models are used, but all have the population prevalence, $K_p$, constrained to the true value. For parsimony the MFLINK program tests a range of recessive and dominant models with $f_0$ varying between 0 and $K_p$ while $f_2$ varies between $K_p$ and 1, with $f_0 = f_1 = 0,\ f_2 = 1$ representing a Mendelian recessive model, $f_0 = f_1 = f_2 = K_p$ a null effect, and $f_0 = 0,\ f_1 = f_2 = 1$ a Mendelian-dominant model. Each of these models is uniquely defined by $f_1$, taking values ranging from 0 to 1.
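The prevalence constraint can be made concrete with a short calculation. Assuming Hardy-Weinberg genotype proportions (an assumption of this sketch), a model with penetrances $f_0$, $f_1$ and $f_2$ implies a prevalence of $f_0(1-q)^2 + 2f_1q(1-q) + f_2q^2$, and $q$ can be found numerically so that this equals $K_p$. The Python sketch below does this by bisection; the function names are hypothetical and this is not MFLINK's own code.

```python
import math


def prevalence(q: float, f0: float, f1: float, f2: float) -> float:
    """Prevalence implied by penetrances f0, f1, f2 (0, 1 or 2 copies of the
    disease allele) and disease allele frequency q, assuming Hardy-Weinberg
    genotype proportions."""
    p = 1.0 - q
    return f0 * p * p + 2.0 * f1 * p * q + f2 * q * q


def disease_allele_frequency(kp: float, f0: float, f1: float, f2: float,
                             tol: float = 1e-12) -> float:
    """Solve prevalence(q) == kp for q in [0, 1] by bisection.

    With f0 <= f1 <= f2 the prevalence never decreases as q grows, so a
    solution exists whenever f0 <= kp <= f2.
    """
    if not f0 <= f1 <= f2:
        raise ValueError("expected f0 <= f1 <= f2")
    if f0 == f2:
        # Null model: all genotypes share one risk, so q is not identified.
        raise ValueError("q is undefined when f0 == f1 == f2")
    if not f0 <= kp <= f2:
        raise ValueError("kp must lie between f0 and f2")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prevalence(mid, f0, f1, f2) < kp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example with K_p = 0.01: Mendelian-dominant and Mendelian-recessive models.
q_dominant = disease_allele_frequency(0.01, 0.0, 1.0, 1.0)
q_recessive = disease_allele_frequency(0.01, 0.0, 0.0, 1.0)
```

For $K_p = 0.01$ this gives $q = 1 - \sqrt{0.99} \approx 0.005$ for the Mendelian-dominant model and $q = 0.1$ for the Mendelian-recessive model. Bisection is used here only because the prevalence is monotone in $q$ when $f_0 \le f_1 \le f_2$; solving the quadratic in closed form would work equally well.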
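We write the disease phenotype data as $D$ and the observed markers as $M$. Then, the likelihood ratio for a model-free test for linkage with admixture at position $X$ on the genetic map is written as
$$\mathrm{LR} = \frac{L(D, M \mid \alpha > 0,\ x = X,\ f_1)}{L(D, M \mid \alpha = 0,\ x = X,\ f_1)}$$
with each likelihood maximised over $f_1$ (and, in the numerator, over $\alpha$).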
This yields a one-tailed test for linkage with admixture with one degree of freedom, in that under the null hypothesis $2\ln(\mathrm{LR})$ is distributed as a 50:50 mixture of $\chi^2_1$ and $\chi^2_0$.
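The null distribution quoted above translates into a P value as follows: an observed statistic $S = 2\ln(\mathrm{LR}) > 0$ has $P = \tfrac{1}{2}\Pr(\chi^2_1 \ge S)$, while $S = 0$ gives $P = 1$, because the $\chi^2_0$ component is a point mass at zero. A minimal Python sketch using only the standard library, relying on the identity $\Pr(\chi^2_1 \ge s) = \mathrm{erfc}(\sqrt{s/2})$; the function name is hypothetical.

```python
import math


def mixture_p_value(stat: float) -> float:
    """P value for a statistic distributed under the null as a 50:50 mixture
    of chi-squared(1) and chi-squared(0), i.e. a point mass at zero."""
    if stat <= 0.0:
        return 1.0  # every null draw is at least as extreme as zero
    # Upper tail of chi-squared with 1 df, halved for the mixture.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))
```

For example, a statistic of about 2.71 gives $P \approx 0.05$ under the mixture, whereas referring the same value to $\chi^2_1$ alone would give $P \approx 0.10$.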
In order to use this approach to test for the role of a rare variant in affecting susceptibility to disease, we make a number of minor modifications. We consider only a single marker locus, consisting of the variant itself, and the test position for the disease locus is at zero recombination with this marker locus, denoted $\theta = 0$. We choose to consider only dominant models, so $f_1$ will vary from $K_p$ to 1. We also introduce LD parameters, consisting of specified haplotype frequencies for the disease and marker loci. If we denote the vector of four haplotype frequencies as $H$, and the particular values which the frequencies are given under the hypothesis that the variant is associated and directly influences susceptibility as $H_a$, we can write the modified form of the test as
$$\mathrm{LR} = \frac{L(D, M \mid \alpha > 0,\ \theta = 0,\ H = H_a,\ f_1)}{L(D, M \mid \alpha = 0,\ \theta = 0,\ H = H_a,\ f_1)}.$$
Once more, this yields a test for a direct effect of the variant in which, under the null hypothesis, $2\ln(\mathrm{LR})$ is distributed as a 50:50 mixture of $\chi^2_1$ and $\chi^2_0$. The single degree of freedom is contributed by the admixture parameter, $\alpha$.
To explain this test more fully, we begin by presenting the standard model for linkage with admixture, i.e., locus heterogeneity, using just a single marker. This assumes that a certain proportion of families, $\alpha$, have a disease locus at recombination fraction $\theta$ with the marker. For each family, $i$, we can write the likelihood for the disease and marker data as
$$L_i^{l} = L_i(D, M \mid q, F, \theta = T),$$
where $F$ denotes the vector of three penetrances, $q$ the disease gene frequency and $T$ the recombination fraction being tested. (In fact, the marker allele frequencies also enter into this likelihood calculation, but these are assumed known from population data.)
Likewise, we can write the likelihood for the family if the disease locus is unlinked as
$$L_i^{u} = L_i(D, M \mid q, F, \theta = 0.5).$$
Then, the overall likelihood for each family assuming linkage with admixture for a given value of $\alpha$ is taken as
$$L_i = \alpha L_i^{l} + (1 - \alpha) L_i^{u}.$$
A combined likelihood for the whole data set is obtained as the product of these family likelihoods or, more conventionally, an overall log likelihood is obtained as the sum of the family log likelihoods. It is this which is used in the likelihood ratio test described above.
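The admixture likelihood can be evaluated directly once the per-family linked and unlinked likelihoods are available. The Python sketch below takes those values as given (in the paper they come from the FASTLINK modules; here they are simply hypothetical inputs), sums $\ln(\alpha L_i^{l} + (1-\alpha)L_i^{u})$ over families and maximises over a grid of $\alpha$ values for one fixed disease model. The grid search and the function names are illustrative assumptions, not a description of how the program maximises the likelihood.

```python
import math
from typing import Sequence, Tuple


def admixture_log_likelihood(alpha: float, linked: Sequence[float],
                             unlinked: Sequence[float]) -> float:
    """Sum over families of ln(alpha * L_i^l + (1 - alpha) * L_i^u)."""
    total = 0.0
    for ll, lu in zip(linked, unlinked):
        mix = alpha * ll + (1.0 - alpha) * lu
        if mix <= 0.0:
            return -math.inf  # data impossible under this alpha
        total += math.log(mix)
    return total


def maximise_alpha(linked: Sequence[float], unlinked: Sequence[float],
                   steps: int = 1000) -> Tuple[float, float]:
    """Grid search over alpha in [0, 1] for one fixed disease model.

    Returns (best_alpha, 2 ln LR), where LR compares the best alpha with
    alpha = 0.
    """
    null_ll = admixture_log_likelihood(0.0, linked, unlinked)
    best_alpha, best_ll = 0.0, null_ll
    for k in range(1, steps + 1):
        alpha = k / steps
        ll = admixture_log_likelihood(alpha, linked, unlinked)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha, 2.0 * (best_ll - null_ll)


# Hypothetical per-family likelihoods (L_i^l, L_i^u) for two families.
alpha_hat, stat = maximise_alpha([3.0, 0.5], [1.0, 1.0])
```

With these made-up numbers the first family favours linkage and the second does not, so the maximum falls at an intermediate admixture proportion, $\alpha = 0.75$, rather than at either boundary.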
To adapt this scenario to the test for the effect of a variant we need to introduce LD parameters to the model. For the sake of clarity we will continue to refer to the disease locus being "linked" or "unlinked", with the understanding that in the situation of linkage there is zero recombination and also LD, whereas in the situation of non-linkage there is a recombination fraction of 0.5 and the disease and marker loci are in linkage equilibrium. We use $H_a$ to indicate the set of haplotype frequencies modelling the LD and $H_u$ for the set of haplotype frequencies under linkage equilibrium. Then we have
$$\begin{aligned}
L_i^{l} &= L_i(D, M \mid q, F, H_a, \theta = 0),\\
L_i^{u} &= L_i(D, M \mid q, F, H_u, \theta = 0.5),\\
L_i &= \alpha L_i^{l} + (1 - \alpha) L_i^{u}.
\end{aligned}$$
A possibly problematic aspect of this approach concerns the choice of haplotype frequencies which should be used to model the linked and unlinked situations. The variant is assumed by its nature to be rare and may not have been observed at all prior to the study in question. The following procedure is proposed, although it is not claimed that this is necessarily optimal.
We assume that there are subjects identified as forming part of an original case-control sample and that for some of these subjects information on some relatives is also available. In order to decide which marker allele and haplotype frequencies to specify, we use only information from the original case-control sample. We denote the overall frequency of the variant allele in the whole case-control sample as $v_{ALL}$ and then specify the haplotype frequencies under linkage equilibrium, which make up $H_u$, as $(1-q)(1-v_{ALL})$, $(1-q)v_{ALL}$, $q(1-v_{ALL})$ and $q\,v_{ALL}$ (in the order: normal disease allele with the common marker allele, normal disease allele with the variant, disease allele with the common marker allele, disease allele with the variant). To obtain the haplotype frequencies for $H_a$ we assumed that the variant allele only occurred in the presence of the disease allele, although the disease allele could also occur in the absence of the variant allele. As an estimate of the proportion of haplotypes with the disease allele that also carry the variant allele, we took the proportion of cases with the variant allele, denoted $2v_{CASE}$ (i.e., twice the frequency of the allele in cases). This meant that the haplotype frequencies under $H_a$ were set to $(1-q)$, $0$, $q(1-2v_{CASE})$ and $2q\,v_{CASE}$. Using these sets of haplotype frequencies is in some ways similar to the modelling which is done when carrying out a likelihood-based test for heterogeneity of marker allele frequencies between cases and controls (Zhao et al., 2002), although the LD parameters are set to be more extreme in order to reflect the hypothesis that the variant has a direct effect on susceptibility. It is recognised that the model tested may not exactly reflect the true situation, but experience with applying likelihood ratio tests for linkage and association has shown that tests based on a parsimonious, if somewhat inaccurate, model may perform well relative to tests with large numbers of free parameters which may produce better-fitting models.
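The two sets of haplotype frequencies can be written down directly from the sample counts. The minimal Python sketch below assumes, as in the tables, that no subject is homozygous for the variant, and that $v_{ALL}$ and $v_{CASE}$ are computed from the original case-control subjects only; the function names and the value $q = 0.005$ are illustrative assumptions.

```python
from typing import Tuple

# Haplotype order: (normal disease allele, V), (normal, v),
# (disease allele, V), (disease allele, v), where v is the variant allele.
Haplotypes = Tuple[float, float, float, float]


def variant_frequency(n_hom: int, n_het: int) -> float:
    """Variant allele frequency from counts of VV and Vv subjects
    (assumes no subject is homozygous for the variant)."""
    return n_het / (2.0 * (n_hom + n_het))


def haplotypes_equilibrium(q: float, v_all: float) -> Haplotypes:
    """H_u: disease and marker loci in linkage equilibrium."""
    return ((1 - q) * (1 - v_all), (1 - q) * v_all,
            q * (1 - v_all), q * v_all)


def haplotypes_ld(q: float, v_case: float) -> Haplotypes:
    """H_a: the variant occurs only on disease-allele haplotypes, a
    proportion 2 * v_case of which carry it."""
    carrier_prop = 2 * v_case
    if not 0.0 <= carrier_prop <= 1.0:
        raise ValueError("2 * v_case must lie in [0, 1]")
    return (1 - q, 0.0, q * (1 - carrier_prop), q * carrier_prop)


# Counts from the "5 of 1000 cases" row of Table 1 (controls: 1000 VV, 0 Vv).
v_all = variant_frequency(1000 + 995, 0 + 5)
v_case = variant_frequency(995, 5)
h_u = haplotypes_equilibrium(0.005, v_all)
h_a = haplotypes_ld(0.005, v_case)
```

Both vectors sum to one; under $H_a$ the haplotype carrying the normal disease allele with the variant has frequency zero, which is what encodes complete LD between the variant and the disease allele.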
We are now in a position to describe the full test for the involvement of a rare variant in a sample of cases and controls along with relatives for some of the subjects:
1. Set a plausible value for the population frequency, $K_p$, of the disease in question.
2. Obtain the frequency of the variant allele in the whole case-control sample and separately in the cases and controls.
3. Construct a series of disease models ranging from fully dominant, $f_0 = 0,\ f_1 = f_2 = 1$, to null effect, $f_0 = f_1 = f_2 = K_p$, with the disease allele frequency, $q$, constrained to produce the correct value for the overall population frequency, $K_p$.
4. Construct a set of haplotype frequencies under LD and linkage equilibrium, $H_a$ and $H_u$, based on the variant allele frequencies and $q$, as described above.
5. For each "family" in the data set, each consisting of a case or control subject along with relatives where available, calculate $L_i^{l}$ and $L_i^{u}$.
6. Over values for $\alpha$ ranging from 0 to 1, calculate $L_i$ and use these to obtain the likelihood for the whole data set for the given values of $f_1$ and $\alpha$.
7. The likelihood ratio test, as above, consists of comparing the highest likelihood obtained for any pair of $f_1$ and $\alpha$ with the highest likelihood for any $f_1$ with $\alpha$ set to zero.
This procedure has been implemented in a program called MRVTEST, which uses the UNKNOWN and MLINK modules of the FASTLINK programs to carry out the necessary likelihood calculations (Lathrop & Lalouel, 1988; Cottingham et al., 1993; Schaffer et al., 1994).
To test the performance of the procedure, it was applied to a number of case-control samples having different numbers of variant alleles, and the results were compared with the results of three conventional tests: a two-by-two $\chi^2$ test, Fisher's exact test and a likelihood-based test for heterogeneity of allele frequencies as implemented in GENEHUNTER (Zhao et al., 2002; Curtis et al., 2006). We took all P values to be two-sided, so that all should be directly comparable with each other.
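For readers who want to check the conventional columns of Table 1, the sketch below computes an uncorrected 2 × 2 Pearson $\chi^2$ on subject counts (VV versus Vv by case status) and a likelihood-ratio (G) test comparing allele frequencies between cases and controls, both referred to $\chi^2_1$. Treating the GENEHUNTER heterogeneity test as equivalent to this G test for a single biallelic marker is an assumption made for illustration, Fisher's exact test is not included, and the function names are hypothetical.

```python
import math


def neg_log10_p_chi2_1df(stat: float) -> float:
    """-log10 of the upper-tail P value of a chi-squared(1) statistic."""
    if stat <= 0.0:
        return 0.0
    return -math.log10(math.erfc(math.sqrt(stat / 2.0)))


def pearson_chi2_genotypes(ctrl_hom: int, ctrl_het: int,
                           case_hom: int, case_het: int) -> float:
    """Uncorrected Pearson chi-squared on the 2 x 2 table of subjects."""
    a, b, c, d = ctrl_hom, ctrl_het, case_hom, case_het
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no information
    return n * (a * d - b * c) ** 2 / denom


def g_test_alleles(ctrl_hom: int, ctrl_het: int,
                   case_hom: int, case_het: int) -> float:
    """G statistic on allele counts (two alleles per subject; no vv)."""
    table = [[2 * ctrl_hom + ctrl_het, ctrl_het],
             [2 * case_hom + case_het, case_het]]
    n = sum(map(sum, table))
    col = [table[0][j] + table[1][j] for j in range(2)]
    g = 0.0
    for row in table:
        r = sum(row)
        for j, obs in enumerate(row):
            if obs > 0:  # empty cells contribute nothing
                g += obs * math.log(obs / (r * col[j] / n))
    return 2.0 * g


# Table 1 row: controls 1000 VV, 0 Vv; cases 995 VV, 5 Vv.
chi2_score = neg_log10_p_chi2_1df(pearson_chi2_genotypes(1000, 0, 995, 5))
g_score = neg_log10_p_chi2_1df(g_test_alleles(1000, 0, 995, 5))
```

For this row the two values should come out close to the 1.6 and 2.1 shown in the 2 × 2 χ² and heterogeneity-test columns of Table 1.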
In order to assess the effect of including information from family members along with the original case-control data, different numbers of different types of relative were added to the data set to determine the impact this had on the strength of evidence for a role of the variant in influencing susceptibility to disease.
Results
The performance of the test on case-control samples is shown in Table 1. This demonstrates that the test produces very similar results to the more standard tests, especially the likelihood-based test for heterogeneity of allele frequencies between cases and controls.
Table 2 shows what can happen if one adds information from affected relatives of cases who carry the variant when it turns out that the relatives also have the variant. It can be seen that if one genotypes an affected sib and finds that this sib also carries the variant, this has a relatively modest effect on the overall evidence for linkage and association. However, if one is able to obtain two affected sibs of a case carrying the variant, and both these affected sibs are also found to have the variant, this can have a more substantial effect. For example, if 5 out of 1000 cases have the variant then this is significant with −log10(p) = 2.1. If each of the cases has two affected sibs and all the affected sibs are found to have the variant, then the overall significance changes to −log10(p) = 5.2. A single affected cousin has an even greater effect: if each of the five cases has one affected cousin and all five cousins carry the variant, the finding becomes significant at −log10(p) = 6.7. This is far more significant than the result one might expect from doubling the size of the original case-control sample: if 10 cases out of 2000 have the variant then the result is only significant at −log10(p) = 3.7. To take one final example, if 20 cases out of 2000 carry the variant then this is significant at −log10(p) = 6.8. If five of them have an affected cousin, all of whom also carry the variant, then the result becomes significant at −log10(p) = 11.3.
Table 3 shows what happens if one genotypes affected relatives of cases with the variant and it turns out that the relative does not have the variant. It can be seen that finding two or three cases for whom these relatives are unlinked (i.e., do not carry the variant) can dramatically reduce the significance of the result which was produced by the original case-control sample. Results were almost identical whether the additional relatives consisted of a single unlinked sib, two unlinked sibs or one unlinked cousin.
Table 4 explores the effect of finding that some controls carrying the variant have one or more affected relatives and that these affected relatives are found to carry the variant. The effect of this is to suggest that the variant may in fact exert an effect on susceptibility and that the control subject is a non-penetrant carrier. Thus the result becomes more significant. The effect from a single sib or cousin is fairly modest, but if a control with the variant has two affected sibs who are also found to have the variant then this has quite a substantial effect.
The situation in which affected relatives of controls with
the variant were genotyped and found not to have the variant
was also investigated. In all the situations tested this finding
had only a minimal effect on the statistical significance of the
results obtained.
Discussion
The simple method described for analysing a mixed data set, consisting mostly of a case-control sample but with some additional relatives typed, may well not be the optimal method of analysis. However, it suffices to demonstrate that including information from a small number of affected relatives can have a dramatic effect on the strength of evidence implicating a rare variant as having an effect on susceptibility to disease. To take one striking example, if 5 out of 1000 cases have the variant, then finding that each has two affected sibs or one affected cousin also with the variant provides stronger evidence than would be obtained from finding it in 10 out of 2000 cases. This is of course just what one might expect. Putting it another way, if one finds the variant in 5 out of 1000 cases one might expect to have to look at another 200 cases before finding another with the variant, and even if one succeeds this will only represent a very modest addition to the evidence for involvement. If, on the other hand, one finds that some or all of these cases have affected relatives, then if the variant truly affects susceptibility there will be a good chance that at least some of these affected relatives will carry the variant, and if one finds this to be so one can produce quite substantial changes in statistical significance. We note that there can be a strong effect in either direction. Thus, if one finds that an affected relative of a case with the variant also has the variant then the significance of the results increases, whereas if the affected relative turns out not to have the variant then the significance reduces. The magnitude of the effect suggests that such affected relatives represent a valuable resource.
Table 1 P values obtained from different methods in a case-control study where different numbers of subjects have the normal genotype (VV) or carry the variant allele (Vv). The negative of the base-10 logarithm of the two-tailed P value is presented for each test.
Controls Cases −log10(p)
VV Vv VV Vv Fisher's test 2 × 2 χ² Heterogeneity test MRVTEST
1000 0 1000 0 0.0 0.0 0.0 0.0
1000 0 995 5 1.5 1.6 2.1 2.1
1000 0 990 10 3.0 2.8 3.7 3.7
1000 0 985 15 4.6 4.0 5.3 5.3
1000 0 980 20 6.1 5.2 6.9 6.9
995 5 995 5 0.2 0.0 0.0 0.0
995 5 990 10 0.8 0.7 0.7 0.6
995 5 985 15 1.7 1.6 1.7 1.5
995 5 980 20 2.8 2.6 2.7 2.6
995 5 975 25 3.9 3.7 3.9 3.8
990 10 990 10 0.2 0.0 0.0 0.0
990 10 985 15 0.7 0.5 0.5 0.4
990 10 980 20 1.3 1.2 1.2 1.0
990 10 975 25 2.1 2.0 2.0 1.8
990 10 970 30 3.1 2.9 2.9 2.8
2000 0 2000 0 0.0 0.0 0.0 0.0
2000 0 1995 5 1.5 1.6 2.1 2.0
2000 0 1990 10 3.0 2.8 3.7 3.7
2000 0 1985 15 4.5 4.0 5.3 5.3
2000 0 1980 20 6.1 5.2 6.9 6.8
1995 5 1995 5 0.2 0.0 0.1 0.0
1995 5 1990 10 0.8 0.7 0.7 0.6
1995 5 1985 15 1.7 1.6 1.7 1.5
1995 5 1980 20 2.7 2.6 2.7 2.6
1995 5 1975 25 3.8 3.6 3.9 3.8
1990 10 1990 10 0.2 0.0 0.0 0.0
1990 10 1985 15 0.7 0.5 0.5 0.3
1990 10 1980 20 1.3 1.2 1.2 1.0
1990 10 1975 25 2.1 2.0 2.0 1.8
1990 10 1970 30 3.0 2.9 2.9 2.7
3000 0 3000 0 0.0 0.0 0.0 0.0
3000 0 2995 5 1.5 1.6 2.1 2.0
3000 0 2990 10 3.1 2.8 3.7 3.6
3000 0 2985 15 4.6 4.0 5.3 5.2
3000 0 2980 20 6.1 5.1 6.9 6.8
2995 5 2995 5 0.2 0.0 0.1 0.0
2995 5 2990 10 0.9 0.7 0.7 0.5
2995 5 2985 15 1.7 1.6 1.7 1.5
2995 5 2980 20 2.8 2.6 2.7 2.5
2995 5 2975 25 3.9 3.6 3.9 3.7
2990 10 2990 10 0.2 0.0 0.0 0.0
2990 10 2985 15 0.7 0.5 0.5 0.3
2990 10 2980 20 1.4 1.2 1.2 1.0
2990 10 2975 25 2.1 2.0 2.0 1.8
2990 10 2970 30 3.0 2.8 2.9 2.7
Table 2 The effects on statistical significance obtained by adding information from relatives of cases who have the variant. In each situation the relatives are also affected and are also found to have the variant. The number of controls was set to be equal to the number of cases, with none of the controls having the variant. Entries give −log10(p) for each number of cases (0 to 5) for which a relative was added.
Cases Type of −log10(p) by number of cases for which relative added
VV Vv relative 0 1 2 3 4 5
995 5 One sib 2.1 2.4 2.7 3.1 3.4 3.7
995 5 Two sibs 2.1 2.7 3.4 4.0 4.6 5.2
995 5 One cousin 2.1 3.0 4.0 4.9 5.8 6.7
990 10 One sib 3.7 4.0 4.4 4.7 5.0 5.3
990 10 Two sibs 3.7 4.3 5.0 5.6 6.2 6.8
990 10 One cousin 3.7 4.6 5.5 6.4 7.3 8.2
1990 10 One sib 3.7 4.0 4.3 4.7 5.0 5.3
1990 10 Two sibs 3.7 4.3 5.0 5.6 6.2 6.8
1990 10 One cousin 3.7 4.6 5.5 6.4 7.3 8.2
1980 20 One sib 6.8 7.2 7.5 7.8 8.1 8.4
1980 20 Two sibs 6.8 7.5 8.1 8.7 9.3 9.9
1980 20 One cousin 6.8 7.8 8.7 9.5 10.4 11.3
The method implemented in this example represents only a first attempt at adapting model-free linkage analysis to also test for LD, and it is quite possible that improvements could be made. The approach is specifically targeted at testing for a rare variant with major dominant effect. There might well be better ways of modelling both LD relationships and the effect of admixture. Nevertheless, the method suffices for now to illustrate the kind of effects which can be expected from incorporating family data into a case-control study.
Table 3 The effects on statistical significance obtained by adding information from relatives of cases who have the variant. In this situation the relatives are also affected, but when they are genotyped it is found that they do not have the variant. The number of controls was set to be equal to the number of cases, with none of the controls having the variant. Results are shown only for the situation of adding a single unlinked sib. Very similar results were obtained from adding two unlinked sibs or an unlinked cousin. Entries give −log10(p) for each number of cases (0 to 5) for which a relative was added.
Cases Type of −log10(p) by number of cases for which relative added
VV Vv relative 0 1 2 3 4 5
995 5 One sib 2.1 0.8 0.2 0.0 0.0 0.0
990 10 One sib 3.7 2.3 1.4 0.8 0.3 0.0
1990 10 One sib 3.7 2.3 1.4 0.8 0.3 0.0
1980 20 One sib 6.8 5.2 4.1 3.2 2.4 1.8
It should be emphasised that in reality the hypothesis being tested becomes somewhat weaker when one incorporates relatives into a case-control study in this way, in that it becomes a test for linkage and/or association rather than purely for association. With the case-control sample, the null hypothesis being rejected is that there is no association between the variant and disease. When relatives are added one is testing for linkage as well as LD, and the null hypothesis becomes that there is no association and no linkage. Potentially this could give rise to debate about the interpretation of the results obtained. If one were testing a variant which had been implicated by association studies and found it also present in affected relatives of cases, then that would generally be taken to provide some further evidence that the variant might have a direct effect on susceptibility. On the other hand, to take an extreme example, if one had already carried out a linkage study which had implicated the region and had gone on to do fine-mapping and detect an associated marker, one could not go on to add the original linked pedigrees into the sample as if they represented a new, independent source of information.
Table 4 The effects on statistical significance obtained by adding information from affected relatives of controls who have the variant. In each situation the affected relatives are also found to have the variant. Entries give −log10(p) for each number of controls (0 to 5) for which a relative was added.
Controls Cases Type of −log10(p) by number of controls for which relative added
VV Vv VV Vv relative 0 1 2 3 4 5
995 5 980 20 One sib 2.6 2.5 2.6 2.8 3.2 4.2
995 5 980 20 Two sibs 2.6 2.9 3.3 4.0 4.9 6.5
995 5 980 20 One cousin 2.6 2.6 2.7 2.9 3.2 4.0
1995 5 1980 20 One sib 2.6 2.5 2.5 2.8 3.2 4.2
1995 5 1980 20 Two sibs 2.6 2.8 3.3 4.0 4.9 6.5
1995 5 1980 20 One cousin 2.6 2.6 2.7 2.9 3.3 4.1
Only a few example scenarios have been explored. The
effects o f adding in unaffected relatives have not been inves-
tigated although we would expect that they would add less
information than affected relatives. Nevertheless, the method
is quite capable of incorporating information from both af-
fected and unaffected subjects from complex pedigrees, given
that MLINK can perform the appropriate likelihood calcula-
tions. Likewise, we have only looked at the effects of studying
relatives of subjects found to bear the variant. We would ex-
pect the addition of relatives of other subjects to add little
to the available information, though of course they could be
incorporated if desired. It would in principle be possible to
incorporate information from other markers nearby, for ex-
ample for subjects who had not been genotyped for the spe-
cific variant under consideration. It should also be possible to
model exposure to environmental ri sk factors by using liability
classes.
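It may help to see what a liability class amounts to. In LINKAGE-format analyses each liability class carries its own genotype-specific penetrances. One plausible way to represent an environmental exposure is therefore to place exposed and unexposed subjects in different classes. The Python sketch below is a minimal illustration under stated assumptions. The class names, the penetrance values and the function `phenotype_probability` are hypothetical and are not taken from MFLINK, MLINK or MRVTEST.

```python
# Hypothetical liability-class penetrances, in the spirit of LINKAGE-format files:
# each class maps the number of copies of the rare variant (0, 1, 2) to
# P(affected | genotype). The values are illustrative assumptions only.
PENETRANCE = {
    "unexposed": {0: 0.01, 1: 0.10, 2: 0.10},
    "exposed": {0: 0.03, 1: 0.30, 2: 0.30},
}


def phenotype_probability(copies: int, affected: bool, liability_class: str) -> float:
    """P(observed phenotype | genotype) for a subject assigned to a liability class.

    This is the penetrance factor that a pedigree likelihood calculation uses for
    one subject and one genotype; exposure changes it only via the class chosen.
    """
    f = PENETRANCE[liability_class][copies]
    return f if affected else 1.0 - f


if __name__ == "__main__":
    # An affected carrier: the exposed class uses a higher penetrance.
    print(phenotype_probability(1, True, "exposed"))    # 0.3
    print(phenotype_probability(1, True, "unexposed"))  # 0.1
    # An unaffected non-carrier contributes 1 - f for its class.
    print(phenotype_probability(0, False, "exposed"))
```

A full analysis would combine these factors with genotype probabilities and sum over unobserved genotypes within each pedigree. The sketch shows only how the class assignment changes a single subject's contribution.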

If one accepts the conclusion that incorporating relatives can have a quite substantial effect, then this has important implications for the design of case-control studies. One would want to ensure that one could return to subjects who were found to possess a variant and seek to sample their relatives at a later date, especially any affected relatives. One might seek information about affected relatives at the time of recruiting both case and control subjects. One might sample affected relatives at the time of initial recruitment, or one might only record basic information about them with the aim of following them up if the index subject were subsequently found to possess a variant of interest. Some of these protocols might need to be specifically proposed in order to receive ethical approval. It would be undesirable to be in a situation where one had subjects known to possess a putative risk variant but was constrained from returning to them in order to ascertain affected relatives.

Another issue worth mentioning is how one might appropriately deal with multiple variants within the same gene. There might be a number of different variants, each individually very rare. One could obviously gain useful evidence to implicate the gene by combining the information from individual variants. One might propose some objective process for defining a variant within a gene as “potentially of interest”, based on factors such as its rarity and its predicted effect on the gene product. Then one might group all such variants together and code any one of them as a dummy “gene variant” allele. When considering relatives, one would treat a relative as having this allele if and only if they had the same variant as the index subject in the original case-control sample. If a relative had a variant in the gene but not the same variant as the index subject, then this would not be consistent with linkage within the family, so such a relative would be coded as having a normal genotype. This represents one proposed approach; there might well be other appropriate methods to bring together evidence from multiple variants within a gene or gene network. It might be possible to develop approaches in which separate variants were individually incorporated into a combined model using a heterogeneity approach.
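The coding rule just described can be made concrete. The Python sketch below is a minimal illustration under stated assumptions and is not the procedure implemented in MRVTEST. The frequency threshold, the `predicted_damaging` flag standing in for a predicted-effect annotation, and every name in it (`Variant`, `is_of_interest`, `code_index_subject`, `code_relative`) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# Hypothetical threshold for "rare"; the text leaves the exact criteria open.
MAX_ALLELE_FREQUENCY = 0.01


@dataclass(frozen=True)
class Variant:
    variant_id: str            # identifier of the specific variant within the gene
    allele_frequency: float    # population allele frequency
    predicted_damaging: bool   # stand-in for a predicted effect on the gene product


def is_of_interest(v: Variant) -> bool:
    """Objective rule marking a variant as 'potentially of interest'."""
    return v.allele_frequency <= MAX_ALLELE_FREQUENCY and v.predicted_damaging


def qualifying(variants: Iterable[Variant]) -> FrozenSet[Variant]:
    """The subset of a subject's variants that count towards the dummy allele."""
    return frozenset(v for v in variants if is_of_interest(v))


def code_index_subject(variants: Iterable[Variant]) -> bool:
    """A case or control carries the dummy 'gene variant' allele if any
    variant of interest is present."""
    return bool(qualifying(variants))


def code_relative(relative_variants: Iterable[Variant],
                  index_variants: Iterable[Variant]) -> bool:
    """A relative carries the dummy allele only if they share a variant of
    interest with their index subject. A different variant in the same gene is
    not consistent with linkage within the family, so it is coded as normal.
    (If the index subject has several variants of interest, sharing any one of
    them counts here; that choice is an assumption.)"""
    return bool(qualifying(index_variants) & frozenset(relative_variants))


if __name__ == "__main__":
    a = Variant("variant_A", 0.0005, True)
    b = Variant("variant_B", 0.0008, True)
    common = Variant("variant_C", 0.20, True)
    index = {a, common}
    print(code_index_subject(index))       # True
    print(code_relative({a}, index))       # True: same variant as the index subject
    print(code_relative({b}, index))       # False: different rare variant in the gene
    print(code_relative({common}, index))  # False: common variant is not of interest
```

The frozen dataclass makes each variant hashable. "Same variant as the index subject" then reduces to a set intersection.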

The investigations described here are designed to roughly quantify the expected benefits that may accrue from incorporating data from relatives of subjects in a case-control study. It seems that by obtaining information from just a small number of individuals one may dramatically increase or reduce the evidence for the effect of a rare variant. In the tabulated scenarios, for example, adding two affected siblings who also carry the variant for each of five carrier cases raised log(p) from 2.6 to 6.5. This would clearly have major implications for study design.
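The gains in the table can be read off directly. The short Python sketch below only re-enters the published log(p) values, stored in a hypothetical `LOG_P` dictionary. For each relative type, it reports the change in log(p) when relatives are added for five carrier cases. Alongside that, it gives the largest difference between the rows with 1000 and with 2000 cases and controls. It performs no new analysis.

```python
# log(p) values from the table, keyed by (number of cases = number of controls,
# type of relative added); list position k = number of carrier cases (0-5) for
# which affected relatives who also carry the variant were added.
LOG_P = {
    (1000, "One sib"): [2.6, 2.5, 2.6, 2.8, 3.2, 4.2],
    (1000, "Two sibs"): [2.6, 2.9, 3.3, 4.0, 4.9, 6.5],
    (1000, "One cousin"): [2.6, 2.6, 2.7, 2.9, 3.2, 4.0],
    (2000, "One sib"): [2.6, 2.5, 2.5, 2.8, 3.2, 4.2],
    (2000, "Two sibs"): [2.6, 2.8, 3.3, 4.0, 4.9, 6.5],
    (2000, "One cousin"): [2.6, 2.6, 2.7, 2.9, 3.3, 4.1],
}


def gains(values):
    """Change in log(p) relative to the baseline with no relatives added."""
    return [round(v - values[0], 1) for v in values]


def largest_size_difference(relative):
    """Largest absolute difference between the 1000- and 2000-subject rows."""
    pairs = zip(LOG_P[(1000, relative)], LOG_P[(2000, relative)])
    return round(max(abs(x - y) for x, y in pairs), 1)


if __name__ == "__main__":
    for relative in ("One sib", "Two sibs", "One cousin"):
        gain = gains(LOG_P[(1000, relative)])[-1]
        print(f"{relative:<10} gain with 5 cases: {gain:+.1f}"
              f"  max difference between sample sizes: {largest_size_difference(relative):.1f}")
```

On these values the sample-size difference never exceeds 0.1. This is one way of seeing how little is changed by adding 1000 further cases and 1000 further controls without the variant. By contrast, adding carrier relatives for a handful of cases changes log(p) considerably.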
The MRVTEST program is available for download from
http://www.smd.qmul.ac.uk/statgen/dcurtis/software.html.
References
Cottingham, R. W., Jr., Idury, R. M. & Schaffer, A. A. (1993) Faster sequential genetic linkage computations. Am J Hum Genet 53, 252–263.
Curtis, D., Knight, J. & Sham, P. C. (2006) Program report: GENECOUNTING support programs. Ann Hum Genet 70, 277–279.
Curtis, D. & Sham, P. C. (1995) Model-free linkage analysis using likelihoods. Am J Hum Genet 57, 703–716.
Datta, S. R., McQuillin, A., Rizig, M., Blaveri, E., Thirumalai, S., Kalsi, G., Lawrence, J., Bass, N. J., Puri, V., Choudhury, K., Pimm, J., Crombie, C., Fraser, G., Walker, N., Curtis, D., Zvelebil, M., Pereira, A., Kandaswamy, R., St Clair, D. & Gurling, H. M. (2008) A threonine to isoleucine missense mutation in the pericentriolar material 1 gene is strongly associated with schizophrenia. Mol Psychiatry 15, 615–628.
Dwyer, S., Williams, H., Holmans, P., Moskvina, V., Craddock, N., Owen, M. J. & O’Donovan, M. C. (2010) No evidence that rare coding variants in ZNF804A confer risk of schizophrenia. Am J Med Genet B Neuropsychiatr Genet 153B, 1411–1416.
Goldstein, D. B. (2009) Common genetic variation and human traits. N Engl J Med 360, 1696–1698.
Goring, H. H. & Terwilliger, J. D. (2000) Linkage analysis in the presence of errors IV: Joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet 66, 1310–1327.
Horvath, S., Xu, X. & Laird, N. M. (2001) The family based association test method: Strategies for studying general genotype–phenotype associations. Eur J Hum Genet 9, 301–306.
International Schizophrenia Consortium (2008) Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 455, 237–241.
Jayadev, S., Leverenz, J. B., Steinbart, E., Stahl, J., Klunk, W., Yu, C. E. & Bird, T. D. (2010) Alzheimer’s disease phenotypes and genotypes associated with mutations in presenilin 2. Brain 133, 1143–1154.
Knight, H. M., Pickard, B. S., Maclean, A., Malloy, M. P., Soares, D. C., McRae, A. F., Condie, A., White, A., Hawkins, W., McGhee,
K., Van Beck, M., Macintyre, D. J., Starr, J. M., Deary, I. J., Visscher, P. M., Porteous, D. J., Cannon, R. E., St Clair, D., Muir, W. J. & Blackwood, D. H. (2009) A cytogenetic abnormality and rare coding variants identify ABCA13 as a candidate gene in schizophrenia, bipolar disorder, and depression. Am J Hum Genet 85, 833–846.
Lathrop, G. M. & Lalouel, J. M. (1988) Efficient computations in multilocus linkage analysis. Am J Hum Genet 42, 498–505.
Lawrence, R., Day-Williams, A. G., Elliott, K. S., Morris, A. P. & Zeggini, E. (2010) CCRaVAT and QuTie enabling analysis of rare variants in large-scale case control and quantitative trait association studies. BMC Bioinformatics 11, 527.
Li, M., Boehnke, M. & Abecasis, G. R. (2005) Joint modeling of linkage and association: Identifying SNPs responsible for a linkage signal. Am J Hum Genet 76, 934–949.
Li, M., Boehnke, M. & Abecasis, G. R. (2006) Efficient study designs for test of genetic association using sibship data and unrelated cases and controls. Am J Hum Genet 78, 778–792.
Morris, A. P. & Zeggini, E. (2009) An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34, 188–193.
Murphy, K. C., Jones, L. A. & Owen, M. J. (1999) High rates of schizophrenia in adults with velo-cardio-facial syndrome. Arch Gen Psychiatry 56, 940–945.
Risch, N. (1989) Linkage detection tests under heterogeneity. Genet Epidemiol 6, 473–480.
Schaffer, A. A., Gupta, S. K., Shriram, K. & Cottingham, R. W., Jr. (1994) Avoiding recomputation in linkage analysis. Hum Hered 44, 225–237.
Thompson, D., Easton, D. F. & Goldgar, D. E. (2003) A full-likelihood method for the evaluation of causality of sequence variants from family data. Am J Hum Genet 73, 652–655.
Zhao, J. H., Lissarrague, S., Essioux, L. & Sham, P. C. (2002) GENECOUNTING: Haplotype analysis with missing genotypes. Bioinformatics 18, 1694–1695.
Received: 26 December 2010
Accepted: 30 March 2011