GRR: graphical representation of relationship errors.
ABSTRACT A graphical tool for verifying assumed relationships between individuals in genetic studies is described. GRR can detect many common errors using genotypes from many markers. AVAILABILITY: GRR is available at http://bioinformatics.well.ox.ac.uk/GRR.
BIOINFORMATICS APPLICATIONS NOTE
Vol. 17 no. 8 2001
GRR: graphical representation of relationship errors
Gonçalo R. Abecasis, Stacey S. Cherny, W. O. C. Cookson and
Lon R. Cardon
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive,
Oxford OX3 7RZ, UK
Received on February 1, 2001; revised on March 27, 2001; accepted on March 28, 2001
Summary: A graphical tool for verifying assumed relationships
between individuals in genetic studies is described.
GRR can detect many common errors using genotypes
from many markers.
Availability: GRR is available at http://bioinformatics.well.ox.ac.uk/GRR
Contact: email@example.com; firstname.lastname@example.org
Many large scale linkage and association studies have
been conducted and their popularity is increasing. Simple,
efficient quality control procedures are essential to
the successful completion of these studies. A common
problem in genetic studies is the misspecification of
relationships between DNA samples (Ott, 1991).
Misspecification of relationships can lead to inaccurate or
biased results, and it is therefore important to verify all
relationships.
The effects of relationship misspecification are varied.
In studies using family data, problems such as non-
paternity and the mislabeling of monozygotic (MZ) twins
as non-twin full sibs, as well as sample mix-ups can lead
to mistaken inferences about allele sharing. For example,
MZ twins will always share more alleles than other
sibling pairs while, on average, half-siblings will share
fewer alleles than full-siblings. In larger pedigrees, the
potential for relationship misspecification is greater and
the detection of these problems is even harder.
In studies using samples of unrelated individuals, such
as association and pharmacogenetic applications, the presence
of related individuals can lead to a misleading inference
about statistical significance. For example, although
it may seem striking that all individuals who respond
to a certain drug share a certain genotype, the finding is
less striking if some of the individuals are related.
The correct genetic relationship between any two
individuals defines an expected pattern of allele sharing
between them. The details of this pattern can be complex,
and will depend on the exact type of relationship,
marker characteristics, population history and inbreeding.
Statistics for verifying relationships through patterns of
allele sharing have been proposed, with various degrees
of sophistication, computing time requirements and
assumptions (Boehnke and Cox, 1997; Goring and Ott,
1997; Broman and Weber, 1998; Epstein et al., 2000;
McPeek and Sun, 2000). Here we describe a simple,
general approach for verifying that individuals with the
same specified relationship have similar patterns of allele
sharing. Unlike other approaches, our method does not
require specification of allele frequencies or any other
population parameter. It is expected to be robust to a small
level of random errors in the data and applicable to large
inbred samples. In addition to relationship misspecification,
our method can detect some other problems such as
sample duplications and switches.
The method is defined as follows: first, classify each
pair of individuals according to their assumed relationship
(such as sib-pairs, parent–offspring pairs, unrelated
individuals, etc.). Second, calculate the mean (µij) and
variance (σij) of identical-by-state allele sharing over a
number of polymorphic loci for each pair of individuals,
i and j. If the sample is homogeneous, we expect each
group to display a characteristic pattern of allele sharing.
For example, sib-pairs will be expected to share more alleles
on average than unrelated individuals, while parent–offspring
pairs (which share at least one chromosome) are
expected to show less variability in allele sharing than
sib-pairs (which may share zero, one or two chromosomes).
A convenient way to identify individuals with patterns of
allele sharing inconsistent with their specified relationship
is to colour code and plot these mean-variance statistics
(Fig. 1).
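The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not GRR's own code: the genotype layout (a dictionary mapping hypothetical sample IDs to lists of two-allele tuples, with None for a missing genotype) and the function names ibs_count and ibs_mean_variance are assumptions made for the example. The population variance is used because the text does not state which estimator GRR applies.

```python
from itertools import combinations
from statistics import mean, pvariance


def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) two unordered genotypes share identical-by-state."""
    a, b = g1
    c, d = g2
    # Try both pairings of the alleles and keep the better match.
    return max((a == c) + (b == d), (a == d) + (b == c))


def ibs_mean_variance(genotypes):
    """Return {(i, j): (mean, variance)} of IBS sharing over markers typed in both samples."""
    stats = {}
    for i, j in combinations(sorted(genotypes), 2):
        shared = [
            ibs_count(gi, gj)
            for gi, gj in zip(genotypes[i], genotypes[j])
            if gi is not None and gj is not None  # skip markers missing in either sample
        ]
        if shared:
            stats[(i, j)] = (mean(shared), pvariance(shared))
    return stats


# Hypothetical toy data: three samples typed at four markers.
genotypes = {
    "A": [(1, 2), (3, 3), (1, 1), (2, 4)],
    "B": [(2, 1), (3, 3), (1, 1), (4, 2)],  # same genotypes as A, alleles reordered
    "C": [(1, 3), (3, 4), (2, 2), None],
}
for pair, (mu, var) in ibs_mean_variance(genotypes).items():
    print(pair, round(mu, 3), round(var, 3))
```

Each resulting (mean, variance) pair is one point on the plot; colouring the points by the assumed relationship of the pair then shows which pairs fall outside their expected cluster.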
The figure presents typical results for a genome scan in
a non-inbred sample. Several distinct clusters are present:
unrelated individuals have the lowest average sharing and
high variance (coloured in blue); half-siblings have higher
sharing on average (coloured in green) and full-siblings
have even higher sharing (coloured in red); parent–offspring
pairs have a similar degree of allele sharing to
sib-pairs but with lower variance (coloured in yellow). All
other relative pairs are grouped together and not displayed
by default. Note that some half-sibling and full-sibling pairs
have been misclassified and appear in other clusters. A
single point with almost complete allele sharing and almost
no variance lies apart from these clusters (right corner of
the plot) and corresponds to a pair of identical twins.
Fig. 1. Sample screen shot. Features described in text.
© Oxford University Press 2001
To ensure that outlier points are easily identifiable, GRR
implements an outlier rating scheme and places likely
outliers on top of less interesting points. This scheme
is implemented by calculating the mean and variance
of each allele-sharing statistic within each relationship
group. Then each individual’s scores are standardized
to obtain Zµij and Zσij and assigned an outlier score;
points with higher outlier scores are layered on top of
lower rated points. Alternative schemes for layering data
points, such as the Mahalanobis (1936) distance, can be
selected by the user.
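The rating scheme can be sketched as follows. The exact formula that combines the two standardized values is not preserved in the text, so the score used here, the larger of |Zµij| and |Zσij|, is an assumption made purely for illustration, as are the function name outlier_scores and the input layout. Sorting by ascending score and drawing in that order leaves the highest rated pairs on top.

```python
from collections import defaultdict
from statistics import mean, pstdev


def outlier_scores(pairs):
    """pairs: list of (label, relationship, ibs_mean, ibs_variance).

    Returns (label, score) tuples sorted so the most outlying pair comes last,
    i.e. would be drawn last and therefore on top.
    """
    groups = defaultdict(list)
    for record in pairs:
        groups[record[1]].append(record)

    scored = []
    for members in groups.values():
        mus = [m[2] for m in members]
        variances = [m[3] for m in members]
        mu_bar, mu_sd = mean(mus), pstdev(mus)
        var_bar, var_sd = mean(variances), pstdev(variances)
        for label, _, mu, var in members:
            # Standardize within the relationship group; a zero spread gives Z = 0.
            z_mu = (mu - mu_bar) / mu_sd if mu_sd else 0.0
            z_var = (var - var_bar) / var_sd if var_sd else 0.0
            # ASSUMPTION: combine the two Z values by taking the larger magnitude.
            scored.append((label, max(abs(z_mu), abs(z_var))))
    return sorted(scored, key=lambda item: item[1])


# Hypothetical sib-pair statistics; the last pair shares far more than the rest.
pairs = [
    ("s1", "sib", 1.25, 0.40),
    ("s2", "sib", 1.30, 0.42),
    ("s3", "sib", 1.22, 0.38),
    ("s4", "sib", 1.28, 0.41),
    ("s5", "sib", 2.00, 0.00),
]
for label, score in outlier_scores(pairs):
    print(label, round(score, 2))
```

Standardizing each statistic separately ignores any correlation between mean and variance within a group; a Mahalanobis distance takes that correlation into account, which is one reason to offer it as an alternative.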
GRR recognizes standard genetic formats for genotype
and family structure data, including linkage and QTDT
format files (Ott, 1991; Abecasis et al., 2000). Interactive
features allow the user to select individual families and
inspect statistics for any pair of individuals by clicking the
appropriate plot area.
This approach is simple to implement and can be
incorporated into many genetic analysis databases and
quality control protocols. The method performs efficiently
in genome scan linkage panels, although as few as 50
unlinked markers may be sufficient to verify first-degree
relationships in family samples or to verify that no close
relatives or gross stratification are present in samples of
unrelated individuals.
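A small simulation gives a feel for how much separation 50 unlinked markers provide. It is a sketch under strong simplifying assumptions that do not come from the paper: four equifrequent alleles at every marker, no genotyping error, no missing data, and 200 simulated nuclear families. All names and constants are hypothetical.

```python
import random
from statistics import mean, pvariance

N_MARKERS = 50      # "as few as 50 unlinked markers"
N_ALLELES = 4       # ASSUMPTION: equifrequent alleles at every marker
N_FAMILIES = 200    # ASSUMPTION: number of simulated nuclear families


def founder(rng):
    """Random genotype at each marker for an unrelated founder."""
    return [(rng.randrange(N_ALLELES), rng.randrange(N_ALLELES)) for _ in range(N_MARKERS)]


def child(rng, father, mother):
    """Mendelian transmission; markers are unlinked, so each is drawn independently."""
    return [(rng.choice(f), rng.choice(m)) for f, m in zip(father, mother)]


def ibs_stats(x, y):
    """Mean and variance of the IBS allele count (0, 1 or 2) across markers."""
    counts = [
        max((a == c) + (b == d), (a == d) + (b == c))
        for (a, b), (c, d) in zip(x, y)
    ]
    return mean(counts), pvariance(counts)


def simulate(seed=1):
    rng = random.Random(seed)
    results = {"unrelated": [], "parent-offspring": [], "full-sib": []}
    for _ in range(N_FAMILIES):
        father, mother = founder(rng), founder(rng)
        sib1, sib2 = child(rng, father, mother), child(rng, father, mother)
        results["unrelated"].append(ibs_stats(father, mother))
        results["parent-offspring"].append(ibs_stats(father, sib1))
        results["full-sib"].append(ibs_stats(sib1, sib2))
    return results


results = simulate()
for relationship, stats in results.items():
    avg_mean = mean(m for m, _ in stats)
    avg_var = mean(v for _, v in stats)
    print(f"{relationship:17s} mean IBS {avg_mean:.2f}  variance {avg_var:.2f}")
```

Under these assumptions the group averages reproduce the qualitative pattern described earlier: related pairs share more alleles than unrelated pairs, and parent–offspring pairs vary less than sib-pairs. Real panels, with unequal allele frequencies and genotyping error, will separate less cleanly.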
This research was supported by the Wellcome Trust and
by grant EY-12562 from the National Institutes of Health,
Abecasis,G.R., Cardon,L.R. and Cookson,W.O.C. (2000) A general
test of association for quantitative traits in nuclear families. Am.
J. Hum. Genet., 66, 279–292.
Boehnke,M. and Cox,N.J. (1997) Accurate inference of relation-
ships in sib-pair linkage studies. Am. J. Hum. Genet., 61, 423–
Broman,K.W. and Weber,J.L. (1998) Estimation of pairwise rela-
tionships in the presence of genotyping errors. Am. J. Hum.
Genet., 63, 1563–1564.
Epstein,M.P., Duren,W.L. and Boehnke,M. (2000) Improved infer-
ence of relationship for pairs of individuals. Am. J. Hum. Genet.,
Goring,H.H. and Ott,J. (1997) Relationship estimation in affected
sib pair analysis of late-onset diseases. Eur. J. Hum. Genet., 5,
Mahalanobis,P.C. (1936) On the generalized distance in statistics.
Proc. Natl Inst. Sci. India, 2, 49.
McPeek,M.S. and Sun,L. (2000) Statistical tests for detection of
misspecified relationships by use of genome-screen data. Am. J.
Hum. Genet., 66, 1076–1094.
Ott,J. (1991) Analysis of Human Genetic Linkage. Johns Hopkins
University Press, Baltimore.