Accounting for Linkage in Family-Based Tests of Association with Missing Parental Genotypes

Department of Medicine, Duke University Medical Center, Durham, NC 27710, USA.
The American Journal of Human Genetics (Impact Factor: 10.93). 12/2003; 73(5):1016-26. DOI: 10.1086/378779
Source: PubMed


In studies of complex diseases, a common paradigm is to conduct association analysis at markers in regions identified by linkage analysis, to attempt to narrow the region of interest. Family-based tests for association based on parental transmissions to affected offspring are often used in fine-mapping studies. However, for diseases with late onset, parental genotypes are often missing. Without parental genotypes, family-based tests either compare allele frequencies in affected individuals with those in their unaffected siblings or use siblings to infer missing parental genotypes. An example of the latter approach is the score test implemented in the computer program TRANSMIT. The inference of missing parental genotypes in TRANSMIT assumes that transmissions from parents to affected siblings are independent, which is appropriate when there is no linkage. However, using computer simulations, we show that, when the marker and disease locus are linked and the data set consists of families with multiple affected siblings, this assumption leads to a bias in the score statistic under the null hypothesis of no association between the marker and disease alleles. This bias leads to an inflated type I error rate for the score test in regions of linkage. We present a novel test for association in the presence of linkage (APL) that correctly infers missing parental genotypes in regions of linkage by estimating identity-by-descent parameters, to adjust for correlation between parental transmissions to affected siblings. In simulated data, we demonstrate the validity of the APL test under the null hypothesis of no association and show that the test can be more powerful than the pedigree disequilibrium test and family-based association test. As an example, we compare the performance of the tests in a candidate-gene study in families with Parkinson disease.
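The abstract's central point, that transmissions from a parent to two affected siblings stop being independent once the marker is linked to the disease locus, can be illustrated with a small calculation. The sketch below is not the APL estimator from the paper. It is a minimal illustration under stated assumptions: a fully informative marker, symmetry between the two parents, and a marginal transmission probability of 1/2 for each allele (the null of no association). The function names and the example identity-by-descent (IBD) probabilities are hypothetical.

```python
def same_allele_prob(z0, z1, z2):
    """Probability that two siblings receive the same allele from a given parent.

    z0, z1, z2 are the probabilities that the sib pair shares 0, 1, or 2
    alleles identical by descent at the marker. Assuming symmetry between
    parents, a pair sharing one allele IBD shares it through a given parent
    half of the time, so the per-parent sharing probability is z2 + z1 / 2.
    """
    if min(z0, z1, z2) < 0 or abs(z0 + z1 + z2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must be non-negative and sum to 1")
    return z2 + 0.5 * z1


def transmission_correlation(z0, z1, z2):
    """Correlation between the transmission indicators of the two siblings.

    With each allele transmitted with marginal probability 1/2, the
    correlation of the two indicators is 2 * P(same allele) - 1.
    """
    return 2.0 * same_allele_prob(z0, z1, z2) - 1.0


# No linkage: IBD sharing is (1/4, 1/2, 1/4), so transmissions are independent.
print(transmission_correlation(0.25, 0.50, 0.25))

# Hypothetical excess sharing in affected sib pairs in a linked region.
print(transmission_correlation(0.10, 0.40, 0.50))
```

Under the no-linkage IBD distribution the correlation is zero, which is the independence assumption the abstract attributes to TRANSMIT. Any excess sharing makes the correlation positive, which is the dependence the APL test is described as adjusting for.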



Available from: Eden R Martin,
    • "APL-OSA is a method that implements the APL test to detect increased evidence for association in the tail of a covariate distribution (Chung et al. 2008). Simulation studies show that APL has the same, or better, power as other tests without type I error inflation (Martin et al. 2003), even in regions with strong linkage (Chung et al. 2007). APL-OSA detects increased association in the tails of a covariate distribution, and is an effective test for SNP-smoking interactions. "
    ABSTRACT: We performed a gene-smoking interaction analysis using families from an early-onset coronary artery disease cohort (GENECARD). This analysis was focused on validating and expanding results from previous studies implicating single nucleotide polymorphisms (SNPs) on chromosome 3 in smoking-mediated coronary artery disease. We analyzed 430 SNPs on chromosome 3 and identified 16 SNPs that showed a gene-smoking interaction at P < 0.05 using association in the presence of linkage-ordered subset analysis, a method that uses permutations of the data to empirically estimate the strength of the association signal. Seven of the 16 SNPs were in the Rho-GTPase pathway indicating a 1.87-fold enrichment for this pathway. A meta-analysis of gene-smoking interactions in three independent studies revealed that rs9289231 in KALRN had a Fisher's combined P value of 0.0017 for the interaction with smoking. In a gene-based meta-analysis KALRN had a P value of 0.026. Finally, a pathway-based analysis of the association results using WebGestalt revealed several enriched pathways including the regulation of the actin cytoskeleton pathway as defined by the Kyoto Encyclopedia of Genes and Genomes.
    Human Genetics 08/2013; 132(12). DOI:10.1007/s00439-013-1339-7 · 4.82 Impact Factor
    • "In the presence of association and linkage, there tends to be a positive correlation between tests of association and linkage; however, there is no such correlation between tests when linkage and association are not present, implying that true positive results in one test tend to be reflected by positive results in another [25]. Family based association with early onset CAD was performed using the association in the presence of linkage test (APL) [26]. This test appropriately accounts for the non-independence of affected siblings and calculates a robust estimate of the genetic variance. "
    ABSTRACT: Coronary artery disease (CAD), and one of its intermediate risk factors, dyslipidemia, possess a demonstrable genetic component, although the genetic architecture is incompletely defined. We previously reported a linkage peak on chromosome 5q31-33 for early-onset CAD where the strength of evidence for linkage was increased in families with higher mean low density lipoprotein-cholesterol (LDL-C). Therefore, we sought to fine-map the peak using association mapping of LDL-C as an intermediate disease-related trait to further define the etiology of this linkage peak. The study populations consisted of 1908 individuals from the CATHGEN biorepository of patients undergoing cardiac catheterization; 254 families (N = 827 individuals) from the GENECARD familial study of early-onset CAD; and 162 aorta samples harvested from deceased donors. Linkage disequilibrium-tagged SNPs were selected with an average of one SNP per 20 kb for 126.6-160.2 MB (region of highest linkage) and less dense spacing (one SNP per 50 kb) for the flanking regions (117.7-126.6 and 160.2-167.5 MB) and genotyped on all samples using a custom Illumina array. Association analysis of each SNP with LDL-C was performed using multivariable linear regression (CATHGEN) and the quantitative trait transmission disequilibrium test (QTDT; GENECARD). SNPs associated with the intermediate quantitative trait, LDL-C, were then assessed for association with CAD (i.e., a qualitative phenotype) using linkage and association in the presence of linkage (APL; GENECARD) and logistic regression (CATHGEN and aortas). We identified four genes with SNPs that showed the strongest and most consistent associations with LDL-C and CAD: EBF1, PPP2R2B, SPOCK1, and PRELID2. The most significant results for association of SNPs with LDL-C were: EBF1, rs6865969, p = 0.01; PPP2R2B, rs2125443, p = 0.005; SPOCK1, rs17600115, p = 0.003; and PRELID2, rs10074645, p = 0.0002. The most significant results for CAD were EBF1, rs6865969, p = 0.007; PPP2R2B, rs7736604, p = 0.0003; SPOCK1, rs17170899, p = 0.004; and PRELID2, rs7713855, p = 0.003. Using an intermediate disease-related quantitative trait of LDL-C we have identified four novel CAD genes, EBF1, PRELID2, SPOCK1, and PPP2R2B. These four genes should be further examined in future functional studies as candidate susceptibility loci for cardiovascular disease mediated through LDL-cholesterol pathways.
    BMC Genetics 02/2012; 13(1):12. DOI:10.1186/1471-2156-13-12 · 2.40 Impact Factor
    • "Latent class models [44] have been used to estimate membership-class probabilities for individuals with similar genetic backgrounds [45-48]. Ordered Subset Analysis (OSA)-based models have been extended to association, including the sequential addition (SA) procedure [49] and the OSA case-control (OSACC) method [50]. For family-based data, the OSA-TDT [51] applies OSA to the transmission disequilibrium test (TDT) [52], and the APL-OSA [53] similarly applies OSA to the "association in the presence of linkage" test (APL) [54]. "
    ABSTRACT: Locus heterogeneity is one of the most documented phenomena in genetics. To date, relatively little work had been done on the development of methods to address locus heterogeneity in genetic association analysis. Motivated by Zhou and Pan's work, we present a mixture model of linked and unlinked trios and develop a statistical method to estimate the probability that a heterozygous parent transmits the disease allele at a di-allelic locus, and the probability that any trio is in the linked group. The purpose here is the development of a test that extends the classic transmission disequilibrium test (TDT) to one that accounts for locus heterogeneity. Our simulations suggest that, for sufficiently large sample size (1000 trios), our method has good power to detect association even when the proportion of unlinked trios is high (75%). While the median difference (TDT-HET empirical power - TDT empirical power) is approximately 0 for all MOI, there are parameter settings for which the power difference can be substantial. Our multi-locus simulations suggest that our method has good power to detect association as long as the markers are reasonably well-correlated and the genotype relative risks are larger. Results of both single-locus and multi-locus simulations suggest our method maintains the correct type I error rate. Finally, the TDT-HET statistic shows highly significant p-values for most of the idiopathic scoliosis candidate loci, and for some loci, the estimated proportion of unlinked trios approaches or exceeds 50%, suggesting the presence of locus heterogeneity. We have developed an extension of the TDT statistic (TDT-HET) that allows for locus heterogeneity among coded trios. Benefits of our method include: estimates of parameters in the presence of heterogeneity, and reasonable power even when the proportion of linked trios is small. Also, we have extended multi-locus methods to TDT-HET and have demonstrated that the empirical power may be high to detect linkage. Last, given that we obtain PPBs, we conjecture that the TDT-HET may be a useful method for correctly identifying linked trios. We anticipate that researchers will find this property increasingly useful as they apply next-generation sequencing data in family based studies.
    BMC Bioinformatics 01/2012; 13(1):13. DOI:10.1186/1471-2105-13-13 · 2.58 Impact Factor
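
Several of the citing abstracts above build on the classic transmission disequilibrium test (TDT) for parent-offspring trios. As a reminder of what that baseline computes, the sketch below implements the standard McNemar-type TDT statistic for a di-allelic marker, assuming complete parental genotypes and independent trios. It is not TDT-HET, APL, or any of the programs named above. The function name and the example counts are hypothetical.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted, not_transmitted):
    """Classic TDT for one allele at a di-allelic marker.

    transmitted:      number of times heterozygous parents passed the allele
                      of interest to an affected child
    not_transmitted:  number of times heterozygous parents passed the other
                      allele instead

    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts and at least one informative transmission")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt_statistic(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The statistic uses only heterozygous parents whose genotypes are observed, which is why missing parental genotypes, the problem addressed by the paper listed at the top of this page, require a different approach.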