Simulated data for a complex genetic trait (Problem 2 for GAW11): How the model was developed, and why

Department of Psychiatry, Mt. Sinai Medical Center, New York, NY 10029, USA.
Genetic Epidemiology 01/1999; 17 Suppl 1:S449-59. DOI: 10.1002/gepi.1370170773


This paper describes a simulated data set created as Problem 2 for GAW11. The generating model for Problem 2 involved two different genetic diseases, or "types," in three separate populations. The two-locus (2L) type results from the epistatic interaction of two genetic loci, and the three-allele type from a single locus with two disease-causing alleles and one normal allele. Each type has two phenotypic forms: Mild and Severe. Both forms are subject to genetic and environmental influences. The disease occurs in three different hypothetical populations, each with different disease allele frequencies and penetrances. In two populations there is also a fourth locus with an allele that is associated with the 2L type. Misdiagnosis can occur, but only after a family has already been ascertained through ≥ 2 "genetically" affected offspring. Finally, the three populations are studied by four different hypothetical research groups. Each group has its own ideas about how the disease is inherited and has therefore devised a different ascertainment scheme based on those beliefs. Each research group collected 100-family data sets, including data on 300 markers on six chromosomes and measurements of disease status and of the two proposed environmental factors. GAW participants were supplied with 25 random replicates of each data set.
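The ascertainment rule is the subtle part of this design: families enter the sample on their true ("genetic") affection status, and misdiagnosis is applied only afterwards. The sketch below illustrates that ordering for a toy version of the two-locus (2L) type. Everything in it is an assumption for illustration, not the GAW11 generating model: the allele frequencies, the rule that an offspring is affected only when carrying at least one disease allele at both loci, the sibship size, the misdiagnosis rate, and the treatment of the two loci as unlinked. The three-allele type, the Mild/Severe forms, the environmental factors and the associated fourth locus are all omitted.

```python
import random

# Hypothetical toy parameters, NOT the GAW11 Problem 2 generating values.
FREQ_A = 0.2         # disease-allele frequency at locus A (assumed)
FREQ_B = 0.2         # disease-allele frequency at locus B (assumed)
SIBSHIP_SIZE = 4     # offspring per nuclear family (assumed)
MISDIAGNOSIS = 0.05  # probability an observed status is flipped (assumed)


def founder_genotype(rng, freq):
    """Two alleles for an unrelated parent; 1 = disease allele, 0 = normal."""
    return (int(rng.random() < freq), int(rng.random() < freq))


def transmit(rng, genotype):
    """Mendelian transmission: pass one of the two alleles at random."""
    return genotype[rng.randrange(2)]


def affected_2l(geno_a, geno_b):
    """Toy epistatic rule: affected only with >= 1 disease allele at BOTH loci."""
    return sum(geno_a) >= 1 and sum(geno_b) >= 1


def simulate_family(rng):
    """True ("genetic") affection status of each offspring in one nuclear family.

    Loci A and B are transmitted independently (assumed unlinked).
    """
    mother = (founder_genotype(rng, FREQ_A), founder_genotype(rng, FREQ_B))
    father = (founder_genotype(rng, FREQ_A), founder_genotype(rng, FREQ_B))
    status = []
    for _ in range(SIBSHIP_SIZE):
        geno_a = (transmit(rng, mother[0]), transmit(rng, father[0]))
        geno_b = (transmit(rng, mother[1]), transmit(rng, father[1]))
        status.append(affected_2l(geno_a, geno_b))
    return status


def ascertain(rng, n_families):
    """Ascertain on >= 2 genetically affected offspring, THEN apply misdiagnosis."""
    sample = []
    while len(sample) < n_families:
        true_status = simulate_family(rng)
        if sum(true_status) < 2:
            continue  # not ascertained; the rule looks only at true status
        # Misdiagnosis flips each ascertained offspring's status independently.
        observed = [(not s) if rng.random() < MISDIAGNOSIS else s for s in true_status]
        sample.append((true_status, observed))
    return sample


families = ascertain(random.Random(11), 100)  # one 100-family toy data set
```

Because `ascertain` filters only on `true_status`, an ascertained family can end up with fewer than two observed affected offspring once misdiagnosis is applied, which is exactly the situation the design builds in.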



Available from: David A Greenberg, Sep 10, 2014
  • "Fisher's method to combine p-values (Fisher, 1932), a standard method for meta-analysis, has been applied to genetic epidemiology by Allison & Heo (1998), who used this method to combine p-values from single point analyses at different markers across a candidate region in studies of obesity. Guerra et al. (1999) compared Fisher's method with the pooling of raw data on Genetic Analysis Workshop 11 simulated data (Greenberg et al. 1999). Zaykin et al. (2002) presented an extension of Fisher's method to combine only p-values below a certain threshold, which is presented in detail in the methods section (a similar method was proposed by Olkin & Saner (2001) in a general meta-analysis context)."
    ABSTRACT: Linkage genome scans for complex diseases have low power with the usual sample sizes, and hence meta-analysis of several scans for the same disease might be a promising approach. Appropriate data are now becoming accessible. Here we give an overview of the available statistical methods and current applications. In a simulation study, we compare the power of different methods to combine multipoint linkage scores, namely Fisher's p-value combination, the truncated product method, the Genome Search Meta-Analysis (GSMA) method and our weighting methods. In particular, we investigate the effects of heterogeneity introduced by different genetic marker sets and sample sizes between genome scans. The weighting methods explicitly take those differences into account and have more power in the simulated scenarios than the other methods.
    Annals of Human Genetics 02/2004; 68(Pt 1):69-83. DOI: 10.1046/j.1529-8817.2003.00061.x
  • ABSTRACT: When searching for trait loci along the genome, properly incorporating prior genomic information into the analysis will almost certainly increase the chance of success. Recently, we devised a method that uses such prior information in the mapping of trait genes for complex disorders (Vieland, 1998; Wang et al. 1999; Vieland et al. 2000). This method uses the posterior probability of linkage (PPL) based on the admixture model as a measure of linkage information. In this paper, we study the consistency of the PPL. It is shown that, as the number of pedigrees increases, the PPL converges in probability to 1 when there is linkage between the marker and a trait locus, and to 0 otherwise. This conclusion holds for general pedigrees and trait models, even when the likelihood functions are based on misspecified trait models. As part of the proof, it is shown that when there is no linkage, the maximum likelihood estimator of the recombination fraction in the admixture model is asymptotically 0.5, even when the admixture model misrepresents the true model.
    Annals of Human Genetics 12/2000; 64(Pt 6):533-53. DOI: 10.1046/j.1469-1809.2000.6460533.x
  • ABSTRACT: The development of rigorous methods for evaluating the overall strength of evidence for genetic linkage based on multiple sets of data is becoming increasingly important in connection with genomic screens for complex disorders. We consider here what happens when we attempt to increase power to detect linkage by pooling multiple independently collected sets of families under conditions of variable levels of locus heterogeneity across samples. We show that power can be substantially reduced in pooled samples when compared to the most informative constituent subsamples considered alone, in spite of the increased sample size afforded by pooling. We demonstrate that for affected sib pair data, a simple adaptation of the lod score (which we call the compound lod) that allows for intersample admixture differences can afford appreciably higher power than the ordinary heterogeneity lod, and that a statistic we have proposed elsewhere, the posterior probability of linkage, performs at least as well as the compound lod while having considerable computational advantages. The companion paper (this issue, pp 217-225) shows further that in application to multiple data sets, familiar model-free methods are in some sense equivalent to ordinary lod scores based on data pooling, and that they will therefore also suffer dramatic losses in power for pooled data in the presence of locus heterogeneity and other complicating factors.
    Human Heredity 02/2001; 51(4):199-208. DOI: 10.1159/000053343
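The first citing abstract compares Fisher's p-value combination with the truncated product method of Zaykin et al. (2002). Both reduce to short formulas under the usual null assumptions. The sketch below is a minimal standard-library version that assumes independent p-values, each uniform on (0, 1] under the null; the function names and the default threshold tau = 0.05 are illustrative choices, not taken from the cited papers. For Fisher's method, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom for k p-values, and that upper tail has a closed form for even degrees of freedom. For the truncated product method, W is the product of only those p-values at or below tau, and its null distribution follows by conditioning on how many p-values fall below tau.

```python
import math


def chi2_sf_even_df(x, k):
    """Upper tail P(X > x) for X ~ chi-square with 2k degrees of freedom.

    Closed form for even df: exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined_p(pvalues):
    """Fisher (1932): X = -2 * sum(ln p_i) ~ chi-square(2k) under H0. Needs 0 < p <= 1."""
    x = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(x, len(pvalues))


def truncated_product_p(pvalues, tau=0.05):
    """Truncated product method: W = product of the p_i <= tau.

    Returns P(W <= w_observed) under H0 for independent uniform p-values.
    """
    n = len(pvalues)
    kept = [p for p in pvalues if p <= tau]
    if not kept:
        return 1.0  # W = 1 by convention when no p-value falls below tau
    w = math.prod(kept)
    total = 0.0
    for k in range(1, n + 1):
        # P(exactly k p-values <= tau), with the tau^k factor cancelled below
        weight = math.comb(n, k) * (1.0 - tau) ** (n - k)
        if w <= tau ** k:
            # k values <= tau are Uniform(0, tau); their product scaled by tau^k
            # is a product of k Uniform(0, 1) variables.
            z = k * math.log(tau) - math.log(w)
            inner = term = 1.0
            for s in range(1, k):
                term *= z / s
                inner += term
            total += weight * w * inner
        else:
            total += weight * tau ** k  # product of k values <= tau is always <= w
    return total


pvals = [0.01, 0.20, 0.03, 0.70, 0.45]
p_fisher = fisher_combined_p(pvals)
p_tpm = truncated_product_p(pvals, tau=0.05)
```

With tau = 1 every p-value is kept and `truncated_product_p` reduces to `fisher_combined_p`, which makes a handy sanity check. When the p-values come from nearby markers in the same data, as in the candidate-region application of Allison & Heo, they are correlated, and the independence assumption behind both formulas is only approximate.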
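The second and third citing abstracts both rest on the admixture model, in which a proportion α of families is linked at recombination fraction θ and the remainder are unlinked. The sketch below puts the heterogeneity lod (HLOD), a per-sample-α variant and a posterior probability of linkage side by side on toy data. To stay self-contained it swaps in the simplest possible data model, families with r recombinants out of m phase-known meioses, rather than the affected-sib-pair data of the Human Heredity paper. The θ and α grids, the toy families, the 2% prior probability of linkage and the uniform priors on θ and α used in `ppl` are assumptions. `sample_specific_alpha_lod` is a hypothetical reading of a lod that "allows for intersample admixture differences" (one θ shared across samples, a separate α per sample) and may not match the authors' exact definition of the compound lod.

```python
import math

THETAS = [i / 100 for i in range(1, 51)]   # recombination-fraction grid, 0.01..0.50
ALPHAS = [i / 100 for i in range(0, 101)]  # admixture-proportion grid, 0.00..1.00
PRIOR_LINKAGE = 0.02                       # assumed prior P(linkage) for the PPL


def lr(theta, r, m):
    """L(theta) / L(1/2) for one family with r recombinants in m phase-known meioses."""
    return theta ** r * (1 - theta) ** (m - r) / 0.5 ** m


def admixture_lod(families, theta, alpha):
    """Admixture-model log10 LR: sum_i log10[alpha * LR_i(theta) + 1 - alpha]."""
    return sum(math.log10(alpha * lr(theta, r, m) + 1 - alpha) for r, m in families)


def hlod(families):
    """Heterogeneity lod: one theta and one alpha maximized over the whole sample."""
    return max(admixture_lod(families, t, a) for t in THETAS for a in ALPHAS)


def sample_specific_alpha_lod(samples):
    """Hypothetical variant: shared theta, alpha maximized separately per sample."""
    return max(
        sum(max(admixture_lod(fams, t, a) for a in ALPHAS) for fams in samples)
        for t in THETAS
    )


def ppl(families, n_theta=100, n_alpha=100):
    """Posterior P(linkage | data), assuming uniform priors on theta in [0, 1/2) and alpha in [0, 1].

    The prior-averaged LR is approximated by the midpoint rule. Converting back
    from log10 can overflow for large samples, so this is for toy data only.
    """
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * 0.5 / n_theta
        for j in range(n_alpha):
            alpha = (j + 0.5) / n_alpha
            total += 10 ** admixture_lod(families, theta, alpha)
    mean_lr = total / (n_theta * n_alpha)  # LR averaged over the uniform prior
    numerator = PRIOR_LINKAGE * mean_lr
    return numerator / (numerator + 1 - PRIOR_LINKAGE)


linked = [(1, 8)] * 10    # hypothetical sample from a linked population
unlinked = [(4, 8)] * 10  # hypothetical sample from an unlinked population
```

On this toy split, `hlod(linked + unlinked)` falls below `hlod(linked)`, because a single α must compromise between the two samples, while `sample_specific_alpha_lod([linked, unlinked])` is at least as large as `hlod(linked)`. That is a single-data-set illustration of the mechanism behind the power loss under pooling, not a power calculation. Adding more unlinked families lowers `ppl`, consistent with the convergence to 0 described in the Annals of Human Genetics abstract.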