Simulated data for a complex genetic trait (Problem 2 for GAW11): How the model was developed, and why

Department of Psychiatry, Mt. Sinai Medical Center, New York, NY 10029, USA.
Genetic Epidemiology (Impact Factor: 2.6). 01/1999; 17 Suppl 1(S1):S449-59. DOI: 10.1002/gepi.1370170773
Source: PubMed


This paper describes a simulated data set created as Problem 2 for GAW11. The generating model for Problem 2 involved two different genetic diseases, or "types," in three separate populations. The two-locus (2L) type results from the epistatic interaction of two genetic loci, and the three-allele type results from a single locus with two disease-causing alleles and one normal allele. Each type has two phenotypic forms: Mild and Severe. Both forms are subject to both genetic and environmental influences. The disease occurs in three different hypothetical populations, each with different disease allele frequencies and penetrances. In two populations there is also a fourth locus with an allele that is associated with the 2L type. Misdiagnosis can occur, but only after a family has already been ascertained through ≥ 2 "genetically" affected offspring. Finally, the three different populations are studied by four different hypothetical research groups. These groups each have their own ideas about how the disease is inherited and have therefore devised different ascertainment schemes based on those beliefs. Each research group collected 100-family data sets, including data on 300 markers on six chromosomes and measurements on disease status and on the two proposed environmental factors. GAW participants were supplied with 25 random replicates of each data set.
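To make "epistatic interaction of two genetic loci" concrete, the following is a minimal sketch of one way such a two-locus trait could be simulated. It is not the GAW11 generating model: the abstract gives no allele frequencies, penetrances, or mode of inheritance, so the function name `simulate_affection`, its parameters, and the rule that risk requires at least one risk allele at both loci are all illustrative assumptions.

```python
import random


def simulate_affection(n, freq_a, freq_b, penetrance, seed=0):
    """Simulate affection status for n unrelated individuals.

    Hypothetical model (an assumption, not taken from the paper):
    an individual is at risk only when carrying at least one risk
    allele at locus A AND at least one at locus B; an at-risk
    individual is affected with probability `penetrance`.
    """
    rng = random.Random(seed)
    affected = []
    for _ in range(n):
        # Draw two alleles per locus independently (Hardy-Weinberg proportions).
        risk_alleles_a = sum(rng.random() < freq_a for _ in range(2))
        risk_alleles_b = sum(rng.random() < freq_b for _ in range(2))
        # Epistasis: neither locus confers risk on its own.
        at_risk = risk_alleles_a >= 1 and risk_alleles_b >= 1
        affected.append(at_risk and rng.random() < penetrance)
    return affected


if __name__ == "__main__":
    status = simulate_affection(1000, freq_a=0.2, freq_b=0.3, penetrance=0.5)
    print(f"affected: {sum(status)} of {len(status)}")
```

Under this assumed rule, setting either allele frequency to zero removes all cases, which is the defining feature of an interaction as opposed to two independent risk loci.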

Available from: David A Greenberg, Sep 10, 2014
  • "Fisher's method to combine p-values (Fisher, 1932), a standard method for meta-analysis, has been applied to genetic epidemiology by Allison & Heo (1998), who used this method to combine p-values from single point analyses at different markers across a candidate region in studies of obesity. Guerra et al. (1999) compared Fisher's method with the pooling of raw data on Genetic Analysis Workshop 11 simulated data (Greenberg et al. 1999). Zaykin et al. (2002) presented an extension of Fisher's method to combine only p-values below a certain threshold, which is presented in detail in the methods section (a similar method was proposed by Olkin & Saner (2001) in a general meta-analysis context)."
    ABSTRACT: Linkage genome scans for complex diseases have low power with the usual sample sizes, and hence meta-analysis of several scans for the same disease might be a promising approach. Appropriate data are now becoming accessible. Here we give an overview of the available statistical methods and current applications. In a simulation study, we compare the power of different methods to combine multipoint linkage scores, namely Fisher's p-value combination, the truncated product method, the Genome Search Meta-Analysis (GSMA) method and our weighting methods. In particular, we investigate the effects of heterogeneity introduced by different genetic marker sets and sample sizes between genome scans. The weighting methods explicitly take those differences into account and have more power in the simulated scenarios than the other methods.
    Annals of Human Genetics 02/2004; 68(Pt 1):69-83. DOI:10.1046/j.1529-8817.2003.00061.x · 2.21 Impact Factor
  • ABSTRACT: Objectives. The cost of a genetic linkage or association study is largely determined by the number of individuals to be recruited, phenotyped, and genotyped. The efficiency can be increased by using a sequential procedure that reduces time and cost on average. Two strategies for sequential designs in genetic epidemiological studies can be distinguished: One approach is to increase the sample size sequentially and to conduct multiple significance tests on accumulating data. If significance or futility can be assumed with a certain probability, the study is stopped. Otherwise, it is carried on to the next stage. The second approach is to conduct early linkage analyses on a coarse marker grid, and to increase marker density in later stages. Interim analyses are performed to select interesting genomic areas for follow up. The aim of this article is to give a review on sequential procedures in the context of genetic linkage and association studies. Methods. A systematic literature search was performed in the Medline and the Linkage Bibliography databases. Articles were defined as relevant if a sequential design was proposed or applied in genetic linkage or association studies. Results. The majority of proposed study designs is developed to meet the demands of specific studies and lacks a theoretical foundation. A second group of procedures is based on simulation results and principally restricted to the specific simulated situations. Finally, some theoretically founded procedures have been proposed that are discussed in detail. Conclusions. Although interesting and promising procedures have been suggested, they still lack realizations for practical purposes. In addition, further developments are required to adapt sequential strategies for optimal use in genetic epidemiological studies.
    Biometrical Journal 08/2001; 43(4):501 - 525. DOI:10.1002/1521-4036(200108)43:4<501::AID-BIMJ501>3.0.CO;2-I · 0.95 Impact Factor
  • ABSTRACT: Juvenile myoclonic epilepsy (JME) is a common form of generalized epilepsy that starts in adolescence. A major JME susceptibility locus (EJM1) was mapped to chromosomal region 6p21 in three independent linkage studies, and association was reported between JME and a microsatellite marker in the 6p21 region. The critical region for EJM1 is delimited by obligate recombinants at HLA-DQ and HLA-DP. In the present study, we found highly significant linkage disequilibrium (LD) between JME and a core haplotype of five single-nucleotide-polymorphism (SNP) and microsatellite markers in this critical region, with LD peaking in the BRD2 (RING3) gene (odds ratio 6.45; 95% confidence interval 2.36-17.58). DNA sequencing revealed two JME-associated SNP variants in the BRD2 (RING3) promoter region but no other potentially causative coding mutations in 20 probands from families with positive LOD scores. BRD2 (RING3) is a putative nuclear transcriptional regulator from a family of genes that are expressed during development. Our findings strongly suggest that BRD2 (RING3) is EJM1, the first gene identified for a common idiopathic epilepsy. These findings also suggest that abnormalities of neural development may be a cause of common idiopathic epilepsy, and the findings have implications for the generalizability of proposed pathogenetic mechanisms, derived from diseases that show Mendelian transmission, to their complex counterparts.
    The American Journal of Human Genetics 09/2003; 73(2):261-70. DOI:10.1086/377006 · 10.93 Impact Factor
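
Fisher's method, named in the first citing abstract above, combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the joint null hypothesis. The sketch below is a standard-library illustration of that formula only; the function name `fisher_combined_p` is an assumption for this example, and it does not reproduce the truncated product, GSMA, or weighting methods the citing papers evaluate.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is referred to a chi-square distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k, the chi-square upper tail has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    stat, p = fisher_combined_p([0.05, 0.05])
    print(f"X = {stat:.3f}, combined p = {p:.4f}")
```

With a single p-value the combined result equals the input, which is a quick sanity check on the closed-form tail probability. The method assumes the p-values are independent, so tests at tightly linked markers would need a different treatment.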