-
ABSTRACT: We introduce a family-based confidence set inference (CSI) method that can be used in preliminary genome-wide association studies to obtain confidence sets of SNPs that contribute a specific percentage to the additive genetic variance of quantitative traits.
Developed in the framework of generalized linear mixed models, the method utilizes data from outbred families of arbitrary size and structure. Through our own simulation study and analysis of the Genetic Analysis Workshop 16 simulated data, we study the properties of our method and compare its performance to that of the family association method described by Chen and Abecasis [Am J Hum Genet 2007;81:913-926]. We also analyze the Framingham Heart Study data to identify SNPs regulating high-density lipoprotein levels.
The simulation studies demonstrated that CSI yields confidence sets with correct coverage and that it can outperform the method introduced by Chen and Abecasis [Am J Hum Genet 2007;81:913-926]. Furthermore, we identified five SNPs that potentially regulate high-density lipoprotein levels: rs9989419, rs11586238, rs1754415, rs9355648, and rs9356560.
The CSI method provides confidence sets of SNPs that contribute to the genetic variance of quantitative traits and is a competitive alternative to currently used family association methods. The approach is particularly useful in genome-wide association studies as it significantly reduces the number of SNPs investigated in follow-up studies.
Human Heredity 07/2012; 73(3):174-83. · 1.79 Impact Factor
-
ABSTRACT: Asthma prevalence is increasing worldwide in most populations, likely due to a combination of heritable factors and environmental changes. Curiously, however, some European farming populations are protected from asthma, which has been attributed to their traditional lifestyles and farming practices.
We conducted population-based studies of asthma and atopy in the Hutterites of South Dakota, a communal farming population, to assess temporal trends in asthma and atopy prevalence and describe the risk factors for asthma.
We studied 1325 Hutterites (ages 6-91 years) at 2 time points from 1996 to 1997 and from 2006 to 2009 by using asthma questionnaires, pulmonary function and methacholine bronchoprovocation tests, and measures of atopy.
The overall prevalence of asthma increased over the 10- to 13-year study period (7.5%-11.1%, P = .049), whereas the overall prevalence of atopy did not change (45.0%-44.8%, P = .95). Surprisingly, the rise in asthma was only among females (5.8%-11.2%, P = .02); the prevalence among males remained largely unchanged (9.4%-10.9%, P = .57). Atopy, which was not associated with asthma risk in 1996 to 1997, was the strongest risk factor for asthma among Hutterites studied in 2006 to 2009 (P = .003).
Asthma has increased over a 10- to 13-year period among Hutterite females and atopy has become a significant risk factor for asthma, suggesting a change in environmental exposures that are either sex limited or that elicit a sex-specific response.
The Journal of allergy and clinical immunology 08/2011; 128(4):774-9. · 9.17 Impact Factor
-
ABSTRACT: Understanding and modeling genetic or nongenetic factors that influence susceptibility to complex traits has been the focus of many genetic studies. Large pedigrees with known complex structure may be advantageous in epidemiological studies since they can significantly increase the number of factors whose influence on the trait can be estimated. We propose a likelihood approach, developed in the context of generalized linear mixed models, for modeling dichotomous traits based on data from hundreds of individuals all of whom are potentially correlated through either a known pedigree or an estimated covariance matrix. Our approach is based on a hierarchical model where we first assess the probability of each individual having the trait and then formulate a likelihood assuming conditional independence of individuals. The advantage of our formulation is that it easily incorporates information from pertinent covariates as fixed effects and at the same time takes into account the correlation between individuals that share genetic background or other random effects. The high dimensionality of the integration involved in the likelihood prohibits exact computations. Instead, an automated Monte Carlo expectation maximization algorithm is employed for obtaining the maximum likelihood estimates of the model parameters. Through a simulation study we demonstrate that our method can provide reliable estimates of the model parameters when the sample size is close to 500. Application of our method to data from a pedigree of 491 Hutterites evaluated for Type 2 diabetes (T2D) reveals evidence of a strong genetic component to T2D risk, particularly for younger and leaner cases.
Genetic Epidemiology 04/2011; 35(5):291-302. · 3.44 Impact Factor
-
ABSTRACT: Locus heterogeneity, wherein a disease can be caused in different individuals by different genes and/or environmental factors, is a ubiquitous feature of complex traits. A Bayesian approach has been proposed to account for variable rates of heterogeneity across families in a parametric linkage analysis setup [Biswas and Lin: J Am Stat Assoc 2006;101:1341-1351]. As with any parametric approach, its application requires specification of the disease model, which limits its practical utility.
We address this limitation by proposing a Bayesian model averaging (BMA) approach. We consider a finite number of disease models and treat the model as an unknown parameter. In practice, we use simple single-locus disease models as the categories of the model parameter.
Our simulations as well as analysis of Genetic Analysis Workshop 13 simulated data show that BMA retains at least 80% of the power that is obtained by analyzing under the true disease model. The coverage probability of the interval for the disease gene is maintained around the nominal level. Finally, we apply BMA to a Late-Onset Alzheimer's Disease dataset and find evidence for linkage on chromosomes 19, 9, and 21.
We conclude that the BMA approach utilizing simple single-locus models for averaging is effective for mapping heterogeneous traits.
Human Heredity 03/2010; 69(4):242-53. · 1.79 Impact Factor
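The averaging step described in the abstract above can be illustrated with a minimal sketch. It assumes that each candidate disease model already has a log marginal likelihood and a prior probability; the function names `posterior_model_weights` and `model_average` are hypothetical and are not taken from the paper, and the sketch shows generic Bayesian model averaging over a finite model set rather than the authors' implementation.

```python
import math


def posterior_model_weights(log_marginals, priors):
    """Posterior probability of each candidate model.

    log_marginals: log marginal likelihood of the data under each model
                   (assumed to be computed elsewhere).
    priors:        prior probability of each model (positive, summing to 1).
    """
    log_post = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    shift = max(log_post)
    unnormalized = [math.exp(v - shift) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def model_average(per_model_values, weights):
    """Weighted average of a per-model quantity under the posterior weights."""
    return sum(v * w for v, w in zip(per_model_values, weights))


if __name__ == "__main__":
    # Hypothetical inputs: three single-locus models with equal prior weight.
    weights = posterior_model_weights([-120.4, -118.9, -125.0], [1 / 3] * 3)
    print(weights)
    print(model_average([0.10, 0.25, 0.40], weights))
```

Treating the model as an unknown parameter means no single disease model has to be specified in advance; models that explain the data poorly simply receive little posterior weight.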
-
ABSTRACT: A new method for constructing confidence intervals for the location of putative genes regulating expression levels (quantitative traits) is proposed. This method is suitable for the "intermediate" fine-mapping step usually performed between the initial whole-genome screening and the follow-up fine-mapping step as a means of reducing the size of the region where the latter is performed. Assuming the existence of a single quantitative trait locus (QTL) in the region/chromosome identified by the genome scan, the method constructs a confidence region for its true position by testing each location in the chromosome to see if it can be the trait locus. We applied our method to the gene expression data from Problem 1 of Genetic Analysis Workshop 15 (GAW15), focusing on 25 genes that have previously been shown to share common regulating factor(s) on chromosome 14. Our results pointed to the same region on chromosome 14 for 13 of the gene expressions studied, not only partially reproducing the results of the previous analysis, but also yielding 95% confidence regions for the regulatory quantitative trait loci. Moreover, we identified three regions, one on each of the chromosomes 3, 9, and 13, which potentially harbor additional common QTLs for several of the original gene expressions.
BMC proceedings 02/2007; 1 Suppl 1:S91.
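The idea of building a confidence region by testing each location, as described in the abstract above, is an instance of test inversion. The sketch below assumes a caller-supplied function that returns a p-value for the null hypothesis "the trait locus is at this position"; both `confidence_set` and the toy p-value function are hypothetical illustrations, not the test used in the paper.

```python
def confidence_set(positions, p_value_at, alpha=0.05):
    """Return the positions that are not rejected as the trait locus.

    positions:  candidate map positions (for example, in cM).
    p_value_at: function mapping a position to the p-value of the test of
                H0: "the trait locus is at this position" (assumed given).
    alpha:      significance level; the retained set is a (1 - alpha)
                confidence set when each test has level alpha.
    """
    return [pos for pos in positions if p_value_at(pos) >= alpha]


if __name__ == "__main__":
    # Toy p-value function, made up for illustration only: positions near
    # 42 cM are compatible with being the locus, the rest are rejected.
    def toy_p_value(pos):
        return 0.60 if abs(pos - 42) <= 3 else 0.01

    print(confidence_set(range(30, 55), toy_p_value, alpha=0.05))
```

Because only the true locus position needs to be covered, each position is tested at level alpha without a correction for the number of positions examined.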
-
Ellen M Wijsman,
Yun Ju Sung,
Alfonso Buil,
Elizabeth Atkinson,
Laurel Bastone,
G Bryce Christensen,
Guoding Diao,
Tao Feng,
Nora Franceschini,
Song Huang,
Donghui Kan,
Berit Kerner,
Francesca Lantieri,
Eunjee Lee,
Charalampos Papachristou,
Andrew Paterson,
Jagadish Rangrej,
Shuang Wang,
Chao Xing,
Xiaofang Zhu
ABSTRACT: Group 9 participants carried out linkage analysis of the Centre d'Etude du Polymorphisme Humain (CEPH) expression data, using strategies that ranged from focused investigation of a small number of traits to full genome scans of all available traits. Results from five key areas encompass the most important results within and across the 17 participating groups. First, both extensive genetic heterogeneity and poor predictability of mapping results based on heritability have key implications for study design. Second, choice of the map used for linkage analysis is influential, with the implication that meiotic maps are preferable to physical maps. Third, performance of different analytic methods was in general fairly consistent, with the exception of one variance-component method that uses marker allele sharing as the dependent rather than independent variable. Fourth, multivariate analysis approaches did not generally appear to provide advantages over univariate approaches for linkage detection. Finally, there were computational and analytic challenges in working with a large public data set, along with a need for more data documentation.
Genetic Epidemiology 02/2007; 31 Suppl 1:S75-85. · 3.44 Impact Factor
-
ABSTRACT: The arrival of highly dense genetic maps at low cost has geared the focus of linkage analysis studies toward developing methods for placing putative trait loci in narrow regions with high confidence. This shift has led to a new analytic scheme that expands the traditional two-stage protocol of preliminary genome scan followed by fine mapping through inserting a new stage in between the two. The goal of this new "intermediate" fine mapping stage is to isolate disease loci to narrow intervals with high confidence so that association studies can be more focused, efficient, and cost-effective. In this paper, we compared and contrasted five methods that can be used for performing this intermediate step. These methods are: the lod support approach, the generalized estimating equations (GEE) method, the confidence set inference (CSI) procedure, and two bootstrap methods. We compared these methods in terms of the coverage probability and precision of localization of the resulting intervals. Results from a simulation study considering several two-locus models demonstrated that the two bootstrap methods yield intervals with approximately correct coverage. On the other hand, the 1-lod support intervals, and those produced by the GEE method, tend to significantly undercover the trait locus, while the regions obtained by the CSI tend to overcover the gene position. When the observed coverage of the confidence intervals produced by all the methods was held to be the same, those obtained through the CSI procedure displayed a higher ability to localize loci, especially when these loci have a minor contribution to the trait and when the amount of data available for the analysis is relatively small. However, with very large sample sizes, lod support intervals emerged as a winner.
Application of the methods to the data from the Arthritis Research Campaign National Repository led to intervals containing the position of a known trait locus for all methods, with the greatest precision achieved by the CSI.
Genetic Epidemiology 01/2007; 30(8):677-89. · 3.44 Impact Factor
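The two comparison criteria named in the abstract above, coverage probability and precision of localization, can be estimated from simulation replicates roughly as follows. This is a minimal sketch: the function name `coverage_and_length` is hypothetical, and it assumes each replicate yields a single interval on a map where the true locus position is known.

```python
def coverage_and_length(intervals, true_position):
    """Empirical coverage and mean length of a collection of intervals.

    intervals:     list of (lower, upper) interval endpoints, one per
                   simulation replicate.
    true_position: known position of the trait locus in the simulation.
    Returns (coverage, mean_length).
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    covered = sum(1 for lo, hi in intervals if lo <= true_position <= hi)
    mean_length = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return covered / len(intervals), mean_length


if __name__ == "__main__":
    # Hypothetical intervals from three replicates, true locus at 5.5.
    print(coverage_and_length([(0, 10), (5, 6), (20, 30)], 5.5))
```

A method that undercovers returns a coverage below the nominal level, while an overcovering method exceeds it, usually at the cost of longer intervals; comparing lengths at matched observed coverage, as the abstract describes, separates the two effects.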
-
ABSTRACT: With cost-effective high-throughput Single Nucleotide Polymorphism (SNP) arrays now becoming widely available, it is highly anticipated that SNPs will soon become the choice of markers in whole genome screens. This optimism raises a great deal of interest in assessing whether dense SNP maps offer at least as much information as their microsatellite (MS) counterparts. Factors considered to date include information content, strength of linkage signals, and effect of linkage disequilibrium. In the current report, we focus on investigating the relative merits of SNPs vs. MS markers for disease gene localization. For our comparisons, we consider three novel confidence interval estimation procedures based on confidence set inference (CSI) using affected sib-pair data. Two of these procedures are multipoint in nature, enabling them to capitalize on dense SNPs with limited heterozygosity. The other procedure makes use of markers one at a time (two-point), but is much more computationally efficient. In addition to marker type, we also assess the effects of a number of other factors, including map density and marker heterozygosity, on disease gene localization through an extensive simulation study. Our results clearly show that confidence intervals derived based on the CSI multipoint procedures can place the trait locus in much shorter chromosomal segments using densely saturated SNP maps as opposed to using sparse MS maps. Finally, it is interesting (although not surprising) to note that, should one wish to perform a quick preliminary genome screening, then the two-point CSI procedure would be a preferred, computationally cost-effective choice.
Genetic Epidemiology 02/2006; 30(1):3-17. · 3.44 Impact Factor
-
ABSTRACT: Preliminary genome screens are usually succeeded by fine mapping analyses focusing on the regions that signal linkage. It is advantageous to reduce the size of the regions where follow-up studies are performed, since this will help better tackle, among other things, the multiplicity adjustment issue associated with them. We describe a two-step approach that uses a confidence set inference procedure as a tool for intermediate mapping (between preliminary genome screening and fine mapping) to further localize disease loci. Apart from the usual Hardy-Weinberg and linkage equilibrium assumptions, the only other assumption of the proposed approach is that each region of interest houses at most one of the disease-contributing loci. Through a simulation study with several two-locus disease models, we demonstrate that our method can isolate the position of trait loci with high accuracy. Application of this two-step procedure to the data from the Arthritis Research Campaign National Repository also led to highly encouraging results. The method not only successfully localized a well-characterized trait contributing locus on chromosome 6, but also placed its position to narrower regions when compared to their LOD support interval counterparts based on the same data.
Genetic Epidemiology 02/2006; 30(1):18-29. · 3.44 Impact Factor
-
ABSTRACT: Three variants of the confidence set inference (CSI) procedure were proposed and applied to both the simulated and the Collaborative Study on the Genetics of Alcoholism (COGA) data. For each of the two applications, we first performed a preliminary genome scan study based on the microsatellite markers using the GENEHUNTER+ software to identify regions that potentially harbor disease loci. For each such region, we estimated the sibling identity-by-descent sharing probability distribution at the putative disease locus. Based on these estimated probabilities, the CSI procedures were employed to further localize the disease loci using the single-nucleotide polymorphism markers, leading to confidence intervals/regions for their locations. For our analysis with the simulated data, we had knowledge of the simulating models at the time we performed the analysis.
BMC Genetics 01/2006; 6 Suppl 1:S21. · 2.47 Impact Factor
-
ABSTRACT: A recent approach for gene mapping based on confidence set inference (CSI) promises several advantages, including avoidance of corrections for multiple tests, availability of confidence intervals with known statistical properties, and sufficient localizations of disease genes. This paper proposes an extended CSI procedure that can handle markers with incomplete polymorphism, thereby increasing the applicability of the set of CSI methods in practical situations. Simulation studies show that the new procedure retains the main advantages of the original CSI. Although it generally requires more data to achieve a similar power, this increase is moderate for markers with 80% heterozygosity or higher. We also investigate the effects of relative risk estimates and disease models. Our analyses show that perturbation from actual relative risks or multilocus disease models generally leads to reduction in power or inflation in type I error, as expected. Nevertheless, for certain classes of two-locus disease models, CSI can still perform well, with reasonably high actual coverage probabilities for at least one of the disease loci. Application of CSI to the data provided by the Genetic Analysis Workshop 13 yields encouraging results, as they compare favorably to those obtained from GENEHUNTER using its NPL sib-pair method.
Human Heredity 02/2005; 59(1):1-13. · 1.79 Impact Factor
-
ABSTRACT: The goal of this study is to evaluate, compare, and contrast several standard and new linkage analysis methods. First, we compare a recently proposed confidence set approach with MAPMAKER/SIBS. Then, we evaluate a new Bayesian approach that accounts for heterogeneity. Finally, the newly developed software SIMPLE is compared with GENEHUNTER. We apply these methods to several replicates of the Genetic Analysis Workshop 13 simulated data to assess their ability to detect the high blood pressure genes on chromosome 21, whose positions were known to us prior to the analyses. In contrast to the standard methods, most of the new approaches are able to identify at least one of the disease genes in all the replicates considered.
BMC Genetics 02/2003; 4 Suppl 1:S70. · 2.47 Impact Factor