Using DNA fingerprints to infer familial relationships within NHANES III households

Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, 6120 Executive Blvd. Room 8014 Rockville, MD 20852, U.S.A.
Journal of the American Statistical Association (Impact Factor: 1.98). 06/2010; 105(490):552-563. DOI: 10.1198/jasa.2010.ap09258
Source: PubMed


Developing, targeting, and evaluating genomic strategies for population-based disease prevention require population-based data. In response to this urgent need, genotyping has been conducted within the Third National Health and Nutrition Examination Survey (NHANES III), the nationally representative household-interview health survey in the U.S. However, before these genetic analyses can occur, family relationships within households must be accurately ascertained. Unfortunately, reported family relationships within NHANES III households, which are based on questionnaire data, are incomplete and inconclusive with regard to the actual biological relatedness of family members. We inferred family relationships within households using DNA fingerprints (Identifiler®) that contain the DNA loci used by law enforcement agencies for forensic identification of individuals. However, the performance of these loci for relationship inference is not well understood. We compared two competing statistical methods for relationship inference on pairs of household members: an exact likelihood ratio that relies on allele frequencies, and an Identical By State (IBS) likelihood ratio that requires only matching alleles. We modified these methods to account for genotyping errors and population substructure. The two methods usually agree on the rankings of the most likely relationships. However, the IBS method underestimates the likelihood ratio because it does not account for the informativeness of matching rare alleles. The likelihood ratio is sensitive to estimates of population substructure, and inference of parent-child relationships is sensitive to the specified genotyping error rate. These loci were unable to distinguish second-degree relatives and cousins from unrelated pairs. The genetic data are also useful for verifying reported relationships and identifying data quality issues. An important by-product is the first explicitly nationally representative estimates of allele frequencies at these ubiquitous forensic loci.
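
To make the contrast between the two approaches concrete, the following is a minimal single-locus sketch in Python. It is not the authors' implementation: it assumes Hardy-Weinberg proportions, no genotyping error, and no population substructure (the abstract says the actual methods were modified to handle the latter two). It uses the standard identity-by-descent (IBD) coefficient formulation of a pairwise likelihood ratio against the "unrelated" hypothesis. The function names, the `IBD_COEFFS` table, and the allele frequencies are hypothetical, and `ibs_count` only counts shared alleles rather than reproducing the paper's IBS likelihood ratio.

```python
from collections import Counter
from itertools import product

# Standard IBD coefficients (k0, k1, k2): probability that a pair shares
# 0, 1, or 2 alleles identical by descent at a locus.
IBD_COEFFS = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full-siblings": (0.25, 0.5, 0.25),
    "half-siblings": (0.5, 0.5, 0.0),
}

def genotype_prob(g, freq):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]

def joint_prob_ibd1(g1, g2, freq):
    """P(g1, g2 | exactly one allele shared IBD), by enumeration.

    Person 1 carries {s, x} and person 2 carries {s, y}, where s is the
    shared allele and s, x, y are independent draws from the population.
    """
    t1, t2 = tuple(sorted(g1)), tuple(sorted(g2))
    alleles = set(g1) | set(g2)  # other alleles contribute zero
    total = 0.0
    for s, x, y in product(alleles, repeat=3):
        if tuple(sorted((s, x))) == t1 and tuple(sorted((s, y))) == t2:
            total += freq[s] * freq[x] * freq[y]
    return total

def likelihood_ratio(g1, g2, freq, relationship):
    """Single-locus LR of `relationship` versus unrelated."""
    k0, k1, k2 = IBD_COEFFS[relationship]
    p1, p2 = genotype_prob(g1, freq), genotype_prob(g2, freq)
    same = sorted(g1) == sorted(g2)
    numerator = (k0 * p1 * p2
                 + k1 * joint_prob_ibd1(g1, g2, freq)
                 + k2 * (p1 if same else 0.0))
    return numerator / (p1 * p2)

def ibs_count(g1, g2):
    """Number of alleles (0, 1, or 2) shared identical by state."""
    return sum((Counter(g1) & Counter(g2)).values())

# Hypothetical frequencies: the same matching genotype, rare vs. common allele.
rare = {"A": 0.1, "B": 0.9}
common = {"A": 0.5, "B": 0.5}
pair = (("A", "A"), ("A", "A"))
lr_rare = likelihood_ratio(*pair, rare, "parent-child")
lr_common = likelihood_ratio(*pair, common, "parent-child")
shared = ibs_count(*pair)
```

In this toy setting both pairs share two alleles by state, yet the frequency-based likelihood ratio is larger when the matching allele is rare, which is the kind of information an allele-matching statistic ignores. Without an error model, a parent-child pair that shares no allele at a locus gets a likelihood ratio of exactly zero, which suggests why that relationship would be sensitive to the assumed genotyping error rate.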

  • Source
    • "The use of large and sophisticated mobile examination centers allows high-quality physical measurements to be collected, including vision and dental measurements and blood and urine analysis. A striking example of the value of this approach is the genotyping of a sample of 1991-1994 NHANES respondents, constituting “the first U.S. population-based genetic dataset” [5]. "
    ABSTRACT: A continuously operating survey can yield advantages in survey management, field operations, and the provision of timely information for policymakers and researchers. We describe the key features of the sample design of the New Zealand (NZ) Health Survey, which has been conducted on a continuous basis since mid-2011, and compare to a number of other national population health surveys. A number of strategies to improve the NZ Health Survey are described: implementation of a targeted dual-frame sample design for better Maori, Pacific, and Asian statistics; movement from periodic to continuous operation; use of core questions with rotating topic modules to improve flexibility in survey content; and opportunities for ongoing improvements and efficiencies, including linkage to administrative datasets. The use of disproportionate area sampling and a dual frame design resulted in reductions of approximately 19%, 26%, and 4% to variances of Maori, Pacific and Asian statistics respectively, but at the cost of a 17% increase to all-ethnicity variances. These were broadly in line with the survey's priorities. Respondents provided a high degree of cooperation in the first year, with an adult response rate of 79% and consent rates for data linkage above 90%. A combination of strategies tailored to local conditions gives the best results for national health surveys. In the NZ context, data from the NZ Census of Population and Dwellings and the Electoral Roll can be used to improve the sample design. A continuously operating survey provides both administrative and statistical advantages.
    Full-text · Article · Dec 2013 · Population Health Metrics
  • Source
    • "Multiple blood-related individuals are often sampled from the same household. On average 1.6 persons are sampled per household [8]. "
    ABSTRACT: Background: This study is motivated by National Household Surveys that collect genetic data, in which complex samples (e.g. stratified multistage cluster sample), partially from the same family, are selected. In addition to the differential selection probabilities of selecting households and persons within the sampled households, there are two levels of correlations of the collected genetic data in National Genetic Household Surveys (NGHS). The first level of correlation is induced by the hierarchical geographic clustered sampling of households and the second level of correlation is induced by biological inheritances from individuals sampled in the same household. Results: To test for Hardy-Weinberg Equilibrium (HWE) in NGHS, two test statistics, the CCS method [1] and the QS method [2], appear to be the only existing methods that take account of both correlations. In this paper, I evaluate both methods in terms of the test size and power under a variety of complex designs with different weighting schemes and varying magnitudes of the two correlation effects. Both methods are applied to a real data example from the Hispanic Health and Nutrition Examination Survey with simulated genotype data. Conclusions: The QS method maintains the nominal size well and consistently achieves higher power than the CCS method in testing HWE under a variety of sample designs, and therefore is recommended for testing HWE of genetic survey data with complex designs.
    Full-text · Article · Mar 2013 · BMC Genetics
  • Source
    ABSTRACT: In population-based household surveys, for example, the National Health and Nutrition Examination Survey (NHANES), blood-related individuals are often sampled from the same household. Therefore, genetic data collected from national household surveys are often correlated due to two levels of clustering (correlation): one induced by the multistage geographical cluster sampling, and the other induced by biological inheritance among multiple participants within the same sampled household. In this paper, we develop efficient statistical methods that consider the weighting effect induced by the differential selection probabilities in complex sample designs, as well as the clustering (correlation) effects described above. We examine and compare the magnitude of each level of clustering effects under different scenarios and identify the scenario under which the clustering effect induced by one level dominates the other. The proposed method is evaluated via Monte Carlo simulation studies and illustrated using the Hispanic Health and Nutrition Examination Survey (HHANES) with simulated genotype data.
    Preview · Article · Nov 2011 · Annals of Human Genetics