Utilizing Genotype Imputation for the Augmentation of Sequence Data

Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA.
PLoS ONE (Impact Factor: 3.53). 06/2010; 5(6):e11018. DOI: 10.1371/journal.pone.0011018
Source: PubMed

ABSTRACT In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, and it is now possible to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs in disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci.
A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals and then apply genotype imputation methods to impute markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, drawing on data from both a candidate gene sequencing study and the 1000 Genomes Project.
Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of the recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many "anchor" markers as possible.

  • ABSTRACT: Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians.
    PLoS ONE 07/2014; 9(7):e102544. DOI:10.1371/journal.pone.0102544 · 3.53 Impact Factor
  • ABSTRACT: The addition of sequence data from own-study individuals to genotypes from external data repositories, for example, the HapMap, has been shown to improve the accuracy of imputed genotypes. Early approaches for reference panel selection favored individuals who best reflect recombination patterns in the study population. By contrast, a maximization of genetic diversity in the reference panel has been recently proposed. We investigate here a novel strategy to select individuals for sequencing that relies on the characterization of the ancestral kernel of the study population. The simulated study scenarios consisted of several combinations of subpopulations from HapMap. HapMap individuals who did not belong to the study population constituted an external reference panel, which was complemented with the sequences of study individuals selected according to different strategies. In addition to a random choice, individuals with the largest statistical depth according to the first genetic principal components were selected. In all simulated scenarios, the integration of sequences from own-study individuals increased imputation accuracy. The selection of individuals based on the statistical depth resulted in the highest imputation accuracy for European and Asian study scenarios, whereas random selection performed best for an African-study scenario. Present findings indicate that there is no universal ‘best strategy’ to select individuals for sequencing. We propose to use the methodology described in the manuscript to assess the advantage of focusing on the ancestral kernel under a study's own characteristics (study size, genetic diversity, availability and properties of external reference panels, frequency of imputed variants…).
    Genetic Epidemiology 12/2014; 39(2). DOI:10.1002/gepi.21873 · 2.95 Impact Factor
  • ABSTRACT: The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive way to obtain whole-genome sequence genotypes for a large number of individuals at lower cost than sequencing all of them. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.
    Genetics Selection Evolution 07/2014; 46(1):41. DOI:10.1186/1297-9686-46-41 · 3.75 Impact Factor

