Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India.

Institute for Genetic Medicine, University of Southern California, 2250 Alcazar St., Los Angeles, California 90033, USA.
Annals of Human Genetics (Impact Factor: 2.22). 08/2008; 72(Pt 4):535-46. DOI: 10.1111/j.1469-1809.2008.00457.x
Source: PubMed

ABSTRACT: When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases from other populations for study design and analysis, such as the selection of tag SNPs and the imputation of missing genotypes. One way to improve the use of these databases is to draw on a mixture of database samples that resembles the population of interest, rather than on the single most similar database sample. We demonstrate the effectiveness of the mixture approach by applying African, European, and East Asian HapMap samples to tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.
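
The abstract does not say how the mixture proportions of the African, European, and East Asian panels are chosen. As an illustration only, the sketch below assumes a simple criterion: choose weights (summing to 1, on a grid of step 0.05) that minimize the squared difference between the mixture's allele frequencies and those of a target sample. The function names `mix_freqs` and `best_mixture` and the frequency values are hypothetical, and this is not presented as the paper's procedure. It also stops at choosing proportions and does not perform tag SNP selection itself.

```python
from itertools import product


def mix_freqs(panels, weights):
    """Allele frequencies of a weighted mixture of reference panels.

    panels: one list of per-SNP allele frequencies per panel (hypothetical input).
    weights: mixture proportions, one per panel, assumed to sum to 1.
    """
    n_snps = len(panels[0])
    return [sum(w * p[i] for w, p in zip(weights, panels)) for i in range(n_snps)]


def best_mixture(panels, target, step=0.05):
    """Grid search over mixture weights on the simplex.

    Assumed criterion (illustrative, not from the paper): minimize the sum of
    squared differences between mixed and target allele frequencies.
    """
    k = len(panels)
    n = round(1 / step)  # number of grid increments per weight
    best_w, best_err = None, float("inf")
    # Enumerate the first k-1 weights; the last weight takes the remainder.
    for counts in product(range(n + 1), repeat=k - 1):
        if sum(counts) > n:
            continue
        w = [c / n for c in counts] + [(n - sum(counts)) / n]
        mixed = mix_freqs(panels, w)
        err = sum((m - t) ** 2 for m, t in zip(mixed, target))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs for three reference panels.
    afr = [0.9, 0.2, 0.5]
    eur = [0.1, 0.7, 0.4]
    eas = [0.3, 0.1, 0.9]
    # Build a synthetic target that is exactly a 10/60/30 mixture.
    target = mix_freqs([afr, eur, eas], [0.1, 0.6, 0.3])
    weights, err = best_mixture([afr, eur, eas], target)
    print(weights, err)
```

A grid search keeps the sketch dependency-free; with real genome-wide data, a constrained least-squares solver, or a criterion based on tagging performance rather than allele frequencies, would be more realistic choices.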

  • ABSTRACT: Associations between polymorphisms in dopaminergic neurotransmitter pathway genes and schizophrenia (SZ) have mostly been reported in samples of Caucasian ancestry. As studies of single SNPs in SZ have been inconsistent, more detailed analyses using multiple SNPs, with both the diagnostic phenotype and cognitive function, may be more informative. These analyses were therefore conducted in a north Indian sample. Indian SZ case-parent trios (n = 601 families), unscreened controls (n = 468) and an independent set of 118 trio families were analyzed. Representative SNPs in the dopamine D3 receptor (DRD3), dopamine transporter (SLC6A3), vesicular monoamine transporter 2 (SLC18A2), catechol-O-methyltransferase (COMT) and dopamine beta-hydroxylase (DBH) genes were genotyped using SNaPshot/SNPlex assays (n = 59 SNPs). The Trail Making Test (TMT) was administered to a subset of the sample (n = 260 cases and n = 302 parents). Eight SNPs were nominally associated with SZ in either case-control or family-based analyses (p < 0.05; rs7631540 and rs2046496 in DRD3; rs363399 and rs10082463 in SLC18A2; rs4680, rs4646315 and rs9332377 in COMT). rs6271 at DBH was associated in both analyses. Haplotypes of DRD3 SNPs incorporating rs7631540-rs2134655-rs3773678-rs324030-rs6280-rs905568 showed suggestive associations in both case-parent and trio samples. At SLC18A2, rs10082463 was nominally associated with psychomotor performance and rs363285 with executive functions using the TMT, but these associations did not withstand correction for multiple testing. Suggestive associations with dopaminergic genes were detected in this study, but convincing links between dopaminergic polymorphisms and SZ or cognitive function were not observed.
    Journal of Psychiatric Research 08/2013 · 4.09 Impact Factor
  • ABSTRACT: Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes Project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has received limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs for which the imputed and true genotypes agree); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for SNPs with low minor allele frequency [MAF]); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low-frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low-frequency SNPs.
    PLoS ONE 01/2012; 7(11):e50610. · 3.53 Impact Factor
  • ABSTRACT: The allelic frequency spectrum emerging from several Next Generation Sequencing (NGS) projects is revealing important details about the evolutionary and demographic forces that shaped the human genome. Herein, we discuss some of the achievements obtained using low-frequency and rare variants identified in NGS studies. The majority of variants that affect protein-coding regions are recent and rare. Novel rare variants are often enriched for deleterious alleles and are population specific, which makes them suitable for studying disease susceptibility. To investigate this kind of variation and its effects in association studies, very large sample sizes will be necessary to achieve sufficient statistical power. Moreover, as these variants are typically population specific, replicating disease associations across populations could be very difficult due to population stratification. Experiments aimed at identifying rare variants and their effects should therefore be carefully planned. Several successes have already been achieved with NGS in genetic epidemiology, pharmacogenetics and clinical applications, and further advances are expected in the near future as sequencing technology improves and costs decrease.
    Environmental and Molecular Mutagenesis 08/2013 · 3.71 Impact Factor
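
The PLoS ONE abstract contrasts raw concordance with the imputation quality score (IQS), described there as concordance adjusted for chance agreement. The sketch below illustrates why the adjustment matters for low-MAF SNPs. It assumes IQS is computed as Cohen's kappa over hard genotype calls, (Po - Pc) / (1 - Pc), where Po is observed agreement and Pc is the agreement expected from the marginal genotype counts. The exact definition used in that study may differ (for example, it may use genotype probabilities), and the function name `concordance_and_iqs` is hypothetical.

```python
from collections import Counter


def concordance_and_iqs(true_gt, imputed_gt):
    """Concordance and a kappa-style IQS for one SNP.

    true_gt, imputed_gt: equal-length sequences of genotype calls (e.g. 0/1/2).
    Assumption: IQS = (Po - Pc) / (1 - Pc), with Pc the chance agreement
    implied by the marginal genotype frequencies of each sequence.
    """
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("need two non-empty genotype sequences of equal length")
    n = len(true_gt)
    # Observed agreement: fraction of individuals with identical calls.
    po = sum(t == i for t, i in zip(true_gt, imputed_gt)) / n
    true_counts = Counter(true_gt)
    imp_counts = Counter(imputed_gt)
    # Chance agreement: sum over genotype classes of the product of marginals.
    classes = set(true_counts) | set(imp_counts)
    pc = sum(true_counts[g] * imp_counts[g] for g in classes) / n**2
    if pc == 1:
        # Both sequences are the same single genotype; kappa is undefined.
        return po, float("nan")
    return po, (po - pc) / (1 - pc)


if __name__ == "__main__":
    # Hypothetical low-MAF SNP: imputation calls everyone homozygous for the
    # major allele, so concordance looks high but nothing beyond chance is gained.
    truth = [0] * 8 + [1] * 2
    imputed = [0] * 10
    print(concordance_and_iqs(truth, imputed))
```

In the example, 80% concordance coincides with an IQS of zero, because always predicting the common genotype already agrees 80% of the time by chance.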