Making a haplotype catalog with estimated frequencies based on SNP homozygotes.
ABSTRACT Understanding the structure and frequencies of haplotypes is important for associating genetic polymorphisms with a given trait and for inferring the genetic genealogy of alleles in a population. Single nucleotide polymorphism (SNP) haplotypes can be determined without ambiguity when an individual does not have more than one heterozygous site in a given genomic region. Using genome-wide SNP genotypes for 3397 individuals from the Japanese population, we detected SNP homozygotes in the genomic regions of 1955 genes, determined haplotypes, and examined the efficiency of haplotype frequency estimation based on the proportion of SNP homozygotes in the sample. The estimated haplotype frequencies were very similar to the frequencies obtained by two statistical methods, PHASE and SNPHAP. We applied this approach to the genomic regions of 11 351 genes, and the results suggested that the sum of the frequencies of unobserved haplotypes is negligible for an analysis of a 100 kb genomic region with approximately 20 SNPs. Determination of haplotypes from homozygotes using genotype data from thousands of individuals, without a long computation time, appears to be useful for detecting real haplotypes including some low-frequency haplotypes. In addition, the unambiguously determined haplotypes with their estimated frequencies can be used as a catalog of haplotypes for the population, which is useful for the design of genome-wide association studies.
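The homozygote-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) per SNP, and it assumes Hardy-Weinberg equilibrium, so that a haplotype with frequency p is expected to appear in homozygous form in a fraction p² of individuals and p can be estimated as the square root of the observed homozygote proportion. The function names `resolve_haplotypes` and `estimate_frequencies` are hypothetical.

```python
from collections import Counter
from math import sqrt


def resolve_haplotypes(genotype):
    """Return the two haplotypes of one individual, or None if phase is ambiguous.

    genotype: sequence of alternate-allele counts (0, 1 or 2), one per SNP.
    Phase is unambiguous only when at most one site is heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    if len(het_sites) > 1:
        return None
    # Homozygous sites give the same allele on both haplotypes;
    # a single heterozygous site gives 0 on the first and 1 on the second.
    first = tuple(g // 2 for g in genotype)
    second = tuple(1 if g >= 1 else 0 for g in genotype)
    return first, second


def estimate_frequencies(genotypes):
    """Estimate haplotype frequencies from fully homozygous individuals.

    Assumption (not stated in the abstract): Hardy-Weinberg equilibrium,
    so frequency is approximated by sqrt(homozygote proportion).
    """
    n = len(genotypes)
    hom_counts = Counter()
    for genotype in genotypes:
        if all(g != 1 for g in genotype):  # homozygous at every SNP
            hom_counts[tuple(g // 2 for g in genotype)] += 1
    return {hap: sqrt(count / n) for hap, count in hom_counts.items()}


if __name__ == "__main__":
    # Toy sample of 16 individuals over 3 SNPs (made-up data).
    sample = [(0, 0, 2)] * 4 + [(2, 2, 0)] * 1 + [(1, 1, 1)] * 11
    print(resolve_haplotypes((0, 1, 2)))
    print(estimate_frequencies(sample))
```

In this toy sample, 4 of 16 individuals are homozygous for one haplotype, giving an estimate of 0.5, and 1 of 16 for another, giving 0.25. Haplotypes that never occur in homozygous form receive no estimate, which is why the approach depends on samples of thousands of individuals.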
- Source available from: nature.com
ABSTRACT: Although the Japanese population has a rather low genetic diversity, we recently confirmed the presence of two main clusters (the Hondo and Ryukyu clusters) through principal component analysis of genome-wide single-nucleotide polymorphism (SNP) genotypes. Understanding the genetic differences between the two main clusters requires further genome-wide analyses based on a dense SNP set and comparison of haplotype frequencies. In the present study, we determined haplotypes for the Hondo cluster of the Japanese population by detecting SNP homozygotes with 388,591 autosomal SNPs from 18,379 individuals and estimated the haplotype frequencies. Haplotypes for the Ryukyu cluster were inferred by a statistical approach using the genotype data from 504 individuals. We then compared the haplotype frequencies between the Hondo and Ryukyu clusters. In most genomic regions, the haplotype frequencies in the Hondo and Ryukyu clusters were very similar. However, in addition to the human leukocyte antigen region on chromosome 6, other genomic regions (chromosomes 3, 4, 5, 7, 10 and 12) showed dissimilarities in haplotype frequency. These regions were enriched for genes involved in the immune system, cell-cell adhesion and the intracellular signaling cascade. These differentiated genomic regions between the Hondo and Ryukyu clusters are of interest because they (1) should be examined carefully in association studies and (2) likely contain genes responsible for morphological or physiological differences between the two groups.
Journal of Human Genetics 03/2012; 57(5):326-34. · 2.37 Impact Factor