Population Structure of Hispanics in the United States: The Multi-Ethnic Study of Atherosclerosis

Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America.
PLoS Genetics (Impact Factor: 7.53). 04/2012; 8(4):e1002640. DOI: 10.1371/journal.pgen.1002640
Source: PubMed


Using ~60,000 SNPs selected for minimal linkage disequilibrium, we perform population structure analysis of 1,374 unrelated Hispanic individuals from the Multi-Ethnic Study of Atherosclerosis (MESA), with self-identification corresponding to Central America (n = 93), Cuba (n = 50), the Dominican Republic (n = 203), Mexico (n = 708), Puerto Rico (n = 192), and South America (n = 111). By projection of principal components (PCs) of ancestry to samples from the HapMap phase III and the Human Genome Diversity Panel (HGDP), we show the first two PCs quantify the Caucasian, African, and Native American origins, while the third and fourth PCs bring out an axis that aligns with known South-to-North geographic location of HGDP Native American samples and further separates MESA Mexican versus Central/South American samples along the same axis. Using k-means clustering computed from the first four PCs, we define four subgroups of the MESA Hispanic cohort that show close agreement with self-identification, labeling the clusters as primarily Dominican/Cuban, Mexican, Central/South American, and Puerto Rican. To demonstrate our recommendations for genetic analysis in the MESA Hispanic cohort, we present pooled and stratified association analysis of triglycerides for selected SNPs in the LPL and TRIB1 gene regions, previously reported in GWAS of triglycerides in Caucasians but as yet unconfirmed in Hispanic populations. We report statistically significant evidence for genetic association in both genes, and we further demonstrate the importance of considering population substructure and genetic heterogeneity in genetic association studies performed in the United States Hispanic population.
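The clustering step described in the abstract (k-means on the first four PCs, then a comparison of clusters against self-identification) can be illustrated with a minimal sketch. This is not the authors' pipeline: the PC coordinates below are synthetic, the group names are hypothetical placeholders, and the deterministic farthest-point initialization is an assumption chosen only to make the toy example reproducible.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def make_toy_pcs(n_per_group=30, seed=0):
    """Synthetic stand-in for PC1..PC4 scores; group names are hypothetical."""
    rng = random.Random(seed)
    centers = {
        "group_A": (0.0, 0.0, 0.0, 0.0),
        "group_B": (1.0, 0.0, 0.0, 0.0),
        "group_C": (0.0, 1.0, 0.0, 0.0),
        "group_D": (0.0, 0.0, 1.0, 0.0),
    }
    pcs, groups = [], []
    for name, center in centers.items():
        for _ in range(n_per_group):
            pcs.append(tuple(c + rng.gauss(0, 0.05) for c in center))
            groups.append(name)
    return pcs, groups


pcs, groups = make_toy_pcs()
labels, centroids = kmeans(pcs, k=4)

# Cross-tabulate self-reported group against assigned cluster.
crosstab = Counter(zip(groups, labels))
for (group, cluster), n in sorted(crosstab.items()):
    print(group, cluster, n)
```

In real data the groups overlap, so the cross-tabulation would show off-diagonal counts rather than a clean one-to-one mapping; the toy groups here are deliberately well separated.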


Available from: Jerome I Rotter
  • Source
    • "Thus, the population of Puerto Rico may be modeled as an admixed population with contributions from three continental areas: Sub-Saharan Africa, from the slave trade; America, from the extensive migrations of Native Americans prior to colonization; and Europe, from the colonization of the “New World” by European powers [12], [13]."
    ABSTRACT: The Puerto Rico population may be modeled as an admixed population with contributions from three continents: Sub-Saharan Africa, Ancient America, and Europe. Extending the study of the genetics of inflammatory bowel disease (IBD) to an admixed population such as Puerto Rico has the potential to shed light on IBD genes identified in studies of European populations, find new genes contributing to IBD susceptibility, and provide basic information on IBD for the care of US patients of Puerto Rican and Latino descent. In order to study the association between immune-related genes and Crohn's disease (CD) and ulcerative colitis (UC) in Puerto Rico, we genotyped 1159 Puerto Rican cases, controls, and family members with the ImmunoChip. We also genotyped 832 subjects from the Human Genome Diversity Panel to provide data for estimation of global and local continental ancestry. Association of SNPs was tested by logistic regression corrected for global continental descent and family structure. We observed the association between Crohn's disease and NOD2 (rs17313265, 0.28 in CD, 0.19 in controls, OR 1.5, p = 9 × 10⁻⁶) and IL23R (rs11209026, 0.026 in CD, 0.071 in controls, OR 0.4, p = 3.8 × 10⁻⁴). The haplotype structure of both regions resembled that reported for European populations, and "local" continental ancestry of the IL23R gene was almost entirely of European descent. We also observed suggestive evidence for the association of the BAZ1A promoter SNP with CD (rs1200332, 0.45 in CD, 0.35 in controls, OR 1.5, p = 2 × 10⁻⁶). Our estimate of continental ancestry surrounding this SNP suggested an origin in Ancient America for this putative susceptibility region. Our observations underscored the great difference between global continental ancestry and local continental ancestry at the level of the individual gene, particularly for immune-related loci.
    PLoS ONE 09/2014; 9(9):e108204. DOI:10.1371/journal.pone.0108204 · 3.23 Impact Factor
  • Source
    • "We next use simulated data to compare FastIndep with three other algorithms used in statistical genetics: Primus, KING [14], and PLINK [2]. Both KING and PLINK are implemented in the Primus software, and it is these implementations we will use."
    ABSTRACT: Genetic analyses in large sample populations are important for a better understanding of the variation between populations, for designing conservation programs, and for detecting rare mutations that may be risk factors for a variety of diseases, among other reasons. However, these analyses frequently assume that the participating individuals or animals are mutually unrelated, which may not be the case in large samples, leading to erroneous conclusions. In order to retain as much data as possible while minimizing the risk of false positives, it is useful to identify a large subset of relatively unrelated individuals in the population. This can be done using a heuristic for finding a large set of independent nodes in an undirected graph. We describe a fast randomized heuristic for this purpose. The same methodology can also be used for identifying a suitable set of markers for analyzing population stratification, and in other instances where a rapid heuristic for maximal independent sets in large graphs is needed. We present FastIndep, a fast random heuristic algorithm for finding a maximal independent set of nodes in an arbitrary undirected graph, along with an efficient implementation in C++. On a 64-bit Linux or MacOS platform the execution time is a few minutes, even with a graph of several thousand nodes. The algorithm can discover multiple solutions of the same cardinality. FastIndep can be used to discover unlinked markers and unrelated individuals in populations. The methods presented here provide a quick and efficient way to identify sets of unrelated individuals in large populations and unlinked markers in marker panels. The C++ source code and instructions, along with utilities for generating the input files in the appropriate format, are available at
    Source Code for Biology and Medicine 03/2014; 9(1):6. DOI:10.1186/1751-0473-9-6
  • Source
    • "Cultural and lifestyle homogeneity avoids compounding factors in a way that would be problematic in a major city or across a large geographic area. All the CCHC participants are self-identified Mexican Americans, a rapidly growing minority population known to be genetically admixed with European, African, and Native Amerindian ancestries (1). Elevated homeostasis model assessment of insulin resistance (HOMA-IR) index is commonly seen in this population (4). "
    ABSTRACT: OBJECTIVE: An elevated insulin resistance index (homeostasis model assessment of insulin resistance [HOMA-IR]) is more commonly seen in the Mexican American population than in European populations. We report quantitative ancestral effects within a Mexican American population, and we correlate ancestral components with HOMA-IR. RESEARCH DESIGN AND METHODS: We performed ancestral analysis in 1,551 participants of the Cameron County Hispanic Cohort by genotyping 103 ancestry-informative markers (AIMs). These AIMs allow determination of the percentage (0-100%) ancestry from three major continental populations, i.e., European, African, and Amerindian. RESULTS: We observed that predominantly Amerindian ancestral components were associated with increased HOMA-IR (β = 0.124, P = 1.64 × 10⁻⁷). The correlation was more significant in males (Amerindian β = 0.165, P = 5.08 × 10⁻⁷) than in females (Amerindian β = 0.079, P = 0.019). CONCLUSIONS: This unique study design demonstrates how genomic markers for quantitative ancestral information can be used in admixed populations to predict phenotypic traits such as insulin resistance.
    Diabetes Care 08/2012; 35(12). DOI:10.2337/dc12-0636 · 8.42 Impact Factor