ABSTRACT: Genome-wide association studies (GWAS) for type 2 diabetes (T2D) undertaken in European and Asian ancestry populations have yielded dozens of robustly associated loci. However, the genomics of T2D remains largely understudied in sub-Saharan Africa (SSA), where rates of T2D are increasing dramatically and where the environmental background is quite different than in these previous studies. Here, we evaluate 106 reported T2D GWAS loci in continental Africans. We tested each of these SNPs, and SNPs in linkage disequilibrium (LD) with these index SNPs, for an association with T2D in order to assess transferability and to fine map the loci leveraging the generally reduced LD of African genomes. The study included 1775 unrelated Africans (1035 T2D cases, 740 controls; mean age 54 years; 59% female) enrolled in Nigeria, Ghana, and Kenya as part of the Africa America Diabetes Mellitus (AADM) study. All samples were genotyped on the Affymetrix Axiom PanAFR SNP array. Forty-one of the tested loci showed transferability to this African sample (p < 0.05, same direction of effect), 11 at the exact reported SNP and 30 others at SNPs in LD with the reported SNP (after adjustment for the number of tested SNPs). TCF7L2 SNP rs7903146 was the most significant locus in this study (p = 1.61 × 10⁻⁸). Most of the loci that showed transferability were successfully fine-mapped, i.e., localized to smaller haplotypes than in the original reports. The findings indicate that the genetic architecture of T2D in SSA is characterized by several risk loci shared with non-African ancestral populations and that data from African populations may facilitate fine mapping of risk loci. The study provides an important resource for meta-analysis of African ancestry populations and transferability of novel loci.
Full-text · Article · Nov 2015 · Frontiers in Genetics
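The transferability criterion described in the abstract above (p < 0.05 with the same direction of effect as the original report) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `is_transferable`, its parameters, and the Bonferroni-style division by `n_tests` (standing in for the abstract's "adjustment for the number of tested SNPs") are all assumptions.

```python
def is_transferable(reported_beta, observed_beta, p_value, alpha=0.05, n_tests=1):
    """Return True if an association replicates under this sketch's criterion.

    reported_beta: effect estimate (e.g. log odds ratio) from the original report
    observed_beta: effect estimate for the same allele in the new sample
    p_value:       association p-value in the new sample
    n_tests:       number of SNPs tested at the locus (assumed Bonferroni-style
                   adjustment; the abstract does not name the method used)
    """
    # Same direction of effect: both estimates share a sign and neither is zero.
    same_direction = reported_beta * observed_beta > 0
    return same_direction and p_value < alpha / n_tests


# Hypothetical values, for illustration only.
print(is_transferable(0.30, 0.20, 0.01))              # index SNP, single test
print(is_transferable(0.30, 0.20, 0.01, n_tests=10))  # LD proxy among 10 tested SNPs
```

Dividing the threshold by the number of tested SNPs is only one possible adjustment; a permutation-based or LD-aware correction would change the cutoff but not the structure of the check.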
ABSTRACT: Single nucleotide polymorphisms (SNPs) represent an important type of dynamic sites within the human genome. These common variants often locally correlate within more complex multi-SNP haploblocks that are maintained throughout generations in a stable population. Information encoded in the structure of SNPs and SNP haploblock variation can be characterized through a normalized information content metric. Genodynamics is being developed as the analogous "thermodynamics" characterizing the state variables for genomic populations that are stable under stochastic environmental stresses. Since living systems have not been found to develop in the absence of environmental influences, this paper describes the analogous genomic free energy metrics in a given environment. SNP haploblocks were constructed by Haploview v4.2 for five chromosomes from phase III HapMap data, and the genomic state variables for each chromosome were calculated. An in silico analysis was performed on SNP haploblocks with the lowest genomic energy measures. Highly favorable genomic energy measures were found to correlate with highly conserved SNP haploblocks. Moreover, the most conserved haploblocks were associated with an evolutionarily conserved regulatory element and domain.
ABSTRACT: The human genome is a complex, dynamic information system that encodes principles of life and living systems. These principles are incorporated in the structure of human genome sequence variation and are foundational for the continuity of life and human survival. Using first principles of thermodynamics and statistical physics, we have developed analogous "genodynamic tools" for population genomic studies. Characterizing genomic information through the lens of physics has allowed us to develop energy measures for modeling genome-environment interactions. In developing biophysical parameters for genome-environment homeostasis, we found that stable genomic free energy trades off low genomic energy (genomic conservation and increased order) and high genomic entropy (genomic variation) with an environmental potential that drives the variation. In our approach, we assert that common variants are dynamic sites in the genome of a population and that the stability of whole genome adaptation is reflected in the frequencies of maintained diversity in common variants for the population in its environment. In this paper, we address the relativity of whole genome adaptation towards homeostasis. By this we mean that adaptive forces are directly reflected in the frequency distribution of alleles and/or haplotypes of the population relative to its environment, with adaptive forces driving the genome towards homeostasis. The use of genomic energy units as a biophysical metric in DNA sequence variation analyses provides new insights into the foundations of population biology and diversity. Using our biophysical tools, population differences directly reflect the adaptive influences of the environment on populations.
ABSTRACT: Nested in the environment of the nucleus of the cell, the 23 sets of chromosomes that comprise the human genome function as one integrated whole system, orchestrating the expression of thousands of genes underlying the biological characteristics of the cell, individual and the species. The extraction of meaningful information from this complex data set depends crucially upon the lens through which the data are examined. We present a biophysical perspective on genomic information encoded in single nucleotide polymorphisms (SNPs), and introduce metrics for modeling information encoded in the genome. Information, like energy, is considered to be a conserved physical property of the universe. The information structured in SNPs describes the adaptation of a human population to a given environment. The maintained order measured by the information content is associated with entropies, energies, and other state variables for a dynamic system in homeostasis. "Genodynamics" characterizes the state variables for genomic populations that are stable under stochastic environmental stresses. The determination of allelic energies allows the parameterization of specific environmental influences upon individual alleles across populations. The environment drives population-based genome variation. From this vantage point, the genome is modeled as a complex, dynamic information system defined by patterns of SNP alleles and SNP haplotypes.
Full-text · Article · Jun 2014 · Advances in Bioscience and Biotechnology
ABSTRACT: Single nucleotide polymorphisms (SNPs) represent an important type of dynamic
sites within the human genome. These common variants often locally correlate
into more complex multi-SNP haploblocks that are maintained throughout
generations in a stable population. The information encoded in the structure of
common SNPs and SNP haploblock variation can be characterized through a
normalized information content (NIC) metric. Such an intrinsic measure allows
disparate regions of individual genomes and the genomes of various populations
to be quantitatively compared in a meaningful way.
Using our defined measures of genomic information, the interplay of
maintained statistical variations due to the environmental baths within which
stable populations exist can be interrogated. We develop the analogous
"thermodynamics" characterizing the state variables for genomic populations
that are stable under stochastic environmental stresses. Since living systems
have not been found to develop in the absence of environmental influences, we
focus on describing the analogous genomic free energy measures in this
The intensive parameter describing how an environment drives genomic
diversity is found to depend inversely upon the NIC of the genome of a stable
population within that environment. Once this environmental potential has been
determined from the whole genome of a population, additive state variables can
be directly related to the probabilities of the occurrence of given viable SNP
based units (alleles) within that genome. This formulation allows the
determination of both population averaged state variables as well as the
genomic energies of individual alleles and their combinations. The
determination of individual allelic potentials then should allow the
parameterization of specific environmental influences upon shared alleles
across populations in varying environments.
ABSTRACT: Multiple sclerosis (MS) is a complex disease in which genetic and environmental factors have been implicated. The onset of symptoms occurs in individuals from twenty to fifty years of age, producing a progressive impairment of motor, sensory and cognitive functions. MS is more frequent in females than in males, with a ratio of 4:1. The prevalence of MS varies among ethnic groups such as Europeans, Africans and Caucasians. The estimated prevalence of MS in Puerto Rico is 42 per 100,000 inhabitants, which is higher than the prevalence reported for Central America and the Caribbean. Despite this prevalence, the genetic component of MS has not been explored to characterize the allele expression of Puerto Rican MS patients and compare it with the allele expression in other ethnic groups. Thirty-five patients and 31 control subjects were genotyped. The allele frequencies observed in this sample were similar to those reported for Puerto Ricans in the National Marrow Donor Program Registry (n = 3,149). The most prevalent alleles for MS patients were HLA-DRB1*01 and *03. HLA-DQB1*04 was the most frequent in the control group and HLA-A*30, in MS patients. These findings are in agreement with published data. HLA-DQB1*04 was a marginal protector in this sample and this role has not been described before. The accuracy of the results is limited by the sample size. A statistical power analysis showed that the values would be significant with a larger sample.
No preview · Article · Jun 2013 · Boletín de la Asociación Médica de Puerto Rico
ABSTRACT: Characterization of genetic admixture of populations in the Americas and the Caribbean is of interest for anthropological, epidemiological, and historical reasons. Asthma has a higher prevalence and is more severe in populations with a high African component. Association of African ancestry with asthma has been demonstrated. We estimated admixture proportions of samples from six trihybrid populations of African descent and determined the relationship between African ancestry and asthma and total serum IgE levels (tIgE). We genotyped 237 ancestry informative markers in asthmatics and nonasthmatic controls from Barbados (190/277), Jamaica (177/529), Brazil (40/220), Colombia (508/625), African Americans from New York (207/171), and African Americans from Baltimore/Washington, D.C. (625/757). We estimated individual ancestries and evaluated genetic stratification using Structure and principal component analysis. Association of African ancestry and asthma and tIgE was evaluated by regression analysis. Mean ± SD African ancestry ranged from 0.76 ± 0.10 among Barbadians to 0.33 ± 0.13 in Colombians. The European component varied from 0.14 ± 0.05 among Jamaicans and Barbadians to 0.26 ± 0.08 among Colombians. African ancestry was associated with risk for asthma in Colombians (OR = 4.5, P = 0.001), Brazilians (OR = 136.5, P = 0.003), and African Americans of New York (OR = 4.7, P = 0.040), where OR denotes the odds ratio. African ancestry was also associated with higher tIgE levels among Colombians (β = 1.3, P = 0.04), Barbadians (β = 3.8, P = 0.03), and Brazilians (β = 1.6, P = 0.03). Our findings indicate that African ancestry can account for, at least in part, the association between asthma and its associated trait, tIgE levels.
Full-text · Article · May 2013 · Genetic Epidemiology
ABSTRACT: The complete sequencing of the human genome introduced a new knowledge base for decoding information structured in DNA sequence variation. My research is predicated on the supposition that the genome is the most sophisticated knowledge system known, as evidenced by the exquisite information it encodes on biochemical pathways and molecular processes underlying the biology of health and disease. Also, as a living legacy of human origins, migrations, adaptations, and identity, the genome communicates through the complexity of sequence variation expressed in population diversity. As a biomedical research scientist and academician, a question I am often asked is: "How is it that a black woman like you went to the University of Michigan for a PhD in Human Genetics?" As the ASCB 2012 E. E. Just Lecturer, I am honored and privileged to respond to this question in this essay on the science of the human genome and my career perspectives.
Preview · Article · Nov 2012 · Molecular biology of the cell
ABSTRACT: Background:
Prostate cancer (PCa) is a common malignancy and a leading cause of cancer death among men in the United States with African-American (AA) men having the highest incidence and mortality rates. Given recent results from admixture mapping and genome-wide association studies for PCa in AA men, it is clear that many risk alleles are enriched in men with West African genetic ancestry.
Methods:
A total of 77 ancestry informative markers (AIMs) within surrounding candidate gene regions were genotyped and haplotyped using Pyrosequencing in 358 unrelated men enrolled in a PCa genetic association study at the Howard University Hospital between 2000 and 2004. Sequence analysis of promoter region single-nucleotide polymorphisms (SNPs) to evaluate disruption of transcription factor-binding sites was conducted using in silico methods.
Results:
Eight AIMs were significantly associated with PCa risk after adjusting for age and West African ancestry. SNP rs1993973 (intervening sequences) had the strongest association with PCa using the log-additive genetic model (P=0.002). SNPs rs1561131 (genotypic, P=0.007), rs1963562 (dominant, P=0.01) and rs615382 (recessive, P=0.009) remained highly significant after adjusting for both age and ancestry. We also tested the independent effect of each significantly associated SNP and rs1561131 (P=0.04) and rs1963562 (P=0.04) remained significantly associated with PCa development. After multiple comparisons testing using the false discovery rate, rs1993973 remained significant. Analysis of the rs156113-, rs1963562-rs615382l and rs1993973-rs585224 haplotypes revealed that the least frequently found haplotypes in this population were significantly associated with a decreased risk of PCa (P=0.032 and 0.0017, respectively).
Conclusions:
The approach for SNP selection utilized herein showed that AIMs may not only leverage increased linkage disequilibrium in populations to identify risk and protective alleles, but may also be informative in dissecting the biology of PCa and other health disparities.
Full-text · Article · Jul 2012 · Prostate cancer and prostatic diseases
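The abstract above reports multiple-comparisons testing "using the false discovery rate" without naming the procedure. As a hedged illustration, the sketch below implements the Benjamini-Hochberg step-up procedure, one common way to control the FDR; whether the authors used this exact method is an assumption, and the function name and the p-values in the usage lines are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, not taken from the study.
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.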
ABSTRACT: The 21st century emergence of genomic medicine is shifting the paradigm in biomedical science from the population phenotype to the individual genotype. In characterizing the biology of disease and health disparities in population genetics, human populations are often defined by the most common alleles in the group. This definition poses difficulties when categorizing individuals in the population who do not have the most common allele(s). Various epidemiological studies have shown an association between common genomic variation, such as single nucleotide polymorphisms (SNPs), and common diseases. We hypothesize that information encoded in the structure of SNP haploblock variation in the human leukocyte antigen-disease related (HLA-DR) region of the genome illumines molecular pathways and cellular mechanisms involved in the regulation of host adaptation to the environment. In this paper we describe the development and application of the normalized information content (NIC) as a novel metric based on SNP haploblock variation. The NIC facilitates translation of biochemical DNA sequence variation into a biophysical quantity derived from Boltzmann's canonical ensemble in statistical physics and used widely in information theory. Our normalization of this information metric allows for comparisons of unlike, or even unrelated, regions of the genome. We report here NIC values calculated for HLA-DR SNP haploblocks constructed by Haploview, a product of the International Haplotype Map Project. These haploblocks were scanned for potential regulatory elements using ConSite and miRBase, publicly available bioinformatics tools. We found that all of the haploblocks with statistically low NIC values contained putative transcription factor binding sites and microRNA motifs, suggesting correlation with genomic regulation.
Thus, we were able to relate a mathematical measure of information content in HLA-DR SNP haploblocks to biologically relevant functional knowledge embedded in the structure of DNA sequence variation. We submit that NIC may be useful in analyzing the regulation of molecular pathways involved in host adaptation to environmental pathogens and in decoding the functional significance of common variation in the human genome.
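The abstract above does not state the NIC formula. As a rough sketch only, the Python below assumes a form that is common for normalized entropy-based measures: NIC = 1 − S / S_max, where S is the Shannon entropy of the haplotype frequencies within a haploblock and S_max = ln(n) for n possible haplotypes. The formula, the function name, and the frequencies in the usage lines are assumptions, not details taken from the paper.

```python
import math


def normalized_information_content(freqs, n_states=None):
    """Assumed NIC = 1 - S / S_max for one haploblock.

    freqs:    haplotype counts or frequencies (any non-negative scale)
    n_states: number of possible haplotypes used for S_max = ln(n_states);
              defaults to len(freqs). This choice is an assumption.
    """
    n = len(freqs) if n_states is None else n_states
    if n < 2:
        raise ValueError("need at least two possible haplotypes")
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")

    # Shannon entropy in nats; zero-frequency haplotypes contribute nothing.
    probs = [f / total for f in freqs if f > 0]
    entropy = -sum(p * math.log(p) for p in probs)

    return 1.0 - entropy / math.log(n)


# Hypothetical haplotype frequencies, for illustration only.
print(normalized_information_content([0.25, 0.25, 0.25, 0.25]))  # maximal variation
print(normalized_information_content([0.97, 0.01, 0.01, 0.01]))  # highly conserved block
```

Under this assumed form the value lies between 0 (all haplotypes equally frequent) and 1 (a single haplotype fixed), which is what makes blocks of different sizes comparable on one scale.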
ABSTRACT: Although an increasing number of hypertension-associated genetic variants is being reported, replication of these findings in independent studies has been challenging. Several genes in a human chromosome 1q linkage region have been reported to be associated with hypertension. We examined polymorphisms in three of these genes (ATP1B1, RGS5 and SELE) in relation to hypertension and blood pressure in a cohort of African-Americans.
We genotyped 87 single nucleotide polymorphisms (SNPs) from the ATP1B1, RGS5 and SELE genes in a well characterized cohort of 968 African-Americans and performed a case-control study to identify susceptibility alleles for hypertension and blood pressure regulation. Single SNP and haplotype association testing was done under an additive genetic model with adjustment for age, sex, BMI and ancestry-by-genotype (principal components).
A total of 12 SNPs showed nominal association with hypertension and/or blood pressure. The strongest signal for hypertension was for rs2815272 in the RGS5 gene (P = 9.3 × 10). For SBP, rs3917420 in the SELE gene (P = 9.0 × 10) and rs4657251 in the RGS5 gene (P = 9.7 × 10) were the top hits. Effect size for each of these variants was approximately 2-3 mmHg. A five-SNP haplotype in the SELE gene also showed significant association with SBP after correction for multiple testing (P < 0.01).
These findings provide additional support for the genetic role of ATP1B1, RGS5 and SELE in hypertension and blood pressure regulation.
No preview · Article · Aug 2011 · Journal of Hypertension