ABSTRACT: Systems biology is an approach to the dissection of complex traits that explicitly recognizes the impact of genetic, physiological and environmental interactions in the generation of phenotypic variation. We describe comprehensive transcriptional and metabolic profiling in Drosophila melanogaster across four diets, finding little overlap in modular architecture. Genotype and genotype-by-diet interactions are major components of transcriptional variation (24% and 5.3% of the total variation, respectively), whereas the main effect of diet is negligible (<1%). Genotype was also a major contributor to metabolomic variation (16%), but in contrast to the transcriptome, diet had a large effect (9%) and the interaction effect was minor (2%) for the metabolome. Yet specific principal components of these molecular phenotypes measured in larvae are strongly correlated with particular Metabolic Syndrome-like phenotypes such as pupal weight, larval sugar and triglyceride content, development time, and cardiac arrhythmia in adults. The second principal component of the metabolomic profile is especially informative across these traits, with glycine identified as a key loading variable. To further relate this physiological variability to genotypic polymorphism, we performed evolve-and-resequence experiments, finding rapid and replicated changes in gene frequency across hundreds of loci that are specific to each diet. Adaptation to diet is thus highly polygenic. However, the loci differentially transcribed across diets, or previously identified by RNAi knockdown or expression QTL analysis, were not the loci responding to dietary selection. Therefore, loci that respond to the selective pressures of diet cannot be readily predicted a priori from functional analyses.
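The percentages above partition total variation among genotype, diet, and their interaction. A minimal sketch of one way to obtain such fractions for a single transcript or metabolite, assuming a balanced design (the same number of replicates in every genotype-by-diet cell) and classical two-way ANOVA sums of squares. The study itself may have used mixed-model variance components, and the nested-dictionary data layout is purely illustrative.

```python
from statistics import mean

def partition_variation(data):
    """Fraction of the total sum of squares due to genotype (G), diet (D),
    the G x D interaction, and residual error.

    data[g][d] is a list of replicate measurements for genotype g on diet d.
    Assumes a balanced design: every cell has the same number of replicates.
    """
    genotypes = list(data)
    diets = list(data[genotypes[0]])
    r = len(data[genotypes[0]][diets[0]])
    all_values = [x for g in genotypes for d in diets for x in data[g][d]]
    grand = mean(all_values)

    # Cell means and marginal means (balanced, so means of cell means suffice).
    cell = {(g, d): mean(data[g][d]) for g in genotypes for d in diets}
    g_mean = {g: mean(cell[(g, d)] for d in diets) for g in genotypes}
    d_mean = {d: mean(cell[(g, d)] for g in genotypes) for d in diets}

    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_g = len(diets) * r * sum((g_mean[g] - grand) ** 2 for g in genotypes)
    ss_d = len(genotypes) * r * sum((d_mean[d] - grand) ** 2 for d in diets)
    # Interaction: departure of each cell mean from the additive expectation.
    ss_gd = r * sum((cell[(g, d)] - g_mean[g] - d_mean[d] + grand) ** 2
                    for g in genotypes for d in diets)
    ss_e = sum((x - cell[(g, d)]) ** 2
               for g in genotypes for d in diets for x in data[g][d])

    return {"G": ss_g / ss_total, "D": ss_d / ss_total,
            "GxD": ss_gd / ss_total, "error": ss_e / ss_total}
```

For a toy dataset with two genotypes, two diets and two replicates per cell, `{"A": {"low": [1.0, 3.0], "high": [2.0, 4.0]}, "B": {"low": [5.0, 7.0], "high": [6.0, 8.0]}}`, the function attributes 32/42 (about 76%) of the variation to genotype and none to the interaction, because the diet effect is identical in both genotypes.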
ABSTRACT: Epistasis is the phenomenon whereby one polymorphism's effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits and contributes to their variation is a fundamental question in evolution and human genetics. Although epistasis has often been demonstrated in artificial gene manipulation studies in model organisms, and some examples have been reported in other species, few examples exist for epistasis among natural polymorphisms in human traits. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits, but an alternative view is that it has previously been too technically challenging to detect owing to statistical and computational issues. Here we show, using advanced computation and a gene expression study design, that many instances of epistasis are found between common single nucleotide polymorphisms (SNPs). In a cohort of 846 individuals with 7,339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (P < 2.91 × 10^-16). Replication of these interactions in two independent data sets showed both concordance of direction of epistatic effects (P = 5.56 × 10^-31) and enrichment of interaction P values, with 30 being significant at a conservative threshold of P < 9.98 × 10^-5. Forty-four of the genetic interactions are located within 5 megabases of regions of known physical chromosome interactions (P = 1.8 × 10^-10). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559, which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype-phenotype maps for each cis-trans interaction. 
This study presents the first evidence, to our knowledge, for many instances of segregating common polymorphisms interacting to influence human traits.
ABSTRACT: The WHOLE approach to personalized medicine represents an effort to integrate clinical and genomic profiling into preventative health care and the promotion of wellness. Our premise is that genotypes alone are insufficient to predict health outcomes, since they fail to account for individualized responses to the environment and life history. Instead, integrative genomic approaches incorporating whole genome sequences and transcriptome and epigenome profiles, all combined with extensive clinical data obtained at annual health evaluations, have the potential to provide more informative wellness classification. As in traditional medicine, where the physician interprets subclinical signs in light of the person's health history, truly personalized medicine will be founded on algorithms that extract relevant information from genomes but will also require interpretation in light of the triggers, behaviors, and environment that are unique to each person. This chapter discusses some of the major obstacles to implementation, from development of risk scores through integration of diverse omic data types to presentation of results in a format that fosters development of personal health action plans.
Advances in Experimental Medicine and Biology 01/2014; 799:1-14. · 1.83 Impact Factor
ABSTRACT: We have developed a novel structure-based evaluation for missense variants that explicitly models protein structure and amino acid properties to predict the likelihood that a variant disrupts protein function. A structural disruption score (SDS) is introduced as a measure of the likelihood that a case variant is functional. The score is constructed using characteristics that distinguish between causal and neutral variants within a group of proteins. The SDS is correlated with standard sequence-based deleteriousness scores, but shows promise for improving discrimination between neutral and causal variants at less conserved sites. The prediction was performed on three-dimensional structures of 57 gene products whose homozygous SNPs were identified as case-exclusive variants in an exome sequencing study of epilepsy disorders. We contrasted the candidate epilepsy variants with scores for likely benign variants found in the EVS database, and for positive control variants in the same genes that are suspected to promote a range of diseases. To derive a characteristic profile of damaging SNPs, we transformed continuous scores into categorical variables based on the score distribution of each measurement, collected from all possible SNPs in this protein set, where extreme measures were assumed to be deleterious. A second epilepsy dataset was used to replicate the findings. Relative to negative controls, causal variants tend to receive higher sequence-based deleteriousness scores, induce larger physico-chemical changes between amino acid pairs, lie in protein domains, buried sites or conserved protein surface clusters, and cause protein destabilization. These measures were aggregated for each variant. A list of nine high-priority putative functional variants for epilepsy was generated. Our newly developed SDS protocol facilitates SNP prioritization for experimental validation.
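The categorization step described above (flagging extreme values of each measure against its distribution over all possible SNPs, then combining the flags per variant) can be sketched as follows. This is a hypothetical illustration, not the SDS protocol itself: the measure names, the 90th-percentile cutoff, the assumption that larger values are always more damaging, and aggregation by simple counting are all assumptions.

```python
from statistics import quantiles

def tail_cutoffs(background, q=0.9):
    """Upper-tail cutoff for each measure.

    background maps a (hypothetical) measure name to its scores over all
    possible SNPs in the protein set; q is the assumed tail quantile.
    """
    k = round(q * 100)  # percentile index; quantiles(n=100) yields 99 cut points
    if not 1 <= k <= 99:
        raise ValueError("q must lie between 0.01 and 0.99")
    return {
        measure: quantiles(scores, n=100, method="inclusive")[k - 1]
        for measure, scores in background.items()
    }

def disruption_count(variant, cutoffs):
    """Number of measures on which a variant falls in the extreme upper tail."""
    return sum(1 for measure, cut in cutoffs.items() if variant[measure] >= cut)
```

Measures for which low values indicate damage would need their sign flipped before being passed in, and a real scheme might weight measures rather than count them.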
ABSTRACT: Genetic risk scores have been developed for coronary artery disease and atherosclerosis, but are not predictive of adverse cardiovascular events. We asked whether peripheral blood expression profiles may be predictive of acute myocardial infarction (AMI) and/or cardiovascular death.
Genome Medicine 01/2014; 6(5):40. · 3.40 Impact Factor
ABSTRACT: Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited, and they likely overestimate the fraction of the genome that is deleterious.
This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous missense mutations in 575 proteins found in the genomes of 12 healthy adults.
Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, alphaMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals.
The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized missense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.
ABSTRACT: Gene expression variation provides a read-out of both genetic and environmental influences on gene activity. Geographical, genomic and sociogenomic studies have highlighted how the life circumstances of an individual modify the expression of hundreds, and in some cases thousands, of genes in a coordinated manner. This review places such results in the context of a conserved set of 90 transcripts known as blood informative transcripts that capture the major conserved components of variation in the peripheral blood transcriptome. Pathophysiological states are also shown to associate with the perturbation of transcript abundance along the major axes. Discussion of false-negative rates leads us to argue that simple significance thresholds provide a biased perspective on the assessment of differential expression that may cloud the interpretation of studies with small sample sizes.
Current Genetic Medicine Reports 10/2013; ISSN 2167-4876 (Springer Science).
ABSTRACT: Principal components analysis has been employed in gene expression studies to correct for population substructure, batch and environmental effects. This method typically involves the removal of variation contained in as many as 50 principal components (PCs), which can constitute a large proportion of the total variation present in the data. Each PC, however, can capture many sources of variation, including gene expression networks and genetic variation influencing transcript levels. We demonstrate that PCs generated from gene expression data can simultaneously contain both genetic and non-genetic factors. From heritability estimates we show that all PCs contain a considerable portion of genetic variation, whilst non-genetic artifacts such as batch effects were associated to varying degrees with the first 60 PCs. These PCs show enrichment for biological pathways, including core immune function and metabolic pathways. The use of PC correction in two independent datasets reduced the number of cis- and trans-eQTLs detected. Comparisons of PC and linear model correction revealed that PC correction was less efficient at removing known batch effects and imposed a higher penalty on genetic variation. This study therefore highlights the danger of eliminating biologically relevant data when employing PC correction in gene expression data.
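The abstract contrasts PC correction with linear-model correction for known batch effects. A minimal sketch of the latter for a single transcript, assuming batch is the only covariate: with one categorical predictor, the least-squares fit is simply the per-batch mean, so the residual is each value minus its batch mean. Adding back the overall mean to keep values on the original scale is an illustrative choice, not necessarily what the study did.

```python
from collections import defaultdict
from statistics import mean

def remove_batch_effect(values, batches):
    """Regress one transcript's expression on a categorical batch factor and
    return the residuals re-centred on the overall mean."""
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    batch_mean = {batch: mean(vs) for batch, vs in groups.items()}
    overall = mean(values)
    # Residual from the batch-only linear model, shifted back to the overall mean.
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]
```

Unlike removing the top PCs, this adjustment touches only variance that lines up with the known batch labels, which is one way to see why it can spare more genetic signal.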
ABSTRACT: Whole genome sequencing is poised to revolutionize personalized medicine, providing the capacity to classify individuals into risk categories for a wide range of diseases. Here we begin to explore how whole genome sequencing (WGS) might be incorporated alongside traditional clinical evaluation as a part of preventive medicine. The present study illustrates novel approaches for integrating genotypic and clinical information for assessment of generalized health risks and to assist individuals in the promotion of wellness and maintenance of good health.
Whole genome sequences and longitudinal clinical profiles are described for eight middle-aged Caucasian participants (four men and four women) from the Center for Health Discovery and Well Being (CHDWB) at Emory University in Atlanta. We report multivariate genotypic risk assessments derived from common variants reported by genome-wide association studies (GWAS), single rare homozygous deleterious variants, and clinical measures in the domains of immune, metabolic, cardiovascular, musculoskeletal and mental health.
Polygenic risk is assessed for each participant for over 100 diseases and reported relative to baseline population prevalence. Two approaches for combining clinical and genetic profiles for the purposes of health assessment are then discussed. First we propose conditioning individual disease risk assessments on observed clinical status for type 2 diabetes, coronary artery disease, hypertriglyceridemia and hypertension, and obesity. An excess of concordance between genetic prediction and observed sub-clinical disease is observed. Subsequently, we show how more holistic combination of genetic, clinical and family history data can be achieved by visualizing risk in eight sub-classes of disease. Having identified where their profiles are broadly concordant or discordant, an individual can focus on individual clinical results or genotypes as they develop personalized health action plans in consultation with a health partner.
The CHDWB will facilitate longitudinal evaluation of wellness-focused medical care based on comprehensive self-knowledge of medical risks.
Genome Medicine 06/2013; 5(6):58. · 3.40 Impact Factor
ABSTRACT: There is increasing evidence that heritable variation in gene expression underlies genetic variation in susceptibility to disease. Therefore, a comprehensive understanding of the similarity between relatives for transcript variation is warranted, in particular the dissection of phenotypic variation into additive and non-additive genetic factors and shared environmental effects. We conducted a gene expression study in blood samples of 862 individuals from 312 nuclear families containing MZ or DZ twin pairs using both pedigree and genotype information. From a pedigree analysis we show that the vast majority of genetic variation across 17,994 probes is additive, although non-additive genetic variation is identified for 960 transcripts. For 180 of the 960 transcripts with non-additive genetic variation, we identify expression quantitative trait loci (eQTL) with dominance effects in a sample of 339 unrelated individuals and replicate 31% of these associations in an independent sample of 139 unrelated individuals. Over-dominance was detected and replicated for a trans association between rs12313805 and ETV6, located 4 Mb apart on chromosome 12. Surprisingly, only 17 probes exhibit significant levels of common environmental effects, suggesting that environmental and lifestyle factors common to a family do not affect expression variation for most transcripts, at least those measured in blood. Consistent with the genetic architecture of common diseases, gene expression is predominantly additive, but a minority of transcripts display non-additive effects.
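The additive versus non-additive distinction for a single eQTL can be made concrete with the classical genotypic-value decomposition: given mean expression for 0, 1 and 2 copies of an allele, the additive effect a is half the difference between the homozygotes, the dominance deviation d is the heterozygote's departure from their midpoint, and over-dominance corresponds to |d| > |a|. The sketch below computes point estimates only, assuming genotypes coded 0/1/2 with all three classes observed; it is not the pedigree or eQTL model used in the study and performs no significance testing.

```python
from statistics import mean

def additive_and_dominance(expression, genotypes):
    """Point estimates of the additive effect a and dominance deviation d.

    genotypes are coded as 0, 1 or 2 copies of one allele; every class
    must contain at least one individual.
    """
    by_class = {0: [], 1: [], 2: []}
    for y, g in zip(expression, genotypes):
        by_class[g].append(y)
    m0, m1, m2 = (mean(by_class[k]) for k in (0, 1, 2))
    a = (m2 - m0) / 2         # half the difference between homozygote means
    d = m1 - (m0 + m2) / 2    # heterozygote deviation from the homozygote midpoint
    return a, d

def is_overdominant(a, d):
    """Heterozygote mean lies outside the range spanned by the homozygotes."""
    return abs(d) > abs(a)
```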
ABSTRACT: We describe a novel approach to capturing the covariance structure of peripheral blood gene expression that relies on the identification of highly conserved Axes of variation. Starting with a comparison of microarray transcriptome profiles for a new dataset of 189 healthy adult participants in the Emory-Georgia Tech Center for Health Discovery and Well-Being (CHDWB) cohort with a previously published study of 208 adult Moroccans, we identify nine Axes, each with between 99 and 1,028 strongly co-regulated transcripts in common. Each Axis is enriched for gene ontology categories related to sub-classes of blood and immune function, including T-cell and B-cell physiology and innate, adaptive, and anti-viral responses. Conservation of the Axes is demonstrated in each of five additional population-based gene expression profiling studies, and one Axis is robustly associated with Body Mass Index in the CHDWB as well as in Finnish and Australian cohorts. Furthermore, ten tightly co-regulated genes can be used to define each Axis as "Blood Informative Transcripts" (BITs), generating scores that define an individual with respect to the represented immune activity and blood physiology. We show that environmental factors, including lifestyle differences in Morocco and infection leading to active or latent tuberculosis, significantly impact specific Axes, but that there is also significant heritability for the Axis scores. In the context of personalized medicine, reanalysis of the longitudinal profile of one individual during and after infection with two respiratory viruses demonstrates that specific Axes also characterize clinical incidents. This mode of analysis suggests the view that, rather than unique subsets of genes marking each class of disease, differential expression reflects movement along the major normal Axes in response to environmental and genetic stimuli.
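The abstract does not say exactly how an Axis score is computed from its ten BITs. As a hedged illustration only, the sketch below swaps in a simple, commonly used summary: standardize each BIT across individuals and average the z-scores per person (the authors may instead use, for example, the first principal component of the BITs). The gene names in any input are hypothetical.

```python
from statistics import mean, pstdev

def axis_scores(expr, bits):
    """One score per individual for a single Axis.

    expr maps a gene name to its expression values (one per individual, same
    order for every gene); bits lists the genes assumed to define the Axis.
    Each score is the mean z-score of that individual across the BITs.
    """
    z = {}
    for gene in bits:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            raise ValueError(f"{gene} has no variance across individuals")
        z[gene] = [(v - mu) / sd for v in values]
    n = len(expr[bits[0]])
    return [mean(z[gene][i] for gene in bits) for i in range(n)]
```

Averaging z-scores gives each BIT equal weight, which is reasonable only because the ten genes are described as tightly co-regulated.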
ABSTRACT: Compared with single markers, polygenic scores that evaluate the joint effects of multiple trait-associated variants are more effective in explaining the variance of traits and risk of diseases. In total, 182 CHDWB (Emory-Georgia Tech Center for Health Discovery and Well Being study) adults were genotyped to investigate the common variant contributions to three traits (height, BMI, serum triglycerides) and three diseases (coronary artery disease (CAD), type 2 diabetes (T2D) and asthma). Associations of weighted and simple allelic-sum polygenic scores with quantitative traits, and with the Framingham risk scores for CAD and T2D, were contrasted. Although the cohort size is two or three orders of magnitude smaller than typical discovery cohorts, we were able to detect significant associations and to explain up to 5% of the variance in the traits with the genetic risk scores, despite a strong influence of outliers. An unexpected finding was that CAD-associated single nucleotide polymorphisms (SNPs) explain a significant amount of the variation in total serum cholesterol. Forward step-wise sequential addition of SNPs into the regression model showed that the top-ranked SNPs explain a large proportion of variance, whereas inclusion of gender and ethnicity also affects the performance of polygenic scores.
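The two kinds of polygenic score contrasted above can be sketched as follows, assuming each individual's genotype at every trait-associated SNP is coded as a dosage (0, 1 or 2 copies of the trait-increasing allele) and that per-allele effect sizes come from published GWAS; both the coding and the effect sizes are assumptions, not details given in the abstract. Variance explained is computed here as the squared Pearson correlation between score and trait, a simplification of the regression models described.

```python
from statistics import mean

def simple_score(dosages):
    """Unweighted allelic sum: total count of trait-increasing alleles."""
    return sum(dosages)

def weighted_score(dosages, effect_sizes):
    """Weighted allelic sum: each dosage multiplied by its per-allele effect."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per SNP is required")
    return sum(x * b for x, b in zip(dosages, effect_sizes))

def variance_explained(scores, trait):
    """Squared Pearson correlation between scores and a quantitative trait."""
    ms, mt = mean(scores), mean(trait)
    sxy = sum((s - ms) * (t - mt) for s, t in zip(scores, trait))
    sxx = sum((s - ms) ** 2 for s in scores)
    syy = sum((t - mt) ** 2 for t in trait)
    return sxy * sxy / (sxx * syy)
```

The weighted score lets SNPs with larger published effects contribute more, which is why it usually outperforms the simple count when effect sizes are well estimated.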
ABSTRACT: Natural populations of the fruit fly, Drosophila melanogaster, segregate genetic variation that leads to cardiac disease phenotypes. One nearly isogenic line from a North Carolina peach orchard, WE70, is shown to harbor two genetically distinct heart phenotypes: elevated incidence of arrhythmias, and a dramatically constricted heart diameter in both diastole and systole, resembling restrictive cardiomyopathy in humans. Assuming the source to be rare variants of large effect, we performed Bulked Segregant Analysis using genomic DNA hybridization to Affymetrix chips to detect single feature polymorphisms, but found that the mutant phenotypes are more likely to have a polygenic basis. Further mapping efforts revealed a complex architecture wherein the constricted cardiomyopathy phenotype was observed in individual whole chromosome substitution lines, implying that variants on both major autosomes are sufficient to produce the phenotype. A panel of 170 Recombinant Inbred Lines (RIL) was generated, and a small subset of mutant lines selected, but these each complemented both whole chromosome substitutions, implying a non-additive (epistatic) contribution to the "disease" phenotype. Low coverage whole genome sequencing was also used to attempt to map chromosomal regions contributing to both the cardiomyopathy and arrhythmia, but again a polygenic architecture had to be inferred as most likely. These results show that an apparently simple rare phenotype can have a complex genetic basis that would be refractory to mapping by deep sequencing in pedigrees. We present this as a cautionary tale for attempts to map new disease mutations on the assumption that probands carry a single causal mutation.
PLoS ONE 01/2013; 8(4):e62909. · 3.73 Impact Factor
ABSTRACT: One of the most rapidly evolving genes in humans, PRDM9, is a key determinant of the distribution of meiotic recombination events. Mutations in this meiotic-specific gene have previously been associated with male infertility in humans, and recent studies suggest that PRDM9 may be involved in pathological genomic rearrangements. By studying genomes from families with children affected by B-cell precursor acute lymphoblastic leukemia (B-ALL), we characterized meiotic recombination patterns within a family with two siblings having hyperdiploid childhood ALL and observed unusual localization of maternal recombination events. The mother of the family carries a rare PRDM9 allele, potentially explaining the unusual patterns found. From exomes sequenced in 44 additional parents of children affected with B-ALL, we discovered a substantial and significant excess of rare allelic forms of PRDM9. The rare PRDM9 alleles are transmitted to the affected children in half the cases; nonetheless, there remains a significant excess of rare alleles among patients relative to controls. We successfully replicated this latter observation in an independent cohort of 50 children with B-ALL, where we found an excess of rare PRDM9 alleles in aneuploid and infant B-ALL patients. PRDM9 variability in humans is thought to influence genomic instability, and these data support a potential role for PRDM9 variation in risk of acquiring aneuploidies or genomic rearrangements associated with childhood leukemogenesis.
ABSTRACT: In previous geographical genomics studies of the impact of lifestyle on gene expression inferred from microarray analysis of peripheral blood samples, we described the complex influences of culture, ethnicity, and gender in Morocco, and of pregnancy in Brisbane. Here we describe the use of nanofluidic Fluidigm quantitative RT-PCR arrays targeted at a set of 96 transcripts that are broadly informative of the major axes of immune gene expression, to explore the population structure of transcription in Fiji. As in Morocco, major differences are seen between the peripheral blood transcriptomes of rural villagers and residents of the capital city, Suva. The effect is much greater in Indian villages than in Melanesian highlanders and appears to be similar with respect to the nature of at least two axes of variation. Gender differences are much smaller than ethnicity or lifestyle effects. Body mass index is shown to associate with one of the axes as it does in Atlanta and Brisbane, establishing a link between the epidemiological transition of human metabolic disease, and gene expression profiles.
ABSTRACT: An under-appreciated aspect of the genetic analysis of gene expression is the impact of post-probe-level normalization on biological inference. Here we contrast nine different methods for normalization of an Illumina bead-array gene expression profiling dataset consisting of peripheral blood samples from 189 individual participants in the Center for Health Discovery and Well Being study in Atlanta, quantifying differences in the inference of global variance components and covariance of gene expression, as well as in the detection of variants that affect transcript abundance (eSNPs). The normalization strategies, all relative to raw log2 measures, include simple mean centering, two modes of transcript-level linear adjustment for technical factors and for differential immune cell counts, variance normalization by interquartile range and by quantile, fitting of the first 16 principal components, and supervised normalization using the SNM procedure with adjustment for cell counts. The robustness of genetic associations to the choice of Pearson versus Spearman rank correlation is also reported for each method, and the normalization strategy is shown to have a far greater impact than the correlation method. We describe similarities among methods, discuss the impact on biological interpretation, and make recommendations regarding appropriate strategies.
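Three of the simpler strategies listed above can be sketched on log2 values as follows. These are generic textbook versions written for illustration, not the study's code: centering the IQR-scaled values on the median is an assumption, and the quantile normalization breaks ties by position rather than averaging tied ranks.

```python
from statistics import mean, median, quantiles

def mean_center(values):
    """Subtract a transcript's mean log2 expression across samples."""
    m = mean(values)
    return [v - m for v in values]

def iqr_scale(values):
    """Centre on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]

def quantile_normalize(samples):
    """Give every sample (one list per array) the same distribution.

    The reference distribution is the mean of the sorted values across
    samples; each value is replaced by the reference value at its rank.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(col[i] for col in sorted_samples) for i in range(n)]
    out = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from smallest to largest
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        out.append(normalized)
    return out
```

Mean centering and IQR scaling act on one transcript across samples, whereas quantile normalization acts on whole arrays, which is one reason the strategies can shift downstream variance components so differently.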