Identification of cis-regulatory variation influencing protein abundance levels in human plasma

NIHR Biomedical Research Centre for Mental Health, South London, UK.
Human Molecular Genetics (Impact Factor: 6.39). 05/2012; 21(16):3719-26. DOI: 10.1093/hmg/dds186
Source: PubMed


Proteins are central to almost all cellular processes, and dysregulation of their expression and function is associated with a range of disorders. A number of studies in humans have recently shown that genetic factors contribute significantly to gene expression variation. In contrast, very little is known about the genetic basis of variation in protein abundance in humans. Here, we assayed the abundance levels of proteins in plasma from 96 elderly Europeans using a new aptamer-based proteomic technology and performed genome-wide local (cis-) regulatory association analysis to identify protein quantitative trait loci (pQTLs). We detected robust cis-associations for 60 proteins at a false discovery rate of 5%. The most highly significant single nucleotide polymorphism detected was rs7021589 (false discovery rate, 2.5 × 10⁻¹²), mapped within the gene coding sequence of Tenascin C (TNC). Importantly, we identified evidence of cis-regulatory variation for 20 protein-encoding genes previously associated with disease, including variants with strong evidence of disease association that are also significantly associated with protein abundance levels. These results demonstrate that common genetic variants contribute to the differences in protein abundance levels in human plasma. Identification of pQTLs will significantly enhance our ability to discover and comprehend the biological and functional consequences of loci identified from genome-wide association studies of complex traits. This is the first large-scale genetic association study of proteins in plasma measured using a novel, highly multiplexed slow off-rate modified aptamer (SOMAmer) proteomic platform.
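
The two generic steps named in the abstract, restricting tests to local (cis) SNPs and controlling the false discovery rate, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not state the cis window or the FDR method, so the 1 Mb window, the Benjamini-Hochberg procedure, and the names `cis_snps`, `benjamini_hochberg`, and `CIS_WINDOW_BP` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical cis window; the abstract does not state the distance used.
CIS_WINDOW_BP = 1_000_000


def cis_snps(
    snp_positions: Dict[str, Tuple[str, int]],
    chrom: str,
    gene_start: int,
    gene_end: int,
    window: int = CIS_WINDOW_BP,
) -> List[str]:
    """Return the IDs of SNPs on `chrom` within `window` bp of the gene."""
    lo, hi = gene_start - window, gene_end + window
    return sorted(
        snp
        for snp, (snp_chrom, pos) in snp_positions.items()
        if snp_chrom == chrom and lo <= pos <= hi
    )


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Placeholder SNP IDs and positions, not real data.
    snps = {"snpA": ("9", 1_500_000), "snpB": ("9", 5_000_000), "snpC": ("1", 1_500_000)}
    print(cis_snps(snps, "9", 1_000_000, 1_200_000))
    # Placeholder p-values, one per SNP-protein test.
    qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([q <= 0.05 for q in qvalues])
```

In a real analysis the p-values would come from an association test of protein abundance against genotype for each cis SNP; that step is omitted here because the abstract does not describe the model used.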

Available from: Sally Nelson, Jul 01, 2014
  • Source
    • "Selected reaction monitoring (SRM) (Lange et al. 2008) is one of the main strategies for reproducible quantification of specific peptides and proteins in large sample cohorts and different experimental conditions with a wide dynamic range with high sensitivity and accuracy (Ebhardt et al. 2012; Sabidó et al. 2013). Although the use of these approaches to evaluate genome-wide SNP–protein associations (trans-effects) is still limited due to the required number of samples, a few studies have successfully performed pQTL experiments for a restricted number of SNP–protein associations (Lourdusamy et al. 2012; Johansson et al. 2013; Wu et al. 2013, 2014; Battle et al. 2015; Liu et al. 2015). Indeed, by restricting the number of proteins and polymorphisms to the SNPs located in specific genomic regions, or to the proteins involved in a particular metabolic process, the number of associations is reduced and it is possible to identify protein cis-regulation elements by reducing the number of potential false positives. "
    ABSTRACT: Essential trace elements possess vital functions at molecular, cellular and physiological levels in health and disease, and they are tightly regulated in the human body. In order to assess variability and potential adaptive evolution of trace element homeostasis, we quantified 18 trace elements in 150 liver samples, together with the expression levels of 90 genes and abundances of 40 proteins involved in their homeostasis. Additionally, we genotyped 169 SNPs in the same sample set. We detected significant associations for eight protein quantitative trait loci (pQTLs), 10 expression quantitative trait loci (eQTLs) and 15 micronutrient quantitative trait loci (nutriQTLs). Six of these exceeded the false discovery rate (FDR) cutoff and were related to essential trace elements: i) one pQTL for GPX2 (rs10133290); ii) two previously described eQTLs for HFE (rs12346) and SELO (rs4838862) expression; and iii) three nutriQTLs: the pathogenic C282Y mutation at HFE affecting iron (rs1800562), and two SNPs within several clustered metallothionein genes determining selenium concentration (rs1811322 and rs904773). Within the complete set of significant QTLs (which involved 30 SNPs and 20 gene regions), we identified 12 SNPs with extreme patterns of population differentiation (FST values in the top 5% percentile in at least one HapMap population pair) and significant evidence for selective sweeps involving QTLs at GPX1, SELENBP1, GPX3, SLC30A9 and SLC39A8. Overall, this detailed study of various molecular phenotypes illustrates the role of regulatory variants in explaining differences in trace element homeostasis among populations and in the human adaptive response to environmental pressures related to micronutrients.
    Full-text · Article · Nov 2015 · Molecular Biology and Evolution
  • Source
    • "The dataset contained 42 (that is, about 40% of) protein biomarker analytes whose measurement has been approved by US Food and Drug Administration (FDA) for clinical purpose (hereafter, clinically assayed proteins) assayed in blood (Anderson, 2010). It compares favorably to prior multisample human plasma studies regarding analytical depth (Melzer et al, 2008; Kato et al, 2011; Lourdusamy et al, 2012; Johansson et al, 2013), particularly considering that the analytical time was a mere 2.5 h per sample and consumed only 0.015 µl of plasma per SWATH injection, and significantly exceeds the previous studies in terms of reproducibility and quantitative accuracy. We next sought to assess the properties of the SWATH data. "
    ABSTRACT: The degree and the origins of quantitative variability of most human plasma proteins are largely unknown. Because the twin study design provides a natural opportunity to estimate the relative contribution of heritability and environment to different traits in human population, we applied here the highly accurate and reproducible SWATH mass spectrometry technique to quantify 1,904 peptides defining 342 unique plasma proteins in 232 plasma samples collected longitudinally from pairs of monozygotic and dizygotic twins at intervals of 2-7 years, and proportioned the observed total quantitative variability to its root causes, genes, and environmental and longitudinal factors. The data indicate that different proteins show vastly different patterns of abundance variability among humans and that genetic control and longitudinal variation affect protein levels and biological processes to different degrees. The data further strongly suggest that the plasma concentrations of clinical biomarkers need to be calibrated against genetic and temporal factors. Moreover, we identified 13 cis-SNPs significantly influencing the level of specific plasma proteins. These results therefore have immediate implications for the effective design of blood-based biomarker studies. © 2015 The Authors. Published under the terms of the CC BY 4.0 license.
    Full-text · Article · Feb 2015 · Molecular Systems Biology
  • Source
    • "The first genome-wide genetic analysis on gene expression was performed in haploid yeast segregants [15] and this proof-of-concept analysis demonstrated a widespread genetic effect on gene expression. Subsequent studies were carried out in many different organisms, including humans [16] [17] [18] [19] [20], and on other molecular levels, such as proteins [21] [22] [23], metabolites [24] [25] [26] and methylation [27] [28]. These studies have greatly increased our knowledge of the functional consequences of genetic variants. "
    ABSTRACT: Most common diseases are complex, involving multiple genetic and environmental factors and their interactions. In the past decade, genome-wide association studies (GWAS) have successfully identified thousands of genetic variants underlying susceptibility to complex diseases. However, the results from these studies often do not provide evidence on how the variants affect downstream pathways and lead to the disease. Therefore, in the post-GWAS era the greatest challenge lies in combining GWAS findings with additional molecular data to functionally characterize the associations. The advances in various ~omics techniques have made it possible to investigate the effect of risk variants on intermediate molecular levels, such as gene expression, methylation, protein abundance or metabolite levels. As disease aetiology is complex, no single molecular analysis is expected to fully unravel the disease mechanism. Multiple molecular levels can interact and also show plasticity in different physiological conditions, cell types and disease stages. There is therefore a great need for new integrative approaches that can combine data from different molecular levels and can help construct the causal inference from genotype to phenotype. Systems genetics is such an approach; it is used to study genetic effects within the larger scope of systems biology by integrating genotype information with various ~omics datasets as well as with environmental and physiological variables. In this review, we describe this approach and discuss how it can help us unravel the molecular mechanisms through which genetic variation causes disease. This article is part of a Special Issue entitled: From Genome to Function.
    Full-text · Article · May 2014 · Biochimica et Biophysica Acta