
Whole genome sequencing identifies high-impact variants in well-known pharmacogenomic genes

The Pharmacogenomics Journal (2019) 19:127–135
https://doi.org/10.1038/s41397-018-0048-y
ARTICLE
Jihoon Choi 1,2 · Kelan G. Tantisira 3,4 · Qing Ling Duan 1,2
Received: 5 July 2017 / Revised: 10 July 2018 / Accepted: 10 August 2018 / Published online: 14 September 2018
© Springer Nature Limited 2018
Abstract
More than 1100 genetic loci have been correlated with drug response outcomes, but disproportionately few have been translated into clinical practice. One explanation for the low rate of clinical implementation is that the majority of associated variants may be in linkage disequilibrium (LD) with the causal variants, which are often elusive. This study aims to identify and characterize likely causal variants within well-established pharmacogenomic genes using next-generation sequencing data from the 1000 Genomes Project. We identified 69,319 genetic variations within 160 pharmacogenomic genes, of which 8207 variants are in strong LD (r² > 0.8) with known pharmacogenomic variants. Of the latter, eight are coding or structural variants predicted to have high impact, and 19 additional missense variants are predicted to have moderate impact. In conclusion, we identified putatively functional variants within known pharmacogenomic loci that could account for the association signals and represent the missing causative variants underlying drug response phenotypes.
Introduction
The current paradigm of drug therapy follows a “trial-and-error” approach in which patients are prescribed a drug at a standardized dose, with the expectation that alternative therapies or doses will be given during return clinical visit(s) [1]. Not surprisingly, this is inefficient and potentially hazardous for patients who require urgent care or are susceptible to adverse events, which may result in prolonged suffering and fatalities [2]. A better understanding of the modulators of drug response will improve, and hopefully replace, the current trial-and-error approach to drug therapy with more precise methods based on scientific knowledge [3].
To date, more than 1100 genetic loci have been correlated with drug response phenotypes (The Pharmacogenomics Knowledgebase (PharmGKB): www.pharmgkb.org), but only a small fraction of these genomic findings have been implemented into clinical practice. In 2009, PharmGKB partnered with the Pharmacogenomics Research Network (PGRN) to establish the Clinical Pharmacogenetics Implementation Consortium (CPIC) [4–6]. The goal of CPIC is to provide specific guidelines that instruct clinicians on how to use or interpret a patient's genetic test results to determine the optimal drug and dosage for each patient. As of June 2017, CPIC guidelines had been published for 36 drug–gene pairs, although there are 127 well-established pharmacogenomic genes identified as CPIC genes and 64 genes labeled as “very important pharmacogenes” (VIP) by the PharmGKB curators, for a total of 160 unique genes.
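The reported gene counts are consistent with inclusion–exclusion over the two lists. The overlap of 31 genes carrying both designations is inferred here from the stated totals; it is not reported directly in the text:

$$
|\mathrm{CPIC} \cup \mathrm{VIP}| = |\mathrm{CPIC}| + |\mathrm{VIP}| - |\mathrm{CPIC} \cap \mathrm{VIP}|
\quad\Longrightarrow\quad
160 = 127 + 64 - 31
$$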
An example of a CPIC guideline is one that instructs physicians on how to interpret genomic information from clinical assays to determine a therapeutic dosage for warfarin, a commonly used drug for the prevention of thrombosis [7]. Warfarin is known to have a narrow therapeutic index and wide interpatient variability in response. For example, a conventional dose of warfarin may be an ineffective anticoagulant in some patients and induce adverse events (e.g., excessive bleeding) in others [8]. Thus, it is often difficult to achieve and maintain a targeted effect by administering conventional doses. Recent advances in pharmacogenomics have facilitated genetic tests of two genes, CYP2C9 and VKORC1, that can be used to predict a patient's sensitivity to the drug prior to administration. Specifically, the therapeutic dosage of warfarin may be calculated based on a patient's genotypes at these loci, which has resulted in a significant improvement in drug safety [8, 9].
These authors contributed equally: Kelan G. Tantisira, Qing Ling Duan
*Qing Ling Duan
qingling.duan@queensu.ca
1 Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
2 School of Computing, Queen's University, Kingston, ON, Canada
3 Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
4 Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
Electronic supplementary material The online version of this article (https://doi.org/10.1038/s41397-018-0048-y) contains supplementary material, which is available to authorized users.
Despite the successful translation of a small fraction of pharmacogenomics findings into clinical practice, the overall rate of clinical implementation has been slow [6]. One explanation is that the majority of pharmacogenomics loci are correlated with drug response but do not represent the actual causal variants themselves [10–12]. We hypothesize that the majority of known pharmacogenomics loci are genetic markers that tag causal variants, which have yet to be identified and are likely to be in linkage disequilibrium (LD) with the associated markers. Using associated variants instead of the causal variants in clinical tests is limiting because they may not reliably predict drug response [13]; for example, the LD between a marker and its causal variant can differ across populations.
The primary objective of this study is to identify potentially causal variants in well-established pharmacogenomics-associated genes that may account for the reported association signals. Specifically, we used whole-genome sequencing data from the 1000 Genomes Project [14, 15] to derive all genetic variations identified within the 160 unique CPIC and VIP pharmacogenomics genes. Next, we tested these variations for LD with known pharmacogenomic variants and determined the predicted function of the LD variants using annotation databases and clinical outcome databases. Our results include a catalog of potentially functional variants that are in LD with well-established pharmacogenomics variants and could represent the causative mutations within these loci.
Results
Selection of pharmacogenomics loci and annotation of variants
We selected 127 CPIC genes and 64 VIP genes (160 unique loci in total) from PharmGKB, which we deemed “well-established” pharmacogenomics loci (Supplemental data 1). Next, we identified 887,980 variants within these loci using next-generation sequencing data from Phase I of the 1000 Genomes Project, of which 69,319 had minor allele frequencies >1% (Supplemental data 2). Annotation analysis using SnpEff [16] (a genetic variant annotation and effect prediction toolbox) revealed that 65,333 (94%) of these variants were single-nucleotide polymorphisms (SNPs), 1404 (2%) were insertions, and 2582 (4%) were deletions. As shown in Fig. 1, the majority of these variants occur within intronic regions (~75%), with the remainder located 3′ or downstream (~11%), 5′ or upstream (~9%), or in exons (~2%). Of the coding variants, approximately half are missense (~49%) and half are synonymous (~50%), with a small fraction of nonsense mutations (~1%). We compared our findings with the annotation results for the whole-genome sequencing data of the 1000 Genomes Project Phase I dataset (http://snpeff.sourceforge.net/1kg.html) and confirmed that the variant annotations within the 160 PGx genes fall within the expected range (Supplemental Fig. 1).
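The SNP/insertion/deletion breakdown above can be illustrated from allele strings alone. The sketch below is a simplification, not the authors' code or SnpEff's internal logic: it assumes biallelic, VCF-style REF/ALT pairs (indels share a leading padding base) and classifies each record by comparing allele lengths. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic VCF-style record by comparing allele lengths (simplified)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"  # e.g. REF=A, ALT=AT (padding base + inserted T)
    if len(alt) < len(ref):
        return "deletion"   # e.g. REF=GTC, ALT=G
    return "other"          # equal-length multi-base substitutions


def class_counts(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count variant classes over (REF, ALT) pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)


def class_percentages(counts: dict) -> dict:
    """Whole-number percentage of each class, as reported in the text."""
    total = sum(counts.values())
    return {cls: round(100 * n / total) for cls, n in counts.items()}


# Toy records (hypothetical, not study data)
toy = [("A", "G"), ("C", "T"), ("A", "AT"), ("GTC", "G")]
print(class_counts(toy))

# Totals reported in the Results section
reported = {"SNP": 65333, "insertion": 1404, "deletion": 2582}
print(sum(reported.values()), class_percentages(reported))
```

Applied to the reported counts, the three classes sum to 69,319 and round to 94%, 2%, and 4%, matching the text.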
LD analysis
We assessed the LD between associated variants within known pharmacogenomics loci and the variants identified in our study. LD was analyzed separately in each of the four populations (American, European, East Asian, African) from Phase I of the 1000 Genomes Project. This resulted in 8207 novel variants forming 21,256 instances of LD (r² > 0.8) with 859 known pharmacogenomics variants (Supplemental Data 3).
Fig. 1 Genomic regions of all variants identified from the 1000 Genomes Project database within 160 known pharmacogenomics genes. Locations of all the single-nucleotide variants identified within the 160 pharmacogenomics loci using sequence data from the 1000 Genomes Project. [Pie chart: intron 74.55%; downstream 11.08%; upstream 9.33%; exon 1.83%; intergenic 1.60%; 3′ UTR 0.80%; transcript 0.46%; 5′ UTR 0.21%; splice site 0.11%; motif 0.03%]
High-impact variations
We identified eight variants in the 1000 Genomes Project database, predicted by SnpEff to have high impact, that were in LD (r² > 0.8) with 22 known pharmacogenomics variants. These include potentially functional variants annotated as splice donor, structural interaction, frameshift, stop gained, or stop lost variants. Table 1 lists these new LD variants along with the corresponding pharmacogenomics variants, most of which are predicted to be non-coding (intronic, upstream or downstream, or UTR) or synonymous, with only a few instances of missense and frameshift variants.
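Because one new variant can tag several PharmGKB variants (in Table 1, rs4330 tags six ACE variants and rs11322783 tags ten variants in the IFNL3/IFNL4 region), the number of variants and the number of LD relationships differ. The sketch below tallies the (new, known) pairs transcribed from Table 1. Treating one distinct pair as one "instance of LD" is an assumption made here; the study may count a pair once per population.

```python
# (new variant, PharmGKB variant) pairs transcribed from Table 1
ACE_TAGGED = ["rs4341", "rs4343", "rs4344", "rs4331", "rs4359", "rs4363"]
IFNL_TAGGED = ["rs12980275", "rs8105790", "rs4803217", "rs11881222", "rs28416813",
               "rs12979860", "rs8109886", "rs8113007", "rs8099917", "rs7248668"]

TABLE1_PAIRS = (
    [("rs13146", "rs1801019"), ("rs28364311", "rs6811453"),
     ("rs677830", "rs558025"), ("rs6977165", "rs41303343")]
    + [("rs4330", known) for known in ACE_TAGGED]
    + [("rs11322783", known) for known in IFNL_TAGGED]
    + [("rs881712", "rs8133052"), ("rs3761423", "rs5996696")]
)


def summarize_ld_pairs(pairs):
    """Count distinct LD pairs, distinct new variants, and distinct known variants."""
    unique_pairs = set(pairs)
    return {
        "ld_pairs": len(unique_pairs),
        "new_variants": len({new for new, _ in unique_pairs}),
        "known_variants": len({known for _, known in unique_pairs}),
    }


print(summarize_ld_pairs(TABLE1_PAIRS))
```

The tally recovers the eight high-impact variants and the 22 known pharmacogenomics variants stated above.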
Moderate-impact variations
We identified 19 missense variants, predicted by SnpEff to have moderate impact, that are in LD with 32 pharmacogenomics variants whose predicted effects are moderate, low, or modifier (Table 2). Among the newly identified variants, two also overlap transcription factor binding sites (TFBS) and could potentially affect protein binding, and one (rs2853533) has been associated with neural tube defects and spina bifida cystica.
Low-impact variations
Of the 8207 variants in LD, 7751 are classified by SnpEff as variants with unpredictable impact, or “modifier” variants. These are in LD with 920 known pharmacogenomics variants with similar impact features. Of these, 324 modifier variants were potential regulatory variants affecting gene expression, protein binding, or transcription factor binding.
In this study, we focus on modifier variants classified under category 1 of the RegulomeDB database, which are known eQTLs or variants correlated with variable gene expression. Among the 324 modifier variants with RegulomeDB scores, 84 were classified as category 1, forming 213 instances of LD with 73 pharmacogenomics variants that are predicted to have low or modifier effects (Supplemental data 4).
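The category-1 filter described above can be sketched as a simple predicate. The record layout (rsID, SnpEff impact class, RegulomeDB score string such as "1f" or "2b") and the rsIDs in the example are hypothetical; the sketch only assumes that RegulomeDB category-1 scores begin with the digit 1 (1a–1f).

```python
from typing import Iterable, List, Optional, Tuple

# (rsID, SnpEff impact class, RegulomeDB score or None if unscored)
Record = Tuple[str, str, Optional[str]]


def is_regulomedb_category1(score: Optional[str]) -> bool:
    """True for category-1 scores (1a-1f); False for other categories or missing scores."""
    return score is not None and score.strip().lower().startswith("1")


def category1_modifiers(records: Iterable[Record]) -> List[str]:
    """Return rsIDs of SnpEff MODIFIER variants whose RegulomeDB score is in category 1."""
    return [rsid for rsid, impact, score in records
            if impact == "MODIFIER" and is_regulomedb_category1(score)]


# Hypothetical records for illustration only
example = [
    ("rsA", "MODIFIER", "1f"),
    ("rsB", "MODIFIER", "2b"),
    ("rsC", "LOW", "1a"),       # excluded: not a modifier variant
    ("rsD", "MODIFIER", None),  # excluded: no RegulomeDB score
    ("rsE", "MODIFIER", "1b"),
]
print(category1_modifiers(example))
```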
Variants associated with clinical outcomes
Using the SNPedia database, we found 46 variants in LD that have been correlated with clinical phenotypes, as documented in Supplemental data 5.
Discussion
This manuscript reports the identification of potentially functional genetic variants within genes previously correlated with drug response outcomes. We show that some of the novel variants identified from next-generation sequencing (NGS) of whole genomes (Phase I of the 1000 Genomes Project) are in LD with well-known pharmacogenomics variants and could account for the functional basis underlying the association signals. Several of these LD variants encode non-synonymous amino acid substitutions or frameshifts, alter splice sites in ways that may cause alternative splicing of the transcript, or are located in non-coding regions but are correlated with gene expression levels (expression quantitative trait loci, eQTL) or other clinical phenotypes.
In this study, we used LD analysis to determine the correlation between novel genetic variants identified from the 1000 Genomes Project database and known pharmacogenomics variants. We reasoned that any variant(s) in strong LD (r² > 0.8) with the known pharmacogenomics loci could account for the association signal and could be the actual causal variants at these genomic loci. To prioritize the identified variants, we used a popular annotation toolbox (SnpEff) to predict the function of each variant. In addition, we used resources such as RegulomeDB and SNPedia to distinguish variant(s) of higher impact from those with low impact.
Many of the variants we identified are “novel” in that they have not been reported in earlier pharmacogenomics studies. For example, we identified a splice donor variant (rs28364311) in the VIP gene ADH1A. This variant is in LD with a pharmacogenomics-associated variant, rs6811453, which is associated with increased resistance to cytarabine, fludarabine, gemtuzumab ozogamicin, and idarubicin in patients with acute myeloid leukemia [17]. The associated pharmacogenomics variant is non-coding, located downstream (3′) of the gene, and has no known biological function. Given the potential impact of rs28364311 on splicing and its strong LD with the associated pharmacogenomics variant, it is plausible that this splice variant is the functional variant that accounts for the original association signal at this locus.
Moreover, we found that a stop-gain variant, rs4330, in the VIP gene ACE, which encodes the angiotensin-converting enzyme, is in LD with six known pharmacogenomics variants (rs4341, rs4344, rs4331, rs4359, rs4363, and rs4343). Whereas the latter are non-coding (intronic, upstream, or 3′ UTR) or synonymous variants, which are less likely to have detrimental effects on the gene product, rs4330 is predicted to introduce a premature stop codon, yielding a truncated protein that is likely to be deleterious.
Table 1 Variants with high-impact predictions, which are in LD with known pharmacogenomics variants
| Chr | New variant | Gene | Functional annotation | PharmGKB variant | Gene | Functional annotation | EUR r² | EAS r² | AMR r² | AFR r² |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | rs13146 | UMPS | Structural interaction variant | rs1801019 | UMPS | Missense variant | 0.98 | 1.00 | 1.00 | 0.98 |
| 4 | rs28364311 | ADH1A | Splice donor variant & intron variant | rs6811453 | ADH1A | Downstream gene variant | 0.99 | 1.00 | 1.00 | 1.00 |
| 6 | rs677830 | OPRM1 | Stop gained | rs558025 | OPRM1 | Downstream gene variant | 1.00 | 1.00 | 0.98 | <0.8 |
| 7 | rs6977165 | CYP3A5 | Stop lost | rs41303343 | CYP3A5 | Frameshift variant | <0.8 | <0.8 | <0.8 | 0.80 |
| 17 | rs4330 | ACE | Stop gained | rs4341 | ACE | 3 prime UTR variant | 0.99 | 0.95 | 1.00 | 0.95 |
| | | | | rs4343 | ACE | Synonymous variant | 0.95 | 0.95 | 0.87 | <0.8 |
| | | | | rs4344 | ACE | Upstream gene variant | 0.99 | 0.96 | 1.00 | 0.90 |
| | | | | rs4331 | ACE | Synonymous variant | 0.86 | <0.8 | 0.88 | 0.84 |
| | | | | rs4359 | ACE | Intron variant | 0.96 | <0.8 | 0.91 | <0.8 |
| | | | | rs4363 | ACE | Splice region variant & intron variant | 0.93 | <0.8 | 0.86 | <0.8 |
| 19 | rs11322783 | IFNL4 | Frameshift variant | rs12980275 | IFNL3P1 | Upstream gene variant | <0.8 | 0.87 | 0.87 | <0.8 |
| | | | | rs8105790 | IFNL3P1 | Upstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| | | | | rs4803217 | IFNL3 | Downstream gene variant | 0.83 | 0.97 | 0.87 | <0.8 |
| | | | | rs11881222 | IFNL4 | Downstream gene variant | 0.87 | 0.94 | 0.84 | <0.8 |
| | | | | rs28416813 | IFNL3 | 5 prime UTR variant | 0.88 | 0.86 | 0.94 | <0.8 |
| | | | | rs12979860 | IFNL3 | Upstream gene variant | 0.94 | 0.87 | 0.93 | <0.8 |
| | | | | rs8109886 | IFNL4 | Upstream gene variant | <0.8 | 0.89 | <0.8 | <0.8 |
| | | | | rs8113007 | IFNL4 | Upstream gene variant | 0.88 | 0.97 | 0.84 | <0.8 |
| | | | | rs8099917 | IFNL4 | Upstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| | | | | rs7248668 | IFNL4 | Upstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| 21 | rs881712 | CBR3 | Structural interaction variant | rs8133052 | CBR3 | Missense variant | 0.94 | 1.00 | 0.83 | <0.8 |
| 22 | rs3761423 | ADORA2A-AS1 | Splice donor variant & intron variant | rs5996696 | ADORA2A | Upstream gene variant | <0.8 | 0.90 | <0.8 | <0.8 |
Chr chromosome; EUR r², EAS r², AMR r², AFR r²: linkage disequilibrium measured as r² in the European, East Asian, American, and African populations of the 1000 Genomes Project, respectively. Annotation definitions: structural interaction variant = a variant within a protein–interaction locus, which is likely to support the protein structure. These loci are calculated from single-protein PDB entries by selecting amino acids that (a) have atoms within 3 Å of each other and (b) are far apart in the amino acid sequence (more than 20 residues). The assumption is that, because they are very close in space, they must be “interacting” and thus important for protein structure. For more information, see http://snpeff.sourceforge.net/SnpEff_manual.html.
Table 2 Variants predicted with moderate impact identified in this study, which are in LD with known pharmacogenomics variants
| Chr | New variant | Gene | Functional annotation | PharmGKB variant | Gene | Annotation | EUR r² | EAS r² | AMR r² | AFR r² |
|---|---|---|---|---|---|---|---|---|---|---|
| 18 | rs2853533* | C18orf56 | Missense variant & TFBS variant | rs2853741 | RP11-806L2.5 | Upstream gene variant | <0.8 | 0.85 | <0.8 | <0.8 |
| 1 | rs55867221 | C1orf167 | Missense variant & TFBS variant | rs17367504 | CLCN6 | Upstream gene variant | <0.8 | 0.9 | <0.8 | <0.8 |
| | | | | rs3737967 | C1orf167 | Missense variant | <0.8 | 0.98 | 0.87 | <0.8 |
| | | | | rs2274976 | MTHFR | Missense variant | <0.8 | 0.96 | 0.87 | <0.8 |
| 1 | rs1537514 | C1orf167 | Missense variant | rs3737967 | C1orf167 | Missense variant | <0.8 | 0.98 | 0.87 | <0.8 |
| | | | | rs2274976 | MTHFR | Missense variant | <0.8 | 0.96 | 0.87 | <0.8 |
| | | | | rs17367504 | CLCN6 | Upstream gene variant | <0.8 | 0.9 | <0.8 | <0.8 |
| 1 | rs1800595 | F5 | Missense variant | rs6018 | F5 | Missense variant | 1 | 1 | 1 | 1 |
| 1 | rs6027 | F5 | Missense variant | rs6018 | F5 | Missense variant | 0.94 | 0.89 | 0.97 | <0.8 |
| 1 | rs6033 | F5 | Missense variant | rs6018 | F5 | Missense variant | <0.8 | 0.83 | <0.8 | <0.8 |
| 3 | rs3732765 | MED12L | Missense variant | rs9859538 | MED12L | Intron variant | <0.8 | 0.97 | <0.8 | <0.8 |
| | | | | rs10935842 | P2RY12 | Upstream gene variant | 1 | 0.99 | 0.97 | <0.8 |
| | | | | rs6798637 | P2RY12 | Upstream gene variant | 0.89 | <0.8 | <0.8 | <0.8 |
| 4 | rs1693482 | ADH1C | Missense variant | rs1662060 | ADH1C | Downstream gene variant | 1 | 1 | 0.96 | 1 |
| | | | | rs698 | ADH1C | Missense variant | 1 | 1 | 0.96 | 1 |
| 4 | rs4963 | ADD1 | Missense variant | rs4961 | ADD1 | Missense variant | 0.88 | 0.99 | 0.96 | <0.8 |
| 7 | rs2307040 | CALU | Missense variant | rs1043550 | CALU | 3 prime UTR variant | 0.82 | <0.8 | 0.96 | 0.89 |
| | | | | rs11653 | CALU | 3 prime UTR variant | 0.82 | <0.8 | 0.96 | 0.89 |
| 9 | rs56350726 | SLC28A3 | Missense variant | rs10868138 | SLC28A3 | Missense variant | 0.81 | <0.8 | 0.83 | <0.8 |
| 11 | rs11604671 | ANKK1 | Missense variant | rs2734849 | ANKK1 | Missense variant | 0.97 | 1 | 0.98 | <0.8 |
| | | | | rs6277 | DRD2 | Synonymous variant | <0.8 | 1 | 0.88 | <0.8 |
| | | | | rs2587548 | DRD2 | Upstream gene variant | <0.8 | 1 | <0.8 | <0.8 |
| | | | | rs2734833 | DRD2 | Upstream gene variant | <0.8 | 1 | <0.8 | <0.8 |
| | | | | rs1076563 | DRD2 | Upstream gene variant | <0.8 | 0.97 | <0.8 | <0.8 |
| 16 | rs115629050 | CES1 | Missense variant | rs2307240 | CES1 | Missense variant | <0.8 | <0.8 | 0.9 | <0.8 |
| 16 | rs2307227 | CES1 | Missense variant | rs2307240 | CES1 | Missense variant | <0.8 | <0.8 | 0.9 | <0.8 |
| 16 | rs79711700 | CES1 | Missense variant | rs2307240 | CES1 | Missense variant | 0.88 | <0.8 | 1 | <0.8 |
| 19 | rs2336219 | CD3EAP | Missense variant | rs967591 | CD3EAP | 5 prime UTR variant | 0.83 | 1 | 0.96 | <0.8 |
| | | | | rs735482 | CD3EAP | Missense variant | 1 | 1 | 0.96 | 0.93 |
| 19 | rs12971396 | IFNL4 | Missense variant | rs12980275 | IFNL3P1 | Upstream gene variant | <0.8 | 0.84 | <0.8 | <0.8 |
| | | | | rs8105790 | IFNL3P1 | Upstream gene variant | 0.92 | 0.97 | 0.97 | <0.8 |
| | | | | rs4803217 | IFNL3 | Downstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| | | | | rs11881222 | IFNL4 | Downstream gene variant | <0.8 | 0.91 | <0.8 | <0.8 |
| | | | | rs28416813 | IFNL3 | 5 prime UTR variant | <0.8 | 0.83 | <0.8 | <0.8 |
| | | | | rs12979860 | IFNL3 | Upstream gene variant | <0.8 | 0.84 | <0.8 | <0.8 |
| | | | | rs8109886 | IFNL4 | Upstream gene variant | <0.8 | 0.86 | <0.8 | <0.8 |
| | | | | rs8113007 | IFNL4 | Upstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| | | | | rs8099917 | IFNL4 | Upstream gene variant | 0.93 | 0.97 | 0.86 | <0.8 |
| | | | | rs7248668 | IFNL4 | Upstream gene variant | 0.93 | 0.97 | 0.86 | <0.8 |
| 19 | rs4803221 | IFNL4 | Missense variant | rs12980275 | IFNL3P1 | Upstream gene variant | <0.8 | 0.84 | <0.8 | <0.8 |
| | | | | rs8105790 | IFNL3P1 | Upstream gene variant | 0.93 | 0.97 | 0.95 | 0.81 |
| | | | | rs4803217 | IFNL3 | Downstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| | | | | rs11881222 | IFNL4 | Downstream gene variant | <0.8 | 0.91 | <0.8 | <0.8 |
| | | | | rs28416813 | IFNL3 | 5 prime UTR variant | <0.8 | 0.83 | <0.8 | <0.8 |
| | | | | rs12979860 | IFNL3 | Upstream gene variant | <0.8 | 0.84 | <0.8 | <0.8 |
| | | | | rs8109886 | IFNL4 | Upstream gene variant | <0.8 | 0.86 | <0.8 | <0.8 |
| | | | | rs8113007 | IFNL4 | Upstream gene variant | <0.8 | 0.94 | <0.8 | <0.8 |
| | | | | rs8099917 | IFNL4 | Upstream gene variant | 0.95 | 0.97 | 0.89 | <0.8 |
| | | | | rs7248668 | IFNL4 | Upstream gene variant | 0.95 | 0.97 | 0.89 | <0.8 |
| 19 | rs762562 | CD3EAP | Missense variant | rs967591 | CD3EAP | 5 prime UTR variant | 0.83 | 1 | 0.92 | <0.8 |
| | | | | rs735482 | CD3EAP | Missense variant | 1 | 1 | 1 | 1 |
* Phenotype association of rs2853533 (SNPedia): neural tube defects and spina bifida cystica. The G variant of rs2853533 was associated with spina bifida in a transmission disequilibrium test (study size: 610 families (329 trios, 281 duos); study population/ethnicity: patients affected with spina bifida and their parents; Houston, TX; Los Angeles, CA; Toronto, ON, Canada; significance metric: p = 0.0213). Chr chromosome; EUR r², EAS r², AMR r², AFR r²: linkage disequilibrium measured as r² in the European, East Asian, American, and African populations of the 1000 Genomes Project, respectively.
Another example is a modifier variant (rs2854509), which we report to be in LD with a pharmacogenomics variant (rs3213239) that is associated with decreased overall and progression-free survival in patients with non-small-cell lung carcinoma treated with platinum compounds. Our identified variant rs2854509 is located downstream, and the pharmacogenomics variant rs3213239 upstream, of XRCC1, the gene encoding X-ray repair cross-complementing protein 1. Our analysis revealed that rs2854509 is a cis-eQTL acting on the CPIC gene XRCC1, which is associated with variable efficacy of platinum-based chemotherapy agents. Additional findings from RegulomeDB provided direct evidence of binding-site alteration, including ChIP-seq and DNase data, a position weight matrix matched to the ChIP-seq factor, and a DNase footprint. These findings suggest that rs2854509 may have regulatory effects on XRCC1 that could modulate response to platinum-based chemotherapy.
Our proof-of-principle study demonstrates that many of the well-known pharmacogenomics loci from PharmGKB are genetic markers that may tag causal variants. Often the latter remain elusive and are likely to be in LD with the associated markers. Using NGS data, we identified a number of sequence variants in LD with these pharmacogenomics loci, with supporting functional evidence from current annotation software. Pending experimental validation, these findings could ultimately facilitate the development of improved clinical assays to predict response to a particular drug or dosage prior to administration. The implementation of such clinical tests promises to improve the efficacy of drug therapy while reducing the incidence of adverse events [18].
One limitation of our approach is the exclusion of rare variants (minor allele frequency < 0.01). Although rare variants are more likely to be functional and clinically relevant, we excluded them because of the limited sample size of 1KGP Phase 1 (approx. 200–400 individuals in each of the four main populations: American, European, East Asian, African). Specifically, we would not be able to determine LD among rare variants (MAF < 0.01) in such small populations. Another limitation is that this study was based on bioinformatics methods: we did not experimentally validate the potentially functional variants identified, nor confirm their correlation with drug response outcomes. Instead, our study was a proof of concept that associated variants in well-established pharmacogenomics genes could represent markers of drug response rather than the causal variants. Further studies are needed to identify and ultimately validate the often-elusive functional variants in these loci. These include genotyping the potentially functional variants identified in LD with the associated variants and testing them directly for correlation with drug response outcomes in clinical trials. Other experiments are needed to confirm the biological impact of these variants on the resulting RNA transcripts or proteins; the appropriate experiment depends on the predicted impact of each variant. For example, high-impact variants (Table 1) include splicing effects, premature stop codons, and structural interactions, which could be validated through direct sequencing of transcripts and mass spectrometry to detect truncated and misfolded proteins.
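The sample-size limitation can be made concrete with two back-of-the-envelope quantities: the expected number of minor-allele copies among the 2N haplotypes of N diploid individuals, and the probability that a variant is not sampled at all. This sketch assumes independent sampling of haplotypes (a simplification) and uses the approximate per-population sizes of 200 and 400 quoted above; the function names are illustrative.

```python
def expected_minor_allele_copies(n_individuals: int, maf: float) -> float:
    """Expected minor-allele copies among the 2N haplotypes of N diploid individuals."""
    return 2 * n_individuals * maf


def prob_allele_unobserved(n_individuals: int, maf: float) -> float:
    """Probability that no sampled haplotype carries the minor allele (independent sampling)."""
    return (1.0 - maf) ** (2 * n_individuals)


for n in (200, 400):  # approximate per-population sample sizes quoted in the text
    print(n, expected_minor_allele_copies(n, 0.01), round(prob_allele_unobserved(n, 0.01), 4))
```

At MAF = 0.01, 200 individuals carry about 4 minor-allele copies on average and 400 carry about 8, so any r² estimate involving such a variant would rest on only a handful of haplotypes.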
Our study identified novel genetic variations located in well-established pharmacogenomics genes that could account for the association signals at these loci and have strong impact on the resulting gene products. We applied an innovative approach that combined bioinformatics resources such as PharmGKB, sequencing data from the 1000 GP, popular annotation software such as SnpEff, and databases such as RegulomeDB to identify novel variants and predict their functional effects within pharmacogenomics loci. Moreover, we determined that a number of these potentially functional variants are in LD with known pharmacogenomics variants and could account, at least in part, for the original association signals. Identifying these elusive causal variants could facilitate more accurate genetic tests to predict treatment response prior to drug administration. Accuracy would improve because the causal variant is tested directly instead of relying on LD, which varies among populations (as noted in our study of LD across four populations in the 1000 GP). Thus, identification of causal variants will improve the translation of pharmacogenomics findings into clinical practice and ultimately replace the current trial-and-error approach to drug therapy, moving us closer to precision medicine.
Methods
Pharmacogenomic genes
We selected 160 unique pharmacogenomics-associated loci, comprising 127 CPIC genes (5 June 2017 release) and 64 VIP genes (1 May 2017 release) from the PharmGKB database. We then identified the genomic coordinates of each gene in the GRCh37/hg19 assembly of the human reference genome using the University of California, Santa Cruz (UCSC) Genome Browser [19]. Next, the genomic coordinates were padded with 5000 bp both 5′ and 3′ of each gene to include potential regulatory regions. All variants with a minor allele frequency of at least 1% in the 1000 Genomes Project Phase I population (February 2009 release) were extracted.
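The region definition and frequency filter above can be sketched as a single predicate. Assumptions, all labeled as such: coordinates are treated as 1-based and inclusive (UCSC tables store 0-based, half-open starts, so a real pipeline would need to convert), the alternate-allele frequency comes from the 1000 Genomes record, and the gene name and coordinates in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

PAD_BP = 5_000  # flank added on both sides of each gene (Methods)
MIN_MAF = 0.01  # minimum minor allele frequency (1%)


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # assumed 1-based, inclusive gene start
    end: int    # assumed 1-based, inclusive gene end


def padded_region(gene: Gene, pad: int = PAD_BP) -> Tuple[str, int, int]:
    """Gene interval extended by `pad` bp on both sides, clamped at position 1."""
    return gene.chrom, max(1, gene.start - pad), gene.end + pad


def minor_allele_frequency(alt_af: float) -> float:
    """Fold an alternate-allele frequency into a minor allele frequency."""
    return min(alt_af, 1.0 - alt_af)


def keep_variant(chrom: str, pos: int, alt_af: float, gene: Gene) -> bool:
    """Keep a variant if it lies in the padded gene region and is not rare."""
    g_chrom, lo, hi = padded_region(gene)
    return chrom == g_chrom and lo <= pos <= hi and minor_allele_frequency(alt_af) >= MIN_MAF


# Hypothetical gene coordinates, for illustration only
gene_x = Gene("GENE_X", "7", 100_000, 120_000)
print(keep_variant("7", 96_000, 0.05, gene_x))    # inside the lower-coordinate flank
print(keep_variant("7", 110_000, 0.995, gene_x))  # MAF 0.005, filtered out as rare
```

Folding the alternate-allele frequency matters because a variant whose alternate allele is nearly fixed (frequency 0.995) is just as rare as one at 0.005.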
Functional annotations
After reviewing many annotation tools (including ANNOVAR, VEP, PolyPhen/SIFT, and CADD), we decided that SnpEff best meets our needs: it is compatible with various input formats, offers high flexibility in search settings, can annotate a full exome in seconds, is based on up-to-date transcript and protein databases, and can be integrated with other tools. SnpEff (version 4.2, build 2015-12-05) was used with the GRCh37.75 assembly to predict the effects of identified variants. For variants with multiple annotations (e.g., a variant that affects multiple genes or has different effects depending on the transcript), only the most severe consequence was selected and used to represent each variant in the tables, to ease comparison of impacts among variants. To standardize the terminology used for sequence changes, SnpEff uses Sequence Ontology (http://www.sequenceontology.org/) definitions to describe functional annotations.
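The "most severe consequence" rule can be sketched against SnpEff's ANN INFO field, where annotations are comma-separated, fields within an annotation are pipe-separated (starting with Allele, Annotation, Annotation_Impact, Gene_Name), and multiple effects on one transcript are joined with "&" (as in "Splice donor variant & intron variant" in Table 1). Ranking by SnpEff's four impact classes is an assumption about how severity was ordered; the authors may also have ranked effect types within a class. The ANN string in the example is hypothetical.

```python
from typing import Dict, Optional

# SnpEff putative impact classes, from most to least severe
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def most_severe_annotation(ann_value: str) -> Optional[Dict[str, str]]:
    """Return the most severe entry of a SnpEff ANN INFO value, or None if none parse.

    Entries are comma-separated; fields within an entry are pipe-separated,
    starting with Allele | Annotation | Annotation_Impact | Gene_Name.
    Ties keep the first entry encountered.
    """
    best = None
    for entry in ann_value.split(","):
        fields = entry.split("|")
        if len(fields) < 4:
            continue  # skip malformed or empty entries
        record = {"allele": fields[0], "effect": fields[1],
                  "impact": fields[2], "gene": fields[3]}
        if best is None or IMPACT_RANK.get(record["impact"], -1) > IMPACT_RANK.get(best["impact"], -1):
            best = record
    return best


# Hypothetical ANN value with three transcript-level annotations
ann = ("A|intron_variant|MODIFIER|GENE1|ID1|transcript|T1|protein_coding||||||||,"
       "A|stop_gained|HIGH|GENE1|ID1|transcript|T2|protein_coding||||||||,"
       "A|missense_variant|MODERATE|GENE1|ID1|transcript|T3|protein_coding||||||||")
print(most_severe_annotation(ann)["effect"])
```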
LD analysis
LD between the well-established pharmacogenomics variants (1151 variants annotated by PharmGKB, retrieved on 16 June 2017, that are found within the 160 PGx loci and the 1000 Genomes Project Phase 1 dataset) and the identified variants from the 1000 Genomes Project Phase 1 dataset was calculated using PLINK (version 1.09) [20]. The distance window for the LD analysis was set to 1 Mb, and an r² threshold of > 0.8 was applied.
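The study used PLINK; the sketch below is not PLINK and may not reproduce its estimator exactly (PLINK can estimate r² from unphased genotypes). Assuming phased haplotypes coded 0 (reference) and 1 (alternate), as in the phased 1000 Genomes data, it computes r² = D² / [p_A(1 − p_A) p_B(1 − p_B)] and keeps pairs within the 1 Mb window that exceed the 0.8 threshold. It assumes all variants are on one chromosome and from one population; running it once per population would give per-population values like the EUR/EAS/AMR/AFR columns in Tables 1 and 2. Variant IDs in the example are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

WINDOW_BP = 1_000_000  # maximum distance between variant pairs (1 Mb)
R2_THRESHOLD = 0.8     # strong-LD cutoff used in the study (r^2 > 0.8)

Variant = Tuple[int, Sequence[int]]  # (position, phased alleles per haplotype: 0 = ref, 1 = alt)


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci computed from phased haplotypes."""
    if not hap_a or len(hap_a) != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                 # alt-allele frequency at locus A
    p_b = sum(hap_b) / n                                 # alt-allele frequency at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the alt/alt haplotype
    d = p_ab - p_a * p_b                                 # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                              # undefined for a monomorphic locus
    return d * d / denom


def strong_ld_pairs(known: Dict[str, Variant],
                    candidates: Dict[str, Variant]) -> List[Tuple[str, str, float]]:
    """(known ID, candidate ID, r^2) for pairs within the window that exceed the threshold."""
    hits = []
    for (kid, (kpos, khap)), (cid, (cpos, chap)) in product(known.items(), candidates.items()):
        if kid == cid or abs(kpos - cpos) > WINDOW_BP:
            continue
        r2 = ld_r2(khap, chap)
        if r2 > R2_THRESHOLD:  # comparisons with NaN are False, so undefined r^2 is dropped
            hits.append((kid, cid, round(r2, 3)))
    return hits


# Hypothetical variants on one chromosome, six haplotypes each
known = {"pgx1": (1_000, [0, 0, 1, 1, 0, 1])}
candidates = {
    "v1": (5_000, [0, 0, 1, 1, 0, 1]),      # perfectly correlated, within 1 Mb
    "v2": (2_000_000, [0, 0, 1, 1, 0, 1]),  # correlated but outside the window
    "v3": (3_000, [0, 1, 0, 1, 0, 1]),      # weakly correlated
}
print(strong_ld_pairs(known, candidates))
```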
SNPs associated with regulation and phenotypes
For each variant identified to be in LD with an established pharmacogenomic variant, we used RegulomeDB [21] to evaluate and score those with the potential to cause regulatory changes, such as eQTLs, regions of DNase hypersensitivity, and binding sites of transcription factors and other proteins. RegulomeDB draws on GEO [22], the ENCODE project [23], and published literature to assess this information. In addition, we used SNPedia [24], a database of over 90,000 SNPs and associated peer-reviewed scientific publications, to identify variants that have previously been associated with phenotypes (Fig. 2).
Code Availability
Code and data used in this manuscript can be accessed from a public repository: https://github.com/12jc59/DuanlabPharmacogenomicsProject.
Acknowledgements We would like to thank Dr. Alan R. Shuldiner and the Translational Pharmacogenomics Project for their support. This manuscript was supported by NIH grants U01 HL65899, U01 HL105198, and K99 HL116651. Computations were performed on resources and with support provided by the Centre for Advanced Computing (CAC) at Queen's University in Kingston, Ontario. The CAC is funded by the Canada Foundation for Innovation, the Government of Ontario, and Queen's University. QLD receives funding from the Canadian Institutes of Health Research and Queen's University.
Author contribution All authors contributed to the writing of the manuscript. JC performed the data analyses and drafted the manuscript. QLD supervised data analyses and assisted in the writing of the manuscript. QLD and KGT designed the research project.
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
References
1. Evans WE, Relling MV. Moving towards individualized medicine
with pharmacogenomics. Nature. 2004;429:4648.
2. Giacomini KM, Yee SW, Ratain MJ, Weinshilboum RM,
Kamatani N, Nakamura Y. Pharmacogenomics and patient care:
one size does not t all. Sci Transl Med. 2012;4:153ps18
3. Evans WE, Relling MV. Pharmacogenomics: translating func-
tional genomics into rational therapeutics. Science. 1999;
286:48791.
4. Hewett M, Oliver DE, Rubin DL, Easton KL, Stuart JM, Altman
RB, et al. PharmGKB: the pharmacogenetics knowledge base.
Nucleic Acids Res. 2002;30:1635.
5. Shuldiner AR, Relling MV, Peterson JF, Hicks K, Freimuth RR,
Sadee W, et al. The Pharmacogenomics Research Network
Translational Pharmacogenetics Program: overcoming challenges
of real-world implementation. Clin Pharmacol Ther. 2013;
94:20710.
6. Relling MV, Klein TE. CPIC: Clinical Pharmacogenetics Imple-
mentation Consortium of the Pharmacogenomics Research Net-
work. Clin Pharmacol Ther. 2011;89:4647.
[Fig. 2 diagram: Pharmacogenomics knowledge databases → selected 160 CPIC & VIP genes → identified 69,319 sequence variations → variant annotations (predicted functional/regulatory effects and associations with phenotypes) → 8 high-impact variants (e.g., stop gain/loss), 19 moderate-impact variants (e.g., missense variants), and 84 potential regulatory variants (e.g., eQTLs)]
Fig. 2 Overview of the experimental design. Flow of work outlined in the Methods section of the manuscript, which highlights the selection of 160 genes from the Pharmacogenomics Knowledge Database (PharmGKB), identification of variants from the 1000 Genomes Project data, and subsequent steps for annotation and testing of LD among variants.
7. Johnson JA, Gong L, Whirl-Carrillo M, Gage BF, Scott SA, Stein CM, et al. Clinical Pharmacogenetics Implementation Consortium guidelines for CYP2C9 and VKORC1 genotypes and warfarin dosing. Clin Pharmacol Ther. 2011;90:625–9.
8. Jaffer A, Bragg L. Practical tips for warfarin dosing and monitoring. Cleve Clin J Med. 2003;70:361–71.
9. Takeuchi F, McGinnis R, Bourgeois S, Barnes C, Eriksson N, Soranzo N, et al. A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet. 2009;5:e1000433.
10. Soranzo N, Cavalleri GL, Weale ME, Wood NW, Depondt C, Marguerie R, et al. Identifying candidate causal variants responsible for altered activity of the ABCB1 multidrug resistance gene. Genome Res. 2004;14:1333–44.
11. Wechsler ME, Israel E. How pharmacogenomics will play a role in the management of asthma. Am J Respir Crit Care Med. 2005;172:12–8.
12. Zhang W, Dolan ME. Impact of the 1000 Genomes Project on the next wave of pharmacogenomic discovery. Pharmacogenomics. 2010;11:249–56.
13. Van den Broeck T, Joniau S, Clinckemalie L, Helsen C, Prekovic S, Spans L, et al. The role of single nucleotide polymorphisms in predicting prostate cancer risk and therapeutic decision making. Biomed Res Int. 2014;2014. https://doi.org/10.1155/2014/627510.
14. Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, et al. Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther. 2012;92:414–7.
15. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74.
16. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92.
17. Iacobucci I, Lonetti A, Candoni A, Sazzini M, Papayannidis C, Formica S, et al. Profiling of drug-metabolizing enzymes/transporters in CD33+ acute myeloid leukemia patients treated with Gemtuzumab-Ozogamicin and Fludarabine, Cytarabine and Idarubicin. Pharmacogenomics J. 2013;13:335–41.
18. Mancinelli L, Cronin M, Sadée W. Pharmacogenomics: the promise of personalized medicine. AAPS J. 2000;2:29–41.
19. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–6.
20. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
21. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790–7.
22. Edgar R. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10.
23. ENCODE Project Consortium, Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
24. Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic Acids Res. 2012;40:D1308–12.
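The filtering logic summarized in Fig. 2 (keep variants in strong LD, r² > 0.8, with a known pharmacogenomic marker, then prioritize those predicted to have high or moderate impact) can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' pipeline: the real analysis ran on 1000 Genomes Project data, and the reference list cites PLINK and SnpEff for these kinds of steps. Here, phased 0/1 haplotypes are assumed to be already in memory, haplotype-based r² is computed directly, and each variant's SnpEff impact class (HIGH, MODERATE, LOW, MODIFIER) is assumed to be precomputed. The `Variant` class, `r_squared`, `candidate_causal`, and all data values are hypothetical.

```python
# Hypothetical sketch of the Fig. 2 filtering idea; not the authors' pipeline.
from dataclasses import dataclass

# SnpEff's putative impact classes, ranked so they can be compared.
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


@dataclass
class Variant:
    """Hypothetical container: one biallelic site with phased haplotypes."""
    rsid: str
    haplotypes: list  # 0/1 allele per haplotype, same haplotype order for every site
    impact: str       # SnpEff impact class, assumed precomputed


def r_squared(a, b):
    """Haplotype-based LD r^2 between two biallelic sites."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom   # monomorphic site: report 0


def candidate_causal(known, candidates, threshold=0.8, min_impact="MODERATE"):
    """Return (rsid, impact, max r^2) for candidates whose predicted impact is
    at least `min_impact` and whose r^2 with any known marker exceeds `threshold`."""
    if not known:
        return []
    floor = IMPACT_RANK[min_impact]
    hits = []
    for c in candidates:
        if IMPACT_RANK[c.impact] < floor:
            continue  # LOW or MODIFIER effects are skipped
        best = max(r_squared(k.haplotypes, c.haplotypes) for k in known)
        if best > threshold:
            hits.append((c.rsid, c.impact, round(best, 3)))
    return hits


# Toy data (invented for illustration), eight haplotypes per site.
known = [Variant("known_marker", [0, 0, 1, 1, 0, 1, 0, 0], "MODIFIER")]
candidates = [
    Variant("cand_high", [0, 0, 1, 1, 0, 1, 0, 0], "HIGH"),      # perfect LD
    Variant("cand_low", [0, 0, 1, 1, 0, 1, 0, 0], "LOW"),        # impact too low
    Variant("cand_weak", [1, 0, 0, 1, 0, 0, 1, 0], "MODERATE"),  # r^2 = 1/225
]
print(candidate_causal(known, candidates))  # [('cand_high', 'HIGH', 1.0)]
```

Only `cand_high` passes both filters: `cand_low` is in perfect LD but its impact class is too low, and `cand_weak` has the right impact class but negligible LD. Working from unphased genotypes instead of phased haplotypes would call for a different r² estimator; the haplotype form is used here only because it is the textbook definition.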