
SymMap: an integrative database of traditional Chinese medicine enhanced by symptom mapping


Abstract

Recently, the pharmaceutical industry has heavily emphasized phenotypic drug discovery (PDD), which relies primarily on knowledge about phenotype changes associated with diseases. Traditional Chinese medicine (TCM) provides a massive amount of information on natural products and the clinical symptoms they are used to treat, which are the observable disease phenotypes that are crucial for clinical diagnosis and treatment. Curating knowledge of TCM symptoms and their relationships to herbs and diseases will provide both candidate leads and screening directions for evidence-based PDD programs. Therefore, we present SymMap, an integrative database of traditional Chinese medicine enhanced by symptom mapping. We manually curated 1717 TCM symptoms and related them to 499 herbs and 961 symptoms used in modern medicine based on a committee of 17 leading experts practicing TCM. Next, we collected 5235 diseases associated with these symptoms, 19 595 herbal constituents (ingredients) and 4302 target genes, and built a large heterogeneous network containing all of these components. Thus, SymMap integrates TCM with modern medicine in common aspects at both the phenotypic and molecular levels. Furthermore, we inferred all pairwise relationships among SymMap components using statistical tests to give pharmaceutical scientists the ability to rank and filter promising results to guide drug discovery. The SymMap database can be accessed at and
Nucleic Acids Research, 2018
doi: 10.1093/nar/gky1021
Yang Wu1,2, Feilong Zhang1, Kuo Yang3, Shuangsang Fang2, Dechao Bu2, Hui Li2, Liang Sun2, Hairuo Hu1, Wei Wang1, Xuezhong Zhou3,*, Yi Zhao1,2,* and Jianxin Chen1,*
1Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China, 2Key Laboratory of Intelligent
Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese
Academy of Sciences, Beijing 100190, China and 3School of Computer and Information Technology and Beijing Key
Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Received August 15, 2018; Revised September 25, 2018; Editorial Decision October 11, 2018; Accepted October 22, 2018
Two main approaches are used in modern drug discovery: target-based drug discovery (TDD) and phenotypic drug discovery (PDD) (1). TDD begins with a well-defined molecular target for a specific disease, and compound libraries are generated from which optimal compounds with activity against the target are identified. In contrast, PDD does not rely on knowledge of molecular targets, but is rather based on screening a large number of compounds and monitoring phenotypic changes. An influential analysis in 2011 reported that PDD has been more productive than TDD as a means of discovering first-in-class drugs (2).
Isolation and further derivatization of natural products from traditional medicines is a promising PDD strategy (3). The traditional use of natural products has been extensively documented in diverse cultures for millennia, and these descriptions provide valuable therapeutic drug leads for specific disease phenotypes. It has been shown that, of 122 traditional medicine-derived compounds used as drugs in countries hosting WHO-Traditional Medicine Centers, 80% were used for their traditional purpose or a related ethnomedical purpose (4). These findings demonstrate the value of traditional medicinal knowledge in the quest to discover new biologically active compounds.
Traditional Chinese medicine (TCM) provides a massive amount of information on natural products and the clinical symptoms they are used to treat (5), which are the observable disease phenotypes that are crucial for clinical diagnosis and treatment (6). This empirical knowledge can shed light on PDD screening directions in modern drug discovery. For example, the discovery of ephedrine, an anti-asthmatic drug identified by the first TCM pharmacologist Kehui Chen (1898–1988), was inspired by the clinical use of the Chinese herb Ma Huang to treat asthma for >4000 years (7). Artemisinin (qinghaosu), the first-line drug for malaria, was discovered by 2015 Nobel laureate Youyou Tu, who was inspired by the use of the Chinese herb qinghao for combating the symptoms of malaria in TCM (8). Consequently, standardization of the symptom vocabulary of TCM and further illustration of the relationships between symptoms, natural products (mainly herbs), diseases, and molecular targets have the potential to provide novel lead/drug candidates.
*To whom correspondence should be addressed. Tel: +86 10 6260 0822; Fax: +86 10 6260 1356; Email:
Correspondence may also be addressed to Xuezhong Zhou. Tel: +86 10 5168 4931; Fax: +86 10 5168 4931; Email:
Correspondence may also be addressed to Jianxin Chen. Tel: +86 10 6428 6398; Fax: +86 10 6428 6398; Email:
The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint first authors.
The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
(, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work
is properly cited. For commercial re-use, please contact
Downloaded from by Beijing Jiaotong University user on 01 November 2018
Knowledge of the symptoms traditionally treated by herbs is difficult for modern pharmaceutical scientists to understand for two reasons. Firstly, most TCM symptom terms are written in ancient Chinese, and only a tiny fraction of Chinese intellectuals alive today can understand the definitions of TCM symptoms exactly. Secondly, to better leverage the knowledge of TCM usage, TCM symptoms must be mapped onto the terms for symptoms used in modern medicine (MM). As TCM is based on a holistic philosophy that differs substantially from that of MM (9), this task can be accomplished only by experts who are trained in TCM and familiar with MM. However, the number of individuals qualified to perform this task has declined continuously in recent years. Accordingly, linking TCM symptom-herb relationships to MM, as well as to the molecular mechanisms underlying diseases, is urgent.
Therefore, we built a new database, SymMap, an integrative database of traditional Chinese medicine enhanced by symptom mapping. During the development of SymMap, the difficulties mentioned above were overcome by forming a committee of 17 leading experts practicing TCM. SymMap provides four types of new knowledge. Firstly, we manually standardized TCM symptom terms and definitions, which were mapped to herbs registered in the Chinese Pharmacopoeia, a collection of TCM knowledge with a very high level of evidence. Secondly, we rigorously mapped these TCM symptoms to MM symptoms recorded in the Unified Medical Language System (UMLS) via expert consensus and subsequent manual verification. Thirdly, using database mining, we mapped the knowledge of symptom-herb relationships onto current data regarding the molecular mechanisms of TCM, including the compound compositions of herbs (ingredients), their molecular targets (mainly genes/proteins), and diseases related to symptoms or targets. Finally, we present all-versus-all pairwise associations among all components in SymMap, with some of them analyzed by statistical inference to enable pharmaceutical scientists to rank and filter the most promising results.
In the last decade, several databases focusing on different aspects of TCM knowledge have been published, for example the TCM-ID (10), HIT (11), TCMID (12), and TCMSP (13) databases. These databases have undergone continuous stepwise improvement as new components or aspects of TCM have been added. However, information regarding symptoms and phenotypes has never been curated, standardized, and connected to herbs and diseases, as well as their underlying molecular mechanisms. Thus, SymMap fills the gap and presents the newly curated symptom-herb knowledge, which can provide both pharmacological effects (phenotypic changes) and candidate leads for PDD screening efforts. In addition, SymMap provides symptom-mechanism mapping that will enable further analysis of the shared symptoms and targets of multiple diseases for accelerating drug repositioning studies.
Data sources of SymMap
SymMap contains six components: symptoms used in TCM (TCM symptoms) and MM (MM symptoms), herbs, ingredients, targets (also denoted as genes in the article) and diseases (Figure 1A). Among these components, TCM symptoms, MM symptoms and herbs can be regarded as phenotypic knowledge that is valuable for PDD programs, whereas the ingredients, targets, and diseases consist of molecular information derived from TDD efforts.
We introduced phenotype-level information by extracting TCM symptoms and herb terms from the Chinese Pharmacopoeia (CHPH, 2015 edition). The fields of each record in CHPH are illustrated in Supplementary Figure S1. We first invited 17 leading experts practicing TCM (Supplementary Table S1) to manually check all symptom terms from the CHPH (Figure 1B). The names of TCM symptoms were standardized according to an authoritative TCM publication, ‘Standardization research on TCM Terminology’ (Zhongyiyao Mingci Shuyu Guifanhua Yanjiu, published in 2016), and a published platform for integrating TCM terminologies (14). We also curated the definition, locus, and property information for these symptoms using another TCM publication, ‘Standardization of Pathological Terminology’ (Bingzhuang Shuyu Guifanhua Jichu, published in 2015). Then, we collected MM symptom terms from the MeSH (version 2017) (15), SIDER (version 2017) (16) and UMLS (version 2016) (17) databases, after which the expert committee manually mapped TCM symptoms to MM symptoms.
Next, we collected molecular information mainly by database integration. For example, we integrated the ingredient information from the TCMID (version 2015), TCMSP (version 2.3) and TCM-ID (version 1.0) databases. To obtain non-redundant ingredient records for herbs, we selected among records from the three databases that shared a common identifier, such as a CAS number, PubChem CID or InChIKey. The target component of the database was collected from two sources: the target genes of TCM compounds from the HIT (version 2.0) and TCMSP databases, and the target genes of modern diseases from the HPO (version 2017) (18), DrugBank (version 5.0.0) (19) and NCBI gene (version 2018) (20) databases. Similarly, the disease component was also merged from two sources, namely the OMIM (version 2017) (21) and Orphanet (version 2017) (22) databases. The sources of all components of the SymMap database are summarized in Table 1.
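The de-duplication step described above can be sketched as follows. This is only an illustration under stated assumptions: the article does not give the exact procedure, so the union-find grouping, the field names (`cas`, `pubchem_cid`, `inchikey`, `source`) and the function `merge_ingredient_records` are hypothetical, and the identifiers in the toy data are made up.

```python
from collections import defaultdict

ID_FIELDS = ("cas", "pubchem_cid", "inchikey")  # hypothetical field names


def merge_ingredient_records(records):
    """Group records that share any identifier value and merge each group."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, normalized value) -> index of first record seen
    for idx, rec in enumerate(records):
        for field in ID_FIELDS:
            value = rec.get(field)
            if not value:
                continue
            token = (field, str(value).strip().upper())
            if token in first_seen:
                a, b = find(first_seen[token]), find(idx)
                parent[max(a, b)] = min(a, b)  # join the two groups
            else:
                first_seen[token] = idx

    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(rec)

    merged = []
    for root in sorted(groups):
        combined = {"sources": sorted({r["source"] for r in groups[root]})}
        for rec in groups[root]:
            for key, value in rec.items():
                # Keep the first non-empty value seen for each field.
                if key != "source" and value and key not in combined:
                    combined[key] = value
        merged.append(combined)
    return merged


# Toy records with made-up identifiers (not real CAS numbers or CIDs).
records = [
    {"source": "TCMSP", "name": "compound A", "cas": "000-00-1"},
    {"source": "TCMID", "name": "Compound A", "pubchem_cid": "900001"},
    {"source": "TCM-ID", "name": "compound a", "cas": "000-00-1", "pubchem_cid": "900001"},
    {"source": "TCMSP", "name": "compound B", "cas": "000-00-2"},
]
unique = merge_ingredient_records(records)
print(len(unique), [u["sources"] for u in unique])
```

Grouping by any shared identifier, rather than by one fixed key, lets a record that carries both a CAS number and a PubChem CID bridge two records that each carry only one of them.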
Direct associations among SymMap components
There are six direct associations among the components
(Figure 1A). Two relationships, the TCM symptom-herb
and TCM symptom-MM symptom associations, had never
before been included in a public database, but they were
Figure 1. Schematic of the SymMap database. (A) Upper panel: the six components contained in SymMap are illustrated in six circles in different colors in
the middle. The blue arcs connecting the circles show the six direct associations, with the numbers of associations shown on the left. The gray dotted lines
connecting the six components show the nine indirect associations, with the numbers of associations shown on the right. Lower panel: implementation of
the functions of SymMap. (B) Illustration of the scheme for extraction, curation and standardization of TCM symptom terms and their relationships to
herbs. (C) Illustration of the scheme for expert curation of TCM symptom-MM symptom mapping.
Table 1. Overview of the data curated in SymMap
Components | Data source | Amount
Herbs | Extracted from the Chinese pharmacopoeia (2015 edition) | 499
TCM symptoms | Extracted, manually curated, and standardized from the Chinese pharmacopoeia (2015 edition) | 1717
MM symptoms | Indexed in the UMLS database, and manually mapped to TCM symptoms | 961
Ingredients | Integrated from the TCMID, TCMSP and TCM-ID databases | 19 595
Targets | Integrated from the HIT, TCMSP, HPO, DrugBank and NCBI databases | 4302
Diseases | Integrated from the OMIM, MeSH and Orphanet databases | 5235
included in SymMap for the first time as a result of manual curation by experts. Other relationships, including the MM symptom–disease, herb–ingredient, ingredient–target and target–disease (also referred to as gene–disease) associations, are dispersed across multiple databases and required integration.
We rstly mapped two types of direct associations to
TCM symptoms by manual curation. The TCM symptom–
herb relationships were obtained directly from the CHPH
after standardization of TCM symptom terms (Figure 1B).
TCM symptom-MM symptom mapping was conducted us-
ing an iterative process. For each TCM symptom term, three
experts were randomly selected and given a full list of MM
symptom terms. If the experts did not map the TCM term
to the same MM symptom then another expert was as-
signed, and this process was repeated until at least two ex-
perts reached an agreement. After all TCM symptoms were
mapped, manual rechecking was conducted to ensure the
accuracy of the database (Figure 1C). Note that the full list
of MM symptoms in UMLS identiers contains not only
Downloaded from by Beijing Jiaotong University user on 01 November 2018
4Nucleic Acids Research, 2018
concepts about symptoms, but also other types of terms in
modern medicine (Supplementary Table S2).
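The iterative mapping procedure can be illustrated with a small simulation. This is a sketch under assumptions, not the committee's actual protocol: it reads the stopping rule as "at least two consulted experts chose the same MM term", lets each expert pick a single term, and models experts as plain functions. The names `map_by_consensus`, `initial` and `min_agree` are hypothetical.

```python
import random
from collections import Counter


def map_by_consensus(tcm_term, experts, mm_terms, rng, initial=3, min_agree=2):
    """Ask experts in random order until `min_agree` of them pick the same MM term.

    `experts` maps a name to a function (tcm_term, mm_terms) -> chosen MM term.
    Returns (agreed term or None, number of experts consulted).
    """
    order = list(experts)
    rng.shuffle(order)
    votes = Counter()
    for asked, name in enumerate(order, start=1):
        votes[experts[name](tcm_term, mm_terms)] += 1
        if asked < initial:
            continue  # the first round always consults `initial` experts
        term, count = votes.most_common(1)[0]
        if count >= min_agree:
            return term, asked
    return None, len(order)  # no agreement among the available experts


# Toy committee: each "expert" always answers with a fixed term.
mm_terms = ["Fever", "Chills", "Cough"]
experts = {
    "expert_a": lambda term, options: "Fever",
    "expert_b": lambda term, options: "Chills",
    "expert_c": lambda term, options: "Cough",
    "expert_d": lambda term, options: "Fever",
}
print(map_by_consensus("toy TCM symptom", experts, mm_terms, random.Random(0)))
```

In this toy run the only possible agreement is on "Fever", reached after three or four experts depending on the random order.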
Next, we curated additional direct associations by database integration. The MM symptom–disease relationships were aligned and connected based on the HPO, OMIM and Orphanet databases. We first mapped the UMLS IDs of MM symptoms to HPO IDs, and then related the HPO identifiers of symptoms to disease terms with OMIM or Orphanet identifiers based on the HPO records. It is noteworthy that a number of diseases have both OMIM and Orphanet identifiers. In this case, we merged the disease terms according to their names in a case-insensitive way. The herb–ingredient associations were merged from the TCMSP, TCMID, and TCM-ID databases. The ingredient–target associations were obtained from the HIT and TCMSP databases, whereas the gene–disease associations were aligned and obtained from the HPO and OMIM databases. For database integration, we carefully checked the results to make sure that the final lists were non-redundant. The sources of all direct associations are summarized in Supplementary Table S3.
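A minimal sketch of the two integration steps described above: chaining UMLS to HPO to disease identifiers, and merging disease terms by case-insensitive name. The data layout, the function names and all identifiers in the example are assumptions for illustration; the article does not describe the actual scripts.

```python
def symptom_disease_pairs(umls_to_hpo, hpo_to_diseases):
    """Chain UMLS -> HPO -> disease mappings into (UMLS ID, disease ID) pairs."""
    pairs = set()
    for umls_id, hpo_ids in umls_to_hpo.items():
        for hpo_id in hpo_ids:
            for disease_id in hpo_to_diseases.get(hpo_id, ()):
                pairs.add((umls_id, disease_id))
    return pairs


def merge_diseases_by_name(entries):
    """Merge (source, external ID, name) entries whose names match ignoring case."""
    merged = {}
    for source, ext_id, name in entries:
        # Normalize whitespace and case so "Toy  Syndrome" equals "toy syndrome".
        key = " ".join(name.split()).casefold()
        record = merged.setdefault(key, {"name": name.strip(), "ids": {}})
        record["ids"].setdefault(source, []).append(ext_id)
    return list(merged.values())


# Toy identifiers, invented for the example.
umls_to_hpo = {"UMLS:T1": ["HP:T1"], "UMLS:T2": ["HP:T2", "HP:T3"]}
hpo_to_diseases = {"HP:T1": ["OMIM:T10"], "HP:T2": ["ORPHA:T20"], "HP:T3": ["OMIM:T10"]}
print(sorted(symptom_disease_pairs(umls_to_hpo, hpo_to_diseases)))

diseases = merge_diseases_by_name([
    ("OMIM", "OMIM:T10", "Toy Syndrome"),
    ("Orphanet", "ORPHA:T20", "toy syndrome"),
    ("Orphanet", "ORPHA:T21", "Another toy disease"),
])
print(diseases)
```

Name-based merging is simple but sensitive to synonyms and spelling variants, which is presumably one reason the authors report checking the merged lists manually.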
Indirect associations among SymMap components
In addition to six direct associations involving adjacent components, there were nine indirect associations involving non-adjacent components (Figure 1A). We chose to infer indirect associations from combinations of direct relationships. For example, the indirect relationships between herbs and MM symptoms can be obtained using the TCM symptom as a middle component (Supplementary Figure S2A). To remove possible false positives, we used Fisher’s exact test (23), which is effective and widely used for evaluating the reliability of biomedical associations (24), to obtain reliable associations with statistical significance. Furthermore, to control the false discovery rate (FDR) due to multiple tests, we calculated the FDRs from the P-values according to both the Bonferroni (25) and the BH (26) methods. This strategy was used for the inference of four indirect associations that can be connected through a single middle component: the TCM symptom-ingredient, herb-target, ingredient-disease, and MM symptom-target relationships were inferred through herbs, ingredients, targets, and diseases, respectively (Supplementary Figure S2B-E). Note that for herb-MM symptom and TCM symptom–disease relationships, we did not perform tests, but retained all associations using the intermediates TCM symptom and MM symptom, respectively (Supplementary Figure S2A, F), because the intermediate relationships manually curated by experts were sufficiently convincing to retain.
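To make the inference step concrete, here is a minimal sketch of scoring an indirect association through one middle component with Fisher's exact test. The article does not specify the contingency table or the sidedness of the test, so both are assumptions here: the table counts middle items shared by the two end points against the whole middle component, and the P-value is the one-sided (enrichment) tail. The function names are hypothetical, and the test is written with the standard library only.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / comb(total, col1)


def infer_indirect(left_to_mid, right_to_mid, universe):
    """Score (left, right) pairs that share at least one middle item.

    left_to_mid / right_to_mid map an item to its set of middle items,
    e.g. herb -> ingredients and target -> ingredients.
    """
    n = len(universe)
    results = {}
    for left, left_mids in left_to_mid.items():
        for right, right_mids in right_to_mid.items():
            a = len(left_mids & right_mids)  # middle items linked to both
            if a == 0:
                continue  # no path through the middle component
            b = len(left_mids) - a           # linked to left only
            c = len(right_mids) - a          # linked to right only
            d = n - a - b - c                # linked to neither
            results[(left, right)] = fisher_exact_greater(a, b, c, d)
    return results


# Toy herb-target example with ten made-up ingredients.
universe = {f"i{k}" for k in range(10)}
herb_to_ingredients = {"herb1": {"i0", "i1", "i2"}}
target_to_ingredients = {
    "target1": {"i0", "i1", "i2"},
    "target2": {"i0", "i5", "i6", "i7", "i8", "i9"},
}
for pair, p in infer_indirect(herb_to_ingredients, target_to_ingredients, universe).items():
    print(pair, round(p, 4))
```

In the toy data, `target1` shares all three ingredients of `herb1` and gets a small P-value, whereas `target2` shares only one of its six ingredients and does not.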
For the three remaining indirect associations, which could have been connected by at least two components, one component had to be selected as the intermediate. For example, we selected the disease component for the TCM symptom–target indirect associations and applied the same test procedure used above with only one middle component (Supplementary Figure S3A). The only difference between this procedure and that mentioned above was that the statistical inferences of TCM symptom–target relationships were based on the TCM symptom–disease relationship inferred previously, so the strategy had two steps. Similarly, the indirect MM symptom–ingredient associations were inferred in a step-wise manner using TCM symptoms as the intermediate (Supplementary Figure S3B). It is noteworthy that the herb-disease association was emphasized and obtained using three strategies because this relationship is an important guide for PDD (Supplementary Figure S3C). Firstly, we manually curated a small number of herb–disease relationships from the CHPH, as the indications for herbs in CHPH contain a fraction of disease information (Supplementary Figure S1). Secondly, we used ingredients as the intermediate to conduct two-step testing. Thirdly, we used MM symptoms as the intermediate to conduct two-step testing. For herb–disease pairs that were inferred by multiple methods, the smallest P-values and FDRs were selected as confidence scores. We should note that the component chosen as the intermediate was selected based on empirical knowledge. The sources of all indirect associations are summarized in Supplementary Table S4.
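The rule for combining the three herb–disease strategies (keep the smallest P-value and FDR when a pair is reached by more than one path) can be sketched as below. The record layout and the function name `merge_herb_disease` are assumptions; manually curated pairs are only flagged here, since the text does not attach test statistics to them.

```python
def merge_herb_disease(curated, via_ingredient, via_mm_symptom):
    """Combine herb-disease evidence from curation and two inferred paths.

    `via_*` map (herb, disease) -> (p_value, fdr); `curated` is a set of pairs.
    """
    merged = {}
    paths = (("ingredient", via_ingredient), ("mm_symptom", via_mm_symptom))
    for label, scores in paths:
        for pair, (p_value, fdr) in scores.items():
            entry = merged.setdefault(
                pair, {"p": p_value, "fdr": fdr, "paths": [], "curated": False}
            )
            # Keep the smallest score seen across paths.
            entry["p"] = min(entry["p"], p_value)
            entry["fdr"] = min(entry["fdr"], fdr)
            entry["paths"].append(label)
    for pair in curated:
        entry = merged.setdefault(
            pair, {"p": None, "fdr": None, "paths": [], "curated": False}
        )
        entry["curated"] = True
    return merged


# Toy scores, invented for the example.
result = merge_herb_disease(
    curated={("herb1", "disease1")},
    via_ingredient={("herb1", "disease1"): (0.01, 0.04), ("herb2", "disease2"): (0.2, 0.5)},
    via_mm_symptom={("herb1", "disease1"): (0.002, 0.06)},
)
print(result[("herb1", "disease1")])
```

Note that taking the minimum P-value and the minimum FDR independently means the two scores reported for a pair may come from different paths, as in the toy pair above.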
Implementation of SymMap
In summary, SymMap provides information about six com-
ponents related to TCM and MM and their pairwise re-
lationships using a convenient web interface from which
users can browse, search, visualize and download data.
SymMap is free to access at and without user registration.
The SymMap website was built using the Python-Flask
and Nginx frameworks. The SymMap data are stored in a
MySQL database. The SymMap website is compatible with
most major browsers.
Database statistics
The six curated components in SymMap include 1717 TCM
symptoms, 961 MM symptoms, 499 herbs, 19 595 ingredi-
ents, 4302 targets and 5235 diseases (Table 1). The six types
of direct associations in SymMap include 6638 herb–TCM
symptom associations, 2978 TCM symptom–MM symp-
tom associations, 48 372 herb–ingredient associations, 12
107 MM symptom–disease associations, 29 370 ingredient–
target associations and 7256 gene–disease associations (Fig-
ure 1A). The distributions of the connections for each type
of direct association are shown in Figure 2A. For example,
in the TCM symptom–herb associations, each herb is as-
sociated with 13.30 TCM symptoms on average, and each
TCM symptom is associated with 3.87 herbs on average. For
the TCM symptom-MM symptom mapping introduced by
SymMap, each TCM symptom is associated with 1.74 MM
symptoms, and each MM symptom is associated with 3.13
TCM symptoms. The details of all direct associations are
shown in Supplementary Table S3.
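The per-item averages for the herb–TCM symptom associations can be checked from the totals quoted above, and the same quantity can be computed from an edge list. The sketch below assumes an association table is simply a collection of (left, right) pairs and that the average is taken over items with at least one association; `mean_partners` is a hypothetical helper.

```python
from collections import Counter


def mean_partners(edges):
    """Average number of partners per left item and per right item."""
    edges = set(edges)  # ignore duplicate pairs
    left = Counter(l for l, _ in edges)
    right = Counter(r for _, r in edges)
    return len(edges) / len(left), len(edges) / len(right)


toy = [("herb1", "s1"), ("herb1", "s2"), ("herb2", "s2")]
print(mean_partners(toy))

# Totals reported in the text: 6638 herb-TCM symptom associations,
# 499 herbs and 1717 TCM symptoms.
print(round(6638 / 499, 2), round(6638 / 1717, 2))
```

Dividing the 6638 associations by 499 herbs and by 1717 TCM symptoms gives about 13.30 and 3.87, matching the reported averages for this association type. Averages taken only over items that have at least one association would differ from totals divided by component size whenever some items are unconnected.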
The compositions of all nine indirect associations are summarized in Supplementary Table S4. As expected, the Bonferroni method for multiple testing correction was quite strict and gave a rather small set of predictions, so we mainly chose the results from the BH method for presentation.
Figure 2. Characteristics of the SymMap integrative network. (A) Box plots show the distribution of association numbers per item for six direct associations.
For each association between component 1–component 2, two boxes are shown. The first box in blue shows the distribution of component 1, whereas the
second box in orange shows the distribution of component 2. (B) Bar plots show the total number of associations for seven indirect associations. For each
association, three bars are shown. The first bar in blue shows the full set, the second bar in orange shows the loosely selected set (P-value <0.05), and
the third bar in green shows the stringently selected set (FDR BH <0.05). (C) The sources of indirect herb-disease associations are shown. Associations
inferred via symptoms are shown in blue, whereas those inferred via ingredients are shown in orange, those inferred via both symptoms and ingredients
are shown in green, and those also curated manually are shown in red. (D) The distribution of node degrees in the heterogeneous network of SymMap,
with direct associations shown in blue and indirect associations shown in orange.
For herb-MM symptom and TCM symptom–disease relationships, we provided all possible associations by network neighbor extension (full set) because no statistical tests were conducted. For the other seven types of indirect association, we compared three datasets: all possible associations (full set), statistically significant associations with loose criteria (selected set, P < 0.05), and statistically significant associations with stringent criteria (selected set, FDR BH < 0.05) (Figure 2B). The total number of associations was reduced as stricter criteria were adopted. For example, the full set of TCM symptom–ingredient associations contains 576 129 associations, but applying a P-value cut-off of 0.05 leaves 275 097 associations, whereas applying an FDR (BH) cut-off of 0.05 leaves 99 546 associations.
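The three nested datasets can be reproduced on toy data with standard-library implementations of the two corrections. This is a sketch, not the authors' code: `bonferroni`, `benjamini_hochberg` and `selected_sets` are hypothetical names, and the P-values are invented. Strictly speaking, the Bonferroni correction controls the family-wise error rate rather than the FDR; the sketch follows the article in reporting both as adjusted values compared against 0.05.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforce monotone adjusted values
    return adjusted


def selected_sets(scored, alpha=0.05):
    """Split {pair: p_value} into the full, loose and stringent sets."""
    pairs = list(scored)
    pvals = [scored[pair] for pair in pairs]
    bh = benjamini_hochberg(pvals)
    return {
        "full": set(pairs),
        "loose": {pair for pair, p in zip(pairs, pvals) if p < alpha},
        "stringent": {pair for pair, q in zip(pairs, bh) if q < alpha},
    }


scored = {"pair_a": 0.001, "pair_b": 0.02, "pair_c": 0.04, "pair_d": 0.3}
sets = selected_sets(scored)
print({name: sorted(members) for name, members in sets.items()})
```

Because a BH-adjusted value is never smaller than the raw P-value, the stringent set is always contained in the loose set, which is why the counts shrink as stricter criteria are adopted.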
The herb–disease relationships were merged from several
paths and consist of 11 854 reliable associations in the strin-
gently selected set (FDR BH <0.05). We found that 4.35%
of the herb-disease associations were inferred by using both
MM symptoms and ingredients as the intermediates (Figure 2C). The same pattern was also observed for the full
set (Supplementary Figure S4A) and the loosely selected set
(Supplementary Figure S4B). The two paths for connecting
herb–disease relationships via symptoms or ingredients can
more or less be analogized to PDD and TDD. The SymMap
data reveal that phenotype information facilitated the dis-
covery of ethnopharmacological candidates, which will pro-
vide a valuable resource for translational medicine stud-
ies. Furthermore, we found that 35.34% (P-value <0.05)
and 31.95% (FDR BH <0.05) of herb–disease associa-
tions from manual curation can also be inferred from sta-
tistical inference (Supplementary Table S5), which further
demonstrated the reliability of the statistical methods used
in SymMap.
Finally, we integrated all six components of SymMap, as well as their pairwise relationships, including both direct and indirect associations, with the latter chosen from the stringently selected set with an FDR (BH) smaller than 0.05. We thus built a heterogeneous network including 32 281 nodes and 403 318 edges, with 106 721 edges representing direct associations and 296 597 edges representing indirect associations. The distributions of node degrees for the direct and indirect associations are quite similar (Figure 2D). Most nodes have a degree lower than 20, with a ratio of 95.98% for direct associations and 79.02% for indirect associations, which shows that the network is sparse in most parts. Furthermore, we analyzed the shared molecular interactions on disease–symptom associations using a previously published method (6). Consistent with previous observations, we found that when diseases are more similar in terms of shared symptoms, they tend to be linked with each other through the underlying genes in their PPI network (Supplementary Figure S5). This further demonstrates the value of SymMap in connecting external symptom mapping and internal molecular mechanisms.
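The degree statistic quoted above (the share of nodes with degree below 20) is straightforward to compute once the associations are held as an edge list. The sketch below assumes nodes are (component, ID) tuples and edges are undirected; the function names and toy IDs are hypothetical, and the SymMap edges themselves are not reproduced here.

```python
from collections import defaultdict


def degree_table(edges):
    """Map each node to its number of distinct neighbors (undirected edges)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(linked) for node, linked in neighbors.items()}


def fraction_below(degrees, threshold=20):
    """Fraction of nodes whose degree is strictly below `threshold`."""
    return sum(1 for d in degrees.values() if d < threshold) / len(degrees)


# Toy network: one hub herb linked to 25 ingredients.
hub = ("herb", "toy_herb")
edges = [(hub, ("ingredient", f"toy_ing_{k}")) for k in range(25)]
degrees = degree_table(edges)
print(degrees[hub], round(fraction_below(degrees), 4))

# Edge totals reported in the text are consistent with each other.
print(106721 + 296597 == 403318)
```

Only nodes that appear in at least one edge are counted, so isolated items would have to be added explicitly if they should enter the denominator.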
Functionality of SymMap
Users can browse, search and download the six components and their pairwise relationships through the SymMap web interface (Figure 3A). Users can click the search button on the homepage and input a query term on the search page to execute the search. A different search box is provided for each of the components of SymMap, with multiple types of search keys provided. For example, to search for a specific MM symptom, three types of search keys are permitted: the symptom name, the external ID in a widely accepted database, and multiple aliases that are collected from diverse databases for the convenience of the users. All types of allowable search keys are described under the search boxes and further explained on the download page. Users can also download all search terms in key files provided by SymMap. Furthermore, users can select similar keys immediately after inputting query terms using the autocomplete search functionality included in SymMap.
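One way such multi-key search with autocomplete could work is a sorted index over names, external IDs and aliases that is queried by prefix. The article does not describe how the site implements this, so the class `SearchIndex`, the record fields and the IDs below are all hypothetical.

```python
from bisect import bisect_left


class SearchIndex:
    """Case-insensitive prefix lookup over names, external IDs and aliases."""

    def __init__(self, records):
        entries = []
        for rec in records:
            keys = [rec["name"], rec["external_id"], *rec.get("aliases", [])]
            for key in keys:
                entries.append((key.casefold(), key, rec["symmap_id"]))
        self._entries = sorted(entries)
        self._folded = [entry[0] for entry in self._entries]

    def autocomplete(self, prefix, limit=10):
        """Return up to `limit` (search key, record ID) pairs matching the prefix."""
        folded_prefix = prefix.casefold()
        start = bisect_left(self._folded, folded_prefix)
        matches = []
        for folded, key, record_id in self._entries[start:]:
            if not folded.startswith(folded_prefix):
                break  # sorted order: no later entry can match
            matches.append((key, record_id))
            if len(matches) == limit:
                break
        return matches


# Toy records with invented IDs.
index = SearchIndex([
    {"symmap_id": "MM-1", "name": "Headache", "external_id": "UMLS:T001",
     "aliases": ["Cephalalgia", "Head pain"]},
    {"symmap_id": "MM-2", "name": "Heartburn", "external_id": "UMLS:T002"},
])
print(index.autocomplete("hea"))
```

Indexing aliases alongside names means a query such as "head pain" resolves to the same record as "Headache", which mirrors the three key types described above.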
After searching SymMap, matches for the input query
terms are displayed in the lower part of the search page in
a summary table with the SymMap ID as the first column.
Users are encouraged to click the hyperlink on the SymMap
ID for detailed information. In the details page, we provide
descriptive information and relationships with other com-
ponents using network visualizations and tables. Further-
more, a list of all items in each of the six components can be
navigated on the browse page, and these lists are also downloadable from the website.
Using the SymMap database
After browsing or searching SymMap, users can click the
SymMap ID for each specic term to jump onto the details
page, which provides a summary panel including descriptive
information, a network panel visualizing the all-versus-all
relationships among the six components, and a list panel
showing tables of related items for the selected search key.
The summary panel displays descriptive information for the
search item (Figure 3B). In general, we provide three types
of information: identification information (e.g. name and
gene symbol), explanatory information (e.g. definition and
class), and external IDs in other databases, which can be
clicked directly to navigate into the database.
Next, the network panel provides a visualization of all
related components for the search term (Figure 3C). The
nodes in the network are colored and placed in different
locations according to the source of the component. The
node size is customized according to its degree in the net-
work. When the user holds the mouse pointer over a node,
the node will be enlarged, its related edges will be high-
lighted, and its ID and name will be shown in a balloon.
In addition, each node in the picture can be hyperlinked to
its own details page. We further provided control panels for
users to change the network layout, to zoom in and out, and
to download the network picture. To avoid the presence of
an excessive number of nodes in a network, we chose the
stringently selected set of indirect associations with FDR BH
<0.05 for the network visualization.
Finally, the list panel at the bottom of the details page (Figure 3D) shows the information for the network visualization in tabular format, with five tables representing the five related components other than the component of the search item. We provided three drop-down menus for users to customize the visualization. Firstly, the ‘display’ menu allows the users to select which relationship to access. Secondly, the ‘select’ menu enables the users to choose which dataset, including the full set and the subsets with different levels of statistical inference, should be listed on the page. Thirdly, the ‘sort’ menu allows the users to sort the items in the table according to SymMap IDs, P-values, FDRs (BH) and FDRs (Bonferroni). Furthermore, we added a ‘download’ button for bulk downloading.
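The behavior of the ‘select’ and ‘sort’ menus can be mimicked on a table of association rows. This is a sketch of the described behavior, not the site's code: the row fields, the dataset labels and the function `list_panel` are hypothetical, and the 0.05 cut-offs follow the thresholds given in the text.

```python
CUTOFFS = {
    "full": None,
    "p_value": ("p_value", 0.05),
    "fdr_bh": ("fdr_bh", 0.05),
    "fdr_bonferroni": ("fdr_bonferroni", 0.05),
}


def list_panel(rows, dataset="full", sort_by="symmap_id"):
    """Filter rows to the chosen dataset, then sort by the chosen column."""
    rule = CUTOFFS[dataset]
    kept = [row for row in rows if rule is None or row[rule[0]] < rule[1]]
    return sorted(kept, key=lambda row: row[sort_by])


# Toy rows with invented IDs and scores.
rows = [
    {"symmap_id": "T-3", "p_value": 0.001, "fdr_bh": 0.004, "fdr_bonferroni": 0.004},
    {"symmap_id": "T-1", "p_value": 0.020, "fdr_bh": 0.040, "fdr_bonferroni": 0.080},
    {"symmap_id": "T-2", "p_value": 0.300, "fdr_bh": 0.300, "fdr_bonferroni": 1.000},
]
print([row["symmap_id"] for row in list_panel(rows)])
print([row["symmap_id"] for row in list_panel(rows, dataset="fdr_bh", sort_by="p_value")])
```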
A major goal of biomedical research is to elucidate
phenotype–genotype relationships. Symptom phenotypes
are diligently observed by physicians and are crucial for
accurate clinical diagnosis and treatment. TCM symptoms
have been utilized in clinical applications for millennia in
a relatively large number of individuals. Therefore, TCM
symptom–herb relationships provide tremendously valu-
able guidance for drug discovery programs. In this report,
we present SymMap, a comprehensive database integrat-
ing TCM with MM via external symptom mapping and
internal molecular mechanisms. The TCM symptoms in
in SymMap, as well as their relationships with herbs and
MM symptoms, were manually curated by a committee of
17 leading TCM experts. SymMap is the rst publically
available database containing comprehensive information
regarding the relationships between TCM symptoms, TCM
herbs and MM symptoms. Furthermore, users can access
all-versus-all pairwise relationships between any two com-
ponents in SymMap as direct associations obtained from
database integration or indirect associations inferred based
on statistical tests. Therefore, we have combined
phenotype-based and target-based knowledge in SymMap to
promote efficient phenotype-based compound screening
under the guidance of current knowledge about targets for
compounds and diseases. Users can easily access, navigate,
and visualize these data, as well as the relationships
between database components, using the website interface for
the SymMap database. We plan to continue to add data to
SymMap as additional information becomes available, as
well as to improve the user experience at the SymMap website.
Figure 3. An illustration of the SymMap search. (A) The index page of SymMap shows the database overview. (B) The summary panel in the details page
shows descriptive information for the search item. (C) The network panel in the details page shows all related components for the search item, with nodes
colored by their source component. Holding the mouse pointer over a node highlights the node and its related edges, while showing its ID and name, as
well as a link to its details page. (D) The list panel shows the related components in tables. For each search item, five tables can be selected for the five other related components.
For each table, three datasets can be selected by the users: the full set, the loosely selected set with P-values smaller than 0.05, and two stringently selected
sets with FDRs (Bonferroni and BH) smaller than 0.05. All related components can be downloaded by pressing the button at the upper right.
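The selectable datasets described in the Figure 3 caption (full, P-value below 0.05, and Bonferroni- or BH-adjusted value below 0.05) can be illustrated with a short sketch. This is not the SymMap implementation: the function names, the herb and gene identifiers, and the P-values are hypothetical and chosen only to show how the filters differ.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_sets(assoc: Dict[Pair, float], alpha: float = 0.05) -> Dict[str, List[Pair]]:
    """Split associations into the full, loose, and two stringent sets."""
    pairs = list(assoc)
    pvals = [assoc[k] for k in pairs]
    bonf = bonferroni(pvals)
    bh = benjamini_hochberg(pvals)
    return {
        "full": pairs,
        "loose": [k for k, p in zip(pairs, pvals) if p < alpha],
        "bonferroni": [k for k, q in zip(pairs, bonf) if q < alpha],
        "bh": [k for k, q in zip(pairs, bh) if q < alpha],
    }


# Hypothetical herb-gene pairs with made-up P-values, for illustration only.
example = {
    ("herb_A", "gene_1"): 0.001,
    ("herb_A", "gene_2"): 0.012,
    ("herb_A", "gene_3"): 0.025,
    ("herb_A", "gene_4"): 0.045,
    ("herb_A", "gene_5"): 0.200,
}
sets = select_sets(example)
for name, members in sets.items():
    print(name, len(members))
```

With these made-up values the selections shrink from five pairs (full) to four (loose), three (BH) and one (Bonferroni), which shows why the stringent sets are the more conservative starting point for ranking candidates.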
Supplementary Data are available at NAR Online.
We thank Kui Xu for help with the network visualization.
We thank Dr Zhen Li and Dr Zhaoqing Ba for critical com-
ments during manuscript preparation.
National Key Research and Development Program
of China [2018YFC1313000, 2018YFC1313001,
2017YFC1703506]; National Natural Science Foun-
dation for Young Scholars of China [31701141, 31701149,
31501066]; National Natural Science Foundation of China
[91740113, 81522051]. Innovation Project for Institute of
Computing Technology, CAS [20186060]. Funding for
open access charge: National Natural Science Foundation
of China.
Conict of interest statement. None declared.
1. Swinney,D.C. (2013) Phenotypic vs. target-based drug discovery for
first-in-class medicines. Clin. Pharmacol. Ther.,93, 299–301.
2. Swinney,D.C. and Anthony,J. (2011) How were new medicines
discovered? Nat. Rev. Drug Discov.,10, 507–519.
3. Harvey,A.L., Edrada-Ebel,R. and Quinn,R.J. (2015) The
re-emergence of natural products for drug discovery in the genomics
era. Nat. Rev. Drug Discov.,14, 111–129.
4. Cragg,G.M. and Newman,D.J. (2013) Natural products: a continuing
source of novel drug leads. Biochim. Biophys. Acta,1830, 3670–3695.
5. Qiu,J. (2007) Traditional medicine: a culture in the balance. Nature,
448, 126–128.
6. Zhou,X., Menche,J., Barabasi,A.L. and Sharma,A. (2014) Human
symptoms-disease network. Nat. Commun.,5, 4212.
7. Chen,K.K. (2012) A pharmacognostic and chemical study of ma
huang (Ephedra vulgaris var. helvetica). 1925. J. Am. Pharmacists
Assoc.: JAPhA,52, 406–412.
8. Tu,Y. (2011) The discovery of artemisinin (qinghaosu) and gifts from
Chinese medicine. Nat. Med.,17, 1217–1220.
9. Xue,R., Fang,Z., Zhang,M., Yi,Z., Wen,C. and Shi,T. (2013)
TCMID: Traditional Chinese Medicine integrative database for herb
molecular mechanism analysis. Nucleic Acids Res.,41, D1089–D1095.
10. Chen,X., Zhou,H., Liu,Y.B., Wang,J.F., Li,H., Ung,C.Y., Han,L.Y.,
Cao,Z.W. and Chen,Y.Z. (2006) Database of traditional Chinese
medicine and its application to studies of mechanism and to
prescription validation. Br. J. Pharmacol.,149, 1092–1103.
11. Ye,H., Ye,L., Kang,H., Zhang,D., Tao,L., Tang,K., Liu,X., Zhu,R.,
Liu,Q., Chen,Y.Z. et al. (2011) HIT: linking herbal active ingredients
to targets. Nucleic Acids Res.,39, D1055–D1059.
12. Huang,L., Xie,D., Yu,Y., Liu,H., Shi,Y., Shi,T. and Wen,C. (2018)
TCMID 2.0: a comprehensive resource for TCM. Nucleic Acids Res.,
46, D1117–D1120.
13. Ru,J., Li,P., Wang,J., Zhou,W., Li,B., Huang,C., Li,P., Guo,Z.,
Tao,W., Yang,Y. et al. (2014) TCMSP: a database of systems
pharmacology for drug discovery from herbal medicines. J.
Cheminform.,6, 13.
14. Zhou,X., Wu,Z., Yin,A., Wu,L., Fan,W. and Zhang,R. (2004)
Ontology development for unied traditional Chinese medical
language system. Artif. Intell. Med.,32, 15–27.
15. Minguet,F., Salgado,T.M., Boogerd,L.V.D. and Fernandez-Llimos,F.
(2015) Quality of pharmacy-specic Medical Subject Headings
(MeSH) assignment in pharmacy journals indexed in MEDLINE.
Res. Soc. Admin. Pharm.,11, 686–695.
16. Kuhn,M., Letunic,I., Jensen,L.J. and Bork,P. (2016) The SIDER
database of drugs and side effects. Nucleic Acids Res.,44,
17. Nadkarni,P., Chen,R. and Brandt,C. (2001) UMLS concept indexing
for production databases. J. Am. Med. Informatics Assoc. Jamia,8,
18. Köhler,S., Vasilevsky,N.A., Engelstad,M., Foster,E., Mcmurry,J.,
Aymé,S., Baynam,G., Bello,S.M., Boerkoel,C.F. and Boycott,K.M.
(2017) The Human Phenotype Ontology in 2017. Nucleic Acids Res.,
45, D865–D876.
19. Wishart,D.S., Knox,C., Guo,A.C., Shrivastava,S., Hassanali,M.,
Stothard,P., Chang,Z. and Woolsey,J. (2006) DrugBank: a
comprehensive resource for in silico drug discovery and exploration.
Nucleic Acids Res.,34, D668–D672.
20. Brown,G.R., Hem,V., Katz,K.S., Ovetsky,M., Wallin,C.,
Ermolaeva,O., Tolstoy,I., Tatusova,T., Pruitt,K.D., Maglott,D.R.
et al. (2015) Gene: a gene-centered information resource at NCBI.
Nucleic Acids Res.,43, D36–D42.
21. Hamosh,A., Scott,A.F., Amberger,J.S., Bocchini,C.A. and
Mckusick,V.A. (2005) Online Mendelian Inheritance in Man
(OMIM), a knowledgebase of human genes and genetic disorders.
Nucleic Acids Res.,33, 514–517.
22. Rath,A., Olry,A., Dhombres,F., Brandt,M.M., Urbero,B. and
Ayme,S. (2012) Representation of rare diseases in health information
systems: the Orphanet approach to serve a wide range of end users.
Hum. Mutat.,33, 803–808.
23. Fisher,R.A. (1922) On the interpretation of χ2 from contingency
tables, and the calculation of P. J. R. Statist. Soc. Ser. A,85, 87–94.
24. Guney,E., Menche,J., Vidal,M. and Barabási,A.L. (2016)
Network-based in silico drug efficacy screening. Nat. Commun.,7,
25. Bland,J.M. and Altman,D.G. (1995) Multiple significance tests: the
Bonferroni method. BMJ,310, 170.
26. Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple testing. J. R.
Statist. Soc. Ser. B (Methodological),57, 289–300.