
A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology


Abstract

Background The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that, in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms, it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020. Results As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19 related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants.
CIDO's integrative host-coronavirus PPIs and drug-target knowledge has also been used to support drug repurposing for COVID-19 treatment. Conclusion CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.
Heetal. Journal of Biomedical Semantics (2022) 13:25
A comprehensive update onCIDO:
thecommunity-based coronavirus infectious
disease ontology
Yongqun He1*, Hong Yu2*, Anthony Huffman1, Asiyah Yu Lin3,4, Darren A. Natale5, John Beverley4,6, Ling Zheng7,
Yehoshua Perl8, Zhigang Wang9, Yingtong Liu1, Edison Ong1, Yang Wang1,2, Philip Huang1, Long Tran1,
Jinyang Du1, Zalan Shah1, Easheta Shah1, Roshan Desai1, Hsin‑hui Huang1,10, Yujia Tian11, Eric Merrell12,
William D. Duncan13, Sivaram Arabandi14, Lynn M. Schriml15, Jie Zheng16, Anna Maria Masci17, Liwei Wang18,
Hongfang Liu18, Fatima Zohra Smaili19, Robert Hoehndorf19, Zoë May Pendlington20, Paola Roncaglia20,
Xianwei Ye2, Jiangan Xie21, Yi‑Wei Tang22, Xiaolin Yang9, Suyuan Peng23, Luxia Zhang23, Luonan Chen24,
Junguk Hur25, Gilbert S. Omenn1, Brian Athey1 and Barry Smith4,12
© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/)
applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Open Access
1 University of Michigan Medical School, Ann Arbor, MI, USA
2 People’s Hospital of Guizhou Province, Guiyang, Guizhou, China
Full list of author information is available at the end of the article
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Coronavirus diseases pose major challenges to public
health. In addition to the current Coronavirus Disease
2019 (COVID-19) pandemic, Severe Acute Respiratory
Syndrome (SARS) [1] and Middle East Respiratory Syn-
drome (MERS) [2] are two other severe human corona-
virus diseases that have arisen in the past two decades.
The World Health Organization (WHO) declared the
COVID-19 outbreak a pandemic on March 11, 2020;
at that time there were 118,326 confirmed cases and 4292
deaths globally [3]. As of April 27, 2022, the number of
confirmed COVID-19 cases has risen to over 500 million,
resulting in over 6 million deaths globally.
The dramatic increase of COVID-19-related cases
and deaths over 2 years illustrates the urgent need for
collaborative research on coronavirus diseases, especially
COVID-19, by researchers around the world.
Extensive COVID-19 research has been conducted
since the start of the pandemic. For example, there have
been over 250,000 COVID-19-related papers recorded
in PubMed as of April 2022. These research articles
cover various domains such as etiology, epidemiology,
and biotechnology. The initial wave of research articles
focused on characterization of the original Wuhan
strain of SARS-CoV-2 [4], the molecular interactions of
putative and confirmed SARS-CoV-2 molecules [5], and
the unique disease phenotype of COVID-19 [6]. Dur-
ing this time, many novel and repurposed medical treat-
ments were developed and authorized to treat or prevent
COVID-19. is included research to develop effective
COVID-19 vaccines [7] and COVID-19 drug treatments
[8]. However, the emergence of new SARS-CoV-2 vari-
ants with unique traits prompted novel research inves-
tigating the fundamental molecular mechanisms of
virulence and transmission associated with these variants.
Throughout the COVID-19 pandemic, epidemiological
data from across the globe has been collected for
viral sequences and human demographics. In the era
of Information Technology and big data, biomedical
research has become data-intensive with the genera-
tion of increasingly large, complex, multidimensional,
and diverse datasets. The explosion of valuable data and
knowledge related to COVID-19 fits the 5Vs of big data
(volume, veracity, velocity, variety, and value) [10, 11] and
represents a wealth of knowledge related to SARS-CoV-2.
However, these studies are often stored in non-interoperable
data repositories which resist integration, creating a
major bottleneck for COVID-19 research. The resultant
non-harmonized data and knowledge cannot be easily
analyzed by standard Artificial Intelligence (AI)/Machine
Learning (ML) techniques. The development of computer-interpretable,
integrative, interoperable ontologies
can contribute to needed data harmonization.
Such observations led to the development of a com-
munity-based, interoperable Coronavirus Infectious
Disease Ontology (CIDO) for standardized and efficient
representation, integration, and analysis of coronavirus
disease data. CIDO was initiated by He and Yu in early
2020, when COVID-19 became epidemic in China.
CIDO was accepted into the Open Biomedical Ontolo-
gies library in March 2020, and was initially reported in
a Comment article in the journal Scientific Data [12]. In
that article, CIDO was introduced as a community-driven
open-source OBO library ontology providing standard-
ized, computer-interpretable terminological content for
various coronavirus infectious diseases, including their
etiology, transmission, epidemiology, pathogenesis, host-
coronavirus interactions, diagnosis, prevention, and
treatment. Additionally, it was shown how host-coro-
navirus interaction mechanisms could be represented
using CIDO resources and axioms, and how such rep-
resentation could be used to aid in the identification of
potential COVID-19 treatment options based on exist-
ing knowledge of drug mechanisms of action. Indeed, it
was reported that CIDO provided instrumental guidance
during literature mining processes in which 72 chemi-
cal drugs and 27 monoclonal or polyclonal antibodies
that exhibit anti-coronavirus effects in in vitro or in vivo
experimental studies were identified. The Scientific Data
article closed by inviting researchers from across the
world to contribute to CIDO development and applica-
tion. We are pleased to report that there has been an out-
pouring of community support, and substantial CIDO
development and application since that time.
CIDO was presented at the 2020 International Confer-
ence on Biomedical Ontology (ICBO-2020) [13]. Sub-
sequently, authors AYL, YQH, SA, and WD organized
a “Workshop on COVID-19 Ontologies” (WCO 2020)
in October 2020 (https://github.com/CIDO-ontology/WCO),
which led to the ongoing harmonization of 9 COVID-19
related ontologies. Of these ontologies, CIDO subsumed
the COVID-19 Infectious Disease Ontology (IDO-COVID-19)
and initiated alignment with the Controlled Vocabulary
for COVID-19 (COVoc). The ontology harmonization effort
was also presented at ICBO-2021 [14]. Since then, CIDO has
been further developed to include more terms and relations
in many areas, such as host responses to SARS-CoV-2
infection [15], host-coronavirus protein-protein interactions,
and COVID-19 diagnosis and vaccines. This journal manuscript
provides a comprehensive introduction to the current version
of CIDO, its development, and representative applications.
Keywords: Coronavirus, COVID-19, SARS-CoV-2, Ontology, Phenotype, Diagnosis, Vaccine, Drug repurposing
Coronavirus disease‑related data collection
Supplemental Table1 provides a summary of our coro-
navirus disease-related data repository, comprising
data collected from literature (primarily PubMed and
PubMed Central) and from openly available databases.
e classifications of viral variants and amino acid vari-
ants were obtained from GISAID (https:// www. gisaid.
org/), NextStrain (https:// nexts train. org/), and WHO.
Anti-coronaviral drug information was taken primarily
from DrugBank [16] and from data annotated using the
Chemical Entities of Biological Interest (ChEBI) ontology
[17], COVID-19 diagnostic testing data in this reposi-
tory are derived from five major sources: (i) the FDA
EUA diagnostic testing website (https:// www. fda. gov/
medic al- devic es/ coron avirus- disea se- 2019- covid- 19-
emerg ency- use- autho rizat ions- medic al- devic es/ in- vitro-
diagn ostics- euas); (ii) the AdveritasDx database (http://
adver itasdx. com/); (iii) the LOINC In Vitro Diagnostic
(LIVD) Test Code Mapping for SARS-CoV-2 Tests pro-
duced through the collaboration of the FDA, CDC, IICC,
Regenstrief Institute, and APHL (https:// www. cdc. gov/
csels/ dls/ sars- cov-2- livd- codes. html), and (iv) COVID-
19 diagnostic testing kits authorized for use in China
(provided by YT). ese resources are developed inde-
pendently and are integrated and are annotated in incon-
sistent ways. One major task of our work is to use CIDO
to support COVID-19 data integration through consist-
ent annotations.
CIDO ontology development
CIDO development followed OBO Foundry ontology
development principles (e.g., openness and collaboration)
(4), and utilized the eXtensible Ontology Development
(XOD) strategy, which prescribes: ontology term reuse,
semantic alignment, use of ontology design patterns for
new term generation, and community effort [18]. CIDO’s
development started with the reuse and alignment of
terms and relations from existing ontologies using the
Ontofox tool [19]. We used reference ontologies such as
the Ontology for Biomedical Investigations (OBI) [20],
Chemical Entities of Biological Interest (ChEBI) [17],
Human Disease Ontology (DOID) [21], Human Phenotype
Ontology (HP) [22], and Infectious Disease Ontology
(IDO) [23] (Supplemental Table 2). CIDO terms are
aligned under Basic Formal Ontology (BFO) [24], a top-level
ontology conformant to the ISO/IEC 21838 standard
(https://www.iso.org/standard/74572.html). BFO
is a domain-neutral framework that has been adopted
by more than 450 ontologies as a starting point for the
creation of terms and definitions in specific domains. It
thereby provides a mechanism for overcoming interop-
erability issues which arise when the attempt is made to
integrate ontologies deriving from different sources.
For the generation of terms from domains ranging from
amino acid variants to diagnostic medical kits, we devel-
oped relevant ontology design patterns and then used the
Ontorat tool [25] to automate term generation. For manual
term generation and editing, we used the Protégé-OWL
editor [26], providing new CIDO-specific terms
with Internationalized Resource Identifiers (IRIs) that start with
“CIDO_” followed by 7 automatically generated digits.
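The identifier convention described above can be illustrated with a minimal Python sketch. It formats and validates local identifiers of the form “CIDO_” plus 7 digits. The helper names `make_cido_id` and `is_cido_id` are assumptions introduced here for illustration and are not part of the CIDO, Ontorat, or Protégé tooling; the full IRI base is deliberately left out.

```python
import re

# Pattern for a CIDO-specific local identifier: the prefix "CIDO_" plus exactly 7 digits.
CIDO_ID_PATTERN = re.compile(r"CIDO_\d{7}")


def make_cido_id(number: int) -> str:
    """Format an integer as a CIDO-style identifier (hypothetical helper)."""
    if not 0 <= number <= 9_999_999:
        raise ValueError("number must fit in 7 digits")
    return f"CIDO_{number:07d}"


def is_cido_id(identifier: str) -> bool:
    """Return True if the whole string matches the CIDO identifier pattern."""
    return CIDO_ID_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    print(make_cido_id(42))
    print(is_cido_id("CIDO_0000042"), is_cido_id("CIDO_42"))
```

Zero-padding to a fixed width keeps identifiers sortable and makes malformed identifiers easy to reject with a single pattern check.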
We worked closely with ontology development com-
munities to support coronavirus related ontology
development. For example, we worked with the Pro-
tein Ontology (PR) on generating PR representations of
SARS-CoV-2 proteins which were subsequently imported
into CIDO. We also periodically submitted requests to the issue
trackers of other related ontology efforts, for example a request for
over 40 specimen-related terms submitted to the Ontology
for Biomedical Investigations (OBI) (https://github.com/obi-ontology/obi/issues/1176,
also: https://github.com/CIDO-ontology/cido/issues/7). The relevant terms
with OBI identifiers and definitions were then imported
back into CIDO. Additionally, we have generated many
new relations in CIDO to meet our needs, some of which
have been proposed for inclusion in the OBO Relation
Ontology (RO) [27].
CIDO is designed to support COVID-19 data FAIR-
ness (i.e., findability, accessibility, interoperability, and
reusability) [28, 29]. Our ontology development is pri-
marily task-focused and use-case driven. For COVID-19
diagnosis modeling, for example, a team of clinical doc-
tors, diagnosticians, and ontologists, was formed to study
COVID-19 diagnosis background [30, 31], collect and
annotate available diagnosis kits, focus on specific diag-
nosis use cases such as [32], design the relevant ontology
patterns, and then implement the latter in CIDO.
CIDO status, source code, deposition, and license
CIDO source code is freely available under the CC-BY
license on the GitHub website https://github.com/CIDO-ontology/cido.
CIDO has been deposited to the Ontobee
ontology repository (http://www.ontobee.org/ontology/CIDO),
the BioPortal repository (https://bioportal.bioontology.org/ontologies/CIDO),
and the OLS repository
(https://www.ebi.ac.uk/ols/ontologies/cido).
Visual analysis ofCIDO bysummarization network
e Ontology Abstraction Framework (OAF) tool [33]
was used to generate a color image of the layout of the
ontology hierarchy (Fig. 1 in Supplemental File 1). To
provide a more comprehensible visualization of the most
recent version of CIDO, we used the Weighted Aggregate
Partial-Area Taxonomy (WAT) summarization network
analysis method [34]. By comparing this version with
older versions of CIDO we were able to track the evo-
lution of the ontology, as summarized in Supplemental
CIDO applications
In the present communication we describe several appli-
cations of CIDO. One use case is the comparative analy-
sis of the shared and different amino acid variants found
in the Delta and Omicron variants, with the purpose
of better understanding the mechanisms of coronavi-
rus evolution, transmission, and virulence. Another use
case is a SARS-CoV-2 drug repurposing study. Using
the knowledge represented and classified in CIDO, we
systematically queried the host-coronavirus protein-
protein interactions, anti-coronavirus drugs, and protein
targets of different drugs, with the goal of identifying and
designing possible drugs with a potential for optimized
treatment performance.
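The Delta versus Omicron comparison described above reduces, at its core, to set intersection and set difference over the amino acid variants attached to each viral variant. The following Python sketch shows that step only. The two sets are small, incomplete illustrative subsets of widely reported spike substitutions, assumed here for demonstration; they are not an export of CIDO classes, and the function name `compare_variants` is hypothetical.

```python
# Illustrative, incomplete subsets of spike-protein substitutions (assumed
# for this sketch; they are not an export of CIDO's variant classes).
delta_spike = {"T19R", "L452R", "T478K", "D614G", "P681R", "D950N"}
omicron_ba1_spike = {"G339D", "K417N", "T478K", "N501Y", "D614G", "P681H"}


def compare_variants(a: set[str], b: set[str]) -> dict[str, list[str]]:
    """Split two sets of AA substitutions into shared and variant-specific groups."""
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }


if __name__ == "__main__":
    result = compare_variants(delta_spike, omicron_ba1_spike)
    for group, substitutions in result.items():
        print(group, substitutions)
```

In an ontology-backed workflow the two input sets would presumably come from a query over the AA variant axioms rather than from hard-coded literals.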
The upper level structure and design pattern of CIDO
Figure 1 lays out the high-level hierarchical structure of
CIDO and shows the various imported external ontolo-
gies. Areas related to the coronavirus infectious disease
represented by CIDO include: coronavirus taxonomy,
coronavirus variants, genes and proteins and their muta-
tions, phenotypes, diseases, epidemiology, diagnosis,
host-coronavirus protein-protein interactions, vaccines,
and drugs. All the terms are aligned under the top-level
Basic Formal Ontology (BFO) (7) (Fig. 1). CIDO imports
terms from over 20 reference ontologies from the OBO
ontology library, with the representative ontologies introduced
in Supplemental Table 2 and Fig. 1.
In addition to importing terms from existing ontologies,
we have also generated many CIDO-specific terms,
e.g., resources for SARS-CoV-2 viral variants, amino
acid mutations, and diagnostic medical device kits. New
axioms, such as those linking different types of proteins
and other molecules that are related to host-coronavirus
protein-protein interactions (PPIs) and drug-target interactions,
have also been developed for CIDO. In the version
released on August 1, 2022, there are 370 relations
used in CIDO, including 87 relations newly generated
with the “CIDO_” prefix. Admittedly, some of the newly
generated relations in CIDO may be more suitable for
the more general level Relation Ontology (RO) [27];
future research will involve further refinement of these relations.
Fig. 1 Top level hierarchical structure of class terms represented in CIDO. Abbreviations in parentheses indicate an entity’s source ontology
(Supplemental Table 2)
Our previous Comment paper in Scientific Data [12]
describes the general CIDO design pattern that lays out
the relationships among selected major entities modeled
in the ontology. In the next sections, we provide details
of specific ontological modeling and representation pro-
vided in CIDO.
Ontological classication ofcoronaviruses andcoronavirus
CIDO imports resources from the NCBITaxon to rep-
resent various coronaviruses and their relations [13].
SARS-CoV and SARS-CoV-2 belong to the Sarbecovi-
rus, a subgenus of the genus Betacoronavirus. MERS-
CoV belongs to Merbecovirus, a sibling to Sarbecovirus.
Four human coronavirus strains (229E, NL63, HKU1, and
OC43) cause mild common colds in humans, where 229E
and NL63 belong to Alphacoronavirus, and HKU1 and
OC43 belong to Embecovirus under Betacoronavirus.
We have generated 39 CIDO specific classes to rep-
resent specific COVID-19 viral variants. CIDO defines
distinct viral variants of SARS-CoV-2 based on 3 classification
methods: GISAID clades [35], PANGO lineages
[36], and WHO clades [https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/].
A viral variant is
defined as a virus that has undergone variation such that
there is a characteristic set of mutations in comparison
to the reference virus sequence. These variants include
various genetic mutations resulting in changes in transmission,
infectivity, and virulence as compared to the
original Wuhan reference strain. The GISAID clades and
PANGO lineages both use the same data set but apply
different clustering algorithms to designate specific
variants. PANGO lineages also differ by defining characteristic
mutations that occur in a majority of specific
SARS-CoV-2 variants, while GISAID variants define universal
mutations. The following examples illustrate these
three hierarchies:
‘SARS-CoV-2 Delta virus’: ‘is a’ some ‘SARS-CoV-2 based on WHO classification’
‘SARS-CoV-2 BA.5 virus’: ‘is a’ some ‘SARS-CoV-2 based on PANGO lineage’
‘SARS-CoV-2 clade G virus’: ‘is a’ some ‘SARS-CoV-2 based on GISAID clades’
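To make the three parallel hierarchies concrete, here is a small Python sketch that stores the example axioms as parent links and answers subclass questions by walking upward. The class labels are taken from the examples above; the common root label "SARS-CoV-2" and the helper names `ancestors` and `is_a` are assumptions made for this illustration and do not reflect how CIDO itself is stored or queried.

```python
# Direct parent of each class, using the labels from the three example axioms.
# "SARS-CoV-2" as the common root is an assumption made for this sketch.
PARENT = {
    "SARS-CoV-2 Delta virus": "SARS-CoV-2 based on WHO classification",
    "SARS-CoV-2 BA.5 virus": "SARS-CoV-2 based on PANGO lineage",
    "SARS-CoV-2 clade G virus": "SARS-CoV-2 based on GISAID clades",
    "SARS-CoV-2 based on WHO classification": "SARS-CoV-2",
    "SARS-CoV-2 based on PANGO lineage": "SARS-CoV-2",
    "SARS-CoV-2 based on GISAID clades": "SARS-CoV-2",
}


def ancestors(label: str) -> list[str]:
    """Walk up the parent links and return every ancestor, nearest first."""
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain


def is_a(child: str, parent: str) -> bool:
    """True if `parent` is reachable from `child` through parent links."""
    return parent in ancestors(child)


if __name__ == "__main__":
    print(ancestors("SARS-CoV-2 BA.5 virus"))
    print(is_a("SARS-CoV-2 Delta virus", "SARS-CoV-2 based on PANGO lineage"))
```

Keeping the three classification schemes as separate branches means a variant named under one scheme is not silently treated as a member of another; cross-scheme equivalence would need to be asserted explicitly.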
WHO utilizes GISAID clade and PANGO lineage rep-
resentations as synonyms for epidemiologically relevant
variants, designated either as a Variant of Concern (VoC)
or as a Variant of Interest (VoI) [15]. VoIs are variants
that are identified as having the potential to become
VoCs through causing increased transmission or worse
disease processes. VoCs remain designated as such until
they are no longer prevalent.
Ontological representation of SARS‑CoV‑2 proteins
CIDO imports terms for SARS-CoV-2 proteins from the
Protein Ontology (PR) and terms for SARS-CoV-2 genes
from the Ontology of Genes and Genomes (OGG), a simplified
representation of which is shown in Fig. 2. Gene
terms are based on those found in the NCBI Gene database
[37] while proteins are as given by UniProtKB [38]
[https://www.uniprot.org/uniprot/?query=proteome:up000464024],
with cross-reference information from
NCBI RefSeq [https://www.ncbi.nlm.nih.gov/protein?term=(sars-cov-2%20Wuhan-Hu-1%20AND%20refseq%5Bfilter%5D)].
CIDO represents only those genes that
are described in NCBI Gene, and only those proteins
(and their derivatives) that are described in UniProtKB.
There are other protein open reading frames (ORFs) such
as ORF2b (aka S.iORF1) [39], ORF-Sh and ORF-Mh [40],
which are held in reserve but will be added should
they gain experimental or database support. A full comparison
between PR, RefSeq, and UniProtKB is given in
Supplemental Table 3 with respect to accessions, genes,
and names used (protein length and evidence for exist-
ence are also presented).
In general, PR uses SARS-CoV-2 protein names as
given in UniProtKB and gene names as given in RefSeq,
wherever these are available. A key difference between
the PR representation and those of RefSeq and Uni-
ProtKB is that the former has a single record for each
proteolytic cleavage product of the ORF1ab (aka rep)
gene, whileeach of the latter resources has two records
for the subset of products that are encoded by both the
polyprotein 1a (pp1a, aka ORF1a) and the polyprotein
1ab (pp1ab, aka ORF1ab) transcript (where the latter is
the result of -1 ribosomal frameshifting). Both polyproteins
are further processed by proteolytic cleavage; processing
of either will yield ten identical chains (Fig. 2A,
light blue box), while one additional chain is unique to
ORF1a and five additional chains are unique to ORF1ab
(green boxes). In addition, PR unites each of the poly-
proteins under the grouping term ‘rep gene translation
product’ (the synonym is used here to prevent confusion
with the ORF1ab transcript-derived polyprotein). Several
proteins are translated from alternative ORFs within or
overlapping transcripts that also produce longer proteins
(red boxes). One of these, ORF9b, has been demonstrated
(in SARS-CoV-1) to use leaky ribosome scanning [41];
potentially this mechanism applies to the others as well,
though the existence of the ORFs labeled ‘putative’ is
questionable [42]. All SARS-CoV-2 proteins are grouped
under ‘severe acute respiratory syndrome coronavirus
2 protein’. In total—not counting the grouping terms—
there are forty SARS-CoV-2-related PR terms. Currently,
none of these represent proteoforms with amino acid
modifications; these will be added in the future.
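The difference between the PR representation and the RefSeq/UniProtKB representations of the rep gene products can be checked with a short piece of arithmetic. The Python sketch below uses only the chain counts stated above (ten shared chains, one unique to pp1a, five unique to pp1ab). The resulting totals are derived here from those counts under the reading that shared chains get two records in a per-polyprotein scheme; they are not quoted from PR, RefSeq, or UniProtKB, and the function names are hypothetical.

```python
# Chain counts as stated in the text for the rep (ORF1ab) gene products.
SHARED_CHAINS = 10      # produced from either pp1a or pp1ab
UNIQUE_TO_PP1A = 1      # produced only from pp1a
UNIQUE_TO_PP1AB = 5     # produced only from pp1ab


def records_single_per_product() -> int:
    """One record per distinct cleavage product (the PR-style representation)."""
    return SHARED_CHAINS + UNIQUE_TO_PP1A + UNIQUE_TO_PP1AB


def records_per_polyprotein() -> int:
    """One record per product per source polyprotein, so shared chains count twice."""
    from_pp1a = SHARED_CHAINS + UNIQUE_TO_PP1A
    from_pp1ab = SHARED_CHAINS + UNIQUE_TO_PP1AB
    return from_pp1a + from_pp1ab


if __name__ == "__main__":
    print(records_single_per_product(), records_per_polyprotein())
```

The gap between the two totals equals the number of shared chains, which is exactly the duplication that a single-record-per-product scheme avoids.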
Ontological representation of SARS‑CoV‑2 amino acid variants
In addition to the representation of viral variants, CIDO
also defines and represents various amino acid (AA) variants.
Similar to the viral variant definition, an AA variant
is defined in CIDO as “An amino acid in a protein
that varies from another amino acid in comparison to
the reference protein.” CIDO further defines the object
property ‘is characteristic AA variant’ to describe a relation
between an AA variant and a protein where the AA
variant is a characteristic AA variant of a specific viral
variant. An AA variant is defined as characteristic when
the presence of the AA variant can be used to identify the
viral variant. We characterize these variants by comparing the
amino acid at a given position to the reference wild-type
strain. For example, the D614G mutation in the spike
polyprotein (S:D614G) is well known for emerging in several
VoCs and has been shown to increase SARS-CoV-2
infectivity [43]. The CIDO class ‘D-614G in SARS-CoV-2
S protein’ (where S protein is the spike protein)
has the following axioms (Fig. 2):
‘D-614G in SARS-CoV-2 S protein’:
‘characteristic AA variant of’ some ‘SARS-CoV-2 Omicron variant’
‘is a’ some ‘AA variant in SARS-CoV-2 S protein S1 RBD region’
‘has amino acid position’ value 614
‘has part’ some ‘glycine residue’
‘has mutated from’ some ‘aspartic acid residue’
Fig. 2 SARS‑CoV‑2 proteins and genes. A PR modeling of SARS‑CoV‑2 proteins. B OGG modeling of SARS‑CoV‑2 genes. Black lines represent the
‘has gene template’ relation connecting proteins to genes. Red boxes denote proteins translated from ORFs that are internal to or overlap with
those of the longer indicated gene (red arrows). The light blue box indicates proteins that are produced by proteolytic processing of either replicase
polyprotein 1a or replicase polyprotein 1ab, while green boxes indicate those that derive specifically and uniquely from pp1a or pp1ab
However, the above framework does not work well for
describing characteristic deletions or other mutation
events. As the amino acid that was deleted does not exist,
this leads to issues where the ontology asserts that some-
thing holds of ‘all coronaviral amino acids’. To address
this issue, we define the AA deletion as a process. Moreo-
ver, this variation process can be generalized to include
any mutation event. The relationship between the deletion
process and a resulting AA variant is defined as:
‘A888- deletion in SARS-CoV-2 S protein’: ‘is AA mutation of’ some ‘SARS-CoV-2 S protein’
as shown in Fig. 3.
Host phenotype modeling in CIDO
CIDO contains terms for 18 symptoms and 22 comorbidities
commonly found in COVID-19 patients [44].
These symptoms and comorbidities are mapped to phenotypes
in the Human Phenotype Ontology (HP), from
where they are imported back into CIDO. To link these
symptoms and comorbidities as they occur in relation to
COVID-19, we have also generated new relations ‘disease
susceptibly has phenotype’ and ‘disease susceptibly severe
with comorbidity’. The first relation represents the relation
between a disease process and a phenotype where
the person with the disease is susceptible to having that
phenotype. The second is a shortcut relation between a
disease process and a comorbidity, where the disease process
is susceptible to becoming more severe when the patient has
the comorbidity. Examples of
usage of these relations are:
SARS-CoV-2 disease process: ‘disease susceptibly has
phenotype’ some Fever.
SARS-CoV-2 disease process: ‘disease susceptibly
severe with comorbidity’ some hypertension.
CIDO also represents the relation between a SARS-CoV-2
variant and specific phenotypes, for example, the
relation between the Delta variant and the formation of
syncytia in lungs [45]:
‘Delta variant disease process’: ‘bearer of disease
susceptible to phenotype’ some syncytia
Fig. 3 CIDO modeling of AA variants and mutations. CIDO represents AA variants as material entities if they are substitutions and AA mutations as
processes to represent deletions in SARS‑CoV‑2 microbial variants. Both AA variants utilized analogous axioms due to differences in continuants and
We are in the process of evaluating and submitting some
of our newly generated relations to the OBO Relation
Ontology (RO) as they may be more appropriate for
inclusion there. For example, we have submitted two
new relation terms ‘evolves into’ and ‘evolves from’ to the
RO issue tracker (https://github.com/oborel/obo-relations/issues/620).
If these relations are added to RO, we
will then obsolete our original CIDO relation terms and
replace them with the new RO terms.
Ontological modeling of epidemiology and public health
CIDO includes many terms related to the epidemiol-
ogy of COVID-19, derived primarily from the Infectious
Disease Ontology (IDO) [23] and the Virus Infectious
Disease Ontology (VIDO) [14]. Recent research [46, 47]
highlights the importance of viral load to SARS-CoV-2
transmission rates. Indeed, Wuhan, Delta, and Omicron
strains are associated with distinct peak viral loads with
respect to different demographics. VIDO character-
izes ‘viral load’ as the proportion of virions to volume of
a given portion of fluid in which the virions are located.
VIDO provides a datatype property ‘has viral load meas-
urement’ which supports representation of viral load val-
ues. For example, an instance of OBI’s class blood plasma
specimen from an instance of a host infected by SARS-
CoV-2 can be (partially) represented as having a viral
load value in the following manner:
‘blood plasma specimen 1’ rdf:type ‘blood plasma specimen’
and ‘has part’ some ‘SARS-CoV-2’
and ‘has viral load measurement’ value 108
Additionally, VIDO provides virus-specific terminologi-
cal content that can be extended in CIDO to represent
other important epidemiological terms, such as COVID-
19 prevalence, SARS-CoV-2 infectivity, and COVID-19
mortality rate.
Moreover, CIDO includes resources needed for com-
parison of transmission differences among SARS-CoV-2
variants. The Omicron variant is significantly more
transmissible than the reference Wuhan strain and Delta
strain. The transmission rate is often represented using
R0, the basic reproduction number that measures the
transmissibility of infectious agents [48]. The average R0
values for the Wuhan reference strain, Delta strain, and
Omicron BA.1 strain are 2.69 [49], 5.02 [50], and 9.05
[51], respectively. Accordingly, we have generated a data
property relation ‘has average R0’, which can be used to
represent the R0 value of each variant:
‘SARS-CoV-2 reference strain’: ‘has average R0’ value 2.69
‘SARS-CoV-2 Delta variant’: ‘has average R0’ value 5.02
‘SARS-CoV-2 Omicron BA.1 variant’: ‘has average
R0’ value 9.05
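As a rough illustration of how the ‘has average R0’ values could be compared once they are retrieved, the following Python sketch stores the three averages cited above in a plain dictionary and computes ratios between variants. The dictionary layout and the function name `relative_r0` are assumptions made for illustration and are not part of CIDO.

```python
# Average R0 values as quoted in the text above ([49], [50], [51]).
# The dictionary layout is an assumption of this sketch; in CIDO the values
# are attached to variant classes through the 'has average R0' data property.
AVERAGE_R0 = {
    "SARS-CoV-2 reference strain": 2.69,
    "SARS-CoV-2 Delta variant": 5.02,
    "SARS-CoV-2 Omicron BA.1 variant": 9.05,
}


def relative_r0(variant: str, baseline: str) -> float:
    """Return the ratio of the average R0 of `variant` to that of `baseline`."""
    return AVERAGE_R0[variant] / AVERAGE_R0[baseline]


if __name__ == "__main__":
    reference = "SARS-CoV-2 reference strain"
    for name in AVERAGE_R0:
        print(f"{name}: {relative_r0(name, reference):.2f}x the reference R0")
```

On these figures, the Omicron BA.1 average is about 3.4 times that of the reference strain and about 1.8 times that of Delta.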
COVID‑19 diagnosis testing modeling in CIDO
During a pandemic, the availability of fast and accurate
diagnostic testing is essential to control the situation.
Because SARS-CoV-2 is a novel virus, the traditional
pathway for approving a testing kit for the market
cannot meet the urgent demand in a timely manner. In
the US, an Emergency Use Authorization (EUA) under
Section 564 of the Federal Food, Drug, and Cosmetic Act
(FD&C Act) allows the special authorization and use of
drugs and other medical products during emerging infectious
disease threats such as the COVID-19 pandemic.
From March 2020 until now, the US Food and Drug
Administration (FDA) has authorized hundreds of different
types of in vitro diagnostic tests under
EUAs. To make those EUA diagnostic testing
data Findable, Accessible, Interoperable, and Reusable
(FAIR) [28], it is important that the testing kits used are
registered in a structured and machine-readable manner.
CIDO comprises representations of 345 molecular and
serological diagnostic tests authorized by the FDA. We cre-
ated a term ‘COVID-19 diagnostic testing device’ and its
child term ‘FDA EUA authorized COVID-19 diagnostic testing
device’, where the latter is to be the home of all FDA EUA
authorized In Vitro Diagnostics (IVD) tests for COVID-19.
An example representation of the TaqPath COVID-19
Combo Kit from Thermo Fisher Scientific, Inc., which
was authorized under an EUA
(https://www.fda.gov/media/136113/download), is shown in
Fig. 4, which lays out the current CIDO representation
of the device, assay, diagnostic process, and genes that the
test is designed to detect. A device ‘TaqPath COVID-19
Combo Kit’ is capable of a ‘COVID-19 RT-PCR assay’.
This test detects the existence of N, S and ORF-1ab gene
regions that are part of the corresponding genes of the
SARS-CoV-2 reference strain. We created a shortcut
relation ‘PCR kit detects gene’ to represent a direct relationship
between a diagnostic testing kit and the target
gene/sequence fragments. Another shortcut relation,
‘device utilizes material’, was created to link the diagnostic
testing and the tested specimen. This relation can be
logically represented as a property chain
(https://github.com/oborel/obo-relations/issues/497).
This particular diagnostic testing kit can utilize 6 specimen
types, as again shown in Fig. 4. The following axiom
represents the ontological arrangement of such a relation
using a union of 6 specimen terms:
‘device utilizes material’ some (‘nasopharyngeal
swab specimen’ or ‘oropharyngeal swab specimen’
or ‘anterior nasal swab specimen’ or ‘mid-turbinate
nasal swab specimen’ or ‘nasopharyngeal aspirate
specimen’ or ‘bronchial alveolar lavage’)
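The union in this axiom can be read as a membership test over the six specimen classes. The Python sketch below is only an illustration under that reading: the names `TAQPATH_SPECIMEN_UNION` and `satisfies_union` are hypothetical, and labels are compared as plain strings rather than through an OWL reasoner.

```python
# The six specimen classes named in the union axiom above.
TAQPATH_SPECIMEN_UNION = frozenset({
    "nasopharyngeal swab specimen",
    "oropharyngeal swab specimen",
    "anterior nasal swab specimen",
    "mid-turbinate nasal swab specimen",
    "nasopharyngeal aspirate specimen",
    "bronchial alveolar lavage",
})


def satisfies_union(specimen_class: str, union=TAQPATH_SPECIMEN_UNION) -> bool:
    """Return True if the specimen class is one of the disjuncts of the union.

    Simplification: a real OWL reasoner would also accept subclasses of each
    disjunct; this sketch only compares labels.
    """
    return specimen_class in union


if __name__ == "__main__":
    for label in ("oropharyngeal swab specimen", "blood plasma specimen"):
        print(label, "->", satisfies_union(label))
```

A single union keeps the kit-level axiom compact; listing six separate ‘some’ restrictions would instead assert that the device uses all six specimen types at once.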
Using the strategy defined here, we systematically col-
lected and used CIDO to model and represent over 300
molecular and serological diagnostic tests, including 225
SARS-CoV-2 RT-PCR assays, authorized by the US FDA. All
the 343 tests are annotated with a total of ten COVID-19
diagnostic technologies, such as RT-PCR, LAMP, Next
Generation Sequencing, a CRISPR-based method, ELISA,
lateral flow immunoassay, and chemiluminescence.
CIDO modeling andrepresentation ofhost‑coronavirus
protein‑protein interactions and drugs
CIDO represents over 300 experimentally verified
host-coronavirus protein-protein interactions (PPIs),
over 300 anti-coronaviral chemicals and/or their cor-
responding drugs, and over 400 drug targets. Here
the coronaviral proteins may derive from SARS-CoV,
MERS-CoV, or SARS-CoV-2. In early 2020, we performed
literature mining and identified 110 chemical
drugs and 26 antibodies effective, either in vitro or
in vivo, against at least one human coronavirus infection,
where the human coronaviruses involved are
primarily SARS-CoV and MERS-CoV [52]. Our onto-
logical representation, classification, and analysis of
these drugs yielded many potentially valuable scien-
tific insights. Since early 2020, we have collected more
drugs and chemicals with a focus on those against
SARS-CoV-2. Furthermore, we have collected and
annotated representations of further PPIs and chemi-
cal-drug interactions.
All CIDO-represented host-coronavirus PPIs are
experimentally verified and reported in the literature.
For example, CIDO has recorded 332 physically asso-
ciated PPIs identified by the affinity-purification mass
spectrometry assay [5]. These PPIs involve both proteins
from the SARS-CoV-2 side and the host side, and
many of these coronaviral and host proteins are also
targets of multiple drugs.
In CIDO, each host-coronavirus PPI is defined to
have at least two participants, including one protein
from a coronavirus and one from its host. For example,
the ‘host-SARS-CoV-2 protein-protein interaction’ is
defined as:
(‘has participant’ some ‘SARS-CoV-2 protein’) and
(‘has participant’ some (organism and ‘has role’
some ‘host role’))
Figure 5 illustrates how CIDO represents hundreds of
host-SARS-CoV-2 PPIs, drug active ingredients, and
chemical-protein interactions. Specifically, there are
three specific PPIs under the class ‘SARS-CoV-2 nsp5
protein interaction with host protein’, such as ‘SARS-CoV-2
nsp5 protein binding to human HDAC2’. This
example PPI has two participants:
‘has participant’ some ‘3C-like proteinase (SARS-CoV-2)’
‘has participant’ some ‘histone deacetylase 2 (human)’
Fig. 4 Modeling of COVID‑19 diagnostic testing using CIDO. *, only two out of six specimen terms are shown in this figure
Note that 3C-like proteinase, another name for nsp5, can
be inhibited by the chemical nirmatrelvir, a component
of the Pfizer drug Paxlovid. Human histone deacetylase
2 (i.e., HDAC2) can be inhibited by the chemical ‘Valproic
Acid’, which has been found valuable against SARS-CoV-2
[53]. These relations are logically defined in CIDO
as follows (Fig. 5B and C):
‘nirmatrelvir’: ‘chemical inhibits protein’ some
‘3C-like proteinase (SARS-CoV-2)’
‘Valproic Acid’: ‘chemical inhibits protein’ some ‘his-
tone deacetylase 2 (human)’
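To illustrate how the PPI participants and the ‘chemical inhibits protein’ assertions can be combined, the following Python sketch looks up the inhibitors of each participant of the nsp5-HDAC2 interaction. It uses only the labels quoted above; the `Interaction` class and the `inhibitors_for` function are hypothetical helpers, not CIDO constructs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    label: str
    participants: tuple  # labels of the participating proteins


# 'chemical inhibits protein' assertions quoted in the text above.
CHEMICAL_INHIBITS_PROTEIN = {
    "nirmatrelvir": "3C-like proteinase (SARS-CoV-2)",
    "Valproic Acid": "histone deacetylase 2 (human)",
}

NSP5_HDAC2 = Interaction(
    label="SARS-CoV-2 nsp5 protein binding to human HDAC2",
    participants=(
        "3C-like proteinase (SARS-CoV-2)",
        "histone deacetylase 2 (human)",
    ),
)


def inhibitors_for(interaction: Interaction) -> dict:
    """Map each participant of the interaction to the chemicals that inhibit it."""
    result = {protein: [] for protein in interaction.participants}
    for chemical, protein in CHEMICAL_INHIBITS_PROTEIN.items():
        if protein in result:
            result[protein].append(chemical)
    return result


if __name__ == "__main__":
    for protein, chemicals in inhibitors_for(NSP5_HDAC2).items():
        print(protein, "<-", chemicals)
```

In this sketch the lookup shows that one interaction can be targeted from either the viral side or the host side.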
Anti‑coronavirus vaccine representation in CIDO
As the developers of the Vaccine Ontology (VO) [54], we
(YH, AL, AH, PH) first represented a total of over 100
COVID-19 vaccines at different stages (licensed, author-
ized, in clinical trials, or verified with laboratory animal
models) in VO, and then imported these terms from VO
to CIDO (Fig.1, Supplemental Table2). In total, we have
imported over 300 terms from the VO to CIDO. Fur-
thermore, we have developed Cov19VaxKB, a web-based
Integrative COVID-19 vaccine knowledge base, which
has used ontologies including the VO to represent, clas-
sify, and analyze various COVID-19 vaccines and vaccine
components (e.g., vaccine adjuvants), and vaccine adverse
events [55]. We have also developed reverse vaccinology
and machine learning methods to predict vaccine antigen
candidates [56]. The functions and immune mechanisms
of these candidates are being further analyzed
using ontology-based approaches [15]. Furthermore, we
have been using CIDO and other ontologies including
the Ontology of Adverse Events (OAE) to systematically
examine adverse events associated with SARS/MERS/
COVID-19 vaccine candidates.
Clinical metadata type representation in CIDO
To support classification and analysis of clinical data,
CIDO includes representations of many clinical metadata
types. Metadata is the data that provides information
about other data. In our study of COVID-19 related clini-
cal data, we have focused on two use cases: the analysis of
vaccine adverse events using the VAERS data resource as
described above and the analysis of the clinical data from
the National COVID Cohort Collaborative (N3C) program
[57]. The N3C system is a collection of harmonized
clinical data on COVID-19 from contributing data part-
ners. N3C data is represented using the OMOP common
data model (CDM). From the OBO ontology point of
view, OMOP has its issues such as the lack of semantics,
ambiguities, and hidden assumptions [58]. In our N3C
related clinical data study, we have focused on the mapping
between the OMOP CDM elements and OBO ontologies
and on adding semantic relations among terms.
Table1 lists the representative clinical metadata types
that are primarily mapped to the OMOP CDM ele-
ments. ese are general clinical data types applicable to
studies not only of COVID-19 but also of other human
diseases. As a result, all these terms are imported from
other reference OBO ontologies. e Ontology of Preci-
sion Medicine and Investigation (OPMI) [59, 60], another
Fig. 5 Host‑coronavirus protein‑protein interactions (PPIs) and drugs targeting the viral or host proteins. A The hierarchy of PPIs, including
‘SARS‑CoV‑2 nsp5 protein binding to human HDAC2’. B The chemical nirmatrelvir (a component of the Pfizer drug Paxlovid) is an inhibitor of the
virus protein nsp5 (i.e., 3C‑like proteinase), which is critical for viral replication. C A chemical ‘Valproic Acid’ is an inhibitor of the HDAC2 (i.e., histone
deacetylase 2). Valproic acid is also a valuable candidate against SARS‑CoV‑2
OBO library ontology, has been used as a major reference
ontology to represent those clinical data types not found
in other OBO ontologies (Table1). After the mapping of
OMOP CDM elements to OBO ontologies, we imported
these mapped terms to CIDO to support COVID-19 clin-
ical data annotation and analysis.
In the OMOP / N3C data structure, each concept set
groups terms into what are called value sets. A value set
is a set of codes selected from those defined by one or
more code systems to specify which codes can be used
in a particular context. However, their grouping is heuristic
and not ontology-based. The ontology support is
an ongoing project. OMOP2OBO is the first health sys-
tem-wide integration and alignment system that system-
atically maps over 23,000 concepts from OMOP standard
clinical terminologies to OBO concepts [61]. While
OMOP2OBO is more focused on the value set mapping,
our mapping and further term generation (Table 1) is
more focused on the small set of the core OMOP CDM
concept set meta elements. The two complementary
systems can be used together to support robust clinical
COVID-19 data annotation, integration, and analysis.
Visual evolution analysis of CIDO
To provide a condensed and comprehensive visualization
of CIDO, we have previously developed a new Weighted
Aggregate Partial-Area Taxonomy (WAT) summariza-
tion network method and used it to analyze an early
version (version 1.0.108) of CIDO with a total of 5138
concepts [34]. Since then, newer versions of CIDO that
include more concepts have been generated. To evalu-
ate these new additions to CIDO, we have generated a
new WAT summarization network that visualizes CIDO
version 1.0.306 with 10,853 concepts (Fig. 6). As shown
in Fig. 6, major branches of CIDO include infectious diseases,
genes, vaccines, chemicals, and COVID-19 testing
Comparing the old version (Fig. 2 in Supplemental
File1) with the new, we can identify which nodes had a
considerable increase in the number of new descendant
terms. For example, “COVID-19 vaccine” (120){48} [72]
has been added to the ontology visualization (Fig.6). e
number (120) means that the term “COVID-19 vaccine”
includes 120 descendant terms, with 48 of those aggre-
gated from 48 descendant nodes of “COVID-19 vaccine,
each of which has only one term (less than b = 42), and
72 representing all other descendant terms of the large
partial-area “COVID-19 vaccine” before the aggrega-
tion. By expanding this node in the manner supported
by the OAF tool, we can see some interesting newly
added vaccine terms such as “Pfizer–BioNTech COVID-
19 vaccine”, “Moderna COVID-19 vaccine”, “Oxford–
AstraZeneca COVID-19 vaccine”, and “Nanocovax”.
In contrast, the old version includes only one term for
“COVID-19 vaccine” without any descendant term.
Another example is “FDA EUA authorized COVID-19
diagnostic testing device” (345){229}[116] in Fig. 6,
which includes the terms “COVID-19 Nucleic Acid RT-PCR Test
Kit” and “BinaxNOW™ COVID-19 Ag Card Home
Test” for which there are no corresponding terms in the
old version.
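The three-number node labels used above follow simple arithmetic: the first number is the partial-area's own concept count plus the concepts aggregated from small descendant partial-areas. The Python sketch below reproduces that arithmetic for the “COVID-19 vaccine” node. It is a simplification, not the WAT implementation: the function name `wat_label` is hypothetical, descendant partial-areas are passed as a flat list of concept counts, and an area is treated as small when its count is below b.

```python
def wat_label(own_concepts: int, descendant_area_sizes: list, b: int) -> str:
    """Build a '(total){small areas}[own]' label for one large partial-area.

    own_concepts: concepts of the partial-area before the aggregation.
    descendant_area_sizes: concept counts of descendant partial-areas.
    b: threshold below which a descendant partial-area is aggregated.
    """
    small = [size for size in descendant_area_sizes if size < b]
    total = own_concepts + sum(small)
    return f"({total}){{{len(small)}}}[{own_concepts}]"


# "COVID-19 vaccine": 72 concepts before aggregation plus 48 one-term
# descendant nodes, with b = 42 as in Fig. 6.
print(wat_label(72, [1] * 48, b=42))  # prints (120){48}[72]
```

Descendant areas at or above the threshold are left out of the total in this sketch, since they would remain separate nodes in the summarization network.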
Use cases ofCIDO
CIDO has been proposed and used in many applications
by us or thewider community as exemplified by refer-
ences [15, 44, 52, 6267]. Five use cases of our own appli-
cation of CIDO are introduced here.
Table 1 Representative clinical metadata types covered in CIDO. All listed examples are considered classes in the ontology
Metadata types | Metadata examples
person (NCBITaxon_9606) | person ID (OPMI_0000470), gender (PATO_0001894), year of birth (OPMI_0000473), race (NCIT_C17049), ethnicity (NCIT_C16564), care site (OPMI_0000479), geographic location (GAZ_00000448)
specimen (OBI_0100051) | specimen ID (OBI_0001616), date of specimen collection (OBIB_0000714), anatomical structure
visit occurrence (OPMI_0000482) | visit occurrence identifier (OPMI_0000483), visit start date (OPMI_0000487), visit end date (OPMI_0000488), preceding visit occurrence (OPMI_0000492), ER visit (OPMI_0000486)
procedure occurrence (OPMI_0000505) | procedure (NCIT_C25218), procedure start date (OPMI_0000508), procedure end date (OPMI_0000510), care provider (OPMI_0000163)
drug exposure (OPMI_0000572) and device exposure (OPMI_0000554) | drug (CIDO_0000167), drug exposure start time (OPMI_0000565), drug exposure end time (OPMI_0000567), medical device (NCIT_C16830), diagnostic kit (CIDO_0000453)
clinical measurement (CMO_0000000) | clinical measurement identifier (OPMI_0000582), care provider (OPMI_0000163), measurement time (OPMI_0000579), measurement unit label (IAO_0000003), measurement date (OPMI_0000580)
observation period (OPMI_0000575) | observation period start date (OPMI_0000577), observation period end date (OPMI_0000578), provenance of observation record (OPMI_0000522)
(1) Ontology-based coronavirus-related knowledge and
data standardization, annotation, mapping, integra-
tion, and inferencing, supporting advanced COVID-
19 data analysis
As a reference ontology in the field of coronavirus
infectious disease, CIDO provides a standard representa-
tion and definitions of terms and axioms in various areas
related to COVID-19 and other coronavirus diseases.
Fig. 6 The weighted aggregate taxonomy (WAT) for CIDO (version 1.0.306) with 10,853 concepts (b = 42). A white node inside a colored
rectangular box represents a partial‑area, which is a group of concepts having the same set of nonhierarchical (lateral) relationships and similar
semantics denoted by the concept listed inside the white node. Relationships are listed inside the colored box (inherited ones are not shown). The
boxes are color‑coded by cardinalities of their sets of lateral relationships. Upward arrows are the hierarchical relationships connecting partial‑areas.
The weight of a partial‑area is defined as the number of descendant concepts. A partial‑area with a weight less than b is small and is aggregated
into its closest ancestor large partial‑area. A large partial‑area having no aggregated partial‑areas is represented as a rectangle white box with one
number indicating the number of summarized concepts. A large partial‑area having aggregated partial‑areas is represented as a rectangle with
rounded corners and with three numbers. The first number inside () is the number of summarized concepts including concepts aggregated from
small partial‑areas, the second number inside {} is the number of small partial‑areas aggregated into it, and the third number inside [] is the number
of concepts of the partial‑area before the aggregation. See more details in Supplemental File 1
The above sections have provided details on how CIDO
standardizes and classifies terms and relations in differ-
ent domains related to coronavirus diseases. Usage of the
CIDO standard representation enhances data FAIRness,
annotation, and integration.
The COVoc Controlled Vocabulary for COVID-19 is
an application ontology developed by the European Bioinformatics
Institute (EMBL-EBI) and the Swiss Institute
of Bioinformatics (SIB) in March 2020 [14]. The primary
usage of COVoc is to enable seamless annotation of bio-
medical literature to core databases and tools at ELIXIR
(a European-wide intergovernmental organization for life
sciences). COVoc utilizes existing OBO ontologies and
other vocabularies to augment connections to other useful
resources such as the COVID-19 Data Portal
(https://www.covid19dataportal.org/), as well as assisting in the
curation and annotation of COVID-19 literature. CIDO
has been working with COVoc to ontologize many terms
in COVoc for better COVID-19 data annotations.
In addition to the USA and Europe, CIDO has also
been applied in many other countries including China.
CIDO has also been recommended as one of the seman-
tic standards in areas related to clinical data integration
and annotations by the National Population Health Data
Center in China (NPHDC). It is included in their population
health data archive (PHDA) [68] and provides ontology
services in MedPortal [69]. It has also been used
for the construction of knowledge graphs about COVID-19
[70].
Since CIDO incorporates multiple different types of
knowledge about coronavirus diseases, it can be used
both to query and infer new scientific insights and to
reason from analysis of clinical data. This reasoning is
enabled by the structure of the knowledge base used by
CIDO. CIDO provides a T-box vocabulary, i.e., a set of general
terminological constraints for representing COVID-19
phenomena. CIDO’s vocabulary can then be used to generate
new data once instance-level data, the set of which
in the knowledge base is called the A-box, has been
ingested into the knowledge base. Data organized by
CIDO is multiplied in value through the inferences enabled
by the ontological axioms included within it.
An example in our ontology-based clinical COVID-
19 data analysis is our analysis of differential COVID-19
symptoms during the early pandemic [44]. In this study,
we classified different symptom phenotypes in relation to
pandemic locations, time periods, and comorbidities. The
18 most common COVID-19 symptoms were mapped
to the HPO terms and imported to CIDO. Based on the
HPO classification, we grouped these symptoms into
further categories. For example, we grouped 4 COVID-
19 related symptoms (nausea, vomiting, abdominal pain,
and diarrhea) under abdominal system symptoms, and we
grouped three symptoms (headache, loss of smell, and
loss of taste) under nervous system symptoms. In addi-
tion, CIDO provides semantic representation of knowl-
edge learned from clinical data analysis. An example is
our representation of how symptoms and comorbidi-
ties are linked to COVID-19 disease [44]. Note that we
emphasize the use of ‘susceptibility’ (a subclass of ‘dispo-
sition’) to represent this knowledge, for example when
dealing with clinical phenotypes, vaccine/drug adverse
events, and immune deficiency association.
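The symptom grouping described above gives a small example of the T-box/A-box style of inference: the groupings act as terminological knowledge, and asserted patient symptoms are the instance data. The Python sketch below is a minimal illustration under that reading. The patient records are hypothetical and invented for the example, and `inferred_groups` is not a CIDO or reasoner function.

```python
# T-box: symptom groupings stated in the text (symptom -> parent group).
SUBCLASS_OF = {
    "nausea": "abdominal system symptom",
    "vomiting": "abdominal system symptom",
    "abdominal pain": "abdominal system symptom",
    "diarrhea": "abdominal system symptom",
    "headache": "nervous system symptom",
    "loss of smell": "nervous system symptom",
    "loss of taste": "nervous system symptom",
}

# A-box: hypothetical patient records invented for this sketch.
PATIENT_SYMPTOMS = {
    "patient 1": {"nausea", "headache"},
    "patient 2": {"loss of taste"},
}


def inferred_groups(patient: str) -> set:
    """Infer the symptom groups a patient falls under from asserted symptoms."""
    return {
        SUBCLASS_OF[symptom]
        for symptom in PATIENT_SYMPTOMS[patient]
        if symptom in SUBCLASS_OF
    }


if __name__ == "__main__":
    for patient in PATIENT_SYMPTOMS:
        print(patient, "->", sorted(inferred_groups(patient)))
```

Neither patient record mentions a symptom group directly; the group membership is derived from the class hierarchy, which is the kind of added value the ontological axioms provide.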
Another use case is the CIDO modeling of the molecu-
lar mechanisms of acute kidney injury (AKI) [71]. AKI
is a commonly found phenotype among hospitalized
COVID-19 patients. Our extensive literature mining
and analysis of the BioGRID COVID-19 interaction data
identified 3 key physiological processes (i.e., RAS activa-
tion, complement activation, and systemic inflammation)
and many interactors like CD147, CD209, CypA, and
MASP2 that are heavily implicated in these processes.
CIDO was used to represent our analyzed results, lead-
ing to further understanding of the COVID-19 associated
AKI mechanisms [71, 72].
(2) CIDO queries for Delta and Omicron differences for
better mechanistic understanding of virulence and
Among many SARS-CoV-2 variants, the Omicron
strain is more transmissible but less virulent than the
Delta strain, and both strains are more transmissible
than the Wuhan reference strain [73–75]. We hypothesized
that these differences reflect underlying differences
in amino acid (AA) variants. CIDO includes 92
specific CIDO terms representing characteristic muta-
tions and 35 further mutations that are not considered
as characteristic. CIDO allows for easy comparison of
coronaviral AA variants that are associated with specific
SARS-CoV-2 variants. To address the above hypothesis,
we can perform specific queries to compare the AA vari-
ants in the two strains with the aim of uncovering the
molecular mechanisms underlying the different phenotypes
(Fig. 7).
Figure 7A shows a DL query that searches CIDO for the
characteristic amino acid variants shared between the SARS-CoV-2
Delta strain and Omicron strain. The results
show four such variants: D614G and T478K in S protein,
K856R in pp1a [nsp3] protein, and P314L in pp1b [nsp12]
protein. S:D614G increases infectivity by allowing for
a greater binding ratio of the S-protein trimer units to
hACE2 [76]. T478K has similarly been shown to increase
the actual binding affinity to SARS-CoV-2 [77]. While
the specific effects of K856R and P314L are unknown,
both mutations are located in proteins responsible for
viral replication [78, 79]. K856R is located in the region
responsible for cleaving the non-structural proteins from
pp1ab [78]. P314L, however, is part of the RNA polymerase,
which is responsible for viral replication [79].
Considering the significant role of S protein in binding
and entry to the host cells, we hypothesize that Omicron
has AA variants located in S protein that can explain
the high transmission rate and high immune evasion of
Omicron. Using the DL query, we found 45 AA variants
in Omicron (Fig.7B), including 33 in S, 4 in pp1a, 3 in
M, 2 in each of E and pp1b proteins, and 1 in N protein.
Among these AA variants, many have been associated
with changes in antibody recognition and consequently
evasion. ese include: S:E484K, S:N501Y, S:H69-, and
S:144Y [76, 8082] and are predominantly located on the
N-Terminal Domain (NTD) of the S protein. e riboso-
mal binding domain of the S protein, however, has AA
variants that affect binding to the S protein, and thus cell
entry into SARS-CoV-2.
As further evidence of how inferencing with CIDO
may be used to generate novel information, a Description
Logic (DL)-query further found 18 AA variants in the
Delta strain (Fig.7C), including 10 in S protein, 3 in each
of pp1b/nucleocapsid (N) proteins, and 1 in each of E/M/
pp1a proteins. Compared to one AA variant (RG203KR)
in the Omicron N protein, 3 AA variants (D377Y, D63G,
and R203M) exist in the Delta N protein. The SARS-CoV-2
nucleocapsid (N) protein is an RNA-binding protein
critical for viral genome packaging [83], and it is also
involved in the coronavirus pathogenesis [84]. Delta was
found to have reduced pathogenicity due to altered cell
tropism but less transmissibility and immune evasion
ability [74]. The larger number of variants in the N protein of
the Delta variant likely contributes to the differences in
transmission and virulence.
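The comparisons made by these DL queries amount to set operations over the AA variants attached to each strain. The Python sketch below mimics them with plain sets. It is an illustration only: the sets contain just the variants named in the text (the full query results list 18 Delta and 45 Omicron variants), the `protein:variant` labels for non-S proteins are a notation assumed here, and `by_protein` is a hypothetical helper.

```python
# Partial variant sets built only from the AA variants named in the text.
SHARED = {"S:D614G", "S:T478K", "pp1a:K856R", "pp1b:P314L"}
DELTA = SHARED | {"N:D377Y", "N:D63G", "N:R203M"}
OMICRON = SHARED | {"S:E484K", "S:N501Y", "S:H69-", "S:144Y", "N:RG203KR"}


def by_protein(variants: set, protein: str) -> set:
    """Select the variants located in the given protein (prefix before ':')."""
    return {v for v in variants if v.split(":", 1)[0] == protein}


# Analogue of the query in Fig. 7A: variants shared by both strains.
shared = DELTA & OMICRON

# N-protein variants found in Delta but not in Omicron, and vice versa.
delta_only_n = by_protein(DELTA - OMICRON, "N")
omicron_only_n = by_protein(OMICRON - DELTA, "N")

if __name__ == "__main__":
    print("shared:", sorted(shared))
    print("Delta-only N variants:", sorted(delta_only_n))
    print("Omicron-only N variants:", sorted(omicron_only_n))
```

A DL query against the ontology has the advantage that the variant sets are derived from the class axioms rather than maintained by hand as they are here.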
(3) CIDO-supported NLP for clinical and basic mecha-
nism research
Given the large volumes of COVID-19 related text in
the literature and in electronic health records (EHRs), it
is impossible for humans to extract useful information
from what is available in a short period of time. In such
cases, Natural Language Processing (NLP) is required,
and ontology can be used to significantly enhance the
performance of NLP [85–87].
Understanding how pathogen and host genes inter-
act during infection can help to identify critical targets
of intervention or prevention. In this connection CIDO
has been used to support literature mining in relation to
the molecular host-coronavirus interactions. SciMiner,
our in-house tool for mining scientific literature using
dictionary- and rule-based methods [88], has been inte-
grated with biomedical ontologies and applied to the
study of vaccine-associated gene interaction networks
[89, 90]. Using coronavirus-specific genes and proteins
covered in CIDO and in the Interaction Network Ontol-
ogy (INO) [91], we have applied SciMiner to perform
literature mining on host-coronavirus interactions.
Figure 8 illustrates a gene-gene interaction network we
constructed in February 2022 using a subset of SciMiner
mining results from > 220 K COVID-19-related articles
in LitCovid [92]. Two noticeable subclusters were iden-
tified, largely related to viral invasion (right), involving
Fig. 7 Query CIDO amino acid (AA) variants for Delta and Omicron strain comparison and basic transmission and virulence mechanism
understanding. A DL query for AA variants shared by Delta and Omicron strains. B DL query for amino acid variants that belong to Omicron. C DL
query for amino acid variants that belong to Delta. Current AA variants for Omicron and Delta strains are also characteristic AA variants
S protein and host genes such as ACE2 and TMPRSS2,
and host immune response (left), including cytokines and
proinflammatory responses. This network summarizes
the major host-pathogen interactions of SARS-CoV-2
virus and host and can be further expanded with other
vaccine components and serve as the foundation for min-
ing analyses.
CIDO has also been used in EHR mining from clinical
COVID-19 patient data in a recently proposed open NLP
development framework that addresses the issues of NLP
process heterogeneity and human factor variations [93].
A COVID-19 NLP algorithm was developed under the
open NLP development framework. Specifically, the algorithm,
shared through the Open Health NLP (OHNLP)
(https://github.com/OHNLP), was first used to identify
COVID-19-associated terms including various signs and
symptoms (e.g., cough and fever) from the EHR notes of
COVID-19 patients from three N3C participant institu-
tions, including Mayo Clinic, the University of Kentucky,
and the University of Minnesota at Twin Cities. The identified
terms were then mapped to the codes represented
in CIDO. These codes are primarily imported from reference
ontologies such as HPO and also cross-referenced
to other ontologies or terminologies including UMLS
[94], SNOMED-CT [95], MeSH [96], and MedDRA [97].
The usage of CIDO in the open NLP development framework
supports the normalization of clinical NLP results
from different N3C participant sites, leading to enhanced
data integration and analysis in the future.
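The normalization step, in which terms identified in notes are mapped to ontology codes, can be sketched as a dictionary lookup. The Python example below is not the OHNLP algorithm; it is a minimal dictionary-based stand-in. The synonym list is an assumption, and the two identifiers are believed to be the HPO codes for Fever and Cough but should be checked against the current HPO release.

```python
import re

# Surface forms -> ontology codes. The identifiers are believed to be the HPO
# codes for Fever and Cough and should be verified before use; the synonym
# entries are assumptions made for this sketch.
TERM_TO_CODE = {
    "fever": "HP:0001945",
    "pyrexia": "HP:0001945",
    "cough": "HP:0012735",
    "coughing": "HP:0012735",
}


def normalize(note: str) -> set:
    """Return the codes whose surface forms occur as whole words in the note."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return {TERM_TO_CODE[token] for token in tokens if token in TERM_TO_CODE}


if __name__ == "__main__":
    print(sorted(normalize("Patient reports Fever and coughing.")))
```

Because different sites may record “pyrexia” or “fever”, mapping both to one code is what makes results comparable across institutions. This sketch ignores negation (for example “no fever”), which a clinical NLP pipeline would need to handle.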
(4) CIDO-based machine learning and drug cocktail
design for COVID-19 treatment
Anti-coronaviral drug design has been our first CIDO
use case since the beginning of CIDO development
[12] and we have systematically collected SARS/MERS/
SARS-2 drug data for this purpose [52, 62], along with
SARS-CoV-2 specific drug and host-coronavirus PPI
data. These data have been used for machine learning
and cocktail drug design as detailed below.
The drug-target linkage knowledge recorded in CIDO
has been used to support candidate COVID-19 drug
prediction (Smaili et al., WCO-2020:
https://github.com/CIDO-ontology/WCO). Specifically, the OPA2Vec
machine learning method [98] was used to transform
the CIDO knowledge and other related information to
Fig. 8 Host‑SARS‑CoV‑2 gene‑gene interaction network using SciMiner on the litCovid paper abstracts. Color represents the type of genes: pink
(viral), green (host gene directly co‑cited with pathogen genes at the sentence level), and cyan (host gene co‑cited with the green host genes in
at least 30 or more COVID‑19 papers). Node size corresponds to the number of connections and edge thickness corresponds to the number of
co‑citing papers
vectors, which were further used as the input to predict
the drugs targeted for COVID-19. Our preliminary study
found that the drugs against SARS-CoV-2 exhibit pat-
terns which overlap with but are yet different from exper-
imentally identified drug candidates against SARS-CoV
and MERS-CoV [99]. More detailed information is being
produced and analyzed.
It is still a major challenge to develop a fully effective
drug for COVID-19 treatment. Hundreds of chemicals
and drugs have been experimentally verified to have
anti-coronavirus function [52, 100]. Paxlovid from Pfizer,
Molnupiravir from Merck, and Remdesivir [101] have
been authorized for emergency usage; however, their
effectiveness remains low. In our previous paper, we proposed
a host-coronavirus interaction (HCI) checkpoint
cocktail that would interrupt the important checkpoints
in the dynamic host-coronavirus interaction (HCI) net-
work [62]. We hypothesized that such a cocktail of drugs
would be more effective than the current COVID-19 vaccines.
The question is then how to design this cocktail
by identifying the HCI checkpoints and inferring how to
interrupt them.
CIDO provides a solution to support rational HCI
checkpoint classification and cocktail drug design as laid
out in the above cocktail hypothesis. As earlier described
and shown in Fig.5, CIDO logically represents host-cor-
onavirus protein-protein interactions (PPIs) and drugs
targeting the viral or host proteins in the PPIs. Different
proteins and PPIs have different roles in the HCI lead-
ing to disease outcomes. Major checkpoints such as the
coronavirus entry (through S-ACE2 binding) and rep-
lication can then be defined. Interestingly, all the three
drugs, Paxlovid (consisting of nirmatrelvir and ritona-
vir), Molnupiravir, and Remdesivir function by inhibiting
enzymes responsible for coronavirus replication. Specifi-
cally, nirmatrelvir inhibits SARS-CoV-2 3C-like protease
(i.e., nsp5) to stop the virus from replicating (Fig. 5),
and ritonavir slows down nirmatrelvir’s breakdown to
help keep it in the body for longer at higher concentra-
tions. is 3C-like protease is responsible for cleaving
polyproteins 1a and 1ab of SARS-CoV-2 into nonstruc-
tural proteins that are critical for viral replication. Mol-
nupiravir and Remdesivir interfere with the action of
RNA-directed RNA polymerase (RdRp), which is critical
to viral replication as well. Based on our HCI checkpoint
cocktail hypothesis, we would propose to include a drug
targeting the viral entry, which can be used together with
one of the existing drugs targeting the viral replication. A
deeper CIDO-based study is ongoing to apply CIDO for
the cocktail drug design.
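
The checkpoint reasoning above can be sketched as a small coverage check. The drug-to-target assignments come from the paragraph above; the dictionary layout, the function name `uncovered_checkpoints`, and the placeholder `hypothetical_entry_inhibitor` are assumptions made for illustration and are not part of CIDO.

```python
# Drug -> (molecular target, HCI checkpoint), as described in the text above.
DRUG_CHECKPOINTS = {
    "nirmatrelvir": ("3C-like protease (nsp5)", "replication"),
    "molnupiravir": ("RdRp", "replication"),
    "remdesivir": ("RdRp", "replication"),
}

# Checkpoints named in the text: viral entry (S-ACE2 binding) and replication.
REQUIRED_CHECKPOINTS = {"entry", "replication"}


def uncovered_checkpoints(cocktail, drug_checkpoints=DRUG_CHECKPOINTS,
                          required=REQUIRED_CHECKPOINTS):
    """Return the required checkpoints that no drug in the cocktail targets."""
    covered = {drug_checkpoints[d][1] for d in cocktail if d in drug_checkpoints}
    return required - covered


# Two replication inhibitors still leave viral entry uncovered.
missing = uncovered_checkpoints(["nirmatrelvir", "remdesivir"])
print(sorted(missing))  # ['entry']

# Adding a placeholder entry inhibitor (hypothetical, not a real drug) closes the gap.
extended = dict(DRUG_CHECKPOINTS)
extended["hypothetical_entry_inhibitor"] = ("S-ACE2 binding", "entry")
complete = uncovered_checkpoints(
    ["nirmatrelvir", "hypothetical_entry_inhibitor"], extended
)
print(sorted(complete))  # []
```

In an ontology-backed version, the dictionary would be replaced by a query over the drug-target and PPI axioms, but the coverage logic would stay the same.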
We (authors: ZW and YH) have implemented the cocktail
strategy in our newly developed DrugXplore program
(http://medcode.link/drugxplore/), which extends the
OmicsViz program [8, 64]. Specifically, we used the host-coronavirus
PPI and drug-target interaction data represented
in CIDO and other resources such as BioGRID
[102] to find drugs targeting different HCI processes.
Figure 9 shows one result of our DrugXplore data analysis.
A total of 232 drugs were identified that target three coronavirus
processes (i.e., viral entry, genome replication,
and viral release) and/or one host anti-coronaviral process
(i.e., cytokine activity); two of them (i.e., copper
and artenimol) target all four processes
(Fig. 9). Many reports have found that copper, artenimol,
and their derivatives are promising candidate drugs for
COVID-19 treatment [103–108].
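
The screening result described above amounts to a set intersection over per-process drug lists. The following is a minimal sketch under stated assumptions: only the fact that copper and artenimol appear in all four processes comes from the text, while `drug_X`, `drug_Y`, `drug_Z`, and their memberships are hypothetical stand-ins for the other identified drugs. It does not reproduce DrugXplore.

```python
from collections import Counter

# Process -> drugs with at least one protein target in that process.
# Only copper and artenimol are taken from the text; the rest is toy data.
PROCESS_TO_DRUGS = {
    "viral entry": {"copper", "artenimol", "drug_X"},
    "genome replication": {"copper", "artenimol", "drug_Y"},
    "viral release": {"copper", "artenimol"},
    "cytokine activity": {"copper", "artenimol", "drug_X", "drug_Z"},
}


def drugs_hitting_all(process_to_drugs):
    """Drugs that appear in the target list of every process."""
    return set.intersection(*process_to_drugs.values())


def process_counts(process_to_drugs):
    """Number of processes each drug is associated with."""
    counts = Counter()
    for drugs in process_to_drugs.values():
        counts.update(drugs)
    return counts


shared = drugs_hitting_all(PROCESS_TO_DRUGS)
counts = process_counts(PROCESS_TO_DRUGS)
print(sorted(shared))          # ['artenimol', 'copper']
print(counts.most_common(3))
```

Counting processes per drug, rather than only intersecting, also surfaces drugs that cover two or three checkpoints and could be paired in a cocktail.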
This manuscript provides a comprehensive update on
the development and applications of the community-based
Coronavirus Infectious Disease Ontology (CIDO).
Our study demonstrates that CIDO provides an ideal
platform to integrate important data needed to research
different coronavirus disease-related entities such as
coronavirus and host taxonomy, coronavirus proteins and
genes, protein variants, epidemiology, diagnostic medical
devices, phenotypes, host-coronavirus interactions,
drugs, and vaccines. The ontological representation of
CIDO supports integrative representation and analysis
of COVID-19 and other human coronavirus diseases. A
visual evolution analysis of CIDO was performed. Five
groups of CIDO applications were introduced:
COVID-19 data annotation and inferencing, Delta and
Omicron comparisons, clinical data analysis, NLP, and
COVID-19 drug repurposing.
Given the intensive coronavirus research during the
COVID-19 pandemic, we have been very active in
developing and applying CIDO. Within a little
more than 2 years, CIDO has grown to include over
10,000 terms, of which over 1500 are CIDO
specific. Meanwhile, we acknowledge that CIDO does not yet
cover all related areas, and some areas of representation
(e.g., host-coronavirus interactions, epidemiology,
and public health) are still not fully covered. Many applications
(e.g., machine learning, N3C data analysis, and
drug repurposing design) have started but still need more
time to achieve breakthrough outcomes. Nevertheless, we
have demonstrated considerable progress and many achievements in
different applications in this manuscript.
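
The split between imported and CIDO-specific terms can be tallied from term identifiers, because OBO term IRIs follow the pattern `http://purl.obolibrary.org/obo/PREFIX_NUMBER`. The sketch below is a minimal illustration, assuming the term IRIs have already been extracted into a list; the IRIs in `toy_iris` are placeholders chosen to show the pattern, and the resulting counts are not CIDO's real statistics.

```python
from collections import Counter

OBO_BASE = "http://purl.obolibrary.org/obo/"


def term_prefix(iri):
    """Return the ontology prefix of an OBO-style IRI, e.g. 'CIDO' for .../CIDO_0000001."""
    local_id = iri.rsplit("/", 1)[-1]
    return local_id.split("_", 1)[0]


def count_by_prefix(iris):
    """Count terms per source ontology prefix."""
    return Counter(term_prefix(iri) for iri in iris)


# Placeholder IRIs for illustration only.
toy_iris = [
    OBO_BASE + "CIDO_0000001",
    OBO_BASE + "CIDO_0000002",
    OBO_BASE + "VO_0000001",
    OBO_BASE + "HP_0000001",
    OBO_BASE + "NCBITaxon_2697049",
]

prefix_counts = count_by_prefix(toy_iris)
cido_share = prefix_counts["CIDO"] / len(toy_iris)
print(prefix_counts)
print(f"CIDO-specific share: {cido_share:.0%}")
```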
An ongoing CIDO development effort is to actively
model and represent various mechanisms of the molecular
and cellular interactions between hosts and
coronaviruses. Such modeling will provide the foundation
for our rational drug repurposing and vaccine
development. For example, in our previous drug studies
[52, 62], we extracted and analyzed the interactions
between anti-coronavirus drugs and their target proteins.
These anti-coronavirus drugs were identified to
be effective against coronavirus infections in vitro or
in vivo. It is likely that some of the drug targets participate
in active host-SARS-CoV-2 interactions leading to
severe COVID-19 disease outcomes. Deeper modeling
and representation of the intricate host-virus-drug
interactions would help us in better drug repurposing.
We will continue our effort
to harmonize different COVID-19-related ontologies
[14]. We will continue to update CIDO to handle the
description of coronaviral variants. This is to account
for immune escape and for previously designed treatments
and vaccines losing efficacy. We will keep using
CIDO as a platform to standardize different coronavirus-related
metadata types and apply them for the
standardization and enhanced analysis of specific conditions
defined in different experimental and clinical
studies, and of how these conditions affect
disease outcomes. We will also identify and develop
more applications that implement CIDO for different
Being a community-based ontology, CIDO is committed
to serving the community and to drawing on
contributions from the community. CIDO is created
to be open and freely available for use. It is an interoperable
ontology that reuses and interlinks with existing
ontologies and resources. We are always ready to accept
new ideas and critiques. More researchers and developers
are welcome to join our community-based effort to
advance CIDO and its applications.
Abbreviations
AKI: Acute Kidney Injury; BFO: Basic Formal Ontology; ChEBI: Chemical Entities
of Biological Interest; CIDO: Coronavirus Infectious Disease Ontology; DL
query: Description Logics query; DO: Disease Ontology; DRON: Drug Ontology;
GO: Gene Ontology; IAO: Information Artifact Ontology; INO: Interaction
Network Ontology; N3C: National COVID Cohort Collaborative; NCBITaxon:
NCBI organismal classification; OBI: Ontology for Biomedical Investigations;
OBO: The Open Biological and Biomedical Ontologies; OWL: Web Ontology
Language; PR: Protein Ontology; RDF: Resource Description Framework;
SPARQL: SPARQL Protocol and RDF Query Language; UBERON: Uberon
multi-species anatomy ontology.
Fig. 9 SARS-CoV-2 drug screening based on the drug cocktail strategy. A total of 232 drugs were identified whose protein targets are involved in
three coronavirus processes (i.e., viral entry, genome replication, and viral release) and/or one host anti-coronaviral process (i.e., cytokine activity). Two
drugs (i.e., copper and artenimol) have protein targets involved in all four processes. The drug screening study was performed using
the DrugXplore program (http://medcode.link/drugxplore/).
Supplementary Information
The online version contains supplementary material available at
https://doi.org/10.1186/s13326-022-00279-z.
Additional file 1: Supplemental file 1. Visualization of the Evolution of
Additional file 2: Supplemental Table 1. Resources used for our coronavirus
disease-related data collection.
Additional file 3: Supplemental Table 2. CIDO statistics including terms
imported from major reference ontologies.
Additional file 4: Supplemental Table 3. Protein Ontology representation
of SARS-CoV-2 proteins. Comparative information in RefSeq and
UniProtKB is also provided.
Acknowledgements
We acknowledge Dr. Melissa A Haendel’s contribution as a source of ontological
content and the N3C use case.
Authors’ contributions
YH: CIDO developer and co-initiator, use case modeling, project design.
HY: CIDO developer and co-initiator, COVID-19 diagnosis and pathogenesis
domain expert. AH: CIDO developer, collection and modeling of coronavirus
and amino acid variants and host-coronavirus interactions. AYL: CIDO developer,
esp. in COVID-19 diagnosis and medical device branches. DAN: Protein
Ontology term addition to CIDO. JB: VIDO developer, IDO-COVID-19 developer,
merging of IDO-COVID-19 into CIDO. LZ and YP: CIDO visual and summarization
analysis. ZW: Drug cocktail analysis. YL, JD and ZS: Collection, annotation, and
CIDO representation of anti-coronaviral drugs and drug targets. EO: Programming
and technical support. YW and XYe: COVID-19 phenotype and host
annotation and ontology modeling. PH: Ontological COVID-19 vaccine representation.
LT and YH: OMOP mapping to OBO ontology. ES, RD, SP, LZ, and YH:
COVID-19 associated AKI molecular mechanism analysis and modeling. HH,
LC and JH: host-coronavirus protein-protein interaction mining and analysis.
YTian and YTang: Addition of Chinese diagnosis kits to CIDO. WDD and SA:
CIDO modeling as ontology experts. LMS: Disease ontology developer. JZ and
AMM: Immune response modeling. LW and HL: Clinical data NLP. FZS and RH:
CIDO-supported machine learning and drug prediction. ZMP and PR: COVoc
developers and CIDO collaborators. JX: COVID-19 vaccine adverse event analysis.
YT: CIDO usage in China. GSO and BA: COVID-19 and ontology experts. BS:
BFO developer ensuring CIDO alignment with BFO, ontological modeling and
consultation. All authors contributed to manuscript discussion and preparation.
The authors read and approved the final manuscript.
Funding
This project is supported by NIH grants 1UH2AI132931 (to YH) and
1U24AI171008 (to YH and JH); U24CA210967 and P30ES017885 (to GSO);
R01GM080646, 1UL1TR001412, 1U24CA199374, and 1T15LM012495 (to
BS); the National Natural Science Foundation of China 61801067 (to JX); the
Natural Science Foundation of Chongqing CSTC2018JCYJAX0243 (to JX); the
non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences
2019PT320003 (to HY); and Undergraduate Research Opportunity Program
(UROP) and University of Michigan Medical School Global Reach award
(to YH). The work of ZMP and PR was supported by Open Targets (OTAR005).
Publication costs are paid by a discretionary fund from Dr. William King, the
director of the Unit for Laboratory Animal Medicine (ULAM) in the University
of Michigan, Ann Arbor, MI, USA.
Availability of data and materials
Related data, including the CIDO source code, are freely available on the GitHub
website https://github.com/CIDO-ontology/cido.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 University of Michigan Medical School, Ann Arbor, MI, USA. 2 People’s Hospital
of Guizhou Province, Guiyang, Guizhou, China. 3 National Human Genome
Research Institute, National Institutes of Health, Bethesda, MD, USA. 4 National
Center for Ontological Research, Buffalo, NY, USA. 5 Georgetown University
Medical Center, Washington, DC, USA. 6 The Johns Hopkins University Applied
Physics Laboratory, Laurel, MD, USA. 7 Computer Science and Software
Engineering Department, Monmouth University, West Long Branch, NJ, USA.
8 Department of Computer Science, New Jersey Institute of Technology, Newark,
NJ, USA. 9 Institute of Basic Medical Sciences, Chinese Academy of Medical
Sciences & School of Basic Medicine, Peking Union Medical College, Beijing,
China. 10 National Yang-Ming University, Taipei, Taiwan. 11 Rutgers University,
New Brunswick, NJ, USA. 12 University at Buffalo, Buffalo, NY 14260, USA.
13 University of Florida, Gainesville, FL, USA. 14 OntoPro LLC, Houston, TX, USA.
15 University of Maryland School of Medicine, Baltimore, MD, USA. 16 Department
of Biology, University of Pennsylvania Perelman School of Medicine,
Philadelphia, PA, USA. 17 Office of Data Science, National Institute of Environmental
Health Sciences, Research Triangle Park, NC, USA. 18 Mayo Clinic, Rochester,
MN, USA. 19 King Abdullah University of Science and Technology, Thuwal,
Saudi Arabia. 20 European Bioinformatics Institute (EMBL-EBI), Wellcome
Genome Campus, Hinxton, Cambridgeshire, UK. 21 School of Bioinformatics,
Chongqing University of Posts and Telecommunications, Chongqing, China.
22 Cepheid, Danaher Diagnostic Platform, Shanghai, China. 23 National Institute
of Health Data Science, Peking University, Beijing, China. 24 Shanghai Institute
of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai,
China. 25 University of North Dakota School of Medicine and Health Sciences,
Grand Forks, ND, USA.
Received: 17 May 2022 Accepted: 13 September 2022
References
1. Centers for Disease Control and Prevention. Revised US surveillance case definition for severe acute respiratory syndrome (SARS) and update on SARS cases - United States and worldwide, December 2003. MMWR Morb Mortal Wkly Rep. 2003;52(49):1202.
2. Bernard-Stoecklin S, Nikolay B, Assiri A, Bin Saeed AA, Ben Embarek PK, El Bushra H, et al. Comparative analysis of eleven healthcare-associated outbreaks of Middle East respiratory syndrome coronavirus (Mers-Cov) from 2015 to 2017. Sci Rep. 2019;9(1):7385.
3. Coronavirus disease (COVID-19) pandemic. https://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/novel-coronavirus-2019-ncov.
4. Liu SL, Saif L. Emerging viruses without Borders: the Wuhan coronavirus. Viruses. 2020;12(2).
5. Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM, et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 2020;583(7816):459–68.
6. Torres-Castro R, Vasconcello-Castillo L, Alsina-Restoy X, Solis-Navarro L, Burgos F, Puppo H, et al. Respiratory function in patients post-infection by COVID-19: a systematic review and meta-analysis. Pulmonology.
7. Huffman A, Ong E, Hur J, D’Mello A, Tettelin H, He Y. COVID-19 vaccine design using reverse and structural vaccinology, ontology-based literature mining and machine learning. Brief Bioinform. 2022;23(4):bbac190. https://pubmed.ncbi.nlm.nih.gov/35649389/.
8. Wang Z, He Y, Huang J, Yang X. Integrative web-based analysis of omics data for study of drugs against SARS-CoV-2. Sci Rep. 2021;11(1):10763.
9. SeyedAlinaghi S, Mirzapour P, Dadras O, Pashaei Z, Karimi A, MohsseniPour M, et al. Characterization of SARS-CoV-2 different variants and related morbidity and mortality: a systematic review. Eur J Med Res.
10. Andreu-Perez J, Poon CC, Merrifield RD, Wong ST, Yang GZ. Big data for health. IEEE J Biomed Health Inform. 2015;19(4):1193–208.
11. Higdon R, Haynes W, Stanberry L, Stewart E, Yandl G, Howard C, et al. Unraveling the complexities of life sciences data. Big Data.
12. He Y, Yu H, Ong E, Wang Y, Liu Y, Huffman A, et al. CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis. Sci Data. 2020;7(1):181.
13. He Y, Yu H, Ong E, Wang Y, Liu Y, Huffman A, et al. CIDO: The community-based coronavirus infectious disease ontology. In: Proceedings of the 11th International Conference on Biomedical Ontologies (ICBO) and 10th Workshop on Ontologies and Data in Life Sciences (ODLS) (2021). Bolzano: CEUR Workshop Proceedings; 2020. p. E.1–10.
14. Lin A, Yamagata Y, Duncan WD, Carmody LC, Kushida T, Masuya H, et al. A community effort for COVID-19 ontology harmonization. In: The 12th International Conference on Biomedical Ontologies; 2021.
15. Huffman A, Masci AM, Zheng J, Sanati N, Brunson T, Wu G, et al. CIDO ontology updates and secondary analysis of host responses to COVID-19 infection based on ImmPort reports and literature. J Biomed Semantics. 2021;12(1):18.
16. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–82.
17. Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44(D1):D1214–9.
18. He Y, Xiang Z, Zheng J, Lin Y, Overton JA, Ong E. The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperability. J Biomed Semantics. 2018;9(1):3.
19. Xiang Z, Courtot M, Brinkman RR, Ruttenberg A, He Y. OntoFox: web-based support for ontology reuse. BMC Res Notes. 2010;3(175):1–12.
20. Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, et al. The ontology for biomedical investigations. Plos One.
21. Schriml LM, Munro JB, Schor M, Olley D, McCracken C, Felix V, et al. The human disease ontology 2022 update. Nucleic Acids Res.
22. Kohler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, et al. The human phenotype ontology in 2021. Nucleic Acids Res. 2021;49(D1):D1207–17.
23. Babcock S, Beverley J, Cowell LG, Smith B. The infectious disease ontology in the age of COVID-19. J Biomed Semantics. 2021;12(1):13.
24. Arp R, Smith B, Spear AD. Building ontologies with basic formal ontology. Cambridge: MIT Press; 2015.
25. Xiang Z, Zheng J, Lin Y, He Y. Ontorat: Automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns. J Biomed Semantics. 2015;6(1):4–10.
26. Musen MA. The Protégé project: A look back and a look forward. AI Matters. 2015;1(4). https://doi.org/10.1145/2557001.25757003.
27. Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, et al. Relations in biomedical ontologies. Genome Biol. 2005;6(5):R46.
28. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018.
29. Schriml LM, Chuvochina M, Davies N, Eloe-Fadrosh EA, Finn RD, Hugenholtz P, et al. COVID-19 pandemic reveals the peril of ignoring metadata standards. Sci Data. 2020;7(1):188.
30. Loeffelholz MJ, Tang YW. Laboratory diagnosis of emerging human coronavirus infections - the state of the art. Emerg Microbes Infect.
31. Tang YW, Schmitz JE, Persing DH, Stratton CW. Laboratory diagnosis of COVID-19: current issues and challenges. J Clin Microbiol. 2020;58(6):e00512. https://pubmed.ncbi.nlm.nih.gov/32245835/.
32. Tao X, Yuan G, Rao S, Li D, Liu Y, Zhang X, et al. Distinct RT-PCR diagnosis profiles of father and son patients of COVID-19 using nasopharyngeal and alveolar lavage fluid samples. Inflamm Cell Signal. 2020;7:e1164. https://www.smartscitech.com/index.php/ics/article/view/1164.
33. Ochs C, Geller J, Perl Y, Musen MA. A unified software framework for deriving, visualizing, and exploring abstraction networks for ontologies. J Biomed Inform. 2016;62:90–105.
34. Zheng L, Perl Y, He Y, Ochs C, Geller J, Liu H, et al. Visual comprehension and orientation into the COVID-19 CIDO ontology. J Biomed Inform.
35. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 2017;22(13):30494. https://pubmed.ncbi.nlm.nih.gov/28382917/.
36. Rambaut A, Holmes EC, O’Toole A, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403–7.
37. Brown GR, Hem V, Katz KS, Ovetsky M, Wallin C, Ermolaeva O, et al. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 2015;43(Database issue):D36–42.
38. UniProt C. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.
39. Yoshimoto FK. A biochemical perspective of the nonstructural proteins (NSPs) and the spike protein of SARS CoV-2. Protein J.
40. Pavesi A. Prediction of two novel overlapping ORFs in the genome of SARS-CoV-2. Virology. 2021;562:149–57.
41. Xu K, Zheng BJ, Zeng R, Lu W, Lin YP, Xue L, et al. Severe acute respiratory syndrome coronavirus accessory protein 9b is a virion-associated protein. Virology. 2009;388(2):279–85.
42. Jungreis I, Sealfon R, Kellis M. SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes. Nat Commun. 2021;12(1):2642.
43. Zhang L, Jackson CB, Mou H, Ojha A, Peng H, Quinlan BD, et al. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity. Nat Commun. 2020;11(1):6013.
44. Wang Y, Zhang F, Byrd JB, Yu H, Ye X, He Y. Differential COVID-19 symptoms given pandemic locations, time, and comorbidities during the early pandemic. Front Med (Lausanne). 2022;9:770031.
45. Lin L, Li Q, Wang Y, Shi Y. Syncytia formation during SARS-CoV-2 lung infection: a disastrous unity to eliminate lymphocytes. Cell Death Differ.
46. Puhach O, Adea K, Hulo N, Sattonnet P, Genecand C, Iten A, et al. Infectious viral load in unvaccinated and vaccinated individuals infected with ancestral, Delta or omicron SARS-CoV-2. Nat Med.
47. Singanayagam A, Hakki S, Dunning J, Madon KJ, Crone MA, Koycheva A, et al. Community transmission and viral load kinetics of the SARS-CoV-2 delta (B.1.617.2) variant in vaccinated and unvaccinated individuals in the UK: a prospective, longitudinal, cohort study. Lancet Infect Dis.
48. Achaiah NC, Subbarajasetty SB, Shetty RM. R0 and re of COVID-19: can we predict when the pandemic outbreak will be contained? Indian J Crit Care Med. 2020;24(11):1125–7.
49. Rahman B, Sadraddin E, Porreca A. The basic reproduction number of SARS-CoV-2 in Wuhan is about to die out, how about the rest of the world? Rev Med Virol. 2020;30(4):e2111.
50. Liu Y, Rocklov J. The reproductive number of the Delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus. J Travel Med. 2021;28(7):taab124. https://pubmed.ncbi.nlm.nih.gov/34369565.
51. Ito K, Piantham C, Nishiura H. Relative instantaneous reproduction number of omicron SARS-CoV-2 variant with respect to the Delta variant in Denmark. J Med Virol. 2022;94(5):2265–8.
52. Liu Y, Chan W, Wang Z, Hur J, Xie J, Yu H, et al. Ontological and bioinformatic analysis of anti-coronavirus drugs and their implication for drug repurposing against COVID-19. Preprints; 2020. p. 2020030413.
53. Collazos J, Domingo P, Fernandez-Araujo N, Asensi-Diaz E, Vilchez-Rueda H, Lalueza A, et al. Exposure to valproic acid is associated with less pulmonary infiltrates and improvements in diverse clinical outcomes and laboratory parameters in patients hospitalized with COVID-19. Plos One. 2022;17(1):e0262777.
54. Ozgur A, Xiang Z, Radev DR, He Y. Mining of vaccine-associated IFN-gamma gene interaction networks using the vaccine ontology. J Biomed Semantics. 2011;2(Suppl 2):S8.
55. Huang PC, Goru R, Huffman A, Yu Lin A, Cooke MF, He Y. Cov19VaxKB: a web-based integrative COVID-19 vaccine knowledge Base. Vaccine X. 2021;100139. https://pubmed.ncbi.nlm.nih.gov/34981039/.
56. Ong E, Wong MU, Huffman A, He Y. COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning. Front Immunol. 2020;11:1581.
57. Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, et al. The national COVID cohort collaborative (N3C): rationale, design, infrastructure, and deployment. J Am Med Inform Assoc.
58. Ceusters W, Blaisure J. A realism-based view on counts in OMOP’s common data model. Stud Health Technol Inform. 2017;237:55–62.
59. Ong E, Wang LL, Schaub J, O’Toole JF, Steck B, Rosenberg AZ, et al. Modelling kidney disease using ontology: insights from the kidney precision medicine project. Nat Rev Nephrol. 2020;16(11):686–96.
60. He Y, Ong E, Schaub J, Dowd F, O’Toole JF, Siapos A, et al. OPMI: the ontology of precision medicine and investigation and its support for clinical data and metadata representation and analysis. Buffalo: Proceedings of the 10th International Conference on Biomedical Ontology (ICBO-2019); 2019;2931:1–10. http://ceur-ws.org/Vol-2931/ICBO_2019_paper_34.pdf.
61. Callahan TJ, Wyrwa JM, Vasilevsky NA, Robinson PN, Haendel MA. OMOP2OBO: Semantic Integration of Standardized Clinical Terminologies to Power Translational Digital Medicine Across Health Systems. In: 2020 OHDSI Symposium: Virtual meeting; 2020. https://www.ohdsi.org/wp-content/uploads/2020/10/Tiffany-Callahan-Tiffany-Callahan_OMOP2OBO_2020symposium.pdf.
62. Liu Y, Hur J, Chan WKB, Wang Z, Xie J, Sun D, et al. Ontological modeling and analysis of experimentally or clinically verified drugs against coronavirus infection. Sci Data. 2021;8(1):16.
63. Liu Y, Ju W, Steck B, Jain S, Kretzler M, He Y. Ontology-based modeling, representation, and analysis of biomarkers in healthy and disease kidney tissue. Bolzano: Proceedings of the 12th International Conference on Biomedical Ontologies (ICBO 2021); 2021;3073:70-6. http://ceur-ws.org/Vol-3073/paper8.pdf.
64. Wang Z, He Y. Precision omics data integration and analysis with interoperable ontologies and their application for COVID-19 research. Brief Funct Genomics. 2021;20(4):235–48.
65. Aronskyy I, Masoudi-Sobhanzadeh Y, Cappuccio A, Zaslavsky E. Advances in the computational landscape for repurposed drugs against COVID-19. Drug Discov Today. 2021;26(12):2800–15.
66. Turki H, Hadj Taieb MA, Shafee T, Lubiana T, Jemielniak D, Aouicha MB, et al. Representing COVID-19 information in collaborative knowledge graphs: the case of Wikidata. Semantic Web. 2022;(Preprint):1–32.
67. Kaladevi R, Revathi A. Semantic and NLP-based retrieval from Covid-19 ontology. Machine Learn Healthc Appl. 2021;261–75. https://www.researchgate.net/publication/350998559_Semantic_and_NLP-Based_Retrieval_From_Covid-19_Ontology.
68. CIDO in the Population Health Data Archive in China. https://www.ncmi.cn/phda/dataDetails.do?id=CSTR:A0006.17.Z00Q3.202003.000998. Accessed 9 Oct 2022.
69. CIDO in MedPortal. https://medportal.bmicc.cn/ontologies/CIDO. Accessed 9 Oct 2022.
70. Zheng X, Xiao Y, Song W, Tong F, Liu S, Zhao D. COVID19-OBKG: an ontology-based knowledge graph and web service for COVID-19. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM): IEEE; 2021. p. 2456–62.
71. Shah E, Desai R, Peng S, Zhang L, He Y. Ontology modeling and analysis of COVID-19 associated acute kidney injury and its underlying molecular mechanisms. Inflammation. 2021;34015061:19273246.
72. Huang G, Peng S, Zhang L, He Y. Identification and ontology term enrichment analysis of genes associated with COVID-19 and acute kidney disease. Bolzano: Proceedings of The 12th International Conference on Biomedical Ontologies (ICBO 2021); 2021;3073:110-5. http://ceur-ws.org/Vol-3073/paper15.pdf.
73. Dhawan M, Sharma A, Priyanka TN, Rajkhowa TK, Choudhary OP. Delta variant (B.1.617.2) of SARS-CoV-2: Mutations, impact, challenges and possible solutions. Hum Vacc Immunother. 2022;18(5):2068883. https://pubmed.ncbi.nlm.nih.gov/35507895/.
74. Fan Y, Li X, Zhang L, Wan S, Zhang L, Zhou F. SARS-CoV-2 omicron variant: recent progress and future perspectives. Signal Transduct Target Ther. 2022;7(1):141.
75. Mallapaty S. COVID-19: How Omicron overtook Delta in three charts. Nature. 2022. https://doi.org/10.1038/d41586-022-00632-3, https://www.nature.com/articles/d41586-022-00632-3, https://pubmed.ncbi.nlm.nih.gov/35246640/.
76. Thakur S, Sasi S, Pillai SG, Nag A, Shukla D, Singhal R, et al. SARS-CoV-2 mutations and their impact on diagnostics, Therapeutics and Vaccines. Front Med (Lausanne). 2022;9:815389.
77. Li Z, Zhang JZH. Mutational effect of some major COVID-19 variants on binding of the S protein to ACE2. Biomolecules. 2022;12(4):572. https://pubmed.ncbi.nlm.nih.gov/35454161/.
78. Angeletti S, Benvenuto D, Bianchi M, Giovanetti M, Pascarella S, Ciccozzi M. COVID-2019: the role of the nsp2 and nsp3 in its pathogenesis. J Med Virol. 2020;92(6):584–8.
79. Kannan SR, Spratt AN, Sharma K, Chand HS, Byrareddy SN, Singh K. Omicron SARS-CoV-2 variant: unique features and their impact on pre-existing antibodies. J Autoimmun. 2022;126:102779.
80. McCallum M, De Marco A, Lempp FA, Tortorici MA, Pinto D, Walls AC, et al. N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. Cell. 2021;184(9):2332–2347 e2316.
81. Wang Z, Schmidt F, Weisblum Y, Muecksch F, Barnes CO, Finkin S, et al. mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants. Nature. 2021;592(7855):616–22.
82. Kemp SA, Collier DA, Datir RP, Ferreira I, Gayed S, Jahun A, et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature.
83. Cubuk J, Alston JJ, Incicco JJ, Singh S, Stuchell-Brereton MD, Ward MD, et al. The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA. Nat Commun. 2021;12(1):1936.
84. Wu H, Xing N, Meng K, Fu B, Xue W, Dong P, et al. Nucleocapsid mutations R203K/G204R increase the infectivity, fitness, and virulence of SARS-CoV-2. Cell Host Microbe. 2021;29(12):1788–1801 e1786.
85. Erekhinskaya T, Strebkov D, Patel S, Balakrishna M, Tatu M, Moldovan D. Ten ways of leveraging ontologies for natural language processing and its enterprise applications. In: Proceedings of The International Workshop on Semantic Big Data; 2020. p. 1–6.
86. Kafkas S, Hoehndorf R. Ontology based mining of pathogen-disease associations from literature. J Biomed Semantics. 2019;10(1):15.
87. Hoehndorf R, Dumontier M, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning. Plos One. 2011;6(7):e22006.
88. Hur J, Schuyler AD, States DJ, Feldman EL. SciMiner: web-based literature mining tool for target identification and functional enrichment analysis. Bioinformatics. 2009;25(6):838–40.
89. Hur J, Ozgur A, He Y. Ontology-based literature mining of E. coli vaccine-associated gene interaction networks. J Biomed Semantics.
90. Hur J, Ozgur A, Xiang Z, He Y. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions. J Biomed Semantics. 2015;6:2.
91. Ozgur A, Hur J, He Y. The interaction network ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature. BioData Mining. 2016;9:41.
92. Chen Q, Allot A, Lu Z. LitCovid: an open database of COVID-19 literature. Nucleic Acids Res. 2021;49(D1):D1534–40.
93. Liu S, Wen A, Wang L, He H, Fu S, Miller R, et al. An Open Natural Language Processing Development Framework for EHR-based Clinical Research: a case demonstration using the National COVID Cohort Collaborative (N3C). arXiv preprint arXiv:2110.10780. 2021.
94. Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database
95. Brown SH, Elkin PL, Bauer BA, Wahner-Roedler D, Husser CS, Temesgen Z, et al. SNOMED CT: utility for a general medical evaluation template. AMIA Annu Symp Proc. 2006;2006:101–5. https://pubmed.ncbi.nlm.nih.gov/17238311/.
96. Lipscomb CE. Medical subject headings (MeSH). Bull Med Libr Assoc.
97. Brown EG, Wood L, Wood S. The medical dictionary for regulatory activities (MedDRA). Drug Saf. 1999;20(2):109–17.
98. Smaili FZ, Gao X, Hoehndorf R. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics. 2019;35(12):2133–40.
99. Smaili FZ, He Y, Gao X, Hoehndorf R. Candidate COVID‑19 Drugs Prediction.
In: Workshop on COVID‑19 Ontologies (WCO‑2020), Oct 30, 2020;
Zoom Virtual; 2020.
100. Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, O’Meara MJ, et al.
A SARS‑CoV‑2‑human protein‑protein interaction map reveals drug
targets and potential drug‑repurposing. bioRxiv. 2020; https://www.biorxiv.org/content/10.1101/2020.1103.1122.002386v002382.
101. Beigel JH, Tomashek KM, Dodd LE, Mehta AK, Zingman BS, Kalil AC, et al.
Remdesivir for the treatment of Covid‑19 ‑ preliminary report. N Engl J
Med. 2020.
102. Oughtred R, Stark C, Breitkreutz BJ, Rust J, Boucher L, Chang C, et al.
The BioGRID interaction database: 2019 update. Nucleic Acids Res.
103. You X, Jiang X, Zhang C, Jiang K, Zhao X, Guo T, et al. Dihydroartemisinin
attenuates pulmonary inflammation and fibrosis in rats by suppressing
JAK2/STAT3 signaling. Aging (Albany NY). 2022;14(3):1110–27.
104. Govind V, Bharadwaj S, Sai Ganesh MR, Vishnu J, Shankar KV, Shankar B,
et al. Antiviral properties of copper and its alloys to inactivate covid‑19
virus: a review. Biometals. 2021;34(6):1217–35.
105. Rani I, Goyal A, Bhatnagar M, Manhas S, Goel P, Pal A, et al. Potential
molecular mechanisms of zinc‑ and copper‑mediated antiviral activity
on COVID‑19. Nutr Res. 2021;92:109–28.
106. Nair MS, Huang Y, Fidock DA, Polyak SJ, Wagoner J, Towler MJ, et al.
Artemisia annua L. extracts inhibit the in vitro replication of SARS‑CoV‑2
and two of its variants. J Ethnopharmacol. 2021;274(114016). https://pubmed.ncbi.nlm.nih.gov/33716085/.
107. Cortes AA, Zuniga JM. The use of copper to help prevent transmission
of SARS‑coronavirus and influenza viruses. A general review. Diagn
Microbiol Infect Dis. 2020;98(4):115176.
108. Sehailia M, Chemat S. Antimalarial‑agent artemisinin and derivatives
portray more potent binding to Lys353 and Lys31‑binding
hotspots of SARS‑CoV‑2 spike protein than hydroxychloroquine:
potential repurposing of artenimol for COVID‑19. J Biomol Struct Dyn.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.