An Ontology and a REST API for Sequence Based Microbial Typing Data


João Almeida1,*, João Tiple1,*, Mário Ramirez2, José Melo-Cristino2,
Cátia Vaz1,3, Alexandre P. Francisco3,4, and João A. Carriço2
1DEETC, ISEL, Poly Inst of Lisbon
2IM / IMM, FM, Univ of Lisbon
3INESC-ID Lisbon
4CSE Dept, IST, Tech Univ of Lisbon
Abstract. In the microbial typing field, a common understanding of the concepts described and the ability to share results within the community are increasingly important requisites for the continued development of portable and accurate sequence-based typing methods. These methods are used for bacterial strain identification and are fundamental tools in Clinical Microbiology and Bacterial Population Genetics studies. In this article we propose an ontology designed for the microbial typing field, focusing on the widely used Multi Locus Sequence Typing methodology, and a RESTful API for accessing information systems based on the proposed ontology. This constitutes an important first step towards accurately describing, analyzing, curating, and managing information for sequence-based microbial typing methodologies, and allows for future integration with data analysis Web services.
Keywords: ontology, knowledge representation, data as a service, microbial typing methods
1 Introduction
Microbial typing methods are fundamental tools for the epidemiological study of bacterial populations [7]. These techniques allow the characterization of bacteria at the strain level, providing researchers with important information for the surveillance of infectious diseases, outbreak investigation and control, the study of the pathogenesis and natural history of an infection, and bacterial population genetics. These areas of research have a direct impact on several human health issues, such as the development of drug therapies and vaccines [1], with the concomitant social and economic repercussions.
With the decreasing cost and increasing availability of DNA sequencing technologies, sequence-based typing methods are being preferred over traditional molecular methodologies. The large appeal of sequence-based typing methods is the ability to confidently share their results due to their reproducibility and portability, allowing for a global view and immediate comparison of microbial strains from clinical and research settings all over the world. Several online microbial typing databases have been made available for different methods. The most successful examples are the Multi-Locus Sequence Typing (MLST) [6] databases for a multitude of bacterial species [10,12,8], the emm typing database for Streptococcus pyogenes [14] and the spa typing database for Staphylococcus aureus [13].
* These authors contributed equally to this work.
However, these efforts are not standardized for data sharing and suffer from several caveats, the most notable being the lack of interfaces for automatic querying and for running analysis tools. The automatic integration of data from the different databases is also hindered by the lack of common identifiers among them. Moreover, the absence of automatic validation of new data in the submission process is leading to an increase of incomplete and unreliable data in the majority of these databases, seriously hampering the promised advantages of methodological accuracy and portability of results between laboratories. This is even more significant with the rise of new Single Nucleotide Polymorphism (SNP) typing techniques based upon Next Generation Sequencing (NGS) [4] methods. The validity of this new high-throughput technology can be seriously compromised if the complete data analysis pipeline cannot be fully described in public databases, so that the results are reproducible. Also, the ability to integrate information from several well established typing methodologies will be paramount for the validation and development of the more informative whole genome approaches [5,3] based on these NGS methods for the bacterial typing field.
In a largely descriptive science such as Microbiology, a common understanding of the concepts described is fundamental for the continued development of sequence-based typing methods. Therefore, the definition of an ontology that can validate and aggregate the knowledge of the existing microbial typing methods is a necessary prerequisite for data integration in this field. To address these problems, we present in this paper the design and implementation of an ontology created for the microbial typing field and an Application Programming Interface (API) to an information system using the concepts of the REST (Representational State Transfer) paradigm [2]. The proof-of-concept prototype of the proposed framework, focusing on the well established MLST methodology, is available at
The ability to accurately describe the relationships between typing methods
through the use of an ontology and to offer REST services to analyze, curate,
and manage the information will facilitate the implementation of information
systems capable of coping with the heterogeneous types of data existing in the
field, including the re-usage of legacy data formats and methods.
This paper is organized as follows. Section 2 describes the proposed ontology, TypOn. Section 3 presents a REST API suitable for managing microbial typing data. Section 4 briefly details the RESTful Web services prototype implementation. Finally, Section 5 provides some final remarks and future work.
Fig. 1. TypOn, an ontology for microbial typing data. Dashed lines represent object
properties and solid lines represent subclass relations, e.g., Host is-a Origin.
2 TypOn – Typing Ontology
An ontology should make available both the vocabulary and the semantic rules required to properly represent knowledge of a given domain. In this section we provide an ontology suitable to describe knowledge in the microbial typing methods domain, TypOn, depicted in Fig. 1. This ontology was developed and improved based on comments by domain experts, and it constitutes a first proposal that can be expanded and adapted as new typing methods are developed and existing ones are updated. The ontology was developed with the help of the Protégé editor [11] and is available at
The main aim of bacterial typing methods is the characterization of bacterial populations, where each sampled microorganism becomes an isolate, referring to the process of isolating it from the bacterial population. Thus, Isolate is a main concept of TypOn and it is characterized by several properties. An isolate belongs to a Species (property belongsToSpecies), which in turn is part of a Genus (property hasSpecies). The property belongsToSpecies has the property hasIsolate as its inverse. Moreover, for each Isolate, we know its Origin, either Host or Environment, its GeographicInformation and its TypingInformation. Note that a Host also belongs to a Species and that both Host and Environment may also have GeographicInformation. Although the properties hasGeographicInformation and hasOrigin usually have cardinality at most one for each Isolate, the property hasTypingInformation usually has cardinality higher than one. For instance, an Isolate usually has information available for several typing methodologies, such as MLST, antibiograms, etc. In this context, it is important to note that TypingInformation is the root of an extensible class hierarchy that defines several typing methods (see Fig. 1). In particular, we are able to distinguish different categories of typing methods, e.g., the ontology allows us to infer that MLSTST is a Genotypic technique and that, in contrast, Antibiogram is a Phenotypic technique.
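As an illustration only, the kind of inference described above can be mimicked with plain Python classes. The class names follow the text, but the exact placement of each class in the hierarchy is an assumption drawn from the prose, and the helper categories is hypothetical. This is a sketch of the idea, not a rendering of the actual TypOn file.

```python
# Sketch of the TypingInformation hierarchy as plain Python classes.
# Assumption: MLSTST sits directly under Genotypic and Antibiogram directly
# under Phenotypic; the real ontology may have intermediate classes.

class TypingInformation:
    """Root of the extensible hierarchy of typing methods."""


class Genotypic(TypingInformation):
    """Typing techniques in the Genotypic category."""


class Phenotypic(TypingInformation):
    """Typing techniques in the Phenotypic category."""


class MLSTST(Genotypic):
    """An MLST sequence type."""


class Antibiogram(Phenotypic):
    """An antibiogram."""


def categories(cls):
    """Return the names of all superclasses of cls, nearest first (hypothetical helper)."""
    return [c.__name__ for c in cls.__mro__[1:] if c is not object]


print(categories(MLSTST))                  # superclasses of MLSTST
print(issubclass(Antibiogram, Genotypic))  # Antibiogram is not Genotypic
```

In an ontology the same conclusion is reached by a reasoner following subclass axioms, so new typing methods can be added as subclasses without changing existing queries.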
As mentioned before, the current version of TypOn focuses on MLST concepts, since MLST is the most widely used sequence-based typing technique. In this context, we note in particular the concepts Locus, Allele, MLSTSchema and MLSTST. In MLST we can have several typing schemas, each described by a set of loci, each locus being part of the sequence of a housekeeping gene. Such schemas are represented through the class MLSTSchema, which has the property hasLocus. Each Isolate may then be associated with one or more MLSTST instances, obtained with different schemas. These instances are known as sequence types and are characterized by the observed alleles for each locus. Therefore, in our ontology, we associate to each MLSTST both a schema and the observed alleles through the properties hasSchema and hasAllele, respectively. Notice also that hasAllele is a property shared by the MLSTST and Locus classes and, thus, it does not have the isLocus property as its inverse. It is also interesting to note that, by knowing only the Locus, it is possible to determine the Species that it belongs to, using the isOfGene and belongsToSpecies properties. The property belongsToSpecies is also an example of a property which has more than one class as domain.
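The MLST concepts above can be sketched as a small data model. The field names mirror the ontology properties named in the text (hasLocus, hasSchema, hasAllele, isOfGene, belongsToSpecies). The Gene class, the allele number field, the helper functions and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    name: str


@dataclass(frozen=True)
class Gene:  # assumed class, implied by the isOfGene property
    name: str
    belongs_to_species: Species  # belongsToSpecies


@dataclass(frozen=True)
class Locus:
    name: str
    is_of_gene: Gene  # isOfGene


@dataclass(frozen=True)
class Allele:
    locus: Locus
    number: int  # assumed identifier of the observed allele


@dataclass
class MLSTSchema:
    name: str
    has_locus: list  # hasLocus: the loci that define the schema


@dataclass
class MLSTST:
    st_id: int
    has_schema: MLSTSchema  # hasSchema
    has_allele: list        # hasAllele: one observed allele per locus


def species_of(locus):
    """Follow isOfGene and then belongsToSpecies, starting from a Locus."""
    return locus.is_of_gene.belongs_to_species


def is_complete(st):
    """True if st has exactly one allele for each locus of its schema."""
    typed = [allele.locus for allele in st.has_allele]
    loci = st.has_schema.has_locus
    return len(typed) == len(loci) and set(typed) == set(loci)


# Illustrative values only.
pneumoniae = Species("pneumoniae")
aroe = Locus("aroE", Gene("aroE", pneumoniae))
gdh = Locus("gdh", Gene("gdh", pneumoniae))
schema = MLSTSchema("demo schema", [aroe, gdh])
st = MLSTST(1, schema, [Allele(aroe, 1), Allele(gdh, 5)])
print(species_of(aroe).name, is_complete(st))
```

A completeness check of this kind is one example of the automatic validation at submission time that the Introduction identifies as missing from existing databases.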
We have also detailed the Antibiogram typing technique in the current version. Namely, we have represented each Antibiotic as a concept, allowing the addition of new antibiotics as needed. The reaction to a given antibiotic is also represented as a concept, AntibioticReaction, so that each Antibiogram may be associated with one or more antibiotic reactions, depending on the number of antibiotics used. These relations are given through the object properties hasAntibioticReaction and hasAntibiotic, respectively.
Additional information for each class, such as an id and other names, is described through data properties. For instance, the class GeographicInformation has data properties such as Country and Region. The class Isolate has data properties such as Strain and Year.
3 RESTful Web services
A second contribution of our work is a RESTful API for making available microbial typing data represented through the above ontology. A Web services framework is under development, making use of the Jena Semantic Web Framework [9] and other standard Java technologies for developing Web services. The set of endpoints defined for retrieving microbial typing data includes:
The URI parameters, i.e., the text inside {}'s, represent specific identifiers. For instance, {typingmethod}, {genusid} and {speciesid} should be replaced with the name of the typing method (e.g. MLST), the name of the genus (e.g. Streptococcus) and the name of the species (e.g. pneumoniae), respectively.
Each endpoint with {}'s at the end refers to a resource identified by a given id or unique label. As an example, with the endpoint
we may obtain the information of a specific sequence type. Moreover, with this kind of endpoint it is also possible to replace the information of the resource, using the POST method. The other endpoints retrieve all individuals of the respective class. For instance, the endpoint
retrieves all existing MLST sequence types in the database for the specified parameters {typingmethod}, {genusid} and {speciesid}. We can also add more individuals with this kind of endpoint, using the PUT method. However, data deletion is only possible through the endpoints
by using the DELETE method.
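The parametrization of endpoints described above can be sketched as a URI template expansion. The two templates below are hypothetical: only the parameter names {typingmethod}, {genusid} and {speciesid} come from the text, while the path segment sts, the parameter {stid} and the function expand are assumptions made for illustration. The method table restates the POST and PUT usage described in the text and leaves out DELETE, since the endpoints that accept it are not listed here.

```python
import re
from urllib.parse import quote

# Hypothetical templates; the real endpoint paths may differ.
ST_COLLECTION = "/{typingmethod}/{genusid}/{speciesid}/sts"
ST_ITEM = ST_COLLECTION + "/{stid}"

# Method usage as described in the text.
METHODS = {
    "item": {"GET": "retrieve one resource", "POST": "replace its information"},
    "collection": {"GET": "retrieve all individuals", "PUT": "add individuals"},
}


def expand(template, **params):
    """Replace each {name} in template with its percent-encoded value."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing URI parameter: {name}")
        return quote(str(params[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


print(expand(ST_ITEM, typingmethod="MLST", genusid="Streptococcus",
             speciesid="pneumoniae", stid=81))
```

Percent-encoding each value keeps identifiers that contain spaces or slashes from changing the structure of the path.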
All endpoints return either text/html or application/json. A SPARQL endpoint and an authenticated endpoint to retrieve and submit data represented as RDF/XML are also available. A more comprehensive description of the MLST data related endpoints is available at
4 Implementation
A prototype Web client that makes use of the RESTful API and allows users to explore and query data for some of the public MLST datasets is also available at
In this prototype it is possible to query by MLST schema, MLSTSchema, by the id of the sequence type, MLSTST, and by locus, Locus. Also, the MLST schema and alleles can be downloaded in more than one format. A graphical visualization of isolate statistics is also available in this prototype.
Fig. 2. Architecture of the Web service prototype. A REST API, implemented over the Jersey framework, is made available, with data accessed through the Jena framework. On the client side, we have implemented a Java REST client library and a Web application built on the Google Web Toolkit (GWT).
Our implementation makes use of the Jena Semantic Web Framework [9] and other standard Java technologies for developing Web services. Jena provides an API to deal with RDF data, including a SPARQL processor for querying it. In our implementation, both TypOn and all typing data are stored as RDF statements in a triple store. We are currently using the TDB triple store, a component of Jena for RDF storage and query. Although the Jena framework can use several reasoners, including OWL-DL reasoners, we are using the internal RDFS reasoner for validation purposes only. Nevertheless, given Jena's flexibility, we can easily process our repository of statements through a more powerful reasoner and insert inferred and relevant statements back into our repository. This is particularly useful whenever we update the ontology with new or equivalent concepts and properties, or when we want to index frequent SPARQL queries in order to improve their speed. Moreover, under the open world assumption, with data distributed over several repositories, one may need to crawl and index several repositories, possibly instances of our Web service implementation, before proceeding with reasoning and inference.
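To make the idea of inserting inferred statements back into the repository concrete, the following sketch applies two standard RDFS entailment rules (subclass transitivity and type propagation) to an in-memory set of triples. It is written in plain Python and does not use or imitate the Jena API. The typon: and ex: prefixes, the resource names and both function names are assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply RDFS rules rdfs9 and rdfs11 until no new triple is produced."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = set()
        for s, p, o in inferred:
            for sub, sup in subclass_pairs:
                if o != sub:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, sup))  # rdfs11: transitivity
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, sup))     # rdfs9: type propagation
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


def instances_of(triples, cls):
    """Return the sorted subjects explicitly typed as cls in triples."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


# Hypothetical statements: two subclass axioms and one typed individual.
store = {
    ("typon:MLSTST", SUBCLASS_OF, "typon:Genotypic"),
    ("typon:Genotypic", SUBCLASS_OF, "typon:TypingInformation"),
    ("ex:st81", RDF_TYPE, "typon:MLSTST"),
}
materialized = rdfs_closure(store)
print(instances_of(store, "typon:TypingInformation"))
print(instances_of(materialized, "typon:TypingInformation"))
```

Storing the materialized statements trades storage for query speed, since later queries can match the inferred types directly instead of invoking a reasoner.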
The REST API uses the Jersey implementation of JAX-RS (JSR 311), a Java API for RESTful Web services that provides support for creating Web services according to the REST architectural style. JAX-RS is an official part of Java EE 6 and it facilitates the implementation of RESTful Web services through the usage of annotations, simplifying the development and deployment of Web service clients and endpoints.
In the current implementation, any user can query the repository and only authenticated users can insert, update or delete data. A more refined authorization model is under development.
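The access rule stated above can be written as a single predicate. This is a minimal sketch that assumes queries map to GET and that insert, update and delete map to PUT, POST and DELETE. The function name is_allowed is hypothetical, and how a user is authenticated is outside its scope.

```python
READ_METHODS = {"GET"}                     # querying: open to any user
WRITE_METHODS = {"POST", "PUT", "DELETE"}  # insert, update, delete


def is_allowed(method, authenticated):
    """Return True if a request with this HTTP method may proceed."""
    method = method.upper()
    if method in READ_METHODS:
        return True
    if method in WRITE_METHODS:
        return authenticated
    return False  # methods outside the API are rejected


print(is_allowed("GET", authenticated=False))
print(is_allowed("DELETE", authenticated=False))
```

A more refined model, such as the one mentioned as under development, would replace the boolean with per-user or per-dataset permissions.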
5 Final remarks
The proposed ontology provides the basic concepts needed to establish the semantic relationships between the different sequence-based typing methodologies, and it is designed to allow further expansion. It should be easily expanded to encompass the newer NGS SNP typing techniques that are appearing in the microbial typing field, while providing a consistent link with legacy techniques and other databases. This Semantic Web approach for sharing microbial typing data also allows local databases from different institutes and for different methods to be connected through the use of specific REST endpoints.
Moreover, the proposed REST interface and ontology facilitate the decoupling between the information system and its possible client technologies, allowing the sharing of data in human- and machine-readable formats. This approach allows the design of novel interfaces between different databases and data analysis software, through the use of Web service mashups.
An immediate practical use of the framework is to provide microbiology researchers with a quick and effective way to share data on new methods being developed based on sequence typing, since creating a new typing schema and adding its concepts to the ontology is straightforward. The information available for isolates typed using a new typing schema can then be converted to RDF statements and uploaded to an authenticated SPARQL endpoint on a server, after which a new database is automatically accessible. The GWT Web client then provides end-users with a friendly interface for querying and submitting new data.
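The conversion of isolate information into RDF statements can be sketched as a serialization to N-Triples lines. Both namespace URIs below are placeholders, the function names are hypothetical, and every value is emitted as a plain string literal for simplicity. Strain and Year are the Isolate data properties mentioned in Section 2. The upload step is omitted because it depends on the server.

```python
from urllib.parse import quote

# Placeholder namespaces, not the published TypOn or data URIs.
TYPON = "http://example.org/typon#"
DATA = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Serialize value as an N-Triples plain string literal."""
    escaped = (str(value)
               .replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def isolate_to_ntriples(isolate_id, record):
    """Return N-Triples lines describing one isolate and its data properties."""
    subject = f"<{DATA}isolate/{quote(str(isolate_id), safe='')}>"
    lines = [f"{subject} <{RDF_TYPE}> <{TYPON}Isolate> ."]
    for prop, value in sorted(record.items()):
        lines.append(f"{subject} <{TYPON}{prop}> {literal(value)} .")
    return lines


for line in isolate_to_ntriples("i1", {"Strain": "demo strain", "Year": 2005}):
    print(line)
```

A production converter would attach datatypes to literals and link each isolate to its Species, Origin and TypingInformation resources through object properties rather than strings.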
Future work will focus on expanding the ontology and creating Web services to
perform automated curation of data directly from sequencer files, in order to speed up
the curation process, and ensure better quality and reproducibility of data in the field
of microbial typing.
Acknowledgments. The work presented in this paper made use of data available at [10], PubMLST [12] and Institut Pasteur MLST Databases [8].
References
1. Aguiar, S., Serrano, I., Pinto, F., Melo-Cristino, J., Ramirez, M.: Changes in Streptococcus pneumoniae serotypes causing invasive disease with non-universal vaccination coverage of the seven-valent conjugate vaccine. Clinical Microbiology and Infection 14(9), 835–843 (2008)
2. Fielding, R.: Architectural styles and the design of network-based software architectures. Ph.D. thesis, Citeseer (2000)
3. Harris, S., Feil, E., Holden, M., Quail, M., Nickerson, E., Chantratita, N., Gardete, S., Tavares, A., Day, N., Lindsay, J., et al.: Evolution of MRSA during hospital transmission and intercontinental spread. Science 327(5964), 469 (2010)
4. MacLean, D., Jones, J., Studholme, D.: Application of 'next-generation' sequencing technologies to microbial genetics. Nature Reviews Microbiology 7(4), 287–296 (2009)
5. Mwangi, M., Wu, S., Zhou, Y., Sieradzki, K., De Lencastre, H., Richardson, P., Bruce, D., Rubin, E., Myers, E., Siggia, E., et al.: Tracking the in vivo evolution of multidrug resistance in Staphylococcus aureus by whole-genome sequencing. Proceedings of the National Academy of Sciences 104(22), 9451 (2007)
6. Spratt, B.: Multilocus sequence typing: molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the internet. Current Opinion in Microbiology 2(3), 312–316 (1999)
7. Van Belkum, A., Struelens, M., De Visser, A., Verbrugh, H., Tibayrenc, M.: Role of genomic typing in taxonomy, evolutionary genetics, and microbial epidemiology. Clinical Microbiology Reviews 14(3), 547 (2001)
8. Institut Pasteur MLST Databases. Pasteur Institute
9. Jena: A Semantic Web Framework for Java. HP and Others
10. MLST: Multi Locus Sequence Typing. Imperial College of
11. The Protégé Ontology Editor and Knowledge Acquisition System. http://, Stanford Center for Biomedical Informatics Research
12. PubMLST. University of Oxford (UK)
13. Ridom SpaServer. Ridom bioinformatics
14. Streptococcus pyogenes emm sequence database. biotech/strep/M-ProteinGene_typing.htm, CDC