
Increasing data accessibility and reuse in phylogenetics by employing externally defined ontologies



In philosophy, ontology is defined as the study of the nature and relations of being. The information sciences apply this concept to create naming conventions that describe the types, properties and interrelationships of certain entities. Ontologies aim to limit complexity and organize information. In phylogenetics, they allow scientists to provide semantic information on data in a machine-readable way. Using fixed vocabularies from external ontologies makes the data more easily accessible to other researchers and facilitates its reuse. This is becoming increasingly important as the amount of biological data available in online sources grows rapidly. Here we present three software projects developed in our group that foster the use of external ontologies and enable users to handle the corresponding metadata. JPhyloIO is a software library that allows developers to support reading and writing different phylogenetic file formats (modeling alignments and trees), including full support for the formats' metadata models and the use of externally defined ontologies. LibrAlign and TreeGraph 2, on the other hand, allow such metadata to be processed and visualized in bioinformatics software applications. LibrAlign is a library providing GUI components to display and edit multiple sequence alignments; it allows externally implemented data areas to be attached for metadata described using externally defined ontologies. The different types of branch labels available in TreeGraph 2 offer similar functionality for phylogenetic trees. In combination, our software components simplify the use of metadata from externally defined ontologies in both data storage and data processing for the main phylogenetic data types, fostering accessibility and reuse.
Sarah Wiechers1, Kai F. Müller1, Ben C. Stöver1
1) Evolution and Biodiversity of Plants Group, Institute for Evolution and Biodiversity, WWU Münster, Hüfferstr. 1, 48149 Münster, Germany
Software components to process and display externally defined metadata
Citations:
Han MV, Zmasek CM: phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 2009, 10(1), 356.
Kilian N, Henning T, Plitzner P, Müller A, Güntsch A, Stöver BC, Müller KF, Berendsohn WG, Borsch T: Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens. Database 2015, 2015:bav094.
Maddison DR, Swofford DL, Maddison WP: Nexus: An Extensible File Format for Systematic Information. Systematic Biology 1997, 46(4), 590-621.
Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30(9), 1312-1313.
Stöver BC, Müller KF: TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 2010, 11(7).
Vos R, Balhoff J, Caravas J, Holder M, Lapp H, Maddison WP, Midford P, Priyam A, Sukumaran J, Xia X, Stoltzfus A: NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata. Systematic Biology 2012, 61(4), 675-689.
Resource Description Framework (RDF) Model and Syntax Specification. (1999). Retrieved March 21, 2017, from
Why ontologies are useful
While large amounts of data are produced in the life sciences every day, it is often difficult to extract information from textual descriptions in a semantically correct and unambiguous way to be able to (algorithmically) re-use these results. Standardized and unambiguous statements about the contents of the file, the analysis methods and other metadata are needed. Widely applied ontologies describing e.g. the workflow of a phylogenetic analysis would increase data accessibility and re-use and significantly simplify downstream analysis.
Funding of parts of the development of JPhyloIO and LibrAlign through grant MU 2875/3-1 to KFM by the DFG, and of parts of TreeGraph 2 by the Young Academy of the North Rhine-Westphalian Academy of Sciences, is highly appreciated. Furthermore, the authors are very thankful to the developers of the open source software used (Apache Commons, Apache Batik, BioJava, FreeHEP Java Libraries, Hamcrest, Java Math Expression Parser,
JPhyloIO and LibrAlign
Both products are software libraries that provide shared functionality for developers creating bioinformatics software applications. They are developed in our group and released under LGPL version 3.
JPhyloIO offers functionality to read and write phylogenetic data from and to different file formats, while providing generalized access through one common interface. The aim is to allow developers to support a variety of competing standards in one step and thereby increase interoperability between phylogenetic software.
LibrAlign provides a set of powerful components for graphical user interfaces that display and edit multiple sequence alignments as well as associated raw data and metadata.
Both libraries are currently used in a number of bioinformatics applications, such as AlignmentComparator, the Taxonomic Editor of the EDIT platform (Kilian et al., 2015) and TreeGraph 2 (Stöver & Müller, 2010).
NeXML PhyloXML Nexus
Ontologies and RDF
In computer science, an ontology describes a vocabulary used to define objects and attributes of objects in a standardized way. The RDF (Resource Description Framework) standard is often used to formulate statements about elements. Each statement is a triple consisting of a subject, a predicate and an object. Predicates describe the relationships between elements. The identifiers of these relationships can be defined by external ontologies, which simplifies searching for information because the meaning of annotations is clearly defined in a central place.
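The triple structure described here can be illustrated with a minimal Python sketch. It is not taken from any of the presented libraries: the `Triple` type, the `ont:` prefix, the predicate names and the annotation values are all hypothetical and only show how statements sharing a predicate can be retrieved without parsing free text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the element being described
    predicate: str  # relationship identifier, ideally defined in an external ontology
    obj: str        # the value of the predicate


# Hypothetical annotations; "ont:" stands for an assumed ontology prefix.
triples = [
    Triple("tree1/nodeA", "ont:bootstrapSupport", "98"),
    Triple("tree1/nodeB", "ont:bootstrapSupport", "71"),
    Triple("tree1", "ont:inferredWith", "RAxML"),
]


def objects_for(statements, predicate):
    """Return (subject, object) pairs of all statements using the given predicate."""
    return [(t.subject, t.obj) for t in statements if t.predicate == predicate]


support = objects_for(triples, "ont:bootstrapSupport")
print(support)
```

Because the predicate is a fixed identifier rather than a phrase in a sentence, the lookup is a simple equality test.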
Figure 2 An element in a phylogenetic workflow, e.g. a tree node, can be described using a statement consisting of a subject (the element to be described), a predicate (describing the subject's relation to other elements) and an object (the value of the predicate).
Tree node 'A'
Phylogenetic trees, including the shown bootstrap values, were calculated using RAxML (Stamatakis, 2014).
Figure 1 Information described in a standardized way using externally defined ontologies can be processed automatically more easily than textual descriptions.
TreeGraph 2
TreeGraph 2 is able to display metadata
defined by external ontologies using
different branch labels and offers a
metadata table for each tree document.
Figure 3 Different software components that can be used to simplify the use of metadata defined in an external ontology. The file formats NeXML (Vos et al., 2012), PhyloXML (Han & Zmasek, 2009) and Nexus (Maddison et al., 1997) each support attaching metadata to phylogenies and/or multiple sequence alignments. The library JPhyloIO allows access to all these formats through one interface. Finally, LibrAlign and TreeGraph 2 are able to visualize the metadata from external ontologies. Both the ontologies and the data areas can be externally defined, e.g. for specific purposes.
LibrAlign allows externally implemented data areas to be attached to sequences and whole alignments. These data areas are the counterpart to externally defined ontologies.
JPhyloIO generalizes over the different formats by creating a sequence of data events from each file. It supports using predicates from externally defined
Different formats for multiple sequence alignments or phylogenetic trees allow different kinds of metadata to be attached. NeXML offers the most comprehensive model, using RDF annotations.
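As a rough illustration of how such annotations could be read, the following Python sketch extracts subject, predicate and object from a small XML fragment using only the standard library. The fragment is a simplified, namespace-free assumption modeled on NeXML-style `meta` elements. It is not a valid NeXML document, and the `ont:` predicates and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: one node carrying a resource annotation
# (rel/href) and a literal annotation (property/content).
FRAGMENT = """
<node id="n1" label="A">
  <meta rel="ont:pherogram" href="pherogram.scf"/>
  <meta property="ont:BayesSupp" content="0.97"/>
</node>
"""


def read_annotations(xml_text):
    """Return (subject, predicate, object) tuples for all meta elements."""
    root = ET.fromstring(xml_text.strip())
    subject = root.get("id")
    result = []
    for meta in root.iter("meta"):
        if meta.get("rel") is not None:
            # Resource annotation: the object is a link to another resource.
            result.append((subject, meta.get("rel"), meta.get("href")))
        else:
            # Literal annotation: the object is a plain value.
            result.append((subject, meta.get("property"), meta.get("content")))
    return result


annotations = read_annotations(FRAGMENT)
for statement in annotations:
    print(statement)
```

A real reader would additionally have to resolve namespace prefixes and handle nested annotations, which this sketch omits.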
<meta href="pherogram.scf"
rel="ont:pherogram" … />
<property ref="ont:BayesSupp">
Doc.Start Tree.Start Meta ...
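The idea of turning a document into a sequence of data events can be sketched as a Python generator. This is only an illustration of the general event-stream pattern, not JPhyloIO's actual (Java) API: the event tuples, the `tree_events` and `collect_meta` functions and the input dictionary layout are all assumptions made for this example.

```python
def tree_events(document):
    """Yield (kind, phase, payload) events for a document given as nested dicts."""
    yield ("document", "start", None)
    for tree in document["trees"]:
        yield ("tree", "start", tree["id"])
        # Metadata is emitted as (predicate, value) pairs attached to the tree.
        for predicate, value in tree.get("meta", []):
            yield ("meta", "single", (predicate, value))
        for node in tree["nodes"]:
            yield ("node", "start", node)
            yield ("node", "end", node)
        yield ("tree", "end", tree["id"])
    yield ("document", "end", None)


def collect_meta(events):
    """Consume an event stream and keep only the metadata payloads."""
    return [payload for kind, _phase, payload in events if kind == "meta"]


# Hypothetical document; in practice the events would come from a file reader.
document = {
    "trees": [
        {"id": "t1", "meta": [("ont:BayesSupp", "0.97")], "nodes": ["A", "B"]},
    ]
}

events = list(tree_events(document))
metadata = collect_meta(events)
print(metadata)
```

A consumer written against such a stream does not need to know which file format the events were produced from, which is the point of generalizing over formats.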
Data areas
TreeGraph 2
TreeGraph 2 (Stöver & Müller, 2010) is a user-friendly and widely used tree editor with a focus on processing, visualizing and comparing phylogenetic trees carrying numerous annotations.
Figure 4 A screenshot of a document opened in TreeGraph 2. On the right is a table showing node and branch annotations.