Sarah Wiechers1, Kai F. Müller1, Ben C. Stöver1
1) Evolution and Biodiversity of Plants Group, Institute for Evolution and Biodiversity, WWU Münster, Hüfferstr. 1, 48149 Münster, Germany
Increasing data accessibility and reuse in phylogenetics
by employing externally defined ontologies
Software components to process and
display externally defined metadata
Poster download: http://go.wwu.de/e4m7n
Citations:
Han MV, Zmasek CM: phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 2009, 10(1), 356.
Kilian N, Henning T, Plitzner P, Müller A, Güntsch A, Stöver BC, Müller KF, Berendsohn WG, Borsch T: Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens. Database 2015, 2015:bav094.
Maddison DR, Swofford DL, Maddison WP: Nexus: An Extensible File Format for Systematic Information. Systematic Biology 1997, 46(4), 590–621.
Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30(9), 1312–1313.
Stöver BC, Müller KF: TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 2010, 11(7).
Vos R, Balhoff J, Caravas J, Holder M, Lapp H, Maddison WP, Midford P, Priyam A, Sukumaran J, Xia X, Stoltzfus A: NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata. Systematic Biology 2012, 61(4), 675–689.
Resource Description Framework (RDF) Model and Syntax Specification. (1999). Retrieved March 21, 2017, from https://www.w3.org/TR/PR-rdf-syntax/
Why ontologies are useful
While large amounts of data are produced in the life sciences every day, it is often
difficult to extract information from textual descriptions in a semantically correct and
unambiguous way, which hinders (algorithmic) re-use of the results. Standardized and
unambiguous statements about the contents of a file, the analysis methods and other
metadata are needed. Widely applied ontologies describing e.g. the workflow of a
phylogenetic analysis would increase data accessibility and re-use and significantly
simplify downstream analysis.
Funding of parts of the development of JPhyloIO and LibrAlign by the DFG (grant
MU 2875/3-1 to KFM) and of parts of TreeGraph 2 by the Young Academy of the
North Rhine-Westphalian Academy of Sciences is gratefully acknowledged. Furthermore,
the authors are very thankful to the developers of the open source software used
(Apache Commons, Apache Batik, BioJava, FreeHEP Java Libraries, Hamcrest, Java
Math Expression Parser, JUnit, NeXML, OWL API).
JPhyloIO and LibrAlign
Both products are software libraries that provide shared functionality for developers of
bioinformatics applications. They are developed in our group under the LGPL version 3.
JPhyloIO offers functionality to read and write phylogenetic data from and to different file
formats, while providing generalized access through one common interface. The aim is to allow
developers to support a variety of competing standards in one step and thereby increase
interoperability between phylogenetic software.
LibrAlign provides a set of powerful components for graphical user interfaces that allow
multiple sequence alignments, as well as associated raw data and metadata, to be displayed and edited.
Both libraries are currently used in a number of bioinformatics applications, such as
AlignmentComparator, the Taxonomic Editor of the EDIT platform (Kilian et al., 2015) and
TreeGraph 2 (Stöver & Müller, 2010).
[Figure 3 panel labels: NeXML, PhyloXML, Nexus]
Ontologies and RDF
In computer science, an ontology describes a vocabulary used to define objects and
their attributes in a standardized way. The RDF (Resource Description Framework)
standard is often used to formulate statements about elements. Each statement is a triple
consisting of a subject, a predicate and an object. Predicates describe the
relationships between elements. These relationship identifiers can be defined by
external ontologies, which simplifies searching for information because the
meaning of annotations is clearly defined in a central place.
Figure 2 An element in a phylogenetic workflow, e.g. a tree node, can be described using a statement
consisting of a subject (the element to be described), a predicate (describing the subject's relation to
other elements) and an object (the value of the predicate).
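The subject, predicate and object structure described above can be illustrated with a minimal sketch in plain Python. The namespace `http://example.org/phylo-ontology#`, the predicate names (`supportValue`, `supportType`, `inferredWith`) and the identifiers `tree1` and `tree1/nodeA` are hypothetical and only stand in for terms that a real, externally defined ontology would provide.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace standing in for an externally defined ontology.
ONT = "http://example.org/phylo-ontology#"

# Statements about a tree and one of its nodes (all identifiers are made up).
statements = [
    Triple("tree1/nodeA", ONT + "supportValue", "98"),
    Triple("tree1/nodeA", ONT + "supportType", ONT + "Bootstrap"),
    Triple("tree1", ONT + "inferredWith", "RAxML"),
]


def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return all objects stated for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


# Because the predicate is an unambiguous identifier, the lookup does not
# have to interpret free text.
print(objects_of(statements, "tree1/nodeA", ONT + "supportValue"))
```

Since every predicate is a full identifier rather than a word in a sentence, software can select the support values of a node without parsing a textual methods description.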
[Figure 2 label: Tree node "A"]
[Figure 1 example of a textual description: "Phylogenetic trees, including the shown
bootstrap values, were calculated using RAxML (Stamatakis, 2014)."]
Figure 1 Information described in a standardized way
using externally defined ontologies can be processed
automatically more easily than textual descriptions.
TreeGraph 2 is able to display metadata
defined by external ontologies using
different branch labels and offers a
metadata table for each tree document.
Figure 3 Different software components that can be used to simplify the use of metadata defined in an external ontology. The file
formats NeXML (Vos et al., 2012), PhyloXML (Han & Zmasek, 2009) and Nexus (Maddison et al., 1997) each support attaching metadata to
phylogenies and/or multiple sequence alignments. The library JPhyloIO allows access to all these formats through one interface. Finally,
LibrAlign and TreeGraph 2 are able to visualize the metadata from external ontologies. Both the ontologies and the data areas can be
externally defined, e.g. for specific purposes.
LibrAlign allows externally implemented
data areas to be attached to sequences
and whole alignments. These are the
counterpart to externally defined ontologies.
JPhyloIO generalizes over the
different formats by creating a
sequence of data events from
each file. It supports using
predicates from externally defined
ontologies.
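The idea of turning different file formats into one common sequence of events can be sketched as follows. This is not the JPhyloIO API (JPhyloIO is a Java library). The two toy input formats, the event names and all function names are hypothetical and chosen only to illustrate the principle. The predicate `ont:pherogram` is taken from the poster, while the value `seqA.ab1` is a made-up file name.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    kind: str             # e.g. "DOC_START", "SEQUENCE_START", "META"
    value: object = None  # payload, depends on the event kind


def read_fasta_like(text: str) -> Iterator[Event]:
    """Toy reader: '>' starts a sequence, ';predicate=value' is metadata."""
    yield Event("DOC_START")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            yield Event("SEQUENCE_START", line[1:])
        elif line.startswith(";"):
            predicate, _, obj = line[1:].partition("=")
            yield Event("META", (predicate.strip(), obj.strip()))
        else:
            yield Event("TOKENS", line)
    yield Event("DOC_END")


def read_table_like(text: str) -> Iterator[Event]:
    """Toy reader: name<TAB>tokens, followed by optional predicate=value fields."""
    yield Event("DOC_START")
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        yield Event("SEQUENCE_START", fields[0])
        for field in fields[2:]:
            predicate, _, obj = field.partition("=")
            yield Event("META", (predicate, obj))
        yield Event("TOKENS", fields[1])
    yield Event("DOC_END")


def collect_metadata(events: Iterable[Event]) -> Dict[str, Dict[str, str]]:
    """Format-independent consumer: it only ever sees events."""
    result: Dict[str, Dict[str, str]] = {}
    current = None
    for event in events:
        if event.kind == "SEQUENCE_START":
            current = event.value
            result[current] = {}
        elif event.kind == "META" and current is not None:
            predicate, obj = event.value
            result[current][predicate] = obj
    return result


fasta_text = ">seqA\n;ont:pherogram=seqA.ab1\nACGT\n"
table_text = "seqA\tACGT\tont:pherogram=seqA.ab1\n"

print(collect_metadata(read_fasta_like(fasta_text)))
print(collect_metadata(read_table_like(table_text)))
```

In this sketch the application code in `collect_metadata` never needs to know which format was read, which is the design goal described above.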
Different formats for multiple sequence alignments or phylogenetic trees allow different
kinds of metadata to be attached. NeXML offers the most comprehensive model
using RDF annotations.
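As a rough illustration of such annotations, the following sketch reads `meta` elements from a strongly simplified NeXML-style fragment using only the Python standard library. The fragment is not a complete or schema-valid NeXML document. The ontology namespace `http://example.org/phylo-ontology#`, the predicate `ont:supportValue`, the ids and the file name `seqA.ab1` are assumptions made for this example. Only `rel="ont:pherogram"` appears on the poster itself.

```python
import io
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"

# Simplified, hypothetical fragment in the style of NeXML metadata.
NEXML_SNIPPET = b"""<nexml xmlns="http://www.nexml.org/2009"
       xmlns:nex="http://www.nexml.org/2009"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:ont="http://example.org/phylo-ontology#">
  <node id="nodeA">
    <meta xsi:type="nex:LiteralMeta" property="ont:supportValue" content="98"/>
  </node>
  <row id="seqA">
    <meta xsi:type="nex:ResourceMeta" rel="ont:pherogram" href="seqA.ab1"/>
  </row>
</nexml>"""


def expand(curie, prefixes):
    """Expand a prefixed name such as 'ont:pherogram' to a full identifier."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


def read_meta(xml_bytes):
    """Return (subject id, full predicate, object) for every meta element."""
    prefixes = {}
    parser = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _, (prefix, uri) in parser:
        prefixes[prefix] = uri  # remember the namespace declarations
    root = parser.root          # available once parsing has finished

    triples = []
    for parent in root.iter():
        subject = parent.get("id")
        if subject is None:
            continue
        for meta in parent.findall(NEX + "meta"):
            # Literal metadata uses property/content,
            # resource metadata uses rel/href.
            predicate = meta.get("property") or meta.get("rel")
            obj = meta.get("content") or meta.get("href")
            triples.append((subject, expand(predicate, prefixes), obj))
    return triples


for triple in read_meta(NEXML_SNIPPET):
    print(triple)
```

Each `meta` element yields one triple: the annotated element is the subject, the expanded `property` or `rel` value is the predicate, and the `content` or `href` value is the object.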
[Figure 3 fragments: part of a NeXML metadata element (rel="ont:pherogram" … />) and a JPhyloIO event sequence (Doc.Start Tree.Start Meta ...)]
TreeGraph 2 (Stöver & Müller, 2010) is a user-friendly and widely
used tree editor with a focus on processing, visualizing and
comparing phylogenetic trees carrying numerous annotations.
http://bioinfweb.info/JPhyloIO/
http://bioinfweb.info/LibrAlign/
Figure 4 A screenshot of a document opened in TreeGraph 2. On the
right is a table showing node and branch annotations.