All content in this area was uploaded by Ben C Stöver on Aug 19, 2017
Ben C. Stöver1, Sarah Wiechers1, Kai F. Müller1
1) Evolution and Biodiversity of Plants Group, Institute for Evolution and Biodiversity, WWU Münster, Hüfferstr. 1, 48149 Münster, Germany
JPhyloIO — A Java library for event-based reading and
writing of different alignment and tree formats through
one common interface
Event based document reading
In JPhyloIO, documents are represented as sequences of events that model the contained elements. Figure 1
shows the grammar that defines how documents can be represented as event sequences, while figure 2
shows an example document and its translation into events. All elements can carry metadata.
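The following minimal sketch illustrates what consuming such an event sequence can look like. It does not use JPhyloIO's actual classes: `EventStreamSketch`, `Event`, `Topology` and `outline` are hypothetical names introduced only for this illustration, and the event names are taken from the grammar in figure 1.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified event model for illustration; not JPhyloIO's API. */
class EventStreamSketch {
    enum Topology { START, END, SOLE }

    static final class Event {
        final String contentType;   // e.g. "ALIGNMENT" or "SEQUENCE_TOKENS"
        final Topology topology;

        Event(String contentType, Topology topology) {
            this.contentType = contentType;
            this.topology = topology;
        }
    }

    /** Renders the stream as an indented outline, one line per START or SOLE event. */
    static String outline(List<Event> events) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (Event e : events) {
            if (e.topology == Topology.END) {
                depth--;            // leave the element opened by the matching START
                continue;
            }
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(e.contentType).append('\n');
            if (e.topology == Topology.START) {
                depth++;            // following events are nested in this element
            }
        }
        return out.toString();
    }

    /** A tiny document: one alignment containing one sequence. */
    static List<Event> example() {
        return Arrays.asList(
            new Event("DOCUMENT", Topology.START),
            new Event("ALIGNMENT", Topology.START),
            new Event("SEQUENCE", Topology.START),
            new Event("SEQUENCE_TOKENS", Topology.SOLE),
            new Event("SEQUENCE", Topology.END),
            new Event("ALIGNMENT", Topology.END),
            new Event("DOCUMENT", Topology.END));
    }

    public static void main(String[] args) {
        System.out.print(outline(example()));
    }
}
```

Because the consumer only keeps the current nesting depth, its memory use does not grow with the size of the document.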
http://www2.ieb.uni-muenster.de/EvolBiodivPlants
Poster download: http://go.wwu.de/po5ac
Citations:
Han, M.V. & Zmasek, C.M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
Kumar, S., Stecher, G. & Tamura, K. (2016). MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution, msw054.
Maddison, D.R., Swofford, D. & Maddison, W.P. (1997). NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
Stöver, B.C. & Müller, K.F. (2010). TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics, 11, 7.
Vos, R.A., Balhoff, J.P., Caravas, J.A., Holder, M.T., Lapp, H., Maddison, W.P., Midford, P.E., Priyam, A., Sukumaran, J., Xia, X. & Stoltzfus, A. (2012). NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata. Systematic Biology, 61, 675–689.
Figure 2 The example document shown contains an OTU list and an alignment
that references this list. The sequence of events shown is generated
from it according to the grammar in figure 1, where each box represents one
event object. Each object has an ID so that it can be referenced by subsequent
events, as shown for the OTU list and OTU start events, which
are referenced by the corresponding alignment and sequence start events.
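The ID linking described in this caption can be sketched as follows. This is an illustration under stated assumptions, not JPhyloIO's API: `LinkedIdSketch`, `StartEvent` and the fields `id`, `label` and `linkedId` are hypothetical names. The sketch relies on the OTU list preceding the alignment that references it, as in the example document, so a single pass is enough.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of resolving references between events; not JPhyloIO's API. */
class LinkedIdSketch {
    static final class StartEvent {
        final String type;      // "OTU" or "SEQUENCE" in this sketch
        final String id;        // ID of the element this event starts
        final String label;     // human-readable label, may be null
        final String linkedId;  // ID of a referenced earlier element, may be null

        StartEvent(String type, String id, String label, String linkedId) {
            this.type = type;
            this.id = id;
            this.label = label;
            this.linkedId = linkedId;
        }
    }

    /** Maps each sequence ID to the label of the OTU it references. */
    static Map<String, String> sequenceToOtuLabel(List<StartEvent> events) {
        Map<String, String> otuLabels = new HashMap<>();
        Map<String, String> result = new LinkedHashMap<>();
        for (StartEvent e : events) {
            if (e.type.equals("OTU")) {
                otuLabels.put(e.id, e.label);   // remember for later references
            } else if (e.type.equals("SEQUENCE") && e.linkedId != null) {
                result.put(e.id, otuLabels.get(e.linkedId));
            }
        }
        return result;
    }

    static List<StartEvent> example() {
        return Arrays.asList(
            new StartEvent("OTU", "otu1", "Species A", null),
            new StartEvent("OTU", "otu2", "Species B", null),
            new StartEvent("SEQUENCE", "seq1", null, "otu1"),
            new StartEvent("SEQUENCE", "seq2", null, "otu2"));
    }

    public static void main(String[] args) {
        System.out.println(sequenceToOtuLabel(example()));
    }
}
```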
Document = "DOCUMENT.START", {DocumentContent,} "DOCUMENT.END";
DocumentContent = OTUSet | Matrix | TreeNetworkGroup | CharacterSetPart | TreeNetworkSet | MetaInformation;
OTUList = "OTUS.START", {OTUListContent,} {OTUSet,} "OTUS.END";
OTUListContent = OTU | MetaInformation;
OTU = "OTU.START", {MetaInformation,} "OTU.END";
OTUSet = "OTU_SET.START", {SetContent,} "OTU_SET.END";
Matrix = "ALIGNMENT.START", {MatrixContent,} "ALIGNMENT.END";
MatrixContent = CharacterDefinition | TokenSetDefinition | SequencePart | CharacterSetPart | SequenceSet | MetaInformation;
CharacterDefinition = "CHARACTER_DEFINITION.START", {MetaInformation,} "CHARACTER_DEFINITION.END";
SequenceSet = "SEQUENCE_SET.START", {SetContent,} "SEQUENCE_SET.END";
TokenSetDefinition = "TOKEN_SET_DEFINITION.START", {TokenSetDefinitionContent,} "TOKEN_SET_DEFINITION.END";
TokenSetDefinitionContent = SingleTokenDefinition | MetaInformation;
SingleTokenDefinition = "SINGLE_TOKEN_DEFINITION.START", {MetaInformation,} "SINGLE_TOKEN_DEFINITION.END";
SequencePart = "SEQUENCE.START", {SequencePartContent,} "SEQUENCE.END";
SequencePartContent = "SEQUENCE_TOKENS.SOLE" | SingleSequenceToken | MetaInformation;
SingleSequenceToken = "SINGLE_SEQUENCE_TOKEN.START", {MetaInformation,} "SINGLE_SEQUENCE_TOKEN.END";
CharacterSetPart = "CHARACTER_SET.START", {CharacterSetPartContent,} "CHARACTER_SET.END";
CharacterSetPartContent = "CHARACTER_SET_PART.SOLE" | SetContent;
(* In character sets only references to other character sets (and not single character definitions) are using "SET_ELEMENT.SOLE". *)
TreeNetworkGroup = "TREE_NETWORK_GROUP.START", {TreeNetworkGroupContent,} "TREE_NETWORK_GROUP.END";
TreeNetworkGroupContent = Tree | Network | TreeNetworkSet;
Tree = "TREE.START", {TreeOrNetworkContent,} ["ROOT_EDGE.START",] {TreeOrNetworkContent,} {NodeEdgeSet,} "TREE.END";
Network = "NETWORK.START", {TreeOrNetworkContent,} {NodeEdgeSet,} "NETWORK.END";
TreeOrNetworkContent = Node | Edge | MetaInformation;
Node = "NODE.START", {MetaInformation,} "NODE.END";
Edge = "EDGE.START", {MetaInformation,} "EDGE.END";
TreeNetworkSet = "TREE_NETWORK_SET.START", {SetContent,} "TREE_NETWORK_SET.END";
NodeEdgeSet = "NODE_EDGE_SET.START", {SetContent,} "NODE_EDGE_SET.END";
SetContent = "SET_ELEMENT.SOLE" | MetaInformation;
(* Single elements and other sets of the same type can be linked using "SET_ELEMENT.SOLE". *)
MetaInformation = ResourceMeta | LiteralMeta;
ResourceMeta = "RESOURCE_META.START", {MetaInformation,} "RESOURCE_META.END";
LiteralMeta = "LITERAL_META.START", {"LITERAL_META_CONTENT.SOLE",} "LITERAL_META.END";
Figure 1 EBNF grammar describing JPhyloIO
event sequences. The terminal symbols (in
green) represent the types of events, each of
which either has a single SOLE or a START and
END version, depending on whether additional
data can be nested or not.
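One consequence of the START/END/SOLE convention is that any valid event sequence is properly nested. The sketch below checks only this nesting property with a stack; it is a deliberately weaker test than the full grammar (it does not check which element types may contain which others), and `NestingCheckSketch` and `isWellNested` are hypothetical names, not part of JPhyloIO.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: checks START/END nesting only, not the full grammar. */
class NestingCheckSketch {
    /** Events are given as terminal symbols such as "ALIGNMENT.START". */
    static boolean isWellNested(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String event : events) {
            int dot = event.lastIndexOf('.');
            if (dot < 0) {
                return false;                       // not of the form TYPE.TOPOLOGY
            }
            String type = event.substring(0, dot);
            String topology = event.substring(dot + 1);
            switch (topology) {
                case "START":
                    open.push(type);
                    break;
                case "END":
                    // Must close the most recently opened element.
                    if (open.isEmpty() || !open.pop().equals(type)) {
                        return false;
                    }
                    break;
                case "SOLE":
                    break;                          // cannot contain nested events
                default:
                    return false;
            }
        }
        return open.isEmpty();                      // everything opened was closed
    }

    public static void main(String[] args) {
        System.out.println(isWellNested(Arrays.asList(
            "DOCUMENT.START", "ALIGNMENT.START", "SEQUENCE.START",
            "SEQUENCE_TOKENS.SOLE", "SEQUENCE.END", "ALIGNMENT.END",
            "DOCUMENT.END")));
    }
}
```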
Availability
JPhyloIO is distributed under the GNU
General Public License Version 3 at the
BioInfWeb Software portal:
http://bioinfweb.info/JPhyloIO
Aims and concept
To date, many bioinformatic software tools support only a
single format. Applications based on JPhyloIO need to implement
just one reader and one writer to support all formats,
without requiring detailed knowledge of them. This
should increase interoperability and foster the use of
more recently proposed, powerful formats such as NeXML.
Our library provides access to nine phylogenetic file formats
through one common interface, exposing all features
of each format (including the complex metadata of NeXML
and PhyloXML). Documents are translated to a stream of
events (see figures 1 and 2), allowing memory-efficient
processing independent of the application's business model.
Acknowledgements
The funding of parts of the development of JPhyloIO
by the DFG (German Research Foundation) through grant
MU 2875/3-1 to KFM is highly appreciated. BCS thanks
the European Conference on Computational
Biology (ECCB) and the International Society for Computational
Biology (ISCB) for partly financing the
presentation of this poster at ECCB 2016. Furthermore,
the authors are very thankful to the developers
of the other open source projects JPhyloIO uses
(Apache Commons, OWL API, JUnit, Hamcrest).
Writing events using data adapters
Since different formats require the data in different
orders, a simple event stream is not efficient
for writing. Instead, a number of data adapters have been
defined, each of which provides a subsequence
of the event stream.
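To illustrate why adapters help with differing output orders, the sketch below lets two writers pull token ranges from the same application model in different orders: one sequence at a time, and in interleaved column blocks. All interfaces and names here (`EventReceiver`, `MatrixAdapter`, `writeTokens`, `MapBackedAdapter`) are hypothetical simplifications and are not the data adapter interfaces of JPhyloIO shown in figure 4.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the adapter idea; not JPhyloIO's adapter interfaces. */
class DataAdapterSketch {
    /** Receives the events of one requested subsequence. */
    interface EventReceiver {
        void add(String event);
    }

    /** Implemented by the application on top of its own model. */
    interface MatrixAdapter {
        List<String> sequenceIds();
        int columnCount();
        void writeTokens(String sequenceId, int start, int end, EventReceiver receiver);
    }

    /** Example model: a non-empty map of equally long sequences. */
    static final class MapBackedAdapter implements MatrixAdapter {
        private final Map<String, String> sequences;

        MapBackedAdapter(Map<String, String> sequences) {
            this.sequences = sequences;
        }

        public List<String> sequenceIds() {
            return new ArrayList<>(sequences.keySet());
        }

        public int columnCount() {
            return sequences.values().iterator().next().length();
        }

        public void writeTokens(String id, int start, int end, EventReceiver receiver) {
            receiver.add(id + " " + sequences.get(id).substring(start, end));
        }
    }

    /** A writer for a format that stores each sequence as a whole. */
    static List<String> writeSequential(MatrixAdapter adapter) {
        List<String> out = new ArrayList<>();
        for (String id : adapter.sequenceIds()) {
            adapter.writeTokens(id, 0, adapter.columnCount(), out::add);
        }
        return out;
    }

    /** A writer for a format that stores blocks of columns across all sequences. */
    static List<String> writeInterleaved(MatrixAdapter adapter, int blockSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < adapter.columnCount(); start += blockSize) {
            int end = Math.min(start + blockSize, adapter.columnCount());
            for (String id : adapter.sequenceIds()) {
                adapter.writeTokens(id, start, end, out::add);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> model = new LinkedHashMap<>();
        model.put("A", "ACGT");
        model.put("B", "TGCA");
        MatrixAdapter adapter = new MapBackedAdapter(model);
        System.out.println(writeSequential(adapter));
        System.out.println(writeInterleaved(adapter, 2));
    }
}
```

In this sketch the application implements the adapter once, and each writer decides in which order to request the parts it needs.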
Figure 3 The data adapters of JPhyloIO are implemented by
an application to provide access to its business model.
Figure 4 A UML class diagram showing the inheritance and relations between the different data adapter
interfaces of JPhyloIO. DocumentDataAdapter is the starting point that provides access to the others.
Supported formats
NeXML (Vos et al., 2012)
Nexus (Maddison et al., 1997)
PhyloXML (Han & Zmasek, 2009)
FASTA
Newick tree format
Phylip and extended Phylip (also sequential)
MEGA (Kumar et al., 2016)
PDE used by the alignment editor PhyDE
XTG used by TreeGraph 2 (Stöver & Müller, 2010)