Proceedings of TDWG 1: e20366
doi: 10.3897/tdwgproceedings.1.20366
Conference Abstract
The CDM Applied: Unit-Derivation, from Field Observations to DNA Sequences
Patrick Plitzner‡, Andreas Müller‡, Anton Güntsch‡, Walter G. Berendsohn‡, Andreas Kohlbecker‡, Norbert Kilian‡, Tilo Henning‡, Ben Stöver§
‡ Botanic Garden and Botanical Museum, Freie Universität, Berlin, Germany
§ Universität Münster, Münster, Germany
Corresponding author: Patrick Plitzner (p.plitzner@bgbm.org), Andreas Müller (a.mueller@bgbm.org), Anton Güntsch (a.guentsch@bgbm.org), Norbert Kilian (n.kilian@bgbm.org)
Received: 16 Aug 2017 | Published: 16 Aug 2017
Citation: Plitzner P, Müller A, Güntsch A, Berendsohn W, Kohlbecker A, Kilian N, Henning T, Stöver B (2017) The CDM Applied: Unit-Derivation, from Field Observations to DNA Sequences. Proceedings of TDWG 1: e20366. https://doi.org/10.3897/tdwgproceedings.1.20366
Abstract
Specimens form the falsifiable evidence used in plant systematics. Derivatives of specimens (including the specimen as the organism in the field), such as tissue and DNA samples, play an increasing role in research. The EDIT Platform for Cybertaxonomy is a specialist's tool that allows users to document and sustainably store all data used in the taxonomic work process, from field data to DNA sequences. The stored data can be very heterogeneous, comprising specimens, images, text data, primary data files, taxon assignments, etc.

The EDIT Platform organizes the links between such data by using a generic data model for representing the research process. Each step in the process is regarded as a derivation step that generates a derivative of the previous step: for example, a field unit having a specimen as its derivative, or a specimen having a tissue sample as its derivative. Each derivation step also produces metadata recording who performed the derivation, when, and how. The Platform's Common Data Model (CDM) and the applications built on the CDM library thus represent the first comprehensive implementation of the largely theoretical models developed in the late 1990s (Berendsohn et al. 1999).
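To make the derivation model more concrete, here is a minimal Python sketch, assuming a simple in-memory object graph. Each unit keeps a link to the unit it was derived from, and each link carries a derivation event holding the who/when/how metadata described above, so following the links backwards reconstructs the provenance of any derivative. The class names (`Unit`, `DerivationEvent`), their fields, and all example values are hypothetical illustrations, not the CDM library's actual Java API or persistence layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DerivationEvent:
    """Hypothetical metadata of one derivation step: who did it, when, and how."""
    actor: str
    when: date
    method: str


@dataclass(eq=False)  # identity equality, since parent/child links form cycles
class Unit:
    """Hypothetical field unit or derivative (specimen, tissue, DNA sample, ...)."""
    kind: str
    label: str
    derived_from: Optional[Unit] = field(default=None, repr=False)
    event: Optional[DerivationEvent] = None
    derivatives: list[Unit] = field(default_factory=list, repr=False)

    def derive(self, kind: str, label: str, event: DerivationEvent) -> Unit:
        """Create a derivative of this unit and store the link plus its metadata."""
        child = Unit(kind, label, derived_from=self, event=event)
        self.derivatives.append(child)
        return child

    def lineage(self) -> list[Unit]:
        """Return the chain of units from the originating field unit down to self."""
        chain: list[Unit] = []
        unit: Optional[Unit] = self
        while unit is not None:
            chain.append(unit)
            unit = unit.derived_from
        return chain[::-1]


# Hypothetical example values, not data from the Campanula project.
field_unit = Unit("FieldUnit", "gathering 1")
specimen = field_unit.derive(
    "Specimen", "herbarium sheet 1",
    DerivationEvent("collector A", date(2012, 6, 1), "pressing and drying"))
tissue = specimen.derive(
    "TissueSample", "leaf tissue 1",
    DerivationEvent("technician B", date(2013, 1, 15), "tissue sampling"))
dna = tissue.derive(
    "DnaSample", "DNA extract 1",
    DerivationEvent("technician B", date(2013, 1, 20), "DNA extraction"))

# Trace the DNA sample back to the field observation it came from.
for unit in dna.lineage():
    how = unit.event.method if unit.event else "origin"
    print(f"{unit.kind}: {unit.label} ({how})")
```

Running the sketch prints the chain from `FieldUnit: gathering 1 (origin)` down to `DnaSample: DNA extract 1 (DNA extraction)`. Storing the event on the derivative rather than on the parent is one possible design; it keeps each derivation step's metadata next to the object it produced.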
In a pilot project, research data about the genus Campanula (Kilian et al. 2015, FUB, BGBM 2012) were gathered and used to create a hierarchy of derivatives reaching from field data to DNA sequences. Additionally, the open-source library for multiple sequence alignments LibrAlign (Stöver and Müller 2015) was used to integrate an alignment editor into the EDIT Platform that can generate consensus sequences as derivatives of DNA sequences. The persistent storage of each link in the derivation process, together with the level of detail with which data and metadata are stored, will speed up the research process, ease the reproducibility of research results, and enhance the sustainability of collections.

© Plitzner P et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
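The pilot project described in the abstract generates consensus sequences as derivatives of DNA sequences through LibrAlign. The abstract does not say which consensus rule is used, so the following Python sketch substitutes a simple majority-rule consensus over an existing alignment and, in the spirit of the derivation model, records which source sequences and which method produced the result. It is not LibrAlign's API (LibrAlign is a Java library); `AlignedSequence`, `ConsensusSequence`, `majority_consensus`, and the example reads are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedSequence:
    label: str
    residues: str  # one row of a multiple sequence alignment; '-' marks a gap


@dataclass(frozen=True)
class ConsensusSequence:
    residues: str
    derived_from: tuple[str, ...]  # labels of the source sequences
    method: str  # how this derivative was produced


def majority_consensus(rows: list[AlignedSequence], ambiguous: str = "N") -> ConsensusSequence:
    """Majority-rule consensus per column; gaps count like any other character,
    and a tie between the most frequent characters becomes `ambiguous`."""
    if not rows or len({len(r.residues) for r in rows}) != 1:
        raise ValueError("expected at least one row, all of equal aligned length")
    consensus = []
    for column in zip(*(r.residues for r in rows)):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append(ambiguous)
        else:
            consensus.append(ranked[0][0])
    return ConsensusSequence(
        residues="".join(consensus),
        derived_from=tuple(r.label for r in rows),
        method="majority rule",
    )


# Hypothetical reads of one DNA sample, already aligned.
reads = [
    AlignedSequence("read 1", "ACGTTA"),
    AlignedSequence("read 2", "ACGT-A"),
    AlignedSequence("read 3", "ACCTTA"),
]
result = majority_consensus(reads)
print(result.residues, result.derived_from)
```

Here the consensus is `ACGTTA`, linked to all three reads. Keeping the source labels and the method on the result mirrors the idea that a consensus sequence is itself a derivative whose origin should stay traceable.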
Keywords
EDIT Platform, Taxonomy, Specimen
Presenting author
Andreas Müller
Funding program
For funding of the EDIT Platform for Cybertaxonomy, please refer to the presentation by Kohlbecker et al. The Campanula project was funded by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft) within the Scientific Library Services and Information Systems programme (KI 1175/1-1, MU 2875/3-1).
Hosting institution
Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Germany
References
Berendsohn WG, Anagnostopoulos A, Hagedorn G, Jakupovic J, Nimis PL, Valdés B, Güntsch A, Pankhurst RJ, White RJ (1999) A Comprehensive Reference Model for Biological Collections and Surveys. Taxon 48 (3): 511-562. https://doi.org/10.2307/1224564
FUB, BGBM (2012) Campanula Portal. http://campanula.e-taxonomy.net/. Accessed on: 2017-8-15.
Kilian N, Henning T, Plitzner P, Müller A, Güntsch A, Stöver BC, Müller KF, Berendsohn WG, Borsch T (2015) Sample data processing in an additive and reproducible taxonomic workflow by using character data persistently linked to preserved individual specimens. Database: The Journal of Biological Databases and Curation 2015: 119. https://doi.org/10.1093/database/bav094
Stöver BC, Müller KF (2015) LibrAlign - A powerful Java GUI library for MSA and attached raw and meta data. http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/Publications/ConferenceContribution?id=100872. Accessed on: 2017-8-15.