Research Ideas and Outcomes 8: e97374
doi: 10.3897/rio.8.e97374
Reviewable v 1
Guidelines
Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing
Donat Agosti, Laurence Benichou, Wouter Addink, Christos Arvanitidis, Terence Catapano, Guy Cochrane, Mathias Dillen, Markus Döring, Teodor Georgiev, Isabelle Gérard, Quentin Groom, Puneet Kishor, Andreas Kroh, Jiří Kvaček, Patricia Mergen, Daniel Mietchen, Joana Pauperio, Guido Sautter, Lyubomir Penev
‡ Plazi, Bern, Switzerland
§ Museum national d'Histoire naturelle, Paris, France
| CETAF, Brussels, Belgium
¶ Naturalis Biodiversity Center, Leiden, Netherlands
# Distributed System of Scientific Collections - DiSSCo, Leiden, Netherlands
¤ LifeWatch ERIC, Seville, Spain
« European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
» Meise Botanic Garden, Meise, Belgium
˄ GBIF, Copenhagen, Denmark
˅ Catalogue of Life, Leiden, Netherlands
¦ Pensoft Publishers, Sofia, Bulgaria
ˀ Royal Museum for Central Africa, Tervuren, Belgium
ˁ Naturhistorisches Museum, Vienna, Austria
National Museum Prague, Prague, Czech Republic
Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
KIT / Plazi, Karlsruhe, Germany
Institute of Biodiversity & Ecosystem Research - Bulgarian Academy of Sciences, Sofia, Bulgaria
Corresponding author: Lyubomir Penev (l.penev@pensoft.net)
Received: 09 Nov 2022 | Published: 15 Nov 2022
Citation: Agosti D, Benichou LL, Addink W, Arvanitidis C, Catapano T, Cochrane G, Dillen M, Döring M,
Georgiev T, Gérard I, Groom Q, Kishor P, Kroh A, Kvaček J, Mergen P, Mietchen D, Pauperio J, Sautter G,
Penev L (2022) Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity
publishing. Research Ideas and Outcomes 8: e97374. https://doi.org/10.3897/rio.8.e97374
Abstract
The paper summarises many years of discussions and experience of biodiversity publishers, organisations, research projects and individual researchers, and proposes recommendations for implementation of persistent identifiers for article metadata, structural elements (sections, subsections, figures, tables, references, supplementary materials and others) and data specific to biodiversity (taxonomic treatments, treatment citations, taxon names, material citations, gene sequences, specimens, scientific collections) in taxonomy and biodiversity publishing. The paper proposes best practices on how identifiers should be used in the different cases and on how they can be minted, cited, and expressed in the backend article XML to facilitate conversion to and further re-use of the article content as FAIR data. The paper also discusses several specific routes for post-publication re-use of semantically enhanced content through large biodiversity data aggregators such as the Global Biodiversity Information Facility (GBIF), the International Nucleotide Sequence Database Collaboration (INSDC) and others, and proposes specifications of both identifiers and XML tags to be used for that purpose. A summary table provides an account and overview of the recommendations. The guidelines are supported with examples from the existing publishing practices.
© Agosti D et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
semantic publishing, taxonomy publishing, semantic annotation, biodiversity, persistent
identifiers, taxa, specimens, sequences, treatments, XML, JATS, TaxPub, tagging
Introduction
Specifics of taxonomic publications
Deans et al. (2012) very elegantly stated that “Taxonomists are arguably the most active
annotators of the natural world, collecting and publishing millions of phenotype data
annually through descriptions of new taxa. By formalising these data, preferably as they
are collected, taxonomists stand to contribute a data set with research potential that rivals
or even surpasses genomics”.
Taxonomic publications communicate the discovery of new biological taxa, or new data on already known taxa, in the form of taxonomic treatments: well-delimited sections of text for each taxon (Fig. 1; Catapano 2010, Penev et al. 2011, Agosti and Egloff 2009, Agosti and Egloff 2021). New research results are added to the existing treatments by citing previous treatments using a “treatment citation”. Altogether, the treatments and the data related to them represent the basis for the knowledge graph on the Earth’s biological diversity. Treatments have been used since the beginning of modern taxonomy, by Linnaeus in 1753 for plants and in 1758 for animals (Linnæus 1753, Linné 1758). Treatments begin with a nomenclature section including a unique identifier for the taxonomic name: the Latin binomen for a species, or the Latin name for a supraspecific taxon such as a genus, family or order. This is followed by one or more sections covering the citation of previous treatments of the same taxon, description, diagnosis, etymology, distribution, material citations or conservation. New taxa are based on type and other specimens in natural history collections, and data on these specimens are included in the treatment in the form of dedicated “material citations”. This new style of presenting information on biological taxa required a certain degree of comprehension and adoption, but was widely accepted by taxonomists in the second half of the 18th century.
Translated into today’s digital world, this simple framework of presenting biological taxa in both human-readable and machine-interpretable form is sufficient to build a knowledge graph of the Earth’s biological diversity, provided that it is available as digital accessible knowledge (DAK, Fawcett et al. 2022). By “machine-readable” we mean that the data are structured systematically, so that computers can be programmed to process and interpret them. This requires that taxonomic treatments, taxonomic names, treatment citations, material citations and other relevant elements are annotated in publications following a community-accepted standard, and are made citable by including the identifiers of the cited elements (e.g., the cited treatment in a treatment citation, the taxonomic name, or the specimen or digital specimen in a material citation). To explore known biodiversity, this is the minimal degree of digital accessible knowledge needed to answer questions such as “What do I know about taxon X?”, “What are the synonyms of a taxonomic name?” and “What are the facts used to make the changes?”.
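How such questions can be answered once treatments, names and citations carry identifiers is illustrated by the minimal Python sketch below. It assumes a tiny in-memory graph of (subject, predicate, object) triples; all identifiers (treatment:T1, name:Taxon_X, etc.) and predicate names (hasTaxonName, citesTreatment, assertsSynonym, ...) are hypothetical placeholders rather than terms of any existing vocabulary, and a real knowledge graph would use resolvable PIDs and a triple store.

```python
from collections import defaultdict

# Hypothetical triples; identifiers and predicates are placeholders, not a real vocabulary.
TRIPLES = [
    ("treatment:T1", "hasTaxonName", "name:Taxon_X"),
    ("treatment:T2", "hasTaxonName", "name:Taxon_X"),
    ("treatment:T2", "citesTreatment", "treatment:T1"),          # treatment citation
    ("treatment:T2", "hasMaterialCitation", "materialCitation:MC1"),
    ("materialCitation:MC1", "refersToSpecimen", "specimen:S1"),
    ("treatment:T2", "assertsSynonym", "name:Taxon_Y"),           # evidence for a synonymy
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject identifier."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def treatments_of(name, triples):
    """Return the treatments labelled with the given taxon name, sorted by ID."""
    return sorted(s for s, p, o in triples if p == "hasTaxonName" and o == name)


def what_do_i_know(name, triples):
    """'What do I know about taxon X?': every statement made by its treatments."""
    by_subject = index_by_subject(triples)
    return {t: by_subject[t] for t in treatments_of(name, triples)}


def synonyms_of(name, triples):
    """Map each asserted synonym to the treatments (the facts) that assert it."""
    by_subject = index_by_subject(triples)
    evidence = defaultdict(list)
    for treatment in treatments_of(name, triples):
        for predicate, obj in by_subject[treatment]:
            if predicate == "assertsSynonym":
                evidence[obj].append(treatment)
    return dict(evidence)


print(synonyms_of("name:Taxon_X", TRIPLES))
# {'name:Taxon_Y': ['treatment:T2']}
```

Keeping the treatment as the subject of each statement preserves provenance, so the answer to “What are the facts used to make the changes?” is simply the list of treatment identifiers returned.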
Research results presented in the biodiversity literature are among the best-curated data (Deans et al. 2012). They provide expert links between taxonomic names, molecular data (including omics data), phenomic data, specimens, geographical, environmental and climatic data, taxonomies and phylogenies, previously published data, publications and people, via accession codes, material citations, treatment citations, bibliographic references and personal identifiers. Semantic annotation, or semantic role labelling (e.g. Walls et al. 2014), of texts provides an additional way to identify the roles of people and taxonomic names. For example, a person mentioned in a material citation can be inferred to be a collector, whereas a person’s name within a taxonomic name indicates that this person is the authority of the name and an author of the publication in which the name was published. A taxonomic name in the nomenclature section functions as a label for the treatment, while a taxonomic name in the treatment body outside the nomenclature section indicates some connection between the two taxa.
Figure 1.
Schematic representation of the taxonomic treatment of Apis mellifera Linnaeus, 1758.
Sources: text: https://doi.org/10.5962/bhl.title.542; figures: https://linnean-online.org/16019/;
composite: https://doi.org/10.5281/zenodo.5168465.
In today’s digital arena, these structured texts are an ideal starting point for enhancing publications by making them machine-readable or even machine-actionable (Chester et al. 2019). This includes, for example, making treatments and figures open, findable, accessible, interoperable and reusable (FAIR) digital objects*, then adding persistent identifiers to the cited materials, gene sequences and authors, and annotating them to add semantic meaning to those tokens. Persistent identifiers serve many purposes, including building a knowledge graph, understanding the use of specimens and their collections in research, giving credit to individual scientists and institutions and, more broadly, allowing reuse by aggregators such as the Global Biodiversity Information Facility (GBIF) or ChecklistBank. Persistent identifiers also help to mitigate the taxonomic impediment recognised by conservation policy (Abrahamse et al. 2021), to create new knowledge management systems, and to bridge gaps between different domains of the life sciences, such as taxonomy, ecology and molecular biology. The first working examples of knowledge graphs in the biodiversity realm are OpenBiodiv (Senderov et al. 2018, Penev et al. 2019, Dimitrova et al. 2021), Ozymandias (Page 2019) and Synospecies (Gmür and Agosti 2021).
Use of identifiers
An identifier (ID) is a label for any subject, whether conceptual, physical or digital (Dillen et al. 2021). An ID can be called persistent (PID) (Directorate-General for Research and Innovation (European Commission) et al. 2020) if it can be maintained as a label in the longer term, in spite of any changes to the subject itself. For example, IDs for people can be persistent even if their name(s) change, or they move to another location or change jobs. In this way, an ID disambiguates the entity it relates to. To be a PID, it also needs to be Globally Unique, Persistent and Resolvable (GUPRI, Directorate-General for Research and Innovation (European Commission) et al. 2020). It should thus be unique within the context in which it is used, and come with a system that maintains the link between the ID and its subject. For example, in the case of resolvable Uniform Resource Identifiers (URIs), this system is the internet’s Domain Name System (Hyam et al. 2012). However, it is still up to the organisation that mints the URI to ensure that it remains persistent, as domain names are not. Digital Object Identifiers (DOIs) are another example, making use of the Handle system to maintain the link between ID and subject. Nevertheless, the onus remains on the organisation holding the digital object to ensure that the DOI resolves to the right object. Technology alone cannot make an identifier persistent; this is an organisational problem, which seems best solved by creating consortia of stakeholders responsible for the PIDs and their metadata. Examples are the DONA Foundation, the DOI Foundation, ORCID and ROR. PIDs such as Handles and DOIs have been in place for about 30 years, which indicates that this way of achieving persistence seems to work.
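The distinction between an identifier and its resolution can be made concrete with a DOI. The minimal Python sketch below turns a DOI as it is often printed (e.g. “doi: 10.3897/rio.8.e97374” in this article’s header) into its resolvable HTTPS form via the public DOI resolver at https://doi.org/. The list of accepted prefixes and the simplified pattern (“10.”, a 4 to 9 digit registrant code, “/”, a suffix) are our own assumptions, not an official grammar. The function performs no network lookup: it cannot check that the DOI actually resolves to the right object, which, as stated above, remains the responsibility of the organisation holding that object.

```python
import re

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Common ways a DOI is written in references; illustrative, not exhaustive.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Simplified pattern: "10." + registrant code of 4-9 digits + "/" + non-empty suffix.
_DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalise_doi(raw: str) -> str:
    """Strip resolver or label prefixes and return the bare DOI."""
    doi = raw.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not _DOI_PATTERN.fullmatch(doi):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Return the resolvable HTTPS form of a DOI."""
    return DOI_RESOLVER + normalise_doi(raw)


print(doi_to_url("doi: 10.3897/rio.8.e97374"))
# https://doi.org/10.3897/rio.8.e97374
```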
As an identifier serves to unequivocally label an entity, it may also be employed to track the use of that entity, particularly when it is digital. Performance indicators are an important
tool for the efficient management and development of organisations and infrastructures.
Such indicators are used to channel appropriate funding internally, and also to request
funding externally. The more we are able to show the impact and reach of our field, the
easier it is to gather financial support to develop natural history collections, maintain
services, digitise objects and conduct research.
Parallel to the rise of e-publishing, IDs minted and used in biodiversity informatics have
diversified and become increasingly important to link to objects, specimens, and their
digital representation, as well as the component parts of literature (Guralnick et al. 2015,
McMurry et al. 2017, Page 2016, Page 2019, Madden and Woodburn 2021). They form a scaffold on which to build a biodiversity knowledge graph.
Usage of identifiers can be broad and complex. PIDs are used to identify and link digital and physical objects or concepts. One of the very first uses of DOIs was to identify individually published articles, as well as references in bibliographies, which enhanced the visibility and citability of these articles. DOIs are also used for data and figures, and are proposed for the digital objects in DiSSCo (Hardisty et al. 2022). For physical specimens in natural history collections, there are the stable HTTP URIs proposed and implemented by CETAF, also called CETAF stable identifiers (Güntsch et al. 2017), and, in Zenodo, records of the type “physical object” (Boschert and Dikow 2022). Life Science Identifiers (LSIDs) were once used in biodiversity informatics, inter alia for concepts (taxonomic names), but, in contrast with, e.g., DOIs and ORCIDs, they lacked a governance structure to maintain them. ORCID or Wikidata identifiers serve as identifiers for people. A PID is needed for any digital object posted on the Web so that it can be easily found, cited, linked, annotated and reused. Furthermore, in publications, PIDs and their respective metadata can be provided for many types of research-related content, such as journals, chapters, grants or funders, datasets, data, text and images (Guralnick et al. 2015). An emerging consensus, reflected in current developments in the DiSSCo infrastructure and the BiCIKL project, is to use DOIs as community-agreed, unified identifiers for the curation of digital specimens. Digital specimens will be treated as FAIR Digital Objects (FDOs), that is, as aggregators of several existing identifiers of data related to a specimen, such as the identifier of the physical specimen itself, IDs of material citations of the specimen published in the literature, IDs of gene sequences from the specimen (INSDC accession codes) and others (Hardisty et al. 2021).
The worldwide reach of the Internet makes it evident that globally unique identifiers are a prerequisite for locating the cited resources and, consequently, through conversion and transformation of data, for building a knowledge graph in which all these resources can be identified and linked to each other through their PIDs. Since objects (e.g., an article) can be digitised in parallel by different parties, identifiers can collide, either between physical objects or across domains, e.g. between articles and specimens. Identifiers of different kinds have a long tradition in biodiversity research. They served specific purposes, such as labelling specimens from an expedition or a natural history collection, and have been understandable within their respective context. Because these identifiers served an internal purpose, they only needed to be locally unique. They are not resolvable through the internet, which makes them hard to use for non-specialists or machines: one needs to know their format to interpret them, and there is no way to tell whether they are correct when used outside their source system. Assigning PIDs resolves these problems but introduces new challenges: the global uniqueness and opaque strings required for persistence make PIDs hard for humans to use. Therefore, additional IDs or labels for use by humans, not necessarily globally unique, are also needed, which is not a problem as long as they are not used for data linkage. However, this will require look-up tables linking historic identifiers with the respective PIDs, or extending non-unique IDs with a prefix to make them unique. Ideally, the connection between the legacy ID and the unique PID is made either at the metadata level of each object, or within the specimen record (material citation) in publications.
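The two remedies for legacy identifiers mentioned above (prefixing a locally unique ID and keeping a look-up table to the PID) can be sketched in a few lines of Python. The prefix scheme shown is the informal institutionCode:collectionCode:catalogNumber pattern often called the “Darwin Core triplet”; the class, the codes “MUSX”/“ENT” and the example.org PID are hypothetical illustrations, not a recommended implementation.

```python
from typing import Dict, Optional


def prefixed_legacy_id(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Make a locally unique catalogue number unique across collections by
    prefixing it with institution and collection codes (triplet pattern)."""
    return f"{institution_code}:{collection_code}:{catalog_number}"


class LegacyIdLookup:
    """Hypothetical look-up table from prefixed legacy IDs to PIDs."""

    def __init__(self) -> None:
        self._to_pid: Dict[str, str] = {}

    def register(self, legacy_id: str, pid: str) -> None:
        existing = self._to_pid.get(legacy_id)
        if existing is not None and existing != pid:
            # One legacy label pointing at two PIDs means the prefix did not
            # disambiguate; flag it instead of silently overwriting.
            raise ValueError(f"{legacy_id} is already mapped to {existing}")
        self._to_pid[legacy_id] = pid

    def pid_for(self, legacy_id: str) -> Optional[str]:
        """Return the PID for a legacy ID, or None if it is not registered."""
        return self._to_pid.get(legacy_id)


table = LegacyIdLookup()
key = prefixed_legacy_id("MUSX", "ENT", "12345")          # hypothetical codes
table.register(key, "https://example.org/specimen/abc")   # hypothetical PID
print(key, "->", table.pid_for(key))
```

The same catalogue number “12345” in another collection yields a different key, which is exactly what the prefix is for; the table itself is where the human-readable label and the opaque PID meet.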
Because of the current transitional period of digitising biodiversity data, new and different kinds of PIDs might be minted for the same object. To connect different PIDs for the same object, we will need a discovery mechanism to build look-up tables. The data accessible via resolution of these PIDs will then provide complementary, sometimes conflicting, information about the same object (as revealed, for example, by GBIF’s clustering mechanism* for seemingly similar occurrences) and thus increase the knowledge about that object.
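Once a discovery mechanism has produced pairwise statements that two PIDs denote the same object, those statements can be merged into look-up clusters. The Python sketch below does this with a standard union-find (disjoint-set) structure; it is a generic technique, not GBIF’s clustering algorithm, and the input pairs and PID strings (doi:A, cetaf:B, ...) are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_pids(same_as: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group PIDs that refer to the same object, given pairwise
    'same object' assertions (union-find with path compression)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            next_x = parent[x]
            parent[x] = root
            x = next_x
        return root

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in same_as:
        union(a, b)

    clusters: Dict[str, List[str]] = {}
    for pid in parent:
        clusters.setdefault(find(pid), []).append(pid)
    # Sort for a deterministic, human-readable result.
    return sorted(sorted(members) for members in clusters.values())


links = [("doi:A", "cetaf:B"), ("cetaf:B", "gbif:C"), ("doi:D", "insdc:E")]
print(cluster_pids(links))
# [['cetaf:B', 'doi:A', 'gbif:C'], ['doi:D', 'insdc:E']]
```

Transitivity is the point of the design: doi:A and gbif:C end up in one cluster although no single statement links them directly.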
To minimise the cost of the significant and non-trivial effort of disambiguating entities and of building and maintaining look-up tables, the recommendations in this paper strongly encourage the use of harmonised PIDs that comply with a community-accepted standard across journals and publishers and therefore serve multiple scientific disciplines or domains. The recommendations of the European Open Science Cloud (EOSC) for the use of PIDs, for example, provide a good basis for harmonisation and should be taken into account (Directorate-General for Research and Innovation (European Commission) et al. 2020, Directorate-General for Research and Innovation (European Commission) 2020).
On the need for harmonisation
The recommendations in this paper were produced collaboratively by several organisations, research projects and biodiversity scientists. They are based on nearly 15 years of experience in annotating unstructured legacy publications at Plazi (Agosti and Egloff 2009), and on TaxPub XML-based structured publishing by Pensoft in 38 journals since 2010 (Penev et al. 2010*). Furthermore, during several EU-funded projects such as pro-iBiosphere, EU BON and COST Mobilise, discussions focused on building an infrastructure to provide FAIR data, for example the Biodiversity Literature Repository (BLR), and on the implementation of persistent identifiers in the article XML of Plazi and Pensoft (Catapano 2010, Penev et al. 2010, Penev et al. 2011). Finally, part of this discussion was carried out in the ongoing work of the CETAF e-publishing group on unique identifiers.
The paper has been largely elaborated and finalised in a collaboration between several partners in the Biodiversity Community Integrated Knowledge Library project (BiCIKL) (Penev et al. 2022). This effort mirrors the harmonisation of PIDs agreed between the Research Organization Registry (ROR), DataCite, Crossref and ORCID (Demeranville et al. 2021), which has reinforced the use of their PIDs in the scientific community and has been the foundation for disambiguating and interlinking institutional and biographical data, article metadata and datasets.
Taxonomy is governed by nomenclatural codes, which state the requirements for a nomenclatural act to be validly published, whether in print or online. These rules have evolved with the emergence of online journals, and mandate the use of certain identifiers within the publication, especially in the full-text XML of articles: for example, the LSID of the publication in which a new nomenclatural act is published, or the ISSN of the journal (see Penev et al. (2016) and Bénichou et al. (2018)). Accordingly, we outline the use of structured data and their identifiers to allow machines to assess whether a new taxonomic name is available according to the Codes.
The objective of this paper is to list the main structural elements and data types present in taxonomic publications and the identifiers currently in use for them, and to propose additional PIDs where these do not yet exist. The paper aims to provide recommendations, best practices and practical advice to technical editors and publishers in taxonomy on how to implement identifiers in their work and how to leverage them. For each element, the use of an identifier is discussed from the perspective of taxonomic publishing, its pros and cons are given, and short explanations are provided of how and where to implement the PIDs. We recommend that authors and publishers provide as many identifiers and links as possible, thereby facilitating the conversion of the published content into digital accessible knowledge. This would not only be a starting point for reusing these important data at scale, but would also spur new research based on this incredibly rich resource. It will also allow data in taxonomy to be linked with other scientific disciplines to build the future practice of evidence-based knowledge, that is, to bridge the gap from a taxonomic name to machine-actionable data about it.
Publications, publication sections, sub-article data elements and their identifiers
Modern taxonomic articles follow a rather strict structure that facilitates their representation in a structured XML format following the widely used TaxPub* schema, enabling efficient data exchange (Catapano 2010, Penev et al. 2012). Based on the Journal Article Tag Set (JATS) standard*, a journal article is composed of up to three parts, which must appear in the following order: front matter (required), body (optional) and back matter (optional).
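The ordering rule for the three top-level JATS parts (front required, then optional body and back, in that order) is simple enough to check mechanically. The Python sketch below uses the standard-library ElementTree parser; it is only an illustration of the rule and no substitute for validating an article against the JATS or TaxPub schema. Other children of <article> (such as floats-group) are deliberately ignored, and the function name is our own.

```python
import xml.etree.ElementTree as ET
from typing import List

# Top-level JATS article parts, in the order the standard requires.
PART_ORDER = ("front", "body", "back")


def check_article_parts(xml_text: str) -> List[str]:
    """Return problems with the front/body/back structure of a JATS <article>."""
    root = ET.fromstring(xml_text)
    # Only the three main parts are checked; other children are ignored.
    parts = [child.tag for child in root if child.tag in PART_ORDER]
    problems = []
    if "front" not in parts:
        problems.append("missing required <front>")
    if len(parts) != len(set(parts)):
        problems.append("a part occurs more than once")
    positions = [PART_ORDER.index(tag) for tag in parts]
    if positions != sorted(positions):
        problems.append("parts are not in the order front, body, back")
    return problems


print(check_article_parts("<article><front/><back/><body/></article>"))
# ['parts are not in the order front, body, back']
```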
In its broader sense, publishing is the act of making content available to the public. In this
paper, we refer specifically to peer-reviewed publications, either in the form of monographs
(books) or periodicals (journals), print or electronic. While not all taxonomic publications
are peer-reviewed, most of the comments and recommendations made here would apply
to them too. Publishing taxonomic content, and specifically publishing nomenclatural acts, has a specific meaning and requires compliance with the rules defined in the various codes of nomenclature.
For zoology, the International Code of Zoological Nomenclature (ICZN)* defines a
publication in Article 8 and in its Glossary:
publication, n.
1. Any published work.
2. The issuing of a work conforming to Articles 8 and 9.
electronic publication
A publication issued and distributed by means of electronic signals.
publish, v.
1. To issue any publication.
2. To issue a work that conforms to Article 8 and is not excluded by the
provisions of Article 9.
3. To make public in a work, conforming to (2) above, any names or
nomenclatural acts or information affecting nomenclature.
In botany and mycology, the International Code of Nomenclature for algae, fungi, and
plants (ICN)* defines a publication in its Article 29:
Publication is effected, under this Code, by distribution of printed matter (through sale,
exchange, or gift) to the general public or at least to scientific institutions with generally
accessible libraries. Publication is also effected by distribution on or after 1 January 2012
of electronic material in Portable Document Format (PDF; see also Art. 29.3 and Rec. 29A.1) in an online publication with an International Standard Serial Number (ISSN) or an
International Standard Book Number (ISBN).
Front matter of the publication
Definition
The article’s front matter contains metadata for the article and its host journal: title, list of authors with their affiliations, date of publication, abstracts, keywords, a copyright statement, etc. (see the front matter structures in JATS XML here*). These front matter components should be encoded with JATS XML elements, and PIDs should be included for the following: the article itself, the journal in which it is published, and the authors (see the section on Person names below).
Persistent unique identifiers for the front matter
International Standard Serial Number (ISSN)
Definition
The ISSN is an eight-character code (seven digits followed by a check character, which may be a digit or the letter X) used to uniquely identify a serial publication. The system was designed in 1971 and published as a standard in 1975; it can be used for journals as well as for book series, and even for some websites in the scholarly domain. An ISSN is unique and designates the publication medium: if a journal is published both in print and digitally, it must have a different ISSN for each medium, a print ISSN and an e-ISSN (a different ISSN should also be given for any mobile or CD-ROM version). A separate ISSN is also needed for each language version of the same journal. When a publication is provided in different media, or, e.g., in different languages as different journals, it is recommended to display all of its ISSNs on each version. The ISSN itself offers no resolution mechanism and is only a medium-oriented identifier.
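The last of the eight ISSN characters is a check character computed with a modulus-11 scheme (ISO 3297): the first seven digits are weighted 8 down to 2, and a result of 10 is written as X, which is why an ISSN such as 1768-305X can end in a letter. A minimal Python sketch of that check, applied to ISSNs cited in this section (function names are our own), is useful for catching typing errors before an ISSN is entered into XML or a ZooBank record.

```python
import re


def issn_check_character(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits (mod 11)."""
    total = sum(int(digit) * weight for digit, weight in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11   # a remainder of 0 gives check value 0
    return "X" if value == 10 else str(value)


def is_valid_issn(issn: str) -> bool:
    """True if the string has the ISSN shape and a correct check character."""
    compact = issn.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return False
    return issn_check_character(compact[:7]) == compact[7]


for issn in ("1313-2989", "1313-2970", "2118-9773", "1243-4442", "1768-305X"):
    print(issn, is_valid_issn(issn))   # all print True
```

A valid check character only shows that the string is well formed; whether the ISSN is actually assigned to the intended serial still has to be confirmed in the ISSN portal.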
Why does a journal need an ISSN?
The ISSN is mandatory for any journal or serial publication. In taxonomy, to be compliant
with most nomenclatural codes, the nomenclatural acts should be published in a journal or
series identified with an ISSN or a book with an ISBN.
Indeed, the International Code of Nomenclature for algae, fungi, and plants (ICN) stipulates that, to be considered effectively published, a nomenclatural novelty must appear in a publication distributed either in print (through sale, exchange, or gift) to the general public or at least to scientific institutions with generally accessible libraries, or (on or after 1 January 2012) in Portable Document Format (PDF) in an online publication with an International Standard Serial Number (ISSN) or an International Standard Book Number (ISBN) (see also Art. 29.3 and Rec. 29A.1).
The International Code of Zoological Nomenclature (ICZN) stipulates in Art. 8.5 (Ride et al.
2012) that for an e-publication to be considered published in the terms of the Code, a work
issued and distributed electronically must:
have been issued after 2011
have the date of publication stated in the work itself
be registered in ZooBank (see below)
contain the evidence of such registration (LSID of the publication or of the new
name must be indicated in the work itself).
In ZooBank, the entry must give the name of an organisation other than the publisher that intends to permanently archive the work in a manner that preserves its content and layout, and that is capable of doing so. The ISSN or ISBN of the publication must be recorded in the ZooBank entry.
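The electronic-publication conditions of the ICN and the ICZN described above lend themselves to a simple machine check of article metadata. The Python sketch below is deliberately simplified: it covers only the electronic route listed in this section (not print distribution or the other provisions of the Codes), the record type and its field names are hypothetical, and it returns a list of unmet conditions rather than a formal ruling on availability or effective publication.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EPublication:
    """Hypothetical metadata record for an electronic work with nomenclatural acts."""
    publication_date: Optional[date]             # date of publication stated in the work
    file_format: str                             # e.g. "PDF"
    publisher: str
    issn: Optional[str] = None
    isbn: Optional[str] = None
    zoobank_lsid: Optional[str] = None           # registration evidence printed in the work
    archive_organisation: Optional[str] = None   # archive named in the ZooBank entry


def icn_electronic_problems(pub: EPublication) -> List[str]:
    """Simplified check of the ICN conditions for effective electronic publication."""
    problems = []
    if pub.publication_date is None or pub.publication_date < date(2012, 1, 1):
        problems.append("must be distributed on or after 1 January 2012")
    if pub.file_format.upper() != "PDF":
        problems.append("must be in PDF")
    if not (pub.issn or pub.isbn):
        problems.append("needs an ISSN or ISBN")
    return problems


def iczn_electronic_problems(pub: EPublication) -> List[str]:
    """Simplified check of ICZN Art. 8.5 plus the ZooBank entry requirements."""
    problems = []
    if pub.publication_date is None:
        problems.append("date of publication must be stated in the work")
    elif pub.publication_date.year <= 2011:
        problems.append("must be issued after 2011")
    if not pub.zoobank_lsid:
        problems.append("must be registered in ZooBank, with the LSID cited in the work")
    if not pub.archive_organisation or pub.archive_organisation == pub.publisher:
        problems.append("ZooBank entry must name an archive other than the publisher")
    if not (pub.issn or pub.isbn):
        problems.append("ZooBank entry must record an ISSN or ISBN")
    return problems


example = EPublication(
    publication_date=date(2022, 11, 15),
    file_format="PDF",
    publisher="Example Publisher",                          # hypothetical
    issn="1313-2970",
    zoobank_lsid="urn:lsid:zoobank.org:pub:HYPOTHETICAL",   # placeholder, not a real LSID
    archive_organisation="Example Archive",                 # hypothetical
)
print(icn_electronic_problems(example), iczn_electronic_problems(example))
# [] []
```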
How to discover an existing ISSN
To find the ISSN of a series or journal, one may consult the ISSN portal, which provides a comprehensive list of ISSNs and some associated metadata.
How to obtain an ISSN
All the information needed to get an ISSN for a journal or series is available at https://portal.issn.org/requesting-issn. In some countries the ISSN is not free and may require a registration fee of between €25 and €50, depending on the country assigning the ISSN.
It seems possible to obtain an ISSN before the first issue of a print serial is published; however, applicants are very commonly asked to wait until the second issue of the series has been printed. Online publications are usually assigned an ISSN after the first or second issue is published (with at least five publications published) or, in some countries, after the website of the new periodical has gone live and is fully functional.
How to annotate and display an ISSN
Display: ISSN: 1313-2989 (for the print version of a serial, if exists)
e-ISSN: 1313-2970 (for the online version of a serial, if exists)
For linked data purposes, it is even better to use the resolvable version of the ISSN: https://portal.issn.org/resource/ISSN/1313-2989
Annotate in JATS:
Tag the ISSN with the <issn> element, and use the publication-format attribute to specify the format or medium of the publication (e.g., “print”, “electronic”, “video”, “audio”, “ebook” and “online-only”)*:
<issn publication-format="print">[ISSN number]</issn>
<issn publication-format="electronic">[ISSN number]</issn>
Example of an ISSN
2118-9773 is the ISSN of the European Journal of Taxonomy (EJT). As EJT is an online-only journal, it has a single e-ISSN.
ZooKeys has two ISSNs, one for each version: 1313-2989 (print version) and 1313-2970 (online version).
An ISSN can also identify a series of books, e.g. The Mémoires du Muséum national
d’histoire naturelle have one for the print version (ISSN: 1243-4442) and one for the online
version (e-ISSN: 1768-305X).
Sample annotation in JATS (for ZooKeys):
<journal-meta>
<issn publication-format="print">1313-2989</issn>
<issn publication-format="electronic">1313-2970</issn>
</journal-meta>
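Where journal metadata is generated programmatically, the <issn> elements and the resolvable ISSN portal URL can be produced from one table of ISSNs per format. The Python sketch below uses the standard-library ElementTree module; the function names are our own, the publication-format values follow the “print”/“electronic” wording of the JATS attribute described above, and the other elements a complete <journal-meta> carries (such as the journal identifier and title) are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ISSN_PORTAL = "https://portal.issn.org/resource/ISSN/"


def journal_meta_issns(issns: Dict[str, str]) -> str:
    """Serialise a <journal-meta> fragment with one <issn> per publication format."""
    meta = ET.Element("journal-meta")
    for publication_format, issn in issns.items():
        issn_element = ET.SubElement(meta, "issn", {"publication-format": publication_format})
        issn_element.text = issn
    return ET.tostring(meta, encoding="unicode")


def issn_portal_url(issn: str) -> str:
    """Resolvable form of an ISSN via the ISSN portal, for linked data."""
    return ISSN_PORTAL + issn


print(journal_meta_issns({"print": "1313-2989", "electronic": "1313-2970"}))
# <journal-meta><issn publication-format="print">1313-2989</issn><issn publication-format="electronic">1313-2970</issn></journal-meta>
print(issn_portal_url("1313-2989"))
# https://portal.issn.org/resource/ISSN/1313-2989
```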
Recommendation
Considering that an ISSN (or ISBN) is mandatory for online publication in taxonomy to comply with both the ICN and the ICZN, and that an ISSN makes journals or series more easily identifiable and findable, assigning an ISSN or ISBN to taxonomic publications must be considered mandatory. A unique ISSN should be assigned to each version of the journal, print and electronic. Each language version of the journal should also have its own ISSN.
International Standard Book Number (ISBN)
Definition
The ISBN was internationally approved as an ISO standard in 1970 and published in 1972, and is a unique international identifier for monographic publications. Correct use of the ISBN allows different product forms and editions of a book, whether printed or digital, to be clearly differentiated, ensuring that it identifies the specific version it relates to. As with the ISSN, each version of the book (print, e-book, PDF, etc.) must have a different ISBN. A book included in a book series, or published as a monograph in a journal, can be given both an ISBN and the ISSN of the series in which it is published.
The ISBN is a 13-digit number that identifies a book. As it is typically used in barcode form, it begins with a European Article Number (EAN) prefix (978 or 979). It is constructed as shown in Fig. 2a.
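Because the ISBN-13 is used as an EAN-13 barcode, its final digit is an EAN-13 check digit: the first twelve digits are weighted alternately 1 and 3, and the check digit brings the weighted sum up to a multiple of 10. The Python sketch below validates an ISBN-13 and converts an older 10-digit ISBN by adding the 978 prefix and recomputing the check digit. The example ISBN is a widely used illustrative value, not a taxonomic publication, and the function names are our own.

```python
def isbn13_check_digit(first_twelve: str) -> str:
    """EAN-13 check digit: weights 1, 3, 1, 3, ... over the first 12 digits."""
    total = sum(int(digit) * (3 if position % 2 else 1) for position, digit in enumerate(first_twelve))
    return str((10 - total % 10) % 10)


def is_valid_isbn13(isbn: str) -> bool:
    """True for a 13-digit ISBN with a 978/979 prefix and a correct check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or digits[:3] not in ("978", "979"):
        return False
    return isbn13_check_digit(digits[:12]) == digits[12]


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix an ISBN-10 with 978 and recompute the check digit.
    The ISBN-10's own check character is dropped, not validated."""
    digits = isbn10.replace("-", "").replace(" ", "")
    core = "978" + digits[:9]
    return core + isbn13_check_digit(core)


print(is_valid_isbn13("978-0-306-40615-7"))   # True
print(isbn10_to_isbn13("0-306-40615-2"))      # 9780306406157
```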
Why does a book need an ISBN?
The ISBN is important for cataloguing a book and for its findability, discovery and dissemination. It must be displayed on the first pages of the book, along with the book title, the author name(s) and the publisher. The ISBN is the main international record of a publication and is important for indexing and dissemination. It facilitates the compilation of book trade directories and bibliographic databases, which book dealers use to order books efficiently and unambiguously.
In taxonomy, it is crucial to assign an ISBN to any taxonomic monograph containing nomenclatural acts. For instance, as explained above, ZooBank requires an ISBN to register a nomenclatural act published in a book (Ride et al. 2012, ICZN Art. 8.5). The ICN (Art. 29.3) also mentions the ISBN as an alternative to the ISSN when a nomenclatural novelty is published in an electronic book.
How to discover an existing ISBN
As a unique identifier, the ISBN is part of the metadata associated with any book. To find the ISBN of a published book, whatever its version (PDF, e-book or print), a simple web search for the title followed by “ISBN” will usually return the answer. WorldCat is a good place to retrieve all the ISBNs of a book. Beware that a book may have as many ISBNs as it has versions: one for the print version, another for the e-book, another for the second edition, and so on.
How to obtain an ISBN
All the information needed to obtain an ISBN for a publication is available at https://www.isbn-international.org/content/how-get-isbn.
Once an ISBN has been assigned to a publication, it should always be displayed to facilitate identification. The ISBN is also crucial for distribution because it is displayed as a barcode, allowing libraries and bookshops to process incoming stock and outgoing sales quickly and accurately. On a printed book, the ISBN should appear on the copyright page (also called the title verso page), or at the foot of the title page if there is no room on the copyright page. If there is no barcode, the ISBN should also appear on the back cover or jacket, preferably at the lower right. Each version of the book needs its own ISBN. More details on when to assign an ISBN are available at https://www.isbn-international.org/content/isbn-assignment.
The publisher then records the ISBN, together with all the additional metadata of the book needed for cataloguing, in the legal deposit form submitted to their respective national ISBN agency.
How to annotate and display an ISBN
Display: ISBN: 978-2-85653-939-2
Annotate in JATS:
Figure 2.
Construction pattern of (a) ISBN; (b) DOI.
Tag the ISBN with the <isbn> element, using the publication-format attribute to specify the format or medium of the publication (e.g., “print”, “electronic”, “video”, “audio”, “ebook” and “online-only”)*.
<isbn publication-format="[format type]">[ISBN number]</isbn>
Example of an ISBN
The Flora of New Caledonia volume on Apocynaceae, Phellinaceae and Capparaceae, published by the Muséum national d’Histoire naturelle in 2020, has an ISBN for its print version (978-2-85653-939-2), one for the PDF version on sale (978-2-85653-954-5) and one for its bundle (print + e-book: 978-2-38036-955-2).
Sample annotation in JATS with multiple ISBN numbers:
<book-meta>
<isbn publication-format="print">978-2-85653-939-2</isbn>
<isbn publication-format="PDF">978-2-85653-954-5</isbn>
<isbn publication-format="bundle">978-2-38036-955-2</isbn>
</book-meta>
Recommendation
An ISBN is mandatory to properly identify a published book. Each version of the book (PDF, print, e-book, each language version, second edition) should have its own ISBN. Since ISSNs and ISBNs are mandatory for nomenclatural purposes, the use of an ISBN must be considered mandatory for taxonomic publications.
Digital Object Identifier (DOI)
Definition
The DOI system was developed by the DOI Foundation and is implemented through a federation of registration agencies. The two agencies most commonly used to register DOIs in the scholarly domain are Crossref and DataCite. Both are membership organisations that provide DOIs for research outputs, but for different purposes. The main differences lie in the types of digital objects they identify, the number of DOIs needed and the metadata associated with the DOI.
Crossref is a non-profit membership organisation specifically serving scholarly publications.
Its members are publishers, research institutions, university presses, societies and
funders. Membership in Crossref is open to organisations that produce professional and
scholarly materials and content. In addition, applicants should be able to meet the terms
and conditions of membership.
DataCite is a global non-profit organisation that provides persistent identifiers (DOIs
specifically) for research data and other research outputs and resources. DataCite’s
members work with data centres, stewards, libraries, archives, universities, publishers and
research institutes that host repositories and who have responsibility for managing,
holding, curating, and archiving data and other research outputs.
On their respective websites, the two agencies explain the rationale behind each of them with a schema (Fig. 3; e.g., https://www.crossref.org/community/datacite/).
A DOI consists of three parts (Fig. 2b).
To create a DOI, the DOI prefix assigned to an organisation is combined with a suffix of its choice. The DOI becomes active once it is registered with a DOI registration agency such as Crossref or DataCite. Crossref provides complete documentation on best practices for constructing suffixes.
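Since a DOI is a prefix joined to a suffix, the two parts can be recovered from any of the citation forms used in this paper (bare, “DOI:” or https://doi.org/). The sketch below uses a pragmatic regular expression that assumes the common “10.NNNN/suffix” shape; it is not the full DOI syntax, and registrant codes with sub-divisions would need a broader pattern.

```python
import re
from typing import Tuple

# Pragmatic pattern: "10." + 4-9 digit registrant code, "/", then a non-blank suffix.
DOI_PATTERN = re.compile(r"(10\.\d{4,9})/(\S+)")


def split_doi(text: str) -> Tuple[str, str]:
    """Return (prefix, suffix) from a bare DOI, a 'DOI: ...' string or a doi.org URL."""
    match = DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    return match.group(1), match.group(2)


def doi_url(text: str) -> str:
    """Return the https://doi.org/ form recommended for citation."""
    prefix, suffix = split_doi(text)
    return f"https://doi.org/{prefix}/{suffix}"


print(split_doi("DOI: 10.3897/zookeys.1083.72939"))
```

Normalising every input to the https form keeps citations consistent with the preferred display shown further below.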
Figure 3.
Dichotomic decision tree for the registration of CrossRef and DataCite DOIs, from the
DataCite - Crossref release.
How to discover an existing Digital Object Identifier (DOI)?
To find a registered DOI, enter the title, the author or any other metadata in the Crossref or DataCite search engines, or alternatively use the ReFindit tool.
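Both agencies also expose public REST APIs, which makes such lookups scriptable. The sketch below builds query URLs for the Crossref works endpoint (its query.bibliographic parameter searches citation-like text) and the DataCite dois endpoint; the helper names are illustrative, and as with any fuzzy search the results still have to be matched by hand. Only the last function performs a network request.

```python
import json
import urllib.parse
import urllib.request


def crossref_search_url(bibliographic: str, rows: int = 5) -> str:
    """Crossref REST API query over titles, authors and other citation text."""
    params = urllib.parse.urlencode({"query.bibliographic": bibliographic, "rows": rows})
    return f"https://api.crossref.org/works?{params}"


def datacite_search_url(query: str) -> str:
    """DataCite REST API full-text query."""
    return "https://api.datacite.org/dois?" + urllib.parse.urlencode({"query": query})


def crossref_candidate_dois(bibliographic: str, rows: int = 5) -> list:
    """Fetch candidate DOIs from Crossref (sends a live HTTP request)."""
    url = crossref_search_url(bibliographic, rows)
    with urllib.request.urlopen(url, timeout=30) as resp:
        message = json.load(resp)["message"]
    return [item["DOI"] for item in message["items"]]


print(crossref_search_url("Pardosa zyuzini Kronestedt", rows=3))
```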
How to mint a Digital Object Identifier (DOI)
All agencies providing DOIs are listed at https://www.doi.org/registration_agencies.html. Each may have different rules and apply different fees. Alternative repositories that mint DOIs for legacy publications are the Biodiversity Heritage Library, the Biodiversity Literature Repository and institutional libraries that retro-digitise legacy publications, such as E-Periodica at the Federal Institute of Technology, Zurich.
To deposit a DOI with Crossref, one has to be a member. Membership fees begin at 275 USD and depend on the revenue of the applicant. Once a member, the joining organisation is assigned a DOI prefix, which forms the stem of the DOIs of all its metadata records. Fees vary by record type (books, research grants, preprints, etc.), from 0.15 USD for a legacy article to 1 USD for a newly published article. Each DOI has to be registered by direct deposit of XML (for instance, using the Open Journal Systems plugin) or, alternatively, through an online web deposit form.
Component DOIs are often registered for figures, tables, and supplemental materials
associated with a journal article. They have their own metadata distinct from that of the
parent article DOI.
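The authoritative link between a component DOI and its parent is the one deposited with the component's metadata. For quick checks, the sketch below merely infers the parent from the naming pattern visible in the ZooKeys examples of this paper (“.figureN”, “.tableN”); that suffix convention is an assumption about one publisher, not a rule of the DOI system.

```python
import re
from typing import Optional

# Assumption: Pensoft-style component suffixes such as ".figure1" or ".table1";
# other publishers may name component DOIs differently.
COMPONENT = re.compile(r"^(?P<parent>.+)\.(?P<kind>figure|table)(?P<number>\d+)$")


def parent_doi(component_doi: str) -> Optional[str]:
    """Infer the parent article DOI, or return None if the pattern does not apply."""
    match = COMPONENT.match(component_doi.strip())
    return match.group("parent") if match else None


print(parent_doi("10.3897/zookeys.1088.78139.figure1"))
```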
DOI registration includes all the metadata, i.e., basic information such as the publication date, the publication outlet (including the ISSN or ISBN), the article title and the authors. Crossref membership entails obligations: accurate metadata must be deposited for every registered DOI and maintained over the long term, including updating any URLs that change. Members are also obliged to include DOIs in reference lists for cited works that have DOIs. A free public API is available to retrieve all existing Crossref DOIs.
To register a DOI with DataCite, one has to be a member. Membership is open to all organisations whose missions include sharing research outputs. A membership fee of 2,000 € applies to member organisations. In addition, not-for-profit members pay an annual fee of 500 € to use the DOI registration services. Each DOI, up to 1,999 DOIs, costs 0.80 €. DOIs can be registered in two ways: through an API or through a web interface. All information is provided at https://support.datacite.org/docs/getting-started.
How to annotate and cite a DOI
Cite:
https://doi.org/10.3897/zookeys.1083.72939 (preferred), or DOI: 10.3897/zookeys.1083.72939
Annotate in JATS:
Tag the cited DOI with the <ext-link> element, using the xlink:href attribute to provide the https version of the DOI and the ext-link-type attribute with the value “doi”.*
<ext-link xlink:href="[https-version of DOI]" ext-link-type="doi">
[https-version of DOI]
</ext-link>
Example of a DOI
https://doi.org/10.3897/zookeys.1083.72939 refers to an article published in ZooKeys.
Sample annotation in JATS:
<ext-link xlink:href="https://doi.org/10.3897/zookeys.1083.72939" ext-link-type="doi">
https://doi.org/10.3897/zookeys.1083.72939
</ext-link>
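Generating the element from the DOI, rather than typing the URL twice, keeps the href and the displayed text consistent. A minimal sketch with Python's standard xml.etree.ElementTree, assuming a bare DOI as input; because the fragment is serialised on its own, it carries an xmlns:xlink declaration that, in a full article, would normally sit on the root <article> element.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialise the attribute as xlink:href


def doi_ext_link(doi: str) -> str:
    """Return a JATS <ext-link> for a bare DOI such as '10.3897/zookeys.1083.72939'."""
    url = f"https://doi.org/{doi.strip()}"
    link = ET.Element("ext-link", {f"{{{XLINK_NS}}}href": url, "ext-link-type": "doi"})
    link.text = url  # the displayed text repeats the https form of the DOI
    return ET.tostring(link, encoding="unicode")


print(doi_ext_link("10.3897/zookeys.1083.72939"))
```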
Recommendation
Use Crossref DOIs for articles and bibliographic references. For supplementary material, figures or tables, create Crossref component DOIs or DataCite DOIs. Generate DataCite DOIs for data. If none is available, try to create a DOI through an alternative repository such as BHL, BLR or E-Periodica, or use the DOIs issued for datasets deposited in large international repositories such as GBIF, DataONE, Dryad, Zenodo and others.
Display all identifiers (ISSN, ISBN, DOI) on the corresponding publication page and register all the metadata associated with the DOIs with Crossref or DataCite. Always include the DOI in the metadata submitted for other publication-related registrations, for example at ZooBank, IPNI, MycoBank, Zenodo, Dryad and others.
Body of the article
Definition
Most academic journals require authors to structure their articles in the IMRaD format. IMRaD stands for Introduction, Methods, Results and Discussion, the four main sections that make up most scientific papers in the Science, Technical and Medical (STM) fields. The body of the article is its main textual and graphic content and sits between the front matter and the back matter. It usually consists of sections, subsections and paragraphs, which may themselves contain figures, tables, etc.
In a taxonomic article, the body of the article includes specific items, such as taxonomic
treatments, material citations, descriptions, differential diagnoses, details of collecting
permits, etc.
Sections
Definition
Most journal articles are divided into sections, each with a title that describes its content, such as “Introduction”, “Materials and Methods” or “Conclusions”*. A special section in taxonomic publications is the taxonomic treatment, as described below. The different sections contain different kinds of data and information that are important for reproducing the research. For example, the “Materials and Methods” section lists the collections studied, the software used to analyse the data or the instruments used to make measurements.
What are the identifiers for sections
Sections are normally tagged with internal Universally Unique Identifiers (UUIDs) in the article XML. In addition, section names, which are used more or less consistently across scientific domains (e.g., “Introduction”, “Material and Methods”, “Results”, “Conclusions”), can be used to infer the semantic meaning of their content, an approach currently used for the conversion to RDF and export to the OpenBioDiv knowledge graph.
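The title-based inference described above can be approximated with a lookup table. The sketch below is a hypothetical Python helper that maps a few common titles onto the JATS sec-type values recommended in this section; it is not the procedure used for OpenBioDiv, and compound titles such as “Material and Methods” are deliberately left unmapped so they can be reviewed manually.

```python
from typing import Optional

# Hypothetical lookup from normalised titles to recommended JATS sec-type values;
# a production pipeline would need a much larger, curated list.
TITLE_TO_SEC_TYPE = {
    "introduction": "intro",
    "materials": "materials",
    "methods": "methods",
    "results": "results",
    "discussion": "discussion",
    "conclusion": "conclusions",
    "conclusions": "conclusions",
    "supplementary material": "supplementary-material",
}


def infer_sec_type(title: str) -> Optional[str]:
    """Guess a sec-type from a section title; return None for unknown titles."""
    key = " ".join(title.lower().split()).rstrip(".:")  # case, spacing, trailing punctuation
    return TITLE_TO_SEC_TYPE.get(key)


print(infer_sec_type("Introduction"))
```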
How to annotate sections
In JATS, the sections are annotated using the following elements and attributes:
<sec sec-type="[section type]" id="[internal identifier]">
<sec-meta>
<mixed-citation>
<object-id object-id-type="uuid">[UUID]</object-id>
</mixed-citation>
</sec-meta>
...
</sec>
The sec-type attribute annotates the basic structural unit of the body of a document. Following the JATS recommendation that sec-type “is most useful when a list of values is maintained, and articles are tagged accordingly”, the values “cases”, “conclusions”, “discussion”, “intro”, “materials”, “methods”, “results”, “subjects” and “supplementary-material” are recommended.*
The “id” attribute “is a unique internal identifier of an element; it allows the element to be cross-referenced [and linked to]. The value must be unique across a document… [id] holds an internal document identifier that can be used by software to perform a simple link. An id should not be confused with elements that are used to hold externally defined identifiers such as a DOI”*. For an externally defined identifier assigned to the section, a <sec-meta>* element may be used to provide metadata for the section; it includes a <mixed-citation>* containing an <object-id>* element that records an identifier, for example, a UUID.
Although not recommended, a lighter-weight solution for associating an external identifier with a section is to “overload” the id attribute of <sec> by using an external identifier such as a UUID as its value. However, the “id” attribute “must start with a letter of the alphabet”*, so UUIDs (which may start with a digit) should be prefixed with a string starting with an alphabetic character, e.g., “uuid-”, to validate.
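The overloading option can be sketched as follows, assuming the “uuid-” prefix suggested above and upper-case hexadecimal digits as in the Bueno-Soria et al. example below. The regular expression is a simplified stand-in for the XML name rules, not the full NCName grammar, and the helper names are hypothetical.

```python
import re
import uuid
from typing import Optional

ID_PREFIX = "uuid-"  # assumption: any prefix starting with a letter would do
# Simplified check: a leading letter, then letters, digits, '.', '_' or '-'.
VALID_ID = re.compile(r"^[A-Za-z][A-Za-z0-9._-]*$")


def section_id(value: Optional[str] = None) -> str:
    """Return an id value derived from a UUID, minting a random one if none is given."""
    u = uuid.UUID(value) if value else uuid.uuid4()
    return f"{ID_PREFIX}{str(u).upper()}"


def uuid_from_id(id_value: str) -> str:
    """Recover the UUID from an overloaded id attribute."""
    if not id_value.startswith(ID_PREFIX):
        raise ValueError(f"{id_value!r} lacks the {ID_PREFIX!r} prefix")
    return str(uuid.UUID(id_value[len(ID_PREFIX):])).upper()


print(section_id("07ED460C-7F70-42EC-A7F8-59CC2F512131"))
```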
Example
Annotation of a “Methods” section, including an object identifier, taken from the article by Bueno-Soria et al. (2022).
<sec sec-type="methods" id="SECID0E1H">
<sec-meta>
<mixed-citation>
<object-id object-id-type="uuid">07ED460C-7F70-42EC-A7F8-59CC2F512131</object-id>
</mixed-citation>
</sec-meta>
<title>Methods</title>
<p>The specimens of the genus <em>Xiphocentron</em> studied here were borrowed
from the collections of the National Museum of Natural History, Smithsonian Institution in
Washington, DC, and from the Colección Nacional de Insectos, Instituto de Biología de la
Universidad Nacional Autónoma de México.</p>
<p>The type materials are deposited as indicated in each species description,
in the collections: National Museum of Natural History, Smithsonian</p>
</sec>
Recommendation
Section and subsection titles should be tagged as such, and internal UUIDs should be assigned to them in the article XML.
Figures, figure captions and citations
Definition
A figure is either a photograph or a scientific drawing illustrating biological species or parts of them, landscapes, habitats or equipment, or a visualisation of data or of the results of statistical analyses. Figures and their captions convey an essential part of the information contained in a scientific paper and are of particular interest to the community.
The ICN states the importance of illustrations in its Art. 43.2:
“A name of a new fossil-genus or lower-ranked fossil-taxon published on or after 1 January
1912 is not validly published unless it is accompanied by an illustration or figure showing
the essential characters or by a reference to a previously and effectively published such
illustration or figure.”
According to Art. 40.3, an illustration can also serve as the type for names published before 1 January 2007*.
The figures related to a taxonomic treatment (see definition below) are usually cited at the
beginning of the treatment and are part of it.
What are the identifiers for figures
Either Crossref component DOIs or DataCite DOIs are usually used when figures are deposited in a repository.
How to mint an identifier for a figure
For minting DOIs, see the section “Digital Object Identifier (DOI)” above. If no DOIs are minted for figures, they can be identified with internal UUIDs minted by software during the compilation of the full-text article XML, and a hash of the figure file allows the respective figure to be uniquely identified.
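The paper does not name a hash function; the sketch below assumes SHA-256 from Python's standard hashlib. Such a fingerprint identifies the exact file bytes, so any re-encoding, resizing or metadata edit of the image yields a different value. Function names are illustrative.

```python
import hashlib
from pathlib import Path


def figure_fingerprint(data: bytes) -> str:
    """Return the hex SHA-256 digest of a figure's bytes."""
    return hashlib.sha256(data).hexdigest()


def figure_file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a figure file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing this digest next to the internal UUID lets a repository detect whether two deposits carry the same image file.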
How to annotate and cite a figure, figure caption and figure citation
Cite: Figures are cited within the text following the long-established practice in scholarly
publishing (e.g., “according to Fig. X” or “see detail (Fig. Y)”). Citation style should follow
the journal’s or publisher’s instructions for the authors.
Annotate in JATS:
Figure:
<fig id="[Internal identifier]">
<object-id object-id-type="doi">[DOI]</object-id>
<caption>[free_text]</caption>
</fig>
In-text figure citation:
<xref ref-type="fig" rid="[Internal identifier]">
[Figure reference]
</xref>
Note that the citation details of a figure as delivered by the Biodiversity Literature Repository (BLR) at Zenodo should contain both the DOI of the article and the component DOI of the figure. Where no Crossref component DOI exists, a DataCite DOI is minted for the figure at BLR. The “rid” (reference to an identifier) attribute is needed to link the citation to the <fig> element via its “id” attribute.
Examples
Annotation of a figure and in-text figure citation (from Blahnik and Andersen (2022)):
Figure:
<fig id="F4">
<object-id object-id-type="doi">10.3897/zookeys.1111.77586.figure1</object-id>
<caption>
<p>
Malaise trap across Kaputu Stream at 1535 m altitude in the Mazumbai Forest Reserve in
the West Usambara Mountains, northeastern Tanzania (Photo: Trond Andersen).</p>
</caption>
</fig>
In-text figure citation:
<xref ref-type="fig" rid="F4">Fig. 4</xref>
https://doi.org/10.3897/zookeys.1088.78139.figure1 is a CrossRef component DOI
assigned to Figure 1 of the article https://doi.org/10.3897/zookeys.1088.78139.
Recommendation
Use Crossref component DOIs to identify each figure within an article. An important feature of the component DOI is its link from the figure DOI to the parent article DOI. If no Crossref DOI is available, use DataCite DOIs as an alternative.
In all cases, and especially if no DOIs are minted for figures, it is recommended to assign
internal UUIDs minted by software during the compilation of the full-text article XML as well
as a hash for unique identification.
When compiling the full-text XML, it is highly recommended to cross-reference (anchor) the
in-text figure citations to their respective figures in the article body.
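Anchoring can be verified mechanically once the XML is compiled. The sketch below uses only Python's standard library to list in-text citations whose rid matches no <fig> or <table-wrap> id; it assumes un-namespaced JATS element names, as in the samples in this paper, and treats rid as a space-separated list of ids.

```python
import xml.etree.ElementTree as ET

# Which element each xref ref-type is expected to point at.
TARGETS = {"fig": "fig", "table": "table-wrap"}


def dangling_xrefs(article_xml: str) -> list:
    """Return (ref-type, rid) pairs that do not resolve to an element of the expected type."""
    root = ET.fromstring(article_xml)
    ids_by_tag = {tag: {el.get("id") for el in root.iter(tag)} for tag in TARGETS.values()}
    problems = []
    for xref in root.iter("xref"):
        ref_type = xref.get("ref-type")
        if ref_type not in TARGETS:
            continue
        for rid in (xref.get("rid") or "").split():
            if rid not in ids_by_tag[TARGETS[ref_type]]:
                problems.append((ref_type, rid))
    return problems


sample = ('<body><fig id="F4"/><table-wrap id="T1"/><p>'
          '<xref ref-type="fig" rid="F4">Fig. 4</xref> '
          '<xref ref-type="table" rid="T2">Tab. 2</xref></p></body>')
print(dangling_xrefs(sample))
```

In the sample, the figure citation resolves while the citation of a non-existent table T2 is reported.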
Tables, table citations
Definition
A table is a concise and effective way of presenting large amounts of data, usually displayed in rows and columns for reference.
Tables are increasingly important because in many cases they contain a compilation of the specimens used, their sequence accession codes and specimen codes that allow linking to the cited specimens, as well as traits such as measurements, qualitative descriptions or even the results of an analysis performed on raw data taken from the specimens or their environment. Each row can be seen as a structured material citation and, if the table lists the species used in a study together with their taxonomic names, as an entire taxonomic treatment.
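The row-as-material-citation idea can be illustrated with a small converter that turns each body row into a record keyed by the column headers. This is a sketch assuming a single header row of <th> cells and no merged cells; the column names and values in the sample are hypothetical placeholders, not data from a real table, and mapping headers onto Darwin Core terms would be a separate, curated step.

```python
import xml.etree.ElementTree as ET


def table_rows_as_records(table_xml: str) -> list:
    """Return one dict per body row of a simple <table>, keyed by header text."""
    table = ET.fromstring(table_xml)
    headers = ["".join(cell.itertext()).strip() for cell in table.iter("th")]
    records = []
    for row in table.iter("tr"):
        cells = ["".join(cell.itertext()).strip() for cell in row.findall("td")]
        if cells:  # the header row has no <td> cells and is skipped
            records.append(dict(zip(headers, cells)))
    return records


# Hypothetical two-column table with placeholder values.
sample = ("<table><thead><tr><th>Specimen code</th><th>GenBank accession</th></tr></thead>"
          "<tbody><tr><td>HYP-001</td><td>XX000001</td></tr></tbody></table>")
print(table_rows_as_records(sample))
```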
What are the identifiers for tables
In TreatmentBank, tables are identified by a UUID and a persistent HTTP URI. In Pensoft article XML, tables are identified by internal UUIDs.
How to mint a table identifier
DOIs, or Crossref component DOIs related to the article, should be minted and submitted to Crossref for registration by the publisher. If no DOIs are minted for tables, they can be identified with internal UUIDs minted by software during the compilation of the full-text article XML.
Annotating and citing tables
Tables are cited within the text following the long-established practice in scholarly
publishing (e.g., “according to Tab. X” or “see data (Tab. Y)”). Citation style should follow
the journal’s or publisher’s instructions for the authors.
Annotate in JATS:
Table:
<table-wrap id="[Internal identifier]">
<object-id object-id-type="doi">[Digital Object Identifier]</object-id>
<caption>Text</caption>
<table>
...
</table>
</table-wrap>
In-text table citation:
<xref ref-type="table" rid="[Internal identifier]">[Table reference]</xref>
The “rid” attribute links the in-text citation to the table via the “id” attribute of the target <table-wrap> element, which itself has optional, repeatable <object-id> elements recording identifiers for the table it contains. The JATS Tag Library defines the <object-id> element as a “Unique identifier (such as a DOI or URI) for a component within an article (for example, for a figure or a table)”, further stating that “the <object-id> element holds an external identifier, typically assigned to an object such as a table by a publisher. The contents of this element should not be confused with the "@id" attribute, which holds an internal document identifier that can be used by software to perform a simple link inside the document.”*
Examples
Annotation of a table and in-text table citation (from Blahnik and Andersen (2022))
Table:
<table-wrap id="T1">
<object-id content-type="table" object-id-type="doi">
10.3897/zookeys.1111.77586.table1
</object-id>
<caption>Table 1. African genus Chimarra</caption>
<table>
...
</table>
</table-wrap>
In-text citations:
<xref ref-type="table" rid="T1">Tab. 1</xref>
Recommendation
Ideally, a table should be provided with a Crossref component DOI related to the article. In
all cases, and especially if no DOIs are minted for tables, it is recommended to assign
internal UUIDs minted by software during the compilation of the full-text article XML.
When compiling the full-text XML, it is highly recommended to cross-reference (anchor) the
in-text table citations to their respective tables in the article body.
Taxonomic treatments
Definition
Taxonomic treatments are sections of publications documenting the features or distribution of a related group of organisms (taxon) (Catapano 2010). Each taxonomic name relates to at least one taxonomic treatment: a publication, or more frequently a section of a publication, documenting the features of a taxon in ways that adhere to highly formalised conventions. Some of these descriptions are over two centuries old and are maintained by the ethical and professional norms of the taxonomic community, regulated by the Nomenclatural Codes. The modelling of taxonomic treatments in TaxPub XML is designed to follow the FAIR principles and to provide clarity and repeatability of the research, both of which are integral parts of modern evidence-based science.
The features and structure of treatments have changed over time and vary between and within publications. The name is often followed by an indication of whether the taxon is new to science (e.g., “species nova”, “sp. nov.” or “genus novum”, “gen. nov.”) and by the name or names of the persons to whom the name is attributed. For taxa already known to science, a section listing citations of earlier treatments (treatment citations) often follows. When taxonomic names change as a result of a taxonomic revision, for example because a taxon is raised in rank or synonymised, this is followed by a label stating the change, such as “syn. nov.” or “nov. stat.”. Other information, such as persistent identifiers and references to physical specimens, may also be included in a treatment.
A number of other sections may follow the nomenclature section. One of the most significant, frequently titled “Materials Examined”, includes citations of the specimens used as the basis of the treatment and data about their properties (e.g., DNA sequences). This section often describes the circumstances of collection and/or deposition at a museum or other institution. Historically, these details have allowed scientists to visit the holding institution, or to request a loan, for further scientific investigation of the same material that was described in the treatment. Also common is a “Description” section providing information (often in highly structured language, and sometimes in tabular form) on the distinctive features of the collected organisms, with the aim of characterising the entire taxon that such material represents.
Similar to the “Description” section, a “Diagnosis” section contains descriptions of only those features or unique combinations of features “that distinguish that species from others, in the same way that the disease identification you receive when you visit the doctor is called the diagnosis because the doctor has distinguished your illness from all other possibilities on the basis of your symptoms and tests” (Winston 1999).
Most treatments describing new taxa include an “Etymology” section explaining the origin of the assigned Latin name, a “Distribution” section summarising the spatial and temporal distribution of the taxon, or an “Ecology” section discussing behaviour and relationships to habitat, or details of the environmental variables measured during the collecting events of the specimens. For higher-level taxa (such as genera and families), a “Key” presenting a set of instructions, in the form of a decision tree or even a workflow, for distinguishing lower-level taxa from one another is also common (Catapano 2010).
Like publications, and following the FAIR principles, treatments can be extracted from publications, preserved separately and made freely accessible to the public (Fig. 4; Agosti and Egloff 2009, Patterson et al. 2014).
An XML tag set for taxonomic treatments has been formalised as an extension of the Journal Article Tag Suite (JATS) (Catapano 2010) and was adopted in 2010 by Pensoft Publishers in their journal production process (Penev et al. 2010), which now covers 38 journals*. The export of treatments from published PDFs has been adopted by CETAF’s European Journal of Taxonomy and the Muséum national d’Histoire naturelle (5 journals*). Legacy publications are annotated, and their treatments made accessible, by TreatmentBank (780,000 treatments as of August 2022) and the Biodiversity Literature Repository (390,000 treatments), including current content from 52,000 articles. Together with the treatments exported by Pensoft, the total number of processed articles exceeds 70,000.
Treatments are reused by GBIF after extraction: they are imported as part of a dataset in Darwin Core Archive format compiled from taxonomic treatments and cited figures. Currently, these article-based datasets represent almost 60% of all datasets published in GBIF.
In Wikidata, taxonomic treatments can be annotated with the property taxonomic treatment (P10594)*, with protologue as a subclass referring to the treatment used to describe a new taxon, that is, to create an available name sensu the ICN.
The Barcode Index Numbers (BINs) of the Barcode of Life Data System (BOLD) (Ratnasingham and Hebert 2013) are functionally similar to treatments, although they are not sections of taxonomic publications and provide less information. BINs are dynamically generated by BOLD through an online framework that clusters barcode sequences and generates an identifier and a web page for each cluster. This framework uses a clustering algorithm based on graph-theoretic methods to assign BINs (Ratnasingham and Hebert 2013). Each BIN is assigned two identifiers: a resolvable URI generated by BOLD, consisting of an alphanumeric identifier made up of the prefix “BOLD:” followed by three letters and a four-digit number (e.g., BOLD:AAA0111), and a DOI. When the submission of new data leads to the merger of two BINs, the more recently registered BIN is synonymised. When the analysis splits a BIN into two, however, new BINs are established and a disambiguation option is suggested. In either case, the DOIs are amended to ensure that the original identifiers are not lost.
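The BIN identifier format described above is regular enough to check with a pattern. The sketch below encodes exactly that description (“BOLD:”, three letters, four digits) and normalises case and surrounding whitespace; whether BOLD accepts lower-case input is not assumed, and the function name is illustrative.

```python
import re
from typing import Optional

# "BOLD:" followed by three upper-case letters and a four-digit number.
BIN_PATTERN = re.compile(r"^BOLD:[A-Z]{3}[0-9]{4}$")


def normalise_bin(value: str) -> Optional[str]:
    """Return the canonical upper-case BIN identifier, or None if malformed."""
    candidate = value.strip().upper()
    return candidate if BIN_PATTERN.match(candidate) else None


print(normalise_bin(" bold:aaa0111 "))
```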
The UNITE Species Hypotheses (SHs) are functionally similar to treatments: each includes the clustered public fungal ITS sequences and is assigned a unique DOI by UNITE. UNITE is a database and sequence management environment for the molecular identification primarily of fungi, but now also of other taxa. It focuses on nuclear ribosomal internal transcribed spacer (ITS) region sequences, which are considered the fungal barcode. All species hypotheses have a unique URL where the associated public sequences are displayed (Nilsson et al. 2019). These sequences are referenced through their accession numbers and linked to their original records at the International Nucleotide Sequence Database Collaboration (INSDC, Arita et al. 2021).
Figure 4.
An example of a treatment accessible in various formats. Source: https://europeanjournaloftaxonomy.eu/index.php/ejt/article/view/1639/5873
Alternative formats:
TreatmentBank (TB) HTML: https://tb.plazi.org/GgServer/html/03E2AB75FFD6BE60FE39FCF5FBF9F83B
TB XML: https://tb.plazi.org/GgServer/xml/03E2AB75FFD6BE60FE39FCF5FBF9F83B
TB RDF: https://github.com/plazi/treatments-rdf/blob/main/data/03/E2/AB/03E2AB75FFD6BE60FE39FCF5FBF9F83B.ttl
BLR: http://doi.org/10.5281/zenodo.6302070
GBIF: https://www.gbif.org/species/193266458
What are the identifiers for taxonomic treatments
DOI
In Zenodo, a subtype “taxonomictreatment” has been added to the “publication” type, and deposits of this subtype receive DataCite digital object identifiers (DOIs). The metadata for taxonomic treatments in Zenodo are enhanced with custom keywords based on existing domain-specific vocabularies (e.g., Darwin Core), links to the source publication and cited figures, and related identifiers such as the HTTP URIs minted by TreatmentBank (see below). For treatments deposited in BLR via TreatmentBank, the respective HTTP URIs are included in the metadata.
For BINs* and Species Hypotheses*, DataCite DOIs are minted.
HTTP URI
Plazi created “HttpURIs” for treatments in 2009, in parallel with the development of the persistent HTTP URIs for specimens now widely accepted in CETAF. The HTTP URIs are used by GBIF when reusing TreatmentBank treatments. They are kept persistent and are built from a unique UUID appended to the prefix “http://treatment.plazi.org/id/” (e.g., http://treatment.plazi.org/id/0000C505-BB5D-484C-76BE-9AB6999DEB23). The original intention was to share the UUID with ZooBank, so that the ZooBank UUID would resolve both to the taxonomic name and to the respective taxonomic treatment in TreatmentBank. Unfortunately, this synchronisation has been discontinued.
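Building and parsing these URIs is a simple string operation on the prefix given above. In the sketch below, accepting https as well as http, and 32 bare hexadecimal digits (as in the TreatmentBank URLs of Fig. 4) as well as the hyphenated form, are assumptions made for robustness, not documented Plazi rules; the function names are illustrative.

```python
import re
from typing import Optional

TREATMENT_PREFIX = "http://treatment.plazi.org/id/"
HEX = "[0-9A-Fa-f]"
# Hyphenated 8-4-4-4-12 UUID or 32 bare hex digits (hyphens optional, an assumption).
TREATMENT_URI = re.compile(
    rf"^https?://treatment\.plazi\.org/id/({HEX}{{8}}-?{HEX}{{4}}-?{HEX}{{4}}-?{HEX}{{4}}-?{HEX}{{12}})$"
)


def treatment_uri(uuid_str: str) -> str:
    """Build a treatment HTTP URI, keeping the UUID exactly as given."""
    return TREATMENT_PREFIX + uuid_str.strip()


def treatment_uuid(uri: str) -> Optional[str]:
    """Extract the UUID from a treatment HTTP URI, or None if it does not match."""
    match = TREATMENT_URI.match(uri.strip())
    return match.group(1) if match else None


print(treatment_uri("0000C505-BB5D-484C-76BE-9AB6999DEB23"))
```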
UUID
During the publication of a taxonomic article, Pensoft journals assign a UUID to each taxonomic treatment. These UUIDs are then used by Plazi to mint the HTTP URIs of the treatments in TreatmentBank.
How to discover treatment identifiers
The DOI of a treatment can be found by searching ReFindit or for those minted by
Biodiversity Literature Repository, through the search engines of Zenodo or
TreatmentBank. The HTTP URIs can be found through GBIF or the Biodiversity Literature
Repository (BLR) or TreatmentBank.
Via ReFindit API (search by author, year, and taxon name, the latter as title):
https://refindit.org/find?search=advanced&author=Kronestedt&year=2011&title=Pardosa%20zyuzini (requires some further matching to pick the correct result)
Via Zenodo UI (full text search):
https://zenodo.org/search?page=1&size=20&q=Pardosa+zyuzini+Kronestedt+2011
Via TreatmentBank statistics API (exact match search on author, year, and taxon name):
https://tb.plazi.org/GgServer/srsStats/stats?outputFields=doc.uuid+bib.author+bib.title+bib.pubDate+bib.origin+bib.firstPage+bib.lastPage+bib.articleFirstPage+bib.articleLastPage+tax.name+tax.genusEpithet+tax.speciesEpithet+tax.authName+tax.authYear+tax.status&groupingFields=doc.uuid+bib.author+bib.title+bib.pubDate+bib.origin+bib.firstPage+bib.lastPage+bib.articleFirstPage+bib.articleLastPage+tax.name+tax.genusEpithet+tax.speciesEpithet+tax.authName+tax.authYear+tax.status&format=JSON&FP-tax.name=%22Alloplasta%25japonica%22&FP-tax.authName=Watanabe&FP-tax.authYear=2022 (simplified API under development)
Via TreatmentBank UI search (fuzzy search for taxon name):
https://tb.plazi.org/GgServer/search?taxonomicName.isNomenclature=true&
taxonomicName.exactMatch=true&taxonomicName.taxonomicName=Pardosa+zyuzini
Via GBIF UI (dataset search):
https://www.gbif.org/search?q=Prosymna+lisima
Via GBIF API (species search):
http://api.gbif.org/v1/species?name=Prosymna+lisima
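The query URLs above follow ordinary query-string conventions, so they can be generated rather than typed by hand. The sketch below (Python standard library only) rebuilds several of them; the helper names are hypothetical, the parameter names are copied from the example URLs, and no request is sent. For the TreatmentBank statistics API the same field list serves as outputFields and groupingFields, as in the example; that URL is assembled by hand because its field lists are joined with a literal "+", which urlencode would escape.

```python
from urllib.parse import quote, urlencode

# Output and grouping fields copied from the TreatmentBank statistics API example.
TB_FIELDS = [
    "doc.uuid", "bib.author", "bib.title", "bib.pubDate", "bib.origin",
    "bib.firstPage", "bib.lastPage", "bib.articleFirstPage", "bib.articleLastPage",
    "tax.name", "tax.genusEpithet", "tax.speciesEpithet",
    "tax.authName", "tax.authYear", "tax.status",
]


def refindit_url(author: str, year: int, title: str) -> str:
    # The ReFindit example encodes spaces as %20, hence quote_via=quote.
    query = {"search": "advanced", "author": author, "year": year, "title": title}
    return "https://refindit.org/find?" + urlencode(query, quote_via=quote)


def zenodo_url(text: str, page: int = 1, size: int = 20) -> str:
    # Default encoding turns spaces into "+", as in the Zenodo example.
    return "https://zenodo.org/search?" + urlencode({"page": page, "size": size, "q": text})


def gbif_species_api_url(name: str) -> str:
    return "http://api.gbif.org/v1/species?" + urlencode({"name": name})


def tb_stats_url(filters: dict, fields=TB_FIELDS, fmt: str = "JSON") -> str:
    """Assemble a TreatmentBank statistics API query; filter keys get the 'FP-' prefix."""
    joined = "+".join(fields)  # literal '+' separators, as in the example URL
    parts = [f"outputFields={joined}", f"groupingFields={joined}", f"format={fmt}"]
    parts += [f"FP-{key}={quote(str(value), safe='')}" for key, value in filters.items()]
    return "https://tb.plazi.org/GgServer/srsStats/stats?" + "&".join(parts)


print(tb_stats_url({"tax.name": '"Alloplasta%japonica"',
                    "tax.authName": "Watanabe", "tax.authYear": 2022}))
```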
How to mint an identifier for treatment
Currently, Zenodo is the only place to mint a DOI for a treatment. This is not an exclusive arrangement, however, since at Zenodo a treatment is a subtype of the DataCite publication type. UUIDs are generated by some publishers during article processing before publication (for example, all Pensoft journals). HTTP URIs are minted by TreatmentBank.
How to annotate and cite treatments
Cite: A treatment can be cited either by its DOI or by its HTTP URI generated by Plazi’s TreatmentBank. Other treatments are normally cited within a given treatment’s Nomenclature section (in the so-called “nomenclature-citation-list” of the JATS/TaxPub XML representation), where such citations can also introduce a nomenclatural change, indicated with a label (e.g. syn. nov., comb. nov., nom. nov.).
Annotate in JATS (treatment and subsections in JATS/TaxPub):
<tp:taxon-treatment>
<tp:treatment-meta>
<mixed-citation>
<object-id object-id-type="doi">[digital object identifier]</object-id>
</mixed-citation>
</tp:treatment-meta>
</tp:taxon-treatment>
For the nomenclature subsection:
<tp:nomenclature>
<tp:taxon-name>[Taxon name]</tp:taxon-name>
</tp:nomenclature>
For all other subsections:
<tp:treatment-sec sec-type="[section type name]"> ... </tp:treatment-sec>
Section types should, where possible, use the following vocabulary terms: description, diagnosis, discussion, distribution, ecology_behavior, conservation, etymology, materials_examined, reference_group, and vernacular_names. These terms add semantic meaning to (sub-)section titles and facilitate the extraction and reuse of the data.
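As an illustration only, the following Python sketch uses the standard-library ElementTree to emit the treatment skeleton shown above and rejects section types outside the listed vocabulary. The TaxPub namespace URI (http://www.plazi.org/taxpub) and the function name build_treatment are assumptions made for the example; the object-id attribute follows the JATS template above.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed TaxPub namespace URI
ET.register_namespace("tp", TP_NS)

# Section-type vocabulary listed in the text above.
SEC_TYPES = {
    "description", "diagnosis", "discussion", "distribution", "ecology_behavior",
    "conservation", "etymology", "materials_examined", "reference_group",
    "vernacular_names",
}


def tp(tag: str) -> str:
    """Return the namespace-qualified name of a TaxPub element."""
    return f"{{{TP_NS}}}{tag}"


def build_treatment(doi: str, taxon_name: str, sections: dict) -> ET.Element:
    """Build <tp:taxon-treatment> with a DOI, a nomenclature block and typed subsections."""
    treatment = ET.Element(tp("taxon-treatment"))
    meta = ET.SubElement(treatment, tp("treatment-meta"))
    citation = ET.SubElement(meta, "mixed-citation")
    ET.SubElement(citation, "object-id", {"object-id-type": "doi"}).text = doi
    nomenclature = ET.SubElement(treatment, tp("nomenclature"))
    ET.SubElement(nomenclature, tp("taxon-name")).text = taxon_name
    for sec_type, text in sections.items():
        if sec_type not in SEC_TYPES:
            raise ValueError(f"sec-type not in vocabulary: {sec_type}")
        ET.SubElement(treatment, tp("treatment-sec"), {"sec-type": sec_type}).text = text
    return treatment


treatment_xml = ET.tostring(
    build_treatment("10.5281/zenodo.6964440", "Tribasodites yatung", {"description": "..."}),
    encoding="unicode",
)
print(treatment_xml)
```

Checking section types against a fixed set at generation time keeps free-text section titles out of sec-type, which is what makes the subsections machine-extractable later.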
Examples
Annotation of treatments, nomenclature section and subsections:
Treatment:
<tp:treatment-meta>
<mixed-citation>
<object-id content-type="doi">10.5281/zenodo.6964440</object-id>
</mixed-citation>
</tp:treatment-meta>
Nomenclature section:
<tp:nomenclature>
<tp:taxon-name>Tribasodites yatung</tp:taxon-name>
</tp:nomenclature>
Subsection:
<tp:treatment-sec sec-type="description"> ... </tp:treatment-sec>
Recommendation
Tag each taxonomic treatment in the article full-text XML and then assign it a Crossref component DOI, a DataCite DOI or an internal UUID. Register all the metadata associated with the DOI.
Treatment citations
Definition
A treatment citation is a reference to a previous treatment, in many cases the original
description of the taxon, or protologue (Fig. 5). Treatment citations reflect the history of the
taxon and its nomenclatural relationships with other taxon concepts, either by indicating a
change proposed in the treatment, e.g. a new synonymy or a new combination, or by
reconfirming previous changes. They also refer to treatments that contributed new
research results to an existing taxon. Treatment citations can thus be grouped into several categories, e.g. by type of nomenclatural change (“syn. n.”, “comb. n.”, etc.) or as confirmations of a previous taxon name status, and these categories allow formal annotation during text mining and further re-use.
Treatment citations are the source and basis for creating synonymic lists and taxonomic
catalogues.
Treatment citations are analogous to bibliographic references in a publication citing
previous works.
Figure 5.
Treatment citation. Green boxes: treatment citations. A: published example. B: annotated treatment citations in TreatmentBank using TB internal XML.
Source: https://europeanjournaloftaxonomy.eu/index.php/ejt/article/view/787/1829
TreatmentBank XML: https://treatment.plazi.org/GgServer/xml/101687E3D558FFC3FDF9A837FECED008
What are the identifiers for treatment citations
No identifiers are known for treatment citations; however, citations can and should be tagged in the backend XML of the article so that they become discoverable and can be processed for further use.
How to discover treatment citations
Treatment citations are listed after the nomenclature section of a taxonomic treatment. They usually consist of a taxonomic name, the authority and year, and, especially in zoology, a page number. Together, the authority and year also form a bibliographic citation of the original publication of the respective treatment, albeit often an implicit one, because taxonomists traditionally do not include such bibliographic references in the article reference list (Bénichou et al. 2018). Including them has now been recommended by CETAF, BHL and SPNHC (Benichou et al. 2022) and is strongly encouraged by Pensoft’s journals. When there are multiple citations for the same taxonomic name, a further element (the treatment citation list) is used so that the taxonomic name need not be repeated in each citation.
How to mint an identifier for treatment citation
There is no established procedure for minting identifiers for treatment citations, other than possibly assigning internal UUIDs to them.
How to annotate and cite a treatment citation
Treatment citation annotations are attributed with the persistent HTTP URIs of the respective treatment(s) in TreatmentBank. The treatment citation element is currently being remodelled, so these recommendations might change in the next version of TaxPub.
Annotate in JATS/TaxPub:
<tp:nomenclature>
<tp:taxon-name>
<tp:taxon-name-part taxon-name-part-type="genus">Cus</tp:taxon-name-part>
<tp:taxon-name-part taxon-name-part-type="species">dus</tp:taxon-name-part>
</tp:taxon-name>
<tp:nomenclature-citation-list>
<tp:nomenclature-citation>
<tp:taxon-name>Aus bus</tp:taxon-name>
<mixed-citation>
<object-id content-type="taxonomic_treatment" object-id-type="doi">
[DOI]
</object-id>
</mixed-citation>
</tp:nomenclature-citation>
</tp:nomenclature-citation-list>
</tp:nomenclature>
Example
Annotation of treatment citation in the treatment of Chondrocyclus convexiusculus (Pfeiffer,
1855) in Cole (2019):
<tp:nomenclature-citation-list>
<tp:nomenclature-citation>
<tp:taxon-name>Cyclostoma (Cyclophorus) convexiusculum</tp:taxon-name>
Pfeiffer, 1855: 104
(Type loc.: Simonstown [Macgillivray]).
</tp:nomenclature-citation>
<tp:nomenclature-citation>
<tp:taxon-name>Cyclophorus convexiusculus var. minor</tp:taxon-name>
Benson, 1856: 438
(type loc.: Table Mountain [Layard]).
</tp:nomenclature-citation>
</tp:nomenclature-citation-list>
<tp:nomenclature-citation-list>
<tp:taxon-name>Chondrocyclus convexiusculus</tp:taxon-name>
<tp:nomenclature-citation>Kobelt 1902: 230</tp:nomenclature-citation>
<tp:nomenclature-citation>Connolly 1939:536</tp:nomenclature-citation>
<tp:nomenclature-citation>Herbert &amp; Kilburn 2004: 92</tp:nomenclature-citation>
</tp:nomenclature-citation-list>
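Because treatment citations are tagged as separate elements, they can be harvested mechanically. The following Python sketch reads both forms of <tp:nomenclature-citation-list> shown above: citations that carry their own <tp:taxon-name>, and lists in which a single <tp:taxon-name> precedes several citations. The example fragments declare no namespace, so the sketch wraps the input in a root element that declares the tp prefix; the namespace URI and the function name citation_pairs are assumptions. The input must be well-formed XML, so an ampersand in an authority has to be written as &amp;.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed TaxPub namespace URI
TP = "{" + TP_NS + "}"


def _normalise(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def citation_pairs(fragment: str) -> list:
    """Return (taxon name, citation text) pairs from nomenclature-citation lists."""
    root = ET.fromstring(f'<root xmlns:tp="{TP_NS}">{fragment}</root>')
    pairs = []
    for cit_list in root.iter(TP + "nomenclature-citation-list"):
        shared = cit_list.find(TP + "taxon-name")  # name given once for the whole list
        for cit in cit_list.findall(TP + "nomenclature-citation"):
            own = cit.find(TP + "taxon-name")  # name given inside this citation
            name_el = own if own is not None else shared
            # Citation text = everything in the citation except its own taxon name.
            parts = [cit.text or ""]
            for child in cit:
                if child is not own:
                    parts.append("".join(child.itertext()))
                parts.append(child.tail or "")
            name = _normalise("".join(name_el.itertext())) if name_el is not None else ""
            pairs.append((name, _normalise("".join(parts))))
    return pairs
```

Applied to the Chondrocyclus example, each citation comes out paired with the name it refers to, which is the raw material for a synonymic list.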
Recommendation
Treatments should be cited by their PIDs, either within a nomenclature-citation in the nomenclature section of the citing treatment or as a standalone in-text citation in any part of the article, as follows: “Based on Treatment: [hyperlinked treatment PID, where a treatment PID can be either the DOI of the treatment provided by BLR or Plazi’s HTTP identifier available from TreatmentBank] I conclude that ....”.
Treatment citations should be tagged in the article XML as separate entities and, if
available, should contain the existing PIDs of the cited treatments.
Recently, a joint statement of CETAF, SPNHC and BHL has been published (Benichou et al. 2022) recommending that taxon names be cited with richer bibliographic detail for each taxon concept. We provide here a shortened version of these recommendations:
1. Provide each scientific name of a taxon, at least on its first mention in the paper, with authorship and date, and with a corresponding entry in the publication’s “Bibliographic references” section.
2. If the publisher’s guidelines do not allow you to list it as a reference, cite it as precisely as possible in the text, for instance by adding the page number after the date. For example, for a species described in EJT (http://dx.doi.org/10.5852/ejt.2022.828.1851) on p. 48, it is preferable to use the notation Infrantenna fissilis Liu & Sittichaya, 2022: 48 instead of Infrantenna fissilis Liu & Sittichaya, 2022.
3. Provide the corresponding persistent identifier (PID) for each of these references where one exists, i.e. a Crossref DOI minted by the publisher, or by the Biodiversity Heritage Library (BHL) when the legacy publication has been digitised retrospectively and provided with a DOI, or a DataCite DOI minted by organisations digitising legacy literature (e.g. e-Periodica at the Federal Institute of Technology Zurich) or by the Biodiversity Literature Repository (BLR) at Zenodo.
4. Provide the PID of the taxonomic treatment where one exists, for instance the DOI of the treatment deposited in BLR or, for articles with primary taxonomic descriptions, the identifier minted by BHL (for example: https://www.biodiversitylibrary.org/part/304567).
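Recommendation 2 amounts to a simple string pattern: name, authority, year and, when known, the page. A small hypothetical Python helper, shown only to make the notation explicit:

```python
def name_citation(name: str, authority: str, year: int, page=None) -> str:
    """Format 'Name Authority, Year: page'; the page is appended only when known."""
    citation = f"{name} {authority}, {year}"
    return f"{citation}: {page}" if page is not None else citation


print(name_citation("Infrantenna fissilis", "Liu & Sittichaya", 2022, 48))
```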
Material citations
Definition
A material citation is a reference to, or citation of, one or multiple specimens in scholarly
publications (https://dwc.tdwg.org/terms/#materialcitation; Chester et al. 2019). Material
citations can be situated within the respective treatments, in tables, or as supplementary
material, and refer to the specimen data used in the study. They provide expert-curated identifications of specimens in collections, in many cases including explicit links to the institution, specimen, gene sequences and geographic data. Often, for example when published through the GBIF infrastructure, they are the only digital evidence that a specimen exists.
GBIF occurrence records can create a rich linking network for specimens, because a GBIF specimen record can be linked to a material citation published in a scholarly article, or at least to the treatment or publication containing that record.
What are the identifiers for material citations
TreatmentBank and the Biodiversity Data Journal issue internal UUIDs for material citations. TreatmentBank's UUIDs are reused in GBIF together with the treatment UUID, in the form “[treatment UUID].mc.[material citation UUID]”. GBIF also mints an identifier for each material citation present as an occurrence record in its infrastructure. TreatmentBank (TB) maintains the links between the GBIF occurrences, with their identifiers, and the respective material citations in TreatmentBank.
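The composite identifier is a plain string concatenation, so it can be built and split again without a lookup service. A minimal Python sketch, assuming the separator is exactly ".mc." as in the TreatmentBank identifier example in this section; the function names are hypothetical.

```python
SEPARATOR = ".mc."


def gbif_material_citation_id(treatment_uuid: str, mc_uuid: str) -> str:
    """Compose '[treatment UUID].mc.[material citation UUID]'."""
    return f"{treatment_uuid}{SEPARATOR}{mc_uuid}"


def split_material_citation_id(identifier: str) -> tuple:
    """Split a composite identifier back into (treatment UUID, material citation UUID)."""
    treatment_uuid, sep, mc_uuid = identifier.partition(SEPARATOR)
    if not sep or not treatment_uuid or not mc_uuid:
        raise ValueError(f"not a composite material citation identifier: {identifier}")
    return treatment_uuid, mc_uuid


mc_id = gbif_material_citation_id("03A10B47FFDFFFAFFDE0FA60FB18F865",
                                  "3B60B00CFFD1FFAFFF38FED7FD4AFE22")
print(mc_id)
```

Splitting recovers the treatment UUID, which is enough to navigate from a GBIF occurrence back to the treatment in TreatmentBank.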
How to discover material citations' identifiers
These identifiers are currently minted post-publication by TreatmentBank, or before publication by the Biodiversity Data Journal, and can be found using the TreatmentBank data access interface (https://tb.plazi.org/GgServer/srsStats), which can also provide access to the related GBIF occurrence IDs.
Via the TreatmentBank statistics UI, which offers a wide variety of search and output fields; for example, a query can return taxon name, author, year, and type status.
Via the TreatmentBank statistics API, using the search and output fields chosen in the UI; for example, a query can return taxon name, author, year, and type status as JSON. A simplified API is under development.
Via GBIF API (occurrence search for taxon name, restricted to material citations):
http://api.gbif.org/v1/occurrence/search?basisOfRecord=MATERIAL_CITATION&scientificName=Lebertia+insignis
Other search fields, e.g. country, are also available; further matching might be required to find additional matches from specific source publications in GBIF.
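The GBIF query above can be generated and its response filtered in a few lines. This Python sketch only builds the URL and reads an already-parsed JSON response; it assumes the usual GBIF paging response, in which matching records are listed under "results" and may carry "key" and "occurrenceID" fields. Fetching (e.g. with urllib.request) is left out so the example runs offline, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "http://api.gbif.org/v1/occurrence/search"


def material_citation_query(scientific_name: str, **extra) -> str:
    """Occurrence search restricted to material citations; extra filters (e.g. country) are optional."""
    params = {"basisOfRecord": "MATERIAL_CITATION", "scientificName": scientific_name}
    params.update(extra)
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


def occurrence_ids(response: dict) -> list:
    """Collect (GBIF key, occurrenceID) pairs from a parsed search response (assumed fields)."""
    return [(rec.get("key"), rec.get("occurrenceID")) for rec in response.get("results", [])]


print(material_citation_query("Lebertia insignis"))
```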
How to mint an identifier for material citation
Follow your standard procedure for minting UUIDs.
How to annotate and cite material citation
Annotate in JATS/TaxPub:
Use “object-id” to provide an identifier for a material citation in the article which allows it to
be cited unambiguously.
<tp:material-citation>
<object-id content-type="[content type]">[Identifier]</object-id>
[material citation string]
</tp:material-citation>
To provide an external identifier for a component of a material citation (e.g. a catalog number or occurrence ID), use <named-content>, specifying the type of identifier in the content-type attribute.
<named-content content-type="[content type]">[Identifier]</named-content>
The <uri> element may be used to tag an identifier that is a URI and provide a live link to
the representation of the identified resource:
<named-content content-type="[content type]">
<uri xlink:href="[URI]">[URI]</uri>
</named-content>
Examples
A material citation from Monomorium dryhimi that can be unambiguously cited (Sharaf and
Aldawood 2011).
<tp:material-citation>
<object-id content-type="arpha">B5596AA1-CDF9-DDA3-D5CD-D922E1723751</object-id>
Holotype worker. SAUDI ARABIA, Al Bahah province, Amadan forest, Al Mandaq
governorate, 20°12'N, 41°13'E, 1881 m.a.s.l. 19.V.2010 (M. R. Sharaf &amp;
A. S. Aldawood Leg. KSMA
</tp:material-citation>
A material citation citing a specimen from the MNHN Paris (Paton et al. 2016)
<tp:material-citation>
<named-content content-type="dwc:occurrenceID">
<uri xlink:href="http://coldb.mnhn.fr/catalognumber/mnhn/p/p04158076">
http://coldb.mnhn.fr/catalognumber/mnhn/p/p04158076
</uri>
</named-content>
</tp:material-citation>
Finer grained markup
<tp:material-citation>
<object-id content-type="arpha">B5596AA1-CDF9-DDA3-D5CD-D922E1723751</object-id>
<tp:type-status>Holotype</tp:type-status> worker.
<tp:material-location>King Saud Museum of Arthropods (KSMA),
College of Food and Agriculture Sciences, King Saud University, Riyadh,
Kingdom of Saudi Arabia.
</tp:material-location>
<tp:collecting-event>
<tp:collecting-location>
<tp:location>
SAUDI ARABIA, Al Bahah province, Amadan forest, Al Mandaq governorate
</tp:location>
</tp:collecting-location>,
<named-content content-type="dwc:verbatimCoordinates">
20°12'N, 41°13'E
</named-content>, 1881 m.a.s.l. 19.V.2010 (
<named-content content-type="dwc:recordedBy">
M. R. Sharaf
</named-content> &amp;
<named-content content-type="dwc:recordedBy">
A. S. Aldawood
</named-content> Leg.);
</tp:collecting-event>
</tp:material-citation>
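Once components are tagged with <named-content content-type="dwc:…">, they can be harvested into Darwin Core style records. A Python sketch using ElementTree follows. It assumes the fragment is wrapped in a root element declaring the tp and xlink prefixes (the TaxPub namespace URI is an assumption; the XLink URI is the standard one) and that repeated terms such as dwc:recordedBy are joined with " | ", the Darwin Core convention for lists. The function name dwc_terms is hypothetical.

```python
import xml.etree.ElementTree as ET

# Wrapper declaring the prefixes used in the fragments (TaxPub URI assumed).
OPEN = ('<root xmlns:tp="http://www.plazi.org/taxpub" '
        'xmlns:xlink="http://www.w3.org/1999/xlink">')


def dwc_terms(fragment: str) -> dict:
    """Map each dwc:* content-type in a material citation to its whitespace-normalised text.

    Repeated terms (e.g. several dwc:recordedBy) are joined with " | ".
    """
    root = ET.fromstring(OPEN + fragment + "</root>")
    terms = {}
    for el in root.iter("named-content"):
        content_type = el.get("content-type", "")
        if not content_type.startswith("dwc:"):
            continue
        term = content_type[len("dwc:"):]
        value = " ".join("".join(el.itertext()).split())
        terms[term] = f"{terms[term]} | {value}" if term in terms else value
    return terms
```

Applied to the finer-grained markup above, this yields verbatimCoordinates and a recordedBy value listing both collectors; applied to the MNHN example, it yields the occurrenceID URI.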
Apart from GBIF, which issues an occurrence ID for each material citation, and Pensoft’s Biodiversity Data Journal, no publishers use identifiers for material citations so far. For EJT and the journals of the MNHN Paris, Plazi adds the material citation identifiers after extracting the data from the published papers.
In annotations of legacy publications, material citations are given a unique UUID in TreatmentBank. These UUIDs are resolvable via Plazi SRS and are included in the Darwin Core Archive submitted by TreatmentBank to GBIF, where they are reused in
combination with the parent taxonomic treatment UUID as identifiers for the published
material citation.
The TreatmentBank UUID for the material citation is reused in GBIF paired with the treatment UUID, in the form [treatment UUID].mc.[material citation UUID]:
Identifier = 03A10B47FFDFFFAFFDE0FA60FB18F865.mc.3B60B00CFFD1FFAFFF38FED7FD4AFE22
In the Biodiversity Data Journal, material citations are exported to a Darwin Core Archive and indexed by GBIF automatically on the date of publication. The internal material citation UUID is minted and entered in the Darwin Core “occurrenceID” field. If “occurrenceID” is already occupied by an original ID supplied by the author, that ID should be moved to the Darwin Core “associatedOccurrences” field, and “occurrenceID” is then used for the internal material citation ID provided by the journal.
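The occurrenceID rule just described can be expressed as a small function over a Darwin Core record held as a Python dict. This is a sketch of the described behaviour, not Pensoft's implementation; appending to an existing associatedOccurrences value with " | " follows the Darwin Core convention for lists and is an assumption, as is the function name.

```python
def assign_internal_occurrence_id(record: dict, internal_id: str) -> dict:
    """Return a copy of a Darwin Core record whose occurrenceID is the journal's internal ID.

    An author-supplied occurrenceID is not discarded but moved to associatedOccurrences.
    """
    updated = dict(record)
    original = updated.get("occurrenceID")
    if original and original != internal_id:
        existing = updated.get("associatedOccurrences")
        updated["associatedOccurrences"] = f"{existing} | {original}" if existing else original
    updated["occurrenceID"] = internal_id
    return updated
```

Returning a copy leaves the author's original record untouched, so both versions can be compared or logged.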
Example from Biodiversity Data Journal:
<tp:treatment-sec sec-type="materials">
<list list-type="alpha-lower" list-content="occurrences">
<list-item>
<p>
<bold>Type status:</bold>
<named-content content-type="dwc:typeStatus">Holotype</named-content>.
<bold>Occurrence:</bold>
catalogNumber:
<named-content content-type="dwc:catalogNumber">
<ext-link xlink:href="[url]">IFRD9449</ext-link>
</named-content>;
recordedBy:
<named-content content-type="dwc:recordedBy">Liu Yu-Wei</named-content>;
occurrenceID:
<named-content content-type="dwc:occurrenceID">UUID</named-content>;
associatedOccurrences:
<named-content content-type="dwc:associatedOccurrences">
living culture IFRDCC3104
</named-content>
</p>
</list-item>
</list>
</tp:treatment-sec>
Recommendation
Publishers should separate the material citations within an article with an unambiguous separator, such as the Unicode character U+2022 “•”, and identify each of them with a UUID in the backend article JATS XML. When material citations represent a holotype or other type specimens, this type status, the collecting event and the collection should be tagged unambiguously in the backend XML to facilitate harvesting and reuse.
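With an unambiguous separator, splitting a run of material citations and identifying each one becomes a mechanical step. A minimal Python sketch, assuming the citations of one paragraph are separated by U+2022 and that random UUIDs are acceptable as internal identifiers; the function name is hypothetical.

```python
import uuid

BULLET = "\u2022"  # U+2022, the separator recommended above


def identify_material_citations(paragraph: str) -> list:
    """Split a paragraph on U+2022 and pair each non-empty citation with a fresh UUID."""
    citations = (part.strip() for part in paragraph.split(BULLET))
    return [(str(uuid.uuid4()), text) for text in citations if text]


for mc_uuid, text in identify_material_citations(
        "[material citation 1] \u2022 [material citation 2]"):
    print(mc_uuid, text)
```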
Taxonomic names
Definition
A taxonomic name, or more generally a scientific name, is the formal name, that is, the scientific identity, given to a taxon (e.g. a species) following the rules of nomenclature; it is used widely beyond taxonomy to link data to a particular taxon. Although the concept of scientific names, along with rules on the interrelationships of taxa, goes back to Aristotle (c. 350 BC; see Voutsiadou et al. 2017), binomial names were introduced by Linnaeus in 1753 and have since served as precursors of today’s persistent identifiers. Taxonomic names play different roles, which can be inferred from their position in a publication; in other words, the context of their use defines their role. A taxonomic name in the treatment’s nomenclature section is the nominate taxonomic name of that treatment. A taxonomic name used in a treatment citation of an existing treatment relates that earlier treatment to the nominate treatment and represents its taxonomic history; it can also be accompanied by a label indicating a nomenclatural change such as a synonymy or a new combination. These can be nomenclatural acts or subjective synonyms. Any mention of a taxon name in any other section of the article is regarded as a Taxon Name Usage (TNU).
Identifiers for descriptions of new taxa and other types of nomenclatural acts, and their online registration, are increasingly used, and the process is regulated by the zoological (ICZN) and botanical (ICN) codes (Ride et al. 2012, Turland et al. 2018). Currently, registration of nomenclatural acts other than descriptions of new taxa, as part of valid publication, whether electronic or print, is mandatory only in mycology, including palaeomycology. In other disciplines, registration of identifiers is mandatory only for descriptions of new taxa and not yet for other nomenclatural acts, although this is planned (Barkworth et al. 2016a, Barkworth et al. 2016b).
The Catalogue of Life (COL) consortium, in collaboration with the Global Biodiversity Information Facility (GBIF), aims to provide a global list of accepted names (Garnett et al. 2020, Hobern et al. 2021) by combining automated and manual integration of existing checklists, including large-scale checklists such as WoRMS as well as checklists originating from individual taxonomic publications submitted to GBIF. At the moment, COL provides persistent identifiers for taxonomic names but not for taxon concepts. WoRMS provides persistent identifiers for each available name, including higher taxa, in its infrastructure, Aphia, which uses Life Science Identifiers (LSIDs) as unique and stable identifiers. TreatmentBank provides a persistent identifier for each available name annotated in the nomenclature section of a taxonomic treatment in legacy literature, both for new taxa and for re-descriptions. Taxon concept identifiers are planned as part of ChecklistBank, a repository and index for taxonomic data. The taxonomic name for a taxon, which
can include a large number of taxonomic name usages (e.g. synonyms), is separated from
their role in nomenclature (Hobern et al. 2021) and in a subsequent section in the
treatment after the nomenclature section.
The National Center for Biotechnology Information (NCBI) taxonomy database holds unique identifiers (taxIDs) for taxonomic names for which sequence data are available at the INSDC (Schoch et al. 2020). All records at INSDC have their taxonomic information linked to NCBI taxIDs. This database, however, does not comprise a complete list of taxonomic names. The BOLD taxonomy browser also contains entries for taxonomic names, with associated identifiers. ChecklistBank allows these identifiers to be mapped to the entries in COL.
What are the identifiers for new names or nomenclatural acts
Fungi
Pre-publication registration of identifiers for names, typifications and other nomenclatural acts has been mandatory for fungi since 1 January 2013. The identifiers must be cited in the protologue or in the publication of the nomenclatural change.
Living vascular plants: IPNI (International Plant Names Index)
In botany, the registration of nomenclatural acts was accepted at the XIX International
Botanical Congress in Shenzhen 2017 (Turland et al. 2018).
Post-publication indexing is a well-established practice of IPNI, which covers seed plants, ferns and lycophytes, but not bryophytes or algae. IPNI is produced collaboratively by the Royal Botanic Gardens, Kew, the Harvard University Herbaria, and the Australian National Herbarium and is hosted by the Royal Botanic Gardens, Kew. Pre-publication
indexing and inclusion of IPNI record identifiers in the publication was first implemented by
the Pensoft journal PhytoKeys (Penev et al. 2016), and later on by EJT. IPNI provides
nomenclatural information (spelling, author, types and first place and date of publication)
for the scientific names of non-fossil vascular plants from family down to infraspecific
ranks, including an index of authors for all the groups under the International Code of
Nomenclature for algae, fungi, and plants (ICNafp).
Algae
PhycoBank is the registration system for nomenclatural acts (new names, new
combinations and types) of algae (Kusber et al. 2019). However, the registered identifiers
are not required to be listed in the original publication.
Fossil plants (except for fossil fungi and diatoms)
Pre-publication indexing is established in the Plant Fossil Names Registry (PFNR) and the International Fossil Plant Names Index (IFPNI). Registration of taxa is not mandatory.
Bryophytes
IDs for new bryophyte names can be obtained from the Index of Mosses Database
(W³MOST).
Animals
ZooBank provides registration of new nomenclatural acts, published works, and authors. It
is an authoritative online, open-access, community-generated registry for zoological
nomenclature provided as a service to taxonomists, biologists, and the global biodiversity
informatics community. It is also the official register of the International Commission on
Zoological Nomenclature (ICZN).
The registration of type specimens is possible in ZooBank but not yet fully implemented. Registration has been mandatory for electronic publications containing new nomenclatural acts since 1 January 2012. Each electronic publication receives an identifier (LSID) minted by ZooBank.
Identifiers for taxa in Catalogue of Life, NCBI taxonomy, and TreatmentBank
The Catalogue of Life and the NCBI taxonomy are two widely used reference taxonomies.
Both issue taxon name IDs. For references in articles, authors can use hyperlinked taxon
IDs of either COL or NCBI just as they use sequence accession numbers.
TreatmentBank mints persistent identifiers for taxonomic names as part of the annotation
and FAIRizing of treatments in legacy literature. They are a combination of the treatment
UUID extended with “.taxon”.
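The TreatmentBank taxon-name identifier is the treatment UUID with ".taxon" appended, and COL and NCBI taxon IDs are usually turned into hyperlinks when cited. The sketch below composes all three. The two URL patterns (https://www.catalogueoflife.org/data/taxon/{id} and the NCBI Taxonomy Browser's wwwtax.cgi?id={taxid}) are our assumptions about the current web interfaces and should be checked before use; the function names are hypothetical.

```python
def treatmentbank_taxon_id(treatment_uuid: str) -> str:
    """TreatmentBank taxon-name identifier: the treatment UUID extended with '.taxon'."""
    return f"{treatment_uuid}.taxon"


def col_taxon_link(col_id: str) -> str:
    # Assumed URL pattern of the Catalogue of Life web portal.
    return f"https://www.catalogueoflife.org/data/taxon/{col_id}"


def ncbi_taxon_link(taxid: int) -> str:
    # Assumed URL pattern of the NCBI Taxonomy Browser.
    return f"https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={taxid}"
```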
How to discover identifiers of names
The following web sites provide the search facility for discovering the identifiers of names.
Fungi
https://www.mycobank.org
Living vascular plants
International Plant Names Index (IPNI)
Algae
https://www.phycobank.org
Fossil plants (except fossil fungi and diatoms)
IFPNI (International Fossil Plant Names Index)
PFNR (Plant Fossil Names Registry)
Animals
ZooBank (zoobank.org)
Identifiers of nomenclatural acts can also be found through other services, for example the
World Register of Marine Species www.marinespecies.org.
Catalogue of Life
https://www.checklistbank.org/tools/name-match
Via Catalogue of Life UI (advanced name search):
https://www.catalogueoflife.org/data/search?q=Pardosa+zyuzini
Via Catalogue of Life API (name usage search):
https://api.catalogueoflife.org/dataset/3LR/nameusage/search?q=Pardosa+zyuzini
Via GBIF UI (species search):
https://www.gbif.org/species/search?q=Pardosa+zyuzini (results include list of other
identifiers)
Via GBIF API (species search):
https://api.gbif.org/v1/species?name=Pardosa+zyuzini (requires further matching to pick
desired identifiers, e.g. ZooBank UUID of name)
NCBI taxonomy
https://www.ncbi.nlm.nih.gov/taxonomy
TreatmentBank
Identifiers for taxonomic names can be found using TreatmentBank's statistics interface: https://tb.plazi.org/GgServer/srsStats
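The API calls above return candidate lists that, as noted, require further matching. A Python sketch that builds the COL name usage search URL exactly as shown and keeps only exact matches from a parsed GBIF species response. The GBIF field names it reads ("results", "canonicalName", "key", "datasetKey") are assumptions about the JSON layout to be checked against the API documentation; no request is sent, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def col_search_url(name: str, dataset: str = "3LR") -> str:
    """Catalogue of Life name usage search; '3LR' is the dataset key used in the example URL."""
    return (f"https://api.catalogueoflife.org/dataset/{dataset}/nameusage/search?"
            + urlencode({"q": name}))


def exact_gbif_matches(response: dict, name: str) -> list:
    """From a parsed GBIF /v1/species response, keep usages whose canonicalName equals name.

    Each returned pair is (GBIF usage key, source dataset key).
    """
    return [
        (rec.get("key"), rec.get("datasetKey"))
        for rec in response.get("results", [])
        if rec.get("canonicalName") == name
    ]
```

Exact matching on the canonical name is deliberately strict; homonyms or spelling variants still need a human decision.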
How to mint an identifier for new names
Fungi
MycoBank is an online database that serves the mycological and wider scientific community by documenting mycological nomenclatural novelties, that is, new names and combinations, and associated data such as descriptions and illustrations.
Index Fungorum, the global fungal nomenclator coordinated and supported by the Index
Fungorum Partnership, contains names of fungi including yeasts, lichens, chromistan
fungal analogues, protozoan fungal analogues and fossil forms, at all ranks. As a result of
changes to the ICN relating to registration of names, Index Fungorum provides a
mechanism to register names of new taxa, new names, new combinations and new
typifications.
Authors of novel fungal taxa must register each new name in only one registry, i.e. in MycoBank, Index Fungorum or Fungal Names. These registries regularly coordinate
sharing of data and have arranged an informal agreement to only accept the first listed
name in case it appears in more than one registry. Registration of the same new name in
multiple registries is considered an inappropriate practice that creates a considerable
amount of confusion and extra work for the registries and necessitates the deprecation of
the duplicated registrations at a later stage.
Living vascular plants
IPNI (International Plant Names Index) uses LSIDs as unique identifiers for plant names and provides a mechanism to register those LSIDs. IPNI records LSIDs for names of new taxa, new combinations and replacement names for living vascular plants. LSIDs are not mandatory for valid publication of a plant name. However, if an IPNI LSID is needed, it can be pre-registered on the IPNI website. For new taxa, the holotype data can also be provided. The new plant name will be provided with an LSID that will be activated once the article is published. Note that IPNI can only provide LSIDs for “vascular plants”, i.e. extant ferns, lycophytes and seed-bearing plants. Thus, IPNI will not give LSIDs for fungi, bryophytes (mosses), macroalgae (Rhodophyceae etc.), diatoms, or any fossil vascular plant.
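IPNI, ZooBank and IFPNI all issue LSIDs, which share the syntax urn:lsid:<authority>:<namespace>:<object>[:<revision>]. A Python sketch that splits an LSID into these parts, which can help check identifiers before they are written into the article XML. It performs no resolution, and the names LSID and parse_lsid are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(value: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' (prefix matched case-insensitively)."""
    parts = value.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {value}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:ipni.org:names:77302868-1"))
```

For the IPNI example the object part is "77302868-1"; the "-1" belongs to the IPNI record identifier rather than being an LSID revision, because it is not separated by a colon.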
Algae
PhycoBank is the registration system for nomenclatural acts such as new names, new
combinations and types of algae (Kusber et al. 2019). However, it is not required as a part
of valid publication. PhycoBank provides a user interface for curatorial and voluntary data
entry. Each nomenclatural act according to the provisions of ICN Art. 7 is identified by a
stable http identifier that links directly into the PhycoBank portal. The identifier is generated
automatically when a reference is linked to a scientific name. Preparation of a record can
be done while the manuscript is in the review process. If the preparation is not public, a
registration identifier in a manuscript will return the status ‘in preparation’. Curation can be
done once the publication is finalised and reference details like page numbers and volume
are available. The registration can be published on PhycoBank once the scientific paper is
published.
Fossil plants (except fossil fungi and diatoms)
PFNR (Plant Fossil Names Registry) is a database of primarily new names, but also previously published names, of plant fossils and associated nomenclatural acts, excluding fossil diatoms and fossil fungi. It is run by the National Museum, Prague, for the International Organisation of Palaeobotany. An LSID links the name to its original publication. The registration of a new nomenclatural act results in a registration number that is added to the manuscript. This part is not public and, if necessary, all data can be changed during manuscript processing. These data are available only to the account owner who registered the manuscript and to the editors of the database. When the paper is published, the
missing data should be added and completed. A more detailed guide for name and
typification registration is available.
IFPNI (International Fossil Plant Names Index) is a comprehensive literature-based record of
the scientific names of all fossil plants, algae, fungi, allied prokaryotic forms, protists
(ambiregnal taxa) and microproblematica. IFPNI provides an authoritative online, open-
access, community-sourced registry of fossil plant nomenclature as a service to the global
scientific community. A dynamic database documents all nomenclatural novelties including
new scientific names of extinct organisms and associated data, including registration of the
scientific publications containing nomenclatural acts and author-generated taxonomic
literature in palaeobotany and palaeontology. IFPNI issues LSIDs for each kind of data
object to locate biologically significant data over a network. LSIDs are designed to be
automatically machine resolvable.
Animals
To obtain an LSID for a new publication or a new name, the article has to be pre-registered in ZooBank by filling in a form with all the metadata: type of publication (article or monograph in a series), date of publication, authors, full title, ISSN of the journal, DOI of the article, volume, number, pages, and online archive (Penev et al. 2016). Tutorials on registering a publication, a new name, an existing record, etc. are available on the ZooBank website.
How to annotate and cite a taxon name
In JATS/TaxPub
<object-id object-id-type="Taxon name service">[taxonomic name identifier]</object-id>
Examples of annotations
Fungi
The new fungal species Neopestalotiopsis rhapidis Qi Yang & Yong Wang bis, sp. nov., published in Biodiversity Data Journal (Yang et al. 2021), has the identifier “MycoBank 840065”, which resolves to the MycoBank record for this new name, available after logging in to MycoBank.
In the article JATS XML, this record is annotated as:
<tp:taxon-treatment>
<tp:nomenclature>
<tp:taxon-name>
<object-id content-type="arpha">1EC88425-ADBF-5527-B845-9EF7658D0BA9</object-id>
<tp:taxon-name-part taxon-name-part-type="genus">
Neopestalotiopsis
</tp:taxon-name-part>
<tp:taxon-name-part taxon-name-part-type="species">rhapidis</tp:taxon-name-part>
<object-id object-id-type="MycoBank">840065</object-id>
</tp:taxon-name>
<tp:taxon-authority>Qi Yang &amp; Yong Wang bis</tp:taxon-authority>
<tp:taxon-status>sp. nov.</tp:taxon-status>
</tp:nomenclature>
</tp:taxon-treatment>
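The annotated record can be read back with a namespace-aware XML parser. The sketch below assumes the TaxPub namespace URI http://www.plazi.org/taxpub. That URI is declared inside the snippet itself, so the code runs regardless of what a given article declares. The sketch reassembles the name, the authority and the identifiers from the example above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # TaxPub namespace URI (assumed)

XML = f"""<tp:taxon-treatment xmlns:tp="{TP}">
  <tp:nomenclature>
    <tp:taxon-name>
      <object-id content-type="arpha">1EC88425-ADBF-5527-B845-9EF7658D0BA9</object-id>
      <tp:taxon-name-part taxon-name-part-type="genus">
        Neopestalotiopsis
      </tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species">rhapidis</tp:taxon-name-part>
      <object-id object-id-type="MycoBank">840065</object-id>
    </tp:taxon-name>
    <tp:taxon-authority>Qi Yang &amp; Yong Wang bis</tp:taxon-authority>
    <tp:taxon-status>sp. nov.</tp:taxon-status>
  </tp:nomenclature>
</tp:taxon-treatment>"""


def name_and_ids(xml_text):
    """Return (name, authority, {id type: value}) for the first taxon name."""
    root = ET.fromstring(xml_text)
    name_el = root.find(f".//{{{TP}}}taxon-name")
    parts = [(p.text or "").strip() for p in name_el.findall(f"{{{TP}}}taxon-name-part")]
    # The examples use either object-id-type or content-type to label the source.
    ids = {oid.get("object-id-type") or oid.get("content-type"): (oid.text or "").strip()
           for oid in name_el.findall("object-id")}
    authority = root.findtext(f".//{{{TP}}}taxon-authority", default="").strip()
    return " ".join(parts), authority, ids
```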
Living vascular plants
The new plant species Ardisia whitmorei Julius & Utteridge, sp. nov., published in
PhytoKeys (Julius and Utteridge 2022), bears the IPNI ID urn:lsid:ipni.org:names:77302868-1
in the protologue. The IPNI ID is linked directly to the IPNI record.
In the article JATS XML, this record is annotated as:
<tp:taxon-treatment>
  <tp:nomenclature>
    <tp:taxon-name>
      <object-id content-type="arpha">7A38F1AF-338B-5EAE-BE17-7FE4DD689BD9</object-id>
      <object-id content-type="ipni">urn:lsid:ipni.org:names:77302868-1</object-id>
      <tp:taxon-name-part taxon-name-part-type="genus" reg="Ardisia">
        Ardisia
      </tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species" reg="whitmorei">
        whitmorei
      </tp:taxon-name-part>
    </tp:taxon-name>
  </tp:nomenclature>
</tp:taxon-treatment>
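Because the IPNI LSID is carried in an object-id element, a link to the IPNI web record can be derived from it. This is a minimal sketch. The URL pattern https://www.ipni.org/n/<id> is an assumption about the current IPNI website, the TaxPub namespace URI is assumed as before, and the helper name is hypothetical. The normalised reg attribute is preferred over the element text when assembling the name.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"          # TaxPub namespace URI (assumed)
IPNI_PREFIX = "urn:lsid:ipni.org:names:"
IPNI_NAME_URL = "https://www.ipni.org/n/"   # assumed URL pattern for IPNI name records

XML = f"""<tp:taxon-treatment xmlns:tp="{TP}">
  <tp:nomenclature>
    <tp:taxon-name>
      <object-id content-type="arpha">7A38F1AF-338B-5EAE-BE17-7FE4DD689BD9</object-id>
      <object-id content-type="ipni">urn:lsid:ipni.org:names:77302868-1</object-id>
      <tp:taxon-name-part taxon-name-part-type="genus" reg="Ardisia">Ardisia</tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species" reg="whitmorei">whitmorei</tp:taxon-name-part>
    </tp:taxon-name>
  </tp:nomenclature>
</tp:taxon-treatment>"""


def ipni_link(xml_text):
    """Return (name, IPNI web link or None) for the first taxon name."""
    root = ET.fromstring(xml_text)
    name_el = root.find(f".//{{{TP}}}taxon-name")
    # Prefer the normalised 'reg' attribute; fall back to the element text.
    name = " ".join(p.get("reg", p.text or "").strip()
                    for p in name_el.findall(f"{{{TP}}}taxon-name-part"))
    for oid in name_el.findall("object-id"):
        value = (oid.text or "").strip()
        if oid.get("content-type") == "ipni" and value.startswith(IPNI_PREFIX):
            return name, IPNI_NAME_URL + value[len(IPNI_PREFIX):]
    return name, None
```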
Algae
The new alga Halamphora kenderoviana Zidarova, P.Ivanov, Dzhembekova, M.de Haan
& Van de Vijver, published in PhytoKeys (Zidarova et al. 2022), has the identifier "PhycoBank
103140", which resolves to the PhycoBank record for this name.
In the JATS XML of the article, this record is annotated as:
<tp:taxon-treatment>
  <tp:nomenclature>
    <tp:taxon-name>
      <object-id content-type="arpha">55756560-6844-5EE2-9343-68965B35BA9C</object-id>
      <named-content content-type="phycobank"
                     xlink:href="https://www.phycobank.org/103140">
        Phycobank 103140
      </named-content>
      <tp:taxon-name-part taxon-name-part-type="genus" reg="Halamphora">
        Halamphora
      </tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species" reg="kenderoviana">
        kenderoviana
      </tp:taxon-name-part>
    </tp:taxon-name>
    <tp:taxon-authority>
      Zidarova, P.Ivanov, Dzhembekova, M.de Haan &amp; Van de Vijver
    </tp:taxon-authority>
    <tp:taxon-status>sp. nov.</tp:taxon-status>
  </tp:nomenclature>
</tp:taxon-treatment>
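Here the PhycoBank link is held in the xlink:href attribute of a named-content element rather than in an object-id. A consumer therefore has to read the attribute in the XLink namespace explicitly. This is a sketch under the same TaxPub namespace assumption, with a trimmed copy of the example above. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"       # TaxPub namespace URI (assumed)
XLINK = "http://www.w3.org/1999/xlink"   # standard XLink namespace

XML = f"""<tp:taxon-name xmlns:tp="{TP}" xmlns:xlink="{XLINK}">
  <object-id content-type="arpha">55756560-6844-5EE2-9343-68965B35BA9C</object-id>
  <named-content content-type="phycobank"
                 xlink:href="https://www.phycobank.org/103140">Phycobank 103140</named-content>
  <tp:taxon-name-part taxon-name-part-type="genus" reg="Halamphora">Halamphora</tp:taxon-name-part>
  <tp:taxon-name-part taxon-name-part-type="species" reg="kenderoviana">kenderoviana</tp:taxon-name-part>
</tp:taxon-name>"""


def external_links(xml_text):
    """Map each <named-content> content-type to its xlink:href target."""
    root = ET.fromstring(xml_text)
    return {nc.get("content-type"): nc.get(f"{{{XLINK}}}href")
            for nc in root.iter("named-content")}


def phycobank_id(url):
    """Take the last path segment of a PhycoBank link as the registration number."""
    return url.rstrip("/").rsplit("/", 1)[-1]
```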
Animals
A new species of giant Eunice, Eunice dharastii, was published in Zanol and Hutchings (2022)
(https://doi.org/10.3897/zookeys.1118.86448). The name resolves in ZooBank
through the LSID urn:lsid:zoobank.org:act:63BC2367-9654-45DA-8021-FD17584DFFDC.
In the article, the JATS XML is annotated as follows:
<tp:nomenclature>
  <tp:taxon-name>
    <object-id content-type="arpha" object-id-type="UUID">
      BFC45050-4831-5B2D-9E7C-F99DC97D5DF1
    </object-id>
    <object-id content-type="zoobank" object-id-type="UUID"
               xlink:href="https://zoobank.org/63BC2367-9654-45DA-8021-FD17584DFFDC">
      63BC2367-9654-45DA-8021-FD17584DFFDC
    </object-id>
    <tp:taxon-name-part taxon-name-part-type="genus" reg="Eunice">
      Eunice
    </tp:taxon-name-part>
    <tp:taxon-name-part taxon-name-part-type="species" reg="dharastii">
      dharastii
    </tp:taxon-name-part>
  </tp:taxon-name>
  <tp:taxon-status>sp. nov.</tp:taxon-status>
</tp:nomenclature>
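The ZooBank UUID, the LSID and the web link in this example are three forms of the same identifier. The Python sketch below validates the UUID with the standard uuid module and derives the other two forms, using the LSID prefix and the https://zoobank.org/ link pattern shown above. The function name is hypothetical. A check of this kind is purely syntactic: it rejects values that are not UUIDs but cannot confirm that the act is actually registered.

```python
import uuid

ZOOBANK_ACT_LSID = "urn:lsid:zoobank.org:act:"   # prefix as in the example above
ZOOBANK_URL = "https://zoobank.org/"             # link pattern as in the example above


def zoobank_forms(raw: str) -> dict:
    """Derive the LSID and web link for a ZooBank nomenclatural-act UUID."""
    u = uuid.UUID(raw.strip())        # raises ValueError if not a valid UUID
    canonical = str(u).upper()        # upper case, matching the example above
    return {"uuid": canonical,
            "lsid": ZOOBANK_ACT_LSID + canonical,
            "url": ZOOBANK_URL + canonical}
```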
Expression of links to taxon names in JATS from the Catalogue of Life and NCBI Taxonomy
The link to Formica rufa in the Catalogue of Life is as follows:
<object-id content-type="COL">
https://www.catalogueoflife.org/data/taxon/6JGM9
</object-id>
The link to Formica rufa in NCBI Taxonomy is as follows:
<object-id content-type="NCBI">
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=258706
</object-id>
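When such links are harvested from articles, the bare taxon identifier may be needed, for example to compare names across sources. This is a minimal sketch using only urllib.parse. It assumes the two URL layouts shown above and raises an error for anything else. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def taxon_id_from_link(url: str):
    """Return (source, identifier) for a Catalogue of Life or NCBI Taxonomy link."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host.endswith("catalogueoflife.org"):
        # The COL taxon ID is the last path segment: .../data/taxon/<id>
        return "COL", parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if host.endswith("ncbi.nlm.nih.gov"):
        # The NCBI Taxonomy browser carries the taxon ID in the 'id' query parameter.
        ids = parse_qs(parsed.query).get("id")
        if ids:
            return "NCBI", ids[0]
    raise ValueError(f"unrecognised taxon link: {url!r}")
```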
Links to other taxon-specific catalogues can be added by using the corresponding
content-type value.
To link a taxonomic name anywhere in the text to the same name in the respective
treatment, give the tp:taxon-name element in the treatment an id attribute and wrap the
later mention in an <xref> element whose rid attribute points to that id.
<tp:taxon-treatment>
  <tp:nomenclature>
    <tp:taxon-name id="tn1">
      Formica rufa
      <object-id object-id-type="UUID">
        C41B87B7-C473-EE75-A7E8-8565FCE1A945.taxon</object-id>
    </tp:taxon-name>
  </tp:nomenclature>
  ...
</tp:taxon-treatment>
In a later section of the same article:
<sec>
  <title>Conclusions</title>
  <p>
    In this study,
    <xref rid="tn1">
      <tp:taxon-name>
        Formica rufa
        <object-id content-type="UUID">
          C41B87B7-C473-EE75-A7E8-8565FCE1A945.taxon</object-id>
      </tp:taxon-name>
    </xref>
    is different from Formica rufa sensu Collingwood, 1966.
  </p>
</sec>
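Because the link relies on the rid value matching an id elsewhere in the document, a simple check can flag cross-references whose target is missing. This is a minimal sketch that uses a trimmed copy of the example above and assumes the TaxPub namespace URI as before. The rid value is split on whitespace because JATS allows it to hold several IDs. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # TaxPub namespace URI (assumed)

XML = f"""<body xmlns:tp="{TP}">
  <tp:taxon-treatment>
    <tp:nomenclature>
      <tp:taxon-name id="tn1">Formica rufa</tp:taxon-name>
    </tp:nomenclature>
  </tp:taxon-treatment>
  <sec>
    <title>Conclusions</title>
    <p>In this study, <xref rid="tn1"><tp:taxon-name>Formica rufa</tp:taxon-name></xref>
    is different from Formica rufa sensu Collingwood, 1966.</p>
  </sec>
</body>"""


def dangling_xrefs(xml_text):
    """List rid values of <xref> elements that point to no id in the document."""
    root = ET.fromstring(xml_text)
    known_ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for xref in root.iter("xref"):
        for rid in (xref.get("rid") or "").split():
            if rid not in known_ids:
                missing.append(rid)
    return missing
```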
Recommendation
Register new taxa or nomenclatural acts before publication and include their identifiers in
the original article whenever possible, even when a Code does not require this.
Where and how to register new taxa and get identifiers for different groups of organisms
such as algae, fungi, plants, or animals is explained in the sections above.
Specimens
Definition
Physical specimens held in collections may be cited directly, for example in material citations
as part of taxonomic treatments, or in other sections of the article. In other cases, data
derived from the specimens, such as genetic sequences, may include a reference to the
specimen source. To keep track of the use of these specimens, collections should assign
them identifiers (catalogue numbers) that are at least locally unique and preferably globally
unique*. For this reason, the Darwin Core (DwC) triplet, comprising the catalogue number,
collection code and institution code, is often used as an ad hoc PID for a specimen.
However, although DwC triplets are widely used, they are far from perfect, because these
codes are poorly standardised and can change over time (Guralnick et al. 2015).
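The fragility described above is easy to see once the triplet is turned into a single key. The sketch below joins the Darwin Core terms institutionCode, collectionCode and catalogNumber with colons. It applies an assumed normalisation that trims all three values and upper-cases the two codes. The codes "XYZ", "ENT" and "INS" are hypothetical. Normalisation absorbs trivial spelling differences, but a renamed collection code still produces a different key. A colon inside a catalogue number would also make the joined key ambiguous.

```python
from typing import NamedTuple


class DwCTriplet(NamedTuple):
    institution_code: str   # Darwin Core institutionCode
    collection_code: str    # Darwin Core collectionCode
    catalog_number: str     # Darwin Core catalogNumber


def triplet_key(t: DwCTriplet) -> str:
    """Join a triplet as 'INSTITUTION:COLLECTION:catalogNumber'.

    Trimming and upper-casing the codes is an assumed normalisation,
    not a Darwin Core rule; the catalogue number is only trimmed.
    """
    institution, collection, catalog = (part.strip() for part in t)
    if not (institution and collection and catalog):
        raise ValueError("all three triplet components are required")
    return ":".join([institution.upper(), collection.upper(), catalog])
```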
Specimens are often cited by combinations of metadata other than the DwC triplet, such as
a who-what-when-where combination: for example, specimen "X", collected in locality "Y" by
collector "Z" on date XX-YY-ZZZZ, belonging to taxon A and identified by person "B". Such a
citation may include the names of the person(s) who collected the specimen, where and
when this happened, and/or a taxonomic identification. A field number may also be used,
minted by the collectors to identify the collection event; however, such numbers are unique
only within this narrow context and may not follow a systematic syntax. The combination of
these properties may allow a specimen to be uniquely identified, but doing so is not trivial
and requires natural language processing and disambiguation.
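A first, deliberately naive step towards such disambiguation is to compare normalised who-what-when-where values. The sketch below illustrates this under stated assumptions. It only lower-cases text, strips accents and collapses whitespace, so equal keys mark candidate matches for review, not confirmed identity. Real citations would still need the natural language processing mentioned above. The class, function and collector names are hypothetical.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date


def _norm(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace for loose comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


@dataclass(frozen=True)
class SpecimenCitation:
    """Hypothetical who-what-when-where record parsed from a material citation."""
    collector: str
    locality: str
    collected_on: date
    taxon: str


def candidate_key(c: SpecimenCitation) -> tuple:
    """Who-what-when-where key; equal keys mark candidate matches, not proof."""
    return (_norm(c.collector), _norm(c.locality),
            c.collected_on.isoformat(), _norm(c.taxon))
```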
Increasingly, the aim is to keep track of physical specimens through digital twins, called
Digital or Extended Specimens (Hardisty et al. 2019, Lannom et al. 2020, Addink and
Hardisty 2020, Hardisty et