WORLDFLORA
User Guide for Graphical User Interface
Roeland Kindt, dr. ir.
Senior scientist – ecology
World Agroforestry, Nairobi
Jens-Peter Barnekow Lillesø
Senior researcher, University of Copenhagen, Copenhagen
Fellow, World Agroforestry, Nairobi
Fabio Pedercini
Consultant in Impact Assessment and Landscape Ecology
World Agroforestry, Nairobi
July 2020
1. Introduction
This document provides a worked example with the RcmdrPlugin.WorldFlora package of standardizing
plant names with a downloaded copy of the Taxonomic Backbone data set of World Flora Online.
RcmdrPlugin.WorldFlora provides a graphical user interface (GUI) for the WorldFlora package that is
integrated in the R Commander GUI.
The suggested citation of the World Flora Online (WFO) is:
WFO ([Year]): World Flora Online. Version [Year].[Month]. Published on the Internet;
http://www.worldfloraonline.org. Accessed on: [Date]
A manuscript describing the WorldFlora package has been accepted for Applications in Plant Sciences.
Once published, this article will be the suggested citation for using WorldFlora and
RcmdrPlugin.WorldFlora.
A pre-print of the manuscript is available here:
Kindt R. 2020. WorldFlora: An R package for exact and fuzzy matching of plant names against
the World Flora Online Taxonomic Backbone data.
https://www.biorxiv.org/content/10.1101/2020.02.02.930719v1
The worked example is a list of 636 gymnosperms downloaded from World Economic
Plants in GRIN-Global on 28-JUN-2020 with the following query:
family/altfamily = 'gymnosperms' & native country = 'all countries' & economic uses: economic
classes = 'All'
2. Installation
The recommended installation is to install the packages in the following sequence.
First, fully install the R Commander:
install.packages("Rcmdr", dependencies=TRUE)
Note that the R language is case sensitive, so Install.Packages will not work!
Launch the R Commander and then also install other packages that are suggested:
library(Rcmdr)
If you encounter any problems, check the R Commander installation notes.
Next, install the WorldFlora package:
install.packages("WorldFlora", dependencies=TRUE)
Finally, install the RcmdrPlugin.WorldFlora package:
install.packages("RcmdrPlugin.WorldFlora", dependencies=TRUE)
You may need to end the R session at this point and then launch R again.
Figure 2.1. The WorldFlora Graphical User Interface is integrated in the R Commander
It should now be possible to launch the Graphical User Interface through the following command:
library(RcmdrPlugin.WorldFlora)
Note that once the packages are installed, the Graphical User Interface (Figure 2.1) can be launched
immediately in later sessions with the command directly above.
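As a quick check on the installation sequence, the following sketch reports which of the three packages R can already find and prints (without running) the install commands for any that are missing. It uses only base R; the variable names are illustrative.

```r
# Packages in the installation order recommended above.
pkgs <- c("Rcmdr", "WorldFlora", "RcmdrPlugin.WorldFlora")

# TRUE for each package that R can find in its library paths
# (system.file() returns "" for a package that is not installed).
installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
print(installed)

# Build, but do not run, the install commands for anything still missing.
missing_pkgs <- pkgs[!installed]
todo <- sprintf('install.packages("%s", dependencies=TRUE)', missing_pkgs)
if (length(todo) > 0) {
  writeLines(todo)
} else {
  message("All three packages are installed.")
}
```

Copy any printed commands into the R console to complete the installation.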
3. Change the working directory
Change the working directory. This is the default folder where R will import and save data. It is also the
folder to which the WFO taxonomic backbone will be downloaded (Section 4).
Selecting the menu option WorldFlora > Change working directory… adds a script for changing the
working directory (setwd("E:/WorldFlora")) to the script window and shows the outputs in the
Output window (unless R Commander options are changed; see the introductory manual for the
R Commander).
Note that it is possible to directly provide or modify scripts in the R Script window. These scripts are
executed by highlighting them and then clicking on the Submit button (a button on the bottom
right of the R Script window). It is also possible to copy the script to the R GUI and execute it
there, as recommended in some of the following sections.
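The same step can be scripted by hand. The sketch below uses a temporary folder as a stand-in (an assumption made so that it runs anywhere); replace it with your own path, such as the "E:/WorldFlora" used in this guide.

```r
# Hypothetical project folder; replace with your own path, e.g. "E:/WorldFlora".
wd <- file.path(tempdir(), "WorldFlora")

# Create the folder if it does not exist yet, then make it the working directory.
if (!dir.exists(wd)) dir.create(wd, recursive = TRUE)
old_wd <- setwd(wd)   # setwd() invisibly returns the previous working directory

getwd()               # confirm the change
list.files()          # files R will see by default (none for a new folder)
```

Keeping the previous directory in old_wd makes it easy to switch back with setwd(old_wd).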
Figure 3.1. Change the working directory.
4. Download or remember the World Flora Online taxonomic backbone
When using the package for the first time, you need to download the WFO taxonomic backbone. This
can be done through the menu option WFO.download (Figure 4.1). The taxonomic backbone will
be unzipped as ‘classification.txt’ into the working directory. After download, the file will be loaded
into R as data set WFO.data. Please be patient while the data is loading, as the data set contains
over 1,300,000 observations of 19 variables.
The next time that you use the package, use the menu option WFO.remember, which will load
WFO.data from the location where it was downloaded earlier.
If you have changed the location of the ‘classification.txt’ data set, use the menu option
WFO.remember (choose file) to provide the new location.
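The choice between the two menu options can be sketched as a small helper. The function name load_backbone is hypothetical, and the argument names save.dir and WFO.file reflect the WorldFlora help pages as we understand them, so check ?WFO.download and ?WFO.remember before relying on them. Defining the function does not download anything.

```r
# Sketch only: pick the menu action from the presence of 'classification.txt'.
# Assumes the WorldFlora package is installed; nothing runs until the
# function is called.
load_backbone <- function(dir = getwd()) {
  backbone <- file.path(dir, "classification.txt")
  if (file.exists(backbone)) {
    # Backbone already on disk: load it into R as WFO.data.
    WorldFlora::WFO.remember(WFO.file = backbone)
    "remembered"
  } else {
    # First use: download and unzip into 'dir' (large file, needs internet).
    WorldFlora::WFO.download(save.dir = dir)
    "downloaded"
  }
}

# Check which branch would apply in the current working directory.
file.exists(file.path(getwd(), "classification.txt"))
```

Calling load_backbone() in a later session should then take the faster branch as long as the file has not been moved.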
Figure 4.1. Menu option of WFO.download will save a copy of the WFO taxonomic backbone into the
working directory.
Once WFO.data is loaded, you can directly match a single taxon through the menu option
of Taxon lookup… (Figure 4.2).
The options of the interface refer to 3 WorldFlora functions:
• WFO.browse gives a list of all taxa at the next taxonomic level (in the example given here, a
list of forms, subspecies and varieties of species Olea europaea)
• WFO.synonyms gives a list of all synonyms and the accepted name
• WFO.family gives information at higher taxonomic levels (including family and order)
More details about the functions can be obtained through the Help button (bottom left of the
interface).
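For readers who prefer the console, the three lookups can be sketched as below for the taxon used in this example. The block only runs when WorldFlora is installed and the backbone has already been loaded as WFO.data (Section 4). Passing the backbone through the WFO.data argument follows the package help pages as we understand them; the scripts that the GUI generates may differ in detail.

```r
# Runs only if WorldFlora is installed and the backbone is loaded as WFO.data.
ready <- requireNamespace("WorldFlora", quietly = TRUE) && exists("WFO.data")

if (ready) {
  library(WorldFlora)
  taxon <- "Olea europaea"
  # Taxa at the next taxonomic level below the species.
  print(WFO.browse(taxon, WFO.data = WFO.data))
  # Synonyms together with the accepted name.
  print(WFO.synonyms(taxon, WFO.data = WFO.data))
  # Information at higher taxonomic levels, such as family and order.
  print(WFO.family(taxon, WFO.data = WFO.data))
} else {
  message("Load the backbone first (Section 4) to run the lookups.")
}
```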
The option ‘only generate script’ in the interface has the effect that the script is shown
in the R Script window but is not executed. This feature allows you to copy and paste
the script directly into the R GUI (not the R Commander GUI). This may be desired to see messages
directly displayed in the main output, something that may be preferred for the WFO.match and
WFO.one functions as well (next sections).
Figure 4.2. For a single taxon, you can match with WFO data through the menu option of Taxon lookup.
5. Import the data set with plant names to be checked
The R Commander provides various methods of importing data that are available from the menu
option of Data > Import data.
Two of the options are also available from the WorldFlora menu:
• Import active data set from text file…
• Import active data set from Excel file…
When importing the data, it is important that characters are not converted to factors, especially if you
want to check a large number of plant names (Figure 5.1).
As our worked example, we import the 636 names of gymnosperms from GRIN-Global (see
Section 1).
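The effect of the factor setting can be reproduced in a script. The sketch below writes a two-row stand-in for the GRIN-Global list (names taken from Table 6.1) to a temporary CSV file and reads it back with stringsAsFactors = FALSE. The data set name Gymnosperm and the column name Taxa are assumptions for illustration.

```r
# Tiny stand-in for the imported list (two names from Table 6.1).
names_in <- data.frame(
  Taxa = c("Cycas lane-poolei C. A. Gardner",
           "Macrozamia pauli-guilielmi W. Hill & F. Muell."),
  stringsAsFactors = FALSE
)
csv_file <- tempfile(fileext = ".csv")
write.csv(names_in, csv_file, row.names = FALSE)

# stringsAsFactors = FALSE keeps plant names as character strings, the
# scripted equivalent of unchecking 'Convert character data to factors'.
Gymnosperm <- read.csv(csv_file, stringsAsFactors = FALSE)
str(Gymnosperm)
class(Gymnosperm$Taxa)
```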
Figure 5.1. Importing the data from Excel. Uncheck the box to Convert character data to factors.
Figure 5.2. After importing the data, it becomes the active data set of the R Commander. You can then
view the data from the ‘Edit data set’ options.
6. Prepare the data set
The main functions of the WorldFlora package (WFO.match and WFO.one) assume that the naming
authority is not part of the plant name. In our worked example, the authority is given together with
the plant name. Therefore, we need to split the names into two different fields, one for the taxonomic
name and another one for the naming authority.
The menu option of WFO.prepare attempts to make this split (Figure 6.1). Check the help function for
what the various options refer to, such as ‘punctuation.flag’, which flags submitted plant names that
contain punctuation characters such as ! " # $ % & ’ ( ) * + , - . / : ; < = > ? @ [ ] ^ _ ` { | } and ~.
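To give a feel for what such a flag detects, the base R sketch below tests a few names from the worked example against the POSIX punctuation class. This illustrates the idea only and is not necessarily how WFO.prepare implements its flag.

```r
taxa <- c("Cycas lane-poolei", "Encephalartos friderici-guilielmi",
          "Unident-Zamiaceae", "Pinus nigra")

# TRUE where the name contains at least one punctuation character.
has_punct <- grepl("[[:punct:]]", taxa)
data.frame(taxa, has_punct)
```

The three hyphenated names are flagged and Pinus nigra is not. A flag of this kind marks records for manual inspection rather than identifying errors.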
As the function may generate a large number of messages, you may prefer to only generate the script
from the menu interface and then paste this script in the main R GUI (Figure 6.2).
The function results in a new data set (‘XXX.prep’) that contains the attempted split (Figure
6.3). This new data set becomes the active data set and can be exported.
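A script of roughly the following shape could be pasted into the R GUI. It is a sketch: the data set name Gymnosperm and the column name Taxa are assumptions, and the argument names spec.data and spec.full follow the WFO.prepare help page as we understand it. The call only runs when WorldFlora is installed; it does not need the backbone.

```r
# Two submitted names with the authority included (taken from Table 6.1).
Gymnosperm <- data.frame(
  Taxa = c("Cycas lane-poolei C. A. Gardner",
           "Encephalartos friderici-guilielmi Lehm."),
  stringsAsFactors = FALSE
)

ready <- requireNamespace("WorldFlora", quietly = TRUE)
if (ready) {
  library(WorldFlora)
  # spec.full names the column that holds the name plus the authority.
  Gymnosperm.prep <- WFO.prepare(spec.data = Gymnosperm, spec.full = "Taxa")
  # Show the submitted name next to the attempted split.
  print(Gymnosperm.prep[, c("Taxa", "spec.name", "Authorship")])
}
```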
Figure 6.1. Menu option of WFO.prepare will attempt to split names into the botanical name and the
naming authority.
Figure 6.2. Copying the script from the R Commander to the R GUI makes it easier to see the messages.
Figure 6.3. The result of function WFO.prepare with separate columns for the botanical name
(spec.name) and the naming authority (Authorship) added to the submitted data set.
In a spreadsheet, data can now be sorted by the various flagging fields.
Sorting by the column of punctuation selects the taxa shown in Table 6.1. Based on a manual
inspection of these fields, we changed the names for some of the plants (Table 6.2). After making
these changes, we pasted the columns of spec.name and Authorship into our original data set and
then re-imported the data as active data set Gymnosperm2.
Table 6.1. Records flagged by the punctuation column
Taxa | spec.name | Authorship
Cupressus ×leylandii A. B. Jacks. & Dallim. | Cupressus ×leylandii | A. B. Jacks. & Dallim.
Cycas lane-poolei C. A. Gardner | Cycas lane-poolei | C. A. Gardner
Encephalartos eugene-maraisii I. Verd. | Encephalartos eugene-maraisii | I. Verd.
Encephalartos friderici-guilielmi Lehm. | Encephalartos friderici-guilielmi | Lehm.
Juniperus wallichiana Hook. f. & Thomson ex Brandis | Juniperus wallichiana f. & | Thomson ex Brandis
Larix gmelinii var. principis-rupprechtii (Mayr) Pilg. | Larix gmelinii var. principis-rupprechtii | (Mayr) Pilg.
Larix ×marschlinsi Coaz | Larix ×marschlinsi | Coaz
Macrozamia pauli-guilielmi W. Hill & F. Muell. | Macrozamia pauli-guilielmi | W. Hill & F. Muell.
Taxus ×media Rehder | Taxus ×media | Rehder
Unident-Cycadaceae spp. | Unident-Cycadaceae |
Unident-Zamiaceae spp. | Unident-Zamiaceae |
Table 6.2. Manual changes made in records flagged by the punctuation column
spec.name | Authorship
Cupressus ×leylandii | A. B. Jacks. & Dallim.
Cycas lane-poolei | C. A. Gardner
Encephalartos eugene-maraisii | I. Verd.
Encephalartos friderici-guilielmi | Lehm.
Juniperus wallichiana | Hook. f. & Thomson ex Brandis
Larix gmelinii var. principis-rupprechtii | (Mayr) Pilg.
Larix ×marschlinsi | Coaz
Macrozamia pauli-guilielmi | W. Hill & F. Muell.
Taxus ×media | Rehder
Cycadaceae |
Zamiaceae |
7. Check for matching names
The menu option of WFO.match brings up a menu interface for the WFO.match function of WorldFlora
(Figure 7.1). Note that this menu interface uses the active data set, which may have to be selected
anew after importing data.
The Authorship field is optional and <NONE> should be selected when this data is not available (or
when a user prefers not to calculate the Levenshtein distance to the Authorship).
When using the GUI only to generate the script and then copying it into the main R GUI, it is easier to
read through the messages (Figure 7.2).
Note that the checks will take a while if many fuzzy matches are explored, so please be patient.
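A generated script might look roughly like the sketch below, which uses two corrected records from Table 6.2. The result name Gymnosperm2.match is hypothetical, and the arguments spec.name, Authorship and First.dist follow the WFO.match help page as we understand it; the script produced by the GUI may differ. The call only runs when WorldFlora is installed and the backbone is loaded as WFO.data.

```r
# Two prepared records (name and authority in separate columns, Table 6.2).
Gymnosperm2 <- data.frame(
  spec.name  = c("Juniperus wallichiana",
                 "Larix gmelinii var. principis-rupprechtii"),
  Authorship = c("Hook. f. & Thomson ex Brandis", "(Mayr) Pilg."),
  stringsAsFactors = FALSE
)

ready <- requireNamespace("WorldFlora", quietly = TRUE) && exists("WFO.data")
if (ready) {
  library(WorldFlora)
  Gymnosperm2.match <- WFO.match(
    spec.data  = Gymnosperm2,
    WFO.data   = WFO.data,
    spec.name  = "spec.name",    # column with the taxonomic names
    Authorship = "Authorship",   # optional column with naming authorities
    First.dist = TRUE            # also report the distance for the first word
  )
  print(nrow(Gymnosperm2.match))  # can exceed the number of submitted names
} else {
  message("Load the backbone first (Section 4) to run WFO.match.")
}
```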
Figure 7.1. Menu interface for the WFO.match function.
Figure 7.2. Messages when copying the script in the main R GUI.
8. Check for the best single match
Function WFO.one reduces the number of matches to one for each submitted plant name. It operates
on the results from WFO.match, so make sure that these results are in the active data set (Figure 8.1).
Once again it is possible to use the interface to generate the script only and then copy it into the R GUI
(Figure 8.2).
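In script form, this step amounts to a single call on the output of WFO.match. The sketch below assumes that this output is stored under the hypothetical name Gymnosperm2.match and that WorldFlora is installed; otherwise it only prints a reminder. The argument name WFO.result follows the WFO.one help page as we understand it.

```r
# Runs only when WorldFlora is installed and WFO.match results are available
# under the hypothetical name 'Gymnosperm2.match'.
ready <- requireNamespace("WorldFlora", quietly = TRUE) &&
  exists("Gymnosperm2.match")

if (ready) {
  library(WorldFlora)
  # Reduce the candidate matches to one row per submitted plant name.
  Gymnosperm2.one <- WFO.one(WFO.result = Gymnosperm2.match)
  print(nrow(Gymnosperm2.one))
} else {
  message("Run WFO.match first (Section 7) and store its result.")
}
```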
Figure 8.1. Menu interface for the WFO.one function.
Figure 8.2. Messages when copying the script in the main R GUI.
After executing the script, the results become the active data set. This data set can now be exported
and cross-checked. With our Gymnosperm data set, there were no exact matches at the genus level
for three taxa, as shown by the variable First.dist (Table 8.1). In these cases, the submitted names were
matched with different genera that had an exact match at infrageneric levels. In a typical workflow,
a user may next want to check for matches at the species level (such as matches for Pinus nigra) for
these three cases by removing the infraspecific parts.
For our example with the Gymnosperm data set, one can immediately confirm from the full list of
submitted names that Ephedra intermedia and Pinus nigra are accepted names, as these were
among the submitted species names. In checking the submitted names, it also becomes obvious that the
submitted taxon name of ‘Taxus chinensis (Pilg.) Rehder (=Taxus wallichiana var. chinensis (Pilg.)
Florin)’ was split erroneously into ‘Taxus chinensis var. chinensis’ and ‘(Pilg.) Florin)’ (this is a
consequence of WFO.prepare working on the assumption that ‘var.’ is followed by the infraspecific
name and that the section before ‘var.’ consists of the genus and species names and their naming
authorities). A subsequent check among the submitted names can then confirm that Taxus wallichiana
var. chinensis is an accepted name.
Table 8.1. The three records with a First.dist larger than 0.
Submitted | First.dist | scientificName | Old.status | Old.name
Pinus nigra subsp. nigra | 6 | Sambucus nigra | Synonym | Sambucus nigra subsp. nigra
Ephedra intermedia var. intermedia | 5 | Lechea intermedia | Synonym | Lechea intermedia var. intermedia
Taxus chinensis var. chinensis | 3 | Rhus chinensis | Synonym | Rhus chinensis var. chinensis
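The First.dist values in Table 8.1 can be reproduced with the base R function adist, which computes the Levenshtein (edit) distance between strings. The sketch below compares the first word of each submitted name with the first word of the matched name. Treating First.dist as this distance is our reading of the package documentation.

```r
# First words (genus names) of the submitted and matched names in Table 8.1.
submitted_genus <- c("Pinus", "Ephedra", "Taxus")
matched_genus   <- c("Sambucus", "Lechea", "Rhus")

# adist() returns a matrix of all pairwise distances; the diagonal holds
# the distance for each submitted/matched pair.
first_dist <- diag(adist(submitted_genus, matched_genus))
first_dist
# 6 5 3, the First.dist values reported in Table 8.1
```

A value above 0 shows that the genus itself was not matched exactly, which is why these three records deserve a second look.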
9. Acknowledgments
RK thanks the CGIAR Research Program on Forests, Trees and Agroforestry (supported by the CGIAR
Trust Fund) and the Provision of Adequate Tree Seed Portfolios project (supported by Norway’s
International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia) for
supporting his time to develop the WorldFlora and RcmdrPlugin.WorldFlora packages.