ISPRS Open Journal of Photogrammetry and Remote Sensing 5 (2022) 100016
https://doi.org/10.1016/j.ophoto.2022.100016
Received 8 February 2022; Received in revised form 14 April 2022; Accepted 13 May 2022; Available online 23 May 2022
2667-3932/© 2022 The Authors. Published by Elsevier B.V. on behalf of the International Society for Photogrammetry and Remote Sensing (ISPRS). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Transfer learning from citizen science photographs enables plant species identification in UAV imagery
Salim Soltani a,*, Hannes Feilhauer a,b, Robbert Duker c, Teja Kattenborn a,b
a Remote Sensing Centre for Earth System Research (RSC4Earth), Leipzig University, Germany
b German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
c Department of Botany, Nelson Mandela University, Port Elizabeth, South Africa
* Corresponding author. E-mail address: salim.soltani@uni-leipzig.de (S. Soltani).
ARTICLE INFO
Keywords:
Remote sensing
Convolutional Neural Network (CNN)
Crowd-sourced data
Plant species
Transfer learning
Drones
ABSTRACT
Accurate information on the spatial distribution of plant species and communities is in high demand for various fields of application, such as nature conservation, forestry, and agriculture. A series of studies has shown that Convolutional Neural Networks (CNNs) accurately predict plant species and communities in high-resolution remote sensing data, in particular in data at the centimeter scale acquired with Unoccupied Aerial Vehicles (UAV). However, such tasks often require ample training data, which is commonly generated in the field via geocoded in-situ observations or by labeling remote sensing data through visual interpretation. Both approaches are laborious and can present a critical bottleneck for CNN applications. An alternative source of training data is knowledge on the appearance of plants in the form of plant photographs from citizen science projects such as the iNaturalist database. Such crowd-sourced plant photographs typically exhibit very different perspectives and great heterogeneity in various aspects, yet the sheer volume of data could hold great potential for application to bird's-eye views from remote sensing platforms. Here, we explore the potential of transfer learning from such a crowd-sourced data treasure to the remote sensing context. We investigate, first, whether crowd-sourced plant photographs can be used for CNN training and subsequent mapping of plant species in high-resolution remote sensing imagery. Second, we test whether the predictive performance can be increased by a priori selecting photographs that share a more similar perspective with the remote sensing data. We tested the proposed approach in two case studies using multiple UAV-based RGB orthoimages with the target plant species Fallopia japonica and Portulacaria afra, respectively. Our results demonstrate that CNN models trained with heterogeneous, crowd-sourced plant photographs can indeed predict the target species in UAV orthoimages with surprising accuracy. Filtering the crowd-sourced photographs used for training by their acquisition properties increased the predictive performance. This study demonstrates that citizen science data can effectively alleviate a common bottleneck for vegetation assessments and provides an example of how the ever-increasing availability of crowd-sourced and big data can be harnessed for remote sensing applications.
1. Introduction

Knowledge on the distribution of plant species and communities is essential to various fields of application, including research, nature conservation, forestry, and agriculture. Remote sensing data have been shown to be an efficient source of information for mapping plant species in time and space (Fassnacht et al., 2016; Nagendra, 2001). With recent advances in sensor technology and remote sensing platforms, such as high-resolution satellite missions and Unoccupied Aerial Vehicles (UAV), optical imagery with high spatial detail is becoming more widely available, allowing for an accurate localization of plant species (Maes and Steppe, 2019; Lopatin et al., 2017; Curnick et al., 2021; Wagner, 2021). Deep Learning methods, specifically Convolutional Neural Networks (CNNs), provide unprecedented opportunities for remote sensing-based vegetation assessments using such high-resolution data (Ferreira et al., 2020; Weinstein et al., 2020; Wagner et al., 2019; Brandt et al., 2020). The fundamental potential of CNNs is based on their effectiveness in automatically learning image features that enable an accurate extraction of a vast number of plant properties from remote sensing images (Hoeser and Kuenzer, 2020; Brodrick et al., 2019; Kattenborn et al., 2021). This facilitates the identification of key leaf and canopy properties required for plant species identification. Hence, with the
growing availability of high-resolution remote sensing data, CNNs are expected to significantly enhance the opportunities for mapping vegetation patterns and particularly plant species. Accordingly, a series of studies have already shown that CNNs accurately predict plant species and communities in high-resolution remote sensing data, in particular with simple RGB data acquired in the centimeter range with UAVs (Qian et al., 2020; Schiefer et al., 2020; Kattenborn et al., 2020; Nezami et al., 2020; Kattenborn et al., 2019; Fromm et al., 2019; Bayraktar et al., 2020).
However, such tasks often require ample training data to generate CNN models that are accurate and transferable across heterogeneous landscapes and remote sensing data characteristics, while the collection of such training data is often costly and requires large logistical efforts. Moreover, plant species identification can be a complex task, particularly for species that differ only in subtle features. In such cases, CNN training requires a vast number of training observations to learn the features that are decisive to differentiate between species (Kattenborn et al., 2021). Training data are commonly generated via geocoded in-situ observations (Fassnacht et al., 2016) or by labeling remote sensing data through visual interpretation (Flood et al., 2019; Kattenborn et al., 2019). Field data collection is often labor-intensive, costly, and limited by the inaccessibility of training areas. Likewise, labeling remote sensing data via visual interpretation can be time-consuming, particularly if the appearance of the target species varies (e.g., due to environmental conditions or varying illumination conditions in the imagery), for complex vegetation canopies that require very detailed annotations, or for species that are hard to identify or even require additional in-situ data for precise identification (Schiefer et al., 2020; Kattenborn et al., 2020). Thus, the realized quantity and representativeness of training data derived from field surveys or visual interpretation can be a critical bottleneck for the performance of CNN models and their transferability across different sites and remote sensing data sets (Bayraktar et al., 2020; Rzanny et al., 2019; Brandt et al., 2020; Baeta et al., 2017; Nogueira et al., 2017).
At the same time, our basic knowledge about the appearance of plant species is constantly growing, namely in the form of millions of freely accessible plant photographs with associated species names that are collected each day in citizen science projects. A prominent example in this regard is the iNaturalist platform, which motivates countless citizens to record, share, and annotate photographs of the World's flora and fauna (Boone and Basille, 2019; Di Cecco et al., 2021). Users can manually identify the species of their observation, or optionally an algorithm can suggest the species based on computer vision techniques (Van Horn et al., 2018, 2021). Subsequently, an observation can be elevated to research grade once more than two-thirds of the community agree on the species identification. Research grade is the highest level of data quality, and such observations are passed to the Global Biodiversity Information Facility (GBIF). At the time of writing, the iNaturalist database already provides more than 16 million globally distributed and annotated photographs of vascular plant species along with their geolocation, and the quantity of photographs is growing exponentially as more volunteers join the platform (Boone and Basille, 2019; Di Cecco et al., 2021). However, crowd-sourced plant photographs are very heterogeneous in terms of their quality and acquisition settings. This heterogeneity emerges from different factors: plant photographs are taken with different devices, ranging from professional photographic equipment to smartphone cameras (Wittmann et al., 2019; Van Horn et al., 2021), under different illumination conditions, and with camera-to-plant distances ranging from centimeters to several hundred meters and very different camera angles (Schiller et al., 2021; Van Horn et al., 2018). Moreover, crowd-sourced photographs do not only show a large variety in image quality, but are also expected to differ greatly in their perspective from the bird's-eye view that is usually given in remote sensing imagery, such as UAV-based orthomosaics.
Despite these partly weak links between the ground view of citizens and the Earth observation perspective, the vast amounts of knowledge on the appearance of plants crowd-sourced each day could potentially reduce the effort needed to generate ample and representative training data collections (Tuia et al., 2021; Kattenborn et al., 2021). This approach can be assigned to transfer learning, which involves gaining knowledge by solving one problem (here, plant species characterization in crowd-sourced data) and applying it to another but related problem (here, plant species characterization in remote sensing data). For image analysis, transfer learning is frequently applied with very large but very general data sets to create pre-trained models that have learned general image features, such as edges, shapes, or patterns (Shin et al., 2016; Hoeser et al., 2020). In contrast, plant photograph databases such as iNaturalist may enable the transfer of knowledge that is explicitly related to the problem of species recognition in remote sensing data (leaf forms, canopy structures). Here, we explore the value of crowd-sourced data for transfer learning in the remote sensing context using two case studies with two different plant species and high-resolution UAV-based RGB imagery. We investigate, first, whether crowd-sourced plant photographs can be used with a CNN-based transfer learning approach to map plant species in such high-resolution remote sensing imagery. Second, we test whether the predictive performance of such a transfer learning approach can be increased by pre-selecting photographs that share a more similar perspective with the remote sensing imagery.
2. Methods

2.1. Case studies and data acquisition

We used two case studies aiming to map two different target species in multiple UAV-based RGB orthoimages in the form of a segmentation. The basis of this segmentation is to train a CNN model to predict three classes: 1) the target species, 2) the potential surrounding species, and 3) the common land cover types in the given area. The first case study aimed at mapping Fallopia japonica (F. japonica). In Central Europe, F. japonica is an invasive species that is known as a threat to biodiversity, has a negative impact on flood management, and can alter ecosystem functions (Schnitzler and Muller, 1998; Shaw et al., 2011). F. japonica changes its appearance through the seasons, from small sprouts in spring to >2 m tall plants in summer that die off in fall. Likewise, the pigmentation of its large leaves changes from greenish to yellowish and brownish in the fall season. As a study site, we selected the Nahle river, a tributary of the Elster-Luppe system surrounded by the northern Leipzig floodplain forest, Germany, and investigated the species in the summer season. The study area has a temperate climate and nutrient-rich river banks with densely growing vegetation. The common surrounding species on the study site were surveyed locally with 13 plots of 1 m × 1 m distributed along the Nahle and Elster rivers. The species cover was estimated visually using a surveying approach (cf. Bråkenhielm and Qinghong, 1995; Vanha-Majamaa et al., 2000). Only dominant species, here defined as having a cover value >5%, were considered, which included Aegopodium podagraria, Alliaria petiolata, Arctium minus, Artemisia vulgaris, Arum maculatum, Bromus sterilis, Calamagrostis epigejos, Capsella bursa-pastoris, Dactylis glomerata, Festuca arundinacea, Ficaria verna, Geum urbanum, and Glechoma hederacea. Next to the target species and the surrounding species, we considered open surface water as a land cover type representing the Nahle river.
The second case study aimed to map Portulacaria afra (P. afra), which is a key species in the context of countering desertification and ecosystem restoration in Africa (van der Vyver et al., 2013; Duker et al., 2020; Mills et al., 2015). The study area is located in the Eastern Cape, South Africa, with a mixed subtropical and Mediterranean climate, and includes several research plots on restoration success of P. afra. The plots differ greatly in the total cover and average plant size of P. afra, as well as in the coverage and composition of surrounding species. Next to P. afra, the frequently occurring species were identified by local experts and include Pappea capensis, Schotia afra, Euclea undulata, Searsia longispina, Putterlickia pyracantha, Lycium sp., Aloe speciosa, Aloe africana, Aloe ferox, Rhigozum obovatum, Euphorbia coerulescens, Boscia oleoides, Crassula ovata, Cussonia spicata, and Searsia lucida. The plots also differ greatly in their fraction of barren soil, and hence barren soil was defined as a separate land cover class during the classification.
The UAV-based RGB aerial imagery for both case studies was acquired with a DJI Mavic Pro 2 and autonomous flights that were planned with Litchi (VC Technology Ltd, UK) or DroneDeploy (DroneDeploy, USA). We used a Structure from Motion-based photogrammetric processing chain in Metashape (vers. 1.7.6, Agisoft LLC) to create orthoimages from the collected UAV images. For the case study of F. japonica, we selected three sites along the Nahle river covering a total area of 20,811 m² (see Appendix Fig. A9 for the full orthoimages). The imagery was acquired on an almost cloud-free day in June 2021 at approximately 15 m height, with a forward overlap of 90% and a side overlap of 70%. The estimated average Ground Sampling Distance (GSD) of the orthoimage products is 0.3 cm/pixel. For the case study of P. afra, UAV flights were conducted for 31 individual sites during the years 2020 and 2021 under diverse illumination conditions. The area considered in the analysis of these individual sites varies between 580 and 840 m² and sums up to 23,398 m². The imagery was acquired at varying heights between approximately 30 and 40 m, with a forward overlap of ca. 85% and a side overlap of ca. 65%. The GSD of the created orthoimagery ranged from 0.6 to 1.2 cm. Details on the size of the individual sites and the cover of the target species are given in the Appendix (Table A1). Due to the different acquisition dates and times, the individual orthoimages for P. afra had very different illumination and color properties. We hence tested the presented approach with the raw orthoimages as well as with orthoimages that were standardized using histogram matching (Appendix Table A5).
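The histogram matching step can be reproduced along the following lines; this is a minimal sketch assuming the histMatch function of the RStoolbox R package (Leutner et al., 2017) and illustrative file names, not the exact processing script used in the study.

```r
# Minimal sketch of the histogram matching test (assumption: RStoolbox::histMatch;
# file names are illustrative, not those used in the study).
library(raster)
library(RStoolbox)

ref     <- stack("ortho_site01.tif")          # reference orthoimage with favorable illumination
target  <- stack("ortho_site14.tif")          # orthoimage to be standardized
matched <- histMatch(x = target, ref = ref)   # match band-wise histograms to the reference
writeRaster(matched, "ortho_site14_histmatched.tif", overwrite = TRUE)
```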
For both case studies, we queried the iNaturalist database for research-grade photographs of the target species and of the surrounding species that are expected in the areas of each case study. Using the scientific names of the plant species, we searched and downloaded matching plant photographs using the R package rinat (v.0.1.8), which is an interface to the iNaturalist API. The data availability differs widely by species. We hence restricted the amount of data downloaded for a few species and applied a stratification at a later stage during training (see Section 2.3). For the target species F. japonica, we downloaded 10,000 of the more than 70,000 photographs available in the iNaturalist database. For the surrounding species that are expected in the case study of F. japonica, we downloaded 32,534 photographs. For P. afra, 864 photographs could be acquired, and for species in the surroundings of P. afra, a total of 11,026 photographs. As described above, an additional land cover class was considered for each case study, i.e., open surface water for the case study of F. japonica and barren soil for the case study of P. afra. For these two classes, 4,184 and 910 photographs, respectively, were downloaded from the web using the Google Search API and different queries (e.g., river, river bed, open water and soil, barren, ground, dry, respectively). Fig. 2 shows examples of the downloaded images for the three classes of both case studies.
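As a minimal sketch of the download step, a query via rinat could look as follows; the output directory and the download loop are illustrative additions and not part of the published study code.

```r
# Minimal sketch: query research-grade iNaturalist observations of a target species
# via rinat and download the associated photographs (out_dir and the loop are
# illustrative additions).
library(rinat)

obs <- get_inat_obs(
  taxon_name = "Fallopia japonica",   # scientific name of the target species
  quality    = "research",            # research-grade observations only
  maxresults = 10000
)

out_dir <- "inat_fallopia_japonica"
dir.create(out_dir, showWarnings = FALSE)

# each observation record holds a URL to the associated photograph
for (i in seq_len(nrow(obs))) {
  url <- obs$image_url[i]
  if (!is.na(url) && nzchar(url)) {
    download.file(url, file.path(out_dir, paste0("obs_", obs$id[i], ".jpg")),
                  mode = "wb", quiet = TRUE)
  }
}
```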
2.2. Filtering crowd-sourced photographs by acquisition settings (angle and distance)

Crowd-sourced plant photographs have very heterogeneous acquisition settings, and their perspective often differs greatly from the typical bird's-eye perspective of remote sensing data (Fig. 2). To test whether the accuracy of the species identification can be increased by pre-selecting photographs with a perspective similar to that of the remote sensing data, we filtered training photographs based on their acquisition angle and distance from the target plants (Fig. 3). As the iNaturalist photographs do not readily provide acquisition distance and angle, we used CNN-based regression models to predict these properties for each downloaded photograph. To train such a CNN to estimate the acquisition angle and distance, we visually labeled the acquisition distance and angle of 4,500 photographs sampled across the different species (Section 2.1). This enabled us to create a single CNN model that was applicable to both case studies.

We randomly sampled 10% of the training photographs as test data and split the remaining data into 80% for model training and 20% for model validation. Before model training, the differently sized plant photographs were normalized to a single common size by quadratic cropping along the short side of each photograph, and all photographs were resampled to a common resolution. After initial tests, we used a resolution of 256 × 256 pixels.
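The normalization of the photographs can be sketched as follows (quadratic cropping along the shorter side, then resampling to 256 × 256 pixels); the use of the magick package and the center placement of the crop are assumptions for illustration.

```r
# Minimal sketch of the photo normalization: quadratic (center) crop along the
# shorter side, then resampling to a common resolution (magick package assumed).
library(magick)

prepare_photo <- function(path, size = 256) {
  img  <- image_read(path)
  info <- image_info(img)
  side <- min(info$width, info$height)
  img  <- image_crop(img, geometry_area(
    width = side, height = side,
    x_off = (info$width  - side) %/% 2,
    y_off = (info$height - side) %/% 2))
  image_resize(img, paste0(size, "x", size, "!"))  # force exact output size
}
```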
Fig. 1. Example photographs of F. japonica (left) and P. afra (right) from the iNaturalist database.
The visual interpretation of the imagery revealed that the acquisition distances and angles of the iNaturalist plant photographs are not evenly distributed. For instance, the majority of photographs are taken from very close distances. To prevent problems associated with imbalanced data during the training of the CNN regression model, we transformed the data with a log10-transformation. Afterward, the distance and angle labels were standardized to a range between 0 and 1.
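A minimal sketch of this label preparation is given below; the offset added before the log10 transform is an illustrative choice to keep the transform defined for a distance of 0 m and was not reported in the study.

```r
# Minimal sketch of the label preparation for the regression targets:
# log10 transform (to reduce the skew towards short distances), then 0-1 scaling.
scale_label <- function(x, offset = 1) {      # offset: illustrative assumption
  x_log <- log10(x + offset)
  (x_log - min(x_log)) / (max(x_log) - min(x_log))
}

# e.g., distance_scaled <- scale_label(distance_labels)
```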
As a feature extractor, we used the ResNet50V2 backbone with the following top layers: global max pooling, a flattening layer, dropout with a rate of 0.5, four fully connected layers with 512, 512, 256, and 128 units, and a final regression layer with 1 unit and a sigmoid activation function. All other fully connected layers were configured with the ReLU activation function and an L2 kernel regularizer with a value of 0.001. We used adaptive moment estimation (Adam) as the optimizer with a learning rate of 0.0001 and the Mean Squared Error (MSE) as the loss function to train the model. The models were trained with a batch size of 20 over 50 epochs.
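The described regression architecture can be sketched with the keras R interface as follows; the ImageNet initialization of the backbone and the input pipeline are assumptions for illustration and not stated in the text.

```r
# Minimal sketch of the CNN regression model described above (ResNet50V2 backbone,
# global max pooling, flatten, dropout 0.5, dense 512/512/256/128 with L2 = 0.001,
# sigmoid output unit). ImageNet weights are an illustrative assumption.
library(keras)

backbone <- application_resnet50_v2(
  include_top = FALSE, weights = "imagenet",
  input_shape = c(256, 256, 3))

outputs <- backbone$output %>%
  layer_global_max_pooling_2d() %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(512, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(512, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(256, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(128, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(1, activation = "sigmoid")   # regression output scaled to 0-1

model <- keras_model(inputs = backbone$input, outputs = outputs)

model %>% compile(
  optimizer = optimizer_adam(learning_rate = 1e-4),
  loss      = "mse")

# model %>% fit(x_train, y_train, batch_size = 20, epochs = 50,
#               validation_data = list(x_val, y_val))
```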
We used the model from the epoch with the lowest loss to predict the acquisition angle and distance for all available photographs. Using these angle and distance estimates and different thresholds, we filtered the plant photographs prior to training the CNN-based species classification (see next Section). For the sake of brevity, the assessments shown here are restricted to filtering out (removing) photographs with a distance < 0.5 m and with acquisition angles < 0° (where 0° corresponds to a horizontal view, 90° to nadir, and −90° to zenith).
Fig. 2. Example of iNaturalist photographs used in both case studies.
Fig. 3. Schematic workflow of the study.
2.3. Training CNN models for plant species identification using crowd-sourced plant photographs

The CNN-based identification of the target species was implemented as a classification problem with the following classes: 1) the target species, 2) the expected surrounding species, and 3) surrounding land cover types. The availability of observations within these classes varied greatly, while unbalanced data sets may bias the model towards more frequently occurring classes. To alleviate this risk and limit the computational load, we sampled equal class sizes of 4,000 photographs per class. For underrepresented classes, we applied sampling with replacement, hence duplicating existing photographs for these classes. For the class of the expected surrounding species (2), we sampled the 4,000 photographs equally over all species, resulting in 117 photographs per species (n = 34) in the case study on F. japonica and 333 photographs per species (n = 12) in the case study on P. afra. Similar to the CNN-based angle and distance prediction described above (Section 2.2), we applied data augmentation to modify duplicates and regularize the model. This included random vertical or horizontal flipping, as well as brightness, contrast, and saturation alterations. All photographs were cropped to a quadratic shape and resampled to different sizes. Given that the image size can greatly impact the model performance (Kattenborn et al., 2021; Schiefer et al., 2020; Litjens et al., 2017), we tested different sizes, including 64 × 64, 128 × 128, and 256 × 256 pixels. Models trained with an input size of 128 × 128 pixels resulted in the highest prediction accuracy, and hence only results for the input size of 128 × 128 pixels are considered in this study. For an initial evaluation of the model performance for the species classification, we split the photographs into test, training, and validation data. We randomly sampled 10% of the data as test data and split the remaining data into 80% for model training and 20% for model validation. Note that the test data set only served for the initial evaluation, while the final classification accuracy of the models with UAV data was assessed with independent reference data at a later stage (Section 2.4).
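A minimal sketch of the class balancing and augmentation logic described above is given below; the exact augmentation parameters were not reported, so the ranges used here (and the tf$image calls) are illustrative assumptions.

```r
# Minimal sketch: bring each class to 4,000 photographs by sampling with
# replacement, and modify duplicates by random flips and brightness/contrast/
# saturation changes (parameter ranges are illustrative assumptions).
library(tensorflow)

balance_class <- function(paths, n_per_class = 4000) {
  sample(paths, n_per_class, replace = length(paths) < n_per_class)
}

augment <- function(img) {                       # img: float tensor scaled to [0, 1]
  img <- tf$image$random_flip_left_right(img)
  img <- tf$image$random_flip_up_down(img)
  img <- tf$image$random_brightness(img, max_delta = 0.2)
  img <- tf$image$random_contrast(img, lower = 0.8, upper = 1.2)
  img <- tf$image$random_saturation(img, lower = 0.8, upper = 1.2)
  tf$clip_by_value(img, 0, 1)
}
```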
We tested a variety of pre-built model backbones, including DenseNet, MobileNet, and ResNet configurations with various layer depths. In line with the CNN model for the prediction of angle and distance (see Section 2.2), test results showed that ResNet50V2 outperformed the other backbones. We added the following customized layers on top of the backbone: global max pooling, a flattening layer, three fully connected layers with 512, 256, and 128 units, and a final classification layer with three units. All three fully connected layers were configured with the ReLU activation function and an L2 kernel regularizer with a value of 0.001. The final layer used the softmax activation function. We used Root Mean Squared Propagation (RMSprop) as the optimizer with a learning rate of 0.0001 and categorical cross-entropy as the loss function to train the model. The models were trained with a batch size of 10 for 50 epochs. We used the model with the lowest loss value on the validation data set for the final prediction on the orthoimages (see Section 2.4).
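Analogous to the regression model, the three-class classification architecture can be sketched as follows, again assuming the keras R interface and an illustrative ImageNet initialization.

```r
# Minimal sketch of the three-class species classification model (ResNet50V2
# backbone, global max pooling, flatten, dense 512/256/128 with L2 = 0.001,
# softmax output over three classes).
library(keras)

backbone <- application_resnet50_v2(
  include_top = FALSE, weights = "imagenet",
  input_shape = c(128, 128, 3))

outputs <- backbone$output %>%
  layer_global_max_pooling_2d() %>%
  layer_flatten() %>%
  layer_dense(512, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(256, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(128, activation = "relu", kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dense(3, activation = "softmax")  # target species / surrounding species / land cover

model <- keras_model(inputs = backbone$input, outputs = outputs)

model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 1e-4),
  loss      = "categorical_crossentropy",
  metrics   = "accuracy")

# model %>% fit(x_train, y_train, batch_size = 10, epochs = 50,
#               validation_data = list(x_val, y_val))
```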
All analyses were implemented using Keras and TensorFlow (v. 2.7) in R (v. 4.1.1). The code is available on GitHub (https://github.com/salimsoltani28/CNN_CitizenScience_UAV_plantspeciesMapping).
2.4. CNN model application to UAV-based orthoimages

The CNN models for the detection of the target species were used to segment the latter in the UAV orthoimagery using a moving window approach (Fig. 3). For each step of the moving window, a class was predicted and stored with the center coordinates of the window. The size of the moving window equaled the input size of the CNN models (128 × 128 pixels), and the window was sequentially shifted over the orthoimage with a fixed step size. The step size corresponded to 10% of the moving window size (128 pixels) and was selected considering the trade-off between computational load and spatial detail of the prediction output. For each step and location, the maximum probability value of the predictions was used to assign one of the classes (target species, surrounding species, surrounding land cover).
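A minimal sketch of this moving-window prediction is given below, assuming the terra package for reading the orthoimage and the trained classification model from Section 2.3; the file name, value scaling, and row-wise batching are illustrative.

```r
# Minimal sketch of the moving-window segmentation: shift a 128 x 128 px window
# over the orthoimage with a step of ~10% of the window size, classify each tile,
# and store the winning class at the window centre.
library(terra)
library(keras)

ortho <- rast("orthoimage.tif")              # RGB orthoimage (illustrative path)
win   <- 128L
step  <- as.integer(round(0.1 * win))        # ~13 px step
arr   <- as.array(ortho) / 255               # rows x cols x 3, scaled to 0-1

rows <- seq(1, nrow(arr) - win + 1, by = step)
cols <- seq(1, ncol(arr) - win + 1, by = step)
pred <- matrix(NA_integer_, length(rows), length(cols))

for (i in seq_along(rows)) {
  # stack all windows of one image row into a single batch for prediction
  batch <- array(0, dim = c(length(cols), win, win, 3))
  for (j in seq_along(cols)) {
    batch[j, , , ] <- arr[rows[i]:(rows[i] + win - 1),
                          cols[j]:(cols[j] + win - 1), ]
  }
  probs     <- model %>% predict(batch)
  pred[i, ] <- apply(probs, 1, which.max)    # class with maximum probability
}
# "pred" holds one class label per window centre and can be rasterized to the
# segmentation map using the centre coordinates.
```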
The individual class predictions obtained for each location of the sliding window procedure were spatially aggregated to a raster, resulting in a high-resolution segmentation map of the target species. Depending on the resolution of the orthoimagery, the output resolution of the segmentation maps varied between 3.8 cm and 16 cm. We tested this procedure with and without filtering the photographs prior to training by their estimated acquisition properties (cf. Section 2.2).
We validated the segmentation of the target species on a per-pixel basis using wall-to-wall reference data of the target species, which were obtained from visual interpretation of the orthoimagery. The accuracy of classifying the target species was quantified using the Precision, Recall, F1-score, and overall accuracy (OA), based on the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) classified reference pixels.
$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$OA = \frac{TP + TN}{TP + TN + FP + FN}$$
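For completeness, these metrics can be computed from binary prediction and reference masks of the target species as in the following minimal sketch (logical vectors or matrices assumed).

```r
# Minimal sketch: per-pixel accuracy metrics from binary prediction/reference masks.
species_accuracy <- function(pred, ref) {
  tp <- sum(pred & ref);  fp <- sum(pred & !ref)
  fn <- sum(!pred & ref); tn <- sum(!pred & !ref)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  oa        <- (tp + tn) / (tp + tn + fp + fn)
  c(Precision = precision, Recall = recall, F1 = f1, OA = oa)
}
```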
3. Results

3.1. Filtering photographs by acquisition settings

In view of the heterogeneity of the crowd-sourced photographs, we tested including only photographs taken from distances or angles that correspond to settings similar to the UAV perspective. The accuracy of the CNN-based regression for predicting angles and distances, tested with independent test data, resulted in an R² of 0.7 for both variables (Fig. 4 and Appendix Fig. A10). Using the predicted angle and distance values for all photographs and different thresholds, we filtered the training data prior to training the CNN models for plant species classification and tested the effect on the final plant segmentation accuracy in the orthoimages. The best results were obtained by filtering the training photographs by distance (>0.5 m), which increased the model accuracy (F1-score) by 8.3 and 9.3% for F. japonica and P. afra, respectively, while filtering by angle had no positive effects on the models (see Appendix Table A2 to Table A4 for detailed comparisons). In the following, only results based on the distance-filtered training data are shown.
3.2. Plant species segmentation in UAV-based orthoimages

The final models for the species segmentation in UAV imagery were based on distance-filtered training data and were applied with a moving window of 128 × 128 pixels. For both case studies, the final models were evaluated with the wall-to-wall reference data created from the orthomosaics by visual interpretation. For the case study on F. japonica, the accuracy of segmenting the target species with the final CNN model amounted to a precision from 0.38 to 0.75 (mean 0.60), a recall from 0.87
to 0.94 (mean 0.91), and an F1-score from 0.53 to 0.82 (mean 0.71) (Table 1). For the case study on P. afra, the accuracy metrics were more heterogeneous and ranged from a precision of 0.05 to 0.89 (mean 0.66), a recall of 0.1 to 0.93, and an F1-score of 0.05 to 0.83 (mean 0.60). These accuracy metrics for P. afra are based on the raw orthoimages. The histogram matching (standardization of the brightness and color properties) for the relatively heterogeneous orthoimages did not result in increased accuracy. The low accuracy metrics for P. afra for some orthoimages (e.g., orthoimages 14 and 27) coincided with very low values for the average plant size and total cover of P. afra therein. Precision, recall, and F1-score correlated with at least an R² of 0.5 with the average plant size and total cover (Appendix Fig. A11), meaning that the model did not perform well in detecting very small P. afra plants (cf. Fig. 6b).
4. Discussion

4.1. Training data filtering

The CNN-based regression models for predicting the acquisition angle and distance of photographs resulted in accuracy values that we considered acceptable for the purpose of filtering with the anticipated thresholds (thresholds corresponding to imagery with a very different perspective than the UAV-based perspective). For the distance prediction, the accuracy clearly decreased with larger distances, which may be attributed to the lower density of data annotated for such distances (Fig. 4). This data imbalance is caused by the fact that photographs in iNaturalist are most often acquired from distances below 10 m. Here, filtering of photographs by acquisition distance was implemented to avoid including images acquired from very close distances, e.g., imagery showing individual leaves, fruits, or bark, as such plant features are not assumed to be visible in operational UAV orthoimages (at least not at the scales considered here). For these rather close distance ranges, we found that our model produces very accurate predictions (cf. Fig. 4). Note that considerations about filtering photographs by acquisition distance should reflect the resolution of the available remote sensing data (here in the range of 0.3–1.5 cm pixel size).

The prediction of acquisition angles was relatively accurate over the range of possible angles (Appendix Fig. A10). However, for both case studies, filtering photographs by acquisition angle did not improve the species classification (for detailed comparisons, see Appendix Table A2 to Table A4). We assume that for the species considered here, the plant features that are key for the species identification, such as specific leaf forms, branching, and canopy patterns, are visible from most acquisition angles. Thus, reducing the observations using acquisition-angle-related thresholds will also reduce the total amount of training data used for learning such patterns, which in turn may explain the decreased performance of the corresponding trials.
Fig. 4. Results of the distance-based filtering, i.e. predicting the acquisition distance of each photograph. Left: Scatter plot of the predictions from the CNN-based regression against reference data obtained from visual interpretation (the blue smoothing line representing the fit was determined using a locally weighted least squares regression and the gray area shows its 95% confidence interval). Right: Photographs of P. afra clustered in rows according to the predicted distance.
Table 1
Accuracy metrics for the individual orthoimages of the two case studies. The results are based on models trained with distance-filtered training data (excluding images < 0.5 m; see Section 2.2). Results based on other filtering approaches (no filtering or angle-based filtering) are given in the Appendix in Tables A2, A3, and A4.
Case study Orthoimage Precision Recall F1 OA
1 0.38 0.87 0.53 0.97
F. japonica 2 0.66 0.94 0.78 0.99
3 0.75 0.91 0.82 0.99
Mean 0.6 0.91 0.71 0.98
1 0.88 0.85 0.87 0.91
2 0.71 0.7 0.7 0.93
3 0.78 0.89 0.83 0.89
4 0.7 0.81 0.75 0.95
5 0.71 0.9 0.79 0.94
P. afra 6 0.69 0.69 0.69 0.87
7 0.71 0.72 0.72 0.92
8 0.89 0.77 0.83 0.83
9 0.82 0.72 0.77 0.94
10 0.33 0.54 0.41 0.95
11 0.6 0.68 0.64 0.91
12 0.69 0.72 0.71 0.88
13 0.7 0.31 0.43 0.97
14 0.62 0.11 0.18 0.96
15 0.65 0.81 0.72 0.9
16 0.87 0.74 0.8 0.93
17 0.48 0.66 0.55 0.82
18 0.66 0.93 0.77 0.92
19 0.24 0.8 0.37 0.75
20 0.72 0.55 0.62 0.96
21 0.79 0.67 0.73 0.92
22 0.85 0.35 0.5 0.87
23 0.86 0.66 0.74 0.84
24 0.7 0.62 0.65 0.97
25 0.51 0.16 0.24 0.99
26 0.31 0.1 0.15 0.99
27 0.05 0.05 0.05 0.98
28 0.59 0.42 0.49 0.93
29 0.68 0.59 0.63 0.97
30 0.86 0.56 0.67 0.9
31 0.86 0.32 0.47 0.83
Mean 0.66 0.59 0.6 0.91
4.2. Training data selection, availability and balancing

The availability of photographs in the iNaturalist database differs greatly between plant species. For example, there were over 70,000 photographs available for the target species F. japonica, but only 864 photographs for P. afra. Despite the low sample numbers for P. afra, the segmentation accuracy was comparable between the two case studies (cf. Table 1). We also found large differences in data availability for the surrounding species. To avoid issues emerging from class imbalances during model training, we applied a stratified sampling over the species and classes. For underrepresented classes, sampling with replacement was applied, while the duplicates were modified with a subsequent data augmentation. This workaround may not be necessary if sufficient amounts of photographs are available for all classes, which is expected in the future for most classes due to the rapid growth of the iNaturalist database (Di Cecco et al., 2021). Note that other approaches may be used to compensate for class imbalances, such as using weights during training that reflect the sampling frequency of a class (Buda et al., 2018; Huang et al., 2016).

The presented segmentation approach was built on three classes, i.e. the target species, potentially surrounding species, and additional land cover types (barren soil, water). The surrounding species were selected
with knowledge from experts and local plot surveys. For other applications, such plot information may be requested from local institutions (nature conservation agencies, forest inventories) or accessed from plot databases such as sPlot (Bruelheide et al., 2019) or its open counterpart sPlotOpen (Sabatini et al., 2021). Alternatively, random samples of the iNaturalist database located in a study area may be used.

Fig. 5. Excerpts of the segmentation results for the target species F. japonica. The segmentation results for all orthoimages are given in the Appendix in Fig. A9. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The last row (e) shows a close-up.
4.3. CNN model application to drone-based aerial images

Based on our results, higher classification accuracy was obtained in the case study of F. japonica compared to the case study of P. afra. These differences may be attributed to multiple factors: i) the higher availability of training data; ii) the higher spatial resolution of the orthoimagery for the case study on F. japonica, which may have facilitated that plant features learned from the iNaturalist imagery are recognized in the UAV imagery; iii) F. japonica in general appears to have more contrasting plant features compared to its surrounding species (e.g., large leaves with a characteristic form) than is the case for P. afra; iv) for P. afra, the sites were more heterogeneous in terms of canopy cover, structure, and species composition. Additionally, the acquisition dates and, hence, illumination conditions varied greatly, while the orthoimages for F. japonica were relatively homogeneous, since these were acquired on a single day.
Given that we found large differences in brightness and colors in the orthoimagery of P. afra, we tested the models with orthoimagery that was pre-processed with histogram matching (Appendix Table A5). The segmentation results did not improve, suggesting that the models are indeed robust and transferable across heterogeneous illumination conditions. The latter may result from the fact that the iNaturalist data itself is very heterogeneous, which facilitates the transferability of the
resulting models. The transferability across illumination conditions is further confirmed by the predictions for the individual orthoimagery, where we could not observe systematic trends in segmentation accuracy between illuminated or shadowed parts of the canopies (cf. Figs. 5 and 6).

Fig. 6. Excerpts of the segmentation results for the target species of the case study P. afra. The segmentation results for all orthoimages are given in the Appendix in Fig. A1 to Fig. A8. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The last row (e) shows a close-up.
For F. japonica, we observed scattered false negatives within the segmented canopies (Fig. 5). Although the segmented areas appeared to reproduce the actual canopy extents adequately overall, the speckled false negatives negatively affected the accuracy metrics (particularly precision, Table 1). Such scattered false negatives could optionally be compensated with post-classification methods (e.g., clump or sieve operators, Buddenbaum et al., 2005), while here we refrained from further tuning the initial mapping output and only present the raw output of the approach.
Rather systematic false negatives for F. japonica were found for canopies that had been subject to herbivory (presumably by deer or sheep; cf. the south-east canopy segments in Fig. 5a) and thus deviated from the expected canopy texture (we did not observe any herbivory effects in the iNaturalist-derived training data). In very few cases, false positives of F. japonica were found for dried plants of the surrounding species. For P. afra, we found systematic misclassifications for very small plants (cf. Fig. 6 and Appendix Fig. A1 to Fig. A8), which was quantitatively confirmed by the high correlations of the estimated plant size and total cover of P. afra with the segmentation accuracy results (Appendix Fig. A11). This may be a consequence of the fact that the already limited image data for P. afra contain very few images of very small individuals.
The presented transfer learning approach can also partly be considered as weakly supervised, since it ultimately aims at a segmentation, while the initial CNN model is trained with sparse labels (a simple species annotation per image and not a mask). To apply the model trained with sparse labels for the segmentation of the orthoimagery, we used a moving window method. Selecting a suitable window size (here 128 × 128 pixels) is crucial, since it determines how the partitioned plant canopies are presented to the network, and should be carefully tested considering the orthoimage resolution and the size of the target plants (Mahdianpari et al., 2018; Fricker et al., 2019). The step size of the moving window was fixed at 10% of the moving window size (128 × 128), which resulted in segmentation masks with a spatial resolution between 3.8 cm and 16 cm. Here, this was found to be a suitable compromise between spatial detail and processing speed, while technically, such a segmentation could be produced at the original orthoimage resolution. Note that in comparison to common segmentation methods that build on fully convolutional networks, such as U-Net (Ronneberger et al., 2015) or DeepLab (Chen et al., 2017), the presented approach is computationally relatively expensive. However, such common segmentation approaches also require spatially explicit labels for model training (i.e., a species label for each individual image pixel), which are not available from citizen science data. Yet, the segmentation output of the presented moving window approach may be used to train common segmentation methods that are more scalable to large data sets.
Overall, our results demonstrate that robust predictions in UAV imagery can be obtained from CNN models trained with citizen science-based plant photographs. Note that the approach was tested in two case studies where the target species have quite different morphological characteristics compared to the surrounding species. For instance, P. afra has quite roundish leaves combined with a mostly star-shaped branching structure, while F. japonica differs from the surrounding vegetation through its large heart-shaped leaves (Figs. 1 and 2). However, the success of such an approach depends not only on contrasting morphological properties between the target and the surrounding species but also on whether such properties are visible in both the citizen science photographs and the orthoimagery. Therefore, we tested the approach with UAV imagery with a ground sampling distance in the range of 0.3 cm–1.5 cm. We expect that higher resolutions will generally increase the classification accuracy further. However, higher resolutions also come with reduced area extents per UAV acquisition and increased data storage and processing demands. In the future, such limitations may be of little concern given the expected further technological advances in robotics and computing resources (cf. Floreano and Wood, 2015; De Masi and Ferrante, 2020).
The phenology of plant species may also greatly affect the appearance of plants in the remote sensing data (Schiefer et al., 2021; Culbert et al., 2009). This was not tested specifically in this study, as imagery across multiple dates was only available for P. afra, which, however, shows little phenological variation. For the species and data used here, and in line with other studies, the citizen science observations cover a wide range of phenological phases for most species and should therefore be very suitable for generating temporally transferable applications (McDonough MacKenzie et al., 2020; Barve et al., 2020; Di Cecco et al., 2021).
5. Conclusion

CNN-based methods have greatly enhanced our capabilities to exploit remote sensing data for vegetation assessments, but their success and applicability are often challenged by the availability of sufficient amounts of training data. The presented transfer learning approach showed that crowd-sourced plant photographs can serve as training data for a CNN-based segmentation of plant species in high-resolution remote sensing imagery. Filtering the crowd-sourced plant photographs used for model training by their acquisition properties enhanced the segmentation results. The models showed a high transferability across sites with different vegetation structures and orthoimages with varying illumination conditions. The image resolution and quality of the remote sensing data, as well as the contrast of the target species to the surrounding species, appear to be critical factors for the success of this approach. Overall, this study demonstrates that freely available knowledge, in the form of plant photographs and species annotations in open databases, can effectively alleviate a common bottleneck for vegetation assessments. This study also provides an example of how we can effectively harness the ever-increasing availability of crowd-sourced and big data for remote sensing applications.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The study was funded by the German Research Foundation (DFG)
under the project BigPlantSens - Assessing the Synergies of Big Data and
Deep Learning for the Remote Sensing of Plant Species (Project number
444524904). We want to especially thank Guido Kraemer for his tech-
nical support on computational facilities. We also want to thank Alastair
Potts (Nelson Mandela University) for discussions on the study design
and manuscript. Open Access funding is enabled and organized by
Projekt DEAL. We acknowledge support from Leipzig University for
Open Access Publishing.
Appendix A
Fig. A1. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study (31 orthoimages in total).
Fig. A2. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A3. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A4. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A5. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A6. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A7. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A8. Overview of the segmentation results for the target species of the case study P. afra. From left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study.
Fig. A9. Overview of the segmentation results for the target species of the case study F. japonica. The box illustrates one orthoimage, which from top to bottom shows the UAV-based orthoimage, the manually delineated reference data, and the prediction map derived from the CNN models. For the two remaining orthoimages, from left to right: UAV-based orthoimage, manually delineated reference data, prediction maps derived from the CNN models trained with crowd-sourced imagery. The row number on the left indicates the orthoimage of the corresponding site in this case study (3 orthoimages in total).
Table A1
Total area of study sites and target species cover [m²].
Case study Orthoimage Total area Target species cover
1 8113.06 247.41
F. japonica 2 5545.73 363.09
3 7152.53 492.48
1 419.44 274.7
2 710.29 103.59
3 683.76 247.59
4 691.55 115.1
5 635.25 135.63
6 736.7 177.38
7 709.25 165.84
8 612.19 377.58
9 646.23 124.82
10 643.79 25.99
11 757.34 182.35
12 588.67 234.05
13 735.35 32.64
14 749.81 43.49
15 520.2 136.31
P. afra 16 619 142.19
17 563.51 159.96
18 867.63 238.41
19 672.52 84.92
20 684.93 69.64
21 762.4 168.38
22 770.03 169.91
23 672.32 253.74
24 722.45 57.13
25 724.07 13.55
26 813.96 5.48
27 792.89 9.18
28 616.34 53.78
29 837.06 57.23
30 773.88 207.01
31 759.08 213.78
Table A2
Accuracy metrics for the individual orthoimages of the two case studies with distance- and angle-based filtering of photographs (removing photographs with a distance <0.5 m and an angle <0°).
Case study Orthoimage Precision Recall F1 OA
1 0.15 0.74 0.25 0.92
F. japonica 2 0.56 0.81 0.67 0.98
3 0.51 0.79 0.62 0.98
Mean 0.41 0.78 0.51 0.96
1 0.91 0.67 0.77 0.86
2 0.69 0.67 0.68 0.93
3 0.8 0.81 0.81 0.88
4 0.72 0.75 0.73 0.95
5 0.7 0.9 0.79 0.94
P. afra 6 0.66 0.49 0.56 0.84
7 0.74 0.59 0.65 0.91
8 0.93 0.52 0.67 0.72
9 0.83 0.63 0.72 0.93
10 0.36 0.49 0.41 0.95
11 0.66 0.51 0.58 0.91
12 0.76 0.53 0.62 0.87
13 0.76 0.24 0.36 0.97
14 0.52 0.52 0.52 0.96
15 0.73 0.66 0.69 0.91
16 0.87 0.74 0.8 0.93
17 0.47 0.46 0.47 0.82
18 0.68 0.78 0.73 0.92
19 0.25 0.65 0.36 0.78
20 0.74 0.52 0.61 0.96
21 0.84 0.48 0.61 0.91
22 0.84 0.22 0.35 0.85
23 0.86 0.72 0.78 0.86
24 0.73 0.63 0.68 0.97
25 0.56 0.06 0.1 0.99
26 0.28 0.04 0.07 0.99
27 0.05 0.04 0.05 0.99
28 0.66 0.4 0.5 0.94
29 0.73 0.65 0.68 0.97
30 0.87 0.49 0.63 0.89
31 0.88 0.24 0.38 0.81
Mean 0.68 0.52 0.56 0.91
Table A3
Accuracy metrics for the individual orthoimages of the two case studies with angle-based filtering of photographs (removing photographs with an angle <0°).
Case study Orthoimage Precision Recall F1 OA
1 0.28 0.87 0.42 0.96
F. japonica 2 0.59 0.91 0.72 0.98
3 0.68 0.87 0.76 0.99
Mean 0.52 0.88 0.63 0.98
1 0.93 0.49 0.64 0.8
2 0.78 0.34 0.47 0.91
3 0.83 0.42 0.55 0.8
4 0.78 0.4 0.53 0.93
5 0.75 0.36 0.48 0.91
P. afra 6 0.68 0.17 0.27 0.81
7 0.81 0.39 0.52 0.9
8 0.94 0.43 0.59 0.69
9 0.85 0.4 0.54 0.91
10 0.42 0.36 0.39 0.96
11 0.63 0.23 0.34 0.89
12 0.77 0.31 0.44 0.84
13 0.71 0.27 0.39 0.97
14 0.52 0.37 0.43 0.96
15 0.77 0.53 0.62 0.9
16 0.9 0.5 0.64 0.89
17 0.51 0.18 0.26 0.84
18 0.74 0.63 0.68 0.92
19 0.3 0.37 0.33 0.86
20 0.75 0.49 0.59 0.96
21 0.83 0.13 0.23 0.86
22 0.9 0.17 0.28 0.85
23 0.87 0.45 0.6 0.78
24 0.73 0.3 0.43 0.96
25 0.48 0.05 0.09 0.99
26 0.5 0.05 0.09 0.99
27 0.08 0.03 0.05 0.99
28 0.7 0.23 0.35 0.93
29 0.65 0.35 0.45 0.96
30 0.91 0.21 0.34 0.84
31 0.84 0.25 0.38 0.81
Mean 0.71 0.32 0.42 0.89
Table A4
Accuracy metrics for the individual orthoimages of the two case studies with no photograph filtering applied.
Case study Orthoimage Precision Recall F1 OA
1 0.37 0.15 0.21 0.98
F. japonica 2 0.82 0.21 0.33 0.98
3 0.9 0.25 0.39 0.98
Mean 0.7 0.2 0.31 0.98
1 0.89 0.61 0.73 0.84
2 0.58 0.47 0.52 0.9
3 0.74 0.8 0.76 0.85
4 0.62 0.43 0.51 0.92
5 0.52 0.33 0.4 0.88
P. afra 6 0.38 0.15 0.22 0.77
7 0.77 0.54 0.63 0.91
8 0.87 0.36 0.51 0.63
9 0.74 0.64 0.69 0.92
10 0.3 0.39 0.34 0.95
11 0.58 0.26 0.36 0.89
12 0.68 0.32 0.43 0.83
13 0.56 0.32 0.4 0.96
14 0.41 0.62 0.49 0.95
15 0.57 0.66 0.61 0.87
16 0.76 0.82 0.79 0.91
17 0.33 0.12 0.17 0.82
18 0.68 0.51 0.58 0.9
19 0.19 0.31 0.23 0.81
20 0.63 0.66 0.65 0.95
21 0.43 0.06 0.11 0.84
22 0.74 0.56 0.64 0.88
23 0.8 0.57 0.66 0.79
24 0.4 0.26 0.31 0.94
25 0.06 0.09 0.07 0.97
26 0.15 0.05 0.08 0.99
27 0.11 0.12 0.11 0.98
28 0.51 0.31 0.38 0.92
29 0.47 0.6 0.53 0.95
30 0.81 0.52 0.63 0.88
31 0.83 0.42 0.56 0.84
Mean 0.55 0.42 0.45 0.89
Table A5
Accuracy metrics for the individual histogram-matched orthoimages for P. afra. For this, we used the model based on filtering the training data with distance >0.5 m. The histogram matching was applied using the histMatch function of the R package RStoolbox (Leutner et al., 2017). The histogram matching parameters were calculated from P. afra canopies as determined by the polygons from the visual interpretation. We used two orthoimages with presumably favorable illumination conditions as references (orthoimages 1 and 16), while the histogram matching procedure was applied to all other orthoimages.
Case study Orthoimage Precision Recall F1 OA
1 0.89 0.79 0.83 0.89
2 0.73 0.5 0.59 0.92
3 0.78 0.76 0.77 0.86
4 0.74 0.59 0.66 0.94
5 0.74 0.61 0.67 0.93
P. afra 6 0.75 0.44 0.55 0.85
7 0.66 0.65 0.65 0.9
8 0.87 0.7 0.78 0.78
9 0.83 0.62 0.71 0.93
10 0.32 0.38 0.35 0.95
11 0.45 0.69 0.55 0.86
12 0.66 0.66 0.66 0.86
13 0.28 0.57 0.37 0.93
14 0.71 0.29 0.41 0.97
15 0.55 0.82 0.66 0.87
16 0.87 0.74 0.8 0.93
17 0.42 0.63 0.5 0.8
18 0.71 0.74 0.73 0.92
19 0.2 0.83 0.32 0.66
20 0.51 0.8 0.62 0.94
21 0.77 0.41 0.53 0.89
22 0.61 0.52 0.56 0.85
23 0.79 0.81 0.8 0.85
24 0.56 0.6 0.58 0.95
25 0.57 0.13 0.21 0.99
26 0.04 0.04 0.04 0.99
27 0.13 0.5 0.2 0.97
28 0.56 0.42 0.48 0.93
29 0.63 0.65 0.64 0.96
30 0.78 0.61 0.69 0.89
31 0.61 0.81 0.7 0.83
Mean 0.6 0.59 0.57 0.9
Fig. A10. Model accuracy assessment of the regression model for predicting acquisition angles from photographs (the blue smoothing line representing the fit was determined using a locally weighted least squares regression and the gray area shows its 95% confidence interval).
Fig. A11. Comparison of plant size (top) and total plant cover (bottom) of P. afra with the map accuracy (Precision, Recall, F1-score). The total plant cover was calculated from the reference data (total cover of P. afra per plot). The average plant size was approximated by using the mean size of individual P. afra polygons in the respective plot.
References
Baeta, R., Nogueira, K., Menotti, D., dos Santos, J.A., 2017. Learning deep features on
multiple scales for coffee crop recognition. In: 2017 30th SIBGRAPI Conference on
Graphics, Patterns and Images. SIBGRAPI), pp. 262–268. https://doi.org/10.1109/
SIBGRAPI.2017.41.
Barve, V.V., Brenskelle, L., Li, D., Stucky, B.J., Barve, N.V., Hantak, M.M., McLean, B.S.,
Paluh, D.J., Oswald, J.A., Belitz, M.W., et al., 2020. Methods for broad-scale plant
phenology assessments using citizen scientists’ photographs. Appl. Plant Sci. 8 (1),
e11315.
Bayraktar, E., Basarkan, M.E., Celebi, N., 2020. A low-cost UAV framework towards ornamental plant detection and counting in the wild. ISPRS J. Photogrammetry Remote Sens. 167, 1–11.
Boone, M.E., Basille, M., 2019. Using iNaturalist to contribute your nature observations to science. Environ. Data Inf. Serv. 2019 (4), 5–5.
Bråkenhielm, S., Qinghong, L., 1995. Comparison of field methods in vegetation monitoring. In: Biogeochemical Monitoring in Small Catchments. Springer, pp. 75–87.
Brandt, M., Tucker, C.J., Kariryaa, A., Rasmussen, K., Abel, C., Small, J., Chave, J., Rasmussen, L.V., Hiernaux, P., Diouf, A.A., Kergoat, L., 2020. An unexpectedly large count of trees in the West African Sahara and Sahel. Nature 587 (7832), 78–82.
Brodrick, P.G., Davies, A.B., Asner, G.P., 2019. Uncovering ecological patterns with
convolutional neural networks. Trends Ecol. Evol. 34 (8), 734–745.
Bruelheide, H., Dengler, J., Jiménez-Alfaro, B., Purschke, O., Hennekens, S.M., Chytrý, M., Pillar, V.D., Jansen, F., Kattge, J., Sandel, B., et al., 2019. sPlot – a new tool for global vegetation analyses. J. Veg. Sci. 30 (2), 161–186.
Buda, M., Maki, A., Mazurowski, M.A., 2018. A systematic study of the class imbalance
problem in convolutional neural networks. Neural Network. 106, 249–259.
Buddenbaum, H., Schlerf, M., Hill, J., 2005. Classification of coniferous tree species and
age classes using hyperspectral data and geostatistical methods. Int. J. Rem. Sens. 26
(24), 5453–5465.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2017. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), 834–848.
Culbert, P.D., Pidgeon, A.M., Louis, V.S., Bash, D., Radeloff, V.C., 2009. The impact of
phenological variation on texture measures of remotely sensed imagery. IEEE J. Sel.
Top. Appl. Earth Obs. Rem. Sens. 2 (4), 299–309.
Curnick, D.J., Davies, A.J., Duncan, C., Freeman, R., Jacoby, D.M., Shelley, H.T.,
Rossi, C., Wearn, O.R., Williamson, M.J., Pettorelli, N., 2021. SmallSats: a new
technological frontier in ecology and conservation? Remote Sens. Ecol. Conser. 8 (2),
139–150.
De Masi, G., Ferrante, E., 2020. Quality-dependent adaptation in a swarm of drones for
environmental monitoring. In: Advances in Science and Engineering Technology
International Conferences (ASET). IEEE, pp. 1–6, 2020.
Di Cecco, G.J., Barve, V., Belitz, M.W., Stucky, B.J., Guralnick, R.P., Hurlbert, A.H.,
2021. Observing the observers: how participants contribute data to iNaturalist and
implications for biodiversity science. Bioscience 71 (11), 1179–1188.
Duker, R., Cowling, R.M., van der Vyver, M.L., Potts, A.J., 2020. Site selection for
subtropical thicket restoration: mapping cold-air pooling in the South African sub-
escarpment lowlands. PeerJ 8, e8980.
Fassnacht, F.E., Latifi, H., Stereńczak, K., Modzelewska, A., Lefsky, M., Waser, L.T., Straub, C., Ghosh, A., 2016. Review of studies on tree species classification from
remotely sensed data. Remote Sens. Environ. 186, 64–87.
Ferreira, M.P., de Almeida, D.R.A., de Almeida Papa, D., Minervino, J.B.S., Veras, H.F.P.,
Formighieri, A., Santos, C.A.N., Ferreira, M.A.D., Figueiredo, E.O., Ferreira, E.J.L.,
2020. Individual tree detection and species classification of Amazonian palms using UAV images and deep learning. For. Ecol. Manag. 475, 118397.
Flood, N., Watson, F., Collett, L., 2019. Using a U-net convolutional neural network to map woody vegetation extent from high resolution satellite imagery across Queensland, Australia. Int. J. Appl. Earth Obs. Geoinf. 82, 101897.
Floreano, D., Wood, R.J., 2015. Science, technology and the future of small autonomous
drones. Nature 521 (7553), 460–466.
Fricker, G.A., Ventura, J.D., Wolf, J.A., North, M.P., Davis, F.W., Franklin, J., 2019.
A convolutional neural network classifier identifies tree species in mixed-conifer
forest from hyperspectral imagery. Rem. Sens. 11 (19), 2326.
Fromm, M., Schubert, M., Castilla, G., Linke, J., McDermid, G., 2019. Automated
detection of conifer seedlings in drone imagery using convolutional neural networks.
Rem. Sens. 11 (21), 2585.
Hoeser, T., Kuenzer, C., 2020. Object detection and image segmentation with deep
learning on earth observation data: a review-part i: evolution and recent trends.
Rem. Sens. 12 (10), 1667.
Hoeser, T., Bachofer, F., Kuenzer, C., 2020. Object detection and image segmentation
with deep learning on earth observation data: a review—part ii: Applications. Rem.
Sens. 12 (18), 3053.
Huang, C., Li, Y., Loy, C.C., Tang, X., 2016. Learning deep representation for imbalanced
classication. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 5375–5384.
Kattenborn, T., Eichel, J., Fassnacht, F.E., 2019. Convolutional neural networks enable
efficient, accurate and fine-grained segmentation of plant species and communities from high-resolution UAV imagery. Sci. Rep. 9 (1), 1–9.
Kattenborn, T., Eichel, J., Wiser, S., Burrows, L., Fassnacht, F.E., Schmidtlein, S., 2020.
Convolutional Neural Networks accurately predict cover fractions of plant species
and communities in Unmanned Aerial Vehicle imagery. Remote Sens. Ecol. Conser. 6
(4), 472–486.
Kattenborn, T., Leitloff, J., Schiefer, F., Hinz, S., 2021. Review on convolutional neural
networks (cnn) in vegetation remote sensing. ISPRS J. Photogrammetry Remote
Sens. 173, 24–49.
Leutner, B., Horning, N., Leutner, M.B., 2017. Package 'RStoolbox'. R Foundation for
Statistical Computing. Version 0.1.
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der
Laak, J.A., Van Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in
medical image analysis. Med. Image Anal. 42, 60–88.
Lopatin, J., Fassnacht, F.E., Kattenborn, T., Schmidtlein, S., 2017. Mapping plant species
in mixed grassland communities using close range imaging spectroscopy. Remote
Sens. Environ. 201, 12–23.
Maes, W.H., Steppe, K., 2019. Perspectives for remote sensing with unmanned aerial
vehicles in precision agriculture. Trends Plant Sci. 24 (2), 152–164.
Mahdianpari, M., Salehi, B., Rezaee, M., Mohammadimanesh, F., Zhang, Y., 2018. Very
deep convolutional neural networks for complex land cover mapping using
multispectral remote sensing imagery. Rem. Sens. 10 (7), 1119.
McDonough MacKenzie, C., Gallinat, A.S., Zipf, L., 2020. Low-cost observations and
experiments return a high value in plant phenology research. Appl. plant sci. 8 (4),
e11338.
Mills, A.J., Vyver, M.V.d., Gordon, I.J., Patwardhan, A., Marais, C., Blignaut, J.,
Sigwela, A., Kgope, B., 2015. Prescribing innovation within a large-scale restoration
programme in degraded subtropical thicket in South Africa. Forests 6 (11),
4328–4348.
Nagendra, H., 2001. Using remote sensing to assess biodiversity. Int. J. Rem. Sens. 22
(12), 2377–2400.
Nezami, S., Khoramshahi, E., Nevalainen, O., Pölönen, I., Honkavaara, E., 2020. Tree species classification of drone hyperspectral and RGB imagery with deep learning
convolutional neural networks. Rem. Sens. 12 (7), 1070.
Nogueira, K., Penatti, O.A., Dos Santos, J.A., 2017. Towards better exploiting
convolutional neural networks for remote sensing scene classication. Pattern
Recogn. 61, 539–556.
Qian, W., Huang, Y., Liu, Q., Fan, W., Sun, Z., Dong, H., Wan, F., Qiao, X., 2020. UAV and
a deep convolutional neural network for monitoring invasive alien plants in the wild.
Comput. Electron. Agric. 174, 105519.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 234–241.
Rzanny, M., Mäder, P., Deggelmann, A., Chen, M., Wäldchen, J., 2019. Flowers, leaves or both? How to obtain suitable images for automated plant identification. Plant
Methods 15 (1), 1–11.
Sabatini, F.M., Lenoir, J., Hattab, T., Arnst, E.A., Chytrý, M., Dengler, J., De Ruffray, P.,
Hennekens, S.M., Jandt, U., Jansen, F., et al., 2021. sPlotOpen – an environmentally
balanced, open-access, global dataset of vegetation plots. Global Ecol. Biogeogr. 30
(9), 1740–1764.
Schiefer, F., Kattenborn, T., Frick, A., Frey, J., Schall, P., Koch, B., Schmidtlein, S., 2020.
Mapping forest tree species in high resolution UAV-based RGB imagery by means of
convolutional neural networks. ISPRS J. Photogrammetry Remote Sens. 170,
205–215.
Schiefer, F., Schmidtlein, S., Kattenborn, T., 2021. The retrieval of plant functional traits
from canopy spectra through rtm-inversions and statistical models are both critically
affected by plant phenology. Ecol. Indicat. 121, 107062.
Schiller, C., Schmidtlein, S., Boonman, C., Moreno-Martínez, A., Kattenborn, T., 2021.
Deep learning and citizen science enable automated plant trait predictions from
photographs. Sci. Rep. 11 (1), 1–12.
Schnitzler, A., Muller, S., et al., 1998. Ecology and biogeography of highly invasive plants in Europe: giant knotweeds from Japan (Fallopia japonica and F. sachalinensis). Revue d'Ecologie (la Terre et la Vie) 53 (1), 3–38.
Shaw, R., Tanner, R., Djeddour, D., Cortat, G., 2011. Classical biological control of
Fallopia japonica in the United Kingdom – lessons for Europe. Weed Res. 51 (6),
552–558.
Shin, H.-C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D.,
Summers, R.M., 2016. Deep convolutional neural networks for computer-aided
detection: CNN architectures, dataset characteristics and transfer learning. IEEE
Trans. Med. Imag. 35 (5), 1285–1298.
Tuia, D., Kellenberger, B., Beery, S., Costelloe, B.R., Zuffi, S., Risse, B., Mathis, A., Mathis, M.W., van Langevelde, F., Burghardt, T., et al., 2021. Seeing biodiversity: perspectives in machine learning for wildlife conservation. arXiv preprint arXiv:2110.12951.
van der Vyver, M.L., Cowling, R.M., Mills, A.J., Difford, M., 2013. Spontaneous return of
biodiversity in restored subtropical thicket: Portulacaria afra as an ecosystem
engineer. Restor. Ecol. 21 (6), 736–744.
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H.,
Perona, P., Belongie, S., 2018. The iNaturalist species classification and detection
dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 8769–8778.
Van Horn, G., Cole, E., Beery, S., Wilber, K., Belongie, S., Mac Aodha, O., 2021.
Benchmarking representation learning for natural world image collections. In:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 12884–12893.
Vanha-Majamaa, I., Salemaa, M., Tuominen, S., Mikkola, K., 2000. Digitized photographs in vegetation analysis – a comparison of cover estimates. Appl. Veg. Sci. 3 (1), 89–94.
Wagner, F.H., 2021. The flowering of Atlantic Forest Pleroma trees. Sci. Rep. 11 (1), 1–20.
Wagner, F.H., Sanchez, A., Tarabalka, Y., Lotte, R.G., Ferreira, M.P., Aidar, M.P.,
Gloor, E., Phillips, O.L., Aragao, L.E., 2019. Using the U-net convolutional network to map forest types and disturbance in the Atlantic rainforest with very high resolution
images. Remote Sens. Ecol. Conser. 5 (4), 360–375.
Weinstein, B.G., Marconi, S., Bohlman, S.A., Zare, A., White, E.P., 2020. Cross-site
learning in deep learning RGB tree crown detection. Ecol. Inf. 56, 101061.
Wittmann, J., Girman, D., Crocker, D., 2019. Using iNaturalist in a coverboard protocol
to measure data quality: suggestions for project design. Citiz. Sci. Theory Pract. 4 (1).