Deep learning and computer vision will transform entomology 1
2
Toke T. Høye1,*, Johanna Ärje1,2, Kim Bjerge3, Oskar L. P. Hansen1,4,5,6, Alexandros Iosifidis7, Florian Leese8, Hjalte M. R. Mann1, Kristian Meissner9, Claus Melvad10, Jenni Raitoharju9

1. Department of Bioscience and Arctic Research Centre, Aarhus University, Grenåvej 14, DK-8410 Rønde, Denmark
2. Unit of Computing Sciences, Tampere University, Finland
3. School of Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
4. Natural History Museum Aarhus, Wilhelm Meyers Allé 10, DK-8000 Aarhus C
5. Department of Biology – Center for Biodiversity Dynamics in a Changing World (BIOCHANGE), Aarhus University, Ny Munkegade 116, DK-8000 Aarhus C
6. Department of Biology – Ecoinformatics and Biodiversity, Aarhus University, Ny Munkegade 116, DK-8000 Aarhus C
7. Department of Engineering, Aarhus University, Denmark
8. Aquatic Ecosystem Research, University of Duisburg-Essen, 45141 Essen, Germany
9. Programme for Environmental Information, Finnish Environment Institute, Jyväskylä, Finland
10. School of Engineering, Aarhus University, Inge Lehmannsgade 10, 8000 Aarhus C, Denmark

*corresponding author, tth@bios.au.dk, phone: +4587158892

ORCID: TTH 0000-0001-5387-3284, JÄ 0000-0003-0710-9044, KB 0000-0001-6742-9504, OLPH 0000-0002-1598-5733, AI 0000-0003-4807-1345, FL 0000-0002-5465-913X, HMRM 0000-0002-4768-4767, KM 0000-0001-6316-8554, CM 0000-0002-5720-6523, JR 0000-0003-4631-9298
Classification: Biological sciences (major), Physical sciences (minor) 25
26
Key words: Automated monitoring, Ecology, Insects, Image-based identification, Machine learning 27
28
Significance statement: Insect populations are challenging to study, but computer vision and deep 29
learning provide opportunities for continuous and non-invasive monitoring of biodiversity around 30
the clock and over entire seasons. These tools can also facilitate the processing of samples in a 31
laboratory setting. Automated imaging in particular can provide an effective way of identifying and 32
counting specimens to measure abundance. We present examples of sensors and devices of 33
relevance to entomology and show how deep learning tools can convert the big data streams into 34
ecological information. We discuss the challenges that lie ahead and identify four focal areas to 35
make deep learning and computer vision game changers for entomology. 36
ABSTRACT
Most animal species on Earth are insects, and recent reports suggest that their abundance is in drastic decline. Although these reports come from a wide range of insect taxa and regions, the evidence to assess the extent of the phenomenon is still sparse. Insect populations are challenging to study, and most monitoring methods are labour intensive and inefficient. Advances in computer vision and deep learning provide potential new solutions to this global challenge. Cameras and other sensors can effectively, continuously, and non-invasively perform entomological observations throughout diurnal and seasonal cycles. The physical appearance of specimens can also be captured by automated imaging in the lab. When trained on these data, deep learning models can provide estimates of insect abundance, biomass, and diversity. Further, deep learning models can quantify variation in phenotypic traits, behaviour, and interactions. Here, we connect recent developments in deep learning and computer vision to the urgent demand for more cost-efficient monitoring of insects and other invertebrates. We present examples of sensor-based monitoring of insects. We show how deep learning tools can be applied to the big data outputs to derive ecological information and discuss the challenges that lie ahead for the implementation of such solutions in entomology. We identify four focal areas, which will facilitate this transformation: 1) validation of image-based taxonomic identification, 2) generation of sufficient training data, 3) development of public, curated reference databases, and 4) solutions to integrate deep learning and molecular tools.
INTRODUCTION 55
We are experiencing a mass extinction of species (1), but data on changes in species diversity and 56
abundance have substantial taxonomic, spatial, and temporal biases and gaps (2, 3). The lack of data 57
holds especially true for insects despite the fact that they represent the vast majority of animal 58
species. A major reason for these shortfalls for insects and other invertebrates is that available 59
methods to study and monitor species and their population trends are antiquated and inefficient (4). 60
Nevertheless, some recent studies have demonstrated alarming rates of insect diversity and 61
abundance loss (5-7). To further explore the extent and causes of these changes, we need efficient, 62
rigorous, and reliable methods to study and monitor insects (4, 8). 63
Data to derive insect population trends are already generated as part of ongoing 64
biomonitoring programs. However, legislative terrestrial biomonitoring, e.g. in the context of the 65
EU Habitats Directive, focuses on a very small subset of individual insect species such as rare
butterflies and beetles because the majority of insect taxa are too difficult or too costly to monitor 67
(9). In current legislative aquatic monitoring, benthic invertebrates are commonly used in 68
assessments of ecological status (e.g. the US Clean Water Act, the EU Water Framework Directive, 69
and the EU Marine Strategy Framework Directive). Still, the spatiotemporal and taxonomic extent 70
and resolution in ongoing biomonitoring programs is coarse and does not provide information on 71
the status of the vast majority of insect populations. 72
Molecular techniques such as DNA barcoding and metabarcoding will likely become 73
valuable tools for future insect monitoring based on field collected samples (10, 11), but at the 74
moment high-throughput methods cannot provide reliable abundance estimates (12, 13), leaving a critical need for other methodological approaches. The state of the art in deep learning, computer vision, and image processing has matured to the point where it can aid or even
replace manual observation in situ (14) as well as in routine laboratory sample processing tasks 78
(15). Image-based observational methods for monitoring of vertebrates using camera traps have 79
undergone rapid development in the past decade (14, 16-18). Similar approaches using cameras and 80
other sensors for investigating diversity and abundance of insects are underway (19, 20). However, 81
despite huge attention in other domains, deep learning is only very slowly beginning to be applied 82
in invertebrate monitoring and biodiversity research (21-25). 83
Deep learning models learn features of a dataset by iteratively training on example 84
data without the need for manual feature extraction (26). In this way, deep learning is qualitatively 85
different from traditional statistical approaches to prediction (27). Deep learning models specifically designed for dealing with images, so-called convolutional neural networks (CNNs), can extract features of various aspects of a set of images or the objects within them, and learn to differentiate
among them. There is great potential in automatic detection and classification of insects in video or 89
time-lapse images with trained CNNs (20). As the methods become more refined, they will bring 90
exciting new opportunities for understanding insect ecology and for monitoring (19, 28-31). 91
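To make this concrete, the sketch below shows how such a convolutional neural network could be fine-tuned for insect images with one of the open-source frameworks mentioned later in the text (PyTorch/torchvision). The folder layout, hyperparameters, and network choice are illustrative assumptions, not a prescription from the studies cited above.

```python
# Minimal sketch: fine-tuning a pretrained CNN (ResNet-50) to classify insect images.
# Assumes a hypothetical folder "insect_crops/" with one sub-folder per taxon, as
# expected by torchvision's ImageFolder; paths and hyperparameters are illustrative.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("insect_crops/", transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

model = models.resnet50(pretrained=True)           # ImageNet features; newer torchvision uses weights=...
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))  # new head for the taxa at hand

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):                            # illustrative number of epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```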
Here, we argue that deep learning and computer vision can be used to develop novel, 92
high throughput systems for detection, enumeration, classification, and discovery of species as well 93
as for deriving functional traits such as biomass for biomonitoring purposes. These approaches can 94
help solve long-standing challenges in ecology and biodiversity research and the pressing issues in
insect population monitoring (32, 33). This article has three goals. First, we present sensor-based 96
solutions for observation of invertebrates in situ and for specimen-based research in the laboratory, 97
which, due to the volume of data generated, use or could benefit from deep learning models to process data. Second, we show how deep learning models can be applied to the resulting data streams to
derive ecologically relevant information. Last, we outline and discuss four main challenges that lie 100
ahead in the implementation of such solutions for invertebrate monitoring, ecology, and biodiversity 101
research. 102
SENSOR-BASED INSECT MONITORING 104
Sensors are widely used in ecology for gathering peripheral data such as temperature, precipitation, 105
light intensity, etc., but have not yet been used much for gathering data on the insects themselves. However,
solutions for sensor-based monitoring of insects and other invertebrates in their natural environment 107
are emerging (34). Innovation and development are primarily driven by agricultural research to
predict occurrence and abundance of beneficial and pest insect species of economic importance (35-109
37), to provide more efficient screening of natural products for invasive insect species (38), or to 110
monitor disease vectors such as mosquitos (39, 40). The most commonly used sensors are cameras, 111
radar, and microphones. Such sensor-based monitoring is likely to generate big data, which require 112
efficient solutions for extracting relevant biological information. Deep learning could be a critical 113
tool in this respect. Below, we give examples of image-based approaches to insect monitoring, 114
which we argue have the greatest potential for integration with deep learning. We also describe
approaches using other types of sensors, where the integration with deep learning is less well 116
developed, but still could be relevant for detecting and classifying entomological information. We 117
further describe the ongoing efforts in the digitization of natural history collections, which could 118
generate valuable reference data for training and validating deep learning models. 119
120
Image-based solutions for in situ monitoring 121
Some case studies have already used cameras and deep learning methods for detecting single 122
species, such as the olive fruit fly Bactrocera oleae (41), or for more generic pest
detection (42). The pest detection is based on images of insects that have been trapped with either a 124
McPhail-type trap or a trap with pheromone lure and adhesive liner. The images are collected by a 125
microcomputer and transmitted to a remote server where they are analysed. Other solutions have 126
embedded a digital camera and a microprocessor that can count trapped individuals in real-time 127
using object-detection based on an optimized deep learning model (37). In both these cases, deep 128
learning networks are trained to recognize and count individuals of single pest species. However,
there are very few examples of invertebrate biodiversity-related field studies applying deep learning 130
models (23). Early attempts used feature vectors extracted from single perspective images and 131
yielded modest accuracy for 35 classes of moths (43) or used mostly coarse taxonomic resolution 132
(44). We have recently demonstrated that our custom-built time-lapse cameras can record image 133
data from which a deep learning model could accurately estimate local spatial, diurnal, and seasonal 134
dynamics of honey bees and other flower visiting insects (45; Figure 1). Time-lapse cameras are 135
less likely to create observer bias than direct observation, and data collection can extend across full diurnal and even seasonal time scales. Cameras can be baited just like traditional light and pheromone
traps or placed over ephemeral natural resources such as flowers, fruits, dung, fungi or carrion. 138
Bjerge, et al. (46) propose to use an automated light trap to monitor the abundance of moths and 139
other insects attracted to light. The solution is powered by a solar panel, which allows the system to 140
be installed in remote locations (Figure 2). Ultimately, true ‘Internet of Things’ enabled hardware 141
will make it possible to implement classification algorithms directly on the camera units to provide 142
fully autonomous systems in the field to monitor insects and report detection and classification data 143
back to the user or to online portals in real time (34). 144
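As an illustration of how the image streams from such camera traps might be processed, the sketch below runs a generic pretrained object detector over a folder of time-lapse frames and counts confident detections per frame. A working system would require a detector fine-tuned on annotated insect images; the folder name and confidence threshold are assumptions for the example.

```python
# Minimal sketch: counting candidate detections in a folder of time-lapse frames with a
# generic pretrained detector. A real system would fine-tune the detector on annotated
# insect images; the folder name and score threshold are illustrative assumptions.
from pathlib import Path
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()
to_tensor = transforms.ToTensor()

counts = {}
with torch.no_grad():
    for frame in sorted(Path("timelapse_frames/").glob("*.jpg")):
        image = to_tensor(Image.open(frame).convert("RGB"))
        detections = model([image])[0]                      # boxes, labels, scores per frame
        counts[frame.name] = int((detections["scores"] > 0.8).sum())

for name, n in counts.items():
    print(name, n)                                          # detections per time-lapse frame
```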
145
Radar, acoustic, and other solutions for in situ monitoring 146
The use of radar technology in entomology has allowed for the study of insects at scales not 147
possible with traditional methods, specifically related to both migratory and non-migratory insects 148
flying at high altitudes (47). Utilizing data from established weather radar networks can provide 149
information at the level of continents (48), while specialized radar technology such as vertical-150
looking radars (VLRs) can provide finer grained data albeit at a local scale (49). The VLRs can give 151
estimates of biomass and body shape of the detected object, and direction of flight, speed and body 152
orientation can be extracted from the return radar signal (50). However, VLR data provide little 153
information on community structure and conclusive species identification requires aerial trapping 154
(51, 52). Harmonic scanning radars can detect insects flying at low altitudes at a range of several 155
hundred meters, but insects need to be tagged with a radar transponder and must be within line-of-156
sight (53, 54). Collectively, the use of radar technology in entomology can provide valuable 157
information in insect monitoring, for example on the magnitude of biomass flux stemming from 158
insect migrations (55), but requires validation with conventional monitoring methods (e.g. 56). 159
Bioacoustics is a well-established scientific discipline and acoustic signals have been 160
extensively and widely used in the field of ecology, for example for detecting presence and studying 161
behavior of marine mammals (57) and for bird species identification (58). Jeliazkov, et al. (59) used 162
audio recordings to study population trends of Orthoptera at large spatial and temporal scales, 163
demonstrating that bioacoustic techniques have merit in entomological monitoring. Machine 164
learning methods have proven a particularly valuable tool for deciphering noisy audio recordings 165
and detecting the signals of animals. Kiskin, et al. (60) demonstrated the use of a CNN to detect the 166
presence of mosquitoes by identifying the acoustic signal of their wingbeats. Other studies have 167
shown that even species classification can be done using machine learning on audio data, for 168
example for birds (58), bats (61), grasshoppers (62), and bees (63). It has, however, been argued that the use of pseudo-acoustic optical sensors rather than actual acoustic sensors is a more promising technology because of the much improved signal-to-noise ratio in these systems, a point that may be particularly important for bioacoustics in entomology (64).
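The following sketch illustrates the general idea of spectrogram-based detection with a small convolutional network, in the spirit of the mosquito wingbeat study cited above; the audio file, sampling parameters, and untrained network are purely illustrative.

```python
# Minimal sketch: a tiny CNN that scores a short audio clip for insect wingbeat presence
# from its spectrogram. The recording ("clip.wav", assumed short and mono) and the
# network size are illustrative assumptions, not the architecture of any cited study.
import numpy as np
import torch
from torch import nn
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read("clip.wav")                       # hypothetical mono recording
_, _, spec = spectrogram(audio.astype(np.float32), fs=rate, nperseg=256)
x = torch.log1p(torch.from_numpy(spec).float()).unsqueeze(0).unsqueeze(0)  # (1, 1, freq, time)

detector = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1), nn.Sigmoid(),                          # probability of wingbeat presence
)
print(float(detector(x)))                                    # untrained; shown to illustrate the pipeline
```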
Other systems rely on sensor technology to automate the recording of insect activity 173
or even body mass, but without actual consideration of the subsequent processing of the data with 174
deep learning methods (65, 66). One system (65) uses a sensor ring of photodiodes and infrared LEDs to detect both large and small arthropods, including pollinators and pests, and achieves 95% detection accuracy for live microarthropods of three different species in the size range of 0.5–1.1 mm. The Edapholog (66) is a low-power monitoring system built around a pitfall trap for real-time detection of soil microarthropods. Its probe senses changes in infrared light intensity, similar to (65), counts the organisms falling into the trap, and estimates their body size. The probe transmits its data via radio to a logging device, which forwards them to a central server for real-time monitoring. Similarly, others have augmented traditional low-cost trapping methods with optoelectronic sensors and wireless communication to allow real-time monitoring and reporting (35). Since such sensors do not produce images that are intuitive to validate, it could be challenging to generate sufficient, validated training data for implementing deep learning models, although such models could still prove useful.
187
Digitizing specimens and natural history collections 188
There are strong efforts to digitize natural history collections for multiple reasons including the 189
benefits of deep learning applications (67). The need for and benefits of digitizing natural science 190
collections have motivated the foundation of the Distributed System of Scientific Collections 191
Research Infrastructure (DISSCo RI, www.dissco.eu). DISSCo RI strives for the digital unification 192
of all European natural science assets under common curation and access policies and practices. 193
Most existing databases include single view digitisations of pinned specimens (68), while datasets 194
of insect specimens recorded using multiple sensors, 3D models, and databases on living insect 195
specimens are only just emerging (69, 70). The latter could be particularly relevant for deep 196
learning models. There is also a valuable archive of entomological data in herbarium specimens in 197
the form of signs of herbivory (71). The standard digitization of herbarium collections has proven 198
suitable for extracting herbivory data using machine learning techniques (72). Automating digitization will accelerate the development of such valuable databases and enable tools for identification of non-pinned specimens and live insects in situ (67). The BIODISCOVER machine (73) is a proposal towards automating the creation of databases of liquid-preserved specimens, such as most field-collected insects. The process consists of four automated steps: 1) bin picking of individual insects directly from bulk samples, 2) recording the specimen from multiple angles using high-speed imaging, 3) saving the captured data in a way optimized for deep learning algorithm training and further study, and 4) sorting specimens according to size, taxonomic identity, or rarity for potential further molecular processing (Figure 3).
Digitization efforts should carefully consider how image data of specimens can be leveraged in 208
efforts to develop deep learning models for in situ monitoring. 209
210
POTENTIAL DEEP LEARNING APPLICATIONS IN ENTOMOLOGY 211
The big data collected by sensor-based insect monitoring as described above requires efficient 212
solutions for transforming the data into biologically relevant information. Preliminary results 213
suggest that deep learning offers a valuable tool in this respect and could further inspire the 214
collection of new types of data (20, 45). Deep learning software, e.g. for ecological applications, is 215
mostly constructed using open source Python libraries and frameworks such as TensorFlow, Keras, 216
PyTorch, and Scikit-learn (24) and prototype implementations are typically publicly available e.g. 217
on www.github.com. This, in turn, makes the latest advances in other fields related to object 218
detection and fine-grained classification available also for entomological research. As such, the 219
existing deep learning toolbox is already available, but will need adaptation to entomology from the 220
domains for which the tools were developed. In the following, we provide a brief description of the 221
transformative potential of deep learning for entomological data stored in images, structured around four main applications.
224
Detecting and tracking individuals in situ 225
Image-based monitoring of insect abundance and diversity could rapidly become globally widespread as countries make efforts to better understand the severity of the global insect decline and to design mitigation measures. Similarly, tracking of individual insects in situ even for short periods of
time holds exciting research potential. For example, by estimating movement speed of individual 229
insects in their natural environments and relating it to observed microclimatic variation, more 230
realistic thermal performance curves can be established and contrasted to traditional lab-derived 231
thermal performance. However, tracking insects in their natural environment is currently a highly
challenging task, due to e.g. the cluttered scenes and varying lighting conditions. In computer 233
vision, such tasks are termed ‘detection-based online multiple object tracking’, and work under a set 234
of assumptions (74). These assumptions include a precise initial detection (initialization) of the 235
objects to be tracked in a scene, a good ability to visually discriminate between the multiple tracked 236
objects, and smooth motion, velocity, and acceleration patterns of the tracked objects (75). The 237
small visual differences among individual insects and frequent hiding behaviour violate the above 238
assumptions. Moreover, current state-of-the-art deep learning models typically use millions of 239
learned parameters and can only run in near real-time with low-resolution video, which constrains 240
the visual discrimination of the targeted objects in the scene. Possible solutions to these challenges 241
include the use of non-linear motion models (76) and the development of compact (77) or 242
compressed (78) deep learning models. 243
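The sketch below illustrates the simplest form of detection-based association, greedy matching of bounding boxes between consecutive frames by intersection over union. It is a toy version of the tracking-by-detection principle described above; a practical insect tracker would add motion models and appearance cues to cope with occlusion and near-identical individuals.

```python
# Minimal sketch: greedy IoU association of detections between consecutive frames, the
# simplest form of detection-based online multiple object tracking. Box format (x1, y1,
# x2, y2) and the association threshold are illustrative assumptions.
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, threshold=0.3):
    """Extend each track with the best-overlapping unused detection; start new tracks otherwise."""
    used = set()
    for track in tracks:
        best, best_iou = None, threshold
        for i, det in enumerate(detections):
            if i not in used and iou(track[-1], det) > best_iou:
                best, best_iou = i, iou(track[-1], det)
        if best is not None:
            track.append(detections[best])
            used.add(best)
    tracks.extend([det] for i, det in enumerate(detections) if i not in used)
    return tracks

tracks = associate([], [(10, 10, 20, 20)])          # frame 1: one new track
tracks = associate(tracks, [(12, 11, 22, 21)])      # frame 2: matched to the same track
print(tracks)
```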
If we manage to solve the task of individual tracking of insects, it could open the door
for a new individual-based ecology with profound impacts in such research fields as population, 245
behavioural, and thermal ecology as well as conservation biology. Moreover, considering the recent 246
development in low-cost, powerful graphical processing units and dedicated artificial intelligence processors suitable for autonomous and embedded systems (e.g. NVIDIA Jetson Nano, Google Coral
Edge TPU, and the Intel AI USB stick), it may soon become feasible to detect, track, and decode 249
behaviour of insects in real-time and report information back to the user. 250
251
Detecting species interactions 252
Species interactions are critical for the functioning of ecosystems, yet because they are ephemeral and fast, the consequences of their disruption for ecological function are hard to quantify. High temporal
resolution image-based monitoring of consumers and resources can allow for a unique 255
quantification of species interactions (79). The use of cameras allows for continuous observations of 256
species and their interactions across entire growing seasons such as insects visiting flowers, 257
defoliation by herbivores, and predation events. There is an urgent need to develop methods to 258
observe and quantify species interactions efficiently and at ecologically relevant spatial and 259
temporal scales (80, 81). To detect such interactions, image recordings should be collected at the
scales where individuals interact, i.e., by observing interacting individuals at intervals of seconds to 261
minutes, yet they should ideally extend over seasonal and/or multi-annual periods, which at the 262
moment is difficult to fulfil. Our preliminary results have demonstrated an exciting potential to 263
record plant-insect interactions using time-lapse cameras and deep learning (28 and Figure 1). 264
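A simple way to turn such image streams into interaction data is sketched below: per-frame detections of an insect on a focal flower are merged into discrete visit events with durations. The frame interval and detector output are hypothetical.

```python
# Minimal sketch: turning per-frame detections into interaction events. Given a boolean
# series of "insect present on the focal flower" per time-lapse frame (hypothetical here),
# consecutive positive frames are merged into single visit events and their durations logged.
from itertools import groupby

frame_interval_s = 30                                  # assumed time-lapse interval in seconds
present = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1]            # hypothetical detector output per frame

visits = [sum(group) * frame_interval_s
          for key, group in groupby(present) if key == 1]
print(f"{len(visits)} visits, durations (s): {visits}")   # e.g. 3 visits of 90, 30, 60 s
```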
265
Taxonomic identification 266
Taxonomic identification can be approached as a deep learning classification problem. Deep 267
learning-based classification accuracies for image-based insect identification of specimens are 268
approaching the accuracy of human experts (82-84). Applications of gradient-weighted class 269
activation mapping can even visualize morphologically important features for CNN classification 270
(84). Classification accuracy is generally much lower when the insects are recorded live in their 271
natural environments (85, 86), but when class confidence is low at the species-level, it may still be 272
possible to confidently classify insects to a coarser taxonomic resolution (87). In recent years, 273
impressive results have been obtained by CNNs (88). They can classify huge image datasets, such 274
as the 1000-class ImageNet dataset at high accuracy and speed (89). With images of >10,000 275
species of plants, classification performance of CNNs is currently much lower than for botanical 276
experts (25), but promising results in distributed training of deep neural networks (90) and federated 277
learning (91, 92) suggest that improvements can be expected. 278
In most ecological communities, it is common for species to be rare. This often results 279
in highly imbalanced datasets, and the number of specimens representing the rarest species could be 280
insufficient for training neural networks (86, 87). As such, advancing the development of 281
algorithms and approaches for improved identification of rare classes is a key challenge for deep 282
learning-based taxonomic identification. Solutions to this challenge could be inspired by class 283
resampling and cost-sensitive training (93) or by multiset feature learning (94, 95). Class 284
resampling aims at balancing the classes by under-sampling the larger classes and/or over-sampling 285
the smaller classes, while cost-sensitive training assigns a higher loss for errors on the smaller 286
classes. In multiset feature learning, the larger classes are split into smaller subsets, which are 287
combined with the smaller classes to form separate training sets. These methods are all used to learn 288
features that can more robustly distinguish the smaller classes. Species identification performance 289
can vary widely, ranging from species which are correctly identified in most cases to species that 290
are generally difficult to identify (96). Typically, the amount of training data is a key element for 291
successful identification, although recent analyses of images of the approximately 65,000 292
specimens in the carabid beetle collection at the Natural History Museum London suggest that 293
imbalances in identification performance are not necessarily related to how well-represented a 294
species is in the training data (87). Further work is needed on large datasets to fully understand 295
these challenges. 296
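The two remedies mentioned above, class resampling and cost-sensitive training, can be expressed compactly in PyTorch, as sketched below with hypothetical class labels.

```python
# Minimal sketch: the two imbalance remedies discussed above expressed in PyTorch.
# `labels` is a hypothetical list of integer class indices for the training specimens;
# inverse-frequency weights implement cost-sensitive training, and the sampler
# implements over-sampling of the rare classes.
import torch
from torch import nn
from torch.utils.data import WeightedRandomSampler

labels = torch.tensor([0, 0, 0, 0, 1, 1, 2])           # three taxa with very unequal counts
class_counts = torch.bincount(labels).float()
class_weights = 1.0 / class_counts

# Cost-sensitive training: errors on rarer classes contribute more to the loss.
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Class resampling: specimens of rare taxa are drawn more often per epoch.
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
# DataLoader(dataset, batch_size=..., sampler=sampler) would then use this sampler.
```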
A related challenge is posed by species that are completely absent from the
reference database on which the deep learning models are trained. Detecting such species requires 298
techniques developed for multiple-class novelty/anomaly detection or open set/world recognition 299
(97, 98). A recent survey introduces various open set recognition methods with the two main 300
approaches being discriminative and generative (99). Discriminative models are based on traditional 301
machine learning techniques or deep neural networks with some additional mechanism to detect 302
outliers, while the main idea of generative models is to generate either positive or negative samples 303
for training. However, the current methods are typically applied on relatively small datasets and do 304
not scale well with the number of classes (99). Insect datasets typically have a high number of 305
classes and a very fine-grained distribution, where the phenotypic differences between species may 306
be minute while intra-species differences may be large. Such datasets are especially challenging for 307
open set recognition methods. While it will be extremely difficult to overcome this challenge for all 308
species using only phenotype based identification, combining image-based deep learning and DNA 309
barcoding techniques may help to solve the problem. 310
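The simplest discriminative mechanism of this kind is to reject predictions whose softmax confidence falls below a threshold, as sketched below; the logits and threshold are illustrative and real open-set methods are considerably more sophisticated.

```python
# Minimal sketch: the simplest discriminative open-set mechanism, flagging specimens
# whose maximum softmax confidence falls below a threshold as potentially belonging to
# a species absent from the training data. Logits and threshold are illustrative.
import torch

def classify_or_reject(logits, threshold=0.7):
    """Return predicted class index, or -1 when the sample looks like an unknown."""
    probabilities = torch.softmax(logits, dim=-1)
    confidence, prediction = probabilities.max(dim=-1)
    return torch.where(confidence >= threshold, prediction, torch.full_like(prediction, -1))

logits = torch.tensor([[4.0, 0.5, 0.1],     # confidently class 0
                       [1.0, 0.9, 1.1]])    # ambiguous -> rejected as possible novelty
print(classify_or_reject(logits))           # tensor([ 0, -1])
```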
311
Estimating biomass from bulk samples 312
Deep learning models can potentially predict biomass of bulk insect samples in a lab setting. 313
Legislative aquatic monitoring efforts in the United States and Europe require information about the 314
abundance or biomass of individual taxa from bulk invertebrate samples. Using the 315
BIODISCOVER machine, Ärje, et al. (73) were able to estimate biomass variation of individual 316
specimens of Diptera species without destroying them. This was achieved by extracting geometric
features of the specimens, such as the mean area across multiple images recorded by the BIODISCOVER machine, and statistically relating such values to subsequently obtained dry mass
from the same specimens. To validate such approaches, it is necessary to have accurate information 320
about the dry mass of a large selection of taxa. In the future, deep learning models may provide 321
even more accurate estimates of biomass. Obtaining specimen-specific biomass information non-322
destructively from bulk samples is a high priority in routine insect monitoring, since it will enable 323
more extensive insights into insect population and community dynamics and provide better 324
information for environmental management. 325
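The statistical relationship described above can be illustrated with a simple power-law fit between an image-derived geometric feature and measured dry mass; the numbers below are invented for the example and are not data from (73).

```python
# Minimal sketch of the kind of statistical relation described above: predicting specimen
# dry mass from a geometric image feature (mean detected area across views) via a
# power-law fit on log-log axes. The numbers are purely illustrative, not data from (73).
import numpy as np

mean_area_mm2 = np.array([1.2, 2.5, 4.8, 9.0, 15.0])      # hypothetical per-specimen features
dry_mass_mg = np.array([0.10, 0.32, 0.85, 2.10, 4.60])    # hypothetical reference dry masses

slope, intercept = np.polyfit(np.log(mean_area_mm2), np.log(dry_mass_mg), deg=1)

def predict_dry_mass(area_mm2):
    """Predict dry mass (mg) from mean image area (mm^2) with the fitted power law."""
    return float(np.exp(intercept) * area_mm2 ** slope)

print(predict_dry_mass(6.0))
```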
326
FUTURE DIRECTIONS 327
To unlock the full potential of deep learning methods for insect ecology and monitoring, four main 328
challenges need to be addressed with highest priority. We describe each of these below. 329
330
Validating image-based taxonomic identification 331
Validation of the detection and identification of species recorded with cameras in the field poses a critical challenge for implementing deep learning tools in entomology. Often it will not be possible to conclusively identify insects from images, and validation of image-based species classification should be done using alternative, complementary techniques. We suggest four approaches to this
validation: 1) Obtaining local knowledge about the identity and relative abundance of candidate 336
species, 2) catching and manually identifying insects in the vicinity of a camera trap, 3) identifying 337
insects by environmental DNA analysis of insect DNA traces left e.g. on flowers (100), or 4) by 338
directly observing and catching insects visible to the camera. The first three approaches are indirect 339
and each comes with its own problems, such as the difference in trapping efficiency of a time-
lapse camera trap and e.g. a pitfall trap placed to capture the same insects. However, the subsequent 341
identification of specimens from pitfall trapping can serve as validation of image-based results and 342
can further help in production of training data for optimizing deep learning models (e.g. by placing 343
specimens back under the camera). DNA techniques should be able to validate image-based identification since DNA can give accurate information on species identity (11, 100, 101).
For specific purposes, validation of insect identifications can be done through interfaces with online
portals and by involving citizen science. With integrated deep learning algorithms, online portals 347
provide instant candidate species when users upload pictures of observed insect species. The most 348
prominent examples of such portals of relevance to insects are the smartphone apps connected to 349
sites such as www.iNaturalist.org and www.observation.org. Another way of using deep learning 350
models to generate data on insect occurrence in their natural environment is by involving the public 351
in the annotation and quality control of images of insects uploaded to citizen science web portals 352
such as www.zooniverse.org (102). 353
354
Generating training data 355
One of the main challenges with deep learning is the need for large amounts of training data, which 356
is slow, difficult, and expensive to collect and label. Deep learning models typically require 357
hundreds of training instances of a given species to learn to detect species occurrences against the 358
background (86). In a laboratory setting, the collection of data can be eased by automated imaging 359
devices, such as the BIODISCOVER machine described above, which allow imaging of large numbers of insects under fixed settings. The imaging of species in situ should be done in a wide range of conditions (e.g., background, time of day, and season) to prevent the model from learning a spurious association between the species and the background, which would reduce its ability to detect the species against other backgrounds. Approaches to alleviate the challenge of moving from one
environment to another include multi-task learning (103), style transfer (104), image generation 365
(105), or domain adaptation (106). Multi-task learning aims to concurrently learn multiple different 366
tasks (e.g., segmentation, classification, detection) by sharing information leading to better data 367
representations and ultimately better results. Style transfer methods try to impose properties 368
appearing in one set of data to new data. Image generation can be used to create synthetic training
images with, for example, varying backgrounds. Domain adaptation aims at tuning the parameters 370
of a deep learning model trained on data following one distribution (source domain) so that it performs well on new data following another distribution (target domain).
The motion detection sensors in wildlife cameras are typically not triggered by insects 373
and species typically only occur in a small fraction of time-lapse images. A key challenge is 374
therefore to detect insects and filter out blank images from images with species of interest (102, 375
107). When it is difficult to obtain sufficient samples of rare insects, Zhong, et al. (108) proposed to 376
use deep learning only to detect all species of flying insects as a single class. Subsequently, the fine-377
grained species classification can be based on manual feature extraction and support vector 378
machines, which is a machine learning technique that requires less training data and solves the 379
problem of insufficient training data. 380
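A lightweight way to address the blank-image problem mentioned above is to pre-filter frames by simple frame differencing before any deep learning model is applied, as sketched below with an assumed folder of time-lapse frames and an illustrative change threshold.

```python
# Minimal sketch: flagging non-blank time-lapse frames by simple frame differencing, so
# that only frames with potential insect activity are passed to a deep learning model.
# The folder name and pixel-change threshold are illustrative assumptions.
from pathlib import Path
import numpy as np
from PIL import Image

def changed_fraction(frame_a, frame_b, pixel_threshold=25):
    """Fraction of pixels whose grey value changed by more than the threshold."""
    a = np.asarray(frame_a.convert("L"), dtype=np.int16)
    b = np.asarray(frame_b.convert("L"), dtype=np.int16)
    return float((np.abs(a - b) > pixel_threshold).mean())

frames = sorted(Path("timelapse_frames/").glob("*.jpg"))
candidates = []
for previous, current in zip(frames, frames[1:]):
    if changed_fraction(Image.open(previous), Image.open(current)) > 0.001:
        candidates.append(current)                 # kept for detection/classification

print(f"{len(candidates)} of {len(frames)} frames retained")
```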
The issue of scarce training data can also be alleviated with new data synthesis. Data 381
synthesis could be used specifically to augment the training set by creating artificial images of 382
segmented individual insects that are placed randomly in scenes with different backgrounds (109). 383
A promising alternative is to use deep learning models for generating artificial images belonging to 384
the class of interest. The most widely used approach to date is based on generative adversarial networks
(110) and has shown astonishing performance results in computer vision problems in general, as 386
well as in ecological problems (111). 387
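The segmentation-and-paste strategy mentioned above can be sketched as follows: a segmented insect crop with transparency is composited at random positions onto different background images, and the paste position doubles as a bounding-box label; all file names are hypothetical.

```python
# Minimal sketch of cut-and-paste data synthesis: a segmented insect crop (with an alpha
# channel, assumed smaller than the backgrounds) is pasted at random positions onto
# different background images to create artificial training scenes. File names are
# illustrative assumptions.
import random
from pathlib import Path
from PIL import Image

insect = Image.open("segmented_insect.png").convert("RGBA")   # hypothetical cut-out with alpha
backgrounds = sorted(Path("backgrounds/").glob("*.jpg"))

for i, background_path in enumerate(backgrounds):
    scene = Image.open(background_path).convert("RGBA")
    x = random.randint(0, max(0, scene.width - insect.width))
    y = random.randint(0, max(0, scene.height - insect.height))
    scene.alpha_composite(insect, dest=(x, y))
    # The paste position (x, y, width, height) doubles as a bounding-box label for detector training.
    scene.convert("RGB").save(f"synthetic_{i:04d}.jpg")
```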
Building reference databases 389
Publicly available reference databases are critical for adapting deep learning tools to entomological 390
research. Initiatives like DISSCo RI and IDigBio (https://www.idigbio.org/) are important for
enabling the use of museum collections. However, to enable deep learning-based identification, 392
individual open datasets from entomological research and monitoring are also needed (e.g. 85, 96, 393
112). The collation of such research datasets will require dedicated projects as well as large 394
coordinated efforts that drive the open-access and reuse of research data such as the European Open 395
Science Cloud and the Research Data Alliance. Building a large insect reference dataset is laborious 396
and, therefore, it is important to maximize the benefits. To do so, non-collection datasets should 397
also use common approaches and hardware and abide by best practices in metadata and data
management (113-115). Further, dataset collectors and deep learning model developers should work 399
closely together and make data accessible. All the possible metadata, such as camera settings and 400
hardware, sampling location, date, and time of day, should be saved for future analysis. Similarly, 401
characteristics of the specimen, such as species identity, biomass, sex, age class, and possibly 402
derived information like dry weight should be recorded if such information exists. In particular,
correct labelling of species in images is critical. Using multiple experts and molecular information 404
about species identity to verify the labeling or performing subsequent validity checks through DNA 405
barcoding will improve the data quality and the performance of the deep learning models. This can 406
be done, for instance, by manually verifying the quality and labeling of images that are repeatedly 407
misclassified by the machine learning methods. Standardized imaging devices such as the 408
BIODISCOVER machine could also play a key role in building reference databases from 409
monitoring programs (73). Training classifiers with species that are currently not encountered in a 410
certain region but can possibly spread there later will naturally help to detect such changes when 411
they occur. Integration of such reference databases with field monitoring methods forms an 412
important future challenge. As a starting point, we provide a list of open access entomological 413
image databases (SI Appendix). 414
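As a sketch of the metadata practice recommended above, each reference image could be accompanied by a machine-readable sidecar file; the field names below follow the examples in the text and are assumptions rather than an established standard.

```python
# Minimal sketch: storing the metadata discussed above alongside each reference image as a
# JSON sidecar file. The file name, taxon, and field names are illustrative assumptions,
# not an established community schema.
import json
from pathlib import Path

record = {
    "image_file": "reference_image_000123.png",            # hypothetical image file
    "taxon": "Bembidion lampros",                          # illustrative species label
    "identification_method": "expert + DNA barcode",
    "dry_mass_mg": 1.8,
    "sex": "female",
    "capture_location": {"latitude": 56.23, "longitude": 10.57},
    "capture_date": "2020-06-15",
    "camera_settings": {"exposure_ms": 2.0, "resolution": [2448, 2048]},
}

Path(record["image_file"]).with_suffix(".json").write_text(json.dumps(record, indent=2))
```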
415
Integration of deep learning and DNA-based tools 416
For processing samples in the lab, molecular methods have gained increasing attention over the past 417
decade, but there are still critical challenges which remain unresolved: specimens are typically 418
destroyed, abundance cannot be accurately estimated, and key specimens cannot be identified in 419
bulk samples. Nevertheless, DNA barcoding is now an established, powerful method to reliably 420
assess biodiversity also in entomology (11). For insects, this works by sequencing a short fragment 421
of the mitochondrial cytochrome-c-oxidase I subunit gene (COI) and comparing the DNA sequence 422
to an available reference database (116). Even undescribed and morphologically cryptic species can 423
be distinguished with this approach (117), which is unlikely to be possible with deep learning. This 424
is of great importance as morphologically similar species can have distinct ecological preferences 425
(118) and thus distinguishing them unambiguously is important for monitoring, ecosystem 426
assessment and conservation biology. However, mass-sequencing based molecular methods cannot 427
provide precise abundance or biomass estimates, nor assign sequences to individual specimens (12).
Therefore, an unparalleled strength lies in combining both image-recognition and DNA 429
metabarcoding approaches: i) When building reference collections for training models for insect 430
classification, species identity can be molecularly verified and potential cryptic species can be 431
separated by the DNA barcode. ii) After image-based species identification of a whole bulk sample, 432
all specimens can be processed via DNA metabarcoding to assess taxonomic resolution at the 433
highest level. A further obvious advantage of linking computer vision and deep learning to DNA is 434
the fact that even in the absence of formal species descriptions, DNA tools can generate distinctly 435
referenced taxonomic assignments via so-called “Barcode-Index-Numbers” (BINs) (119). These 436
BINs provide referenced biodiversity units using the taxonomic backbone of the Barcode of Life 437
Data Systems (https://boldsystems.org) and can represent an even greater diversity, including yet undescribed species. For instance, it is typically clear that a new species belongs to the genus
Astraptes in the butterfly family Hesperiidae, but also that it represents a genetically distinct, new 440
entity (120). These units can also be directly used as part of ecosystem status assessment despite not 441
yet having Linnean names. BINs can be used for model training. Recent studies convincingly show 442
that with this more holistic approach, which includes cryptic and undescribed species, the 443
predictions of environmental status as required by several legislative monitoring programs actually 444
improve substantially (e.g. 121). For cases of cryptic species with great relevance e.g. for 445
conservation biology it is also possible to individually process specimens of a cryptic species 446
complex after automated image-based assignment to further validate their identity and frequency. Combining deep learning with DNA-based approaches could deliver detailed trait
information, biomass, and abundance with the best possible taxonomic resolution. 449
450
CONCLUSION 451
Deep learning is currently influencing a wide range of scientific disciplines (88), but has only just 452
begun to benefit entomology. While there is a vast potential for deep learning to transform insect 453
ecology and monitoring, applying deep learning to entomological research questions brings new 454
technical challenges. The complexity of deep learning models and the challenges of entomological 455
data require substantial investment in interdisciplinary efforts to unleash the potential of deep 456
learning in entomology. However, these challenges also represent ample potential for cross-457
fertilization among the biological and computer sciences. The benefit to entomology is not only 458
more data, but also novel kinds of data. As the deep learning tools become widely available and 459
intuitive to use, they can transform field entomology by providing information that is currently 460
intractable to record by human observations (18, 33, 122). Consequently, there is a bright future for 461
entomology, with new research niches opening up and access to unforeseen scales and resolution of 462
data, vital for biodiversity assessments. 463
The shift towards automated methods may raise concerns about the future for 464
taxonomists, much like the debate concerning developments in molecular species identification
(123, 124). We emphasize that the expertise of taxonomists is at the heart of and critical to these 466
developments. Initially, automated techniques will be used in the most routine-like tasks, which in 467
turn will allow the taxonomic experts to focus on the specimens requiring more in-depth study, as well as the plethora of new species that need to be described and studied. To enable
this, we need to consider approaches that can pinpoint samples for human expert inspection in a 470
meaningful way, e.g., based on neural network classification confidences (82) or additional rare 471
species detectors (125). As deep learning becomes more closely integrated in entomological 472
research, the vision of real-time detection, tracking, and decoding of behaviour of insects could be 473
realized, transforming insect ecology and monitoring. In turn, efficient tracking of insect
biodiversity trends will aid the design of effective measures to counteract or revert biodiversity loss. 475
476
ACKNOWLEDGEMENTS 477
David Wagner is gratefully thanked for convening the session “Insect declines in the 478
Anthropocene” at the Entomological Society of America annual meeting 2019 in St. Louis, USA, 479
which brought the group of contributors to the special feature together. TTH acknowledges funding 480
from the Villum Foundation (grant 17523) and the Independent Research Fund Denmark (grant 481
8021-00423B). Kristian Meissner acknowledges funding from the Nordic Council of Ministers (project 18103, SCANDNAnet). Jenni Raitoharju acknowledges funding from the Academy of Finland
(project 324475). 484
REFERENCES
1. G. Ceballos, P. R. Ehrlich, R. Dirzo, Biological annihilation via the ongoing sixth mass extinction signaled by vertebrate population losses and declines. Proc. Natl. Acad. Sci. USA 114, E6089-E6096 (2017).
2. M. Dornelas et al., BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecol. Biogeogr. 27, 760-786 (2018).
3. S. A. Blowes et al., The geography of biodiversity change in marine and terrestrial assemblages. Science 366, 339-345 (2019).
4. G. A. Montgomery et al., Is the insect apocalypse upon us? How to find out. Biol. Conserv. 241, 108327 (2020).
5. C. A. Hallmann et al., More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS One 12, e0185809 (2017).
6. S. Seibold et al., Arthropod decline in grasslands and forests is associated with landscape-level drivers. Nature 574, 671-674 (2019).
7. R. van Klink et al., Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances. Science 368, 417-420 (2020).
8. D. L. Wagner, Insect declines in the Anthropocene. Annu. Rev. Entomol. 65, 457-480 (2020).
9. S. Pawar, Taxonomic chauvinism and the methodologically challenged. Bioscience 53, 861-864 (2003).
10. T. W. A. Braukmann et al., Metabarcoding a diverse arthropod mock community. Mol. Ecol. Resour. 19, 711-727 (2019).
11. V. Elbrecht et al., Validation of COI metabarcoding primers for terrestrial arthropods. PeerJ 7 (2019).
12. V. Elbrecht, F. Leese, Can DNA-based ecosystem assessments quantify species abundance? Testing primer bias and biomass-sequence relationships with an innovative metabarcoding protocol. PLoS One 10 (2015).
13. H. Krehenwinkel et al., Estimating and mitigating amplification bias in qualitative and quantitative arthropod metabarcoding. Scientific Reports 7 (2017).
14. H. Yousif, J. Yuan, R. Kays, Z. He, Animal Scanner: Software for classifying humans, animals, and empty frames in camera trap images. Ecol. Evol. 9, 1578-1589 (2019).
15. J. Ärje et al., Human experts vs. machines in taxa recognition. Signal Processing: Image Communication 87, 115917 (2020).
16. M. S. Norouzzadeh et al., Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. USA 115, E5716-E5725 (2018).
17. N. MacLeod, M. Benfield, P. Culverhouse, Time to automate identification. Nature 467, 154-155 (2010).
18. R. Steenweg et al., Scaling-up camera traps: monitoring the planet's biodiversity with networks of remote sensors. Front. Ecol. Environ. 15, 26-34 (2017).
19. R. Steen, Diel activity, frequency and visit duration of pollinators in focal plants: in situ automatic camera monitoring and data processing. Methods Ecol. Evol. 8, 203-213 (2017).
20. L. Pegoraro, O. Hidalgo, I. J. Leitch, J. Pellicer, S. E. Barlow, Automated video monitoring of insect pollinators in the field. Emerging Topics in Life Sciences, 10.1042/etls20190074 (2020).
21. J. Wäldchen, P. Mäder, Machine learning for image based species identification. Methods Ecol. Evol. 9, 2216-2225 (2018).
22. N. Piechaud, C. Hunt, P. F. Culverhouse, N. L. Foster, K. L. Howell, Automated identification of benthic epifauna with computer vision. Mar. Ecol. Prog. Ser. 615, 15-30 (2019).
23. B. G. Weinstein, A computer vision for animal ecology. J. Anim. Ecol. 87, 533-545 (2018).
24. S. Christin, É. Hervet, N. Lecomte, Applications for deep learning in ecology. Methods Ecol. Evol. 10, 1632-1644 (2019).
25. A. Joly et al., Overview of LifeCLEF 2019: identification of amazonian plants, South & North American birds, and niche prediction. (Springer International Publishing, Cham), pp. 387-401 (2019).
26. D. Xia, P. Chen, B. Wang, J. Zhang, C. Xie, Insect detection and classification based on an improved convolutional neural network. Sensors 18, 4169 (2018).
27. D. Bzdok, N. Altman, M. Krzywinski, Statistics versus machine learning. Nat. Methods 15, 233-234 (2018).
28. D. T. Tran, T. T. Høye, M. Gabbouj, A. Iosifidis, "Automatic flower and visitor detection system" in 2018 26th European Signal Processing Conference (EUSIPCO), 10.23919/EUSIPCO.2018.8553494, pp. 405-409 (2018).
29. R. A. Collett, D. O. Fisher, Time-lapse camera trapping as an alternative to pitfall trapping for estimating activity of leaf litter arthropods. Ecol. Evol. 7, 7527-7533 (2017).
30. I. Ruczyński, Z. Hałat, M. Zegarek, T. Borowik, D. K. N. Dechmann, Camera transects as a method to monitor high temporal and spatial ephemerality of flying nocturnal insects. Methods Ecol. Evol. 11, 294-302 (2020).
31. S. E. Barlow, M. A. O'Neill, Technological advances in field studies of pollinator ecology and the future of e-ecology. Current Opinion in Insect Science 38, 15-25 (2020).
32. P. Cardoso, T. L. Erwin, P. A. V. Borges, T. R. New, The seven impediments in invertebrate conservation and how to overcome them. Biol. Conserv. 144, 2647-2655 (2011).
33. J. Hortal et al., Seven shortfalls that beset large-scale knowledge of biodiversity. Annual Review of Ecology, Evolution, and Systematics 46, 523-549 (2015).
34. I. Potamitis, P. Eliopoulos, I. Rigakis, Automated remote insect surveillance at a global scale and the internet of things. Robotics 6 (2017).
35. I. Potamitis, I. Rigakis, N. Vidakis, M. Petousis, M. Weber, Affordable bimodal optical sensors to spread the use of automated insect monitoring. Journal of Sensors, Article ID: 3949415 (2018).
36. D. J. A. Rustia, J.-J. Chao, J.-Y. Chung, T.-T. Lin, An online unsupervised deep learning approach for an automated pest insect monitoring system. 2019 ASABE Annual International Meeting (ASABE, St. Joseph, MI), p. 1 (2019).
37. Y. Sun et al., Automatic in-trap pest detection using deep learning for pheromone-based Dendroctonus valens monitoring. Biosys. Eng. 176, 140-150 (2018).
38. T. M. Poland, D. Rassati, Improved biosecurity surveillance of non-native forest insects: a review of current methods. J. Pest Sci. 92, 37-49 (2019).
39. D. A. A. Santos, L. E. Teixeira, A. M. Alberti, V. Furtado, J. J. P. C. Rodrigues, Sensitivity and noise evaluation of an optoelectronic sensor for mosquitoes monitoring. 2018 3rd International Conference on Smart and Sustainable Technologies (SpliTech), pp. 1-5 (2018).
40. J. Park, D. I. Kim, B. Choi, W. Kang, H. W. Kwon, Classification and morphological analysis of vector mosquitoes using deep convolutional neural networks. Scientific Reports 10, 1012 (2020).
41. R. Kalamatianos, I. Karydis, D. Doukakis, M. Avlonitis, DIRT: The dacus image recognition toolkit. Journal of Imaging 4, 129 (2018).
42. W. Ding, G. Taylor, Automatic moth detection from trap images for pest management. Comput. Electron. Agric. 123, 17-28 (2016).
43. M. Mayo, A. T. Watson, Automatic species identification of live moths. Knowledge-Based Systems 20, 195-202 (2007).
44. J. Wang, C. Lin, L. Ji, A. Liang, A new automatic identification system of insect images at the order level. Knowledge-Based Systems 33, 102-110 (2012).
45. T. T. Høye, H. M. R. Mann, K. Bjerge, Camera-based monitoring of insects on green roofs [in Danish], Aarhus University, DCE – National Centre for Environment and Energy, pp. 18, Scientific report nr. 371 (2020).
46. K. Bjerge, M. V. Sepstrup, J. B. Nielsen, F. Helsing, T. T. Høye, A light trap and computer vision system to detect and classify live moths (Lepidoptera) using tracking and deep learning. bioRxiv 10.1101/2020.03.18.996447 (2020).
47. J. W. Chapman, V. A. Drake, D. R. Reynolds, Recent insights from radar studies of insect flight. Annu. Rev. Entomol. 56, 337-356 (2011).
48. O. Hueppop et al., Perspectives and challenges for the use of radar in biological conservation. Ecography 42, 912-930 (2019).
49. K. R. Wotton et al., Mass seasonal migrations of hoverflies provide extensive pollination and crop protection services. Curr. Biol. 29, 2167+ (2019).
50. J. W. Chapman, A. D. Smith, I. P. Woiwod, D. R. Reynolds, J. R. Riley, Development of vertical-looking radar technology for monitoring insect migration. Comput. Electron. Agric. 35, 95-110 (2002).
51. J. W. Chapman, D. R. Reynolds, A. D. Smith, Migratory and foraging movements in beneficial insects: a review of radar monitoring and tracking methods. Int. J. Pest Manage. 50, 225-232 (2004).
52. J. W. Chapman et al., High-altitude migration of the diamondback moth Plutella xylostella to the UK: a study using radar, aerial netting, and ground trapping. Ecol. Entomol. 27, 641-650 (2002).
53. W. D. Kissling, D. E. Pattemore, M. Hagen, Challenges and prospects in the telemetry of insects. Biological Reviews 89, 511-530 (2014).
54. R. Maggiora, M. Saccani, D. Milanesio, M. Porporato, An innovative harmonic radar to track flying insects: the case of Vespa velutina. Scientific Reports 9, 11964 (2019).
55. G. Hu et al., Mass seasonal bioflows of high-flying insect migrants. Science 354, 1584-1587 (2016).
56. P. M. Stepanian et al., Declines in an abundant aquatic insect, the burrowing mayfly, across major North American waterways. Proc. Natl. Acad. Sci. USA 117, 2987-2992 (2020).
57. A. K. Stimpert, W. W. L. Au, S. E. Parks, T. Hurst, D. N. Wiley, Common humpback whale (Megaptera novaeangliae) sound types for passive acoustic monitoring. J. Acoust. Soc. Am. 129, 476-482 (2011).
58. J. Salamon, J. P. Bello, A. Farnsworth, S. Kelling, Fusing shallow and deep learning for bioacoustic bird species classification. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 141-145 (2017).
59. A. Jeliazkov et al., Large-scale semi-automated acoustic monitoring allows to detect temporal decline of bush-crickets. Global Ecology and Conservation 6, 208-218 (2016).
60. I. Kiskin et al., Bioacoustic detection with wavelet-conditioned convolutional neural networks. Neural Computing & Applications 32, 915-927 (2020).
61. O. Mac Aodha et al., Bat detective - deep learning tools for bat acoustic signal detection. PLoS Comp. Biol. 14 (2018).
62. E. D. Chesmore, E. Ohya, Automated identification of field-recorded songs of four British grasshoppers using bioacoustic signal recognition. Bull. Entomol. Res. 94, 319-330 (2004).
63. S. Kawakita, K. Ichikawa, Automated classification of bees and hornet using acoustic analysis of their flight sounds. Apidologie 50, 71-79 (2019).
64. Y. P. Chen, A. Why, G. Batista, A. Mafra-Neto, E. Keogh, Flying insect classification with inexpensive sensors. J. Insect Behav. 27, 657-677 (2014).
65. E. Balla et al., An opto-electronic sensor-ring to detect arthropods of significantly different body sizes. Sensors 20, 982 (2020).
66. M. Dombos et al., EDAPHOLOG monitoring system: automatic, real-time detection of soil microarthropods. Methods Ecol Evol 8, 313-321 (2017).
67. B. P. Hedrick et al., Digitization and the future of natural history collections. Bioscience 70, 243-251 (2020).
68. V. Blagoderov, I. J. Kitching, L. Livermore, T. J. Simonsen, V. S. Smith, No specimen left behind: industrial scale digitization of natural history collections. ZooKeys 10.3897/zookeys.209.3178, 133-146 (2012).
69. B. Ströbel, S. Schmelzle, N. Blüthgen, M. Heethoff, An automated device for the digitization and 3D modelling of insects, combining extended-depth-of-field and all-side multi-view imaging. ZooKeys 759 (2018).
70. A. E. Z. Short, T. Dikow, C. S. Moreau, Entomological collections in the age of big data. Annu. Rev. Entomol. 63, 513-530 (2018).
71. E. K. Meineke, T. J. Davies, Museum specimens provide novel insights into changing plant–herbivore interactions. Philosophical Transactions of the Royal Society B: Biological Sciences 374, 20170393 (2019).
72. E. K. Meineke, C. Tomasi, S. Yuan, K. M. Pryer, Applying machine learning to investigate long-term insect–plant interactions preserved on digitized herbarium specimens. Applications in Plant Sciences 8, e11369 (2020).
73. J. Ärje et al., Automatic image-based identification and biomass estimation of invertebrates. Methods Ecol Evol 10.1111/2041-210X.13428 (2020).
74. Q. Wang, L. Zhang, L. Bertinetto, W. Hu, P. H. S. Torr (2019) Fast online object tracking and segmentation: a unifying approach. in IEEE Conference on Computer Vision and Pattern Recognition.
75. W. Luo, X. Zhao, T.-K. Kim, Multiple object tracking: a review. ArXiv abs/1409.7618 (2014).
76. B. Yang, R. Nevatia, Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1918-1925 (2012).
77. D. T. Tran, S. Kiranyaz, M. Gabbouj, A. Iosifidis, Heterogeneous multilayer generalized operational perceptron. IEEE Transactions on Neural Networks and Learning Systems 10.1109/TNNLS.2019.2914082, 1-15 (2019).
78. D. T. Tran, A. Iosifidis, M. Gabbouj, Improving efficiency in convolutional neural networks with multilinear filters. Neural Networks 105, 328-339 (2018).
79. S. Hamel et al., Towards good practice guidance in using camera-traps in ecology: influence of sampling design on validity of ecological inferences. Methods Ecol Evol 4, 105-113 (2013).
80. L. Estes et al., The spatial and temporal domains of modern ecology. Nature Ecology & Evolution 2, 819-826 (2018).
81. A. Valiente-Banuet et al., Beyond species loss: the extinction of ecological interactions in a changing world. Funct. Ecol. 29, 299-307 (2015).
82. J. Raitoharju, K. Meissner (2019) On confidences and their use in (semi-)automatic multi-image taxa identification. in 2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp 1338-1343.
83. M. Valan, K. Makonyi, A. Maki, D. Vondráček, F. Ronquist, Automated taxonomic identification of insects with expert-level accuracy using effective feature transfer from convolutional networks. Syst. Biol. 68, 876-895 (2019).
84. D. Milošević et al., Application of deep learning in aquatic bioassessment: towards automated identification of non-biting midges. Sci. Total Environ. 711, 135160 (2020).
85. X. Wu, C. Zhan, Y. Lai, M. Cheng, J. Yang (2019) IP102: a large-scale benchmark dataset for insect pest recognition. in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 8779-8788.
86. G. V. Horn et al., The iNaturalist challenge 2017 dataset. ArXiv abs/1707.06642 (2017).
87. O. L. P. Hansen et al., Species-level image classification with convolutional neural network enables insect identification from habitus images. Ecol Evol 10, 737-747 (2020).
88. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436 (2015).
89. K. He, X. Zhang, S. Ren, J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. in 2015 IEEE International Conference on Computer Vision (ICCV), pp 1026-1034.
90. J. Dean et al. (2012) Large scale distributed deep networks. in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (Curran Associates Inc., Lake Tahoe, Nevada), pp 1223-1231.
91. H. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y. Arcas (2016) Communication-efficient learning of deep networks from decentralized data. in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.
92. K. Bonawitz et al., Towards federated learning at scale: system design. ArXiv abs/1902.01046 (2019).
93. C. Huang, Y. N. Li, C. C. Loy, X. O. Tang, Learning deep representation for imbalanced classification. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 10.1109/CVPR.2016.580, 5375-5384 (2016).
94. F. Wu, X. Y. Jing, S. G. Shan, W. M. Zuo, J. Y. Yang, Multiset feature learning for highly imbalanced data classification. Thirty-First AAAI Conference on Artificial Intelligence, 1583-1589 (2017).
95. X. Jing et al., Multiset feature learning for highly imbalanced data classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 10.1109/TPAMI.2019.2929166, 1-1 (2019).
96. J. Raitoharju et al., Benchmark database for fine-grained image classification of benthic macroinvertebrates. Image Vision Comput. 78, 73-83 (2018).
97. M. Turkoz, S. Kim, Y. Son, M. K. Jeong, E. A. Elsayed, Generalized support vector data description for anomaly detection. Pattern Recognition 100, 107119 (2020).
98. P. Perera, V. M. Patel (2019) Deep transfer learning for multiple class novelty detection. in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11536-11544.
99. C. Geng, S. Huang, S. Chen, Recent advances in open set recognition: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 10.1109/TPAMI.2020.2981604, 1-1 (2020).
100. P. F. Thomsen, E. E. Sigsgaard, Environmental DNA metabarcoding of wild flowers reveals diverse communities of terrestrial arthropods. Ecol Evol 9, 1665-1679 (2019).
101. M. F. Geiger et al., Testing the global malaise trap program – how well does the current barcode reference library identify flying insects in Germany? Biodiversity Data Journal 4 (2016).
102. M. Willi et al., Identifying animal species in camera trap images using deep learning and citizen science. Methods Ecol Evol 10, 80-91 (2019).
103. A. Kendall, Y. Gal, R. Cipolla, Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10.1109/CVPR.2018.00781, 7482-7491 (2018).
104. L. A. Gatys, A. S. Ecker, M. Bethge, Image style transfer using convolutional neural networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 10.1109/CVPR.2016.265, 2414-2423 (2016).
105. J. M. Bao, D. Chen, F. Wen, H. Q. Li, G. Hua, CVAE-GAN: fine-grained image generation through asymmetric training. 2017 IEEE International Conference on Computer Vision (ICCV) 10.1109/ICCV.2017.299, 2764-2773 (2017).
106. E. Tzeng, J. Hoffman, K. Saenko, T. Darrell, Adversarial discriminative domain adaptation. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) 10.1109/CVPR.2017.316, 2962-2971 (2017).
107. P. Glover-Kapfer, C. A. Soto-Navarro, O. R. Wearn, Camera-trapping version 3.0: current constraints and future priorities for development. Remote Sensing in Ecology and Conservation 5, 209-223 (2019).
108. Y. Zhong, J. Gao, Q. Lei, Y. Zhou, A vision-based counting and recognition system for flying insects in intelligent agriculture. Sensors 18, 1489 (2018).
109. H. Inoue, Data augmentation by pairing samples for images classification. ArXiv abs/1801.02929 (2018).
110. I. J. Goodfellow et al., Generative adversarial networks. ArXiv abs/1406.2661 (2014).
111. C.-Y. Lu, D. J. Arcega Rustia, T.-T. Lin, Generative adversarial network based image augmentation for insect pest classification enhancement. IFAC-PapersOnLine 52, 1-5 (2019).
112. M. Martineau et al., A survey on image-based insect classification. Pattern Recognition 65, 273-284 (2017).
113. T. Forrester et al., An open standard for camera trap data. Biodiversity Data Journal 4 (2016).
114. L. Scotson et al., Best practices and software for the management and sharing of camera trap data for small and large scales studies. Remote Sensing in Ecology and Conservation 3, 158-172 (2017).
115. A. Nieva de la Hidalga, M. van Walsun, P. Rosin, X. Sun, A. Wijers, Quality management methodologies for digitisation operations, pp. 89, 10.5281/zenodo.3469521 (2019).
116. S. Ratnasingham, P. D. N. Hebert, BOLD: the barcode of life data system (www.barcodinglife.org). Mol. Ecol. Notes 7, 355-364 (2007).
117. M. Hajibabaei, D. H. Janzen, J. M. Burns, W. Hallwachs, P. D. N. Hebert, DNA barcodes distinguish species of tropical Lepidoptera. Proc. Natl. Acad. Sci. USA 103, 968-971 (2006).
118. J. N. Macher et al., Multiple-stressor effects on stream invertebrates: DNA barcoding reveals contrasting responses of cryptic mayfly species. Ecol. Indicators 61, 159-169 (2016).
119. S. Ratnasingham, P. D. N. Hebert, A DNA-based registry for all animal species: the barcode index number (BIN) system. PLOS ONE 8, e66213 (2013).
120. P. D. N. Hebert, E. H. Penton, J. M. Burns, D. H. Janzen, W. Hallwachs, Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proc. Natl. Acad. Sci. USA 101, 14812-14817 (2004).
121. T. Cordier et al., Supervised machine learning outperforms taxonomy-based environmental DNA metabarcoding applied to biomonitoring. Mol Ecol Resour 18, 1381-1391 (2018).
122. A. C. Burton et al., REVIEW: Wildlife camera trapping: a review and recommendations for linking surveys to ecological processes. J. Appl. Ecol. 52, 675-685 (2015).
123. M. G. Kelly, S. C. Schneider, L. King, Customs, habits, and traditions: the role of nonscientific factors in the development of ecological assessment methods. Wiley Interdisciplinary Reviews-Water 2, 159-165 (2015).
124. F. Leese et al., Why we need sustainable networks bridging countries, disciplines, cultures and generations for aquatic biomonitoring 2.0: a perspective derived from the DNAqua-Net COST action. Next Generation Biomonitoring, Pt 1 58, 63-99 (2018).
125. F. Sohrab, J. Raitoharju, Boosting rare benthic macroinvertebrates taxa identification with one-class classification. ArXiv abs/2002.10420 (2020).
FIGURE LEGENDS
Figure 1
We developed and tested a camera trap for monitoring flower-visiting insects, which records images at fixed intervals (45). (A) The setup consists of two web cameras connected to a control unit containing a Raspberry Pi computer and a hard drive. In our test, ten camera traps were mounted on custom-built steel rod mounts c. 30 cm above a green-roof mix of plants in the genus Sedum. Images were recorded every 30 s throughout the flowering season. After training a convolutional neural network (YOLOv3), we detected >100,000 instances of pollinators over the course of an entire growing season. (B) An example image from one of the cameras, showing a scene with several flowering species. The locations of the insect detections varied greatly among three common flower-visiting species: (C) the European honey bee (Apis mellifera), (D) the red-tailed bumblebee (Bombus lapidarius), and (E) the marmalade hoverfly (Episyrphus balteatus). Across the ten camera traps, the deep learning model resolved detailed (F) seasonal and (G) diurnal variation in occurrence frequency among the same three species. Figure adapted with permission from (45).
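To make the acquisition step concrete, the sketch below captures a frame from a USB camera at a fixed interval and stores it with a timestamped filename; detections would then be produced by a trained detector such as YOLOv3, either on the device or offline. This is a minimal, hedged sketch assuming OpenCV and a single camera at device index 0; the interval and output folder are illustrative and the code is not the software used in (45).

```python
# Minimal fixed-interval capture loop (illustrative sketch, not the trap's actual software).
# Assumes a USB camera reachable through OpenCV at device index 0.
import time
from datetime import datetime
from pathlib import Path

import cv2

INTERVAL_S = 30            # capture interval, matching the 30 s used in the field test above
OUT_DIR = Path("frames")   # hypothetical output folder
OUT_DIR.mkdir(exist_ok=True)

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Camera not found")

try:
    while True:
        ok, frame = cap.read()
        if ok:
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            cv2.imwrite(str(OUT_DIR / f"trap_{stamp}.jpg"), frame)
        time.sleep(INTERVAL_S)
finally:
    cap.release()
```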
Figure 2
(A) To automatically monitor nocturnal moth species, we designed a light trap with an on-board computer vision system (46). The light trap is equipped with three light sources: a fluorescent tube to attract moths, a light table covered by a white sheet to provide diffuse background illumination of the resting insects, and a light ring to illuminate the specimens. The system attracts moths and automatically captures images based on motion detection. The trap is built from standard components such as a high-resolution USB web camera and a Raspberry Pi computer. (B) We have proposed a computer vision algorithm that, during offline
processing of the captured images, tracks and counts individual moths. A customized convolutional neural network was trained to detect and classify eight moth species. The algorithm can also run on the on-board computer, allowing the system to automatically process images and submit species data to a server via a modem. The system runs off-grid on a battery and a solar panel.
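One simple way to realise motion-triggered capture of the kind described above is frame differencing: compare each new frame with the previous one and save an image only when the changed area exceeds a threshold. The sketch below is illustrative only and makes assumptions (OpenCV, camera index 0, an arbitrary pixel-change threshold); it is not the algorithm published in (46).

```python
# Motion-triggered capture by frame differencing (illustrative sketch).
from datetime import datetime

import cv2

THRESHOLD_PIXELS = 5000  # arbitrary: minimum number of changed pixels needed to trigger a save

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
if not ok:
    raise RuntimeError("Camera not found")
prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (21, 21), 0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > THRESHOLD_PIXELS:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        cv2.imwrite(f"moth_{stamp}.jpg", frame)  # saved frames are classified and tracked later
    prev_gray = gray

cap.release()
```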
Figure 3
The BIODISCOVER machine can automate the process of invertebrate sample sorting, species identification, and biomass estimation (73). (A) The imaging system consists of an ethanol-filled spectroscopic cuvette, a powerful and adjustable light source, and two cameras capable of recording images at 50 frames per second. (B) The setup is mounted in a light-proof aluminium box and fitted with a pump for refilling the spectroscopic cuvette. (C) Each specimen is imaged from two angles as it is dropped into the ethanol-filled cuvette, and geometric features related to size and biomass are computed automatically. (D) The system has a built-in flushing mechanism for controlling which specimens are kept together for subsequent storage or analysis. The results for an initial dataset of images of 598 specimens across 12 species of known identity were very promising, with a classification accuracy of 98.0% (73). The system is generic and can easily be used for other groups of invertebrates as well. As such, the BIODISCOVER machine paves the way for cheap, fast, and accurate data on spatial and temporal variation in invertebrate abundance, diversity, and biomass. Figure adapted with permission from (73).
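To illustrate the step from image to geometric features, the sketch below segments a specimen from a uniform background and computes its projected area and length, from which biomass could be estimated through a separately fitted size-weight regression. It is a hedged illustration assuming OpenCV 4, a single dark specimen on a bright background, and hypothetical regression coefficients; it does not reproduce the feature set of the BIODISCOVER software (73).

```python
# Projected area and body length of a specimen from a single image (illustrative sketch).
# Assumes OpenCV 4 and one dark specimen on a bright, uniform background.
import cv2

def geometric_features(image_path: str, pixels_per_mm: float) -> dict:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu threshold; the specimen is darker than the background, so invert the mask.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    specimen = max(contours, key=cv2.contourArea)        # largest blob taken as the specimen
    area_mm2 = cv2.contourArea(specimen) / pixels_per_mm ** 2
    (_, _), (w, h), _ = cv2.minAreaRect(specimen)        # tightest rotated bounding box
    length_mm = max(w, h) / pixels_per_mm
    return {"area_mm2": area_mm2, "length_mm": length_mm}

def estimated_dry_weight_mg(area_mm2: float, a: float = 0.05, b: float = 1.5) -> float:
    # Hypothetical power-law size-weight relationship W = a * area**b; the coefficients
    # a and b would have to be fitted per taxon from specimens of known dry weight.
    return a * area_mm2 ** b
```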
TABLE 1
Glossary
Bin picking: an industrial term for robots that pick up one of many objects randomly placed in a container.
Convolutional Neural Network (CNN): a deep learning algorithm in the family of neural networks with several layers, commonly applied for image recognition and classification. A CNN can be trained to recognize various objects and patterns in an image. A CNN combines four main operations: convolution, activation functions, subsampling, and fully connected layers. During training, the learnable parameters of each convolutional and fully connected layer are adjusted so that the CNN recognizes the patterns in the training data and can be used for image classification.
Data augmentation: a technique to artificially expand the size of a training dataset by creating modified copies of images containing the objects of interest.
Machine learning: a subset of artificial intelligence concerned with algorithms that improve automatically, without explicit human intervention, by learning from structured data.
Deep learning: a subset of machine learning in which many layers of such algorithms are stacked, each providing a different level of interpretation of the data.
DNA barcoding: identification of a species using a short, standardised gene fragment.
Initialization: the description of an object to be tracked.
Training data: labelled images (e.g. images of known species identified by experts) that are used to train a deep learning model.
Precision: the number of true positives divided by the sum of true positives and false positives.
Recall: also called the true positive rate; the number of true positives divided by the sum of true positives and false negatives.
Classification accuracy: the sum of true positives and true negatives divided by the total number of specimens (a worked example follows this glossary).
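The short sketch below turns the last three glossary entries into code, computing precision, recall, and classification accuracy from raw counts of true/false positives and negatives. It is a generic illustration with made-up counts, not results from any of the systems described above.

```python
# Precision, recall, and classification accuracy from raw counts (illustrative example).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def classification_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up counts: 90 correct detections, 10 false alarms, 20 missed specimens,
# and 80 correctly rejected background regions.
tp, fp, fn, tn = 90, 10, 20, 80
print(precision(tp, fp))                        # 0.90
print(recall(tp, fn))                           # ~0.82
print(classification_accuracy(tp, tn, fp, fn))  # 0.85
```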