Article

Fish species identification using a convolutional neural network trained on synthetic data


Abstract

Acoustic-trawl surveys are an important tool for marine stock management and environmental monitoring of marine life. Correctly assigning the acoustic signal to species or species groups is a challenge, and recently trawl camera systems have been developed to support interpretation of acoustic data. Examining images from known positions in the trawl track provides high resolution ground truth for the presence of species. Here, we develop and deploy a deep learning neural network to automate the classification of species present in images from the Deep Vision trawl camera system. To remedy the scarcity of training data, we developed a novel training regime based on realistic simulation of Deep Vision images. We achieved a classification accuracy of 94% for blue whiting, Atlantic herring, and Atlantic mackerel, showing that automatic species classification is a viable and efficient approach, and further that using synthetic data can effectively mitigate the all too common lack of training data. © International Council for the Exploration of the Sea 2018. All rights reserved.


... [Flattened survey-table residue. The citing survey tabulates, across its references: input image sizes (32x32 up to 256x256); test datasets (F4K, LifeCLEF14/15, SIPPER, CADDY-UG, MLC, WHOI-Plankton, ZooScan, PlanktonSet 1.0, ZooLake, Zooglider, and others); data sources (sea, lake, test tubes, internet); numbers of classes (2 to 1611); object types (fish species, phyto-/zooplankton, diatoms, coral reefs, sea cucumbers, underwater objects, diver gestures); evaluation metrics (accuracy, precision/recall/F1, CCR, AUC, kappa, inference time); software frameworks (MATLAB, LibSVM, LibLinear, TensorFlow, Keras, Theano); and training hyperparameters (epochs 13 to 300, base learning rates 1e-5 to 0.1, momentum 0.1 to 0.9, mini-batch sizes 8 to 500). ...
... It also tabulates architectural choices: kernel sizes beyond the usual 3x3/5x5 (4x4 up to 13x13), skip connections, normalization schemes (batch, local response, histogram), pooling variants (spatial pyramid, global average, fractional max, cross convolutional layer, cyclic, MPN-COV), the role of shallow networks in deep+shallow hybrids (classification, preprocessing, feature extraction), and optimizers (SGD, SGD with momentum, Adam, AdaMax, Adagrad, Adadelta, RMSProp). ...
Article
Full-text available
In recent years, there has been enormous interest in using deep learning to classify underwater images in order to identify various objects such as fishes, plankton, coral reefs, seagrass, submarines, and the gestures of sea-divers. This classification is essential for measuring the health and quality of water bodies and for protecting endangered species. It also has applications in oceanography, the marine economy and defense, environmental protection, underwater exploration, and human-robot collaborative tasks. This paper presents a survey of deep learning techniques for underwater image classification. We underscore the similarities and differences of several methods. We believe that underwater image classification is one of the killer applications that will test the ultimate success of deep learning techniques. Towards that goal, this survey seeks to inform researchers about the state of the art in deep learning on underwater images and to motivate them to push its frontiers forward. Index Terms: Deep neural networks, artificial intelligence, autonomous underwater vehicle, transfer learning.
... Acquiring images continuously throughout every trawl haul results in large numbers of images, which places additional burdens on the operators, especially if the data must be manually inspected and processed. Computer vision algorithms (White et al., 2006) and, more recently, general deep learning algorithms (Huang et al., 2014;Salman et al., 2016;Siddiqui et al., 2017;Allken et al., 2018) have been successfully used to alleviate such 'analysis bottlenecks' (Malde et al., 2019). Supervised training of deep neural networks typically requires large amounts of labelled data, and whilst automatic fish species classification, detection and counting are within reach of current technology (Huang et al., 2014;Salman et al., 2016;Siddiqui et al., 2017), sufficient datasets of images with appropriate annotations are required. ...
... The objective of this data paper is to provide the necessary tools to develop effective deep learning systems for automatic analysis of trawl camera data and similar images. We provide a software utility to generate an arbitrary number of simulated images by random arrangement of individual fish against a background (Allken et al., 2018). In addition to the software, we provide images of individual fish and backgrounds, and a set of complete annotated source images to serve as a test dataset against which to evaluate classification algorithms. ...
... To ensure that the dataset is representative and captures variability across surveys and stations, it is desirable that images are taken as uniformly across the stations as possible. We have previously shown (Allken et al., 2018) that including synthetic data in the training set increases accuracy when training data are limited. We attained high classification accuracy (up to 94%) by training on synthetic images generated using only 70 fish cutouts per fish species. ...
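The synthetic-image recipe referenced above (randomly arranging fish cutouts against empty trawl backgrounds) can be sketched in a few lines. This is a minimal single-channel illustration, not the authors' actual generator: the function names, the binary mask format, and the crude integer-only scaling are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def composite(background, cutout, mask):
    # Paste the cutout onto a copy of the background at a random
    # position, with a random horizontal flip and a random integer
    # scale. Details here are illustrative assumptions.
    img = background.copy()
    if rng.random() < 0.5:                       # random horizontal flip
        cutout, mask = cutout[:, ::-1], mask[:, ::-1]
    scale = int(rng.choice([1, 2]))              # crude nearest-neighbour scaling
    cutout = np.kron(cutout, np.ones((scale, scale)))
    mask = np.kron(mask, np.ones((scale, scale))).astype(bool)
    h, w = cutout.shape
    H, W = img.shape
    y = int(rng.integers(0, H - h + 1))          # random paste position
    x = int(rng.integers(0, W - w + 1))
    img[y:y + h, x:x + w][mask] = cutout[mask]   # copy only fish pixels
    return img

bg = np.zeros((64, 64))             # empty background (no fish)
fish = np.full((8, 16), 0.8)        # stand-in fish cutout
fish_mask = np.ones((8, 16), dtype=bool)
synthetic = composite(bg, fish, fish_mask)
```

A real generator would also draw continuous rotations and place several fish per image, but the core idea, compositing labelled cutouts to manufacture annotated training images, is the same.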
Article
Full-text available
Developing high‐performing machine learning algorithms requires large amounts of annotated data. Manual annotation of data is labour‐intensive, and the cost and effort needed are an important obstacle to the development and deployment of automated analysis. In a previous work, we showed that deep learning classifiers can successfully be trained on synthetic images and annotations. Here, we provide a curated set of fish image data and backgrounds, the software tools needed to generate synthetic images and annotations, and annotated real datasets for testing classifier performance. The dataset is constructed from images collected using the Deep Vision system during two surveys, in 2017 and 2018, that targeted economically important pelagic species in the Northeast Atlantic Ocean. We annotated a total of 1,879 images, randomly selected across trawl stations from both surveys, comprising 482 images of blue whiting, 456 images of Atlantic herring, 341 images of Atlantic mackerel, 335 images of mesopelagic fishes and 265 images containing a mixture of the four categories. In total, the curated set contains 1,879 images with 4,328 individual annotations of four species collected inside a sampling trawl. Code for generating synthetic images for training is also provided.
... Recently, automated processing of the data obtained by video cameras has become more common in various industries, and fisheries are no exception. Several studies describe automated fish detection and classification, commonly performed with the aid of deep learning models [11-15]. These studies demonstrate that deep learning models for object detection and classification are efficient tools for processing both on-board and underwater recordings of the catch. ...
... The desired scenario when training a useful deep learning model is a simultaneous decrease in both the training and validation (test) losses [36]. Data augmentation is an effective technique not only for preventing overfitting by introducing additional variance into the dataset, but also for inflating the data with synthetic examples, which is helpful in cases where raw data are limited [11,36,40]. ...
... Geometric augmentations are easy to apply and help to tackle the positional biases that arise when target objects occur in the same area of the training images [36]. Numerous studies report a positive effect of applying these augmentations during training on the resulting performance, typically in object classification [11,28,40,41]. Following the recommendations in the original study, we apply a set of geometric augmentations together with the CP augmentation in our case. ...
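The geometric augmentations discussed above have a very small core. The sketch below, assuming array images and only flips plus 90-degree rotations, shows that core; real pipelines (and the CP augmentation mentioned here) also use translation, cropping, and continuous-angle rotation.

```python
import numpy as np

rng = np.random.default_rng(42)

def geometric_augment(image):
    # Random flips plus a random number of 90-degree rotations.
    # A toy subset of the geometric augmentations the studies apply.
    if rng.random() < 0.5:
        image = image[:, ::-1]       # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]       # vertical flip
    k = int(rng.integers(0, 4))      # 0-3 quarter turns
    return np.rot90(image, k)

img = np.arange(16.0).reshape(4, 4)
aug = geometric_augment(img)
```

Because these transforms only rearrange pixels, labels are preserved exactly, which is why they are so cheap to add to any classification training loop.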
Article
Full-text available
Bycatch in demersal trawl fisheries challenges their sustainability despite the implementation of various gear technical regulations. A step towards extended control over the catch process can be established through a real-time catch monitoring tool that allows fishers to react to unwanted catch compositions. In this study, for the first time in the commercial demersal trawl fishery sector, we introduce an automated catch description that leverages a state-of-the-art region-based convolutional neural network (Mask R-CNN) architecture and builds upon a novel in-trawl image acquisition system. The system is optimized for applications in Nephrops fishery and enables the classification and counting of catch items during fishing operations. The detector's robustness was improved with augmentation techniques applied during training on a custom high-resolution dataset obtained during extensive demersal trawling. The resulting algorithms were tested on video footage representing both the normal towing process and haul-back conditions. The algorithm obtained an F-score of 0.79. The resulting automated catch description was compared with the manual catch count, showing low absolute error during towing. Current practices in demersal trawl fisheries are carried out without any indication of catch composition or of whether the catch enters the fishing gear. Hence, the proposed solution provides a substantial technical contribution to making this type of fishery more targeted, paving the way to further optimization of fishing activities aimed at increasing target catch while reducing unwanted bycatch.
... The FC layer works as a novel function for the model to learn how efficiently every learner contributes. [Flattened survey-table residue. The citing survey tabulates, across its references: feature types (colour, texture, shape, geometric, environmental, hydrographic, geotemporal); classical ML models used or compared (SVM, k-NN, MLP/ANN, PCA, k-means, SRC, random forest, multinomial logistic regression, XRT, GBC); hand-crafted features compared (DenseSIFT, ImageJ, HOG, Gabor, SIFT, BOVW); kernel sizes beyond 1x1 to 5x5 (4x4 up to 13x13); skip connections; normalization schemes (batch, local response, histogram); pooling variants (spatial pyramid, global average, fractional max, cross convolutional layer, cyclic, MPN-COV); the role of shallow networks in deep+shallow hybrids; optimizers (SGD, SGD with momentum, Adam, AdaMax, Adagrad, Adadelta, RMSProp); loss functions (cross-entropy, hinge); overfitting countermeasures (dropout, per-class image caps); class-imbalance handling (class reweighting, data standardization, data augmentation); and other design features (leaky/parametric ReLU, feature fusion, depthwise separable and asymmetric convolutions, multi-scale images, SE blocks, SFFS, DECONV and SPP layers).] Training on geometric data is done separately using an MLP configuration before feeding into the collaborative network. The MLP configuration is tuned using feature engineering of the input data and by adding Dropout layers. ...
... Most of the proposed work is based on CNNs; some used a modified version of the VGG-16 model [5]-[7], while other researchers used AlexNet-based models [8]. In addition, newly proposed methods were examined and compared with known algorithms [9]-[12] to demonstrate better results. Some authors developed fish identification and classification methods for the underwater environment [5], where videos are captured for data collection and monitoring; these videos are challenging because underwater images suffer from various noise issues in the background [11]. ...
... Then image segmentation takes place in the pre-processing stage to partition the image into three parts (head, body, and tail). Subsequently, the images are normalized to convert them into a set of numbers between zero and one [9]. The output of the previous operations is stored in a comma-separated values (CSV) file; once the CSV file is produced, the dataset can be fed to the neural network. ...
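The normalize-then-store-as-CSV step described above might look like the following sketch. The min-max scaling into [0, 1] and the row layout (label first, then flattened pixels) are assumptions, since the cited paper does not spell them out.

```python
import csv
import io
import numpy as np

def normalise(image):
    # Min-max scale pixel values into [0, 1].
    image = image.astype(float)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)

# One flattened image segment per CSV row, label first. The column
# layout is an assumption; the cited work does not specify it.
images = [np.array([[0, 128], [255, 64]])]
labels = ["head"]

buf = io.StringIO()                  # stands in for a file on disk
writer = csv.writer(buf)
for label, img in zip(labels, images):
    writer.writerow([label] + normalise(img).ravel().round(4).tolist())
csv_text = buf.getvalue()
```

In practice the `io.StringIO` buffer would be replaced with `open("dataset.csv", "w", newline="")`, and the rows would later be parsed back into arrays before training.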
Conference Paper
Full-text available
Consumers in fish markets around the world face problems in identifying fish species and need expert assistance to do so. The situation is the same in the Gaza Strip: the local fish market lacks such an application, which exposes people to fraud by some sellers. Moreover, many poisonous and exposed fishes have been caught and sold in the fish market in the Gaza Strip. Thus, in this work, an innovative mobile application is proposed for identifying fish species that are commonly available in the Mediterranean Sea and therefore in the local fish market. A considerable number of fish images were obtained as a dataset from the El-Hesba and auction markets located next to the Gaza Fishing Harbor, as well as from aquaculture farms. AlexNet was chosen for the proposed model, with ReLU and Softmax used in the main network. The model exhibits an accuracy of 80%, a precision of 0.788, and a recall of 0.631.
... Because of the difficulty of collecting image data on species such as blue whiting, Atlantic herring and Atlantic mackerel, Allken et al. (2018) augmented their dataset with synthetic fish images [31]. These were generated by randomly selecting a cropped image of a fish and placing it on an empty background, i.e., an image in which there are no fish or other objects. ...
Article
Full-text available
People from all around the world face problems in the identification of fish species and need access to scientific expertise to do so; the situation is no different for Mauritians. Thus, in this project, an innovative smartphone application has been developed for the identification of fish species commonly found in the lagoons and coastal areas of Mauritius, including estuaries and the outer reef zones. Our dataset consists of 1,520 images, with 40 images for each of the 38 fish species studied. Eighty percent of the data was used for training, ten percent for validation and the remaining ten percent for testing. All the images were first converted to grayscale before the application of a Gaussian blur to remove noise. A thresholding operation was then performed on the images in order to separate the fish from the background. This enabled us to draw a contour around each fish, from which several features were extracted. A number of classifiers, such as kNN, support vector machines, neural networks, decision trees and random forests, were compared to find the best-performing one. In our case, the kNN algorithm achieved the highest accuracy, 96%. Another recognition model was created using the TensorFlow framework, which produced an accuracy of 98%. The results demonstrate the effectiveness of the software for fish identification; in the future, we intend to increase the number of fish species in our dataset and to tackle challenging issues such as partial occlusions and pose variations through the use of more powerful deep learning architectures.
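The classical pipeline described in this abstract (grayscale, blur, threshold, contour-based features) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the box blur stands in for their Gaussian blur, and the area and bounding-box aspect ratio are assumed examples of the "several features" they extract.

```python
import numpy as np

def to_grayscale(rgb):
    # Luminance-weighted grayscale conversion (BT.601 weights).
    return rgb @ np.array([0.299, 0.587, 0.114])

def box_blur(gray, k=3):
    # Cheap k x k box filter standing in for the Gaussian blur.
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.zeros_like(gray, dtype=float)
    h, w = gray.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def segment_features(gray, thresh):
    # Threshold, then compute simple shape features from the
    # foreground mask; the exact features are assumed examples.
    mask = gray > thresh
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    return {"area": int(mask.sum()), "aspect": width / height}

rgb = np.ones((2, 2, 3)) * 255.0       # tiny white "image"
g = to_grayscale(rgb)                  # stays 255 (weights sum to 1)

gray = np.zeros((10, 10))
gray[3:6, 2:9] = 200.0                 # a bright fish-shaped blob
smoothed = box_blur(gray)
feats = segment_features(gray, thresh=100)
```

The resulting feature vectors would then be fed to the classifiers the abstract lists (kNN, SVM, decision trees, and so on).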
... The methodology of this study is shown in Figure 30, with decision tree and biochemistry analysis (AST, TP, TRIG, CHOL, and GLU collected from blood) [100]. Allken et al. [101] used the Deep Vision camera to take images of marine stock. These images are the material for deploying a deep learning neural network to automate the classification of species. ...
... The results show that a classification accuracy of 94% was achieved for blue whiting, Atlantic herring, and Atlantic mackerel, showing that automatic species classification is a viable and efficient approach, and further that using synthetic data can effectively mitigate the all too common lack of training data. Figure 31 presents the procedure used to identify the fish [101]. Sample fish images are extracted from actual ones and attached to an empty background at arbitrary positions, with random sizes and orientations. ...
Article
Full-text available
Smart aquaculture is nowadays one of the sustainable development trends for the aquaculture industry in intelligence and automation. Modern intelligent technologies have brought huge benefits to many fields, including aquaculture, by reducing labor, enhancing aquaculture production, and being friendly to the environment. Machine learning is a subdivision of artificial intelligence (AI) that uses trained algorithm models to recognize and learn traits from the data it observes. To date, there are several studies on applications of machine learning for smart aquaculture, including measuring size, weight, grading, disease detection, and species classification. This review provides an overview of the development of smart aquaculture and intelligent technology. We summarize 100 articles from nearly 10 years of work on machine learning in smart aquaculture, covering the methodology and results as well as the recent technology that should be used for the development of smart aquaculture. We hope that this review will give readers interested in this field useful information.
... Figure 1 shows some relevant examples. Second, real-world datasets for classification are limited in scale compared with benchmark datasets, with restricted representative power in terms of the number of species (Allken et al., 2019; Costa et al., 2013; Ding et al., 2017; Larsen et al., 2009; Lee et al., 2008; Ogunlana et al., 2015; Rathi et al., 2018; Rauf et al., 2019) or the number of images per species (Lee et al., 2003; Rodrigues et al., 2010). This is especially true for rare species (Villon et al., 2021). ...
... It also saves model development time and boosts classification performance, especially when the available task-specific training sets are small (Yosinski et al., 2014). This technique has already been successfully applied in other prior works on fish classification (Allken et al., 2019;Siddiqui et al., 2018) and fish detection (Salman et al., 2019). ...
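The transfer-learning idea referenced above (reuse a pretrained feature extractor, train only a small task-specific head) can be illustrated without any deep-learning framework. In the toy sketch below, a frozen random projection merely stands in for pretrained convolutional layers; the data, sizes, and training loop are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pretrained" feature extractor: a fixed projection standing in for
# the frozen convolutional layers of a network such as those used in
# the cited fish-classification work. It is never updated.
W_frozen = rng.normal(size=(8, 4))

def features(x):
    return np.tanh(x @ W_frozen)

def train_head(X, y, steps=500, lr=0.5):
    # Train only a logistic-regression head on top of the frozen
    # features: the essence of transfer learning with small datasets.
    F = features(X)
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid predictions
        grad = p - y                             # dLoss/dlogits
        w -= lr * F.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy data: two well-separated clusters, 20 samples each.
X = np.vstack([rng.normal(-2, 0.3, size=(20, 8)),
               rng.normal(+2, 0.3, size=(20, 8))])
y = np.concatenate([np.zeros(20), np.ones(20)])
w, b = train_head(X, y)
preds = (1.0 / (1.0 + np.exp(-(features(X) @ w + b))) > 0.5).astype(float)
acc = float((preds == y).mean())
```

With a real pretrained network the principle is identical: keep the feature-extraction weights fixed (or fine-tune them gently) and fit only the final classification layer on the small labelled fish dataset.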
Article
Full-text available
1. Species classification is an important task that is the foundation of industrial, commercial, ecological, and scientific applications involving the study of species distributions, dynamics, and evolution. 2. While conventional approaches for this task use off‐the‐shelf machine learning (ML) methods such as existing Convolutional Neural Network (ConvNet) architectures, there is an opportunity to inform the ConvNet architecture using our knowledge of biological hierarchies among taxonomic classes. 3. In this work, we propose a new approach for species classification termed Hierarchy‐Guided Neural Network (HGNN), which infuses hierarchical taxonomic information into the neural network’s training to guide the structure and relationships among the extracted features. We perform extensive experiments on an illustrative use‐case of classifying fish species to demonstrate that HGNN outperforms conventional ConvNet models in terms of classification accuracy, especially under scarce training data conditions. 4. We also observe that HGNN shows better resilience to adversarial occlusions, when some of the most informative patch regions of the image are intentionally blocked and their effect on classification accuracy is studied.
... Some projects are still in the process of applying a recognition system in an uncontrolled and complicated environment such as the sea floor, in addition to having a camera in motion (Chuang et al. 2016;Siddiqui et al. 2018;Allken et al. 2019). For example, Chuang et al. (2016) use a natural environment with the help of a moving robot, so having control of the background as the previous examples is impossible. ...
... Siddiqui et al. (2018) work with images of fish in their natural environment and use cropped training images; in that work the authors are not looking for a region of interest and are still at a pre-implementation stage for an identification system with multiple detections of fish, pointing out their coordinates in a more complex image. Allken et al. (2019) developed a fish identification system trained on a simulated environment. The distance between the sea bottom and the fish allows the system to obtain contrasting images of the two elements, facilitating their separation. ...
Article
Full-text available
Knowledge and monitoring of invasive species are fundamental for determining their short- and long-term effects on invaded ecosystems, and for developing strategies to control the problem or find a specific solution. In this context, the lionfish is an invasive species that worries fisheries and marine conservation managers and scientists, owing to the invaded area, which has spread from the east coast of the United States to the coasts of Brazil and has recently extended to include the Mediterranean Sea. The invasive fish's diet consists of small fish species, crustaceans and invertebrates; the consequent damage is a decrease in food for species at the next level of the food chain and a lack of species to keep coral reefs healthy. In this paper, we propose a lionfish detection system to be installed in an autonomous underwater vehicle, as part of a monitoring strategy that will allow real-time determination of the number of lionfish and their location, without human intervention. We compared two detection systems, YOLOv4 and SSD-Mobilenet-v2; by training with cross-validation and evaluating on the test set, we obtained a best model with 63.66% recall, 89.79% precision, and 79.15% mAP on images in the natural environment, implemented on NVIDIA's Jetson Nano embedded system.
... By inputting the swimming trajectories of different fish under different water quality into the model, the user can easily judge the water environment. Based on this, the water environment could be continuously monitored and be effectively managed, which will be helpful in achieving the sustainable exploitation of marine natural resources [5,6]. ...
... Although details can be removed by the erosion operation, the necessary classification features are also eliminated. Therefore, it is necessary to perform the dilation operation, whose logical procedure is shown in Formula (6). ...
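The erosion-then-dilation sequence this excerpt describes is standard binary morphology (together they form a morphological opening). The sketch below is a generic NumPy illustration with a square structuring element, not a reproduction of the paper's Formula (6):

```python
import numpy as np

def dilate(img, k=3):
    """Binary dilation: a pixel becomes 1 if ANY pixel under the
    k x k structuring element is 1 (the logical dual of erosion)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def erode(img, k=3):
    """Binary erosion: a pixel stays 1 only if the WHOLE neighbourhood is 1."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=1)
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out
```

Dilating after eroding restores the gross shape of surviving regions while the small details removed by erosion stay removed.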
Article
Full-text available
With the development of computer science and technology, the theory and methods of image segmentation are widely used in fish discrimination, which plays an important role in improving the efficiency of fisheries sorting and biodiversity studies. However, the existing methods of fish image segmentation are less accurate and inefficient, and are worthy of in-depth exploration. Therefore, this paper proposes an atrous pyramid GAN segmentation network aimed at increasing accuracy and efficiency. This paper introduces an atrous pyramid structure, and a GAN module is added before the CNN backbone in order to augment the dataset. The atrous pyramid structure first fuses the input and output of the dilated convolutional layer with a small sampling rate and then feeds the fused features into the subsequent dilated convolutional layer with a large sampling rate to obtain dense multiscale contextual information. Thus, by capturing richer contextual information, this structure improves the accuracy of segmentation results. In addition to the aforementioned innovation, various data enhancement methods, such as MixUp, Mosaic, CutMix, and CutOut, are used in this paper to enhance the model's robustness. This paper also improves the loss function and uses the label smoothing method to prevent model overfitting. The improvements are also tested by extensive ablation experiments. As a result, our model's F1-score, GA, and MIoU were measured on the validation dataset, reaching 0.961, 0.981, and 0.973, respectively. This experimental result demonstrates that the proposed model outperforms all the other contrast models. Moreover, in order to accelerate the deployment of the encapsulated model on hardware, this paper optimizes the execution time of the matrix multiplication method on Hbird E203 based on Strassen's algorithm to ensure the efficient operation of the model on this hardware platform.
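Two of the techniques named in the abstract, label smoothing and MixUp, are compact enough to sketch. The smoothing factor and mixing coefficient below are illustrative values, not the paper's settings:

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: soften hard 0/1 targets so the network is never
    pushed toward infinitely confident predictions (reduces overfitting)."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / n_classes

def mixup(x1, y1, x2, y2, lam=0.7):
    """MixUp augmentation: a convex combination of two samples and labels."""
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

y = np.eye(3)[[0, 2]]               # two one-hot targets over 3 classes
y_smooth = smooth_labels(y, eps=0.1)
```

Each smoothed row still sums to 1, so it remains a valid target distribution for a cross-entropy loss.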
... However, state-of-the-art machine learning methods, such as deep learning, typically require a large number of labelled examples to train on. Therefore, one of the principal bottlenecks in using machine learning to facilitate analysis of marine data is the current lack of annotated data [1,2,7]. ...
... As scarce data can impede machine learning techniques, several approaches, including data generation and augmentation, have been proposed to improve performance. For example, data generation was used in a study by Allken et al. [7], where synthetic samples were leveraged to expand the training dataset for a deep learning algorithm identifying fish species from video data. ...
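The synthetic-sample idea credited to Allken et al. [7] can be caricatured as pasting annotated object crops onto real background frames, which yields labelled training images "for free". The actual simulation pipeline in that work is more elaborate; this is only a minimal sketch of the compositing step, with placeholder arrays standing in for a camera frame and a segmented fish crop:

```python
import numpy as np

def composite(background, crop, top, left):
    """Paste an annotated object crop onto a background frame, producing
    a synthetic training image whose bounding-box label is known exactly."""
    out = background.copy()
    h, w = crop.shape[:2]
    out[top:top + h, left:left + w] = crop
    # The label comes for free from the paste position: (x1, y1, x2, y2).
    label = (left, top, left + w, top + h)
    return out, label

bg = np.zeros((64, 64), dtype=np.uint8)        # stand-in for a camera frame
fish = np.full((16, 24), 255, dtype=np.uint8)  # stand-in for a fish crop
img, box = composite(bg, fish, top=10, left=20)
```

Repeating this with varied positions, scales and backgrounds expands a small annotated set into a large training set.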
Article
Full-text available
Driven by the unprecedented availability of data, machine learning has become a pervasive and transformative technology across industry and science. Its importance to marine science has been codified as one goal of the UN Ocean Decade. While increasing amounts of, for example, acoustic marine data are collected for research and monitoring purposes, and machine learning methods can achieve automatic processing and analysis of acoustic data, they require large training datasets annotated or labelled by experts. Consequently, addressing the relative scarcity of labelled data is, besides increasing data analysis and processing capacities, one of the main thrust areas. One approach to address label scarcity is the expert-in-the-loop approach which allows analysis of limited and unbalanced data efficiently. Its advantages are demonstrated with our novel deep learning-based expert-in-the-loop framework for automatic detection of turbulent wake signatures in echo sounder data. Using machine learning algorithms, such as the one presented in this study, greatly increases the capacity to analyse large amounts of acoustic data. It would be a first step in realising the full potential of the increasing amount of acoustic data in marine sciences.
... Of those 144 images, 80 were set aside for training, consisting of 40 samples with pesticide. The Random Forest classifier achieved the best performance: 96.87% for the fish pupil and 93.75% for the fish eye. [21] also applied a CNN to the identification of fish specimens, but trained it with synthetic data of blue whiting, Atlantic herring, and Atlantic mackerel. ...
... Rathi et al. (2017) [20] obtained good results using Convolutional Neural Networks (CNN) and image processing for fish image classification in an underwater environment, achieving an accuracy of 96.29% tested on the Fish4Knowledge dataset containing 27,142 images. Allken et al. (2018) ...
Conference Paper
Full-text available
Color recognition is an important step for computer vision to be able to recognize objects in the most diverse environmental conditions. Classifying objects by color using computer vision is a good alternative for varied color conditions such as those of an aquarium, where it is possible to use the resources of a smartphone with real-time image classification applications. This paper presents experimental results on the use of five different feature extraction techniques for the problem of fish species identification. The feature extractors tested are the Bag of Visual Words (BoVW), the Bag of Colors (BoC), the Bag of Features and Colors (BoFC), the Bag of Colored Words (BoCW), and histograms in the HSV and RGB color spaces. The experiments were performed using a dataset, which is also a contribution of this work, containing 1120 images of fishes from 28 different species. The feature extractors were tested under three different supervised learning setups based on Decision Trees, K-Nearest Neighbors, and Support Vector Machines. Of the feature extraction techniques described, the best performance was achieved by BoC with a Support Vector Machine classifier, with an F-measure of 0.90 and an AUC of 0.983348 using a dictionary size of 2048.
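The colour-based descriptors compared above (BoC and the HSV/RGB histograms) all reduce an image to a colour distribution. Below is a minimal fixed-grid variant; note that BoC proper learns its colour dictionary with clustering (e.g. k-means), which this sketch replaces with a fixed per-channel binning for brevity:

```python
import numpy as np

def color_histogram(img, bins_per_channel=4):
    """Simple colour descriptor: quantise each RGB channel into a few bins
    and count pixels per colour cell, then L1-normalise the counts."""
    step = 256 // bins_per_channel
    q = (img // step).astype(int)                    # per-channel bin index
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

img = np.zeros((8, 8, 3), dtype=np.uint8)  # all-black toy image
h = color_histogram(img)
```

The resulting fixed-length vector can be fed to any of the classifiers mentioned (Decision Tree, KNN, SVM).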
... Most of the instrumentation proposed for each module has already been developed and implemented in some existing cabled observatories, though significant improvements in capabilities and reductions in cost are needed (see review in Aguzzi et al. 2019). In addition, much of the software needed to realise large-scale observatory networks that are useful to fisheries scientists, resource managers, and ecologists is still in the early stages of development (Allken et al. 2018, Juanes 2018, Marini et al. 2018a). Therefore, the development of data delivery systems that are accessible to a wide range of stakeholders from different disciplines and backgrounds is of vital importance for the effective use of ocean observatories for fisheries and ecological applications (Pearlman et al. 2019). ...
... Automated detection and classification methodologies based on the various observation technologies are rapidly advancing (e.g. Allken et al. 2018, Juanes 2018, Marini et al. 2018b). However, we suggest that the concept of an ecosystem observatory user data interface would greatly enhance the application, testing, and quality control of detection algorithms by providing a simple computer interface for user-aided system learning (see 'Cyber developments in support of monitoring networks' section subsequently; Figure 8). ...
Chapter
Full-text available
Four operational factors, together with high development cost, currently limit the use of ocean observatories in ecological and fisheries applications: 1) limited spatial coverage, 2) limited integration of multiple types of technologies, 3) limitations in the experimental design for in situ studies, and 4) potential unpredicted bias in monitoring outcomes due to the infrastructure's presence and functioning footprint. To address these limitations, we propose a novel concept of a standardised 'ecosystem observatory module' structure composed of a central node and three tethered satellite pods together with permanent mobile platforms. The module would be designed with a rigid spatial configuration to optimise overlap among multiple observation technologies, each providing 360° coverage of a cylindrical or hemi-spherical volume around the module, including permanent stereo video cameras, acoustic imaging sonar cameras, horizontal multibeam echosounders, and a passive acoustic array. The incorporation of multiple integrated observation technologies would enable unprecedented quantification of macrofaunal composition, abundance, and density surrounding the module, as well as the ability to track the movements of individual fishes and macroinvertebrates. Such a standardised modular design would allow for the hierarchical spatial connection of observatory modules into local module clusters and larger geographic module networks providing synoptic data within and across linked ecosystems suitable for fisheries and ecosystem-level monitoring on multiple scales.
... Machine learning systems are typically trained to emulate human curation, and thus a natural application is to use such systems to replace labor-intensive steps in existing analysis pipelines. Reliance on manual curation currently limits effective data use, and automatic systems can reduce cost or increase throughput, for instance in identifying fish species from images (Allken et al., 2019; Siddiqui et al., 2018; Villon et al., 2018), fish trajectory estimation (Beyan et al., 2018), or automatic age reading of otoliths (Moen et al., 2018). The latter is perhaps of particular interest, as it demonstrates that deep learning can obtain an accuracy comparable to human curators. ...
... PyTorch (Paszke et al., 2017) is another popular framework combining ease of use with expressive power. These frameworks are general and can be adapted to challenges in the marine domain with relative ease (Allken et al., 2019;Moen et al., 2018;Siddiqui et al., 2018;Villon et al., 2018). The vast number of online tutorials and documentation is a major advantage, and pre-trained models are available from public repositories (often referred to as model zoos). ...
Article
Oceans constitute over 70% of the earth's surface, and the marine environment and ecosystems are central to many global challenges. Not only are the oceans an important source of food and other resources, but they also play important roles in the earth's climate and provide crucial ecosystem services. To monitor the environment and ensure sustainable exploitation of marine resources, extensive data collection and analysis efforts form the backbone of management programmes on global, regional, or national levels. Technological advances in sensor technology, autonomous platforms, and information and communications technology now allow marine scientists to collect data in larger volumes than ever before. But our capacity for data analysis has not progressed comparably, and the growing discrepancy is becoming a major bottleneck for effective use of the available data, as well as an obstacle to scaling up data collection further. Recent years have seen rapid advances in the fields of artificial intelligence and machine learning, and in particular, so-called deep learning systems are now able to solve complex tasks that previously required human expertise. This technology is directly applicable to many important data analysis problems and it will provide tools that are needed to solve many complex challenges in marine science and resource management. Here we give a brief review of recent developments in deep learning, and highlight the many opportunities and challenges for effective adoption of this technology across the marine sciences.
... Inception-V3 was used in many classification problems like flower [4] and fish species identification [5]. More specifically, Inception-V3 was used to classify animal images from the Caltech Camera Traps (CCT) dataset into 16 different categories with an accuracy of 79.17% [6]. ...
... Fish species classification [7], computer vision [6], fish species identification [8]. Figure 1. Applications of ML in fish image identification. ...
Article
Full-text available
Wetlands are habitats commonly used for fish cultivation. South Kalimantan is one of the provinces with a wetland area, covering 11,707,400 ha, with 67 rivers and an estimated 200 species of fish. This reflects an abundant wealth of fish and their economic value. The study of fish identification is an important subject for the preservation of wetland fish. In the field of artificial intelligence, identification can be done using Machine Learning (ML). There are many libraries (collections of functions that can be used in ML development), one of which is Tensorflow. In this paper, we survey a variety of literature on the use of Tensorflow, as well as datasets, algorithms, and methods that can be used in developing fish image identification applications for wetland areas. The results of the literature survey show that Tensorflow can be used for the development of fish identification applications. There are many datasets that can be used, such as MNIST, Oxford-17, Oxford-102, LHI-Animal-Faces, Taiwan marine fish, and KTH-Animal, and models such as NASNet, ResNet, and MobileNet. Classification methods that can be used to classify fish images include CNN, R-CNN, DCNN, Fast R-CNN, kNN, PNN, Faster R-CNN, SVM, LR, RF, PCA and KFA. Tensorflow provides many models that can be used for image classification, including Inception-v3 and MobileNets, and supports models such as CNN, RNN, RBM, and DBN. To speed up the classification process, image dimensions can be reduced using the MDS, LLE, Isomap, and SE algorithms.
... Whilst synthetic image generation increases the size and diversity of species or species groups in the minority class, it must also ensure that the data remain representative of them and that no significant noise is introduced. Allken et al. (2019) and Zheng et al. (2018) augmented training datasets for fish detection and recognition by applying different transformations (i.e., flipping, shifting, blurring, rotating and scaling) to annotated fish from training images. ...
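The geometric transformations listed in the excerpt are simple array operations. A NumPy sketch of two of them, flipping and shifting, assuming zero-filled borders for shifts (real pipelines may instead reflect or wrap the border):

```python
import numpy as np

def flip_lr(img):
    """Horizontal flip: a label-preserving transformation for fish images."""
    return img[:, ::-1]

def shift(img, dy, dx):
    """Translate the image by (dy, dx) pixels, zero-filling the exposed border."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    ys = slice(max(dy, 0), min(h, h + dy))
    xs = slice(max(dx, 0), min(w, w + dx))
    yd = slice(max(-dy, 0), min(h, h - dy))
    xd = slice(max(-dx, 0), min(w, w - dx))
    out[ys, xs] = img[yd, xd]
    return out

a = np.arange(9).reshape(3, 3)  # tiny toy "image"
```

Rotation, blurring and scaling follow the same pattern with interpolation or convolution added.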
Article
Electronic monitoring (EM) is increasingly used to monitor catch and bycatch in wild capture fisheries. EM video data are still manually reviewed, adding to ongoing management costs. Computer vision, machine learning, and artificial intelligence-based systems are seen as the next step in automating EM data workflows. Here we show some of the obstacles we have confronted, and the approaches taken, as we develop a system to automatically identify and count target and bycatch species using cameras deployed on an industry vessel. A Convolutional Neural Network was trained to detect and classify target and bycatch species groups, and a visual tracking system was developed to produce counts. The multiclass detector achieved a mean Average Precision of 53.42%. Based on the detection results, the visual tracking system provided automatic fish counts for the test video data. Automatic counts were within two standard deviations of the manual counts for the target species, and most of the time for the bycatch species. Unlike other recent attempts, weather and lighting conditions were largely controlled by mounting cameras under cover.
... Although there are still challenges ahead, machine learning (ML) is entering marine science on a broad scale (Malde et al. 2020), with applications such as automated fish identification from images (Allken et al. 2019) and age-reading from otoliths (Moen et al. 2018). ...
Article
Full-text available
Marine organisms are subject to environmental variability on various temporal and spatial scales, which affect processes related to growth and mortality of different life stages. Marine scientists are often faced with the challenge of identifying environmental variables that best explain these processes, which, given the complexity of the interactions, can be like searching for a needle in the proverbial haystack. Even after initial hypothesis-based variable selection, a large number of potential candidate variables can remain if different lagged and seasonal influences are considered. To tackle this problem, we propose a machine learning framework that incorporates important steps in model building, ranging from environmental signal extraction to automated variable selection and model validation. Its modular structure allows for the inclusion of both parametric and machine learning models, like random forest. Unsupervised feature extractions via empirical orthogonal functions (EOFs) or self-organising maps (SOMs) are demonstrated as a way to summarize spatiotemporal fields for inclusion in predictive models. The proposed framework offers a robust way to reduce model complexity through a multi-objective genetic algorithm (NSGA-II) combined with rigorous cross-validation. We applied the framework to recruitment of the North Sea cod stock and investigated the effects of sea surface temperature (SST), salinity and currents on the stock via a modified version of random forest. The best model (5-fold CV r² = 0.69) incorporated spawning stock biomass and EOF-derived time series of SST and salinity anomalies acting through different seasons, likely relating to differing environmental effects on specific life-history stages during the recruitment year.
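EOF extraction as described in the abstract amounts to a singular value decomposition of the anomaly matrix of a spatiotemporal field: the right singular vectors are the spatial patterns (EOFs) and the scaled left singular vectors are their time series, usable as model covariates. A toy sketch with a single planted spatial pattern; the synthetic field below is purely illustrative, not the study's SST data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy spatiotemporal field: 120 time steps over 50 grid cells, built from
# one dominant spatial pattern modulated by a random amplitude, plus noise.
pattern = np.sin(np.linspace(0, np.pi, 50))
amplitude = rng.normal(size=120)
field = np.outer(amplitude, pattern) + 0.1 * rng.normal(size=(120, 50))

# EOF analysis: SVD of the anomaly (time x space) matrix.
anomalies = field - field.mean(axis=0)
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
eof1 = Vt[0]             # leading spatial pattern
pc1 = U[:, 0] * s[0]     # its time series (principal component)

explained = s[0] ** 2 / np.sum(s ** 2)  # variance fraction of EOF 1
```

The recovered `pc1` tracks the planted amplitude (up to sign, which is arbitrary in an SVD), which is exactly what makes EOF time series useful as compact predictors.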
... Further, to deal with small samples and limited datasets, researchers have also developed several methods to classify fish based on transfer learning techniques. Jin and Liang (2017) and Allken et al. (2019) pre-trained on the ImageNet classification dataset to acquire model parameters, and subsequently fine-tuned the CNN model on the actual dataset to classify fish. Conversely, Mathur et al. (2020) utilized cross-convolutional-layer pooling to improve the classification accuracy of transfer learning with a pre-trained CNN model, reaching an accuracy of 98.03%. ...
Article
Against the background of developments in automation and intelligence, machine learning technology has been extensively applied in aquaculture in recent years, providing a new opportunity for the realization of digital fishery farming. In the present paper, the machine learning algorithms and techniques adopted in intelligent fish aquaculture in the past five years are expounded, and the application of machine learning in aquaculture is explored in detail, including the evaluation of fish biomass, the identification and classification of fish, behavioral analysis, and the prediction of water quality parameters. Further, the application of machine learning algorithms in aquaculture is outlined, and the results are analyzed. Finally, several current problems in aquaculture are highlighted, and the development trend is considered.
... A notable subtype of supervised machine learning that has risen to prominence in recent years is that of artificial neural networks, so named due to their resemblance to human brain neurons (Hopfield, 1982). Neural networks are excellent at learning complex patterns within labeled datasets and can be trained on a broad range of data types beyond numerical matrices, for example, sound files (e.g., Sprengel et al., 2016), still images (Allken et al., 2019), or video (Fan et al., 2016). These data, however, must be labeled with some form of output-it is this label that the neural network is ultimately being trained to predict. ...
Article
Aim: The study of biogeographic barriers is instrumental in understanding the evolution and distribution of taxa. With the increasing availability of empirical datasets, emergent patterns can be inferred from communities by synthesizing how barriers filter and structure populations across species. We assemble phylogeographic data across a barrier and perform spatially explicit simulations, quantifying spatiotemporal patterns of divergence, the influence of traits on these patterns, and the statistical power needed to differentiate diversification modes.
Taxon: Vertebrates, invertebrates, plants.
Location: North America.
Methods: We incorporate published datasets, from papers that match relevant keywords, to examine taxa around the Cochise Filter Barrier, separating the Sonoran and Chihuahuan Deserts of North America, to synthesize phylogeographic structuring across the communities with respect to organismal functional traits. We then use simulation and machine learning to assess the power of phylogeographic model selection.
Results: Taxa distributed across the Cochise Filter Barrier show heterogeneous responses to the barrier in levels of gene flow, phylogeographic structure, divergence timing, barrier width, and divergence mechanism. These responses correlate with locomotor and thermoregulatory traits. Many taxa show a Pleistocene population genetic break, often with introgression after divergence. Allopatric isolation and isolation by environment are the primary mechanisms structuring genetic divergence within taxa. Simulations reveal that in spatially explicit isolation-with-migration models across the barrier, age of divergence, presence of gene flow, and presence of isolation by distance can confound the interpretation of evolutionary history and model selection by producing easily confusable results. We re-analyze five empirical genetic datasets to illustrate the utility of these simulations despite these constraints.
Main Conclusions: By synthesizing phylogeographic data for the Cochise Filter Barrier, we show that barriers interact with species traits to differentiate taxa in communities over millions of years. Identifying diversification modes across the barrier for these taxa remains challenging because commonly invoked demographic models may not be identifiable across a range of likely parameter space.
... CNNs [25,26] use deep architectures which are very useful for image processing. In a biological context CNNs have been used to classify plants [27,28] and fish [29][30][31][32][33][34]. ...
Article
Full-text available
Visual characteristics are among the most important features for characterizing the phenotype of biological organisms. Color and geometric properties define population phenotype and allow assessing diversity and adaptation to environmental conditions. To analyze geometric properties, classical morphometrics relies on biologically relevant landmarks which are manually assigned to digital images. Assigning landmarks is tedious and error prone, and predefined landmarks may miss information which is not obvious to the human eye. The machine learning (ML) community has recently proposed new data analysis methods which, by uncovering subtle features in images, obtain excellent predictive accuracy. Scientific credibility demands, however, that results are interpretable, and hence that the black-box nature of ML methods be mitigated. To this end we apply complementary methods and investigate internal representations with saliency maps to reliably identify location-specific characteristics in images of Nile tilapia populations. Analyzing fish images sampled from six Ethiopian lakes reveals that deep learning improves on a conventional morphometric analysis in predictive performance. A critical assessment of established saliency maps with a novel significance test reveals, however, that the improvement is aided by artifacts which have no biological interpretation. More interpretable results are obtained by a Bayesian approach which allows us to identify genuine Nile tilapia body features that differ depending on the animals' habitat. We find that automatically inferred Nile tilapia body features corroborate and expand the results of a landmark-based analysis: the anterior dorsum, the fish belly, the posterior dorsal region and the caudal fin show signs of adaptation to the fish habitat.
We may thus conclude that Nile tilapia show habitat specific morphotypes and that a ML analysis allows inferring novel biological knowledge in a reproducible manner.
... Furthermore, some generated data were used to augment synthetic approaches, helping to model cross-scale phenomena (Makowski et al. 2019). In almost all situations, the emphasis was placed not on the synthetic data, which were a methodological step, but on answering a broader research question (Allken et al. 2019). ...
Article
Many experiments are not feasible to conduct as factorials. Simulations using synthetically generated data are viable alternatives to such factorial experiments. The main objective of the present research is to develop a methodology and a platform to synthetically generate spatially explicit forest ecosystems represented by points with a predefined spatial pattern. Using algorithms with polynomial complexity and parameters that control the number of clusters, the degree of clusterization, and the proportion of nonrandom trees, we show that spatially explicit forest ecosystems can be generated time-efficiently, which enables large factorial simulations. The proposed method was tested on 1200 synthetically generated forest stands, each of 25 ha, using 10 spatial indices: the Clark-Evans aggregation index, Ripley's K, Besag's L, Morisita's dispersion index, the Greig-Smith index, the size dominance index of Hui, the index of nonrandomness of Pielou, the directional index and mean directional index of Corral-Rivas, and the size differentiation index of Von Gadow. The size of individual trees was randomly generated, aiming at variograms similar to those of real forests. We obtained forest stands with the expected spatial arrangement and distribution of sizes in less than one hour. To ensure replicability of the study, we have provided free, fully functional software that executes the stated tasks.
... Accurate identification of species is the basis of taxonomic research. Handegard et al. [228] used a deep learning model to classify the species present in the image automatically. ...
Article
Full-text available
Object detection is a fundamental but challenging issue in the field of generic image analysis; it plays an important role in a wide range of applications and has been receiving special attention in recent years. Although numerous methods exist, an in-depth review of the literature concerning generic detection has been lacking. This paper provides a comprehensive survey of recent advances in visual object detection with deep learning. Covering about 300 publications, we survey 1) region proposal-based object detection methods such as R-CNN, SPPnet, Fast R-CNN, Faster R-CNN, Mask R-CNN, R-FCN, and FPN; 2) classification/regression-based object detection methods such as YOLO (v2 to v5), SSD, DSSD, RetinaNet, RefineDet, CornerNet, EfficientDet, and M2Det; and 3) some of the latest detectors, such as the relation network for object detection, DCN v2, and NAS-FPN. Moreover, five publicly available benchmark datasets and their standard evaluation metrics are also discussed. We mainly focus on the application of deep learning architectures to five major applications, namely object detection in surveillance, military, transportation, medical, and daily life settings. In the survey, we cover a variety of factors affecting detection performance in detail, such as i) a wide range of object categories and intra-class variations, and ii) limited storage capacity and computational power. Finally, we finish the survey by identifying fifteen current trends and promising directions for future research.
... Indeed, ecologists collect untold exabytes of image and video data, including of other marine organisms such as fish, benthic organisms (Beyan and Browman 2020), mammals (O'Connell et al. 2010; Karnowski et al. 2016), and freshwater benthic organisms (Milošević et al. 2020). Research teams that produce such data have already begun to leverage ML techniques to analyze them, largely relying on object detection and taxonomic classification approaches (Allken et al. 2019; Kloster et al. 2020; Mahmood et al. 2020). There remains much to be learned by studying these data streams with an eye toward trait-based analyses. ...
Article
Full-text available
Plankton imaging systems supported by automated classification and analysis have improved ecologists' ability to observe aquatic ecosystems. Today, we are on the cusp of reliably tracking plankton populations with a suite of lab‐based and in situ tools, collecting imaging data at unprecedentedly fine spatial and temporal scales. But these data have potential well beyond examining the abundances of different taxa; the individual images themselves contain a wealth of information on functional traits. Here, we outline traits that could be measured from image data, suggest machine learning and computer vision approaches to extract functional trait information from the images, and discuss promising avenues for novel studies. The approaches we discuss are data agnostic and are broadly applicable to imagery of other aquatic or terrestrial organisms.
... Because of the limited training data for fish classification, strategies that reduce the training data size requirement were heavily utilized to develop full deep learning models for fish classification. These strategies were transfer learning [31] and data augmentation [32]. Not only were they studied in modeling research, but deep-learning-based fish classification models were also physically implemented on an underwater drone [33] and in an aquarium [34]. ...
... Many taxonomic studies to date focused on the application of machine learning in the identification of plant species (Tan et al., 2020; Murat et al., 2017), land animals (Nguyen et al., 2017; Norouzzadeh et al., 2018), and insects (Thenmozhi, Dakshayani & Srinivasulu, 2020), while only a small number of marine-related studies have been conducted, with a focus on fish identification. Examples of these studies include the use of machine learning and deep learning methods for tracking and estimation of fish abundance (Marini et al., 2018), identification of fish species using whole-body images (Allken et al., 2018), and the use of otolith contours in fish species identification (Salimi, SK & Chong, 2016). ...
Article
Full-text available
Background: Despite the high commercial fisheries value and ecological importance as a prey item for higher marine predators, very limited taxonomic work has been done on cephalopods in Malaysia. Due to the soft-bodied nature of cephalopods, the identification of cephalopod species based on the beak hard parts can be more reliable and useful than conventional body morphology. Since the traditional method for species classification is time-consuming, this study aimed to develop an automated identification model that can identify cephalopod species based on beak images.
Methods: A total of 174 samples of seven cephalopod species were collected from the west coast of Peninsular Malaysia. Both upper and lower beaks were extracted from the samples and the left lateral views of upper and lower beak images were acquired. Three types of traditional morphometric features were extracted, namely grey histogram of oriented gradients (HOG), colour HOG, and a morphological shape descriptor (MSD). In addition, deep features were extracted using three pre-trained convolutional neural network (CNN) models, namely VGG19, InceptionV3, and ResNet50. Eight machine learning approaches were used in the classification step and compared for model performance.
Results: The results showed that the Artificial Neural Network (ANN) model achieved the best testing accuracy of 91.14%, using the deep features extracted from the VGG19 model from lower beak images. The results indicated that the deep features were more accurate than the traditional features in highlighting morphometric differences in the beak images of cephalopod species. In addition, the use of lower beaks of cephalopod species provided better results compared to the upper beaks, suggesting that the lower beaks possess more significant morphological differences between the studied cephalopod species.
Future work should include more cephalopod species and larger sample sizes to enhance the accuracy and comprehensiveness of the developed identification model.
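The pipeline described above, a pre-trained CNN used as a fixed feature extractor feeding a conventional classifier, can be sketched in miniature. The snippet below substitutes random, well-separated clusters for real VGG19 deep features and a nearest-centroid rule for the ANN, so every name and number is illustrative rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "deep features": in the paper these come from a pre-trained
# VGG19; here two well-separated Gaussian clusters play that role.
X_train = np.vstack([rng.normal(0.0, 1.0, (20, 8)),
                     rng.normal(5.0, 1.0, (20, 8))])
y_train = np.array([0] * 20 + [1] * 20)

def nearest_centroid_fit(X, y):
    """One centroid per class in the deep-feature space."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    """Assign each sample to the class of its nearest centroid."""
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1)
                      for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

centroids = nearest_centroid_fit(X_train, y_train)
acc = float((nearest_centroid_predict(centroids, X_train) == y_train).mean())
```

Any classifier could sit on top of the extracted features; the paper compares eight, of which the ANN performed best.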
... Simulated data can be used to build a large training dataset with varying features without the need for field measurements (Ji et al., 2019). Simulated data have also been widely utilized to train convolutional neural networks in various applications, such as electron detection (van Schayck et al., 2020), ultrasound image enhancement (Perdios et al., 2018), and identification of fish species (Allken et al., 2019). A network trained with simulated data can potentially be directly applicable to real data (Nair et al., 2018), though real data may differ from the simulated data, and the network may need to be adjusted to address this difference (van Oort et al., 2019). ...
Article
Full-text available
The aim of our research was to examine whether simulated forest data can be utilized for training supervised classifiers. We included two classifiers, namely the random forest classifier and a novel convolutional neural network classifier that utilizes feature images. We simulated tree parameters and created a feature vector for each tree. The original feature vector was utilized with the random forest classifier, while these feature vectors were also converted into feature images suitable for input into a YOLO (You Only Look Once) convolutional neural network classifier. The selected features were red colour, green colour, near-infrared colour, tree height divided by canopy diameter, and NDVI. The random forest classifier and the convolutional neural network classifier performed similarly with both simulated data and field-measured reference data. As a result, both methods correctly identified 97.5% of the field-measured reference trees. Simulated data allow much larger training sets than would be feasible from field measurements alone.
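One simple way to feed per-tree feature vectors to an image CNN such as YOLO is to tile each scalar feature into a constant image channel. The abstract does not specify the exact construction used, so the function below is only a hedged sketch of the general idea; the feature values are invented:

```python
import numpy as np

def feature_image(features, size=8):
    """Tile each scalar feature into a constant size x size channel,
    producing a small multi-channel 'feature image' that an image CNN
    backbone can consume in place of a photograph."""
    features = np.asarray(features, dtype=float)
    return np.stack([np.full((size, size), f) for f in features], axis=-1)

# hypothetical per-tree features: R, G, NIR, height/canopy diameter, NDVI
img = feature_image([0.6, 0.4, 0.8, 1.3, 0.7])
```

The resulting array has one channel per feature, so standard convolutional layers can be applied without architectural changes.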
... These results validate the feasibility of applying deep learning models to identify highly variable signals over a wide range of spatial and temporal scales. To solve the problem of classifying the species present in images from the camera system, Allken et al. (2019) introduced a deep learning network and developed a new training method. In experiments classifying blue whiting, Atlantic herring, and Atlantic mackerel, the results showed a classification accuracy of 94%. ...
... We can extract cropped images of fish and paste them onto randomly selected backgrounds while incorporating transformations. This approach will effectively generate thousands of new images from a handful of genuine images (Allken et al., 2018). As new shark images are ingested and validated, the Shark Detector will immediately use them, automatically funneling those images into the appropriate training datasets. ...
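The crop-and-paste augmentation described in this excerpt can be sketched with plain NumPy. This is a minimal illustration (random position plus a random horizontal flip), not the authors' pipeline, which may blend crop edges and apply richer transformations:

```python
import random
import numpy as np

def paste_crop(background, crop, rng):
    """Paste a fish crop onto a copy of the background at a random
    position, with a random horizontal flip as a simple transform."""
    bg = background.copy()
    h, w = crop.shape[:2]
    H, W = bg.shape[:2]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]                  # horizontal flip
    y = rng.randrange(0, H - h + 1)
    x = rng.randrange(0, W - w + 1)
    bg[y:y + h, x:x + w] = crop               # naive paste, no edge blending
    return bg

# many synthetic training images from one genuine crop and one background
background = np.zeros((64, 64, 3), dtype=np.uint8)
crop = np.full((16, 16, 3), 255, dtype=np.uint8)
synthetic = [paste_crop(background, crop, random.Random(i)) for i in range(100)]
```

Because position and flip vary per draw, a handful of genuine crops and backgrounds can yield thousands of distinct training images.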
Article
Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of 53,345 shark images covering 219 species of sharks, and packaged object-detection and image classification models into a Shark Detector bundle. The Shark Detector recognizes and classifies sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches of sharks: collecting occurrence records from photographs taken by the public or citizen scientists, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a result of training data quantity. The Shark Detector can classify 47 species pertaining to 26 genera. It sorted heterogeneous datasets of images sourced from Instagram with 91% accuracy and classified species with 70% accuracy. It located sharks in baited remote footage and YouTube videos with 89% accuracy, and classified located subjects to the species level with 69% accuracy. All data-generation methods were processed without manual interaction. As media-based remote monitoring appears to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.
... Eickholt et al. (2020) implemented a CNN that can classify living fishes as they pass through a tunnel under barriers, in order to detect invasive species. Allken et al. (2019) implemented a CNN that classified images with multiple fish in a controlled environment. ...
Article
Full-text available
This paper presents and evaluates a method for detecting and counting demersal fish species in complex, cluttered, and occluded environments that can be installed on the conveyor belts of fishing vessels. Fishes on the conveyor belt were recorded using a colour camera and were detected using a deep neural network. To improve the detection, synthetic data were generated for rare fish species. The fishes were tracked over the consecutive images using a multi-object tracking algorithm, and based on multiple observations, the fish species was determined. The effect of the synthetic data, the amount of occlusion, and the observed dorsal or ventral fish side were investigated and a comparison with human electronic monitoring (EM) review was made. Using the presented method, a weighted counting error of 20% was achieved, compared to a counting error of 7% for human EM review on the same recordings.
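Determining the species "based on multiple observations" along a track, as described above, is commonly implemented as a majority vote over per-frame classifier outputs; the sketch below shows that idea, with placeholder species names (the paper may weight observations differently):

```python
from collections import Counter

def classify_track(observations):
    """Assign a species label to a tracked fish by majority vote over
    the per-frame classifier outputs collected along its track."""
    return Counter(observations).most_common(1)[0][0]

# placeholder per-frame predictions for one tracked fish
label = classify_track(["herring", "herring", "mackerel"])
```

Aggregating over a track makes the final label robust to occasional misclassifications on occluded or blurry frames.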
... FC is defined as the process of distinguishing and recognizing fish species and families based on their attributes by using image processing. It determines and classifies the target fish into species depending on its similarity to a representative specimen image [5]. The recognition of fish species is widely considered a challenging research area due to difficulties such as distortion, noise, and segmentation errors in the images [6]. ...
Article
Full-text available
In computer vision, image classification is one of the fundamental image processing tasks. Nowadays, fish classification is a widely considered issue within the areas of machine learning and image segmentation. Moreover, it has been extended to a variety of domains, such as marketing strategies. This paper presents an effective fish classification method based on convolutional neural networks (CNNs). The experiments were conducted on the new dataset of Bangladesh's indigenous fish species with three kinds of splitting: 80-20%, 75-25%, and 70-30%. We provide a comprehensive comparison of several popular optimizers for CNNs. In total, we perform a comparative analysis of 5 different state-of-the-art gradient-descent-based optimizers, namely adaptive delta (AdaDelta), stochastic gradient descent (SGD), adaptive momentum (Adam), Adamax (a variant of Adam based on the infinity norm), and root mean square propagation (RMSprop). Overall, the obtained experimental results show that RMSprop, Adam, and Adamax performed well compared to the other optimization techniques used, while AdaDelta and SGD performed the worst. Furthermore, the experimental results demonstrated that the Adam optimizer attained the best results in performance measures for the 70-30% and 80-20% splitting experiments, while the RMSprop optimizer attained the best results for the 75-25% splitting experiments. Finally, the proposed model is compared with state-of-the-art deep CNN models and attained the best accuracy, 98.46%, demonstrating its enhanced classification ability. This is an open access article under the CC BY-SA license.
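The optimizers compared in this abstract differ only in their parameter-update rules. As a self-contained illustration, the snippet below implements plain SGD and Adam updates and runs both on the toy objective f(w) = w²; it is a sketch of the update mathematics, not the paper's actual CNN experiments:

```python
import numpy as np

def sgd_step(w, grad, state, lr=0.1):
    """Plain stochastic gradient descent update."""
    return w - lr * grad, state

def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update: bias-corrected running means of the gradient (m)
    and its square (v) set a per-parameter step size."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

def minimise(step, w0, state, iters=200):
    w = w0
    for _ in range(iters):
        grad = 2.0 * w                      # gradient of f(w) = w**2
        w, state = step(w, grad, state)
    return w

w_sgd = minimise(sgd_step, 5.0, None)
w_adam = minimise(adam_step, 5.0, (0.0, 0.0, 0))
```

RMSprop, AdaDelta, and Adamax are variations on the same theme: they differ in how the running gradient statistics are accumulated and applied.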
... The deployment of a convolutional neural network (CNN) trained on synthetic data was found to be useful for automatically detecting and classifying the aquatic species (blue whiting, Atlantic herring and Atlantic mackerel) captured in the images from the Deep Vision trawl camera system with high accuracy. These advanced methods can overcome the challenges in the interpretation of acoustic data from acoustic-trawl surveys [86] . Further, individual fish size measurement by automatic segmentation of underwater stereo images of fishes (blue whiting, saithe, redfish, Atlantic mackerel, Atlantic herring, velvet belly lanternshark and Norway pout) acquired by the Deep Vision imaging system using a Mask R-CNN architecture is a promising application to monitor and reduce the catch of undersized fish in commercial trawling, as it overcomes the technical limitations of echosounders [89] . ...
Article
Full-text available
Aquaculture and fisheries sectors are finding ingenious ways to grow and meet the soaring human demand for nutrient-rich fish and seafood by efficiently utilizing the vast water resources and biodiversity of aquatic life on earth. This includes the progressive integration of information technology, data science and artificial intelligence with fishing and fish farming methods to enable intensification of aquaculture production, sustainable exploitation of natural fishery resources and mechanization-automation of allied activities. Exclusive data mining and machine learning systems are being developed to process complex datasets and perform intelligent tasks like analysing cause-effect associations, forecasting problems and providing smart-precision solutions for farming and catching fish. Considering the intensifying research and growing interest of stakeholders, in this review, we have consolidated basic information on the various practical applications of data mining and machine learning in aquaculture and fisheries domains from representative selection of scientific literature. This includes an overview of research and applications in 1) aquaculture activities such as monitoring and control of the production environment, optimization of feed use, fish biomass monitoring and disease prevention; 2) fisheries management aspects such as resource assessment, fishing, catch monitoring and regulation; 3) environment monitoring related to hydrology, primary production and aquatic pollution; 4) automation of fish processing and quality assurance systems; and 5) fish market intelligence, price forecasting and socioeconomics. While aquaculture has been relatively faster in integrating data mining and machine learning tools with advanced farming systems, capture fisheries is finding reliable methods to sort the complexities in data collection and processing. Finally, we have pointed out some of the challenges and future perspectives related to large-scale adoption.
... In recent years, deep learning has achieved high quality in pattern recognition and classification [8]. The most celebrated deep learning method for object recognition and classification is the CNN (convolutional neural network) [9]. Furthermore, the CNN is considered one of the most competent classifiers, especially in terms of its accuracy. ...
... This proposed CNN contained skip connections, much like the ResNet architecture [8], and obtained the highest training accuracy when compared to non-residual and Support Vector Machine (SVM) models. Allken et al. [9] used a fish species classifier based on InceptionV3, trained on a synthetic 10,000-image dataset, to obtain an accuracy of 94%. Hu and You [10] used ResNet18 on the 19,465-image Animal-10 dataset to obtain an accuracy of 92%. ...
Article
Full-text available
Camera traps deployed in remote locations provide an effective method for ecologists to monitor and study wildlife in a non-invasive way. However, current camera traps suffer from two problems. First, the images are manually classified and counted, which is expensive. Second, due to manual coding, the results are often stale by the time they get to the ecologists. Using the Internet of Things (IoT) combined with deep learning represents a good solution for both these problems, as the images can be classified automatically, and the results immediately made available to ecologists. This paper proposes an IoT architecture that uses deep learning on edge devices to convey animal classification results to a mobile app using the LoRaWAN low-power, wide-area network. The primary goal of the proposed approach is to reduce the cost of the wildlife monitoring process for ecologists, and to provide real-time animal sightings data from the camera traps in the field. Camera trap image data consisting of 66,400 images were used to train the InceptionV3, MobileNetV2, ResNet18, EfficientNetB1, DenseNet121, and Xception neural network models. While performance of the trained models was statistically different (Kruskal–Wallis: Accuracy H(5) = 22.34, p < 0.05; F1-score H(5) = 13.82, p = 0.0168), there was only a 3% difference in the F1-score between the worst (MobileNetV2) and the best model (Xception). Moreover, the models made similar errors (Adjusted Rand Index (ARI) > 0.88 and Adjusted Mutual Information (AMI) > 0.82). Subsequently, the best model, Xception (Accuracy = 96.1%; F1-score = 0.87; F1-score = 0.97 with oversampling), was optimized and deployed on the Raspberry Pi, Google Coral, and Nvidia Jetson edge devices using both TensorFlow Lite and TensorRT frameworks. Optimizing the models to run on edge devices reduced the average macro F1-score to 0.7, and adversely affected the minority classes, reducing their F1-score to as low as 0.18. 
Upon stress testing, by processing 1000 images consecutively, Jetson Nano, running a TensorRT model, outperformed others with a latency of 0.276 s/image (s.d. = 0.002) while consuming an average current of 1665.21 mA. Raspberry Pi consumed the least average current (838.99 mA) with a ten times worse latency of 2.83 s/image (s.d. = 0.036). Nano was the only reasonable option as an edge device because it could capture most animals whose maximum speeds were below 80 km/h, including goats, lions, ostriches, etc. While the proposed architecture is viable, unbalanced data remain a challenge and the results can potentially be improved by using object detection to reduce imbalances and by exploring semi-supervised learning.
... Sonar imagery is limiting when a fish lacks distinguishing morphology, including size, making species identification unreliable (Mueller et al. 2010), or when multiple fish inhabit the same sonar beam (Holmes et al. 2006), which affects detectability. Current research using machine learning and artificial intelligence to identify individual fish holds great promise for monitoring FAS (Allken et al. 2019). The active scanning provides live views which enable detailed investigation on how fish use FAS. ...
Technical Report
Full-text available
This study evaluated the relative effectiveness of three broad groups of fish attractors at attracting Australian bass and golden perch in impoundments to improve recreational angling. Comparisons were made between sunken structures constructed from timber and synthetic materials and a novel suspended fish attractor design. Multiple lines of evidence were used to evaluate the response. All types of fish attractors were used by the target species and the prey that supports them. Angler catch, visitation rates and satisfaction with the fishery all improved following installation of the fish attractors.
... Sbragaglia et al. (2018) also showed that fish can distinguish between snorkelers with and without a speargun. In addition to the techniques mentioned above, new techniques for trawl- and boat-mounted camera surveys appear in the literature (Allken et al. 2019, DeCelles et al. 2017). Since the early 2000s, techniques and camera rigs for surveying the deep sea, i.e. places where light is extremely scarce, have also been used successfully (Sarradin et al. 2007). ...
Article
Full-text available
The Swedish Government has decided that authorities who use experimental animals shall establish strategies for their work with issues concerning 3R, ie. Replace, Reduce and Refine. The following synthesis aims to evaluate the possibilities of adapting SLU Aqua's current monitoring methodologies of aquatic resources and environments to reduce the number of dead and/or suffering fish and shellfish according to 3R. Sampling of fish and shellfish is one of the cornerstones of the follow-up to the different EU directives and regulations. Today's fisheries-independent sampling of fish and shellfish includes several different methods, most of which are invasive or fatal to the organisms (Table 3.2.1). Sampling methods may affect the well-being of the fish, both during the catch process and during the following analyzes. However, in recent years, less invasive and non-invasive techniques are being developed that may measure variables that previously required the fish to be killed or injured. These methods could be divided into four different main groups: acoustic; optical; established methods (pots and traps/fyke nets/electric fishing); and new methods (eDNA /Citizen science/ iEcology). Acoustic methods are used primarily for density and biomass estimation of pelagic fish, complemented with trawling to collect species and size distribution data. With new advanced sonar and analysis techniques, the possibility to assess species directly from acoustic data increases. Acoustics is also used on surface drones and to map the passage of fish in running water. Optical methods are used to measure three-dimensional objects based on two-dimensional photographic or digital images making it possible to monitor fish in environments where test fishing is not allowed. Cameras can be mounted on different types of submarines, as well as inside a trawl, to identify and measure fish continuously during a trawl haul. 
Established methods such as electric fishing are used for monitoring fish and crayfish in running water without causing mortality. Fishing takes place by wading or by boats equipped for electric fishing. Fyke nets and other passive, non-lethal, sampling gears are used for monitoring demersal fish and in catch-recapture experiments. Similarly, traps and pots are used for live capture of shellfish. New methods such as eDNA (environmental DNA) are gaining increasing interest. By extracting and sequencing DNA from samples of sediment, soil, or water, the presence of different species can be detected. The technology has developed considerably and is nowadays a commonly employed method for obtaining qualitative samples of an area's species composition. Furthermore, by identifying the number of unique haplotypes, quantitative samples may be obtained. Citizen science uses information based on public voluntary reporting. An ongoing example is the hundreds of observations from the live broadcast from the common guillemot’s nesting shelves that are used to analyze how climate change affects the birds. There is also a growing interest in collecting voluntary data on catches and sizes of fish in sport fishing. iEcology is a relatively new concept, meaning that ecological questions could be answered by processing and analyzing a large amount of data that are published on the internet. By combining iEcology with citizen science, DTU Aqua in Denmark has used a mobile application to study both fish densities and overfishing. A change in monitoring methodology in order to reduce the number of fish that die or are injured is limited by the requirements or regulations by which the respective monitoring program is governed. International sampling programs are often coordinated between a number of participating countries. Another limitation is the risk of breaking long time-series. 
Time-series are necessary to understand the dynamics of fish and shellfish stocks and to produce the basis for management decisions. A change in monitoring methodology may also be limited by the requirements for evaluating the status and pressure on fish stocks in order to achieve sustainable management of commercial and recreational fisheries. Possibilities for a developed monitoring with regard to 3R: 3R - replace. If an established method is to be replaced by a new non-invasive or less invasive methodology, the introduction must take place in parallel with the current methodology in order to evaluate the novel method in relation to the study's aims and requirements. Sampling based on commercial fishing is advantageous from an animal ethics point of view, as the individuals have already been caught and killed for another purpose. In addition, it can take place throughout the year, unlike fisheries-independent sampling that is often performed on single occasions. 3R - reduce. To minimize the number of individuals being killed, monitoring can be optimized by reducing the number of individuals affected and weighing this against precision, accuracy and statistical power for included variables and indicators. This can be an option when lethal sampling and analyzing methodology generate time-series that cannot, should or may not be broken, or cannot be replaced by non-lethal methodology. A combination of acoustics and stereo video can provide significant benefits where trawling is used to verify species and size composition in acoustic data. 3R - refine. New non-invasive and established invasive methodologies can be combined to investigate the effectiveness and outcome of both methods. A purpose can also be to increase the precision of the monitoring without causing further harm to the individuals. By adding eDNA to established methods, additional information on species abundance and biodiversity can be obtained. 
Improved, less harmful handling of caught fish and shellfish will also reduce injuries and suffering.
... Studies have examined the photo-response of rock bream (Jang et al., 2019). Artificial intelligence algorithms, including a deep-learning neural network, have been developed for identifying fish species (Allken et al., 2019). However, to solve problems associated with automated fish health monitoring systems in aquaculture, it is still necessary to develop an algorithm that can detect abnormal fish behaviours, which may require training and validation datasets. ...
Article
Various approaches have been applied to transform aquaculture from a manual, labour-intensive industry to one dependent on automation technologies in the era of the fourth industrial revolution. Technologies associated with the monitoring of physical condition have successfully been applied in most aquafarm facilities; however, real-time biological monitoring systems that can observe fish condition and behaviour are still required. In this study, we used a video recorder placed on top of a fish tank to observe the swimming patterns of rock bream (Oplegnathus fasciatus), first one fish alone and then a group of five fish. Rock bream in the video samples were successfully identified using the you-only-look-once v3 algorithm, which is based on the Darknet-53 convolutional neural network. In addition to recordings of swimming behaviour under normal conditions, the swimming patterns of fish under abnormal conditions were recorded on adding an anaesthetic or lowering the salinity. The abnormal conditions led to changes in the velocity of movement (3.8 ± 0.6 cm/s) involving an initial rapid increase in speed (up to 16.5 ± 3.0 cm/s, upon 2-phenoxyethanol treatment) before the fish stopped moving, as well as changing from swimming upright to dying lying on their sides. Machine learning was applied to datasets consisting of normal or abnormal behaviour patterns, to evaluate the fish behaviour. The proposed algorithm showed a high accuracy (98.1%) in discriminating normal and abnormal rock bream behaviour. We conclude that artificial intelligence-based detection of abnormal behaviour can be applied to develop an automatic bio-management system for use in the aquaculture industry.
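The speed statistics reported above (e.g. a jump from roughly 3.8 to 16.5 cm/s under anaesthetic) can be derived from tracked centroid positions once the frame rate and pixel-to-centimetre scale are known. A minimal sketch follows; the abnormality threshold is borrowed from the reported peak speed purely as an example, not from the authors' classifier:

```python
def speeds(track, fps, cm_per_px):
    """Per-step speeds (cm/s) from a list of (x, y) pixel centroids
    recorded on consecutive frames."""
    out = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dist_px = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        out.append(dist_px * cm_per_px * fps)
    return out

def is_abnormal(track, fps, cm_per_px, threshold_cm_s=16.5):
    """Flag a track whose peak speed exceeds a threshold; 16.5 cm/s is
    taken from the peak reported above, purely as an illustration."""
    return max(speeds(track, fps, cm_per_px)) > threshold_cm_s
```

In the paper the normal/abnormal decision is learned from behaviour datasets rather than thresholded, but the speed features are computed from detections in essentially this way.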
Article
The recent advancement in data science coupled with the revolution in digital and satellite technology has improved the potential for artificial intelligence (AI) applications in the forestry and wildlife sectors. India shares 7% of global forest cover and is the 8th most biodiverse region in the world. However, rapid expansion of developmental projects, agriculture, and urban areas threaten the country’s rich biodiversity. Therefore, the adoption of new technologies like AI in Indian forests and biodiversity sectors can help in effective monitoring, management, and conservation of biodiversity and forest resources. We conducted a systematic search of literature related to the application of artificial intelligence (AI) and machine learning algorithms (ML) in the forestry sector and biodiversity conservation across globe and in India (using ISI Web of Science and Google Scholar). Additionally, we also collected data on AI-based startups and non-profits in forest and wildlife sectors to understand the growth and adoption of AI technology in biodiversity conservation, forest management, and related services. Here, we first provide a global overview of AI research and application in forestry and biodiversity conservation. Next, we discuss adoption challenges of AI technologies in the Indian forestry and biodiversity sectors. Overall, we find that adoption of AI technology in Indian forestry and biodiversity sectors has been slow compared to developed, and to other developing countries. However, improving access to big data related to forest and biodiversity, cloud computing, and digital and satellite technology can help improve adoption of AI technology in India. We hope that this synthesis will motivate forest officials, scientists, and conservationists in India to explore AI technology for biodiversity conservation and forest management.
Article
This study developed a method for automatically measuring fish-body sizes with a stereo-vision system calibrated using a three-dimensional frame for accuracy and precision. The three-dimensional frame was installed in a water tank and photographed using a stereo camera to obtain the parameters for calibration. An optical character recognition technique was used to detect the feature points on the frame. All the feature points could be detected, and the correct combinations of the points were matched automatically in the stereo images. To obtain the fish-body size, the snouts and tails of goldfish in the tank were detected in the stereo video sequences using the Faster R-CNN image recognition technique, and the fish-body lengths were calculated automatically. The accuracy and precision of the automatic calibration system were equivalent to the manual calibration, whereas those of the automatic fish-body size measurements were lower than the manual measurement. The automatic processes of the calibration and the fish body-size measurement were about 96% and 90% faster than those in the manual process. The issues of the accuracy and precision of fish-body size measurements can be resolved in the future by improving the image recognition accuracy.
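Stereo body-length measurement of the kind described above rests on pinhole triangulation: the disparity between matched snout and tail detections gives depth, and body length is the Euclidean distance between the two reconstructed 3-D points. The camera parameters and pixel coordinates below are invented for illustration:

```python
import math

def triangulate(xl, xr, y, f, baseline):
    """Pinhole stereo: recover a 3-D point (metres) from matched pixel
    coordinates in the left/right images (same image row y)."""
    disparity = xl - xr
    Z = f * baseline / disparity      # depth from disparity
    X = xl * Z / f
    Y = y * Z / f
    return (X, Y, Z)

def body_length(snout, tail):
    """Euclidean distance between the two reconstructed points."""
    return math.dist(snout, tail)

# invented camera: focal length 800 px, 10 cm baseline
snout = triangulate(xl=420, xr=380, y=100, f=800.0, baseline=0.10)
tail = triangulate(xl=300, xr=260, y=110, f=800.0, baseline=0.10)
length = body_length(snout, tail)
```

The calibration step in the study amounts to estimating f, the baseline, and lens distortion from the photographed 3-D frame; with those parameters fixed, the length computation itself is this simple.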
Article
Scientific studies on species identification in fish have considerable significance in aquatic ecosystems and quality evaluation. The morphological differences between different fish species are obvious. Machine learning methods use artificial prior knowledge to extract fish features, which is time-consuming, laborious, and subjective. Recently, deep learning-based identification of fish species has been widely used. However, fish species identification still faces many challenges due to the small scale of fish samples and the imbalance of the number of categories. For example, the model is prone to being overfitted, and the performance of the classifier is biased to the fish species of most samples. To solve the above problems, this paper proposes a fish species identification approach based on SE-ResNet152 and class-balanced focal loss. First, visualization analysis and image preprocessing of fish datasets are carried out. Second, the SE-ResNet152 model is constructed as a generalized feature extractor and is migrated to the target dataset. Finally, we apply the class-balanced focal loss function to train the SE-ResNet152 model, and realize fish species identification on three fish image views (body, head, and scale). The proposed method was tested on the Fish-Pak public dataset and achieved 98.80%, 96.67%, and 91.25% accuracy on the three fish image views, respectively. To ensure the superior performance of the proposed method, we performed an experimental comparison with other methods involving SENet154, DenseNet121, ResNet18, ResNet152, VGG16, cross-entropy, and focal loss. Comprehensive empirical analyses reveal that the proposed method achieves good performance on the three fish image views and outperforms common methods.
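The class-balanced focal loss mentioned above combines two ideas: reweighting classes by their "effective number" of samples, so rare species are not drowned out, and the focal factor (1 − p)^γ, which down-weights examples the model already classifies confidently. A NumPy sketch following the commonly used effective-number formulation (not necessarily the paper's exact code):

```python
import numpy as np

def class_balanced_focal_loss(probs, labels, samples_per_class,
                              beta=0.999, gamma=2.0):
    """probs: (N, C) softmax outputs; labels: (N,) integer class ids.
    Classes are weighted by the inverse 'effective number'
    (1 - beta**n_c) / (1 - beta); the focal factor (1 - p)**gamma
    down-weights examples the model already gets right."""
    n_c = np.asarray(samples_per_class, dtype=float)
    eff_num = (1.0 - beta ** n_c) / (1.0 - beta)
    weights = 1.0 / eff_num
    weights = weights / weights.sum() * len(n_c)   # normalise to C
    p = probs[np.arange(len(labels)), labels]      # prob of true class
    focal = (1.0 - p) ** gamma * -np.log(np.clip(p, 1e-12, 1.0))
    return float(np.mean(weights[labels] * focal))
```

With perfect predictions the loss vanishes; as the minority-class probabilities drop, their larger class weights dominate the gradient, which is exactly the imbalance correction the abstract describes.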
Article
Full-text available
In this paper, we investigate novel data collection and training techniques towards improving classification accuracy of non-moving (static) hand gestures using a convolutional neural network (CNN) and frequency-modulated-continuous-wave (FMCW) millimeter-wave (mmWave) radars. Recently, non-contact hand pose and static gesture recognition have received considerable attention in many applications ranging from human-computer interaction (HCI), augmented/virtual reality (AR/VR), and even therapeutic range of motion for medical applications. While most current solutions rely on optical or depth cameras, these methods require ideal lighting and temperature conditions. mmWave radar devices have recently emerged as a promising alternative offering low-cost system-on-chip sensors whose output signals contain precise spatial information even in non-ideal imaging conditions. Additionally, deep convolutional neural networks have been employed extensively in image recognition by learning both feature extraction and classification simultaneously. However, little work has been done towards static gesture recognition using mmWave radars and CNNs due to the difficulty involved in extracting meaningful features from the radar return signal, and the results are inferior compared with dynamic gesture classification. This article presents an efficient data collection approach and a novel technique for deep CNN training by introducing “sterile” images which aid in distinguishing distinct features among the static gestures and subsequently improve the classification accuracy. Applying the proposed data collection and training methods yields an increase in classification rate of static hand gestures from 85% to 93% and 90% to 95% for range and range-angle profiles, respectively.
Article
Automatic classification of different species of fish is important for the comprehension of marine ecology, fish behaviour analysis, aquaculture management, and fish health monitoring. In recent years, many automatic classification methods have been developed, among which machine vision-based classification methods are widely used with the advantages of being fast and non-destructive. In addition, the successful application of rapidly emerging deep learning techniques in machine vision has brought new opportunities for fish classification. This paper provides an overview of machine vision models applied in the field of fish classification, followed by a detailed discussion of specific applications of various classification methods. Furthermore, the challenges and future research directions in the field of fish classification are discussed. This paper would help researchers and practitioners to understand the applicability of machine vision in fish classification and encourage them to develop advanced algorithms and models to address the complex problems that exist in fish classification practice.
Article
Full-text available
The biological investigation of a population’s shape diversity using digital images is typically reliant on geometrical morphometrics, which is an approach based on user-defined landmarks. In contrast to this traditional approach, the progress in deep learning has led to numerous applications ranging from specimen identification to object detection. Typically, these models tend to become black boxes, which limits the usage of recent deep learning models for biological applications. However, the progress in explainable artificial intelligence tries to overcome this limitation. This study compares the explanatory power of unsupervised machine learning models to traditional landmark-based approaches for population structure investigation. We apply convolutional autoencoders as well as Gaussian process latent variable models to two Nile tilapia datasets to investigate the latent structure using consensus clustering. The explanatory factors of the machine learning models were extracted and compared to generalized Procrustes analysis. Hypotheses based on the Bayes factor are formulated to test the unambiguity of population diversity unveiled by the machine learning models. The findings show that it is possible to obtain biologically meaningful results relying on unsupervised machine learning. Furthermore, we show that the machine learning models unveil latent structures close to the true population clusters. We found that 80% of the true population clusters relying on the convolutional autoencoder are significantly different to the remaining clusters. Similarly, 60% of the true population clusters relying on the Gaussian process latent variable model are significantly different. We conclude that the machine learning models outperform generalized Procrustes analysis, where 16% of the population clusters were found to be significantly different. However, the applied machine learning models still have limited biological explainability.
We recommend further in-depth investigations to unveil the explanatory factors in the used model.
Article
Full-text available
With the rapid emergence of deep learning (DL) technology, it has been successfully applied in different fields such as aquaculture. This shift creates new opportunities, as well as challenges, for data processing in the smart fish farm. This study focuses on deep learning applications and how they support different aquaculture activities such as fish identification, species classification, feeding decisions, behaviour analysis, size estimation, and prediction of water quality. The computing power and performance of the proposed DL method are demonstrated on the analysed data within fish farming. Results of the proposed method show the significance of deep learning's contributions and how features are extracted automatically. Still, using deep learning effectively remains a substantial challenge in this era of artificial intelligence. The proposed method was trained on a large number of labelled images obtained from the Fish4Knowledge dataset and, based on suitable features extracted from the fish, achieved good results in terms of recognition rate and accuracy.
Article
Full-text available
With the availability of low-cost and efficient digital cameras, ecologists can now survey the world’s biodiversity through image sensors, especially in the previously rather inaccessible marine realm. However, the data rapidly accumulates, and ecologists face a data processing bottleneck. While computer vision has long been used as a tool to speed up image processing, it is only since the breakthrough of deep learning (DL) algorithms that the revolution in the automatic assessment of biodiversity by video recording can be considered. However, current applications of DL models to biodiversity monitoring do not consider some universal rules of biodiversity, especially rules on the distribution of species abundance, species rarity and ecosystem openness. Yet, these rules imply three issues for deep learning applications: the imbalance of long-tail datasets biases the training of DL models; scarce data greatly lessens the performance of DL models for classes with few data; and the open-world issue implies that objects absent from the training dataset are incorrectly classified in the application dataset. Promising solutions to these issues are discussed, including data augmentation, data generation, cross-entropy modification, few-shot learning and open set recognition. At a time when biodiversity faces the immense challenges of climate change and the Anthropocene defaunation, stronger collaboration between computer scientists and ecologists is urgently needed to unlock the automatic monitoring of biodiversity.
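One of the remedies listed above, cross-entropy modification, is often realised with inverse-frequency class weights so that rare classes contribute more to the training loss. A minimal NumPy sketch (the class counts and probabilities are hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical per-class image counts for a long-tailed dataset.
counts = np.array([9000.0, 600.0, 45.0, 5.0])

# Inverse-frequency weights, normalised so the average weight over
# samples is 1: rare classes contribute more to the training loss.
weights = counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, label):
    """Cross-entropy for one sample, scaled by its class weight."""
    return -weights[label] * np.log(probs[label])

probs = np.array([0.7, 0.2, 0.05, 0.05])        # illustrative predictions
loss_common = weighted_cross_entropy(probs, 0)  # abundant class
loss_rare = weighted_cross_entropy(probs, 3)    # rare class
```

With these weights, a misclassified rare-class sample dominates the loss, counteracting the long-tail bias.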
Article
Fish counts and species information can be obtained from images taken within trawls, which enables trawl surveys to operate without extracting fish from their habitat, yields distribution data at fine scale for better interpretation of acoustic results, and can detect fish that are not retained in the catch due to mesh selection. To automate the process of image-based fish detection and identification, we trained a deep learning algorithm (RetinaNet) on images collected from the trawl-mounted Deep Vision camera system. In this study, we focused on the detection of blue whiting, Atlantic herring, Atlantic mackerel, and mesopelagic fishes from images collected in the Norwegian sea. To address the need for large amounts of annotated data to train these models, we used a combination of real and synthetic images, and obtained a mean average precision of 0.845 on a test set of 918 images. Regression models were used to compare predicted fish counts, which were derived from RetinaNet classification of fish in the individual image frames, with catch data collected at 20 trawl stations. We have automatically detected and counted fish from individual images, related these counts to the trawl catches, and discussed how to use this in regular trawl surveys.
Article
Knowledge on the age of fish is vital for assessing the status of fish stocks and proposing management actions to ensure their sustainability. Prevalent methods of fish ageing are based on the readings of otolith images by experts, a process that is often time-consuming and costly. This suggests the need for automatic and cost-effective approaches. Herein, we investigate the feasibility of using deep learning to provide an automatic estimation of fish age from otolith images through a convolutional neural network designed for image analysis. On top of this network, we propose a network enhanced with multitask learning to better estimate fish age by introducing the prediction of fish length from otolith images as an auxiliary training task. The proposed approach is applied on a collection of 5027 otolith images of red mullet (Mullus barbatus), considering fish age estimation as a multi-class classification task with six age groups (Age-0, Age-1, Age-2, Age-3, Age-4, Age-5+). Results showed that the network without multitask learning predicted fish age correctly by 64.4%, attaining high performance for younger age groups (Age-0 and Age-1, F1 score > 0.8) and moderate performance for older age groups (Age-2 to Age-5+, F1 score: 0.50–0.54). The network with multitask learning increased correctness in age prediction, reaching 69.2%, and proved efficient in leveraging its predictive performance for older age groups (Age-2 to Age-5+, F1 score: 0.57–0.64). Our findings suggest that deep learning has the potential to support the automation of fish age reading, though further research is required to build an operational tool useful in routine fish ageing protocols for age reading experts.
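The multitask setup described above can be sketched as a joint training loss: classification cross-entropy on the age group plus a weighted auxiliary regression term on fish length. A schematic NumPy illustration, not the paper's implementation; the weighting `lam` and all numeric values are assumed for the example:

```python
import numpy as np

def multitask_loss(age_probs, true_age, pred_length, true_length, lam=0.1):
    """Joint training loss: cross-entropy on the age class plus an
    auxiliary squared-error term on fish length, weighted by lam."""
    ce = -np.log(age_probs[true_age])
    mse = (pred_length - true_length) ** 2
    return ce + lam * mse

# Illustrative values: predicted age distribution over two classes,
# true age group 1, predicted length 10 cm vs. true length 12 cm.
loss = multitask_loss(np.array([0.25, 0.75]), 1, 10.0, 12.0)
```

The auxiliary term acts as a regulariser: length is correlated with age, so the shared features learn structure useful for both tasks.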
Article
Full-text available
Machine learning, a subfield of artificial intelligence, offers various methods that can be applied in marine science. It supports data-driven learning, which can result in automated decision making of de novo data. It has significant advantages compared with manual analyses that are labour intensive and require considerable time. Machine learning approaches have great potential to improve the quality and extent of marine research by identifying latent patterns and hidden trends, particularly in large datasets that are intractable using other approaches. New sensor technology supports collection of large amounts of data from the marine environment. The rapidly developing machine learning subfield known as deep learning—which applies algorithms (artificial neural networks) inspired by the structure and function of the brain—is able to solve very complex problems by processing big datasets in a short time, sometimes achieving better performance than human experts. Given the opportunities that machine learning can provide, its integration into marine science and marine resource management is inevitable. The purpose of this themed set of articles is to provide as wide a selection as possible of case studies that demonstrate the applications, utility, and promise of machine learning in marine science. We also provide a forward-look by envisioning a marine science of the future into which machine learning has been fully incorporated.
Article
Full-text available
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3], DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.
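The pooling-indices trick that distinguishes SegNet's decoder can be sketched for a single-channel, 2x2-window case (a toy NumPy illustration, not the SegNet implementation):

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling that also records the argmax position in each
    window, as SegNet's encoder does."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            win = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            k = win.argmax()
            pooled[i, j] = win.flat[k]
            idx[i, j] = k
    return pooled, idx

def max_unpool(pooled, idx):
    """Decoder upsampling: place each value back at its stored argmax
    position; the other positions stay zero, giving the sparse map
    that SegNet then densifies with trainable convolutions."""
    h, w = pooled.shape
    out = np.zeros((2 * h, 2 * w))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(int(idx[i, j]), 2)
            out[2 * i + di, 2 * j + dj] = pooled[i, j]
    return out

pooled, idx = max_pool_with_indices(np.array([[1.0, 2.0],
                                              [3.0, 4.0]]))
restored = max_unpool(pooled, idx)  # the max returns to its original cell
```

Because only indices are stored, no upsampling weights need to be learned, which is the memory/accuracy trade-off the abstract refers to.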
Article
Full-text available
There is a need for automatic systems that can reliably detect, track and classify fish and other marine species in underwater videos without human intervention. Conventional computer vision techniques do not perform well in underwater conditions where the background is complex and the shape and textural features of fish are subtle. Data-driven classification models like neural networks require a huge amount of labelled data, otherwise they tend to over-fit to the training data and fail on unseen test data which is not involved in training. We present a state-of-the-art computer vision method for fine-grained fish species classification based on deep learning techniques. A cross-layer pooling algorithm using a pre-trained Convolutional Neural Network as a generalized feature detector is proposed, thus avoiding the need for a large amount of training data. Classification on test data is performed by a SVM on the features computed through the proposed method, resulting in classification accuracy of 94.3% for fish species from typical underwater video imagery captured off the coast of Western Australia. This research advocates that the development of automated classification systems which can identify fish from underwater video imagery is feasible and a cost-effective alternative to manual identification by humans. © International Council for the Exploration of the Sea 2017. All rights reserved.
Article
Full-text available
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the fully convolutional network (FCN) architecture and its variants. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. The design of SegNet was primarily motivated by road scene understanding applications. Hence, it is efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than competing architectures and can be trained end-to-end using stochastic gradient descent. We also benchmark the performance of SegNet on Pascal VOC12 salient object segmentation and the recent SUN RGB-D indoor scene understanding challenge. We show that SegNet provides competitive performance although it is significantly smaller than other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/
Article
Full-text available
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Conference Paper
Full-text available
Recent results indicate that the generic descriptors extracted from the convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Remarkably we report better or competitive results compared to the state-of-the-art in all the tasks on various datasets. The results are achieved using a linear SVM classifier applied to a feature representation of size 4096 extracted from a layer in the net. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual classification tasks.
Chapter
Camera-based fish abundance estimation with the aid of visual analysis techniques has drawn increasing attention. Live fish segmentation and recognition in open aquatic habitats, however, suffers from fast light attenuation, ubiquitous noise and non-lateral views of fish. In this chapter, an automatic live fish segmentation and recognition framework for trawl-based cameras is proposed. To mitigate the illumination issues, a double local thresholding method is integrated with histogram backprojection to produce an accurate shape of fish segmentation. For recognition, a hierarchical partial classification is learned so that the coarse-to-fine categorization stops at any level where ambiguity exists. Discriminative feature descriptors are generated by focusing on attributes of important fish anatomical parts. Experiments on mid-water image sets show that the proposed framework achieves up to 93% accuracy on live fish recognition based on automatic and robust segmentation results.
Conference Paper
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.
Conference Paper
Several machine learning models, including neural networks, consistently misclassify adversarial examples—inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
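The "simple and fast method" referred to above is the fast gradient sign method (FGSM). A minimal NumPy sketch on a toy logistic model (the weights, input, and epsilon are illustrative, not from the paper):

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """Fast gradient sign method: move each input dimension by eps in
    the direction that increases the loss."""
    return x + eps * np.sign(grad)

# Toy logistic model with loss L = log(1 + exp(-y * w.x)).
# Its gradient with respect to the input x is -y * sigmoid(-y * w.x) * w.
w = np.array([2.0, -1.0])   # illustrative weights
x = np.array([0.5, 0.5])    # illustrative input
y = 1.0                     # true label in {-1, +1}
margin = y * (w @ x)
grad = -y * (1.0 / (1.0 + np.exp(margin))) * w
x_adv = fgsm_perturb(x, grad, eps=0.1)
```

After the step the margin y·w·x drops from 0.5 to 0.2, i.e. the loss increases, even though each coordinate moved by only 0.1 — the linearity argument in the abstract is exactly that these per-dimension moves add up.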
Technical Report
TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
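The top-1 and top-5 error rates quoted above can be computed directly from per-class scores (a small NumPy sketch with illustrative scores, not ImageNet data):

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k
    highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return 1.0 - np.mean(hits)

scores = np.array([[0.1, 0.6, 0.3],   # true class 2: top-1 miss, top-2 hit
                   [0.5, 0.2, 0.3]])  # true class 0: top-1 hit
labels = np.array([2, 0])
err_top1 = top_k_error(scores, labels, 1)
err_top2 = top_k_error(scores, labels, 2)
```

Top-5 error is the standard ImageNet metric because many images contain several plausible objects, so crediting any of the five best guesses is more informative than top-1 alone.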
Conference Paper
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set, demonstrating substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error.
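The savings from the factorized convolutions mentioned above can be checked with simple weight-count arithmetic (the channel count is illustrative):

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a convolutional layer (biases ignored)."""
    return k_h * k_w * c_in * c_out

c = 256  # illustrative channel count
five_by_five = conv_params(5, 5, c, c)             # single 5x5 layer
two_3x3 = 2 * conv_params(3, 3, c, c)              # two stacked 3x3 layers
asym_7 = conv_params(1, 7, c, c) + conv_params(7, 1, c, c)  # 1x7 then 7x1
```

Two stacked 3x3 layers cover the same 5x5 receptive field with about 28% fewer weights, and the 1x7/7x1 pair replaces a 7x7 layer with roughly 71% fewer, which is how added computation is spent more efficiently.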
Article
Underwater video and digital still cameras are rapidly being adopted by marine scientists and managers as a tool for non-destructively quantifying and measuring the relative abundance, cover and size of marine fauna and flora. Imagery recorded of fish can be time consuming and costly to process and analyze manually. For this reason, there is great interest in automatic classification, counting, and measurement of fish. Unconstrained underwater scenes are highly variable due to changes in light intensity, changes in fish orientation due to movement, a variety of background habitats which sometimes also move, and most importantly similarity in shape and patterns among fish of different species. This poses a great challenge for image/video processing techniques to accurately differentiate between classes or species of fish to perform automatic classification. We present a machine learning approach, which is suitable for solving this challenge. We demonstrate the use of a convolutional neural network model in a hierarchical feature combination setup to learn species-dependent visual features of fish that are unique, yet abstract and robust against environmental and intra- and inter-species variability. This approach avoids the need for explicitly extracting features from raw images of the fish using several fragmented image processing techniques. As a result, we achieve a single and generic trained architecture with favorable performance even for sample images of fish species that have not been used in training. Using the LifeCLEF14 and LifeCLEF15 benchmark fish datasets, we have demonstrated results with a correct classification rate of more than 90%.
Article
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets. © 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.
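The training-time sampling and test-time averaging described above are commonly combined as "inverted" dropout, where survivors are rescaled during training so the single unthinned network needs no change at test time (a minimal NumPy sketch, not the paper's implementation):

```python
import numpy as np

def dropout(activations, p_drop, rng, train=True):
    """Inverted dropout: during training, zero each unit with
    probability p_drop and rescale the survivors by 1/(1 - p_drop),
    so the unthinned network can be used unchanged at test time."""
    if not train:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
out = dropout(np.ones(10000), p_drop=0.5, rng=rng)
```

The rescaling keeps the expected activation unchanged, which is why a single set of weights approximates the average of exponentially many thinned networks.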
Article
In this work we present an end-to-end system for text spotting -- localising and recognising text in natural scene images -- and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.
Article
In recent years, deep neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
Article
An in-trawl stereo camera system (DeepVision) collected continuous, overlapping, images of organisms ranging from krill and jellyfish to large teleost fishes, including saithe (Pollachius virens) and Atlantic cod (Gadus morhua) infected with parasitic copepods. The four-dimensional position (latitude, longitude, depth, time) of individuals was recorded as they passed the camera, providing a level of within-haul spatial resolution not available with standard trawl sampling. Most species were patchily distributed, both vertically and horizontally, and occasionally individuals were observed at significant vertical and horizontal separation from conspecifics. Acoustically visible layers extending off the continental rise at 250 m depth and greater were verified as primarily blue whiting (Micromesistius poutassou), but also included a small proportion of evenly distributed golden redfish (Sebastes marinus) and greater Argentines (Argentina silus). Small, but statistically significant, differences in length by depth were observed for blue whiting within a single haul. These results demonstrate the technology can greatly increase the amount and detail of information collected with little additional sampling effort.
Article
In situ measurements of fish target strength are selected for use in echo integrator surveys at 38 kHz. The results are expressed through equations in which the mean target strength TS is regressed on the mean fish length l in centimeters. For physoclists, TS = 20 log l − 67.4, and for clupeoids, TS = 20 log l − 71.9. These equations are supported by independent measurements on tethered, caged, and freely aggregating fish and by theoretical computations based on the swimbladder form. Causes of data variability are attributed to differences in species, behavior, and, possibly, swimbladder state.
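The regressions above (with log taken base 10, as is standard in fisheries acoustics) translate directly into code. A small illustrative helper; the example lengths are hypothetical:

```python
import math

def target_strength(length_cm, group="physoclist"):
    """Mean target strength in dB at 38 kHz from the regressions
    TS = 20 log10(l) + intercept, with l in centimeters."""
    intercept = {"physoclist": -67.4, "clupeoid": -71.9}[group]
    return 20 * math.log10(length_cm) + intercept

ts_gadoid = target_strength(60)               # hypothetical 60 cm physoclist
ts_herring = target_strength(30, "clupeoid")  # hypothetical 30 cm clupeoid
```

The 20 log l slope reflects target strength scaling with acoustic cross-section, which grows roughly with the square of fish length.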
Article
An experiment to verify the basic linearity of fisheries acoustics is described. Herring (Clupea harengus L.) was the subject fish. Acoustic measurements consisted of the echo energy from aggregations of caged but otherwise free-swimming fish, and the target strength functions of similar, anesthetized specimens. Periodic photographic observation of the caged fish allowed characterization of their behavior through associated spatial and orientation distributions. The fish biology and hydrography were also measured. Computations of the echo energy from encaged aggregations agreed well with observation. This success was obtained for each of four independent echo sounders operating at frequencies from 38 to 120 kHz and at power levels from 35 W to nearly 1 kW.
Article
Trials of a computer vision machine (The CatchMeter) for identifying and measuring different species of fish are described. The fish are transported along a conveyor underneath a digital camera. Image processing algorithms determine the orientation of the fish using a moment-invariant method, identify whether the fish is a flatfish or roundfish with 100% accuracy, measure the length with a standard deviation of 1.2 mm, and identify species with up to 99.8% sorting reliability for seven species of fish. The potential application of the system onboard both research and commercial ships is described. The machine can theoretically process up to 30,000 fish/h using a single conveyor based system.
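The moment-based orientation step can be sketched from the second central moments of a binary fish mask (a toy NumPy illustration, not the CatchMeter implementation):

```python
import numpy as np

def blob_orientation(mask):
    """Orientation (radians) of a binary blob from its second central
    moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02)."""
    ys, xs = np.nonzero(mask)
    x = xs - xs.mean()
    y = ys - ys.mean()
    mu20 = (x * x).mean()
    mu02 = (y * y).mean()
    mu11 = (x * y).mean()
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

# A horizontal bar should report an orientation close to 0.
mask = np.zeros((20, 60), dtype=bool)
mask[9:11, 5:55] = True
angle = blob_orientation(mask)
```

Once the fish's axis is known, the image can be rotated to a canonical pose before length measurement and species sorting.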
MacLennan, D., and Simmonds, E. 2005. Fisheries Acoustics. Fish and Aquatic Resources Series 10. Chapman & Hall, London.