Figure - available from: Sensors
Evolution of classification pipelines (the most recent at the bottom). Off-the-shelf deep residual features have the potential to replace the previous classification pipelines and improve performance for benthic marine image classification tasks. (SIFT: scale-invariant feature transform, HOG: histogram of oriented gradients, LBP: local binary patterns, CNN: convolutional neural networks, ResNet: residual networks).

Source publication
Article
Full-text available
Across the globe, remote image data is rapidly being collected for the assessment of benthic communities from shallow to extremely deep waters on continental slopes to the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort...

Similar publications

Preprint
Full-text available
Macroalgae (or seaweeds) are the dominant primary producers in marine vegetated coastal habitats and largely contribute to global ocean carbon fluxes. They also represent attractive renewable production platforms for biofuels, food, feed, and bioactives, notably due to their diverse and peculiar polysaccharides and carbohydrates. Among seaweeds, br...
Preprint
Full-text available
Barren rocky seafloor landscapes, denuded of almost all life by ravenous sea urchins, liberated from their predators, stand as one of the iconic images of trophic cascades in Ecology. While this paradigm has been cited in nearly every temperate rocky reef ecosystem across the globe, there is widespread disagreement as to its generality. Given thei...
Article
Full-text available
Kelp forests are highly productive foundation species along much of the world’s coastline. As a result, kelp are crucial to the ecological, social, and economic well-being of coastal communities. Yet, due to a combination of acute and chronic stressors, kelp forests are under threat and have declined in many locations worldwide. Active restoration...
Article
Full-text available
With the increasing interest and activities regarding seaweed cultivation in Europe, it is becoming crucial to utilize the opportunities that lie within environmental resources, such as light and nutrients, to produce biomass of a high yield and quality. The chemical composition of seaweed varies between seasons and depths as an effect of resource...
Preprint
Full-text available
Background: The red sea urchin Mesocentrotus franciscanus is an ecologically important kelp forest herbivore and an economically valuable wild fishery species. To examine how M. franciscanus responds to its environment on a molecular level, differences in gene expression patterns were observed in embryos raised under combinations of two temperat...

Citations

... This section discusses approaches for multi-species segmentation of seagrass imagery. Although there has been research on multi-class patch classification of macroalgae (Mahmood et al, 2020b), the first work on multi-species identification of seagrass with deep learning is Raine et al (2020). Since publication of this approach and contribution of the DeepSeagrass dataset (further detail in Section 3.3), there have been further works on the multi-species seagrass problem (Noman et al, 2021a; Mehrubeoglu et al, 2021; Balado et al, 2021; Ozaeta et al, 2023; Paul et al, 2023), as outlined in the following paragraphs. ...
Preprint
Full-text available
Underwater surveys provide long-term data for informing management strategies, monitoring coral reef health, and estimating blue carbon stocks. Advances in broad-scale survey methods, such as robotic underwater vehicles, have increased the range of marine surveys but generate large volumes of imagery requiring analysis. Computer vision methods such as semantic segmentation aid automated image analysis, but typically rely on fully supervised training with extensive labelled data. While ground truth label masks for tasks like street scene segmentation can be quickly and affordably generated by non-experts through crowdsourcing services like Amazon Mechanical Turk, ecology presents greater challenges. The complexity of underwater images, coupled with the specialist expertise needed to accurately identify species at the pixel level, makes this process costly, time-consuming, and heavily dependent on domain experts. In recent years, some works have performed automated analysis of underwater imagery, and a smaller number of studies have focused on weakly supervised approaches which aim to reduce the expert-provided labelled data required. This survey focuses on approaches which reduce dependency on human expert input, while reviewing the prior and related approaches to position these works in the wider field of underwater perception. Further, we offer an overview of coastal ecosystems and the challenges of underwater imagery. We provide background on weakly and self-supervised deep learning and integrate these elements into a taxonomy that centres on the intersection of underwater monitoring, computer vision, and deep learning, while motivating approaches for weakly supervised deep learning with reduced dependency on domain expert data annotations. Lastly, the survey examines available datasets and platforms, and identifies gaps, barriers, and opportunities for automating underwater surveys.
... HC naturally arises in many applications of interest: Lewis et al. (2004); Peng et al. (2018); Rousu et al. (2006) applied it to text categorization, Deng et al. (2009); Mendonça et al. (2021); Taoufiq et al. (2020) to image recognition and segmentation, Murtaza et al. (2020); Xie and Xing (2018) to medicine, Dimitrovski et al. (2012); Mahmood et al. (2020) to biology, Nakano et al. (2018); Sandaruwan and Wannige (2021); Vens et al. (2008) to functional genomics and protein classification, to mention a few. Indeed, the introduction of the hierarchical information during the training process proved to be an effective technique to obtain classifiers manifesting better performance, able to learn from less data, and with stronger guarantees of compliance with the background knowledge itself (Giunchiglia et al., 2022). ...
Preprint
Full-text available
This work proposes a novel approach to the deep hierarchical classification task, i.e., the problem of classifying data according to multiple labels organized in a rigid parent-child structure. It consists of a multi-output deep neural network equipped with specific projection operators placed before each output layer. The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), has been possible by combining tools from different and quite distant research fields: lexicographic multi-objective optimization, non-standard analysis, and deep learning. To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks, on the CIFAR10, CIFAR100 (where it was originally proposed before being adopted and tuned for multiple real-world applications) and Fashion-MNIST benchmarks. The results indicate that an LH-DNN can achieve comparable if not superior performance, especially in the learning of the hierarchical relations, with a drastic reduction of the learning parameters, training epochs, and computational time, and without the need for ad-hoc loss-function weighting values.
... The algorithm for implementing the neural network was developed in the Python programming language, with the support of the PyTorch machine learning library, so that the training model could be implemented for the classification problem. ResNet-50 was adopted, a residual model pre-trained on an image database and with predefined parameters (Mahmood et al., 2020). The main motivation for using ResNet-50 in the classification task lies in its low complexity and in the reduced number of processing operations needed to learn deeper and more accurate representations of visual data, which results in a lower computational cost (He et al., 2016). ...
... Schematic representation of the ResNet-50 architecture (Mahmood et al., 2020). ...
Conference Paper
Full-text available
Infrared thermography is a non-destructive testing technique that plays an important role in industrial maintenance and in the detection of structural defects such as corrosion, cracks, and delaminations. However, an important challenge lies in quantifying these defects, as well as in interpreting the thermographic images. This work aims to develop a prediction algorithm based on the ResNet-50 convolutional neural network model, as a computer vision approach, helping to overcome these limitations and, from numerical heat transfer solutions developed in the commercial software COMSOL Multiphysics, to provide a training dataset so that the neural network could be evaluated in a deterministic environment. These data comprise a total of 320 thermograms obtained through simulations aided by an algorithm developed to create carbon-fibre-reinforced polymer models based on the defect parameters, incorporating flat-bottom holes at different positions, depths, and diameters, in addition to the degradation of the simulated data through the introduction of noise, reproducing real thermographic test conditions. A heat flux was applied to the surface opposite the defect, with an intensity of 600 W/m2 over an interval of 10 s, taking into account the boundary conditions of a thermographic test setting, in the reflection method and in the transient regime. The thermal contrast highlighted the sound and defective regions and made it possible to assess the thermal diffusion behaviour caused by the geometry. The resulting dataset was divided into classes corresponding to each defect level (mild, moderate, and severe), where 288 thermograms were dedicated to training the ResNet-50 model, corresponding to 90% of the total, while 32 instances were used to verify its performance.
The trained neural network achieved an accuracy of 94.17% on the training set and 90.62% on the test set. Therefore, the application of the prediction algorithm based on a convolutional neural network, combined with the generation of training data via numerical simulations, represents a significant advance in the quantification and interpretation of defects using the infrared thermography approach, promoting greater precision and reliability in the detection of structural anomalies.
... Architecture of CNN. Fig. 5. Architecture of ResNet50 pre-trained network [14]. ...
Article
Lip reading, the process of interpreting speech by visually observing the movements of the lips, has emerged as a critical area of research with applications spanning communication aids for the hearing impaired, silent speech interfaces, and enhanced human-computer interaction. This paper reviews recent advancements in lip reading technologies, focusing on the integration of machine learning and computer vision techniques. We explore state-of-the-art methods including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer-based models that have significantly improved the accuracy and robustness of lip reading systems. The study highlights the importance of large annotated datasets, such as LipNet and LRW, which have facilitated the training of deep learning models. Additionally, we examine multimodal approaches that combine visual information with audio signals to enhance performance, especially in noisy environments. Despite substantial progress, challenges remain in addressing speaker variability, low resolution, and real-time processing. Future research directions are discussed, emphasizing the need for more diverse datasets, improved model generalization, and real-world application testing. This comprehensive review underscores the potential of advanced lip reading technologies to revolutionize communication accessibility and human-computer interaction. This paper presents a method for a vision-based lip reading system that uses a convolutional neural network (CNN) with an attention-based Long Short-Term Memory (LSTM) network. The dataset includes video clips of speakers pronouncing words and sentences. The pre-trained CNN is used to extract features from pre-processed video frames, which are then processed by the LSTM to learn temporal characteristics. The SoftMax layer of the architecture provides the lip reading result. In the present work, experiments are performed with two pre-trained models. 
The system provides 80% accuracy using Tensorflow and ensemble learning. Keywords— CNN; RNN; LSTM; Tensorflow; lip reading; deep learning
... Residual neural networks (ResNet) are deep convolutional networks that were introduced in 2015 by Microsoft for visual recognition tasks and won the Large-Scale Visual Recognition Challenge (ILSVRC 2015) in image detection, classification, and localization, showing outstanding performance on the ImageNet dataset [26]. The basic idea of ResNet is to skip blocks of convolutional layers by applying shortcut connections. ...
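The shortcut idea in the excerpt above can be illustrated with a toy sketch in plain Python. This is not the ResNet architecture itself: `residual_block`, `relu`, and `zero_layer` are hypothetical names chosen for illustration, and the "layer" here is an arbitrary elementwise function standing in for a stack of convolutions.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]


def residual_block(x: Vector, layer: Callable[[Vector], Vector]) -> Vector:
    """Compute relu(layer(x) + x).

    The `+ x` term is the shortcut connection: the input bypasses `layer`
    and is added back to its output before the activation.
    """
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("the shortcut requires layer(x) and x to have the same length")
    return relu([a + b for a, b in zip(fx, x)])


def zero_layer(v: Vector) -> Vector:
    """A stand-in layer that has learned nothing and outputs zeros."""
    return [0.0 for _ in v]
```

With `zero_layer`, a non-negative input passes through the block unchanged, which is one way to see why stacking such blocks need not degrade the signal even when a layer contributes nothing.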
Article
Full-text available
Chickenpox, caused by the varicella-zoster virus, is an extremely contagious viral infection that is common in children and can quickly develop into a severe problem. Over 90% of unvaccinated people have been infected. Still, infection occurs at different ages in different parts of the world: over 70% of people become infected by the age of 10 years in the United States, the United Kingdom, and Japan, and by the age of 20 in India, the West Indies, and South East Asia. Automatic classification of the specific disease is a challenging task that can help clinicians distinguish between different kinds of skin conditions and recommend suitable treatment. Convolutional Neural Networks have recently achieved great success in many machine learning applications and have delivered state-of-the-art performance in various computer-assisted diagnosis applications. This study proposes a deep neural network-based method that follows an ensemble approach by combining VGG-16, VGG-19, and ResNet-50 architectures to distinguish chickenpox from other skin conditions. Experimental tests achieved accurate classification, with a test accuracy of up to 93%.
... However, there is currently less literature employing them at the habitat level (Jackett et al., 2023; Mahmood et al., 2020a; Mahmood et al., 2020b; Mohamed et al., 2020; Rao et al., 2017; Vega et al., 2024; Yamada et al., 2022; Yamada et al., 2023) with many focusing on the same benchmark datasets, Benthoz15 (Bewley et al., 2015) and Tasmania, in shallow waters. Shallow-water corals are a particularly common benthic target for CNN classification (Beijbom et al., 2016; Gómez-Ríos et al., 2019a; Gómez-Ríos et al., 2019b; Mahmood et al., 2016; Mahmood et al., 2019), largely thanks to the availability of open-source datasets such as MLC (Beijbom et al., 2012), EILAT and RSMAS. ...
... There are a range of classifiers that could be used for this task such as Random Forests (Breiman, 2001), K-Nearest Neighbour (KNN), Logistic Regression and Naive Bayes; however, in this work we focus on Support Vector Machines (SVM) (Cortes and Vapnik, 1995; Cristianini and Shawe-Taylor, 2000). SVMs have been shown to pair well with 'off-the-shelf' CNN features in benthic image applications (Mahmood et al., 2016; Mahmood et al., 2019; Mahmood et al., 2020a; Mahmood et al., 2020b; Mohamed et al., 2020), as well as more generally (Azizpour et al., 2015; Razavian et al., 2014; Salman et al., 2016). They are a classical method, well documented and offer a good trade-off in terms of complexity, performance, computational demand and time. ...
... VGG is one of the most implemented algorithms for image classification and although several years old remains highly popular, with high performance across diverse image applications (Abosaq et al., 2023; Althubiti et al., 2022; Kaur and Gandhi, 2019; Krishnaswamy Rangarajan and Purushothaman, 2020; Yang et al., 2021a; Yang et al., 2021b). More importantly, it also has a record of good performance across marine classification tasks (González-Rivero et al., 2020; Kloster et al., 2020; Lumini and Nanni, 2019; Mahmood et al., 2019; Mahmood et al., 2020a; Zhang et al., 2019). Preliminary comparisons of VGG16 to other models (AlexNet (Krizhevsky et al., 2012), ResNet18 & ResNet50 (He et al., 2016) and VGG19 (Simonyan and Zisserman, 2015)) found VGG16 to produce deep features that were more accurately classified by an SVM. ...
Article
Full-text available
Automating identification of benthic habitats from imagery, with Machine Learning (ML), is necessary to contribute efficiently and effectively to marine spatial planning. A promising method is to adapt pre-trained general convolutional neural networks (CNNs) to a new classification task (transfer learning). However, this is often inaccessible to a non-specialist, requiring large investments in computational resources and time (for user comprehension and model training). In this paper, we demonstrate a simpler transfer learning framework for classifying broad deep-sea benthic habitats. Specifically, we take an ‘off-the-shelf’ CNN (VGG16) and use it to extract features (pixel patterns) from benthic images (without further training). The default outputs of VGG16 are then fed into a Support Vector Machine (SVM), a classical and simpler method than deep networks. For comparison, we also train the remaining classification layers of VGG16 using stochastic gradient descent. The discriminative power of these approaches is demonstrated on three benthic datasets (574–8353 images) from Norwegian waters, each using a unique imaging platform. Benthic habitats are broadly classified as Soft Substrate (sands, muds), Hard Substrate (gravels, cobbles and boulders) and Reef (Desmophyllum pertusum). We found that the relative simplicity of the SVM classifier did not compromise performance. Results were competitive with the CNN classifier and consistently high, with test accuracy ranging from 0.87 to 0.95 (average = 0.9 (+/- 0.04)) across datasets, somewhat increasing with dataset size. Impressively, these results were achieved 2.4–5× faster than CNN training and had significantly less dependency on high-specification hardware. Our suggested approach maximises conceptual and practical simplicity, representing a realistic baseline for novice users when approaching benthic habitat classification. This method has wide potential. 
It allows automated image grouping to aid annotation or further model selection, as well as screening of old datasets. It is especially suited to offshore scenarios as it can provide quick, albeit crude, insights into habitat presence, allowing adaptation of sampling protocols in near real-time.
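The second half of the pipeline described in this abstract (a classical SVM trained on fixed feature vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vectors are hypothetical two-dimensional placeholders for what a pre-trained CNN would output, the problem is reduced to two classes, and a plain full-batch subgradient method on the hinge loss stands in for a library SVM solver. The names `train_linear_svm` and `predict`, and the values of `lam`, `lr`, and `epochs`, are illustrative choices.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],   # each label must be -1 or +1
    lam: float = 0.01,       # L2 regularisation strength (assumed value)
    lr: float = 0.1,         # step size (assumed value)
    epochs: int = 300,
) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1.0:  # sample violates the margin, so its hinge term is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1


# Hypothetical "deep features"; real CNN outputs would be much longer vectors.
train_x = [[2.0, 0.1], [1.5, 0.3], [-1.8, 0.2], [-2.2, 0.1]]
train_y = [1, 1, -1, -1]  # e.g. +1 = reef, -1 = soft substrate (illustrative labels)
w, b = train_linear_svm(train_x, train_y)
```

The CNN is never updated in this setup, so only the small linear model above needs training, which is consistent with the abstract's point about reduced hardware demands.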
... learned feature representations extracted from deep residual networks. This method outperforms the traditional classification method (Mahmood et al., 2020) and allows for the identification of new species with greater ease and precision. ...
... The backbone operates by producing high-resolution feature maps, effectively capturing hierarchical features from the input image. Subsequently, these fixed-size feature maps are inputted into fully connected layers to produce SoftMax scores for each class and revised bounding boxes for region proposals [22]. ...
... The architecture of ResNet-50 [22]. ...
Article
Full-text available
As an inherent human characteristic, ethnicity plays a fundamental and critical role in biometric identification. On the other hand, the human face is the core of man’s identity, and facts such as age and race are often extrapolated automatically from the face. The objective is to utilize computer technologies to identify and categorize ethnic groups based on facial features. Convolutional neural networks (CNN), which can automatically identify underlying patterns from data, excel at learning image features and have shown state-of-the-art performance in several visual recognition challenges, such as ethnicity detection. Although the automated classification of traits such as age, gender, and ethnicity is a well-researched topic, Iraqi ethnic groupings have not yet been addressed. This study seeks to tackle the challenge of predicting the ethnicity of Iraqi male individuals based on their facial traits for the two largest ethnic groups, the Arabs, and the Kurds. Male Iraqi Kurds and Arabs were each represented by 260 image samples. The dataset underwent a diverse array of preprocessing and data enhancement techniques, including image resizing, isolation, gamma correction, and contrast stretching. Moreover, to augment the dataset and expand its diversity, various techniques such as brightness adjustment, rotation, horizontal flip, and grayscale augmentations were systematically applied, effectively increasing the overall number of images, and enriching the dataset for improved model performance. Face images of Kurds and Arabs were classified using the Faster region-based CNN (RCNN) approach of deep learning. Due to insufficient data in the dataset, we propose employing transfer learning to extract features using several pre-trained models. 
Specifically, we examined EfficientNetB4, ResNet-50, SqueezeNet, VGG16, and MobileNetV2, resulting in accuracies of 96.73%, 94.91%, 93.39%, 92.48%, and 90.32%, accompanied by corresponding precision values of 0.86, 0.81, 0.80, 0.70, and 0.69, respectively. It is essential to emphasize that the following inference speeds – VGG16 (4.5 ms), ResNet-50 (4.6 ms), SqueezeNet (3.8 ms), MobileNetV2 (3.7 ms), and EfficientNet-B4 (16 ms) – represent the computing times needed for each backbone. Moreover, to achieve a harmonious trade-off between precision and the time required for inference, we chose ResNet-50 as the foundational framework for our model aimed at classifying ethnicity. The study also acknowledges limitations such as the availability and diversity of the dataset. Nevertheless, despite these limitations, it provides valuable perspectives on the automated prediction of Iraqi male ethnicity through facial features, presenting potential applications in various domains. The findings contribute to the broader conversation surrounding biometric identification and ethnic categorization, underscoring the importance of ongoing research and heightened awareness of the inherent limitations associated with such studies.
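One way to look at the accuracy versus inference-time trade-off reported above is to check which backbones are not dominated by any other (a Pareto front). The sketch below uses the accuracies and inference times quoted in the abstract, paired in the order given there; the Pareto analysis and the `pareto_front` helper are illustrative additions and are not described as the authors' selection procedure.

```python
from typing import Dict, List, Tuple

# (accuracy in %, inference time in ms), as quoted in the abstract.
backbones: Dict[str, Tuple[float, float]] = {
    "EfficientNetB4": (96.73, 16.0),
    "ResNet-50": (94.91, 4.6),
    "SqueezeNet": (93.39, 3.8),
    "VGG16": (92.48, 4.5),
    "MobileNetV2": (90.32, 3.7),
}


def pareto_front(candidates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names that no other candidate beats on both accuracy and speed."""
    front = []
    for name, (acc, ms) in candidates.items():
        dominated = any(
            o_acc >= acc and o_ms <= ms and (o_acc > acc or o_ms < ms)
            for other, (o_acc, o_ms) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


front = pareto_front(backbones)
```

With these figures, VGG16 is the only dominated option (SqueezeNet is both more accurate and faster), and ResNet-50 sits on the front, which is compatible with the choice reported in the abstract.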
... In marine applications, many studies have demonstrated the possibility of using CNNs to classify benthic taxa or substrate in optical imagery [23,24,25,26,27,28,29,30,31,32]. However, there is currently less literature employing them at the habitat level [33,34,35,36,37,38] with many focusing on the same benchmark datasets, Benthoz15 [39] and Tasmania [40], in shallow waters. Shallow-water corals are a particularly common benthic target for CNN classification [41,42,43,44,45], largely thanks to the availability of open-source datasets such as MLC [46], EILAT and RSMAS. ...
... We then train a shallow (non deep-net) ML classifier, a Support Vector Machine (SVM) [52,53] to classify these deep features. SVMs have been shown to pair well with 'off-the-shelf' CNN features in benthic image applications [42,45,34,35,36], as well as more generally [54,55,56]. They are a classical method, well documented and offer a good trade-off in terms of complexity, performance, computational demand and time. ...
... For our CNN, we use VGG16 [18]; one of the most implemented algorithms for image classification. Although several years old, VGG16 remains popular with high performance across diverse image applications [57,58,59,60,61,62], including marine classification tasks [63,64,65,66,45,34]. This VGG16 is pre-trained on a large and unrelated dataset, ImageNet [67]. ...
... The underwater imagery represents high-resolution groundtruthing data characterising the seafloor from which biological information can be extracted. The manual annotation of enormous underwater image datasets is tedious, error-prone, and time-consuming (Mahmood et al., 2020). However, the annotation of some of these images can be used to train machine learning (ML) algorithms to build full-coverage maps representing the composition of the seafloor (Brown et al., 2011). ...
Article
Full-text available
Benthic habitat mapping is essential to the management and conservation of marine ecosystems. The traditional methods of mapping benthic habitats, which involve multibeam data acquisition and manually collecting and annotating imagery data, are time-consuming. However, with technological advances, using machine learning (ML) algorithms with structure-from-motion (SfM) photogrammetry has become a promising approach for mapping benthic habitats accurately and at very high resolutions. This paper explores using SfM photogrammetry and an extreme gradient boosting (XGBoost) classifier for benthic habitat 3D mapping of a vertical wall at the Charlie-Gibbs Fracture Zone in the North Atlantic Ocean. The classification workflow started with extracting frames from video footage. SfM was then applied to reconstruct the 3D point cloud of the wall. Thereafter, nine geometric features were derived from the 3D point cloud geometry. The XGBoost classifier was then used to classify the vertical wall into rock, sponges, and corals (Case 1, three classes). In addition, we separated the sponges class into three types of sponges: Demospongiae, Hexactinellida, and other Porifera (Case 2, five classes). Moreover, we compared the results from XGBoost with the widely used ML classifier, random forest (RF). For Case 2, XGBoost achieved an overall accuracy (OA) of 74.45%, while RF achieved 73.10%. The OA improved by about 10% for both classifiers when the three types of sponges were combined into one class (Case 1). Results showed that the presented 3D mapping of benthic habitat has the potential to provide more detailed and accurate information about marine ecosystems.
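Overall accuracy (OA) is the fraction of samples on the diagonal of a confusion matrix, and merging classes can only move off-diagonal counts onto the diagonal. The sketch below illustrates both points; the confusion matrix is made up for illustration (it is not taken from the paper), and `overall_accuracy` and `merge_classes` are hypothetical helper names.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def overall_accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Correctly classified samples (diagonal) divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def merge_classes(matrix: Sequence[Sequence[int]], groups: Sequence[Sequence[int]]) -> Matrix:
    """Collapse a confusion matrix so that each group of class indices becomes one class."""
    return [
        [sum(matrix[i][j] for i in g_true for j in g_pred) for g_pred in groups]
        for g_true in groups
    ]


# Made-up counts; rows are true classes, columns are predicted classes.
# Class order: rock, Demospongiae, Hexactinellida, other Porifera, corals.
five_class = [
    [40, 2, 1, 1, 1],
    [2, 15, 4, 3, 1],
    [1, 5, 12, 2, 0],
    [1, 3, 2, 8, 1],
    [1, 1, 0, 1, 12],
]

# Combine the three sponge classes into one, giving rock / sponges / corals.
three_class = merge_classes(five_class, [[0], [1, 2, 3], [4]])

oa_five = overall_accuracy(five_class)
oa_three = overall_accuracy(three_class)
```

In this toy example the confusions among the three sponge types count as correct once the classes are merged, so `oa_three` is higher than `oa_five`, mirroring the direction of the change reported between Case 2 and Case 1.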