FIGURE 1 - uploaded by Luigi Celona
Ball chart reporting the Top-1 and Top-5 accuracy vs. computational complexity. Top-1 and Top-5 accuracy using only the center crop versus floating-point operations (FLOPs) required for a single forward pass are reported. The size of each ball corresponds to the model complexity. (a) Top-1; (b) Top-5.
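As an illustration of the Top-1 and Top-5 metrics named in the caption, here is a minimal sketch of how top-k accuracy is commonly computed from per-class scores. The function name `top_k_accuracy` and the toy scores are assumptions made for this example; they are not taken from the source publication or its software.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (hypothetical scores for 3 samples and 3 classes).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=1 only the single best prediction counts, so Top-5 accuracy is always at least as high as Top-1 accuracy for the same model.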


Source publication
Article
This work presents an in-depth analysis of the majority of the deep neural networks (DNNs) proposed in the state of the art for image recognition. For each DNN multiple performance indices are observed, such as recognition accuracy, model complexity, computational complexity, memory usage, and inference time. The behavior of such performance indice...

Contexts in source publication

Context 1
... key findings of this paper are the following:
- the recognition accuracy does not increase as the number of operations increases: in fact, there are some architectures that with a relatively low number of operations, such as the SE-ResNeXt-50 (32x4d), achieve very high accuracy (see Figures 1a and b). This finding is independent on the computer architecture experimented;
- there is not a linear relationship between model complexity and accuracy (see Figures 1a and b);
- not all the DNN models use their parameters with the same level of efficiency (see Figures 2a and b);
- the desired throughput (expressed for example as the number of inferences per second) places an upper bound to the achievable accuracy (see Figures 3a and b);
- model complexity can be used to reliably estimate the total memory utilization (see Figure 4);
- almost all models are capable of real-time or super real-time performance on a high-end GPU, while just a few of them can guarantee them on an embedded system (see Tables 1a and b);
- even DNNs with a very low level model complexity have a minimum GPU memory footprint of about 0.6GB (see Table 2).
All the DNNs considered, as well as the software used for the analysis, are available online [7]. ...

Similar publications

Article
This paper presents an in-depth analysis of the majority of the deep neural networks (DNNs) proposed in the state of the art for image recognition. For each DNN, multiple performance indices are observed, such as recognition accuracy, model complexity, computational complexity, memory usage, and inference time. The behavior of such performance indi...
Article
In this paper, a novel parallel indirect visual odometry (VO) system is proposed based on a newly designed map management method, key-frame selection, and a camera pose correction model, where the speeded-up robust features (SURF) algorithm is used to extract features from an image, and a linear exhaustive search (LES) algorithm is introduced to ma...
Article
The operational environment of embedded systems imposes tight and often conflicting design requirements. Software architects aim to introduce effective tradeoff methods for selecting the design solutions that best comply with the software specifications of an embedded system. When defining the software architecture for critical embedded systems...
Article
Building a trustworthy life-critical embedded system requires deep reasoning about the potential effects that sequences of machine instructions can have on full system operation. Rather than trying to analyze complete binaries and the countless ways their instructions can interact with one another — memory, side effects, control registers, implicit...
Article
With the growing number of IoT devices, cloud-centric data processing fails to meet the requirements of all IoT applications. The limited computation and communication capacity of the cloud necessitates Edge Computing, i.e., starting the IoT data processing at the edge and transforming the connected devices into intelligent devices. Machine learning, th...

Citations

... However, that network is significantly larger (about 15 M parameters) and makes an embedded implementation more difficult. Finally, the MobileNet V2 accuracy density (the accuracy divided by the number of parameters) is an order of magnitude higher than that of VGG16 (Bianco et al., 2018). ...
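The accuracy density mentioned in this citation can be sketched as a simple ratio. The function below is a minimal illustration that assumes accuracy is given in percent and expresses the result per million parameters; the two example models and their numbers are hypothetical placeholders, not measurements from Bianco et al. or the citing work.

```python
def accuracy_density(accuracy_percent: float, num_params: int) -> float:
    """Accuracy (in percent) divided by the number of parameters, in millions."""
    if num_params <= 0:
        raise ValueError("num_params must be positive")
    return accuracy_percent / (num_params / 1_000_000)


# Hypothetical models, for illustration only (not real benchmark values).
small_model = accuracy_density(70.0, 3_500_000)    # compact network
large_model = accuracy_density(70.0, 140_000_000)  # much larger network
ratio = small_model / large_model
```

Two models with the same accuracy can differ widely in density when their parameter counts differ, which is the kind of comparison the citing sentence makes.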
Article
In this work, we evaluate the energy usage of fully embedded medical diagnosis aids based on both segmentation and classification of medical images implemented on Edge TPU and embedded GPU processors. We use glaucoma diagnosis based on color fundus images as an example to show the possibility of performing segmentation and classification in real time on embedded boards and to highlight the different energy requirements of the studied implementations. Several other works develop the use of segmentation and feature extraction techniques to detect glaucoma, among many other pathologies, with deep neural networks. Memory limitations and low processing capabilities of embedded accelerated systems (EAS) limit their use for deep network-based system training. However, including specific acceleration hardware, such as NVIDIA's Maxwell GPU or Google's Edge TPU, enables them to perform inferences using complex pre-trained networks in very reasonable times. In this study, we evaluate the timing and energy performance of two EAS equipped with Machine Learning (ML) accelerators executing an example diagnostic tool developed in a previous work. For optic disc (OD) and cup (OC) segmentation, the obtained prediction times per image are under 29 and 43 ms using Edge TPUs and Maxwell GPUs, respectively. Prediction times for the classification subsystem are lower than 10 and 14 ms for Edge TPUs and Maxwell GPUs, respectively. Regarding energy usage, in approximate terms, for OD segmentation Edge TPUs and Maxwell GPUs use 38 and 190 mJ per image, respectively. For fundus classification, Edge TPUs and Maxwell GPUs use 45 and 70 mJ, respectively.
... Powerful edge computing platforms, such as NVIDIA Tegra, can run SOTA detection networks at high frame rates with a power budget of several tens of watts [21], [45], [46]. Further, the memory required to run such inferences at sensor rate is on the order of megabytes [47], [48]. ...
Preprint
Smart glasses are rapidly gaining advanced functionality thanks to cutting-edge computing technologies, accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. Integrating AI into smart glasses featuring a small form factor and limited battery capacity is still challenging when targeting full-day usage for a satisfactory user experience. This paper illustrates the design and implementation of tiny machine-learning algorithms exploiting novel low-power processors to enable prolonged continuous operation in smart glasses. We explore the energy and latency efficiency of smart glasses in the case of real-time object detection. To this end, we designed a smart glasses prototype as a research platform featuring two microcontrollers, including a novel milliwatt-power RISC-V parallel processor with a hardware accelerator for visual AI, and a Bluetooth low-power module for communication. The smart glasses integrate power cycling mechanisms, including image and audio sensing interfaces. Furthermore, we developed a family of novel tiny deep-learning models based on YOLO with sub-million parameters customized for microcontroller-based inference, dubbed TinyissimoYOLO v1.3, v5, and v8, aiming at benchmarking object detection with smart glasses for energy and latency. Evaluations on the prototype of the smart glasses demonstrate TinyissimoYOLO's 17 ms inference latency and 1.59 mJ energy consumption per inference while ensuring acceptable detection accuracy. Further evaluation reveals an end-to-end latency from image capturing to the algorithm's prediction of 56 ms, or equivalently 18 frames per second (FPS), with a total power consumption of 62.9 mW, equivalent to 9.3 hours of continuous run time on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS.
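The frame-rate and run-time figures quoted in this abstract can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration: the abstract does not state the battery voltage, so a nominal 3.8 V is assumed here, and the function names are invented for this example.

```python
def frames_per_second(latency_ms: float) -> float:
    """Frame rate implied by an end-to-end latency given in milliseconds."""
    return 1000.0 / latency_ms


def runtime_hours(capacity_mah: float, voltage_v: float, power_mw: float) -> float:
    """Continuous run time: battery energy (mWh) divided by average power (mW)."""
    energy_mwh = capacity_mah * voltage_v
    return energy_mwh / power_mw


# Values from the abstract: 56 ms latency, 62.9 mW, 154 mAh.
# ASSUMPTION: 3.8 V nominal battery voltage (not given in the abstract).
fps = frames_per_second(56.0)
hours = runtime_hours(154.0, 3.8, 62.9)
print(f"{fps:.1f} FPS, {hours:.1f} h")
```

Under that voltage assumption the estimate is consistent with the roughly 18 FPS and 9.3 hours reported; a different nominal voltage would shift the run-time estimate proportionally.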
... Building on these recent advances, here we develop an improved approach to root segmentation based on DL and specifically U-Net architecture: a U-Net backboned with either EfficientNet [40] or with SE-ResNeXt-101 (32 × 4d) [41] as encoders. Both architectures have already shown impressive results in image recognition tasks [40,42,43]. A backboned model is a neural network architecture [either EfficientNet or SE-ResNeXt-101 (32 × 4d)] that serves as the main feature extractor in a much larger architecture (U-Net). ...
... SegRoot for instance is a modification of SegNet [38]. The SE-ResNeXt-101 (32 × 4d) architecture, which acts as an encoder here, has previously demonstrated high accuracies with a relatively small number of operations in image recognition tasks [42]. Alternatively, EfficientNet is a more recent architecture for image recognition that has also shown remarkably high accuracies with "fewer" parameters (43 million, Additional file 3) compared to architectures such as SENet (146 million parameters) [40]. ...
Article
Manual analysis of (mini-)rhizotron (MR) images is tedious. Several methods have been proposed for semantic root segmentation based on homogeneous, single-source MR datasets. Recent advances in deep learning (DL) have enabled automated feature extraction, but comparisons of segmentation accuracy, false positives and transferability are virtually lacking. Here we compare six state-of-the-art methods and propose two improved DL models for semantic root segmentation using a large MR dataset with and without augmented data. We determine the performance of the methods on a homogeneous maize dataset, and a mixed dataset of > 8 species (mixtures), 6 soil types and 4 imaging systems. The generalisation potential of the derived DL models is determined on a distinct, unseen dataset. The best performance was achieved by the U-Net models; the more complex the encoder, the better the accuracy and generalisation of the model. The heterogeneous mixed MR dataset was particularly challenging for the non-U-Net techniques. Data augmentation enhanced model performance. We demonstrated the improved performance of deep meta-architectures and feature extractors, and a reduction in the number of false positives. Although correction factors are still required to match human labelled root lengths, neural network architectures greatly reduce the time required to compute the root length. The more complex architectures illustrate how future improvements in root segmentation within MR images can be achieved, particularly reaching higher segmentation accuracies and model generalisation when analysing real-world datasets with artefacts—limiting the need for model retraining.
... The network architecture and the convolutional neural network were chosen from among the possible networks that were officially supported by the MATLAB software and could operate on this hardware. The important characteristics of networks are speed and accuracy; therefore, we chose networks with high processing speed and precision, according to previous reports [39,40]. In the hyperparameter settings, the batch size was the maximum that the hardware could work with and was in line with the values generally used in deep learning, and the number of epochs was set such that the value of the loss function nearly converged. ...
Article
Purpose: To build an image recognition network to evaluate tongue coating status. Methods: Two image recognition networks were built: one for tongue detection and another for tongue coating classification. Digital tongue photographs were used to develop both networks; images from 251 (178 women, 74.7±6.6 years) and 144 older adults (83 women, 73.8±7.3 years) who volunteered to participate were used for the tongue detection network and coating classification network, respectively. The learning objective of the tongue detection network is to extract a rectangular region that includes the tongue. You-Only-Look-Once (YOLO) v2 was used as the detection network, and transfer learning was performed using ResNet-50. The accuracy was evaluated by calculating the intersection over the union. For tongue coating classification, the rectangular area including the tongue was divided into a grid of 7×7. Five experienced panelists scored the tongue coating in each area using one of five grades, and the tongue coating index (TCI) was calculated. Transfer learning for tongue coating grades was performed using ResNet-18, and the TCI was calculated. Agreement between the panelists and network for the tongue coating grades in each area and TCI was evaluated using the kappa coefficient and intraclass correlation, respectively. Results: The tongue detection network recognized the tongue with a high intersection over union (0.885±0.081). The tongue coating classification network showed high agreement with tongue coating grades and TCI, with a kappa coefficient of 0.826 and an intraclass correlation coefficient of 0.807, respectively. Conclusions: Image recognition enables simple and detailed assessment of tongue coating status.
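The intersection over union used above to evaluate the tongue detection network can be sketched for two axis-aligned rectangles as follows. This is a generic illustration, not the authors' implementation; the `(x1, y1, x2, y2)` corner format and the function name `box_iou` are assumptions made for this example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
example = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

A value of 1.0 means the predicted and reference rectangles coincide, and 0.0 means they do not overlap at all.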
... Furthermore, to describe how lightweight our model is, we used measures such as weights, GFLOPs, and the number of parameters, which provide strong evidence across a range of model performances. Additionally, we introduced another performance measure called precision density [43]. It is defined as the precision divided by the number of parameters. ...
Article
Nowadays, the commercial potential of live e-commerce is being continuously explored, and machine vision algorithms are gradually attracting the attention of marketers and researchers. During live streaming, the visuals can be effectively captured by algorithms, thereby providing additional data support. This paper aims to consider the diversity of live streaming devices and proposes an extremely lightweight and high-precision model to meet different requirements in live streaming scenarios. Building upon yolov5s, we incorporate the MobileNetV3 module and the CA attention mechanism to optimize the model. Furthermore, we construct a multi-object dataset specific to live streaming scenarios, including anchor facial expressions and commodities. A series of experiments have demonstrated that our model realized a 0.4% improvement in accuracy compared to the original model, while reducing its weight to 10.52%.
... In the past decade, neural networks have demonstrated promising performance in many domains, such as medical imaging Liu et al. [2023], Xing et al. [2022], physics and astronomy Su et al. [2020], remote sensing Zhang et al. [2023], Workman et al. [2022], natural language processing Liu et al. [2019], Liang et al. [2023], and multi-modality integration Radford et al. [2021]. The success of modern neural networks has been greatly facilitated by the tremendous computational power available today, and the vast amounts of data collected over the years Bianco et al. [2018]. When compared to traditional machine learning algorithms, deep learning algorithms exhibit significant performance improvements, particularly when datasets are large Mahapatra [2018], primarily due to the large capacity of modern neural networks. ...
Article
Deep neural networks have shown remarkable performance on a wide range of classification tasks and applications. However, the large model size and the enormous size of the training dataset make the training process slow and often limited by the computing resources. To overcome this limitation, distributed training can be used to accelerate the process by utilizing multiple devices for a single model training. In this work, we evaluate the performance of Microsoft DeepSpeed, a distributed training library, on image classification tasks by comparing the performance of 108 trained neural networks in 27 unique settings. Our experimental results suggest that DeepSpeed may provide limited benefits for simpler learning tasks (e.g. smaller neural network models or simpler datasets). On the other hand, for more complex learning tasks, DeepSpeed can provide up to 8× faster training with possible performance improvement. Our study contributes to a better understanding of the capabilities and limitations of the DeepSpeed library, providing insights into when and where it may be most beneficial to use in image classification settings.
... However, for practical applications in medication classification during pharmacy operations, a fast inference time is required to meet real-time demands. Previous research shows that the AlexNet architecture is known for its simplicity and efficiency [29]. Therefore, we design our two-stage deep learning method based on the AlexNet structure, which should be more readily applicable to practical use. ...
Article
Dispensing errors play a crucial role in various medical errors, unfortunately emerging as the third leading cause of death in the United States. This alarming statistic has spurred the World Health Organization (WHO) into action, leading to the initiation of the Medication Without Harm Campaign. The primary objective of this campaign is to prevent dispensing errors from occurring and ensure patient safety. Due to the rapid development of deep learning technology, there has been a significant increase in the development of automatic dispensing systems based on deep learning classification to avoid dispensing errors. Most previous studies, however, have focused on developing deep learning classification systems for unpackaged pills or drugs with the same type of packaging. In the actual dispensing process, thousands of similar drugs with diverse packaging within a healthcare facility greatly increase the risk of dispensing errors. In this study, we proposed a novel two-stage induced deep learning (TSIDL)-based system to classify similar drugs with diverse packaging efficiently. The results demonstrate that the proposed TSIDL method outperforms state-of-the-art CNN models in all classification metrics. It achieved a state-of-the-art classification accuracy of 99.39%. Moreover, this study also demonstrated that the TSIDL method achieved an inference time of only 3.12 ms per image. These results highlight the potential of real-time classification for similar drugs with diverse packaging and their applications in future dispensing systems, which can prevent dispensing errors from occurring and ensure patient safety efficiently.
... The input data are received by neurons in the input layer and transferred to the different layers in the network. The hidden layers are located between the input and output layers; the number of hidden layers and the number of neurons are determined by means of empirical testing or trial and error [127]. In order to predict an accurate output based on the original input, the weights of the DNN are constantly updated during training. ...
Article
Distribution grids must be regularly updated to meet the global electricity demand. Some of these updates result in fundamental changes to the structure of the grid network. Some recent changes include two-way communication infrastructure, the rapid development of distributed generations (DGs) in different forms, and the installation of smart measurement tools. In addition to other changes, these lead to distribution grid modifications, allowing more advanced features. Even though these advanced technologies enhance distribution grid performance, the operation, management, and control of active distribution networks (ADNs) have become more complicated. For example, distribution system state estimation (DSSE) calculations have been introduced as a tool to estimate the performance of distribution grids. These DSSE computations are highly dependent on data obtained from measurement devices in distribution grids. However, sufficient measurement devices are not available in ADNs due to economic constraints and various configurations of distribution grids. Thus, the modeling of pseudo-measurements using conventional and machine learning techniques from historical information in distribution grids is applied to address the lack of real measurements in ADNs. Different types of measurements (real, pseudo, and virtual measurements), alongside network parameters, are fed into model-based or data-based DSSE approaches to estimate the state variables of the distribution grid. The results obtained through DSSE should be sufficiently accurate for the appropriate management and overall performance evaluation of a distribution grid in a control center. However, distribution grids are prone to different cyberattacks, which can endanger their safe operation. One particular type of cyberattack is known as a false data injection attack (FDIA) on measurement data. Attackers try to inject false data into the measurements of nodes to falsify DSSE results. 
The FDIA can sometimes bypass poor traditional data-detection processes. If FDIAs cannot be identified successfully, the distribution grid’s performance is degraded significantly. Currently, different machine learning applications are applied widely to model pseudo-measurements, calculate DSSE variables, and identify FDIAs on measurement data to achieve the desired distribution grid operation and performance. In this study, we present a comprehensive review investigating the use of supervised machine learning (SML) in distribution grids to enhance and improve the operation and performance of advanced distribution grids according to three perspectives: (1) pseudo-measurement generation (via short-term load forecasting); (2) DSSE calculation; and (3) FDIA detection on measurement data. This review demonstrates the importance of SML in the management of ADN operation.
... However, this improvement comes at the cost of reduced inference speed and increased requirement of computational power [21]. Therefore, selecting a suitable CNN architecture on the basis of the characteristics of the image is a strategy to enhance the performance of the trained model [25]. The Inceptions architecture (Inceptions) [26] and residual neural networks (ResNets) [27] are widely used as backbones for Faster R-CNN. ...
Article
The analysis of AR is widely used to detect loss of acrosome in sperm, but the subjective decisions of experts affect the accuracy of the examination. Therefore, we develop an ARCS for objectivity and consistency of analysis using convolutional neural networks (CNNs) trained with various magnification images. Our models were trained on 215 microscopic images at 400× and 438 images at 1000× magnification using the ResNet 50 and Inception–ResNet v2 architectures. These models distinctly recognized micro-changes in the PM of AR sperms. Moreover, the Inception–ResNet v2-based ARCS achieved a mean average precision of over 97%. Our system’s calculation of the AR ratio on the test dataset produced results similar to the work of the three experts and could do so more quickly. Our model streamlines sperm detection and AR status determination using a CNN-based approach, replacing laborious tasks and expert assessments. The ARCS offers consistent AR sperm detection, reduced human error, and decreased working time. In conclusion, our study suggests the feasibility and benefits of using a sperm diagnosis artificial intelligence assistance system in routine practice scenarios.
... ResNet is an architecture developed by Microsoft researchers in Asia, which went on to win the ImageNet Large Scale Visual Recognition Challenge in 2015. According to the study by Bianco et al. [4] on the ImageNet-1k validation images for the classification problem, the model that achieves the highest top-1 and top-5 accuracy is NASNet-A-Large, although it also has the highest computational complexity. In addition, several other models with lower computational complexity (below 15 G-FLOPs) that still achieve high top-1 and top-5 accuracy (above 90%) include SE-ResNeXt-50, SE-ResNeXt-101, Inception-ResNet-v2, ResNet-50, ResNet-101, and ResNet-152, which belong to the ResNet family of algorithms. As can be seen in Table I, the study by Gómez-Ríos et al. [10] on coral reef image classification also shows that the ResNet architecture yields higher accuracy metrics than Inception-v3 and DenseNet. ...
Preprint
The abundant biodiversity of coral reefs in Indonesian waters is a valuable asset that needs to be preserved. Rapid climate change and uncontrolled human activities have led to the degradation of coral reef ecosystems, including coral bleaching, which is a critical indicator of coral health conditions. Therefore, this research aims to develop an accurate classification model to distinguish between healthy corals and corals experiencing bleaching. This study utilizes a specialized dataset consisting of 923 images collected from Flickr using the Flickr API. The dataset comprises two distinct classes: healthy corals (438 images) and bleached corals (485 images). These images have been resized to a maximum of 300 pixels in width or height, whichever is larger, to maintain consistent sizes across the dataset. The method employed in this research involves the use of machine learning models, particularly convolutional neural networks (CNN), to recognize and differentiate visual patterns associated with healthy and bleached corals. In this context, the dataset can be used to train and test various classification models to achieve optimal results. By leveraging the ResNet model, it was found that a from-scratch ResNet model can outperform pretrained models in terms of precision and accuracy. The success in developing accurate classification models will greatly benefit researchers and marine biologists in gaining a better understanding of coral reef health. These models can also be employed to monitor changes in the coral reef environment, thereby making a significant contribution to conservation and ecosystem restoration efforts that have far-reaching impacts on life.