Conference Paper

ImageNet Classification with Deep Convolutional Neural Networks

Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
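The architecture summarized above (five convolutional layers, some followed by max-pooling, then three fully-connected layers ending in a 1000-way softmax, with ReLU non-linearities and dropout in the fully-connected layers) can be sketched roughly as follows. This is a minimal single-GPU approximation in PyTorch for illustration only; the layer widths follow the commonly used torchvision-style variant rather than the paper's original two-GPU layout.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Rough single-GPU sketch of the AlexNet layout: 5 conv + 3 FC layers."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),                      # dropout regularizes the FC layers
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),         # logits for the 1000-way softmax
        )

    def forward(self, x):
        x = self.features(x)                      # expects 3x224x224 input
        x = torch.flatten(x, 1)
        return self.classifier(x)

logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)
```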


... Neural networks have become very popular in many areas, such as computer vision (Krizhevsky et al. 2012; Reinders et al. 2022; Ren et al. 2015; Simonyan and Zisserman 2015; Zhao et al. 2017; Qiao et al. 2021; Rudolph et al. 2022; Sun et al. 2021a), speech recognition (Graves et al. 2013; Park et al. 2019; Sun et al. 2021a), automated game-playing (Mnih et al. 2015; Dockhorn et al. 2017), or natural language processing (Collobert et al. 2011; Sutskever et al. 2014; Otter et al. 2021). Researchers have published many datasets for training neural networks and put enormous effort into providing labels for each data sample. ...
Chapter
Full-text available
Neural networks have demonstrated great success; however, large amounts of labeled data are usually required for training the networks. In this work, a framework for analyzing the road and traffic situations for cyclists and pedestrians is presented, which only requires very few labeled examples. We address this problem by combining convolutional neural networks and random forests, transforming the random forest into a neural network, and generating a fully convolutional network for detecting objects. Because existing methods for transforming random forests into neural networks propose a direct mapping and produce inefficient architectures, we present neural random forest imitation—an imitation learning approach by generating training data from a random forest and learning a neural network that imitates its behavior. This implicit transformation creates very efficient neural networks that learn the decision boundaries of a random forest. The generated model is differentiable, can be used as a warm start for fine-tuning, and enables end-to-end optimization. Experiments on several real-world benchmark datasets demonstrate superior performance, especially when training with very few training examples. Compared to state-of-the-art methods, we significantly reduce the number of network parameters while achieving the same or even improved accuracy due to better generalization.
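A crude illustration of the imitation idea described above (generating labeled training data from a fitted random forest and fitting a network that imitates its decisions) could look like the following scikit-learn sketch. The Iris data, the uniform sampling scheme, and the network size are placeholder choices for illustration, not the chapter's actual generation procedure.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Generate synthetic inputs near the data range and label them with the forest,
# so the network learns to imitate the forest's decision boundaries.
rng = np.random.default_rng(0)
X_synth = rng.uniform(X.min(axis=0), X.max(axis=0), size=(5000, X.shape[1]))
y_synth = forest.predict(X_synth)

student = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
student.fit(X_synth, y_synth)            # the network imitates the forest
print("agreement with forest:", (student.predict(X) == forest.predict(X)).mean())
```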
... Since the introduction of the LeNet-5 model, many improvements have been made to overcome the shortcomings of CNNs. Krizhevsky et al. proposed a new deep CNN architecture called AlexNet, which outperformed the existing image recognition methods on the ImageNet dataset [19]. Since then, many other architectures have been proposed to improve overall performance, such as ResNet [20], VGGNet [21], or GoogleNet [22]. ...
Article
Full-text available
Neural networks have experienced a great deal of success in many domains of machine intelligence. In tasks such as object detection, speech recognition, or natural language processing, the performance of neural networks is close to that of humans. This has allowed neural networks to penetrate many domains. Medicine is one of the domains that can successfully harvest methodological advances in neural networks. Medical personnel have to deal with huge amounts of data used for patient diagnosis, monitoring, and treatment. Applications of neural networks in diagnosis and decision-support systems have proven to add more objectivity to diagnosis, allow quicker and more accurate decisions, and provide more personalized treatment. In this brief review, we describe several main neural network architectures together with their applications. We describe convolutional neural networks, auto-encoders, and recurrent neural networks, together with applications such as medical image segmentation and processing of electrocardiograms for arrhythmia detection, among many others.
... (2). Many batch-learning projects use an error-backprop method [23,24,76] that relies on gradient descent to find a local minimum of the error. ...
Chapter
Full-text available
“Deep learning” uses Post-Selection—selection of a model after training multiple models using data. The performance data of “Deep Learning” have been deceptively inflated due to two misconducts: (1) cheating in the absence of a test; (2) hiding bad-looking data. Through the same misconducts, a simple method, Pure-Guess Nearest Neighbor (PGNN), gives no errors on any validation dataset V, as long as V is in the possession of the authors and both the amount of storage space and the time of training are finite but unbounded. The misconducts are fatal, because “Deep Learning” is not generalizable, as it overfits the sample set V. The charges here are applicable to all learning modes. This chapter proposes new AI metrics, called developmental errors, for all networks trained under four Learning Conditions: (1) a body including sensors and effectors, (2) an incremental learning architecture (due to the “big data” flaw), (3) a training experience, and (4) a limited amount of computational resources. Developmental Networks avoid Deep Learning misconduct because they train a sole system, which automatically discovers context rules on the fly by generating emergent Turing machines that are optimal in the sense of maximum likelihood across a lifetime, conditioned on the four Learning Conditions.
... SiamFC is a Siamese architecture network that is fully convolutional with respect to target patches (denoted as z) and current search regions (denoted as x). Two feature maps were extracted using a feature extraction network ϕ without padding modified from AlexNet [34]. A cross-correlation operation * was conducted on the two extracted feature maps as follows: ...
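The cross-correlation step quoted above (sliding the template feature map z over the search-region feature map x) can be illustrated with the following PyTorch sketch, in which a plain batched conv2d uses the template features as the kernel. The channel and spatial sizes here are arbitrary placeholders, not the values from the SiamFC paper.

```python
import torch
import torch.nn.functional as F

def cross_correlation(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """Slide template features z over search features x, one example at a time.
    z_feat: (B, C, Hz, Wz), x_feat: (B, C, Hx, Wx) -> response map (B, 1, Ho, Wo)."""
    responses = [
        F.conv2d(x_feat[i : i + 1], z_feat[i : i + 1])   # template acts as the kernel
        for i in range(x_feat.size(0))
    ]
    return torch.cat(responses, dim=0)

z = torch.randn(2, 256, 6, 6)      # embedded target patch
x = torch.randn(2, 256, 22, 22)    # embedded search region
print(cross_correlation(z, x).shape)   # torch.Size([2, 1, 17, 17])
```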
Preprint
Full-text available
In recent years, Siamese network-based trackers have achieved significant improvements in real-time tracking. Despite their success, performance bottlenecks caused by unavoidably complex scenarios in target-tracking tasks are becoming increasingly non-negligible. For example, occlusion and fast motion are factors that can easily cause tracking failures and are labeled in many high-quality tracking databases as challenging attributes. In addition, Siamese trackers tend to suffer from high memory costs, which restricts their applicability to mobile devices with tight memory budgets. To address these issues, we propose a Specialized teachers Distilled Siamese Tracker (SDST) framework to learn a student tracker, which is small, fast, and has enhanced performance in challenging attributes. SDST introduces two types of teachers for multi-teacher distillation: a general teacher and specialized teachers. The former imparts basic knowledge to the student. The latter is used to transfer specialized knowledge to the student, which helps improve its performance in challenging attributes. For the student to efficiently capture critical knowledge from the two types of teachers, SDST is equipped with a carefully designed multi-teacher knowledge distillation model. Our model contains two processes: general teacher-student knowledge transfer and specialized teachers-student knowledge transfer. Extensive empirical evaluations of several popular Siamese trackers demonstrated the generality and effectiveness of our framework. Moreover, the results on Large-scale Single Object Tracking (LaSOT) show that the proposed method achieves compression rates of up to 8× and frame rates of 252 FPS while achieving a significant improvement of more than 2%–4% in most challenging attributes, and it obtains outstanding accuracy on all challenging attributes.
... Convolutional neural networks (CNNs) have revolutionized image classification tasks. Krizhevsky et al. [3] introduced the influential AlexNet architecture, which achieved remarkable performance in the ImageNet challenge. The success of deep learning models like CNNs heavily relies on optimization algorithms. ...
Article
Optimization algorithms play a vital role in training deep learning models effectively. This research paper presents a comprehensive comparative analysis of various optimization algorithms for Convolutional Neural Networks (CNNs) in the context of time series regression. The study focuses on the specific application of maximum temperature prediction, utilizing a dataset of historical temperature records. The primary objective is to investigate the performance of different optimizers and evaluate their impact on the accuracy and convergence properties of the CNN model. Experiments were conducted using different optimizers, including Stochastic Gradient Descent (SGD), RMSprop, Adagrad, Adadelta, Adam, and Adamax, while keeping other factors constant. Their performance was evaluated and compared based on metrics such as mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), R-squared (R²), mean absolute percentage error (MAPE), and explained variance score (EVS) to measure the predictive accuracy and generalization capability of the models. Additionally, learning curves are analyzed to observe the convergence behavior of each optimizer. The experimental results, indicating significant variations in convergence speed, accuracy, and robustness among the optimizers, underscore the research value of this work. By comprehensively evaluating and comparing various optimization algorithms, we aimed to provide valuable insights into their performance characteristics in the context of time series regression using CNN models. This work contributes to the understanding of optimizer selection and its impact on model performance, assisting researchers and practitioners in choosing the most suitable optimization algorithm for time series regression tasks.
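A stripped-down version of the comparison protocol described above (training the same small model under several optimizers for the same budget and scoring it with regression metrics) might look like the sketch below. The 1-D CNN, the synthetic temperature series, the window length, and the learning rate are illustrative placeholders, not the study's actual setup.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy daily-maximum-temperature series; sliding windows of 30 days predict the next day.
t = np.arange(2000, dtype=np.float32)
series = 25 + 8 * np.sin(2 * np.pi * t / 365) + np.random.randn(2000).astype(np.float32)
X = np.stack([series[i : i + 30] for i in range(len(series) - 30)])[:, None, :]
y = series[30:]
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

def make_model():
    return nn.Sequential(nn.Conv1d(1, 16, 3), nn.ReLU(), nn.Flatten(),
                         nn.Linear(16 * 28, 1))

for name, opt_cls in [("SGD", torch.optim.SGD), ("RMSprop", torch.optim.RMSprop),
                      ("Adagrad", torch.optim.Adagrad), ("Adadelta", torch.optim.Adadelta),
                      ("Adam", torch.optim.Adam), ("Adamax", torch.optim.Adamax)]:
    torch.manual_seed(0)
    model, loss_fn = make_model(), nn.MSELoss()
    opt = opt_cls(model.parameters(), lr=1e-3)
    xb, yb = torch.from_numpy(X_tr), torch.from_numpy(y_tr).unsqueeze(1)
    for _ in range(200):                        # identical training budget per optimizer
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    pred = model(torch.from_numpy(X_te)).detach().numpy().ravel()
    print(name, "MSE=%.3f MAE=%.3f R2=%.3f" % (
        mean_squared_error(y_te, pred), mean_absolute_error(y_te, pred),
        r2_score(y_te, pred)))
```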
... Neural network architectures: For our experiments on UCI data sets, we used a feedforward neural network architecture with 2 hidden layers, each with 128 neurons and ReLU activations. For our experiments on the image data sets, we used a convolutional neural network with the AlexNet architecture (Krizhevsky et al., 2012). We used the Glorot uniform initialization (Glorot & Bengio, 2010) for the network weights W and 0 as the initialization for the sparsity variable s_0. ...
Article
Full-text available
This paper presents a novel holistic deep learning framework that simultaneously addresses the challenges of vulnerability to input perturbations, overparametrization, and performance instability from different train-validation splits. The proposed framework holistically improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets. The results are further validated by ablation experiments and SHAP value analysis, which reveal the interactions and trade-offs between the different evaluation metrics. To support practitioners applying our framework, we provide a prescriptive approach that offers recommendations for selecting an appropriate training loss function based on their specific objectives. All the code to reproduce the results can be found at https://github.com/kimvc7/HDL.
... Output layers in neural networks are not inherently constrained within a specific range, and normalization is commonly employed to prevent them from excessively increasing output values. To mimic the NCCF computation, we explore the application of Local Response Normalization (LRN) (Krizhevsky et al., 2012) as a potential normalization technique. ...
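For reference, the Local Response Normalization layer cited above normalizes each activation by a power of a sum of squared activations over neighboring channels. Below is a minimal usage sketch with PyTorch's built-in layer; the constants are of the same form as those reported for AlexNet (n = 5, α = 1e-4, β = 0.75, k = 2), but PyTorch scales α by 1/n internally, so they are not numerically identical to the paper's.

```python
import torch
import torch.nn as nn

# AlexNet-style LRN: each activation is divided by
# (k + alpha * sum of squared activations over `size` neighboring channels) ** beta.
# Note: nn.LocalResponseNorm divides alpha by `size` internally (an averaged sum).
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

activations = torch.randn(8, 64, 27, 27)    # (batch, channels, height, width)
print(lrn(activations).shape)                # torch.Size([8, 64, 27, 27])
```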
Article
Full-text available
The task of estimating the fundamental frequency of an audio signal, also known as pitch tracking, has been a long-standing research area in speech signal processing. Recently, there has been a growing interest in data-driven pitch tracking methods that leverage deep learning technologies. However, deep learning-based pitch tracking models often neglect the methodologies employed in well-established signal processing-based pitch tracking algorithms, which incorporate valuable prior knowledge. Motivated by this, we propose Neural RAPT, an interpretable neural network-based pitch tracking model that incorporates signal processing knowledge from RAPT. The proposed model consists of a front-end module and a back-end module. The front-end module adopts the U-Net structure, which inherently involves downsampling and upsampling processes similar to RAPT. To enhance the U-Net, we introduce a neural autocorrelation function (ACF) module using masked CNN, along with a normalization layer that models the sample pair-wise product in the normalized cross-correlation function (NCCF). The back-end module is based on the Transformer architecture. The model is evaluated by pitch tracking experiments on the PTDB-TUG database and noisy mixtures with the NOISEX-92 database at different SNRs. The experimental results demonstrate that incorporating algorithmic knowledge into the model design leads to improved performance. The proposed model inherits the advantages of high accuracy from traditional pitch tracking algorithms, while also benefiting from the noise robustness offered by neural network-based methods. Consequently, our model exhibits superiority over existing deep learning-based pitch tracking methods.
... In this study, six popular CNN architectures are trained using various loss functions to evaluate the feasibility of the proposed loss function. First, AlexNet [35] has eight layers comprising five convolutional layers and three fully connected layers combined with dropout techniques. Its simplicity and moderate depth made its training fast. ...
Article
Full-text available
Facial expression recognition is crucial for understanding human emotions and nonverbal communication. With the growing prevalence of facial recognition technology and its various applications, accurate and efficient facial expression recognition has become a significant research area. However, most previous methods have focused on designing unique deep-learning architectures while overlooking the loss function. This study presents a new loss function that allows simultaneous consideration of inter- and intra-class variations to be applied to CNN architecture for facial expression recognition. More concretely, this loss function reduces the intra-class variations by minimizing the distances between the deep features and their corresponding class centers. It also increases the inter-class variations by maximizing the distances between deep features and their non-corresponding class centers, and the distances between different class centers. Numerical results from several benchmark facial expression databases, such as Cohn-Kanade Plus, Oulu-Casia, MMI, and FER2013, are provided to prove the capability of the proposed loss function compared with existing ones.
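A hedged sketch of a loss of the kind described above (pulling deep features toward their own class centers while pushing them away from other centers and spreading the centers apart) is given below. The hinge margins, the term weights, and the use of learnable center parameters are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterIntraClassLoss(nn.Module):
    """Toy loss: minimize feature-to-own-center distance, maximize distance to
    other centers and between centers (hinge penalties with illustrative margins)."""
    def __init__(self, num_classes: int, feat_dim: int, margin: float = 10.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        d = torch.cdist(features, self.centers)             # (batch, num_classes)
        own = d.gather(1, labels.unsqueeze(1)).squeeze(1)    # distance to own center
        mask = F.one_hot(labels, self.centers.size(0)).bool()
        other = d.masked_fill(mask, float("inf")).min(dim=1).values
        intra = own.pow(2).mean()                            # pull toward own center
        inter = F.relu(self.margin - other).pow(2).mean()    # push from nearest other center
        center_sep = F.relu(self.margin - torch.pdist(self.centers)).pow(2).mean()
        return intra + inter + center_sep

features = torch.randn(16, 128)
labels = torch.randint(0, 7, (16,))
loss = InterIntraClassLoss(num_classes=7, feat_dim=128)(features, labels)
loss.backward()   # in practice this term would be added to the usual cross-entropy loss
```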
... Figure 1f demonstrates a metal artifact caused by a metallic object. In this assessment, the object detection model (YOLOv8) is employed to detect metal artifacts, with Distance_start = y_box_top − y_start_line (7) and Distance_end = y_end_line − y_box_bottom (8). The detection model uses pre-trained weights on the [37] dataset, while U-Net utilizes pre-trained weights on the ImageNet [38] dataset. The images are standardized by subtracting the mean and dividing by the standard deviation of the image. ...
Article
Full-text available
Background Chest computed tomography (CT) image quality impacts radiologists’ diagnoses. Pre-diagnostic image quality assessment is essential but labor-intensive and may have human limitations (fatigue, perceptual biases, and cognitive biases). This study aims to develop and validate a deep learning (DL)-driven multi-view multi-task image quality assessment (M²IQA) method for assessing the quality of chest CT images in patients, to determine if they are suitable for assessing the patient’s physical condition. Methods This retrospective study utilizes and analyzes chest CT images from 327 patients. Among them, 1613 images from 286 patients are used for model training and validation, while the remaining 41 patients are reserved as an additional test set for conducting ablation studies, comparative studies, and observer studies. The M²IQA method is driven by DL technology and employs a multi-view fusion strategy, which incorporates three scanning planes (coronal, axial, and sagittal). It assesses image quality for multiple tasks, including inspiration evaluation, position evaluation, radiation protection evaluation, and artifact evaluation. Four algorithms (pixel threshold, neural statistics, region measurement, and distance measurement) have been proposed, each tailored for specific evaluation tasks, with the aim of optimizing the evaluation performance of the M²IQA method. Results In the additional test set, the M²IQA method achieved 87% precision, 93% sensitivity, 69% specificity, and a 0.90 F1-score. Extensive ablation and comparative studies have demonstrated the effectiveness of the proposed algorithms and the generalization performance of the proposed method across various assessment tasks. Conclusion This study develops and validates a DL-driven M²IQA method, complemented by four proposed algorithms. It holds great promise in automating the assessment of chest CT image quality. The performance of this method, as well as the effectiveness of the four algorithms, is demonstrated on an additional test set.
... In a series of significant contributions, the authors of [20][21][22] first used AlexNet [24] for fire detection tasks and found that, because of its model size, it cannot be used in real-time surveillance devices. ...
Article
Full-text available
Fire is a dangerous and unwanted calamity that can destroy property and lives in forest and urban areas within a few minutes, and its effects may not be reversible. Therefore, computer vision-based methods can be essential in early fire detection tasks. However, existing approaches combining computer vision and deep learning are unsuitable for memory-constrained surveillance devices (CCTV and IP sensors) with low computational power. This necessitates an automated system that can detect and localize this anomaly and is suitable for such small devices. Hence, this article proposes FireDL, a novel custom framework for fire detection and localization. The proposed FireDL is a lightweight model with only 2.2 million parameters. When trained on the same datasets, FireDL is eight times lighter than VGG16 and six times lighter than DenseNet121. The FireDL model comprises two modules: one for feature extraction containing four blocks of convolutions and one for classification containing one block of fully connected layers. The experiments were conducted on two datasets, named Set 1 (DeepFire) and Set 2 (self-created using Foggia, Ko et al., etc.). It attained an accuracy of 97.63% on Set 1 and 99.82% on Set 2. Extensive experiments performed on Set 1 and Set 2 confirm its reliability for real-world scenarios.
... This seminal work played a pivotal role in shaping the subsequent evolution of CNNs. Krizhevsky introduced AlexNet in 2012 [33]. Building upon the foundations laid by LeNet, AlexNet effectively adopted the Rectified Linear Unit (ReLU) [34] as the activation function for CNN. ...
Article
Full-text available
A printed circuit board (PCB) functions as a substrate essential for interconnecting and securing electronic components. Its widespread integration is evident in modern electronic devices, spanning computers, cell phones, televisions, digital cameras, and diverse apparatus. Ensuring product quality mandates meticulous defect inspection, a task exacerbated by the heightened precision of contemporary circuit boards, intensifying the challenge of defect detection. Conventional algorithms, hampered by inefficiency and limited accuracy, fall short of usage benchmarks. In contrast, PCB defect detection algorithms rooted in deep learning hold promise for achieving heightened accuracy and efficiency, bolstered by their adeptness at discerning novel defect types. This review presents a comprehensive analysis of machine vision-based PCB defect detection algorithms, traversing the realms of machine learning and deep learning. It commences by contextualizing and elucidating the significance of such algorithms, followed by an extensive exploration of their evolution within the machine vision framework, encompassing classification, comparison, and analysis of algorithmic principles, strengths, and weaknesses. Moreover, the introduction of widely used PCB defect detection datasets and assessment indices enhances the evaluation of algorithmic performance. Currently, the detection accuracy can exceed 95% at an Intersection over Union (IoU) of 0.5. Lastly, potential future research directions are identified to address the existing issues in the current algorithm. These directions include utilizing Transformers as a foundational framework for creating new algorithms and employing techniques like Generative Adversarial Networks (GANs) and reinforcement learning to enhance PCB defect detection performance.
... The hash-code learning model D_txt is "fc_hs4 → fc_hs5 → fc_hs6". Among these layers, the activation functions of the second and third layers are ReLU [53], and the last layer uses the identity function. Fig. 5 shows that the structure of the text-modality encoder g described above includes the feature extraction model E_txt and the hash-code learning model D_txt. ...
Article
Full-text available
Cross-modal hashing retrieval has attracted extensive attention due to its low storage requirements as well as high retrieval efficiency. In particular, how to more fully exploit the correlation of different modality data and generate a more distinguished representation is the key to improving the performance of this method. Moreover, Transformer-based models have been widely used in various fields, including natural language processing, due to their powerful contextual information processing capabilities. Based on these motivations, we propose a Transformer-based Distinguishing Strong Representation Deep Hashing (TDSRDH). For text modality, since the sequential relations between words imply semantic relations that are not independent relations, we thoughtfully encode them using a transformer-based encoder to obtain a strong representation. In addition, we propose a triple-supervised loss based on the commonly used pairwise loss and quantization loss. The latter two ensure the learned features and hash-codes can preserve the similarity of the original data during the learning process. The former ensures that the distance between similar instances is closer and the distance between dissimilar instances is farther. So that TDSRDH can generate more discriminative representations while preserving the similarity between modalities. Finally, experiments on the three datasets MIRFLICKR-25K , IAPR TC-12 , and NUS-WIDE demonstrated the superiority of TDSRDH over the other baselines. Moreover, the effectiveness of the proposed idea was demonstrated by ablation experiments.
... • Classification: In the context of classification, the task involves categorizing the entire image without specifying the object's location (e.g., AlexNet (Krizhevsky et al., 2017)). The main metric employed for classification is accuracy. ...
Article
Full-text available
Recently, deep learning algorithms have become increasingly instrumental in autonomous driving by identifying and acknowledging road entities to ensure secure navigation and decision-making. Autonomous car datasets play a vital role in developing and evaluating perception systems. Nevertheless, the majority of current datasets are acquired using Light Detection and Ranging (LiDAR) and camera sensors. Utilizing deep neural networks yields remarkable outcomes in object recognition, especially when applied to data from cameras and LiDAR sensors, which perform poorly under adverse weather conditions such as rain, fog, and snow due to the sensor wavelengths. This paper aims to evaluate the ability to use a RADAR dataset for detecting objects in adverse weather conditions, when LiDAR and cameras may fail to be effective. This paper presents two experiments for object detection using the Faster-RCNN architecture with a ResNet-50 backbone and COCO evaluation metrics. Experiment 1 is object detection over only one class, while Experiment 2 is object detection over eight classes. The results show that, as expected, the average precision (AP) of detecting one class (47.2) is better than the result from detecting eight classes (27.4). Comparing my results from Experiment 1 to the literature results, which achieved an overall AP of 45.77, my result was slightly better in accuracy, mainly due to hyper-parameter optimization. The outcomes of object detection and recognition based on RADAR indicate the potential effectiveness of RADAR data in automotive applications, particularly in adverse weather conditions, where vision and LiDAR may encounter limitations.
Article
Welcome to the concluding issue of IEEE Transactions on Computational Social Systems (TCSS) for the year 2023. We would like to seize this opportunity to extend our heartfelt appreciation and congratulations to all for your exceptional dedication and unwavering support. We eagerly anticipate further collaboration to enhance the publication quality and expedite the review process of TCSS in the upcoming year 2024.
Article
Spacecraft pose estimation plays an important role in an increasing number of on-orbit services: rendezvous and docking, formation flights, debris removal, and so on. Current solutions achieve excellent performance at the cost of a huge number of model parameters and are not applicable in space environments where computational resources are limited. In this paper, we present the Squeeze-and-Excitation based Spacecraft Pose Network (SESPNet). Our primary objective is to make a trade-off between minimizing model parameters and preserving performance to be more applicable to edge computing in space environments. Our contributions are primarily manifested in three aspects: first, we adapt the lightweight PeleeNet as the backbone network; second, we incorporate the SE attention mechanism to bolster the network’s feature extraction capabilities; third, we adopt the Smooth L1 loss function for position regression, which significantly enhances the accuracy of position estimation.
Article
Full-text available
In this paper, we address the question of achieving high accuracy in deep learning models for agricultural applications through edge computing devices while considering the associated resource constraints. Traditional and state-of-the-art models have demonstrated good accuracy, but their practicality as end-user available solutions remains uncertain due to current resource limitations. One agricultural application for deep learning models is the detection and classification of plant diseases through image-based crop monitoring. We used the publicly available PlantVillage dataset containing images of healthy and diseased leaves for 14 crop species and 6 groups of diseases as example data. The MobileNetV3-small model succeeds in classifying the leaves with a test accuracy of around 99.50%. Post-training optimization using quantization reduced the number of model parameters from approximately 1.5 million to 0.93 million while maintaining the accuracy of 99.50%. The final model is in ONNX format, enabling deployment across various platforms, including mobile devices. These findings offer a cost-effective solution for deploying accurate deep-learning models in agricultural applications.
Article
The object of the study is the process of information transfer between members of a drone swarm, aimed at extending the coverage area of the control signal and ensuring guaranteed information transfer, which makes it possible to maintain control over the drone swarm and operate it safely. The goal of the work is to ensure guaranteed information transfer over a distance of 5 to 10 km, depending on technical capabilities, and to create a communication channel between swarm members in conditions where there is no connection with the hub. Using the Pixhawk flight controller and an integrated data transmit/receive module supporting the low-frequency LoRaWAN protocol, reliable control was achieved under loss of connection with the control hub. The logic for assigning the communicator role between the control hub and the swarm members was also defined in order to increase the coverage area of the control signal. Bibl. 13, Fig. 5.
Article
Face anti-spoofing, which aims to defend against various attacks in face authentication systems, has drawn increasing attention with the advance in biometrics. Although many studies for face anti-spoofing have shown the remarkable performance improvement, they still suffer from a lack of generality, i.e., vulnerability to unknown attacks frequently occurring in real-world scenarios. Recently, the one-class learning approach has emerged as a way to overcome the aforementioned issue thanks to its capability to distinguish unseen forgery types by precisely understanding the subtle pattern of real faces. However, it is quite a difficult problem to determine the accurate decision boundary by only using real facial images. To cope with this limitation, we propose to apply a novel pseudo-negative sampling scheme in one-class learning for face anti-spoofing. More concretely, pseudo-negative samples are generated based on the statistical distribution of real facial samples and utilized as the proxy of fake facial samples to construct the robust decision boundary. The proposed method is designed by following a two-stage unsupervised learning framework. Firstly, the model learns the feature representation of real faces via the Siamese architecture in the pre-training stage. In the fine-tuning stage, pseudo-negative features are randomly sampled at a suitable distance from real facial features in the latent space. These sampled features are then utilized with real facial features to guide the classifier. Since such pseudo-negative features are not limited to specific fake properties, our classifier can effectively learn to distinguish real and fake faces without using any fake facial images during the training. Experimental results on benchmark datasets show that the proposed method is effective for face anti-spoofing even with unseen spoofing attacks, which achieves the state-of-the-art performance on the Replay-Attack dataset, i.e., 94.48% in AUC. The code and model are publicly available at: https://github.com/DCVL-FA/PNS-release.
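A rough sketch of the pseudo-negative sampling idea described above (drawing synthetic "fake" features at a controlled distance from the real-face feature distribution and training the classifier on both) is shown below. The Gaussian random directions, the per-dimension scaling, and the distance parameter are illustrative assumptions, not the paper's exact sampling scheme.

```python
import torch
import torch.nn as nn

def sample_pseudo_negatives(real_feats: torch.Tensor, distance: float = 3.0) -> torch.Tensor:
    """Place synthetic negatives at roughly `distance` (in feature-std units)
    from each real-face feature, in a random direction of the latent space."""
    directions = torch.randn_like(real_feats)
    directions = directions / directions.norm(dim=1, keepdim=True)
    scale = real_feats.std(dim=0, keepdim=True)          # per-dimension spread of real faces
    return real_feats + distance * scale * directions

real_feats = torch.randn(64, 256)            # features of real faces from the pre-trained encoder
fake_feats = sample_pseudo_negatives(real_feats)

classifier = nn.Linear(256, 1)               # real-vs-spoof decision boundary
x = torch.cat([real_feats, fake_feats])
y = torch.cat([torch.ones(64, 1), torch.zeros(64, 1)])
loss = nn.BCEWithLogitsLoss()(classifier(x), y)
loss.backward()
```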
Article
Full-text available
With advancements in deep learning (DL), artificial intelligence (AI) technology has become an indispensable tool. However, the application of DL incurs significant computational costs, making it less viable for edge AI scenarios. Consequently, the demand for cost-effective AI solutions, other than DL-based approaches, is increasing. Reservoir computing (RC) has attracted interest owing to its ability to provide low-cost training alternatives, holding great promise for edge AI applications. However, the training capability of RC is constrained by its reliance on a single linear layer, while weight connections in the remaining layers remain static during training. Moreover, accomplishing continuous learning tasks is difficult owing to the catastrophic forgetting in the linear layer. Therefore, we propose the integration of self-organizing multiple readouts to enhance RC’s training capability. Our method distributes training data across multiple readouts, which prevents catastrophic forgetting of readouts and empowers each readout to adeptly assimilate new data, thereby elevating the overall training performance. The self-organizing function, which assigns similar data to the same readout, optimizes the memory utilization of these multiple readouts. Experimental results show that an RC equipped with the proposed multiple readouts successfully solved a continuous learning task by mitigating catastrophic forgetting because of the data distribution to the multiple readouts. Additionally, the RC achieved higher accuracy in a sound recognition task compared with the existing RC paradigm because of ensemble learning in the multiple readouts. Multiple readouts are effective in enhancing the training capability of RC and can contribute to the realization of RC applications.
Chapter
Recent strides in AI and nanotechnology are converging remarkably. This chapter provides a comprehensive overview, exploring their principles and applications. Journeying through AI and nanotechnology's history, the authors highlight milestones that shape our understanding of their present and future. They delve into AI fundamentals like machine learning, deep learning, and natural language processing. Transitioning to nanotechnology, they explore its history, principles, and nanoscale material significance. Building on this, they investigate their synergy, showcasing AI's optimization of nanomaterial design. Addressing ethics, privacy, and environment, responsible implementation is essential. This chapter introduces the captivating AI-nanotech world, paving the way for deeper discussions. Along with the reader, the authors embark on an enlightening journey into their convergence's boundless potential.
Chapter
Emerging synergies of nanotechnology and artificial intelligence (AI) promise transformative impacts on various sectors. This fusion unlocks novel materials and applications previously unattainable. This chapter explores AI and nanotech potentials across healthcare, energy, environment, manufacturing, and transportation, emphasizing ethical frameworks for responsible use. The horizon shines bright for AI and nanotech, ushering in an era of unprecedented innovation. Rapid advancements beckon boundless achievements. However, prudent navigation is essential, given potential risks like autonomous weapons or hazardous nanomaterials. Ethical guidelines must steer these technologies toward positive trajectories. Concluding, the chapter addresses challenges and opportunities shaping AI and nanotech's trajectory. Their potential to reshape the world is evident. Guided by ethics, the authors hold the key to harnessing their power for global betterment, marrying innovation with ethical stewardship.
Article
Full-text available
Normal development of the immune system is essential for overall health and disease resistance. Bony fish, such as the zebrafish ( Danio rerio ), possess all the major immune cell lineages as mammals and can be employed to model human host response to immune challenge. Zebrafish neutrophils, for example, are present in the transparent larvae as early as 48 hours post fertilization and have been examined in numerous infection and immunotoxicology reports. One significant advantage of the zebrafish model is the ability to affordably generate high numbers of individual larvae that can be arrayed in multi-well plates for high throughput genetic and chemical exposure screens. However, traditional workflows for imaging individual larvae have been limited to low-throughput studies using traditional microscopes and manual analyses. Using a newly developed, parallelized microscope, the Multi-Camera Array Microscope (MCAM™), we have optimized a rapid, high-resolution algorithmic method to count fluorescently labeled cells in zebrafish larvae in vivo . Using transgenic zebrafish larvae, in which neutrophils express EGFP, we captured 18 gigapixels of images across a full 96-well plate, in 75 seconds, and processed the resulting datastream, counting individual fluorescent neutrophils in all individual larvae in 5 minutes. This automation is facilitated by a machine learning segmentation algorithm that defines the most in-focus view of each larva in each well after which pixel intensity thresholding and blob detection are employed to locate and count fluorescent cells. We validated this method by comparing algorithmic neutrophil counts to manual counts in larvae subjected to changes in neutrophil numbers, demonstrating the utility of this approach for high-throughput genetic and chemical screens where a change in neutrophil number is an endpoint metric. Using the MCAM™ we have been able to, within minutes, acquire both enough data to create an automated algorithm and execute a biological experiment with statistical significance. Finally, we present this open-source software package which allows the user to train and evaluate a custom machine learning segmentation model and use it to localize zebrafish and analyze cell counts within the segmented region of interest. This software can be modified as needed for studies involving other zebrafish cell lineages using different transgenic reporter lines and can also be adapted for studies using other amenable model species.
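The counting step described above (intensity thresholding followed by blob detection inside the segmented larva) can be approximated with a few lines of scikit-image, as in the sketch below. The Otsu threshold and the minimum blob area are placeholder choices, and the real pipeline additionally selects the most in-focus view per well before counting.

```python
import numpy as np
from skimage import filters, measure

def count_fluorescent_cells(image: np.ndarray, min_area: int = 20) -> int:
    """Count bright blobs (e.g., EGFP+ neutrophils) in a single-channel image."""
    threshold = filters.threshold_otsu(image)            # global intensity threshold
    binary = image > threshold
    labels = measure.label(binary)                        # connected-component blobs
    regions = measure.regionprops(labels)
    return sum(1 for r in regions if r.area >= min_area)  # drop tiny noise specks

# Toy example: a dark field with a few bright square "cells".
img = np.zeros((256, 256), dtype=np.float32)
for y, x in [(40, 40), (120, 200), (200, 90)]:
    img[y:y + 8, x:x + 8] = 1.0
print(count_fluorescent_cells(img))   # -> 3
```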
Chapter
Implementing in-house AI in the modern business is a classic example of digital transformation, often appearing simple and attractive, particularly given the emergence and availability of powerful, easy-to-use frameworks like TensorFlow or PyTorch. Such AIs are commonly considered for replacing cumbersome manual or physical systems, where neural networks may appear to be almost a panacea automation solution to solve scalability or diversification concerns. However, such systems have subtle and sometimes very surprising behaviours that require considerable domain expertise, in order to implement a functional system without expending more effort than the system ultimately gains. Fundamentally, they need to be deployed with a clear sense of what the AI system is going to achieve. Careful attention must be paid at the outset to drafting a clear and concrete design specification that indicates the intended function and, equally, draws a line under capabilities that are out of scope. Likewise, an effort needs to be made either to identify in-house people with the required skill sets to develop the system or alternatively to enter into close working partnerships with external providers who can identify the needs and clearly articulate an appropriate solution. Most challenging of all, especially at large scale, is the emerging ‘data gap’—the need to have access to or generate enormous volumes of labelled data—which often comes only at costs outside the budget of all but the largest companies. A case study in design collaboration between an emerging company transitioning from a physical to a virtual technology and a university research group with substantial expertise in AI systems is presented, both as an illustration of the complex design considerations and a model for how to build in-house expertise. The collaboration is ongoing, and outcomes are still preliminary, but the company is now starting to gain an appreciation for the complexity of real-world AI deployments and has developed a strategic plan that enables future growth. The emerging overall message is that modern AI is more an exercise in data automation than process automation.
Article
Cardiovascular diseases, such as heart attack and congestive heart failure, are the leading cause of death both in the United States and worldwide. The current medical practice for diagnosing cardiovascular diseases is not suitable for long-term, out-of-hospital use. A key to long-term monitoring is the ability to detect abnormal cardiac rhythms, i.e., arrhythmia, in real-time. Most existing studies only focus on the accuracy of arrhythmia classification, instead of runtime performance of the workflow. In this paper, we present our work on supporting real-time arrhythmic detection using convolutional neural networks, which take images of electrocardiogram (ECG) segments as input, and classify the arrhythmia conditions. To support real-time processing, we have carried out extensive experiments and evaluated the computational cost of each step of the classification workflow. Our results show that it is feasible to achieve real-time arrhythmic detection using convolutional neural networks. To further demonstrate the generalizability of this approach, we used the trained model with processed data collected by a customized wearable sensor in a lab setting, and the results show that our approach is highly accurate and efficient. This research provides the potential to enable in-home real-time heart monitoring based on 2D image data, which opens up opportunities for integrating both machine learning and traditional diagnostic approaches.
Article
Bioimage analysis plays a critical role in extracting information from biological images, enabling deeper insights into cellular structures and processes. The integration of machine learning and deep learning techniques has revolutionized the field, enabling the automated, reproducible, and accurate analysis of biological images. Here, we provide an overview of the history and principles of machine learning and deep learning in the context of bioimage analysis. We discuss the essential steps of the bioimage analysis workflow, emphasizing how machine learning and deep learning have improved preprocessing, segmentation, feature extraction, object tracking, and classification. We provide examples that showcase the application of machine learning and deep learning in bioimage analysis. We examine user‐friendly software and tools that enable biologists to leverage these techniques without extensive computational expertise. This review is a resource for researchers seeking to incorporate machine learning and deep learning in their bioimage analysis workflows and enhance their research in this rapidly evolving field.
Article
Technological advancements continue to expand the communications industry’s potential. Images, which are an important component in strengthening communication, are widely available. Therefore, image quality assessment (IQA) is critical in improving content delivered to end users. Convolutional neural networks (CNNs) used in IQA face two common challenges. One issue is that these methods fail to provide the best representation of the image. The other issue is that the models have a large number of parameters, which easily leads to overfitting. To address these issues, the dense convolution network (DSC-Net), a deep learning model with fewer parameters, is proposed for no-reference image quality assessment (NR-IQA). Moreover, it is obvious that the use of multimodal data for deep learning has improved the performance of applications. As a result, multimodal dense convolution network (MDSC-Net) fuses the texture features extracted using the gray-level co-occurrence matrix (GLCM) method and spatial features extracted using DSC-Net and predicts the image quality. The performance of the proposed framework on the benchmark synthetic datasets LIVE, TID2013, and KADID-10k demonstrates that the MDSC-Net approach achieves good performance over state-of-the-art methods for the NR-IQA task.
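The texture branch described above relies on gray-level co-occurrence matrix (GLCM) statistics that are later fused with the CNN's spatial features. A minimal scikit-image sketch of extracting such a texture descriptor is shown below; the chosen offsets, angles, and GLCM properties are illustrative, not the paper's exact configuration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_img: np.ndarray) -> np.ndarray:
    """Texture descriptor from a GLCM: contrast, homogeneity, energy, correlation."""
    glcm = graycomatrix(gray_img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
texture_vec = glcm_features(img)     # 4 properties x 2 distances x 2 angles = 16 values
print(texture_vec.shape)             # (16,)
# In a fusion model, this vector would be concatenated with the CNN's spatial features
# before the final quality prediction.
```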
Article
Full-text available
Recent advancements in deep learning have highlighted the pivotal role of data in training deep neural networks. However, the persistent issue of imbalanced data distribution poses a challenge for achieving optimal performance. Existing approaches like re-sampling, re-weighting, decoupling representation and data augmentation have sought to address data imbalance by increasing data quantity and diversity, improving model performance and generalization. Notably, Copy-Paste data augmentation has shown promise but comes with significant time and computing requirements. To overcome these limitations, we propose an innovative solution called balanced object paste (BOP). BOP enhances data distribution by pasting additional objects onto target images using a set of well-defined principles, synthesizing new images and yielding promising results. The position for pasting is generated through a region generation method based on existing object distribution. Additional objects are sampled in a class-balanced manner and scaled to achieve balance across categories and sizes. BOP outperforms existing methods like Simple Copy-Paste, showcasing notable improvements of 4.1 in AP for PASCAL VOC and a speed boost of 13 times. Moreover, BOP consistently enhances detector performance across various datasets and conditions, and its versatility extends to box-labeled datasets, establishing it as a valuable tool for object detection.
Article
Full-text available
Semantic part detection within an object is of importance in the field of computer vision. This study proposes a novel approach to semantic part detection that starts by employing a convolutional neural network to concatenate a selection of feature maps from the network into a long vector for pixel representation. Using this dedicated pixel representation, we implement a range of techniques, such as Poisson disk sampling for pixel sampling and Poisson matting for pixel label correction. These techniques efficiently facilitate the training of a practical pixel classifier for part detection. Our experimental exploration investigated various factors that affect the model’s performance, including training data labeling (with or without the aid of Poisson matting), hypercolumn representation dimensionality, neural network architecture, post-processing techniques, and pixel classifier selection. In addition, we conducted a comparative analysis of our approach with established object detection methods.
Article
With the development of radio modulation technologies for communication and wireless applications, several studies have been conducted to reduce and eliminate noise during signal transmission. Although the influence of noise can be effectively addressed, it has become a popular research topic in mobile communications. Moreover, in recent telecommunication systems, owing to their complexity and comprehensive protocols, which require a large number of mathematical and engineering approaches, predicting and classifying noise is difficult. Thus, to effectively address these challenges, we propose a spatiotemporal AnoGAN to detect the noise that can occur during radio modulation. In our approach, we assemble a set of AnoGANs based on convolutional neural networks (CNNs) and long short-term memory (LSTM) to enable the system to learn the time-series features of the radio modulation signal and its shape expressed in the complex plane. The proposed spatiotemporal AnoGAN can discriminate the interference caused by noise without any annotation of anomalies, using a generator and discriminator. The proposed spatiotemporal AnoGAN achieves a 91.4% recall on digitally modulated signals that were previously difficult to identify. Through an empirical analysis of the proposed method, we observed that the spatiotemporal AnoGAN accurately identified abnormal interference signals.
Article
Full-text available
This paper introduces YOLO, the best approach to object detection. Real-time detection plays a significant role in various domains like video surveillance, computer vision, autonomous driving, and the operation of robots. The YOLO algorithm has emerged as a popular and structured solution for real-time object detection due to its ability to detect items in a single pass through the neural network. This research article seeks to lay out an extensive understanding of the YOLO algorithm, its architecture, and its impact on real-time object detection. YOLO frames object detection as a regression problem, mapping an image directly to spatially separated bounding boxes. Tasks like recognition, detection, and localization find widespread applicability in real-world scenarios, making object detection a crucial subdivision of computer vision. The algorithm detects objects in real time using convolutional neural networks (CNNs). Overall, this research paper serves as a comprehensive guide to understanding real-time object detection with the You Only Look Once (YOLO) algorithm. By examining its architecture, variations, and implementation details, the reader can gain an understanding of YOLO's capability.
Article
Full-text available
The rapid growth in multimedia, storage systems, and digital computers has resulted in huge repositories of multimedia content and large image datasets in recent years. For instance, biometric databases, which can be used to identify individuals based on fingerprints, facial features, or iris patterns, have gained a lot of attention both from academia and industry. Specifically, face image quality assessment (FIQA) has become a very important part of face recognition systems, since the performance of such systems strongly depends on the quality of input data, such as blur, focus, compression, pose, or illumination. The main contribution of this paper is an analysis of Benford’s law-inspired first digit distribution and perceptual features for FIQA. To be more specific, I investigate the first digit distributions in different domains, such as wavelet or singular values, as quality-aware features for FIQA. My analysis revealed that first digit distributions with perceptual features are able to reach a high performance in the task of FIQA.
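The quality-aware feature described above is the empirical distribution of leading digits (Benford's law) computed over transform coefficients. A small numpy/PyWavelets sketch is given below; the choice of the db1 wavelet, a single decomposition level, and the use of only the detail sub-bands are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np
import pywt

def first_digit_distribution(values: np.ndarray) -> np.ndarray:
    """Empirical distribution of leading digits 1..9 of the nonzero magnitudes."""
    mags = np.abs(values[values != 0]).astype(np.float64)
    first_digits = (mags / 10 ** np.floor(np.log10(mags))).astype(int)   # 1..9
    counts = np.bincount(first_digits, minlength=10)[1:10]
    return counts / counts.sum()

# Quality-aware feature: first-digit statistics of wavelet detail coefficients.
img = np.random.rand(128, 128)
_, (cH, cV, cD) = pywt.dwt2(img, "db1")               # one-level 2-D wavelet transform
feature = first_digit_distribution(np.concatenate([cH.ravel(), cV.ravel(), cD.ravel()]))
benford = np.log10(1 + 1 / np.arange(1, 10))           # theoretical Benford probabilities
print(feature.round(3), benford.round(3))
```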
Article
Full-text available
This paper proposes applying a novel deep‐learning model, TBDLNet, to recognize CT images to classify multidrug‐resistant and drug‐sensitive tuberculosis automatically. The pre‐trained ResNet50 is selected to extract features. Three randomized neural networks are used to alleviate the overfitting problem. The ensemble of three RNNs is applied to boost the robustness via majority voting. The proposed model is evaluated by five‐fold cross‐validation. Five indexes are selected in this paper, which are accuracy, sensitivity, precision, F1‐score, and specificity. The TBDLNet achieves 0.9822 accuracy, 0.9815 specificity, 0.9823 precision, 0.9829 sensitivity, and 0.9826 F1‐score, respectively. The TBDLNet is suitable for classifying multidrug‐resistant tuberculosis and drug‐sensitive tuberculosis. It can detect multidrug‐resistant pulmonary tuberculosis as early as possible, which helps to adjust the treatment plan in time and improve the treatment effect.
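The ensemble stage described above (three randomized networks on top of frozen ResNet50 features, combined by majority voting) can be illustrated with the following sketch. Scikit-learn MLPs with different random seeds stand in for the paper's randomized neural networks, and the feature vectors and labels here are synthetic placeholders rather than real CT data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder for features extracted by a frozen, pre-trained ResNet50 (2048-D vectors),
# with binary labels: drug-sensitive (0) vs multidrug-resistant (1).
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 2048))
labels = rng.integers(0, 2, size=300)

# Three classifier heads differing only in their random initialization.
heads = [MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=s)
         .fit(features, labels) for s in (0, 1, 2)]

votes = np.stack([h.predict(features) for h in heads])    # (3, n_samples)
majority = (votes.sum(axis=0) >= 2).astype(int)            # majority voting over 3 heads
print("ensemble agreement with head 0:", (majority == votes[0]).mean())
```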