Conference Paper

ImageNet classification with deep convolutional neural networks

Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
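As a concrete illustration of the architecture the abstract describes, here is a minimal PyTorch sketch; it is not the original implementation: layer sizes follow the commonly cited AlexNet configuration, and details such as local response normalization and the original two-GPU split are omitted.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Approximate AlexNet: five convolutional layers (some followed by
    max-pooling), three fully connected layers with dropout, ReLU
    non-saturating activations, and a final 1000-way output."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(          # expects 3 x 227 x 227 input
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # softmax is applied by the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```

As is idiomatic in PyTorch, the 1000-way softmax is left to the loss function (e.g. nn.CrossEntropyLoss).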


... The deep belief network [17] addresses the "vanishing gradient" issue through pre-training and subsequent fine-tuning. Using its innovative eight-layer convolutional neural network (CNN) architecture, AlexNet [18] won the ImageNet Large-Scale Visual Recognition Challenge. The classification accuracy and robustness achieved by AlexNet were superior to those of conventional techniques. ...
... Thus, to address these issues, lightweight CNN architectures, such as MobileNet [31] and SqueezeNet [32], are proposed to reduce the model size and computational demands. AlexNet [18] (2012): a five-layer convolutional and three-layer fully connected network, the first CNN champion of the ImageNet competition. VGGNet [20] (2014): replaces 5×5 or 7×7 convolutional kernels with smaller 3×3 kernels. ...
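The parameter savings behind the 3×3-kernel substitution noted for VGGNet can be checked with a few lines of arithmetic; the sketch below is illustrative, with an assumed channel width of 64, and compares one 5×5 convolution against two stacked 3×3 convolutions covering the same receptive field.

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weight count of a k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k

C = 64  # assumed channel width, for illustration only
one_5x5 = conv_params(C, C, 5)      # one layer, 5x5 receptive field
two_3x3 = 2 * conv_params(C, C, 3)  # two layers, same 5x5 receptive field
print(one_5x5, two_3x3)             # 102400 vs. 73728: ~28% fewer weights
```

The stacked variant also inserts an extra nonlinearity between the two 3×3 layers, which is part of why deeper small-kernel stacks became the preferred design.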
Article
Full-text available
Efficiently detecting intrusions on a railway perimeter is crucial for ensuring the safety of railway transportation. With the development of computer vision, researchers have been actively exploring methods for detecting foreign object intrusion via image recognition technology. This article reviews the background and importance of detecting railway perimeter intrusion, summarizes the limitations of traditional detection methods, and emphasizes the potential of improving detection accuracy and efficiency in image recognition with deep learning models. Further, it introduces the development of deep learning in image recognition, focusing on the principles and progress of key technologies such as convolutional neural networks (CNNs) and vision transformers (ViTs). In addition, the application status of semantic segmentation and object detection algorithms based on deep learning in detecting railway perimeter intrusion is explored, including the classification, principles, and performance of the algorithms in practical applications. Finally, it highlights the primary challenges faced in railway perimeter intrusion detection and projects future research directions to resolve these challenges, including multisource data fusion, large-scale dataset construction, model compression, and end-to-end multitask learning networks. These studies support the accuracy and real-time detection of railway perimeter intrusion, and provide technical guarantees for railway transportation monitoring tasks.
... The field of Neural Computer Vision has seen great advances; for instance, Convolutional Neural Networks (CNN) were proposed as a way to reduce the computational complexity of processing images relative to densely connected neural networks [1]. AlexNet [2] revolutionized Computer Vision and Artificial Neural Networks (ANN) with an efficient GPU implementation of convolution operations. Subsequently, improved neural network architectures were proposed. ...
... AlexNet [2] revolutionized the fields of Computer Vision and ANN by implementing an efficient GPU-based convolution operation, dropout [20], and data augmentation. Following this line, ResNet [21] introduced residual connections to improve gradient flow into the earlier layers of deep neural networks. ...
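The residual connection mentioned in the excerpt is simple to sketch; the following is a generic PyTorch illustration of the idea from ResNet [21], not code from either cited paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the identity shortcut lets gradients flow directly
    to earlier layers, easing optimization of very deep networks."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + x)  # skip connection
```

Because the shortcut adds the input directly to the block's output, the loss gradient reaches early layers through the identity path without attenuation.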
Conference Paper
Computer vision in general has seen several advances, such as training optimizations and new architectures (pure attention, efficient blocks, vision-language models, generative models, among others). These have improved performance in several tasks, such as classification. However, the majority of these models focus on modifications that distance them from realistic neuroscientific approaches related to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts visual manifolds; its blocks are intended to separate the analysis of colors and forms in its initial layers, simulating the occipital lobe's operations. Our results show that our architecture provides state-of-the-art efficiency among low-parameter architectures on the CIFAR-10 dataset. Our first model reached 93.32% test accuracy, 0.8% more than the previous SOTA in this category, while having 150k fewer parameters (726k in total). Our second model uses 52k parameters, losing only 3.86% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang CNN.
... The experimental results of this study demonstrated promising advancements in the classification of ovarian cancer subtypes using modern deep learning (DL) models (Goodfellow et al., 2016;Krizhevsky et al., 2012;LeCun et al., 2015). The data was split into 70% for training, 15% for validation, and 15% for testing, allowing for a thorough and balanced assessment of the models. ...
... where W1, W2, and W3 are the weight parameters of the mixed loss. When the training sample classes are balanced, cross-entropy loss is widely regarded as an effective loss function for classification tasks due to its natural alignment with probability distributions [57], its efficacy in multi-class classification, its provision of stable and meaningful gradient information [58], and its outstanding performance on large-scale datasets [59]. With imbalanced samples, introducing focal loss and Dice loss can mitigate the class imbalance issue; however, it significantly increases the difficulty of network training [49,60-62]. ...
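A hedged sketch of a weighted mixed loss of the kind the excerpt describes: the weights W1, W2, W3 follow the excerpt, while the focal and Dice formulations below are the standard binary-segmentation versions and may differ from the cited paper's exact definitions.

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, w1=1.0, w2=0.5, w3=0.5, gamma=2.0, eps=1e-6):
    """L = W1 * cross-entropy + W2 * focal + W3 * Dice.
    targets: float tensor of 0/1 values with the same shape as logits."""
    ce = F.binary_cross_entropy_with_logits(logits, targets)

    p = torch.sigmoid(logits)
    pt = torch.where(targets > 0.5, p, 1 - p)  # probability of the true class
    focal = ((1 - pt) ** gamma * F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")).mean()

    inter = (p * targets).sum()
    dice = 1 - (2 * inter + eps) / (p.sum() + targets.sum() + eps)

    return w1 * ce + w2 * focal + w3 * dice
```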
Article
Full-text available
Deep convolutional networks often encounter information bottlenecks when extracting land object features, resulting in critical geometric information loss, which impedes semantic segmentation capabilities in complex geospatial backgrounds. We developed LULC-SegNet, a semantic segmentation network for land use and land cover (LULC), which integrates features from the denoising diffusion probabilistic model (DDPM). This network enhances the clarity of the edge segmentation, detail resolution, and the visualization and accuracy of the contours by delving into the spatial details of the remote sensing images. The LULC-SegNet incorporates DDPM decoder features into the LULC segmentation task, utilizing machine learning clustering algorithms and spatial attention to extract continuous DDPM semantic features. The network addresses the potential loss of spatial details during feature extraction in convolutional neural network (CNN), and the integration of the DDPM features with the CNN feature extraction network improves the accuracy of the segmentation boundaries of the geographical features. Ablation and comparison experiments conducted on the Circum-Tarim Basin Region LULC Dataset demonstrate that the LULC-SegNet improved the LULC semantic segmentation. The LULC-SegNet excels in multiple key performance indicators compared to existing advanced semantic segmentation methods. Specifically, the network achieved remarkable scores of 80.25% in the mean intersection over union (MIOU) and 93.92% in the F1 score, surpassing current technologies. The LULC-SegNet demonstrated an IOU score of 73.67%, particularly in segmenting the small-sample river class. Our method adapts to the complex geophysical characteristics of remote sensing datasets, enhancing the performance of automatic semantic segmentation tasks for land use and land cover changes and making critical advancements.
... Convolutional neural networks (CNNs) have been widely used in TIR-PT to extract features. Most CNNs for TIR-PT adopt typical networks such as AlexNet [3] and ResNet [4] as the backbone architecture, then integrate more layers to exploit deep features. For example, MMNet [5] implemented a multi-task matching network based on AlexNet to learn object-specific discriminative and fine-grained correlation feature maps of TIR pedestrians. ...
Article
Full-text available
Manually-designed network architectures for thermal infrared pedestrian tracking (TIR-PT) require substantial effort from human experts. AlexNet and ResNet are widely used as backbone networks in TIR-PT applications. However, these architectures were originally designed for image classification and object detection tasks, which are less complex than the challenges presented by TIR-PT. This paper makes an early attempt to search for an optimal network architecture for TIR-PT automatically, employing single-bottom and dual-bottom cells as basic search units and incorporating eight operation candidates within the search space. To expedite the search process, a random channel selection strategy is employed prior to assessing operation candidates. Classification, batch hard triplet, and center losses are jointly used to retrain the searched architecture. The outcome is a high-performance network architecture that is both parameter- and computation-efficient. Extensive experiments proved the effectiveness of the automated method.
... Convolutional neural networks (CNNs) with deep learning have been crucial in the field of computer vision lately [1]. They have demonstrated state-of-the-art performance in a variety of tasks, including object detection [2], semantic segmentation [3], and object recognition [4,5]. ...
Article
Full-text available
Deep convolutional neural networks (CNNs) have revolutionized computer vision, demonstrating remarkable performance in various tasks. However, their end-to-end learning strategy poses challenges to explainability. In this work, we explore the application of explainability techniques in brain tumor segmentation using magnetic resonance imaging (MRI) data. Our adaptive learning class activation map (AL-CAM) employs a unique multiple-pop-out training strategy and contrastive learning to enhance internal outputs, improving interpretability. Additionally, we introduce a novel approach to explainability in graph convolutional neural networks (GCNNs). Traditional CNN interpretability tools such as saliency maps, CAM, and EB are often unable to handle the complexity of graph-structured data. Our work bridges this gap by adapting and improving these techniques for GCNNs. We present two innovative tools: adaptive CAM for differentiated interpretability and contrastive EB for deeper insights into functions. Using a novel feature fusion approach, we further push the boundaries and combine the feature strengths of GNN and CNN for a holistic understanding of GCNN decision-making. Our proposed framework enables interpretability in various areas, not just medical imaging. Our work demonstrates the versatility of explainability methods and their power in unlocking the secrets of GCNNs and ultimately solving real-world challenges, particularly in the field of medical image analysis.
... Note that in the case of tensile stress, the activation energy plot is asymmetric about the vertical axis. There are many asymmetric activation functions that have found use in neural networks, such as ReLU [27], Swish [28], Mish [29], EANAF [30], etc., which look similar to the activation function in Fig. 4. Application of this type of activation function for learning and inference tasks in neural networks is, however, outside the scope of this paper on TSNs and hence will be discussed elsewhere. ...
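For reference, the asymmetric activations named in the excerpt have simple closed forms; a NumPy sketch of three of them follows (EANAF is omitted, as its definition is less widely standardized).

```python
import numpy as np

def relu(x):
    """max(0, x): the non-saturating activation popularized by AlexNet."""
    return np.maximum(0.0, x)

def swish(x):
    """x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def mish(x):
    """x * tanh(softplus(x))."""
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-4, 4, 9)
print(relu(x), swish(x), mish(x), sep="\n")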
Preprint
Full-text available
Stochastic neurons are extremely efficient hardware for solving a large class of problems and usually come in two varieties -- "binary", where the neuronal state varies randomly between two values, -1 and +1, and "analog", where the neuronal state can randomly assume any value between -1 and +1. Both have their uses in neuromorphic computing and both can be implemented with low- or zero-energy-barrier nanomagnets whose random magnetization orientations in the presence of thermal noise encode the binary or analog state variables. In between these two classes are n-ary stochastic neurons, mainly ternary stochastic neurons (TSN), whose state randomly assumes one of three values (-1, 0, +1) and which have proved to be efficient in pattern classification tasks such as recognizing handwritten digits from the MNIST data set or patterns from the CIFAR-10 data set. Here, we show how to implement a TSN with a zero-energy-barrier (shape isotropic) magnetostrictive nanomagnet subjected to uniaxial strain.
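Purely to illustrate the three-state behavior, here is a toy sketch of a ternary stochastic neuron that samples a state in {-1, 0, +1} from an input drive; it abstracts away the nanomagnet physics entirely and is in no way the authors' device model.

```python
import numpy as np

rng = np.random.default_rng(0)

def tsn_sample(u: float, beta: float = 2.0) -> int:
    """Sample a ternary state from input drive u via a softmax over the
    three states; larger |u| biases the draw toward -1 or +1."""
    logits = np.array([-beta * u, 0.0, beta * u])  # states -1, 0, +1
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice([-1, 0, 1], p=p))

print([tsn_sample(0.8) for _ in range(10)])  # mostly +1, occasionally 0 or -1
```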
... In the implementation, we utilize a convolutional neural network (CNN)-based encoder. CNNs are suitable for finding local patterns and preserving spatial hierarchies within data [41,42]. Unless otherwise specified, this CNN-based encoder will be used throughout the remaining sections, with its architecture details provided in Section 3.3. ...
Preprint
Full-text available
Structural health monitoring (SHM) has experienced significant advancements in recent decades, accumulating massive monitoring data. Data anomalies inevitably exist in monitoring data, posing significant challenges to their effective utilization. Recently, deep learning has emerged as an efficient and effective approach for anomaly detection in bridge SHM. Despite its progress, many deep learning models require large amounts of labeled data for training. The process of labeling data, however, is labor-intensive, time-consuming, and often impractical for large-scale SHM datasets. To address these challenges, this work explores the use of self-supervised learning (SSL), an emerging paradigm that combines unsupervised pre-training and supervised fine-tuning. The SSL-based framework aims to learn from only a very small quantity of labeled data by fine-tuning, while making the best use of the vast amount of unlabeled SHM data by pre-training. Mainstream SSL methods are compared and validated on the SHM data of two in-service bridges. Comparative analysis demonstrates that SSL techniques boost data anomaly detection performance, achieving increased F1 scores compared to conventional supervised training, especially given a very limited amount of labeled data. This work manifests the effectiveness and superiority of SSL techniques on large-scale SHM data, providing an efficient tool for preliminary anomaly detection with scarce label information.
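A generic PyTorch skeleton of the pre-train-then-fine-tune recipe the abstract describes; names such as ssl_head are placeholders, and the specific SSL objectives compared in the paper are not reproduced here.

```python
import torch
import torch.nn as nn

def pretrain(encoder, ssl_head, unlabeled_loader, epochs=10, lr=1e-3):
    """Unsupervised pre-training on abundant unlabeled monitoring data."""
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(ssl_head.parameters()), lr=lr)
    for _ in range(epochs):
        for x in unlabeled_loader:
            loss = ssl_head(encoder(x), x)  # e.g. reconstruction/contrastive loss
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(encoder, classifier, labeled_loader, epochs=5, lr=1e-4):
    """Supervised fine-tuning on a small labeled subset."""
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(classifier.parameters()), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in labeled_loader:
            loss = ce(classifier(encoder(x)), y)
            opt.zero_grad(); loss.backward(); opt.step()
```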
... By limiting the impact of data masks on the overall training direction of the codebook, the method can better adapt to more complex scenarios and applications. We will test our approach on large-scale datasets [54] and determine the boundaries of its capabilities on other tasks [55]. ...
Article
Full-text available
Frequent use of codebook resets to enhance the usage of the Vector Quantization Variational Autoencoder (VQ-VAE) may significantly alter the codebook distribution and consequently diminish the training efficiency. In this work, we introduce a novel codebook learning approach called Exponentially Weighted Moving Average Control VQ-VAE (ECVQ-VAE). This method considers the nearest neighbor distance of the codebook during training as a monitoring sample and constructs a control line. Our quantizer restricts the update process of codebook vectors based on whether the drift during monitoring exceeds the control line while simultaneously adjusting the overall usage distribution of the codebook by promoting competition. This process enables an optimization that sustains full codebook usage while reducing the training demands. We demonstrate that our approach achieves better results than the existing methods in lightweight scenarios and extensively validate the generalizability of our quantizer across various datasets, tasks, and architectures (VQ-VAE, VQ-GAN).
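For background on the control-line idea, here is a standard EWMA control chart computation; this is a generic statistical sketch under assumed parameters, not the ECVQ-VAE update rule itself.

```python
import numpy as np

def ewma_control(samples, lam=0.2, L=3.0):
    """Exponentially weighted moving average with an upper control limit.
    Returns the EWMA trace and a flag for points exceeding the limit."""
    samples = np.asarray(samples, dtype=float)
    mu, sigma = samples.mean(), samples.std()
    z, zs, flags = mu, [], []
    for i, x in enumerate(samples, start=1):
        z = lam * x + (1 - lam) * z
        # variance of the EWMA statistic after i observations
        var = sigma**2 * (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * i))
        ucl = mu + L * np.sqrt(var)
        zs.append(z); flags.append(z > ucl)
    return np.array(zs), np.array(flags)
```

In a quantizer, such a flag could gate whether a codebook vector's update is applied, which is the flavor of restriction the abstract describes.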
... In the domain of liver cancer detection, AI-based convolutional neural network (CNN) algorithms have contributed to a great extent to the prediction of cancer types [6] [2] [12] [13] [7], replacing traditional techniques such as SVM [14]. In the past decade, several CNN-based architectures have been proposed that either focus on deepening the network, such as AlexNet [15] and VGG [16], overcome the vanishing gradient issue by adding residual connections (the ResNet [17] and DenseNet [18] architectures), or apply a compound scaling methodology that jointly scales the width, depth, and resolution of the CNN, as in the EfficientNet architectures [19]. These models have been used as state-of-the-art for a long time and possess special features. ...
Preprint
Full-text available
Hepatocellular carcinoma (HCC) is a common type of liver cancer whose early-stage diagnosis is a common challenge, mainly due to the manual assessment of hematoxylin and eosin-stained whole slide images, which is a time-consuming process and may lead to variability in decision-making. For accurate detection of HCC, we propose a hybrid deep learning-based architecture that uses transfer learning to extract features from pre-trained convolutional neural network (CNN) models and a classifier made up of a sequence of fully connected layers. This study uses the publicly available The Cancer Genome Atlas Hepatocellular Carcinoma (TCGA-LIHC) database (n=491) for model development and the database of Kasturba Gandhi Medical College (KMC), India, for validation. The pre-processing step involves patch extraction, colour normalization, and augmentation, resulting in 3920 patches for the TCGA dataset. The developed hybrid deep neural network, consisting of a CNN-based pre-trained feature extractor and a customized artificial neural network-based classifier, is trained using five-fold cross-validation. For this study, eight different state-of-the-art models are trained and tested as feature extractors for the proposed hybrid model. The proposed hybrid model with a ResNet50-based feature extractor provided sensitivity, specificity, F1-score, accuracy, and AUC of 100.00%, 100.00%, 100.00%, 100.00%, and 1.00, respectively, on the TCGA database. On the KMC database, EfficientNetb3 was the optimal choice of feature extractor, giving sensitivity, specificity, F1-score, accuracy, and AUC of 96.97%, 98.85%, 96.71%, 96.71%, and 0.99, respectively. The proposed hybrid models showed improvements in accuracy of 2% and 4% over the pre-trained models on the TCGA-LIHC and KMC databases.
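A hedged sketch of the transfer-learning pattern described: a pre-trained ResNet50 frozen as a feature extractor with a small fully connected classifier head on top. The head's layer sizes here are assumptions for illustration, not the paper's configuration.

```python
import torch.nn as nn
from torchvision import models

# Pre-trained backbone as a frozen feature extractor
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()            # expose the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False            # freeze pre-trained weights

classifier = nn.Sequential(            # illustrative fully connected head
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 2),                 # e.g. HCC vs. benign
)
model = nn.Sequential(backbone, classifier)
```

Only the head's parameters are updated during training, which is what makes this practical on small histopathology datasets.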
... Since 2012, classic backbone networks such as AlexNet [19], VGG [20], and ResNet have been applied to semantic segmentation tasks. Cloud and shadow detection essentially involves separating clouds and shadows from the background, which aligns with the definition of pixel-level classification in semantic segmentation [21]. ...
Article
Full-text available
Effective detection of the contours of cloud masks and estimation of their distribution can be of practical help in studying weather changes and natural disasters. Existing deep learning methods are unable to extract the edges of clouds and backgrounds in a refined manner when detecting cloud masks (shadows) due to their unpredictable patterns, and they are also unable to accurately identify small targets such as thin and broken clouds. For these problems, we propose MDU-Net, a multiscale dual up-sampling segmentation network based on an encoder–decoder–decoder. The model uses an improved residual module to capture the multi-scale features of clouds more effectively. MDU-Net first extracts the feature maps using four residual modules at different scales, and then sends them to the context information full flow module for the first up-sampling. This operation refines the edges of clouds and shadows, enhancing the detection performance. Subsequently, the second up-sampling module concatenates feature map channels to fuse contextual spatial information, which effectively reduces the false detection rate of unpredictable targets hidden in cloud shadows. On a self-made cloud and cloud shadow dataset based on the Landsat8 satellite, MDU-Net achieves scores of 95.61% in PA and 84.97% in MIOU, outperforming other models in both metrics and result images. Additionally, we conduct experiments to test the model’s generalization capability on the landcover.ai dataset to show that it also achieves excellent performance in the visualization results.
... Baselines: We compare CFLGT with other CFL algorithms, including FL+HC, based on model parameters; FedSem, based on model L2-distance; and IFCA, based on models' local performance. In FL+HC, the server receives the clients' model updates and uses hierarchical clustering [42] to assign clients to groups. In FedSem, after obtaining updated models from clients, the server assigns every client to the corresponding group based on the L2 distance between models. ...
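A minimal sketch of the parameter-distance grouping the excerpt attributes to FL+HC/FedSem: clients are clustered hierarchically by the L2 distance between their flattened model parameters. This is illustrative only; the cited methods differ in detail.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def flatten(params):
    """Concatenate a client's parameter arrays into one vector."""
    return np.concatenate([p.ravel() for p in params])

def cluster_clients(client_params, n_groups=3):
    """Hierarchical (agglomerative) clustering of clients by Euclidean
    distance between their model parameter vectors."""
    X = np.stack([flatten(p) for p in client_params])
    Z = linkage(X, method="ward")
    return fcluster(Z, t=n_groups, criterion="maxclust")  # group labels
```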
Article
Federated Learning (FL) has recently attracted a lot of attention due to its ability to train a machine learning model using data from multiple clients without divulging their privacy. However, the training data across clients can be very heterogeneous in terms of quality, amount, occurrences of specific features, etc. In this paper, we demonstrate how the server can observe data heterogeneity by mining gradient trajectories that the clients compute from a two-dimensional mapping of high-dimensional gradients computed by each client from its bottom layer. Based on these ideas, we propose a new clustered federated learning with gradient trajectory method, called CFLGT, which dynamically clusters clients together based on the gradient trajectories. We analyze CFLGT both theoretically and experimentally to show that it overcomes several drawbacks of mainstream Clustered Federated Learning (CFL) methods and outperforms other baselines.
... WE-SVM [50]: 50.3, 86.3, 63.6, 75.5; MDeNetplus [47]: 88.4, 91, 90, 90.5; COTR [48]: 91.7, 93.5, 92.6, 93.1; PVT [37]: 94. ...
Article
Early dental caries detection by endoscope can prevent complications, such as pulpitis and apical infection. However, automatically identifying dental caries remains challenging due to the uncertainty in size, contrast, low saliency, and high interclass similarity of dental caries. To address these problems, we propose the Global Feature Detector (GFDet), which integrates the proposed Feature Selection Pyramid Network (FSPN) and Adaptive Assignment-Balanced Mechanism (AABM). Specifically, FSPN performs upsampling with the semantic information of adjacent feature layers to mitigate the semantic information loss due to sharp channel reduction and enhances discriminative features by aggregating fine-grained details and high-level semantics. In addition, a new label assignment mechanism is proposed that enables the model to select more high-quality samples as positive samples, which can address the problem of easily ignored small objects. Meanwhile, we have built an endoscopic dataset for caries detection, consisting of 1318 images labeled by five dentists. In experiments on the collected dataset, the F1-score of our model is 75.6%, outperforming the state-of-the-art models by 7.1%.
... (1) Image classification involves labeling an image with specific categories. The use of CNNs for image classification gained prominence with the introduction of AlexNet in the ILSVRC 2012 competition [61], which employed five convolutional layers and demonstrated the potential of CNNs for multi-class classification. Subsequent models, such as VGGNet and GoogLeNet, incorporated more convolutional layers to deepen the networks, improving learning performance and prediction accuracy [62,63]. ...
Article
Full-text available
This positioning paper explores integrating smart in-process inspection and human–automation symbiosis within human–cyber–physical manufacturing systems. As manufacturing environments evolve with increased automation and digitalization, the synergy between human operators and intelligent systems becomes vital for optimizing production performance. Human–automation symbiosis, a vision widely endorsed as the future of human–automation research, emphasizes closer partnership and mutually beneficial collaboration between human and automation agents. In addition, to maintain high product quality and enable the in-time feedback of process issues for advanced manufacturing, in-process inspection is an efficient strategy that manufacturers adopt. In this regard, this paper outlines a research framework combining smart in-process inspection and human–automation symbiosis, enabling real-time defect identification and process optimization with cognitive intelligence. Smart in-process inspection studies the effective automation of real-time inspection and defect mitigation using data-driven technologies and intelligent agents to foster adaptability in complex production environments. Concurrently, human–automation symbiosis focuses on achieving a symbiotic human–automation relationship through cognitive task allocation and behavioral nudges to enhance human–automation collaboration. It promotes a human-centered manufacturing paradigm by integrating the studies in advanced manufacturing systems, cognitive engineering, and human–automation interaction. This paper examines critical technical challenges, including defect inspection and mitigation, human cognition modeling for adaptive task allocation, and manufacturing nudging design and personalization. A research roadmap detailing the technical solutions to these challenges is proposed.
... Neural network architectures have evolved significantly since the 1980s [17], with early cascade structures resembling modern dense networks. The global adoption of neural networks began with AlexNet [18], followed by VGGNet [19], ResNet [20], and DenseNet [21]. AlexNet introduced the dropout mechanism and was the first to harness GPUs for model training, significantly accelerating the process. ...
Article
Full-text available
In modern knitted garment production, accurate identification of fabric texture is crucial for enabling automation and ensuring consistent quality control. Traditional manual recognition methods not only demand considerable human effort but also suffer from inefficiencies and are prone to subjective errors. Although machine learning-based approaches have made notable advancements, they typically rely on manual feature extraction. This dependency is time-consuming and often limits recognition accuracy. To address these limitations, this paper introduces a novel model, called the Differentiated Learning Weighted DenseNet (DLW-DenseNet), which builds upon the DenseNet architecture. Specifically, DLW-DenseNet introduces a learnable weight mechanism that utilizes channel attention to enhance the selection of relevant channels. The proposed mechanism reduces information redundancy and expands the feature search space of the model. To maintain the effectiveness of channel selection in the later stages of training, DLW-DenseNet incorporates a differentiated learning strategy. By assigning distinct learning rates to the learnable weights, the model ensures continuous and efficient channel selection throughout the training process, thus facilitating effective model pruning. Furthermore, in response to the absence of publicly available datasets for fabric texture recognition, we construct a new dataset named KF9 (knitted fabric). Compared to a fabric recognition network based on the improved ResNet, recognition accuracy increases by five percentage points. Experimental results demonstrate that DLW-DenseNet significantly outperforms other representative methods in terms of recognition accuracy on the KF9 dataset.
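A rough sketch of a learnable channel-weighting module in the spirit the abstract describes: a squeeze-and-excitation-style gate. The actual DLW-DenseNet mechanism and its differentiated learning rates are the authors' own design.

```python
import torch
import torch.nn as nn

class LearnableChannelWeights(nn.Module):
    """Per-channel learnable weights gated by global context, letting
    training suppress redundant channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global channel context
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # re-weight channels
```

Distinct learning rates for the gate, as in the differentiated learning strategy, could then be assigned through the optimizer's per-parameter-group options.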
Article
Full-text available
As cyber threats evolve and grow more sophisticated, traditional network security approaches are struggling to keep pace with the increasing complexity of modern attacks. This paper investigates how Artificial Intelligence (AI) and Machine Learning (ML) are revolutionizing network security by automating critical processes such as threat detection and response. Traditional security models rely heavily on manual monitoring and predefined rules, which can result in delayed responses and missed threats. AI and ML technologies offer an alternative by enabling real-time analysis of network traffic, the identification of anomalies, and proactive threat mitigation. These systems are capable of learning from historical data, improving their detection capabilities over time, and adapting to new and unknown threats, including zero-day vulnerabilities. This research is based on a comprehensive review of existing literature and case studies from industries where AI and ML have been successfully integrated into security frameworks. The findings illustrate the effectiveness of AI and ML in improving security performance, reducing human error, and enhancing operational efficiency. Organizations that have adopted these technologies report faster response times, more accurate threat detection, and fewer false positives. However, the adoption of AI and ML also presents challenges, including the need for substantial initial investments, technical integration with legacy systems, and the requirement for skilled personnel to manage and optimize these technologies. Despite these challenges, AI and ML are becoming indispensable tools for organizations seeking to bolster their cybersecurity capabilities. As cyberattacks grow increasingly complex, the ability to automate critical security tasks and respond to threats in real time positions AI and ML as essential components of modern network defense strategies. This research underscores their potential to transform network security and highlights their role in protecting against the ever-increasing threat of cyberattacks.
Chapter
This chapter presents the principles of point cloud learning, including the foundations of deep learning and classical neural networks applied to point clouds. The first part covers the basic concepts of deep learning and provides a taxonomy of neural networks, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and graph neural networks (GNNs), among others. The second part focuses on the design of common point cloud learning networks, such as the PointNet series, point cloud transformers, and an efficient algorithm called Point Voxel CNN.
Article
Full-text available
In this tutorial, we present a compact and holistic discussion of Deep Learning with a focus on Convolutional Neural Networks (CNNs) and supervised regression. While there are numerous books and articles on the individual topics we cover, comprehensive and detailed tutorials that address deep learning from a foundational yet rigorous and accessible perspective are rare. Most resources on CNNs are either too advanced, focusing on cutting-edge architectures, or too narrow, addressing only specific applications like image classification. This tutorial not only summarizes the most relevant concepts but also provides an in-depth exploration of each, offering a complete yet agile set of ideas. Moreover, we highlight the powerful synergy between learning theory, statistics, and machine learning, which together underpin the deep learning and CNN frameworks. We aim for this tutorial to serve as an optimal resource for students, professors, and anyone interested in understanding the foundations of deep learning.
Article
Full-text available
Background/Objectives: This paper presents a Residual Neural Network (ResNet) based framework tailored for structured traffic accident data, aiming to improve accident severity prediction. The proposed model leverages residual learning to effectively model intricate relationships between numerical and categorical variables, resulting in a notable increase in prediction accuracy. Methods: A comparative analysis was performed with other Deep Learning (DL) architectures, including Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Darknet, and Extreme Inception (Xception), showing superior performance of the proposed Resnet. Key factors influencing accident severity were identified, with Shapley Additive Explanations (SHAP) values helping to address the need for transparent and explainable Artificial Intelligence (AI) in critical decision-making areas. Results: The generalizability of the ResNet model was assessed by training it, initially, on a UK road accidents dataset and validating it on a distinct dataset from India. The model consistently demonstrated high predictive accuracy, underscoring its robustness across diverse contexts, despite regional differences. Conclusions: These results suggest that the adapted ResNet model could significantly enhance traffic safety evaluations and contribute to the formulation of more effective traffic management strategies.
Article
Full-text available
Today, digital marketing has transformed, as programmatic advertising has become the new normal, allowing advertisers to deliver their campaigns to the right audiences at scale. Yet, in an ever-changing digital landscape, brand safety remains a challenge. Most traditional brand safety mechanisms handle context poorly, leading to either missed opportunities or inappropriate placements that damage the brand's reputation. In this paper, we explore using machine learning to redefine brand safety in programmatic advertising through content analysis. In this work, we analyze the use of Natural Language Processing (NLP), computer vision, and sentiment analysis to gauge content quality and context across various platforms. Through case studies and real-world use, this paper demonstrates how machine learning might create a more nuanced, adaptable, and effective brand safety framework. The findings also highlight the critical need for AI-powered content analysis to protect brands and build consumer trust in digital advertising.
Article
Full-text available
Innovation is currently driving enhanced performance and productivity across various fields through process automation. However, identifying intricate details in images can often pose challenges due to morphological variations or specific conditions. Here, artificial intelligence (AI) plays a crucial role by simplifying the segmentation of images. This is achieved by training algorithms to detect specific pixels, thereby recognizing details within images. In this study, an algorithm was developed incorporating modules based on an Efficient Sub-Pixel Convolutional Neural Network for image super-resolution, a U-Net-based neural baseline for image segmentation, and image binarization for masking. The combination of these modules aimed to identify capillary structures at the pixel level. The method was applied to different datasets containing images of eye fundus, citrus leaves, and printed circuit boards to test how well it could segment the capillary structures. Notably, the trained model exhibited versatility in recognizing capillary structures across various image types. When tested with the Set 5 and Set 14 datasets, a PSNR of 37.92 and an SSIM of 0.9219 were achieved, significantly surpassing other image super-resolution methods. The enhancement module processes the image using three different variables in the same way, which imposes a complexity of O(n) and takes 308,734 ms to execute; the segmentation module evaluates each pixel against its neighbors to correctly segment regions of interest, generating quadratic O(n²) complexity and taking 687,509 ms to execute; the masking module makes several passes through the whole image and on several occasions calls processes of O(n log n) complexity, at 581,686 microseconds to execute, which makes it not only the most complex but also the most exhaustive part of the program. This versatility, rooted in its pixel-level operation, enables the algorithm to identify initially unnoticed details, enhancing its applicability across diverse image datasets. This innovation holds significant potential for precisely studying certain structures' characteristics while enhancing and processing images with high fidelity through AI-driven machine learning algorithms.
Article
Full-text available
Cyberspace is emerging as a critical living environment, significantly influencing sustainable human development. Internet public opinion is a crucial aspect of cyberspace governance, serving as the most important form of expressing popular will. However, perceiving public opinion can be challenging due to its complex and elusive nature. In this paper, we propose a novel framework for perceiving popular will, managing public opinion, and influencing people’s behavior, based on machine learning and game theory approaches. Our framework leverages deep learning techniques to analyze public opinion, active learning methods to reduce costs, and game theory to make optimal management decisions. We verify the effectiveness of our framework using empirical data collected from Chinese provinces Y and G, and provide theoretical support by analyzing the interrelationship between public opinion, online public opinion, and people’s behavior. Our framework can be applied inexpensively to studies in other regions, thereby offering valuable insights into cyberspace governance and public opinion management.
Article
Full-text available
Cancer is a significant public health issue due to its high prevalence and lethality, particularly lung and colon cancers, which account for over a quarter of all cancer cases. This study aims to enhance the detection rate of lung and colon cancer by designing an automated diagnosis system. The system focuses on early detection through image pre-processing with a 2D Gaussian filter, while maintaining simplicity to minimize computational requirements and runtime. The study employs three Convolutional Neural Network (CNN) models (MobileNet, VGG16, and ResNet50) to diagnose five types of cancer: Colon Adenocarcinoma, Benign Colonic Tissue, Lung Adenocarcinoma, Benign Lung Tissue, and Lung Squamous Cell Carcinoma. A large dataset comprising 25,000 histopathological images is utilized. Additionally, the research addresses the need for safety levels in the model by using Class Activation Mapping (CAM) for explanatory purposes. Experimental results indicate that the proposed system achieves a high diagnostic accuracy of 99.38% for lung and colon cancers. This high performance underscores the effectiveness of the automated system in detecting these types of cancer. The findings from this study support the potential for early diagnosis of lung and colon cancers, which can facilitate timely therapeutic interventions and improve patient outcomes.
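The 2D Gaussian pre-processing step mentioned above is a one-liner with SciPy; a minimal sketch follows, where the sigma value and image size are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(768, 768)              # stand-in histopathology image
smoothed = gaussian_filter(image, sigma=1.0)  # 2D Gaussian denoising
```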
Article
Full-text available
Detection of cancer in human organs at an early stage is a crucial task and is important for the survival of the patients, especially in terms of complex structure, dynamic size, and dynamic length in organs like the pancreas. To deal with this problem, pancreatic semantic segmentation was introduced, but it was hampered by challenges related to image modalities and the availability of limited datasets. This paper provides different deep learning models for pancreatic detection. The proposed model pipeline has two phases: pancreas localization and segmentation. In the first phase, rough regions of the pancreas are detected with YOLOv5, and the detected regions are cropped to avoid an imbalance between the pancreas region and the background. In the second phase, the detected regions are segmented with various models like UNet, VNet, SegResNet and HighResNet for effective detection of cancer regions. The experiments were conducted on a private dataset collected from the Champalimaud Foundation in Portugal. The model’s performance is evaluated in terms of quantitative and qualitative analysis. From experiments, we found that, when compared to other Nets, YOLOv5 is superior in pancreatic area localization and 2.5D HighResNet is superior in segmentation.
Article
Full-text available
Due to constant loads, gear wear, and harsh working conditions, gearboxes are subject to fault occurrences. Faults in the gearbox can cause damage to the engine components, create unnecessary noise, degrade efficiency, and impact power transfer. Hence, the detection of faults at an early stage is highly necessary. In this work, an effort was made to use transfer learning to identify gear failures under five gear conditions—healthy condition, 25% defect, 50% defect, 75% defect, and 100% defect—and three load conditions—no load, T1 = 9.6, and T2 = 13.3 Nm. Vibration signals were collected for various gear and load conditions using an accelerometer mounted on the casing of the gearbox. The load was applied using an eddy current dynamometer on the output shaft of the engine. The obtained vibration signals were processed and stored as vibration radar plots. Residual network (ResNet)-50, GoogLenet, Visual Geometry Group 16 (VGG-16), and AlexNet were the network models used for transfer learning in this study. Hyperparameters, including learning rate, optimizer, train-test split ratio, batch size, and epochs, were varied in order to achieve the highest classification accuracy for each pretrained network. From the results obtained, VGG-16 pretrained network outperformed all other networks with a classification accuracy of 100%.
Article
Full-text available
Long-wave infrared (LWIR) spectral imaging plays a critical role in various applications such as gas monitoring, mineral exploration, and fire detection. Recent advancements in computational spectral imaging, powered by advanced algorithms, have enabled the acquisition of high-quality spectral images in real time, such as with the Uncooled Snapshot Infrared Spectrometer (USIRS). However, the USIRS system faces challenges, particularly a low spectral resolution and large amount of data noise, which can degrade the image quality. Deep learning has emerged as a promising solution to these challenges, as it is particularly effective at handling noisy data and has demonstrated significant success in hyperspectral imaging tasks. Nevertheless, the application of deep learning in LWIR imaging is hindered by the severe scarcity of long-wave hyperspectral image data, which limits the training of robust models. Moreover, existing networks that rely on convolutional layers or attention mechanisms struggle to effectively capture both local and global spectral correlations. To address these limitations, we propose the pixel-based Hierarchical Spectral Transformer (HST), a novel deep learning architecture that learns from publicly available single-pixel long-wave infrared spectral databases. The HST is designed to achieve a high spectral resolution for LWIR spectral image reconstruction, enhancing both the local and global contextual understanding of the spectral data. We evaluated the performance of the proposed method on both simulated and real-world LWIR data, demonstrating the robustness and effectiveness of the HST in improving the spectral resolution and mitigating noise, even with limited data.
Article
Full-text available
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.
Conference Paper
Pneumonia is a serious respiratory infection that presents significant diagnostic challenges due to the variability of its symptoms and its overlap with other respiratory diseases. This study investigates the potential of diagnostic uncertainty labels to enhance a CAD system's pneumonia classification. Specifically, it explores the feasibility of a ternary classification approach (classifying X-rays as positive, negative, or uncertain), introducing uncertainty as a distinct diagnostic category and aiming to provide a more nuanced and cautious classification of pneumonia. Data processing techniques, including undersampling to balance classes, image resizing, and data augmentation, were applied. Transfer learning with the CheXNet model was then employed in a Monte Carlo cross-validation framework across 16 random data splits. The ROC curves and the areas under the ROC curves for the uncertainty class were analyzed, challenging the notion that uncertainty cannot be effectively characterized. The results indicated a degree of class separation, suggesting that the uncertainty class carried enough information to be characterized and supporting the viability of the envisioned ternary model. Additionally, due to the exclusive use of frontal-view X-rays and the application of undersampling, results are expected to improve further in future research.