Figure - uploaded by Max Ferguson
Fig. A1. The standard VGG-16 network architecture as proposed in [32]. Note that only layers “conv1” to “fc7” are used in the feature extractor.
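As a rough sketch of the layout named in the caption, the snippet below tabulates the learnable layers of VGG-16 and counts their parameters, then repeats the count for the "conv1" to "fc7" subset used as the feature extractor. The 224x224 RGB input, 3x3 kernels, 2x2 pooling, and 1000-way "fc8" layer are assumptions taken from the standard ImageNet configuration of VGG-16, not from the caption; the function and variable names are illustrative only.

```python
# Sketch of the standard VGG-16 layout (ImageNet configuration).
# Assumed, not stated in the caption: 224x224 RGB input, 3x3 convolutions
# with padding 1, 2x2 max pooling, and a 1000-way "fc8" classifier.

# Output channels of each conv layer, grouped by block conv1 .. conv5.
CONV_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
               [512, 512, 512], [512, 512, 512]]
FC_LAYERS = [("fc6", 4096), ("fc7", 4096), ("fc8", 1000)]


def vgg16_parameter_counts(input_size=224, in_channels=3):
    """Return a list of (layer_name, parameter_count) pairs."""
    counts = []
    channels, size = in_channels, input_size
    for block_idx, block in enumerate(CONV_BLOCKS, start=1):
        for layer_idx, out_channels in enumerate(block, start=1):
            weights = 3 * 3 * channels * out_channels
            # Each conv layer also has one bias per output channel.
            counts.append((f"conv{block_idx}_{layer_idx}", weights + out_channels))
            channels = out_channels
        size //= 2  # each block ends with a 2x2 max pool (no parameters)
    features = channels * size * size  # flattened input to fc6
    for name, units in FC_LAYERS:
        counts.append((name, features * units + units))
        features = units
    return counts


counts = vgg16_parameter_counts()
total = sum(n for _, n in counts)
# Feature extractor as described in the caption: every layer except fc8.
extractor = sum(n for name, n in counts if name != "fc8")
print(f"all layers:   {total:,}")
print(f"conv1 .. fc7: {extractor:,}")
```

Under these assumptions, most of the parameters sit in "fc6", which is why dropping only "fc8" changes the total very little.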
Source publication
Automatic localization of defects in metal castings is a challenging task, owing to the rare occurrence and variation in appearance of defects. Convolutional neural networks (CNN) have recently shown outstanding performance in both image classification and localization tasks. We examine how several different CNN architectures can be used to localiz...
Citations
... VGG-16 architecture as introduced in [28]. ...
Ensuring data privacy in medical image classification is a critical challenge in healthcare, especially with the increasing reliance on AI-driven diagnostics. In fact, over 30% of healthcare organizations globally have experienced a data breach in the last year, highlighting the need for secure solutions. This study investigates the integration of transfer learning and federated learning for privacy-preserving medical image classification using GoogLeNet and VGG16 as baseline models to evaluate the generalizability of the proposed framework. Pre-trained on ImageNet and fine-tuned on three specialized medical datasets for TB chest X-rays, brain tumor MRI scans, and diabetic retinopathy images, these models achieved high classification accuracy across various aggregation methods. Additionally, the proposed dynamic aggregation method was further analyzed using modern architectures, EfficientNetV2 and ResNet-RS, to assess the scalability and robustness of the model. A key contribution is the introduction of a novel adaptive aggregation method, which dynamically alternates between Federated Averaging (FedAvg) and Federated Stochastic Gradient Descent (FedSGD), based on data divergence during communication rounds. This approach optimizes model convergence while preserving privacy in collaborative settings. The results demonstrate that transfer learning, when combined with federated learning, offers a scalable, robust, and secure solution for real-world medical diagnostics, enabling healthcare institutions to train highly accurate models without compromising sensitive patient data.
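The abstract above does not say how the adaptive method measures divergence or which way it switches, so the following is only a minimal sketch of the general idea. It shows plain FedAvg (a sample-weighted average of client weights), plain FedSGD (one server step along the sample-weighted average of client gradients), and a hypothetical selector that picks between them. The `divergence` input, the `threshold` value, the direction of the switch, and all function names are assumptions made for illustration.

```python
# Minimal sketch of FedAvg, FedSGD, and a hypothetical switch between them.
# Models are represented as flat lists of floats to keep the example small.

def weighted_mean(vectors, sizes):
    """Average vectors, weighting each by its client's sample count."""
    total = sum(sizes)
    dim = len(vectors[0])
    return [sum(v[i] * n for v, n in zip(vectors, sizes)) / total
            for i in range(dim)]


def fedavg(client_weights, sizes):
    """FedAvg: the new global model is the weighted mean of client models."""
    return weighted_mean(client_weights, sizes)


def fedsgd(global_weights, client_grads, sizes, lr=0.1):
    """FedSGD: one gradient step using the weighted mean of client gradients."""
    grad = weighted_mean(client_grads, sizes)
    return [w - lr * g for w, g in zip(global_weights, grad)]


def aggregate(global_weights, client_weights, client_grads, sizes,
              divergence, threshold=0.5, lr=0.1):
    """Hypothetical rule: use FedSGD when divergence is high, else FedAvg.

    Both the threshold and the direction of the switch are assumptions;
    the abstract only states that the method alternates based on divergence.
    """
    if divergence > threshold:
        return "FedSGD", fedsgd(global_weights, client_grads, sizes, lr)
    return "FedAvg", fedavg(client_weights, sizes)


global_w = [1.0, 1.0]
client_w = [[1.0, 2.0], [3.0, 4.0]]
client_g = [[1.0, 0.0], [1.0, 2.0]]
sizes = [1, 3]

low = aggregate(global_w, client_w, client_g, sizes, divergence=0.1)
high = aggregate(global_w, client_w, client_g, sizes, divergence=0.9)
print(low)
print(high)
```

In a real system the divergence would be computed each communication round from client statistics or updates; here it is passed in directly so the two aggregation paths can be compared side by side.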
... Fig. 4 shows the architecture of the VGG-16 model [10]. The extension of the traditional CNN model (AlexNet) shows that, in addition to designing relatively complex model structures (such as GoogLeNet), depth is crucial for improving predictive accuracy. Compared with the earlier AlexNet, VGG is deeper and has more parameters. ...
Many convolutional neural networks have emerged over the years, varying in accuracy, speed, and architecture. From LeNet-5 to GoogleNet, CNN models developed rapidly and their architectures became more complex. These models are known for their accuracy on ImageNet. This study therefore explores classical CNN models, which share a similar basic structure but modify it in different ways. The paper selects four classical CNN models for implementation, trains them on the FashionMNIST dataset, and records the differences in image classification performance in order to compare the results among models. The four models are LeNet-5, AlexNet, VGG-16, and GoogleNet. The training was carried out on the same equipment and under the same conditions. The study finds that GoogleNet achieves the best prediction accuracy. LeNet-5 spends the least amount of time on training and forecasting. The GoogleNet and AlexNet models can be considered for practical applications, while the VGG-16 and LeNet-5 are either inefficient due to the long training time of the models, or the accuracy cannot.
... Studies have demonstrated a correlation between the network generalization ability and the dataset size [101,102]. At the start of the training process, the model parameters (weights) are initialized based on statistical distributions [103,104]. ...
... Fine-tuning the transferred model on the actual defect dataset can improve the accuracy of defect detection. Max Ferguson et al. [101] applied this technique by transferring the parameters learned from the ImageNet and COCO datasets to defect detection in X-ray images of castings, alleviating the problem of overfitting caused by the small dataset of casting images. Fernando et al. [123] transferred the knowledge learned from the ImageNet dataset to detect defects in welded joints of oil pipes. ...
The evaluation of radiographic indications in welds plays a critical role in the quality assurance of the manufacturing process for metal products. The traditional visual approach for the evaluation of defects is inefficient and inconsistent. Various techniques for automated defect recognition of indications in weld radiographs have been proposed in the last three decades. In recent years, notable progress has been made with the development of deep learning-based techniques. However, to date, the literature still lacks a comprehensive review of automated defect recognition in radiographic images. Therefore, this paper reviews automated defect recognition in X-ray weld inspection, including traditional and deep-learning-based techniques. The review of traditional techniques is outlined from the perspective of image pre-processing, feature extraction, and defect analysis and evaluation. Deep-learning-based methods are reviewed from the perspective of datasets and network structures, discussing the techniques employed to solve the small-dataset problem and the segmentation and classification of defects in welds. Finally, potential advancements in automated weld inspection techniques are outlined.
... Recently, deep learning techniques have been widely used across various fields [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. In the field of measurement and inspection, pattern recognition and classification of images acquired through cameras are representative applications [11][12][13][14][15][16]. For training algorithms in deep learning, it is essential to secure a large number of images with exactly known true labels. ...
A deep learning algorithm for thin film thickness analysis based on spectral reflectometry, using a dataset that reflects experimental conditions, has been proposed and implemented. This study extends our previous research, in which we designed an artificial neural network (ANN) algorithm using theoretical reflectance spectrum datasets and quantitatively evaluated it according to the international standard traceability system. The evaluation results indicated that one of the major sources of uncertainty was the offset between the outputs of the ANN algorithm and the certified values of certified reference materials (CRMs). In this study, we focused on how much the uncertainty factor related to the offset is affected by using a dataset that reflects experimental conditions instead of theoretical reflectance spectrum datasets. By applying the fluctuations in reflectance obtained from experiments to the theoretical reflectance spectrum, we created a dataset to train the ANN algorithm under the same conditions as in our previous studies for comparison. As a result, the major uncertainty factor related to the offset improved by about 30%. This study demonstrates the importance of having datasets that accurately reflect real-world conditions for training ANN algorithms.
... The first block has 64 filters, and this number is doubled in later blocks until it reaches 512. The model ends with two fully connected hidden layers and one output layer. The two fully connected layers have the same number of neurons, 4096 (Ferguson et al., 2017; Simonyan & Zisserman, 2014). In this work, the number of neurons in the output layer equals six, corresponding to the number of cell subtypes. A DenseNet represents an alternative type of convolutional neural network that incorporates dense blocks, where every layer establishes connections with all other layers possessing matching feature-map sizes. ...
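The doubling rule in this snippet can be written down in a few lines. The sketch below derives the per-block filter counts and the sizes of the classifier head; the choice of five blocks follows the standard VGG-16 layout and is an assumption here, as are the function names.

```python
# Sketch: filter counts per block start at 64 and double until capped at 512.
# num_blocks=5 is assumed from the standard VGG-16 layout.

def block_filters(first=64, cap=512, num_blocks=5):
    """Return the number of filters used in each convolutional block."""
    filters, current = [], first
    for _ in range(num_blocks):
        filters.append(current)
        current = min(current * 2, cap)  # double, but never exceed the cap
    return filters


def head_sizes(num_classes, hidden=4096):
    """Two equally sized fully connected layers followed by the output layer."""
    return [hidden, hidden, num_classes]


filters = block_filters()
head = head_sizes(num_classes=6)  # six cell subtypes, as in the snippet
print(filters)
print(head)
```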
Acute lymphoblastic leukemia (ALL) is a life-threatening disease that commonly affects children and is classified into three subtypes: L1, L2, and L3. Traditionally, ALL is diagnosed through morphological analysis, involving the examination of blood and bone marrow smears by pathologists. However, this manual process is time-consuming, laborious, and prone to errors. Moreover, the significant morphological similarity between ALL and various lymphocyte subtypes, such as normal, atypic, and reactive lymphocytes, further complicates the feature extraction and detection process. The aim of this study is to develop an accurate and efficient automatic system to distinguish ALL cells from these similar lymphocyte subtypes without the need for direct feature extraction. First, the contrast of microscopic images is enhanced using histogram equalization, which improves the visibility of important features. Next, a fuzzy C-means clustering algorithm is employed to segment cell nuclei, as they play a crucial role in ALL diagnosis. Finally, a novel convolutional neural network (CNN) with three convolutional layers is utilized to classify the segmented nuclei into six distinct classes. The CNN is trained on a labeled dataset, allowing it to learn the distinguishing features of each class. To evaluate the performance of the proposed model, quantitative metrics are employed, and a comparison is made with three well-known deep networks: VGG-16, DenseNet, and Xception. The results demonstrate that the proposed model outperforms these networks, achieving an approximate accuracy of 97%. Moreover, the model's performance surpasses that of other studies focused on 6-class classification in the context of ALL diagnosis.
Research Highlights
• Deep neural networks eliminate the requirement for feature extraction in ALL classification
• The proposed convolutional neural network achieves an impressive accuracy of approximately 97% in classifying six ALL and lymphocyte subtypes.
Keywords: acute lymphoblastic leukemia, convolutional neural network, deep neural network, microscopic image analysis
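The first step of the pipeline in this abstract, contrast enhancement by histogram equalization, can be illustrated with a short sketch. It uses the common formulation based on the cumulative distribution function (CDF) for an 8-bit grayscale image stored as a list of rows; the abstract does not give the authors' exact variant or parameters, so the function name and details below are assumptions.

```python
# Sketch of CDF-based histogram equalization for a grayscale image.
# The image is a list of rows, each a list of integer intensities in
# the range [0, levels - 1].

def equalize_histogram(image, levels=256):
    """Return a contrast-stretched copy of the image."""
    pixels = [p for row in image for p in row]

    # Histogram of intensities.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    denom = len(pixels) - cdf_min
    if denom == 0:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]

    # Lookup table mapping each old intensity to its equalized value.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / denom)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


dark = [[0, 1], [2, 3]]          # intensities crowded at the dark end
stretched = equalize_histogram(dark)
print(stretched)
```

The four nearly identical dark values are spread across the full intensity range, which is the effect the abstract relies on to make nuclei easier to segment.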
... Given that the reshaped soil spectra data as a 2-D image retains important positional information from the rearrangement of the spectra that are sensitive to changes in soil properties, the Swin transformer is particularly suitable. The Swin transformer model architecture consists of patch extraction and patch embedding layers to extract patches from the input image preparing them to use as input into (Ferguson et al., 2017) ...
Purpose
Soil texture identification is vital for various agricultural and engineering applications but generally involves rigorous laboratory work, especially for estimating USCS (Unified Soil Classification System) soil texture classes. Soil texture influences soil water storage capacity, soil fertility, compaction characteristics, and soil strength. Soil spectroscopy offers a reliable approach that is non-destructive, rapid, and cost-effective to estimate several soil properties including texture. For engineering applications, the USCS soil texture classes are preferred, but very few studies have focussed on estimating USCS soil texture using soil spectroscopy or remote sensing data in general.
Methods
Two large soil spectral libraries (SSLs), viz., Kellogg Soil Spectral Library (KSSL) and Open-source Soil Spectral Library (OSSL), as well as three deep learning algorithms (VGG-16, ResNet-16, and Swin transformers), were used in this study to predict six USCS soil texture classes and three USCS soil texture groups. The USCS soil texture classes and groups were derived by grouping clay, sand, and silt fractions that are closely associated with the corresponding USCS soil texture classes.
Results
The results indicate that the Swin transformer model performed the best with an accuracy of 67% for six USCS soil texture class predictions and 81% for three USCS soil texture group predictions. Cohen’s kappa value implies a moderate agreement (0.55) for soil texture class predictions and a substantial agreement (0.64) for soil texture group predictions.
Conclusion
The proposed methodology offers a novel approach for USCS soil texture class predictions utilizing SSLs and deep learning techniques.
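The Results above report Cohen's kappa alongside accuracy. As a hedged illustration of how that statistic is obtained, the sketch below computes kappa from a confusion matrix as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The verbal labels follow the commonly used Landis and Koch bands; that the authors used this scale is an assumption, and the example matrix is made up.

```python
# Sketch: Cohen's kappa from a square confusion matrix
# (rows = true class, columns = predicted class).

def cohens_kappa(confusion):
    """Return (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to the Landis and Koch verbal scale (assumed scale)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


example = [[20, 5], [10, 15]]    # made-up two-class confusion matrix
kappa = cohens_kappa(example)
print(round(kappa, 3), agreement_label(kappa))
print(agreement_label(0.55), agreement_label(0.64))
```

On this scale the two values quoted in the Results, 0.55 and 0.64, fall in the "moderate" and "substantial" bands respectively, matching the wording of the abstract.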
... The architecture of VGG16 from [50]. accomplishments is its outstanding performance. ...
This research investigates advanced approaches in medical image analysis, specifically focusing on segmentation and classification techniques, as well as their integration into multi‐task architectures for lung infections. This research begins by explaining key architectural models used in segmentation and classification tasks. The study extends to the enhancement of these architectures through attention modules and conditional random fields. Relevant datasets and evaluation metrics, incorporating discussions on loss functions are also reviewed. This review encompasses recent advancements in single‐task and multi‐task models, highlighting innovations in semi‐supervised, self‐supervised, few‐shot, and zero‐shot learning techniques. Empirical analysis is conducted on both single‐task and multi‐task architectures, predominantly utilizing the U‐Net framework, and is applied across multiple datasets for segmentation and classification tasks. Results demonstrate the effectiveness of these models and provide insights into the strengths and limitations of different approaches. This research contributes to improved detection and diagnosis of lung infections by offering a comprehensive overview of current methodologies and their practical applications.
... Through the utilization of creative AI models and algorithms [95][96][97], analyzing extensive datasets obtained from the virtual or practical casting process can yield valuable insights that can be used in aspects of the intelligent design of casting system [98][99][100], the optimization of process parameters, the reliable prediction or improvement of casting quality [101][102][103], and the predictive maintenance during the life cycle of castings. In a narrow sense, intelligent decision-making in the casting domain is the interdiscipline of turning multi-source information into better controllable actions at any scale. ...
Emerging technological advances have been reshaping the casting sector in recent decades. Casting technology is evolving towards an intelligent casting paradigm that involves automation, greenization and intelligentization, which is attracting more and more attention from the academic and industrial communities. In this paper, the main features of casting technology are briefly summarized and forecast, and the recent developments of key technologies and the innovative efforts made in promoting the intelligent casting process are discussed. Moreover, technical visions of the intelligent casting process are put forward. The key technologies for the intelligent casting process comprise 3D printing technologies, intelligent mold technologies and intelligent process control technologies. In the future, the intelligent mold, derived from molds with sensors, control devices and actuators, will probably incorporate the Internet of Things, online inspection, embedded simulation, decision-making and control systems, and other technologies to form an intelligent cyber-physical casting system, which may pave the way to realizing intelligent casting. It is promising that the intelligent casting process will eventually achieve the goal of real-time process optimization and full-scale control, so that the defects, microstructure, performance, and service life of the fabricated castings can be accurately predicted and tailored.
... The VGG architecture has a fully connected layer at the end of its structure. However, when dealing with object detection problems, the fully connected layers are removed from the architecture, and VGG16 is used exclusively for feature extraction (Figure 6: Standard architecture of the VGG 16 network [22]). This feature extraction network is known as the base network [23]. ...
This work presents a comparative analysis using the mAP metric of the YOLOv7 and SSD algorithms applied to the field of computer vision. The comparison is performed through the detection of potholes on highways and roads. In works defended by other authors, both techniques are considered for applications in computer vision. In this study, both models achieved promising results in pothole detection; however, it is observed that the YOLOv7 model reached an mAP metric of 80%, while the SSD model obtained an mAP of 73%. Thus, the results indicate that YOLOv7 is more efficient and accurate in pothole detection in this specific context.
... Recent advancements in deep-learning-based approaches have emerged as the leading solution for a variety of tasks across multiple domains [2]. Specifically in the context of manufacturing defect detection, these techniques are now considered state of the art, significantly outperforming traditional methods [3][4][5]. However, their performance is subject to one significant constraint: their data-hungry nature. ...
... The emergence of deep-learning-based approaches has improved the accuracy in manufacturing quality control [3][4][5][13][14][15]. Their complex structures are able to retain and automatically learn the information contained in the image, more effectively facilitating the image processing compared to previous techniques. ...
... Their complex structures are able to retain and automatically learn the information contained in the image, more effectively facilitating the image processing compared to previous techniques. Deep-learning-based models are built in an end-to-end manner so handcrafting processes are not required to extract discriminant features [3,16]. In fact, the feature extraction process is carried out automatically from raw images, followed by a classifier head that learns the boundary between defective and non-defective features. ...
In industrial quality control, especially in the field of manufacturing defect detection, deep learning plays an increasingly critical role. However, the efficacy of these advanced models is often hindered by their need for large-scale, annotated datasets. Moreover, these datasets are mainly based on RGB images, which are very different from X-ray images. Addressing this limitation, our research proposes a methodology that incorporates domain-specific self-supervised pretraining techniques using X-ray imaging to improve defect detection capabilities in manufacturing products. We employ two pretraining approaches, SimSiam and SimMIM, to refine feature extraction from manufacturing images. The pretraining stage is carried out using an industrial dataset of 27,901 unlabeled X-ray images from a manufacturing production line. We analyze the performance of the pretraining against transfer-learning-based methods in a complex defect detection scenario using a Faster R-CNN model. We conduct evaluations on both a proprietary industrial dataset and the publicly available GDXray dataset. The findings reveal that models pretrained with domain-specific X-ray images consistently outperform those initialized with ImageNet weights. Notably, Swin Transformer models show superior results in scenarios rich in labeled data, whereas CNN backbones are more effective in limited-data environments. Moreover, we underscore the enhanced ability of the models pretrained with X-ray images in detecting critical defects, crucial for ensuring safety in industrial settings. Our study offers substantial evidence of the benefits of self-supervised learning in manufacturing defect detection, providing a solid foundation for further research and practical applications in industrial quality control.