Preprint

HyperDefect-YOLO: Enhance YOLO with HyperGraph Computation for Industrial Defect Detection


Abstract

In the manufacturing industry, defect detection is an essential but challenging task that aims to detect defects generated during production. Although traditional YOLO models perform well in defect detection, they are still limited in capturing high-order feature interrelationships, which hinders defect detection in complex scenarios and across scales. To this end, we introduce hypergraph computation into the YOLO framework, dubbed HyperDefect-YOLO (HD-YOLO), to improve representational ability and semantic exploitation. HD-YOLO consists of a Defect Aware Module (DAM) and a Mixed Graph Network (MGNet) in the backbone, which specialize in the perception and extraction of defect features. To effectively aggregate multi-scale features, we propose the HyperGraph Aggregation Network (HGANet), which combines hypergraph and attention mechanisms. Cross-Scale Fusion (CSF) is proposed to adaptively fuse and handle features instead of simple concatenation and convolution. Finally, we propose a Semantic Aware Module (SAM) in the neck to enhance semantic exploitation for accurately localizing defects of different sizes against disturbed backgrounds. HD-YOLO undergoes rigorous evaluation on the public HRIPCB and NEU-DET datasets, with significant improvements compared to state-of-the-art methods. We also evaluate HD-YOLO on the self-built MINILED dataset, collected in real industrial scenarios, to demonstrate the effectiveness of the proposed method. The source code is available at https://github.com/Jay-zzcoder/HD-YOLO.

Article
Integrating gene expression across tissues and cell types is crucial for understanding the coordinated biological mechanisms that drive disease and characterize homoeostasis. However, traditional multi-tissue integration methods either cannot handle uncollected tissues or rely on genotype information, which is often unavailable and subject to privacy concerns. Here we present HYFA (hypergraph factorization), a parameter-efficient graph representation learning approach for joint imputation of multi-tissue and cell-type gene expression. HYFA is genotype agnostic, supports a variable number of collected tissues per individual, and imposes strong inductive biases to leverage the shared regulatory architecture of tissues and genes. In performance comparison on Genotype–Tissue Expression project data, HYFA achieves superior performance over existing methods, especially when multiple reference tissues are available. The HYFA-imputed dataset can be used to identify replicable regulatory genetic variations (expression quantitative trait loci), with substantial gains over the original incomplete dataset. HYFA can accelerate the effective and scalable integration of tissue and cell-type transcriptome biorepositories.
Article
To address the problems of low network accuracy, slow speed, and a large number of model parameters in printed circuit board (PCB) defect detection, an improved detection algorithm of PCB surface defects based on YOLOv5 is proposed, named PCB-YOLO, in this paper. Based on the K-means++ algorithm, more suitable anchors for the dataset are obtained, and a small target detection layer is added to make the PCB-YOLO pay attention to more small target information. Swin transformer is embedded into the backbone network, and a united attention mechanism is constructed to reduce the interference between the background and defects in the image, and the analysis ability of the network is improved. Model volume compression is achieved by introducing depth-wise separable convolution. The EIoU loss function is used to optimize the regression process of the prediction frame and detection frame, which enhances the localization ability of small targets. The experimental results show that PCB-YOLO achieves a satisfactory balance between performance and consumption, reaching 95.97% mAP at 92.5 FPS, which is more accurate and faster than many other algorithms for real-time and high-precision detection of product surface defects.
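The anchor-selection step mentioned above can be illustrated with a minimal sketch. It assumes that anchors are clustered on (width, height) pairs and that the distance is 1 - IoU, which is a common choice in YOLO-style pipelines; the abstract does not state the metric, and the function names `wh_iou` and `kmeanspp_anchors` are hypothetical.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes that share a top-left corner, each given as (w, h)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors (assumed distance: 1 - IoU)."""
    rng = random.Random(seed)
    # k-means++ seeding: first centre uniformly at random, each further
    # centre with probability proportional to its squared distance to the
    # nearest centre chosen so far.
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(1 - wh_iou(b, c) for c in centres) ** 2 for b in boxes]
        centres.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Lloyd iterations: assign each box to the centre with the highest IoU,
    # then move every centre to the mean (w, h) of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            j = max(range(k), key=lambda i: wh_iou(b, centres[i]))
            clusters[j].append(b)
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new == centres:
            break
        centres = new
    # Return anchors sorted from smallest to largest area.
    return sorted(centres, key=lambda c: c[0] * c[1])
```

The sketch expects at least k distinct box shapes; with fewer, the seeding weights would all be zero and `random.choices` would raise an error.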
Article
Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this article, we propose complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding-box regression and nonmaximum suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency. In particular, we consider three geometric factors, that is: 1) overlap area; 2) normalized central-point distance; and 3) aspect ratio, which are crucial for measuring bounding-box regression in object detection and instance segmentation. The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted $\ell_n$-norm loss and IoU-based loss. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes and usually requires fewer iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In the experiments, CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g., YOLACT and BlendMask-RT), and object detection (e.g., YOLO v3, SSD, and Faster R-CNN) models. Taking YOLACT on MS COCO as an example, our method achieves performance gains as +1.7 AP and +6.2 AR$_{100}$ for object detection, and +1.1 AP and +3.5 AR$_{100}$ for instance segmentation, with 27.1 FPS on one NVIDIA GTX 1080Ti GPU. All the source code and trained models are available at https://github.com/Zzh-tju/CIoU .
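As a hedged illustration of how the three geometric factors combine, the sketch below computes the commonly cited form of the CIoU loss for a single pair of axis-aligned boxes: 1 - IoU, plus the squared centre distance normalized by the squared diagonal of the smallest enclosing box, plus a weighted aspect-ratio term. The function name `ciou_loss`, the (x1, y1, x2, y2) box layout, and the `eps` stabilizer are assumptions made for this example and are not taken from the released code.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between a predicted and a ground-truth box, (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # 1) Overlap area: plain IoU.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # 2) Normalized central-point distance: squared centre distance over the
    #    squared diagonal of the smallest box enclosing both boxes.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # 3) Aspect ratio: consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps))
                              - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

For two disjoint unit squares such as (0, 0, 1, 1) and (2, 2, 3, 3), the IoU and aspect-ratio terms vanish and the loss is driven by the distance term alone, which is what keeps the gradient informative when boxes do not overlap.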
Article
To cope with the difficulties in inspection and classification of defects in printed circuit boards (PCBs), many methods have been proposed in previous work. However, few of these works have published their datasets, which hinders the introduction and comparison of new methods. In this study, HRIPCB, a synthesised PCB dataset that contains 1386 images with 6 kinds of defects, is proposed for use in detection, classification and registration tasks. Besides, a reference-based method is adopted to inspect the defects and an end-to-end convolutional neural network is trained to classify them; together these are referred to as the RBCNN approach. Unlike conventional approaches that require pixel-by-pixel processing, the RBCNN method proposed in this study first locates the defects and then classifies them with deep neural networks, which shows superior performance on the dataset.
Article
Location-Based Social Networks (LBSNs) have been widely used as a primary data source for studying the impact of mobility and social relationships on each other. Traditional approaches manually define features to characterize users’ mobility homophily and social proximity, and show that mobility and social features can help friendship and location prediction tasks, respectively. However, these hand-crafted features not only require tedious human efforts, but also are difficult to generalize. Against this background, we propose in this paper LBSN2Vec++, a heterogeneous hypergraph embedding approach designed specifically for LBSN data for automatic feature learning. Specifically, LBSN data intrinsically forms a heterogeneous hypergraph including both user-user homogeneous edges (friendships) and user-time-POI-semantic heterogeneous hyperedges (check-ins). Based on this hypergraph, we first propose a random-walk-with-stay scheme to jointly sample user check-ins and social relationships, and then learn node embeddings from the sampled (hyper)edges by not only preserving the n -wise node proximity captured by the hyperedges, but also considering embedding space transformation between node domains to fully grasp the complex structural characteristics of the LBSN heterogeneous hypergraph. Using real-world LBSN datasets collected in six cities all over the world, our extensive evaluation shows that LBSN2Vec++ significantly and consistently outperforms both state-of-the-art graph embedding techniques by up to 68 percent and the best-performing hand-crafted features in the literature by up to 70.14 percent on friendship and location prediction tasks.
Article
We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation. This enables the model to acquire both semantic and structural information, advancing beyond conventional feature-focused learning. Hyper-YOLO incorporates the proposed Mixed Aggregation Network (MANet) in its backbone for enhanced feature extraction and introduces the Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net) in its neck. HyperC2Net operates across five scales and breaks free from traditional grid structures, allowing for sophisticated high-order interactions across levels and positions. This synergy of components positions Hyper-YOLO as a state-of-the-art architecture in various scale models, as evidenced by its superior performance on the COCO dataset. Specifically, Hyper-YOLO-N significantly outperforms the advanced YOLOv8-N and YOLOv9-T with 12% $\mathrm{AP}^{val}$ and 9% $\mathrm{AP}^{val}$ improvements. The source codes are at https://github.com/iMoonLab/Hyper-YOLO .
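To make "constructs a hypergraph in a semantic space" more concrete, here is a small sketch of one common construction rule: every feature vector acts as the centre of a hyperedge that contains all vectors within a distance threshold. The abstract does not say which rule Hyper-YOLO uses, so the distance-threshold rule, the function name `epsilon_ball_hypergraph`, and the `eps` parameter are assumptions made for illustration.

```python
import math


def epsilon_ball_hypergraph(features, eps):
    """Build an incidence matrix H where H[v][e] == 1 means vertex v belongs
    to hyperedge e. Hyperedge e is centred on features[e] and contains every
    vertex whose feature lies within Euclidean distance eps of that centre."""
    n = len(features)
    H = [[0] * n for _ in range(n)]
    for e, centre in enumerate(features):
        for v, x in enumerate(features):
            if math.dist(x, centre) < eps:
                H[v][e] = 1
    return H


# Three toy "semantic" feature vectors: two close together, one far away.
H = epsilon_ball_hypergraph([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)], eps=1.0)
```

Unlike an ordinary graph edge, each column of H can connect any number of vertices, which is what allows high-order (more than pairwise) correlations to be represented.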
Article
Coupling the classification and localization head compromises the defect detection performance due to feature conflict and spatial misalignment between these two sub-tasks. In this paper, we propose a Task-Aware Attention Network for weak surface defect detection, referred to as TAANet. Specifically, we devise a Feature-Aware Decoupled Head (FADH) to learn task-specific features for defect classification and box regression, respectively. FADH is composed of two parallel branches, where one aims to encode strong semantic context preferred for classification, and the other is to learn more structural information expected for regression, thus disentangling the two tasks at the feature level. Moreover, an efficient Attention-Guided Feature Pyramid Network (AG-FPN) is developed by introducing attention guidance maps in the unidirectional FPN to enhance the feature representations of weak defects, replacing continuously stacked top-down and bottom-up connections. AG-FPN shortens the information transmission pathway in an efficient fashion and extracts more salient and discriminative features for the prediction layers in FADH. The proposed TAANet yields state-of-the-art 86.3% and 77.6% mAP@IoU=0.5 at 58.7 FPS on the publicly available datasets, NEU-DET and GC10-DET, which effectively improves the defect detection performance over comparable detectors.
Article
In the daily inspection tasks of electric power companies, detecting defects of the equipment on high-voltage transmission lines is an essential task, especially for power line insulators, which can have a significant impact on the stability and security of power grid operations. Collecting and labeling data for training a well-performing deep-learning model can be labor-intensive, and reducing the amount of training data can help reduce costs and increase efficiency. Pre-trained weights may be better initialization points for the object detection optimization process in downstream tasks and small-sized datasets, such as defect detection tasks. To effectively adapt the pre-trained weights of common object datasets for power line insulator detection, this paper proposes AdIn-DETR, a defect detection model with two proposed adapter modules, GSG-adapter with Gaussian saliency guidance for convolutional feature modulation and LFO-Adapter with looking-forward ability to refract the learned relational modeling of the decoder layers. Our proposed method can achieve competitive performance on relatively small-sized datasets to detect bunch-drop defects for composite and glass insulators. This paper experimentally demonstrates that our model can achieve 95.4% and 96.1% AP50 accuracy with R50 and R101 backbones on real-world insulators while reaching 104 FPS and 71 FPS, which is faster than the state-of-the-art YOLO models.
Article
This article proposes a practical and generalizable object detector, termed feature extraction-fusion-prediction network (FEFP-Net) for real-world application scenarios. The existing object detection methods have recently achieved excellent performance, however they still face three major challenges for real-world applications, i.e., feature similarity between classes, object size variability, and inconsistent localization and classification predictions. In order to effectively alleviate the current difficulties, the FEFP-Net with three key components is proposed, and the improved detection accuracy is proved in various applications: 1) Extraction Phase: an adaptive fine-grained feature extraction network is proposed to capture features of interest from coarse to fine details, which effectively avoids misclassification due to feature similarity; 2) Fusion Phase: a bidirectional neighbor connection network is designed to identify objects with different sizes by aggregating multilevel features and 3) Prediction Phase: in order to improve the accuracy of object localization and classification, a task specific prediction network is presented, which sufficiently exploits both the spatial and channel information of features. Compared with the State-of-the-Art methods, we achieved competitive results in the MS-COCO dataset. Further, we demonstrated the performance of FEFP-Net in different application fields, such as medical imaging, industry, agriculture, transportation, and remote sensing. These comprehensive experiments indicate that FEFP-Net has satisfactory accuracy and generalizability as a basic object detector.
Article
The goal of intelligent process manufacturing is to achieve high efficiency and greening of the entire production process. However, the information systems it uses are functionally independent, resulting in knowledge gaps between levels. Decision-making still requires many knowledge workers to make decisions manually. The industrial metaverse is a necessary means to bridge the knowledge gaps by sharing and collaborative decision-making. Considering the safety and stability requirements of process manufacturing, this article conducts a thorough survey on process manufacturing intelligence empowered by the industrial metaverse. First, it analyzes the current status and challenges of process manufacturing intelligence, and then summarizes the latest developments in key enabling technologies of the industrial metaverse, such as interconnection technologies, artificial intelligence, cloud–edge computing, digital twin (DT), immersive interaction, and blockchain technology. On this basis, taking into account the characteristics of process manufacturing, a construction approach and architecture for the process industrial metaverse is proposed: a virtual-real fused industrial metaverse construction method that combines DTs with physical avatars, which can effectively ensure the safety of the metaverse’s application in industrial scenarios. Finally, we conducted preliminary exploration and research to prove the feasibility of the proposed method.
Article
In the detection of insulator defects on transmission lines, the detection precision is still not ideal, primarily attributed to the significant variation in target scale and complex image backgrounds. We propose the multiscale channel information (MCI)-global-local attention (GLA), a plug-in designed for YOLO series models, featuring two modules: the MCI extraction module and the GLA based on context information module (GLA-CI). MCI comprehensively extracts and utilizes multiscale feature map information, while GLA-CI captures both global context information and local spatial details, thereby augmenting the learning capability of networks. Experimental results indicate that the MCI-GLA plug-in improves the average precision (AP) of YOLOv4 to YOLOv8 models in detecting insulator breakage defects by 7.3%, 4.6%, 4.5%, 4.0%, and 5.3%, respectively. In particular, YOLOv7+MCI-GLA exhibits superior precision and inference time compared to other methods on self-constructed and public datasets. The code for this article can be found at https://github.com/falian0527/MCI-GLA .
Article
This paper proposes an improved defect detection algorithm named WGSO-YOLO for real-time detection of optical lens surface defects, to address the slow detection speed, low detection accuracy and unbalanced datasets commonly observed in optical lens surface defect detection. The proposed algorithm replaces the standard convolution with the GSConv module in the network feature fusion part, reducing the number of parameters and computational complexity without significant loss of contribution. It incorporates a second-order channel attention mechanism to enhance the model’s feature extraction and fusion capabilities. In the prediction phase, the CIoU loss function is replaced with the WIoU loss function to improve the model’s generalization ability. The experiments demonstrate that our model performs exceptionally well on the optical lens surface defect dataset, achieving an FPS of 96 and an mAP@.5 of 0.927.
Article
Defect detection aims to locate and classify defects in images, which is a necessary yet challenging task in industrial product quality monitoring. The current anchor-based detectors have weak generalization performance due to their inability to consider numerous scale priors. Moreover, the basic networks lack the ability to dynamically capture and utilize multiscale feature representations, resulting in low accuracy in industrial defect detection. To counter these challenges, an efficient anchor-free detector with dynamic receptive field assignment (DRFA) and task alignment is proposed. First, a feature pyramid structure with DRFA is designed to sufficiently extract multiscale feature representations and flexibly adjust the receptive field to detect diverse defects. Second, a task decoupling prediction mechanism is proposed to improve localization and classification prediction capabilities by introducing feature reassembly and task-specific information enhancers. Next, a task-aligned, anchor-free deep supervision scheme is presented to encourage both tasks to make accurate and consistent predictions, thereby effectively improving the overall detection performance. Finally, three industrial defect datasets (NEU-DET, PCB, WELD) are employed for experiments. The results show that the proposed method achieves 5.3% higher average AP than other state-of-the-art detectors.
Article
Although deep learning-based surface defect detection approaches have performed remarkably well in recent years, the complicated shapes and large size differences of surface defects still pose enormous challenges for most existing methods. To address these issues, we propose a novel surface defect detection method joining spatial deformable convolution and a dense feature pyramid, named SDDF-Net. First, we construct a spatial deformable convolution-based feature extraction network, which uses a dynamic convolutional kernel with spatial information to increase the feature extraction capability of complicated defects. Second, we build a dense feature pyramid-based feature fusion network that fuses features from different network layers to improve the detection accuracy of multiscale defects. Third, we present a novel hybrid loss that combines complete intersection over union loss and normalized Wasserstein distance loss to enhance the defect recognition and location learning abilities of our method. Finally, we run our method on the NEU-DET, DAGM2007 and DeepPCB datasets to conduct a comprehensive comparison with some state-of-the-art general object detection models and specialized surface defect detection methods. The experimental results show that the proposed SDDF-Net performs competitively in terms of detection accuracy and computational efficiency when compared with existing methods. This indicates that SDDF-Net achieves good results for surface defect detection and is qualified for real-time processing in industry.
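The normalized Wasserstein distance (NWD) term in such a hybrid loss can be sketched as follows. The sketch follows the commonly cited formulation in which each box (cx, cy, w, h) is modeled as a 2-D Gaussian and similarity is exp(-W2 / C). The constant `c`, the function names, and the way SDDF-Net weights this term against the CIoU term are not given in the abstract and are assumptions here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4); c is a dataset-dependent constant (assumed value).
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (ax - bx) ** 2 + (ay - by) ** 2 \
          + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 as boxes move apart."""
    return 1.0 - nwd(box_a, box_b, c)
```

Because the value changes smoothly even when two small boxes do not overlap at all, a term of this kind is often paired with an IoU-based loss for tiny defects, where IoU alone is very sensitive to small offsets.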
Article
The surface defects of printed circuit boards (PCB) generated during the manufacturing process have an adverse effect on product quality, which further directly affects the stability and reliability of equipment performance. However, there are still great challenges in accurately recognizing tiny defects on the surface of PCB under the complex background due to its compact layout. To address the problem, a novel YOLO-HMC network based on improved YOLOv5 framework is proposed in this paper to identify the tiny-size PCB defect more accurately and efficiently with fewer model parameters. Firstly, the backbone part adopts the HorNet for enhancing the feature extraction ability and deepening the information interaction. Secondly, an improved multiple convolutional block attention module (MCBAM) is designed to improve the ability of the model to highlight the defect location from a highly similar PCB substrate background. Thirdly, the content-aware reassembly of features (CARAFE) is used to replace the up-sampling layer for fully aggregating the contextual semantic information of PCB images in a large receptive field. Moreover, aiming at the difference between PCB defect detection and natural detection, the original model detection head is optimized to ensure that YOLOv5 can accurately detect PCB tiny defects. Extensive experiments on PCB defect public datasets have demonstrated a significant advantage compared with several state-of-the-art models, whose mean Average Precision (mAP) can reach 98.6%, verifying the accuracy and applicability of the proposed YOLO-HMC.
Article
Transmission line connecting fittings are densely distributed and vary significantly in size, which makes them difficult to maintain, and long-term exposure to the outdoor environment leaves them vulnerable to rust failure. Common image processing methods and deep learning algorithms are not competent for this kind of dense small target detection task, so this paper proposes a target detection model based on an image processing hierarchical algorithm. It uses anchor-free and decoupled head design ideas and, through an ASFF multi-scale information feature fusion strategy and an ECA + Varifocal Loss interactive saliency area capture strategy, constructs a dense small target detection network suitable for complex environments. The experimental results show that the comprehensive performance of Deformable YOLOX is superior to 13 current advanced target detection algorithms. Compared with the baseline model, Deformable YOLOX can better understand the multi-scale semantic information of the image and learn the small details that are more difficult to distinguish. Combined with a target detection algorithm, an early warning algorithm for rust grade assessment of connecting fittings is proposed, and an online monitoring system is designed, which has practical engineering application value.
Article
Defect detection is a task to locate and classify the possible defects in an image. However, unlike common object detection tasks, defect detection often needs to deal with images with relatively complex backgrounds, for example, in industrial product quality inspection scenario. The complex background can greatly interfere with the feature of the target objects in the multiscale feature fusion process and therefore puts great challenge on the defect detector. In this work, a channel-space adaptive enhancement feature pyramid network (CA-FPN) is proposed to eliminate this interference from the complex background. By extracting the inner relationship of different scale features, CA-FPN realizes adaptive fusion of multiscale features to enhance the semantic information of the defect while avoiding background interference as much as possible. In particular, CA-FPN is very lightweight. Moreover, considering that defects are often of varying sizes and can be extremely tiny or slender, a flexible anchor-free detector CA-AutoAssign is proposed by combining CA-FPN and an anchor-free detection strategy AutoAssign. Based on the Alibaba Cloud Tianchi Fabric dataset and NEU-DET, CA-AutoAssign is compared with the state-of-the-art (SOTA) detectors. The experimental results show that CA-AutoAssign has the best detection performance with AP50 [mean average precision (mAP) with the intersection over union (IOU) threshold of 50%] reaching 89.1 and 82.7, respectively. Despite the improvement in accuracy, the processing time has barely increased. Furthermore, CA-FPN is applied to other classical detectors, and the experimental results demonstrate the competitiveness and generalization ability of CA-FPN. The code is available at https://github.com/EasonLuht/CA-AutoAssign.git .
Article
Fault detection for key components in the braking system of freight trains is critical for ensuring railway transportation safety. Despite the frequently employed methods based on deep learning, these fault detectors are extremely reliant on hardware resources and complex to implement. In addition, no train fault detectors consider the drop in accuracy induced by scale variation of fault parts. This paper proposes a lightweight anchor-free framework to solve the above problems. Specifically, to reduce the amount of computation and model size, we introduce a lightweight backbone and adopt an anchor-free method for localization and regression. To improve detection accuracy for multi-scale parts, we design a feature pyramid network to generate rectangular layers of different sizes to map parts with similar aspect ratios. Experiments on four fault datasets show that our framework achieves 98.44% accuracy while the model size is only 22.5 MB, outperforming state-of-the-art detectors.
Article
Graph Neural Networks have attracted increasing attention in recent years. However, existing GNN frameworks are deployed based upon simple graphs, which limits their applications in dealing with complex data correlation of multi-modal/multi-type data in practice. A few hypergraph-based methods have recently been proposed to address the problem of multi-modal/multi-type data correlation by directly concatenating the hypergraphs constructed from each single individual modality/type, which makes it difficult to learn an adaptive weight for each modality/type. In this paper, we extend the original conference version HGNN, and introduce a general high-order multi-modal/multi-type data correlation modeling framework called HGNN$^+$ to learn an optimal representation in a single hypergraph based framework. It is achieved by bridging multi-modal/multi-type data and hyperedge with hyperedge groups. Specifically, in our method, hyperedge groups are first constructed to represent latent high-order correlations in each specific modality/type with explicit or implicit graph structures. An adaptive hyperedge group fusion strategy is then used to effectively fuse the correlations from different modalities/types in a unified hypergraph. After that, a new hypergraph convolution scheme performed in the spatial domain is used to learn a general data representation for various tasks. We have evaluated this framework on several popular datasets and compared it with recent state-of-the-art methods. The comprehensive evaluations indicate that the proposed HGNN$^+$ framework can consistently outperform existing methods by a significant margin, especially when modeling implicit data correlations. We also release a toolbox called THU-DeepHypergraph for the proposed framework, which can be used for a variety of applications, such as data classification, retrieval and recommendation.
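A spatial-domain hypergraph convolution can be pictured as two-stage message passing: vertices send features to their hyperedges, then hyperedges send aggregated features back to their vertices. The sketch below is a simplified illustration that uses plain mean aggregation and omits the learnable projection, hyperedge weights, and nonlinearity that a real layer would include; the function name is hypothetical and this is not the THU-DeepHypergraph API.

```python
def hypergraph_message_passing(H, X):
    """One round of vertex -> hyperedge -> vertex mean aggregation.

    H is an incidence matrix (H[v][e] == 1 if vertex v is in hyperedge e);
    X is a list of per-vertex feature vectors.
    """
    n_v, n_e, dim = len(H), len(H[0]), len(X[0])

    # Stage 1: each hyperedge averages the features of its member vertices.
    edge_feats = []
    for e in range(n_e):
        members = [v for v in range(n_v) if H[v][e]]
        if members:
            edge_feats.append(
                [sum(X[v][d] for v in members) / len(members) for d in range(dim)]
            )
        else:
            edge_feats.append([0.0] * dim)  # empty hyperedge carries no signal

    # Stage 2: each vertex averages the features of its incident hyperedges.
    out = []
    for v in range(n_v):
        edges = [e for e in range(n_e) if H[v][e]]
        if edges:
            out.append(
                [sum(edge_feats[e][d] for e in edges) / len(edges) for d in range(dim)]
            )
        else:
            out.append(list(X[v]))  # isolated vertex keeps its own feature
    return out


# Three vertices, two hyperedges: {v0, v1} and {v1, v2}.
H = [[1, 0], [1, 1], [0, 1]]
X = [[0.0], [2.0], [4.0]]
out = hypergraph_message_passing(H, X)
```

In this toy example vertex v1 sits in both hyperedges, so its updated feature mixes information from all three vertices after a single round.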
Article
Defect detection aims to locate and classify the possible defects in an image, which plays a key role in the quality inspection link in the manufacturing process of industrial products. Defects in industrial products are generally very small and extremely uneven in scale, resulting in poor detection results. Therefore, we propose an efficient scale-aware network (ES-Net) to improve the effect of defect detection. By addressing the information loss of tiny targets and the mismatch between the receptive field of detection head and the scale of targets, ES-Net improves the overall defect detection effect, especially for tiny defects. Considering that existing works directly use an integrated feature to enhance features at all levels, it may cause confusion in the direction of network optimization. Therefore, we propose the aggregated feature guidance module (AFGM), which first performs efficient cascading fusion of multi-level features to filter cross-layer conflicts. Then the split and aggregation enhancement (SAE) module is designed to further optimize the integrated feature map, and the result is used to guide the shallow features. Moreover, we also introduce the multi-receptive field fusion (MFF) module to generate multi-receptive field information to supplement the shallow features after dimensionality reduction. The efficient stair pyramid (ESP) is a further improvement of feature pyramid network (FPN)-based network. In particular, we propose the dynamic scale-aware head (DSH) in shallow detection layer, which can adaptively select the best detection receptive field according to different scales of targets, thereby improving the detection performance of tiny targets. Extensive experimental results on Aliyun Tianchi fabric dataset (76.2% mAP), NEU-DET (79.1% mAP), and printed circuit board (PCB) defect dataset of Peking University (97.5% mAP) demonstrate the proposed ES-Net achieves competitive results compared to the state-of-the-art (SOTA) methods. Moreover, the high efficiency of ES-Net makes it more applicable in scenarios with high real-time requirements.
Article
The regular detection of pavement cracks is critical for life and property security. However, existing deep learning-based methods of crack detection face difficulties in terms of data acquisition and defect counting. An automatic intelligent detection and tracking system for pavement cracks is proposed. Our system is formed of a pavement crack generative adversarial network (PCGAN) and a crack detection and tracking network called YOLO-MF. First, PCGAN is used to generate realistic crack images, to address the problem of the small number of available images. Next, YOLO-MF is developed based on an improved YOLO v3 modified by an acceleration algorithm and median flow (MF) algorithm to count the number of cracks. In a counting loop, our improved YOLO v3 detects cracks and the MF algorithm tracks the cracks detected in a video. This improved algorithm achieves the best accuracy of 98.47% and F1 score of 0.958 among other algorithms, and the precision-recall curve was close to the top right. A tiny model was developed and an acceleration algorithm was applied, which improved the detection speed by factors of five and six, respectively. In on-site measurement, three cracks were detected and tracked, and the total count was correct. Finally, the system was embedded in an intelligent device consisting of a calculating module, an automated unmanned aerial vehicle, and other components.
Article
Rail surface defect inspection based on machine vision faces challenges against the complex background with interference and severe data imbalance. To meet these challenges, we regard defect detection as a key-point estimation problem and present the attention neural network for rail surface defect detection via CASIoU-guided center-point estimation (CCEANN). CCEANN contains two crucial components. One is the stacked attention Hourglass backbone via cross-stage fusion of multi-scale features (CSFA-Hourglass), in which the convolutional block attention module with variable receptive fields (VRF-CBAM) is introduced, and a two-stage Hourglass structure balancing the network depth and feature fusion plays a key role. Furthermore, the CASIoU-guided center-point estimation head module (CASIoU-CEHM) integrating the delicate coordinate compensation mechanism regresses detection boxes flexibly to adapt to defects' large-scale variation, in which the proposed CASIoU loss, a loss regressing the consistency of Intersection-over-Union (IoU), central-point distance, area ratio, and scale ratio between the targeted defect and the predicted defect, achieves higher regression accuracy than state-of-the-art IoU-based losses. The experiments demonstrate that CCEANN outperforms competitive deep learning-based methods in four surface defect datasets.
Article
Modern industrial plants generally consist of multiple manufacturing units, and the local correlation within each unit can be used to effectively alleviate the effect of spurious correlation and meticulously reflect the operation status of the process system. Therefore, the local correlation, which is called spatial information here, should also be taken into consideration when developing the monitoring model. In this study, a cascaded monitoring network (MoniNet) method is proposed to develop the monitoring model with concurrent analytics of temporal and spatial information. By implementing convolutional operation to each variable, the temporal information that reveals dynamic correlation of process data and spatial information that reflects local characteristics within individual operation unit can be extracted simultaneously. For each convolutional feature, a submodel is developed and then all the submodels are integrated to generate a final monitoring model. Based on the developed model, the operation status of the newly collected sample can be identified by comparing the calculated statistics with their corresponding control limits. Similar to the convolutional neural network (CNN), the MoniNet can also expand its receptive field and capture deeper information by adding more convolutional layers. Besides, the filter selection and submodel development in MoniNet can be replaced to generalize the proposed network to many existing monitoring strategies. The performance of the proposed method is validated using two real industrial processes. The illustration results show that the proposed method can effectively detect process anomalies by concurrent analytics of temporal and spatial information.
Article
State-of-the-art object detectors usually progressively downsample the input image until it is represented by small feature maps, which loses the spatial information and compromises the representation of small objects. In this article, we propose a context-aware block net (CAB Net) to improve small object detection by building high-resolution and strong semantic feature maps. To internally enhance the representation capacity of feature maps with high spatial resolution, we delicately design the context-aware block (CAB). CAB exploits pyramidal dilated convolutions to incorporate multilevel contextual information without losing the original resolution of feature maps. Then, we assemble CAB to the end of the truncated backbone network (e.g., VGG16) with a relatively small downsampling factor (e.g., 8) and cast off all following layers. CAB Net can capture both basic visual patterns as well as semantical information of small objects, thus improving the performance of small object detection. Experiments conducted on the benchmark Tsinghua-Tencent 100K and the Airport dataset show that CAB Net outperforms other top-performing detectors by a large margin while keeping real-time speed, which demonstrates the effectiveness of CAB Net for small object detection.
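The claim that dilated convolutions add context "without losing the original resolution" can be checked with standard convolution arithmetic. The sketch below is only an illustration: the dilation rates 1, 2, and 4, the 3x3 kernel, and the 64-pixel feature-map side are assumed values, not taken from the CAB Net paper.

```python
def effective_kernel(k, d):
    """Span covered by a k-tap kernel with dilation d."""
    return d * (k - 1) + 1


def same_padding(k, d):
    """Padding that preserves spatial size at stride 1 (odd k)."""
    return d * (k - 1) // 2


def output_size(n, k, d, p, s=1):
    """Standard convolution output-size formula along one axis."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


if __name__ == "__main__":
    n, k = 64, 3  # assumed feature-map side and kernel size
    for d in (1, 2, 4):  # assumed pyramid of dilation rates
        p = same_padding(k, d)
        print(d, effective_kernel(k, d), output_size(n, k, d, p))
```

Each branch sees a wider neighbourhood (3, 5, and 9 pixels here) while the output side stays at 64, which is the property the abstract relies on.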
Article
Recently, a hypergraph constructed from functional magnetic resonance imaging (fMRI) was utilized to explore brain functional connectivity networks (FCNs) for the classification of neurodegenerative diseases. Each edge of a hypergraph (called hyperedge) can connect any number of brain regions-of-interest (ROIs) instead of only two ROIs, and thus characterizes high-order relations among multiple ROIs that cannot be uncovered by a simple graph in the traditional graph-based FCN construction methods. Unlike the existing hypergraph-based methods where all hyperedges are assumed to have equal weights and only certain topological features are extracted from the hypergraphs, we propose a hypergraph learning-based method for FCN construction in this paper. Specifically, we first generate hyperedges from fMRI time series based on sparse representation, then employ hypergraph learning to adaptively learn hyperedge weights, and finally define a hypergraph similarity matrix to represent the FCN. In our proposed method, weighting hyperedges results in better discriminative FCNs across subjects, and the defined hypergraph similarity matrix can better reveal the overall structure of brain network than using those hypergraph topological features. Moreover, we propose a multi-hypergraph learning-based method by integrating multi-paradigm fMRI data, where the hyperedge weights associated with each fMRI paradigm are jointly learned and then a unified hypergraph similarity matrix is computed to represent the FCN. We validate the effectiveness of the proposed method on the Philadelphia Neurodevelopmental Cohort dataset for the classification of individuals’ learning ability from three paradigms of fMRI data. Experimental results demonstrate that our proposed approach outperforms the traditional graph-based methods (i.e., Pearson’s correlation and partial correlation with the graphical Lasso) and the existing unweighted hypergraph-based methods, which sheds light on how to optimize estimation of FCNs for cognitive and behavioral study.
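To make the hyperedge idea concrete, the sketch below builds a vertex-by-hyperedge incidence matrix and a weighted similarity matrix. The abstract does not state how its similarity matrix is defined, so the form used here, S = H W D_e^{-1} H^T (a common choice in hypergraph learning), is an assumption, as are the toy hyperedges and weights.

```python
def incidence_matrix(n_vertices, hyperedges):
    """H[v][e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = [[0] * len(hyperedges) for _ in range(n_vertices)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def similarity_matrix(H, weights):
    """Assumed form S = H W De^-1 H^T, with De the hyperedge degrees."""
    n, m = len(H), len(weights)
    edge_degree = [sum(H[v][e] for v in range(n)) for e in range(m)]
    S = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            S[u][v] = sum(
                weights[e] * H[u][e] * H[v][e] / edge_degree[e]
                for e in range(m)
                if edge_degree[e] > 0
            )
    return S


if __name__ == "__main__":
    # Toy example: one hyperedge joins three vertices at once,
    # which a pairwise graph edge cannot express.
    edges = [{0, 1, 2}, {2, 3}]
    H = incidence_matrix(4, edges)
    S = similarity_matrix(H, [1.0, 0.5])  # illustrative weights
    for row in S:
        print([round(x, 3) for x in row])
```

Vertices that share a heavily weighted hyperedge receive a larger similarity, so learning the weights changes the resulting network, which is the effect the abstract attributes to weighted hyperedges.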
Article
In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN.
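The role of the IoU threshold can be illustrated with a few lines of code. This is a minimal sketch, not the Cascade R-CNN implementation: the box format (x1, y1, x2, y2), the toy proposals, and the helper name `label_proposals` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_box, threshold):
    """Mark each proposal positive if its IoU with gt_box meets the threshold."""
    return [iou(p, gt_box) >= threshold for p in proposals]


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    # Hypothetical proposals of decreasing quality.
    proposals = [(0, 0, 10, 10), (0, 0, 10, 8), (0, 0, 10, 5), (5, 5, 15, 15)]
    for t in (0.5, 0.6, 0.7):
        print(t, sum(label_proposals(proposals, gt, t)))
```

Raising the threshold from 0.5 to 0.6 already drops one of the four toy proposals from the positive set, a small-scale picture of the "vanishing positive samples" the abstract describes.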
Article
A complete defect detection task aims to obtain the specific class and precise location of each defect in an image, which makes it challenging to apply in practice. Defect detection is a composite task of classification and localization, so related methods often struggle to achieve high accuracy on both. Moreover, the implementation of defect detection depends on a dedicated detection dataset that contains expensive manual annotations. In this paper, we propose a novel defect detection system based on deep learning and focus on a practical industrial application: steel plate defect inspection. In order to achieve strong classification ability, this system employs a baseline convolutional neural network (CNN) to generate feature maps at each stage. The proposed multilevel-feature fusion network (MFN) then combines multiple hierarchical features into one feature, which can include more location details of defects. Based on these multilevel features, a region proposal network (RPN) is adopted to generate regions of interest (ROIs). For each ROI, a detector, consisting of a classifier and a bounding box regressor, produces the final detection results. Finally, we set up a defect detection dataset, NEU-DET, for training and evaluating our method. On NEU-DET, our method achieves 74.8/82.3 mAP with baseline networks ResNet34/50 by using 300 proposals. In addition, by using only 50 proposals, our method can detect at 20 fps on a single GPU and reach 92% of the above performance, hence the potential for real-time detection.
Conference Paper
Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
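The rearrangement step of a sub-pixel convolution layer can be sketched in plain Python. This shows only the periodic shuffling that turns C·r² low-resolution feature maps into C maps that are r times larger; the preceding learned convolution is omitted, and the channel ordering used here follows a common framework convention, which is an assumption and may differ from the ordering in the paper.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange C*r*r maps of size HxW into C maps of size (H*r)x(W*r).

    feature_maps is a list of 2D lists; channel c*r*r + i*r + j supplies
    the output pixel at sub-position (i, j) inside each r x r output cell.
    """
    n_in = len(feature_maps)
    if n_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = n_in // (r * r)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = feature_maps[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


if __name__ == "__main__":
    # Four 1x2 low-resolution maps become one 2x4 high-resolution map.
    lr = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
    for row in pixel_shuffle(lr, 2)[0]:
        print(row)
```

Because all convolutions run on the small low-resolution maps and only this cheap rearrangement touches the high-resolution grid, the computational saving claimed in the abstract follows directly.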
Article
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
Z. Zuo, Y. Wu, B. Li, J. Dong, Y. Zhou, L. Zhou, Y. Qu, and Z. Wu, "CLIP-FSAC: Boosting CLIP for few-shot anomaly classification with synthetic anomalies," in Int. Joint Conf. Artif. Intell., 2024, pp. 1834-1842.
Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO series in 2021," arXiv preprint arXiv:2107.08430, 2021.
A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, "YOLOv10: Real-time end-to-end object detection," arXiv preprint arXiv:2405.14458, 2024.