Preprint

Abstract

Surface defect detection is one of the most essential processes in industrial quality inspection. Deep learning-based surface defect detection methods have shown great potential. However, well-performing models usually require large amounts of training data and can only detect defect categories that appeared during training. When facing incremental few-shot data, defect detection models inevitably suffer from catastrophic forgetting and misclassification. To solve these problems, this paper proposes a new knowledge distillation network, called the Dual Knowledge Align Network (DKAN). DKAN follows a pretraining-finetuning transfer learning paradigm, and a knowledge distillation framework is designed for the fine-tuning stage. Specifically, an Incremental RCNN is proposed to achieve decoupled, stable feature representations of different categories. Under this framework, a Feature Knowledge Align (FKA) loss is applied between class-agnostic feature maps to address catastrophic forgetting, and a Logit Knowledge Align (LKA) loss is applied between logit distributions to tackle misclassification. Experiments on the incremental Few-shot NEU-DET dataset show that DKAN outperforms other methods in various few-shot scenarios, by up to 6.65% in mean Average Precision (mAP), demonstrating the effectiveness of the proposed method.
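The abstract does not give the FKA and LKA formulas. As a hedged illustration only, the sketch below assumes the two standard distillation forms: FKA as a mean-squared error between flattened teacher and student feature maps, and LKA as a temperature-scaled KL divergence between teacher and student logit distributions. The function names, the weights `w_feat` and `w_logit`, and the temperature are hypothetical, not DKAN's published settings.

```python
import math

def feature_align_loss(student_feat, teacher_feat):
    """Assumed FKA-style term: mean squared error between flattened feature maps."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature maps must have the same size")
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_align_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed LKA-style term: KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def dual_align_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                    w_feat=1.0, w_logit=1.0):
    """Hypothetical weighted sum of the two alignment terms."""
    return (w_feat * feature_align_loss(student_feat, teacher_feat)
            + w_logit * logit_align_loss(student_logits, teacher_logits))
```

When the fine-tuned student reproduces the pretrained teacher exactly, both terms are zero, so only the detection loss on the new few-shot classes would drive learning. Any drift in features or logits is penalized, which is the intuition behind using distillation against forgetting.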

Article
Object detection has made enormous progress and has been widely used in many applications. However, it performs poorly when only limited training data is available for novel classes that the model has never seen before. Most existing approaches solve few-shot detection tasks implicitly without directly modeling the detectors for novel classes. In this article, we propose GenDet, a new meta-learning-based framework that can effectively generate object detectors for novel classes from few shots and thus performs few-shot detection explicitly. The detector generator is trained on numerous few-shot detection tasks sampled from base classes, each with sufficient samples, and is therefore expected to generalize well to novel classes. An adaptive pooling module is further introduced to suppress distracting samples and aggregate the detectors generated from multiple shots. Moreover, we propose to train a reference detector for each base class in the conventional way and use these reference detectors to guide the training of the detector generator. The reference detectors and the detector generator can be trained simultaneously. Finally, the generated detectors of different classes are encouraged to be orthogonal to each other for better generalization. The proposed approach is extensively evaluated on the ImageNet, VOC, and COCO data sets under various few-shot detection settings, and it achieves new state-of-the-art results.
Conference Paper
In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF’s strong performance.
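The speed-up from integral images comes from the fact that the sum over any axis-aligned rectangle, and hence any box filter such as those SURF uses to approximate second-order Gaussian derivatives, costs four lookups regardless of the box size. A minimal pure-Python sketch (the function names are illustrative, not from the SURF reference implementation):

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img[0..y-1][0..x-1]; padded by one zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of rows 0..y-1 up to column x, plus the running sum of row y.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of img over rows y0..y1-1 and columns x0..x1-1 using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

For a 3x3 image with values 1 to 9, `box_sum(ii, 1, 1, 3, 3)` adds the lower-right 2x2 block (5 + 6 + 8 + 9 = 28) with the same constant cost as a single-pixel query.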
Article
In this paper, we focus on the challenging few-shot class-incremental learning (FSCIL) problem, which requires transferring knowledge from old tasks to new ones while overcoming catastrophic forgetting. We propose the exemplar relation distillation incremental learning framework to balance old-knowledge preservation and new-knowledge adaptation. First, we construct an exemplar relation graph to represent the knowledge learned by the original network and update it gradually as new tasks are learned. Then, an exemplar relation loss function that discovers the relation knowledge between different classes is introduced to learn and transfer the structural information in the relation graph. Extensive experiments demonstrate that relation knowledge does exist in the exemplars and that our approach outperforms other state-of-the-art class-incremental learning methods on the CIFAR100, miniImageNet, and CUB200 datasets.
Article
Unsupervised anomaly segmentation methods based on knowledge distillation have recently been developed and have shown superior segmentation performance. However, little attention has been paid to the overfitting problem caused by the mismatch between the capacity of a neural network and the amount of knowledge in this scheme. This study proposes a novel method called informative knowledge distillation (IKD) to address the overfitting problem by distilling informative knowledge and offering a strong supervisory signal. Technically, a novel context similarity loss is proposed to capture context information from normal data manifolds. In addition, a novel adaptive hard sample mining method is proposed to focus more attention on hard samples with valuable information. With IKD, informative knowledge can be distilled so that the overfitting problem is effectively mitigated and performance is further improved. The proposed method achieved better AU-ROC results than state-of-the-art methods on several categories of the well-known MVTec AD dataset, reaching 97.81% overall across 15 categories. Extensive ablation experiments have also been conducted to demonstrate the effectiveness of IKD in alleviating the overfitting problem.
Article
Detecting objects and estimating their viewpoints in images are key tasks of 3D scene understanding. Recent approaches have achieved excellent results on very large benchmarks for object detection and viewpoint estimation. However, performance still lags behind for novel object categories with few samples. In this paper, we tackle the problems of few-shot object detection and few-shot viewpoint estimation. We demonstrate on both tasks the benefits of guiding the network prediction with class-representative features extracted from data in different modalities: image patches for object detection, and aligned 3D models for viewpoint estimation. Despite its simplicity, our method outperforms state-of-the-art methods by a large margin on a range of datasets, including PASCAL and COCO for few-shot object detection, and Pascal3D+ and ObjectNet3D for few-shot viewpoint estimation. Furthermore, when the 3D model is not available, we introduce a simple category-agnostic viewpoint estimation method by exploiting geometrical similarities and consistent pose labelling across different classes. While it moderately reduces performance, this approach still obtains better results than previous methods in this setting. Finally, for the first time, we tackle the combination of both few-shot tasks on three challenging benchmarks for viewpoint estimation in the wild, ObjectNet3D, Pascal3D+ and Pix3D, showing very promising results.
Article
Most automatic product surface inspection methods in industry are data-hungry and task-specific. It is difficult to collect adequate labeled samples in practice due to factors including expensive data annotation cost, inadequate samples for some categories, and limitations on the initial production stage. In this article, a multiple guidance network (MGNet) is proposed to address these issues. In the network, the feature extraction machine (FEM) produces four feature maps of different functions to enhance the inspection ability of the algorithm. Also, the probability map generation (PMG) module is designed for coarse positioning of objects. Moreover, the structures of the mutual guidance and historical guidance (HG) guarantee that the network can fully utilize the information of the auxiliary dataset. Only one support sample containing the labeled objects is required for reference, and the network can determine whether the same labeled objects exist in the query images and locate them. For a comprehensive evaluation of MGNet, three experiments are carried out using three real-world datasets. Experiment results verify that the proposed method is promising for industrial product surface inspection with one labeled target sample.
Article
In this study, a data-augmentation method is proposed to narrow the significant difference between the distributions of the training and test sets when sample sizes are small. Two major obstacles exist in defect detection on sanitary ceramics. The first results from the high cost of sample collection, namely, the difficulty of obtaining the large number of training images required by deep-learning algorithms, which limits the application of existing algorithms to sanitary-ceramic defect detection. Second, due to the limitations of production processes, the collected defect images are often marked, resulting in large distribution differences compared with the images of the test sets, which further affects the performance of defect-detection algorithms. The lack of training data and the distribution differences between training and test sets mean that existing deep learning-based algorithms cannot be used directly for defect detection on sanitary ceramics. The method proposed in this study, which is based on a generative adversarial network and a Gaussian mixture model, can effectively increase the number of training samples and reduce distribution differences between training and test sets, and the features of the generated images can be controlled to a certain extent. By applying this method, accuracy is improved from approximately 75% to nearly 90% in almost all experiments on different classification networks.
Article
Anomaly localization is valuable for improving complex production processes in smart manufacturing systems. Because the distribution of anomalies is unknown and labeled data are scarce, unsupervised methods based on convolutional neural networks have been studied for anomaly localization. However, problems remain for real industrial applications in terms of localization accuracy, computation time, and memory storage. This paper proposes a novel framework called Gaussian Clustering of Pre-trained Feature (GCPF), comprising a clustering stage and an inference stage, for unsupervised anomaly localization. GCPF consists of three modules: Pre-trained Deep Feature Extraction (PDFE), Multiple Independent Multivariate Gaussian Clustering (MIMGC), and Multi-Hierarchical Anomaly Scoring (MHAS). In the clustering stage, features of normal images are extracted by a pre-trained CNN in the PDFE module and then clustered in the MIMGC module. In the inference stage, features of target images are extracted and then scored for anomaly localization in the MHAS module. GCPF is compared with state-of-the-art methods on the MVTec dataset, achieving a ROC AUC of 96.86% over all 15 categories, and is further evaluated on the NanoTWICE and DAGM datasets. GCPF outperforms the compared methods for unsupervised anomaly localization while retaining the low computational complexity and online memory storage that are important for real industrial applications. (https://github.com/smiler96/GCPF)
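The core of Gaussian-clustering anomaly scoring is to fit a Gaussian to features of normal images and score test features by their distance from it. The sketch below is a deliberate simplification: it assumes a single Gaussian with a diagonal covariance and the Mahalanobis distance as the score, whereas GCPF uses multiple independent multivariate Gaussians and multi-hierarchical scoring whose exact form the abstract does not give. The function names and the `eps` regularizer are hypothetical.

```python
import math

def fit_diag_gaussian(features, eps=1e-6):
    """Per-dimension mean and variance of normal-sample feature vectors (eps avoids division by zero)."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    var = [sum((f[j] - mean[j]) ** 2 for f in features) / n + eps for j in range(d)]
    return mean, var

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under a diagonal covariance; larger values are more anomalous."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))
```

In a localization setting, one such model would be fitted per spatial position of the feature map, and the per-position distances would form the anomaly map.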
Article
In integrated circuit (IC) packaging, surface defect detection on flexible printed circuit boards (FPCBs) is important for controlling IC quality. Although various computer vision (CV)-based object detection frameworks have been widely used in industrial surface defect detection, FPCB surface defect detection is still challenging due to non-salient defects and the similarities between diverse defects on FPCBs. To solve this problem, a decoupled two-stage object detection framework based on convolutional neural networks (CNNs) is proposed, in which the localization task and the classification task are decoupled through two specific modules. Specifically, to effectively locate non-salient defects, a multi-hierarchical aggregation (MHA) block is proposed as a location feature (LF) enhancement module for the defect localization task. Meanwhile, to accurately classify similar defects, a locally non-local (LNL) block is presented as a SEF enhancement module for the defect classification task. Furthermore, an FPCB surface defect detection dataset (FPCB-DET) is built with corresponding defect category and defect location annotations. Evaluated on FPCB-DET, the proposed framework achieves state-of-the-art (SOTA) accuracy of 94.15% mean average precision (mAP) compared with existing surface defect detection networks. The source code and dataset will soon be available at https://github.com/SCUTyzy/decoupled-two-stage-framework .
Article
Few-shot learning, which aims to learn novel concepts from one or a few labeled examples, is an interesting and very challenging problem with many practical advantages. Existing few-shot methods usually train the feature embedding module and the few-shot module in a row on data from the same classes, which prevents them from learning to adapt to new tasks. Besides, traditional few-shot models fail to take advantage of the valuable relations within support-query pairs, leading to performance degradation. In this article, we propose a transductive relation-propagation graph neural network (GNN) with a decoupling training strategy (TRPN-D) to explicitly model and propagate such relations across support-query pairs, and to give the few-shot module the ability to transfer past knowledge to new tasks via decoupled training. Our few-shot module, namely TRPN, treats the relation of each support-query pair as a graph node, named a relational node, and exploits the known relations between support samples, including both intraclass commonality and interclass uniqueness. Through relation propagation, the model generates discriminative relation embeddings for support-query pairs. To the best of our knowledge, this is the first work that decouples the training of the embedding network and the few-shot graph module with different tasks, which might offer a new way to solve the few-shot learning problem. Extensive experiments on several benchmark datasets demonstrate that our method significantly outperforms a variety of state-of-the-art few-shot learning methods.
Article
Few-shot learning (FSL) aims to classify novel images based on a few labeled samples with the help of meta-knowledge. Most previous works address this problem based on the hypothesis that the training set and testing set are from the same domain, which is not realistic for some real-world applications. Thus, we extend FSL to domain-agnostic few-shot recognition, where the domain of the testing task is unknown. In domain-agnostic few-shot recognition, the model is optimized on data from one domain and evaluated on tasks from different domains. Previous methods for FSL mostly focus on learning general features or adapting to few-shot tasks effectively. They suffer from inappropriate features or complex adaptation in domain-agnostic few-shot recognition. In this brief, we propose meta-prototypical learning to address this problem. In particular, a meta-encoder is optimized to learn the general features. Different from the traditional prototypical learning, the meta encoder can effectively adapt to few-shot tasks from different domains by the traces of the few labeled examples. Experiments on many datasets demonstrate that meta-prototypical learning performs competitively on traditional few-shot tasks, and on few-shot tasks from different domains, meta-prototypical learning outperforms related methods.
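For reference, the traditional prototypical learning that the abstract contrasts with works as follows: each class prototype is the mean embedding of that class's few labeled support examples, and a query is assigned to the nearest prototype. The sketch below shows only this plain baseline with squared Euclidean distance, not the meta-encoder; embeddings are assumed to be precomputed vectors, and the function names are illustrative.

```python
def prototypes(support):
    """support: dict mapping class -> list of embedding vectors; returns class -> mean vector."""
    protos = {}
    for label, embs in support.items():
        d = len(embs[0])
        protos[label] = [sum(e[j] for e in embs) / len(embs) for j in range(d)]
    return protos

def predict(query, protos):
    """Assign the query embedding to the class with the nearest prototype (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sqdist(query, protos[label]))
```

Because the prototypes are computed from the support set at test time, no weights are trained for novel classes. Meta-prototypical learning, as described above, additionally adapts the encoder itself to each few-shot task.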
Article
Rail surface defect inspection based on machine vision faces the challenges of complex backgrounds with interference and severe data imbalance. To meet these challenges, we treat defect detection as a key-point estimation problem and present an attention neural network for rail surface defect detection via CASIoU-guided center-point estimation (CCEANN). CCEANN contains two crucial components. The first is the stacked attention Hourglass backbone via cross-stage fusion of multi-scale features (CSFA-Hourglass), which introduces a convolutional block attention module with variable receptive fields (VRF-CBAM) and relies on a two-stage Hourglass structure that balances network depth and feature fusion. The second is the CASIoU-guided center-point estimation head module (CASIoU-CEHM), which integrates a delicate coordinate compensation mechanism to regress detection boxes flexibly and adapt to the large scale variation of defects. Its CASIoU loss regresses the consistency of Intersection-over-Union (IoU), central-point distance, area ratio, and scale ratio between the target defect and the predicted defect, and achieves higher regression accuracy than state-of-the-art IoU-based losses. The experiments demonstrate that CCEANN outperforms competitive deep learning-based methods on four surface defect datasets.
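The abstract does not give the CASIoU formula, so the sketch below does not implement it. As a hedged point of comparison, it shows the widely used Distance-IoU (DIoU) loss, which combines two of the four quantities CASIoU is said to regress: IoU and the central-point distance, normalized by the diagonal of the smallest enclosing box. The area-ratio and scale-ratio terms are omitted. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def diou_loss(pred, target):
    """DIoU loss: 1 - IoU + (squared centre distance) / (squared enclosing-box diagonal)."""
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    centre_dist_sq = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - iou(pred, target) + centre_dist_sq / diag_sq
```

The distance term still provides a gradient when the boxes do not overlap and plain IoU is zero. This property motivates adding geometric terms such as those in CASIoU.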
Article
In modern manufacturing, vision-based defect recognition is an essential technology for guaranteeing product quality, and it plays an important role in industrial intelligence. With the development of industrial big data, defect images can be captured by ubiquitous sensors, and how to achieve accurate recognition has become a research hotspot. In the past several years, many vision-based defect recognition methods have been proposed, and some newly emerged techniques, such as deep learning, have become increasingly popular and have addressed many challenging problems effectively. Hence, a comprehensive review is urgently needed to promote development and provide insights in this area. This paper surveys the recent advances in vision-based defect recognition and presents a systematic review from a feature perspective. The review divides recent methods into designed-feature-based methods and learned-feature-based methods, and summarizes their advantages, disadvantages, and application scenarios. Furthermore, this paper summarizes the performance metrics for vision-based defect recognition methods. Some challenges and development trends are also discussed.
Article
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are 1) a taxonomy and extensive overview of the state of the art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time and storage.
Article
Recently, digital twins (DTs) have become a research hotspot in smart manufacturing, and using DTs to assist defect recognition has also become a development trend. Real-time data collection is one of the advantages of DTs, and it can help realize real-time defect recognition. However, DT-driven defect recognition cannot be realized until some bottlenecks of the recognition models, such as time efficiency, have been solved. To improve time efficiency, recognizing novel defect classes is an essential problem. Most existing methods can only recognize the known defect classes that are available during training. For new incoming classes, known as novel classes, these models must be rebuilt, which is time-consuming and costly. This greatly impedes the realization of DT-driven defect recognition. To overcome this problem, this paper proposes a deep lifelong learning method for novel class recognition. The proposed method uses a two-level deep learning architecture to detect and recognize novel classes, and a lifelong learning strategy, weight imprinting, to upgrade the model. With these improvements, the proposed method can handle novel classes in a timely manner. The experimental results indicate that the proposed method achieves good results on the novel classes and causes almost no production delay. Compared with methods that rebuild the model, the time cost is reduced by at least 200 times. This result suggests that the proposed method has good potential for realizing DT-driven defect recognition.
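Weight imprinting, in its common form, adds a novel class without retraining: the new class's classifier weight is the L2-normalized average of the L2-normalized embeddings of its few examples, and classification uses cosine similarity against all class weights. The sketch below assumes that standard form; the paper's two-level architecture is not reproduced, and the class names and embeddings in the usage note are made up for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def imprint_weight(embeddings):
    """Novel-class weight: normalized mean of the normalized embeddings of its examples."""
    normed = [l2_normalize(e) for e in embeddings]
    d = len(normed[0])
    mean = [sum(e[j] for e in normed) / len(normed) for j in range(d)]
    return l2_normalize(mean)

def classify(embedding, weights):
    """Return the class whose unit-norm weight has the highest cosine similarity."""
    x = l2_normalize(embedding)
    scores = {label: sum(a * b for a, b in zip(x, w)) for label, w in weights.items()}
    return max(scores, key=scores.get)
```

Usage: with hypothetical existing weights `{"scratch": [1, 0], "crack": [0, 1]}`, setting `weights["pit"] = imprint_weight([[1, 1], [2, 2.2]])` lets `classify([3, 3.1], weights)` return `"pit"` immediately, since only a vector is appended and no gradient step is taken.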
Article
When a detection model that has been well trained on a set of classes faces new classes, incremental learning is necessary to adapt the model to detect the new classes. In most scenarios, the learned knowledge of the old classes must be preserved during incremental learning without reusing the training data of the old classes. Because objects in remote sensing images often appear at various sizes, in arbitrary orientations, and in dense distributions, incremental object detection is even more difficult. In this article, a new architecture for incremental object detection is proposed based on a feature pyramid and knowledge distillation. Specifically, by means of a feature pyramid network (FPN), objects of various scales are detected at different layers of the feature pyramid. Motivated by Learning without Forgetting (LwF), a new branch is added to the last layer of the FPN, and knowledge distillation is applied to the outputs of the old branch to maintain the detection capability for the old classes. Multitask learning is adopted to jointly optimize the losses of the two branches. Experiments on two widely used remote sensing data sets show promising performance compared with state-of-the-art incremental object detection methods.
Article
Vision-based defect classification is an important technology for controlling product quality in manufacturing systems. Because it is very hard to obtain enough labeled samples for model training in real-world production, semi-supervised learning, which learns from both labeled and unlabeled samples, is more suitable for this task. However, the intra-class variation and inter-class similarity of surface defects, referred to as poor class separation, may cause semi-supervised methods to perform poorly with few labeled samples. Graph-based methods, such as the graph convolutional network (GCN), can handle this problem well. Therefore, this paper proposes a new graph-based semi-supervised method, named the multiple micrographs graph convolutional network (MMGCN), for surface defect classification. First, MMGCN performs graph convolution by constructing multiple micrographs instead of one large graph, and labels unlabeled samples by propagating label information from labeled samples to unlabeled samples within the micrographs, obtaining multiple labels per sample. The final label is obtained by weighting these labels, which overcomes the computational complexity and practicality limitations of the original GCN. Second, MMGCN divides the unlabeled dataset into multiple batches and sets an accuracy threshold; when the model accuracy reaches the threshold, the unlabeled data are labeled batch by batch. A well-known case has been used to evaluate the performance of the proposed method. The experimental results demonstrate that MMGCN achieves better computational complexity and practicality than GCN. In terms of accuracy, MMGCN also obtains the best performance and the best class separation compared with other semi-supervised surface defect classification methods.
Chapter
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances, and is useful when manual annotation is time-consuming or data acquisition is limited. Unlike previous attempts that exploit few-shot classification techniques to facilitate FSOD, this work highlights the necessity of handling the problem of scale variations, which is challenging due to the unique sample distribution. To this end, we propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD. It generates multi-scale positive samples as object pyramids and refines the prediction at various scales. We demonstrate its advantage by integrating it as an auxiliary branch into the popular Faster R-CNN with FPN architecture, delivering a strong FSOD solution. Several experiments are conducted on PASCAL VOC and MS COCO, and the proposed approach achieves state-of-the-art results and significantly outperforms its counterparts, which shows its effectiveness. Code is available at https://github.com/jiaxi-wu/MPSR.
Article
Few-shot learning aims to learn a well-performing model from a few labeled examples. Recently, several works have proposed learning a predictor that directly generates model parameter weights using the episodic training strategy of meta-learning, achieving fairly promising performance. However, the predictor in these works is task-agnostic, which means that it cannot adjust to novel tasks in the testing phase. In this article, we propose a novel meta-learning method that learns how to learn a task-adaptive classifier-predictor to generate classifier weights for few-shot classification. Specifically, a meta classifier-predictor module (MPM) is introduced to learn how to adaptively update a task-agnostic classifier-predictor to a task-specialized one on a novel task with a newly proposed center-uniqueness loss function. Compared with previous works, our task-adaptive classifier-predictor can better capture the characteristics of each category in a novel task and thus generate a more accurate and effective classifier. Our method is evaluated on two commonly used benchmarks for few-shot classification, i.e., miniImageNet and tieredImageNet. An ablation study verifies the necessity of learning a task-adaptive classifier-predictor and the effectiveness of the newly proposed center-uniqueness loss. Moreover, our method achieves state-of-the-art performance on both benchmarks, demonstrating its superiority.
Article
In vision-based defect recognition, deep learning (DL) is a research hotspot. However, DL is sensitive to image quality, and it is hard to collect enough high-quality defect images. Low-quality images usually lose some useful information and may mislead DL methods into poor results. To overcome this problem, this paper proposes a generative adversarial network (GAN)-based DL method for low-quality defect image recognition. A GAN is used to reconstruct the low-quality defect images, and a VGG16 network is built to recognize the reconstructed images. The experimental results on low-quality defect images show that the proposed method performs very well, achieving accuracies of 95.53% to 99.62% under different masks and noises, a large improvement over the other methods. Furthermore, the results on PSNR, SSIM, cosine similarity and mutual information indicate that the quality of the reconstructed images is greatly improved, which is very helpful for defect analysis.
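Of the reconstruction-quality metrics listed, PSNR has the simplest closed form: $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest possible pixel value and MSE is the mean squared error between the original and reconstructed images. A minimal sketch, assuming 8-bit grayscale images given as nested lists (so `max_value=255`):

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    a = [p for row in original for p in row]
    b = [p for row in reconstructed for p in row]
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the reconstruction is closer to the reference; halving the MSE raises PSNR by about 3 dB.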
Article
Surface defect detection is a critical task in industrial production. Many detection methods based on computer vision have been successfully applied in industry with good results. However, fully automating surface defect detection remains a challenge because of the complexity of surface defects: defects of the same class can differ greatly in appearance, while defects of different classes can contain similar parts. To address these issues, this paper proposes a pyramid feature fusion and global context attention network for pixel-wise surface defect detection, called PGA-Net. In the framework, multi-scale features are first extracted from the backbone network. Then, the pyramid feature fusion module fuses these features into five resolutions through efficient dense skip connections. Finally, the global context attention module is applied to the fused feature maps of adjacent resolutions, which allows information to propagate effectively from low-resolution fused feature maps to high-resolution ones. In addition, a boundary refinement block is added to the framework to refine defect boundaries and improve the prediction. The final prediction is the fusion of the fused feature maps at the five resolutions. Evaluation on four real-world defect datasets demonstrates that the proposed method outperforms state-of-the-art methods in mean Intersection over Union and mean Pixel Accuracy (NEU-Seg: 82.15%, DAGM 2007: 74.78%, MT_defect: 71.31%, Road_defect: 79.54%).
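Both reported metrics can be computed from a pixel-level confusion matrix: per class, IoU is TP / (TP + FP + FN) and pixel accuracy is TP / (TP + FN), each averaged over classes. The sketch below assumes flattened integer label lists and one common convention (classes absent from both ground truth and prediction are skipped). The abstract does not state PGA-Net's exact averaging convention.

```python
def confusion_matrix(gt, pred, num_classes):
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm

def miou_mpa(cm):
    """Return (mean IoU, mean per-class pixel accuracy) from a confusion matrix."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            accs.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(accs) / len(accs)
```

For ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and accuracy 1/2, class 1 has IoU 2/3 and accuracy 1, giving mPA = 0.75.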
Chapter
We propose CornerNet, a new approach to object detection in which we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.1% AP on MS COCO, outperforming all existing one-stage detectors.
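Top-left corner pooling addresses the fact that a top-left corner usually lies outside the object, so there is little local evidence at that location. At each location, it takes the maximum of one feature map over all positions below it, adds the maximum of a second feature map over all positions to its right, and outputs the sum. A minimal single-channel sketch with nested lists (the function name is illustrative):

```python
def top_left_corner_pool(ft, fl):
    """Top-left corner pooling on two H x W maps: max looking down in ft plus max looking right in fl."""
    h, w = len(ft), len(ft[0])
    down = [[0.0] * w for _ in range(h)]
    right = [[0.0] * w for _ in range(h)]
    for j in range(w):
        running = float("-inf")
        for i in range(h - 1, -1, -1):  # scan bottom-up so down[i][j] = max(ft[i..h-1][j])
            running = max(running, ft[i][j])
            down[i][j] = running
    for i in range(h):
        running = float("-inf")
        for j in range(w - 1, -1, -1):  # scan right-to-left so right[i][j] = max(fl[i][j..w-1])
            running = max(running, fl[i][j])
            right[i][j] = running
    return [[down[i][j] + right[i][j] for j in range(w)] for i in range(h)]
```

The running-maximum scans make each pass linear in the map size; bottom-right corner pooling is the mirror image, looking up and to the left.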
Article
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
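For a binary label $y$ and predicted foreground probability $p$, the focal loss is $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t = p$ if $y = 1$ and $1 - p$ otherwise, and $\alpha_t$ is defined analogously from $\alpha$. A minimal per-example sketch using the paper's default $\alpha = 0.25$ and $\gamma = 2$. The `eps` clamp guarding against $\log(0)$ is an implementation assumption.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one example; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0 - eps)  # keep log() finite
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified (high p_t) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` it reduces to alpha-weighted cross entropy. With `gamma=2`, an easy positive at `p=0.9` has its loss scaled by 0.01, which is how the vast number of easy negatives is kept from dominating training.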
Conference Paper
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
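The core of a HOG descriptor is a magnitude-weighted orientation histogram per cell, followed by contrast normalization over blocks of cells. The sketch below follows the settings the Dalal and Triggs paper found effective (centered [-1, 0, 1] gradient mask, 9 unsigned orientation bins over 0 to 180 degrees, 8 × 8 pixel cells), but simplifies two steps: each pixel votes into a single bin instead of interpolating between bins, and plain L2 normalization stands in for L2-Hys. Function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def cell_histogram(img: Image, top: int, left: int, cell: int = 8, bins: int = 9) -> List[float]:
    """Orientation histogram for one cell, weighted by gradient magnitude.

    Gradients use the centered [-1, 0, 1] mask; orientations are unsigned
    (0 to 180 degrees). Each pixel votes into one bin (hard binning).
    Border pixels lacking a neighbour on either side are skipped.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(top, top + cell):
        for x in range(left, left + cell):
            if not (0 < x < w - 1 and 0 < y < h - 1):
                continue
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against a float angle that rounds to 180.0.
            hist[int(angle // bin_width) % bins] += mag
    return hist


def l2_normalize(block: List[float], eps: float = 1e-6) -> List[float]:
    """L2 normalization of a concatenated block of cell histograms."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]


# A 10 x 10 horizontal intensity ramp: every interior gradient points along +x.
ramp = [[float(x) for x in range(10)] for _ in range(10)]
hist = cell_histogram(ramp, top=1, left=1)
```

On the ramp, all 64 pixels of the cell have gx = 2 and gy = 0, so the whole magnitude (128) lands in the 0 degree bin. A full descriptor would concatenate 2 × 2 neighbouring cell histograms into overlapping blocks and normalize each block.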
Article
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
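The RPN predicts bounds and objectness relative to a fixed set of reference boxes (anchors) at every feature-map position; in the Faster R-CNN paper these are k = 9 anchors per position (3 scales × 3 aspect ratios), with the classification layer producing 2k scores and the regression layer 4k coordinates. The sketch below only generates the anchors. The scales (128, 256, 512), ratios (1:2, 1:1, 2:1) and stride 16 match the VGG-16 setup described in the paper, while the exact centring convention and the function name are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales: Sequence[float] = (128, 256, 512),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Return len(scales) * len(ratios) anchors for every feature-map position.

    An anchor with scale s and ratio r = height / width has area s * s.
    Each anchor is centred on the image pixel its feature cell maps to
    (cell centre times stride, one possible convention).
    """
    anchors: List[Box] = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(2, 3)
```

A 2 × 3 feature map therefore yields 54 anchors; each one is later labelled positive or negative by its overlap with ground-truth boxes, and the RPN learns offsets from it.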
Article
In this paper, a novel fabric defect detection scheme based on HOG and SVM is proposed. First, each block-based feature of the image is encoded using histograms of oriented gradients (HOG), which are insensitive to varying illumination and noise. Then, a powerful feature selection algorithm, AdaBoost, is applied to automatically select a small set of discriminative HOG features in order to achieve robust detection results. Finally, a support vector machine (SVM) is used to classify the fabric defects. Experimental results demonstrate the efficiency of the proposed algorithm. Index Terms: Fabric defect; HOG; AdaBoost; SVM.
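One common way AdaBoost acts as a feature selector is to use single-feature decision stumps as weak learners: each boosting round picks the feature (and threshold) with the lowest weighted error, so the features chosen over the rounds form the selected subset. The sketch below is plain discrete AdaBoost under that assumption; the abstract does not specify the variant, the weak learner, or the number of rounds, and all names are hypothetical. The SVM stage that would consume the selected features is not shown.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, alpha)


def stump_predict(x: Sequence[float], feat: int, thr: float, pol: int) -> int:
    """Weak learner: vote `pol` if the feature reaches the threshold, else `-pol`."""
    return pol if x[feat] >= thr else -pol


def adaboost_select(X: List[List[float]], y: List[int], rounds: int) -> List[Stump]:
    """Discrete AdaBoost with decision stumps; each round picks one feature.

    X: samples as feature vectors (e.g. HOG block features); y: labels in {-1, +1}.
    Returns the chosen stumps; their feature indices form the selected subset.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    chosen: List[Stump] = []
    for _ in range(rounds):
        best = None  # (weighted error, feature, threshold, polarity)
        for f in range(d):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(xi, f, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((f, thr, pol, alpha))
        # Raise the weights of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thr, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen


def strong_predict(x: Sequence[float], stumps: List[Stump]) -> int:
    """Weighted vote of the selected stumps."""
    score = sum(a * stump_predict(x, f, t, p) for f, t, p, a in stumps)
    return 1 if score >= 0 else -1


# Feature 0 separates the classes; feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
stumps = adaboost_select(X, y, rounds=1)
selected = sorted({f for f, _, _, _ in stumps})
```

With one round the discriminative feature 0 is selected, and a reduced vector containing only the selected features could then be passed to a classifier such as an SVM.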
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
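The matching step can be sketched with the distance-ratio test from the same paper: a match is kept only when the nearest database descriptor is clearly closer than the second nearest (ratio below 0.8), which discards ambiguous matches. The paper uses an approximate nearest-neighbour search (Best-Bin-First); the sketch below swaps in brute-force Euclidean search for clarity, uses toy 2-D vectors instead of 128-D SIFT descriptors, and the function name is hypothetical. The Hough clustering and pose verification steps are not shown.

```python
import math
from typing import List, Sequence, Tuple


def match_descriptors(query: List[Sequence[float]], database: List[Sequence[float]],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Brute-force nearest-neighbour matching with a distance-ratio test.

    A query descriptor is matched to its nearest database descriptor only if
    that distance is below `ratio` times the distance to the second nearest.
    Returns (query index, database index) pairs.
    """
    matches: List[Tuple[int, int]] = []
    if len(database) < 2:
        return matches  # the ratio test needs a second neighbour
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


database = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
# The second query is equidistant from two database entries, so it is ambiguous.
matches = match_descriptors([[1.0, 0.0], [5.0, 0.0]], database)
```

The first query is kept (distance 1 versus 9), while the second is rejected because its two nearest neighbours are equally far away.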
Frustratingly Simple Few-Shot Object Detection
X. Wang, T. Huang, J. Gonzalez, T. Darrell, and F. Yu, "Frustratingly Simple Few-Shot Object Detection," in Proc. Int. Conf. Mach. Learn., Nov. 2020, pp. 9919-9928.
YOLOv3: An incremental improvement
J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
MMDetection: Open MMLab detection toolbox and benchmark
K. Chen et al., "MMDetection: Open MMLab detection toolbox and benchmark," arXiv preprint arXiv:1906.07155, 2019.