Article

SAFER-STUDENT for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data

Authors:
  • South China Botanical Garden, Chinese Academy of Sciences

Abstract

Deep semi-supervised learning (SSL) methods aim to utilize abundant unlabeled data to improve seen-class classification. However, in the open-world scenario, collected unlabeled data tend to contain unseen-class data, which degrade generalization on seen-class classification. Formally, we define this problem as safe deep semi-supervised learning with unseen-class unlabeled data. One intuitive solution is to remove these unseen-class instances after detecting them during the SSL process. Nevertheless, the performance of unseen-class identification is limited by the lack of a suitable score function, the uncalibrated model, and the small amount of labeled data. To this end, we propose a safe SSL method called SAFER-STUDENT from the teacher-student view. Firstly, to enhance the ability of the teacher model to identify seen and unseen classes, we propose a general scoring framework called Discrepancy with Raw (DR). Secondly, based on the unseen-class data mined by the teacher model from the unlabeled data, we calibrate the student model with the newly proposed Unseen-class Energy-bounded Calibration (UEC) loss. Thirdly, based on the seen-class data mined by the teacher model from the unlabeled data, we propose the Weighted Confirmation Bias Elimination (WCBE) loss to boost the seen-class classification of the student model. Extensive studies show that SAFER-STUDENT remarkably outperforms the state-of-the-art, verifying the effectiveness of our method on this under-explored problem.
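The full text is not available here, so the exact forms of the DR score and the UEC loss cannot be reproduced. As a rough illustration of the energy-bounded calibration idea the abstract alludes to, the sketch below (assuming a PyTorch classifier over the seen classes, with illustrative margins m_in and m_out) penalizes seen-class samples whose free energy is too high and mined unseen-class samples whose free energy is too low; it is not the paper's UEC formulation.

```python
import torch
import torch.nn.functional as F

def free_energy(logits, T=1.0):
    # Free energy of a classifier: E(x) = -T * logsumexp(logits / T).
    # Lower energy ~ more "seen-like", higher energy ~ more "unseen-like".
    return -T * torch.logsumexp(logits / T, dim=1)

def energy_bounded_calibration(logits_seen, logits_unseen, m_in=-25.0, m_out=-7.0):
    # Hinge-style bounds (margin values are illustrative): penalize seen-class
    # energies above m_in and mined unseen-class energies below m_out.
    e_seen = free_energy(logits_seen)
    e_unseen = free_energy(logits_unseen)
    loss_seen = F.relu(e_seen - m_in).pow(2).mean()
    loss_unseen = F.relu(m_out - e_unseen).pow(2).mean()
    return loss_seen + loss_unseen

# toy usage with random logits for a 6-class seen problem
logits_seen = torch.randn(8, 6) * 5
logits_unseen = torch.randn(8, 6)
print(energy_bounded_calibration(logits_seen, logits_unseen))
```

A student calibrated this way would tend to rank unseen-class inputs above seen-class ones by their free energy.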

... Further complicating this scenario is the presence of unseen-class samples within the training data, which can significantly disrupt the learning process, potentially leading to unstable outcomes or severe performance degradation [5, 9, 28]. Several similar definitions have emerged to describe this scenario, including safe SSL [9], open-set SSL [22, 24, 31, 45], and the challenge of managing UnLabeled data from Unseen Classes in Semi-Supervised Learning (ULUC-SSL) [14]. In this paper, we prefer to refer to it as the safe classification problem of safe SSL [9], to emphasize that our core goal is to ensure that the model's performance is not compromised, or even degraded, by the presence of unseen-class samples during the training process. ...
... More precisely, as shown in Figure 1, safe SSL is concerned with semi-supervised learning scenarios with class distribution mismatch, where the unlabeled data includes samples from both seen and unseen classes [13, 14]. Specifically, seen classes refer to categories that are known and labeled during training, whereas unseen classes denote those not labeled or explicitly identified during this phase. ...
... First, mainstream safe SSL methods adopt the optimization strategy of training a single model, typically using the classifier for seen classes to detect unseen classes [13, 14, 34]. However, during the utilization of unlabeled data, unseen-class samples may be mislabeled as seen ones and vice versa, causing confusion in distinguishing between the two. ...
Preprint
Semi-supervised learning can significantly boost model performance by leveraging unlabeled data, particularly when labeled data is scarce. However, real-world unlabeled data often contain unseen-class samples, which can hinder the classification of seen classes. To address this issue, mainstream safe SSL methods suggest detecting and discarding unseen-class samples from unlabeled data. Nevertheless, these methods typically employ a single-model strategy to simultaneously tackle both the classification of seen classes and the detection of unseen classes. Our research indicates that such an approach may lead to conflicts during training, resulting in suboptimal model optimization. Inspired by this, we introduce a novel framework named Diverse Teacher-Students (DTS), which uniquely utilizes dual teacher-student models to individually and effectively handle these two tasks. DTS employs a novel uncertainty score to softly separate unseen-class and seen-class data from the unlabeled set, and intelligently creates an additional (K+1)-th class supervisory signal for training. By training both teacher-student models with all unlabeled samples, DTS can enhance the classification of seen classes while simultaneously improving the detection of unseen classes. Comprehensive experiments demonstrate that DTS surpasses baseline methods across a variety of datasets and configurations. Our code and models are publicly available at https://github.com/Zhanlo/DTS.
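As a loose sketch of the soft separation plus (K+1)-th class idea described above (not the actual DTS uncertainty score; the normalized-entropy measure and the weighting scheme below are placeholder choices), one could construct soft (K+1)-class targets for unlabeled samples like this:

```python
import math
import torch

def soft_split_targets(logits, num_seen_classes):
    # Placeholder uncertainty: predictive entropy normalized to [0, 1].
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    u = entropy / math.log(num_seen_classes)

    # Soft (K+1)-class targets: weight (1 - u) on the predicted seen class,
    # weight u on the extra "unseen" class with index num_seen_classes.
    targets = torch.zeros(logits.size(0), num_seen_classes + 1)
    targets[torch.arange(logits.size(0)), probs.argmax(dim=1)] = 1.0 - u
    targets[:, num_seen_classes] = u
    return targets

# toy usage: 4 unlabeled samples, 6 seen classes
logits = torch.randn(4, 6)
print(soft_split_targets(logits, num_seen_classes=6))
```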
Preprint
Full-text available
Large Language Models (LLMs) have revolutionized various fields, and their applications in biomedicine and healthcare have shown transformative potential. These models, trained on vast text corpora, have shown remarkable proficiency in generating, understanding, and analyzing human language. In the biomedical and healthcare sectors, where vast amounts of unstructured data are generated daily, LLMs are driving transformative change. Despite their potential, integrating LLMs into healthcare and biomedicine presents significant challenges, including data privacy, model bias, and the complexity of incorporating LLMs into existing clinical workflows. Ethical concerns such as patient confidentiality, algorithmic bias, and transparency in LLM-driven decisions are also critical issues that must be addressed. This review explores the current state of LLMs in biomedicine and healthcare, examining their practical applications, benefits, limitations, and ethical challenges. We also discuss the technical hurdles of implementing these models and highlight future research directions, aiming to unlock their full potential to advance both biomedical science and patient care.
Article
Full-text available
The presence of noisy examples in the training set inevitably hampers the performance of out-of-distribution (OOD) detection. In this paper, we investigate a previously overlooked problem called OOD detection under asymmetric open-set noise, which is frequently encountered and significantly reduces the identifiability of OOD examples. We analyze the generating process of asymmetric open-set noise and observe the influential role of the confounding variable, entangling many open-set noisy examples with partial in-distribution (ID) examples referred to as hard-ID examples due to spurious-related characteristics. To address the issue of the confounding variable, we propose a novel method called Adversarial Confounder REmoving (ACRE) that utilizes progressive optimization with adversarial learning to curate three collections of potential examples (easy-ID, hard-ID, and open-set noisy) while simultaneously developing invariant representations and reducing spurious-related representations. Specifically, by obtaining easy-ID examples with minimal confounding effect, we learn invariant representations from ID examples that aid in identifying hard-ID and open-set noisy examples based on their similarity to the easy-ID set. By triplet adversarial learning, we achieve the joint minimization and maximization of distribution discrepancies across the three collections, enabling the dual elimination of the confounding variable. We also leverage potential open-set noisy examples to optimize a K+1-class classifier, further removing the confounding variable and inducing a tailored K+1-Guided scoring function. Theoretical analysis establishes the feasibility of ACRE, and extensive experiments demonstrate its effectiveness and generalization. Code is available at https://github.com/Anonymous-re-ssl/ACRE0.
Article
Semi-supervised learning (SSL) aims to reduce the heavy reliance of current deep models on costly manual annotation by leveraging a large amount of unlabeled data in combination with a much smaller set of labeled data. However, most existing SSL methods assume that all labeled and unlabeled data are drawn from the same feature distribution, which can be impractical in real-world applications. In this study, we take the initial step to systematically investigate the open-domain semi-supervised learning setting, where a feature distribution mismatch exists between labeled and unlabeled data. In pursuit of an effective solution for open-domain SSL, we propose a novel framework called GlocalMatch, which aims to exploit both the global and local (i.e., glocal) cluster structure of open-domain unlabeled data. The glocal cluster structure is utilized in two complementary ways. Firstly, GlocalMatch optimizes a Glocal Cluster Compacting (GCC) objective, which encourages feature representations of the same class, whether within the same domain or across different domains, to become closer to each other. Secondly, GlocalMatch incorporates a Glocal Semantic Aggregation (GSA) strategy to produce more reliable pseudo-labels by aggregating predictions from neighboring clusters. Extensive experiments demonstrate that GlocalMatch outperforms state-of-the-art SSL methods significantly, achieving superior performance for both in-domain and out-of-domain generalization. The code is released at https://github.com/nukezil/GlocalMatch.
Article
Full-text available
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen during training time and cannot make a safe decision. The term, OOD detection, first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics develop in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Despite comprehensive surveys of related fields, the summarization of OOD detection methods remains incomplete and requires further advancement. This paper specifically addresses the gap in recent technical developments in the field of OOD detection. It also provides a comprehensive discussion of representative methods from other sub-tasks and how they relate to and inspire the development of OOD detection methods. The survey concludes by identifying open challenges and potential research directions.
Article
Full-text available
In this article, we model a set of pixel-wise object segmentation tasks, namely automatic video segmentation (AVS), image co-segmentation (ICS), and few-shot semantic segmentation (FSS), in a unified view of segmenting objects from relational visual data. To this end, we propose an attentive graph neural network (AGNN) that addresses these tasks in a holistic fashion, by formulating them as a process of iterative information fusion over data graphs. It builds a fully-connected graph to efficiently represent visual data as nodes and relations between data instances as edges. The underlying relations are described by a differentiable attention mechanism, which thoroughly examines fine-grained semantic similarities between all the possible location pairs in two data instances. Through parametric message passing, AGNN is able to capture knowledge from the relational visual data, enabling more accurate object discovery and segmentation. Experiments show that AGNN can automatically highlight primary foreground objects from video sequences (i.e., automatic video segmentation), and extract common objects from noisy collections of semantically related images (i.e., image co-segmentation). AGNN can even generalize to segmenting new categories with little annotated data (i.e., few-shot semantic segmentation). Taken together, our results demonstrate that AGNN provides a powerful tool that is applicable to a wide range of pixel-wise object pattern understanding tasks with relational visual data. Our algorithm implementations have been made publicly available at https://github.com/carrierlxk/AGNN.
Article
Full-text available
As a developing trend of urbanization, massive amounts of urban statistical data with multiple views (e.g., views of Population and Economy) are increasingly collected and benefit diverse domains, including transportation services, regional analysis, etc. Unfortunately, these statistical data, which are divided into fine-grained regions, usually suffer from the missing value problem during acquisition and storage. This is mainly caused by inevitable circumstances such as document defacement, statistical difficulty in remote districts, and inaccurate information cleaning. These missing entries, which make valuable information invisible, may distort further urban analysis. To improve the quality of missing data imputation, we propose an improved spatial multi-kernel learning method to guide the imputation process, incorporating an adaptive-weight non-negative matrix factorization strategy. Our model takes into account the regional latent similarities and the real geographical positions, as well as the correlations among various views, enabling missing values to be completed precisely. We conduct intensive experiments to evaluate our method and compare it with other state-of-the-art approaches on real-world datasets. All the empirical results show that the proposed model outperforms the other state-of-the-art methods. Additionally, our model demonstrates strong generalization ability across multiple cities.
Conference Paper
Full-text available
Likelihood-based generative models are a promising resource to detect out-of-distribution (OOD) inputs which could compromise the robustness or reliability of a machine learning system. However, likelihoods derived from such models have been shown to be problematic for detecting certain types of inputs that significantly differ from training data. In this paper, we posit that this problem is due to the excessive influence that input complexity has on generative models' likelihoods. We report a set of experiments supporting this hypothesis, and use an estimate of input complexity to derive an efficient and parameter-free OOD score, which can be seen as a likelihood ratio, akin to Bayesian model comparison. We find this score performs comparably to, or even better than, existing OOD detection approaches under a wide range of datasets, models, model sizes, and complexity estimates.
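A minimal sketch of such a complexity-adjusted score, assuming zlib compression as the complexity estimate and a stand-in negative log-likelihood (in bits) that a real pipeline would obtain from a likelihood-based generative model such as Glow or PixelCNN (the model itself is omitted here):

```python
import zlib
import numpy as np

def complexity_bits(image_uint8):
    # Compression-based complexity estimate L(x): compressed size in bits.
    raw = image_uint8.astype(np.uint8).tobytes()
    return 8 * len(zlib.compress(raw, 9))

def complexity_adjusted_ood_score(nll_bits, image_uint8):
    # Roughly S(x) = -log p_model(x) - L(x): a higher score suggests the
    # input is out-of-distribution once its raw complexity is discounted.
    return nll_bits - complexity_bits(image_uint8)

# toy usage on a random 32x32x3 image with a made-up NLL value
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(complexity_adjusted_ood_score(nll_bits=9000.0, image_uint8=img))
```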
Conference Paper
Full-text available
Large neural networks are difficult to deploy on mobile devices because of intensive computation and storage. To alleviate this, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. In previous ternarized neural networks, a hard threshold Δ is introduced to determine quantization intervals. Although the selection of Δ greatly affects the training results, previous works estimate Δ via an approximation or treat it as a hyper-parameter, which is suboptimal. In this paper, we present Soft Threshold Ternary Networks (STTN), which enable the model to automatically determine quantization intervals instead of depending on a hard threshold. Concretely, we replace the original ternary kernel with the addition of two binary kernels at training time, where ternary values are determined by the combination of the two corresponding binary values. At inference time, we add up the two binary kernels to obtain a single ternary kernel. Our method dramatically outperforms the current state of the art, lowering the performance gap between full-precision networks and extremely low-bit networks. Experiments on ImageNet with AlexNet (Top-1 55.6%) and ResNet-18 (Top-1 66.2%) achieve new state-of-the-art results.
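A toy illustration of the inference-time step only, assuming two already-trained binary kernels in {-1, +1} (the actual training procedure with straight-through gradients and scaling factors is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two binary kernels in {-1, +1}, standing in for kernels learned at training
# time; real STTN learns them jointly rather than sampling them.
b1 = np.where(rng.standard_normal((3, 3)) >= 0, 1.0, -1.0)
b2 = np.where(rng.standard_normal((3, 3)) >= 0, 1.0, -1.0)

# Adding the two binary kernels yields {-2, 0, +2}: agreement gives +/-2 and
# disagreement gives 0, i.e. a ternary kernel up to a scale factor of 2.
ternary = (b1 + b2) / 2
print(ternary)
```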
Article
Full-text available
The real-world deployment of Deep Neural Networks (DNNs) in safety-critical applications such as autonomous vehicles needs to address a variety of DNN vulnerabilities, one of which is detecting and rejecting out-of-distribution outliers that might result in unpredictable fatal errors. We propose a new technique relying on self-supervision for generalizable out-of-distribution (OOD) feature learning and rejecting those samples at inference time. Our technique does not need prior knowledge of the distribution of targeted OOD samples and incurs no extra overhead compared to other methods. We perform multiple image classification experiments and observe our technique to perform favorably against state-of-the-art OOD detection methods. Interestingly, we find that our method also reduces in-distribution classification risk by rejecting samples near the boundaries of the training set distribution.
Article
Full-text available
As a key mission of modern traffic management, crowd flow prediction (CFP) benefits many tasks of intelligent transportation services. However, most existing techniques focus solely on forecasting the entrance and exit flows of metro stations, which do not provide enough useful knowledge for traffic management. In practical applications, managers want to obtain the potential passenger distributions, termed crowd flow distribution (CFD) forecasts, to help authorities improve transport services. Therefore, to improve the quality of transportation services, we propose three spatiotemporal models to effectively address the network-wide CFD prediction problem based on the online latent space (OLS) strategy. Our models take into account the various trending patterns and climate influences, as well as the inherent similarities among different stations, enabling both CFD and entrance and exit flows to be predicted precisely. In our online systems, a sequence of CFD snapshots is used as the training data. The latent attribute evolutions of different metro stations can be learned from the previous trend, and the next prediction is made based on the transition patterns. All the empirical results demonstrate that the three developed models outperform the other state-of-the-art approaches on three large-scale real-world datasets.
Conference Paper
Full-text available
The problem of extracting meaningful data through graph analysis spans a range of different fields, such as the internet, social networks, biological networks, and many others. The importance of being able to effectively mine and learn from such data continues to grow as more and more structured data become available. In this paper, we present a simple and scalable semi-supervised learning method for graph-structured data in which only a very small portion of the training data are labeled. To sufficiently embed the graph knowledge, our method performs graph convolution from different views of the raw data. In particular, a dual graph convolutional neural network method is devised to jointly consider the two essential assumptions of semi-supervised learning: (1) local consistency and (2) global consistency. Accordingly, two convolutional neural networks are devised to embed the local-consistency-based and global-consistency-based knowledge, respectively. Given the different data transformations from the two networks, we then introduce an unsupervised temporal loss function for the ensemble. In experiments using both unsupervised and supervised loss functions, our method outperforms state-of-the-art techniques on different datasets.
Article
Full-text available
Semi-supervised learning is attracting increasing attention due to the fact that datasets of many domains lack enough labeled data. The Variational Auto-Encoder (VAE), in particular, has demonstrated the benefits of semi-supervised learning. The majority of existing semi-supervised VAEs utilize a classifier to exploit label information, where the parameters of the classifier are introduced into the VAE. Given the limited labeled data, learning the parameters of the classifier may not be an optimal solution for exploiting label information. Therefore, in this paper, we develop a novel approach for a semi-supervised VAE without a classifier. Specifically, we propose a new model called Semi-supervised Disentangled VAE (SDVAE), which encodes the input data into a disentangled representation and a non-interpretable representation; the category information is then directly utilized to regularize the disentangled representation via an equality constraint. To further enhance the feature learning ability of the proposed VAE, we incorporate reinforcement learning to relieve the lack of data. The dynamic framework is capable of dealing with both image and text data with its corresponding encoder and decoder networks. Extensive experiments on image and text datasets demonstrate the effectiveness of the proposed framework.
Article
Full-text available
Semi-supervised learning methods using Generative Adversarial Networks (GANs) have shown promising empirical success recently. Most of these methods use a shared discriminator/classifier which discriminates real examples from fake ones while also predicting the class label. Motivated by the ability of the GAN generator to capture the data manifold well, we propose to estimate the tangent space to the data manifold using GANs and employ it to inject invariances into the classifier. In the process, we propose enhancements over existing methods for learning the inverse mapping (i.e., the encoder) which greatly improve the semantic similarity of the reconstructed sample to the input sample. We observe considerable empirical gains in semi-supervised learning over baselines, particularly in cases where the number of labeled examples is low. We also provide insights into how fake examples influence the semi-supervised learning procedure.
Chapter
The proceedings of the 2001 Neural Information Processing Systems (NIPS) Conference. The annual conference on Neural Information Processing Systems (NIPS) is the flagship conference on neural computation. The conference is interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, vision, speech and signal processing, reinforcement learning and control, implementations, and diverse applications. Only about 30 percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. These proceedings contain all of the papers that were presented at the 2001 conference. Bradford Books imprint
Article
Cross-modal hashing (CMH) has gained much attention due to its effectiveness and efficiency in facilitating retrieval between different modalities. However, most existing methods ignore the hierarchical structural information of the data and often learn a single-layer hash function to directly transform cross-modal data into common low-dimensional hash codes in one step. This sudden drop of dimension and the huge semantic gap can cause discriminative information loss. To this end, we adopt a coarse-to-fine progressive mechanism and propose a novel Hierarchical Consensus Cross-Modal Hashing (HCCH). Specifically, to mitigate the loss of important discriminative information, we propose a coarse-to-fine hierarchical hashing scheme that utilizes a two-layer hash function to refine the beneficial discriminative information gradually. The ℓ2,1-norm is then imposed on the layer-wise hash function to alleviate the effects of redundant and corrupted features. Finally, we present consensus learning to effectively encode data into a consensus space in such a progressive way, thereby reducing the semantic gap progressively. Through extensive comparison experiments with advanced CMH methods, the effectiveness and efficiency of our HCCH method are demonstrated on four benchmark datasets. We have released the source code at https://github.com/sunyuan-cs.
Article
With the development of video networks, image set classification (ISC) has received a lot of attention and can be used for various practical applications, such as video-based recognition, action recognition, and so on. Although existing ISC methods have obtained promising performance, they often have extremely high complexity. Due to its superiority in storage space and complexity cost, learning to hash becomes a powerful solution scheme. However, existing hashing methods often ignore the complex structural information and hierarchical semantics of the original features. They usually adopt a single-layer hashing strategy to transform high-dimensional data into short-length binary codes in one step. This sudden drop of dimension could result in the loss of advantageous discriminative information. In addition, they do not take full advantage of the intrinsic semantic knowledge from whole gallery sets. To tackle these problems, in this paper, we propose a novel Hierarchical Hashing Learning (HHL) method for ISC. Specifically, a coarse-to-fine hierarchical hashing scheme is proposed that utilizes a two-layer hash function to gradually refine the beneficial discriminative information in a layer-wise fashion. Besides, to alleviate the effects of redundant and corrupted features, we impose the ℓ2,1-norm on the layer-wise hash function. Moreover, we adopt a bidirectional semantic representation with an orthogonal constraint to adequately keep the intrinsic semantic information of all samples in whole image sets. Comprehensive experiments demonstrate that HHL achieves significant improvements in accuracy and running time. We will release the demo code at https://github.com/sunyuan-cs.
Article
Deep semi-supervised learning is a fast-growing field with a range of practical applications. This paper provides a comprehensive survey on both fundamentals and recent advances in deep semi-supervised learning methods from perspectives of model design and unsupervised loss functions. We first present a taxonomy for deep semi-supervised learning that categorizes existing methods, including deep generative methods, consistency regularization methods, graph-based methods, pseudo-labeling methods, and hybrid methods. Then we provide a comprehensive review of 60 representative methods and offer a detailed comparison of these methods in terms of the type of losses, architecture differences, and test performance results. In addition to the progress in the past few years, we further discuss some shortcomings of existing methods and provide some tentative heuristic solutions for solving these open problems.
Article
In this paper we revisit the idea of pseudo-labeling in the context of semi-supervised learning, where a learning algorithm has access to a small set of labeled samples and a large set of unlabeled samples. Pseudo-labeling works by applying pseudo-labels to samples in the unlabeled set using a model trained on the combination of the labeled samples and any previously pseudo-labeled samples, and iteratively repeating this process in a self-training cycle. Current methods seem to have abandoned this approach in favor of consistency regularization methods that train models under a combination of different styles of self-supervised losses on the unlabeled samples and standard supervised losses on the labeled samples. We empirically demonstrate that pseudo-labeling can in fact be competitive with the state of the art, while being more resilient to out-of-distribution samples in the unlabeled set. We identify two key factors that allow pseudo-labeling to achieve such remarkable results: (1) applying curriculum learning principles and (2) avoiding concept drift by restarting model parameters before each self-training cycle. We obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled samples, and 68.87% top-1 accuracy on ImageNet-ILSVRC using only 10% of the labeled samples.
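A small sketch of the curriculum selection step, assuming confidence (the maximum softmax probability) as the selection criterion and a keep-fraction that grows across self-training cycles; the helper names in the comments are illustrative, not the paper's code:

```python
import numpy as np

def select_pseudo_labels(probs, keep_fraction):
    # Curriculum step: keep only the most confident fraction of unlabeled
    # predictions as pseudo-labels; the fraction grows across cycles.
    confidence = probs.max(axis=1)
    threshold = np.quantile(confidence, 1.0 - keep_fraction)
    keep = confidence >= threshold
    return np.where(keep)[0], probs.argmax(axis=1)[keep]

# toy usage: 10 unlabeled samples over 5 classes, keep the top 40%
probs = np.random.dirichlet(np.ones(5), size=10)
idx, labels = select_pseudo_labels(probs, keep_fraction=0.4)
print(idx, labels)

# Per the paper's second factor, a full self-training loop would also
# re-initialize the model before each cycle to avoid concept drift, e.g.:
# for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
#     model = build_model()            # fresh parameters (hypothetical helper)
#     train(model, labeled + pseudo)   # supervised training (hypothetical helper)
```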
Article
Deep semi-supervised learning (SSL) aims to utilize a sizeable unlabeled set to train deep networks, thereby reducing the dependence on labeled instances. However, the unlabeled set often carries unseen classes that cause the deep SSL algorithm to lose generalization. Previous works focus on the data level: they attempt to remove unseen-class data or assign them lower weights, but cannot eliminate their adverse effects on the SSL algorithm. Rather than focusing on the data level, this paper turns attention to the model parameter level. We find that only partial parameters are essential for seen-class classification, termed safe parameters. In contrast, the other parameters tend to fit irrelevant data, termed harmful parameters. Driven by this insight, we propose Safe Parameter Learning (SPL) to discover safe parameters and make the harmful parameters inactive, such that we can mitigate the adverse effects caused by unseen-class data. Specifically, we first design an effective strategy to divide all parameters in the pre-trained SSL model into safe and harmful ones. Then, we introduce a bi-level optimization strategy to update the safe parameters and kill the harmful parameters. Extensive experiments show that SPL outperforms state-of-the-art SSL methods on all the benchmarks by a large margin. Moreover, experiments demonstrate that SPL can be integrated into the most popular deep SSL networks and be easily extended to handle other cases of class distribution mismatch.
Article
Out-of-distribution (OOD) detection is important for deploying machine learning models in the real world, where test data from shifted distributions can naturally arise. While a plethora of algorithmic approaches have recently emerged for OOD detection, a critical gap remains in theoretical understanding. In this work, we develop an analytical framework that characterizes and unifies the theoretical understanding for OOD detection. Our analytical framework motivates a novel OOD detection method for neural networks, GEM, which demonstrates both theoretical and empirical superiority. In particular, on CIFAR-100 as in-distribution data, our method outperforms a competitive baseline by 16.57% (FPR95). Lastly, we formally provide provable guarantees and comprehensive analysis of our method, underpinning how various properties of data distribution affect the performance of OOD detection.
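As a rough sketch of a GEM-style score, assuming class-conditional Gaussian modeling of features with a shared covariance (feature extraction and covariance estimation are omitted, and the exact formulation may differ from the paper's):

```python
import numpy as np

def gem_style_score(feature, class_means, shared_cov):
    # Energy over class-conditional Gaussians with a shared covariance:
    # higher score ~ more in-distribution. Constant density terms cancel
    # when only the ranking of scores matters.
    prec = np.linalg.inv(shared_cov)
    diffs = feature - class_means                        # shape (K, d)
    maha = np.einsum('kd,de,ke->k', diffs, prec, diffs)  # squared Mahalanobis
    m = -0.5 * maha
    return np.log(np.exp(m - m.max()).sum()) + m.max()   # stable logsumexp

# toy usage: 3 classes in a 4-dimensional feature space
rng = np.random.default_rng(0)
means = rng.standard_normal((3, 4))
cov = np.eye(4)
print(gem_style_score(rng.standard_normal(4), means, cov))
```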
Chapter
Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in an open-world setting. However, existing OOD detection solutions can be brittle in the open world, facing various types of adversarial OOD inputs. While methods leveraging auxiliary OOD data have emerged, our analysis on illuminative examples reveals a key insight that the majority of auxiliary OOD examples may not meaningfully improve or even hurt the decision boundary of the OOD detector, which is also observed in empirical results on real data. In this paper, we provide a theoretically motivated method, Adversarial Training with informative Outlier Mining (ATOM), which improves the robustness of OOD detection. We show that, by mining informative auxiliary OOD data, one can significantly improve OOD detection performance, and somewhat surprisingly, generalize to unseen adversarial attacks. ATOM achieves state-of-the-art performance under a broad family of classic and adversarial OOD evaluation tasks. For example, on the CIFAR-10 in-distribution dataset, ATOM reduces the FPR (at TPR 95%) by up to 57.99% under adversarial OOD inputs, surpassing the previous best baseline by a large margin.
Chapter
With the COVID-19 pandemic bringing about a severe global crisis, our health systems are under tremendous pressure. Automated screening plays a critical role in the fight against this pandemic, and much of the previous work has been very successful in designing effective screening models. However, these models lose effectiveness in the semi-supervised learning environment with only positive and unlabeled (PU) data, which is easy to collect clinically. In this paper, we report our attempt towards achieving semi-supervised screening of COVID-19 from PU data. We propose a new PU learning method called Constraint Non-Negative Positive Unlabeled Learning (cnPU). It suggests the constraint non-negative risk estimator, which is more robust against overfitting than previous PU learning methods when given limited positive data. It also embodies a new and efficient optimization algorithm that can make the model learn well on positive data and avoid overfitting on unlabeled data. To the best of our knowledge, this is the first work that realizes PU learning of COVID-19. A series of empirical studies show that our algorithm remarkably outperforms the state of the art on real datasets of two medical imaging modalities, including X-ray and computed tomography. These advantages make our algorithm a robust and useful computer-assisted tool in the semi-supervised screening of COVID-19.
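For context, a minimal sketch of the standard non-negative PU risk that constraint-based estimators such as cnPU build on (the softplus surrogate loss and the prior value below are illustrative, not the paper's exact estimator):

```python
import torch
import torch.nn.functional as F

def non_negative_pu_risk(scores_pos, scores_unl, prior):
    # Non-negative PU risk with a sigmoid (softplus) surrogate loss:
    # prior * R_p^+ + max(0, R_u^- - prior * R_p^-), where the clamp keeps
    # the estimated negative risk from going below zero.
    loss_pos_as_pos = F.softplus(-scores_pos).mean()  # positives labeled +1
    loss_pos_as_neg = F.softplus(scores_pos).mean()   # positives labeled -1
    loss_unl_as_neg = F.softplus(scores_unl).mean()   # unlabeled labeled -1
    neg_risk = loss_unl_as_neg - prior * loss_pos_as_neg
    return prior * loss_pos_as_pos + torch.clamp(neg_risk, min=0.0)

# toy usage: random classifier scores with an assumed class prior of 0.3
print(non_negative_pu_risk(torch.randn(16), torch.randn(64), prior=0.3))
```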
Article
In plenty of real-life tasks, strongly supervised information is hard to obtain, and thus weakly supervised learning has drawn considerable attention recently. This paper investigates the problem of learning from incomplete and inaccurate supervision, where only a limited subset of training data is labeled but potentially with noise. This setting is challenging and of great importance but rarely studied in the literature. We notice that in many applications, the limited labeled data have certain structures, which paves the way for designing effective methods. Specifically, we observe that labeled data usually carry one-sided noise, as in the bug detection task, where the identified buggy code indeed has defects, while code checked many times or newly fixed may still have other flaws. Furthermore, when two-sided noise occurs in the labeled data, we exploit the class-prior information of unlabeled data, which is typically available in practical tasks. We propose novel approaches for the incomplete and inaccurate supervision learning tasks and effectively alleviate the negative influence of label noise with the help of a vast number of unlabeled data. Both theoretical analysis and extensive experiments justify and validate the effectiveness of the proposed approaches.
Article
Graph Neural Networks (GNNs) have led to state-of-the-art performance on many machine learning tasks such as node classification and link prediction. Most existing GNN models exploit a single type of aggregator to aggregate neighboring node information, and then add or concatenate the aggregator output to the current center node vector. However, it is difficult for a single aggregator to capture the different aspects of neighboring information, and the simple update methods limit the expressive capability of GNNs. Moreover, existing supervised or semi-supervised GNN models are trained based on a node-label loss function, which neglects graph structure information. In this paper, we propose a novel graph neural network architecture, the Graph Attention & Interaction Network (GAIN). We use multiple types of aggregators to gather neighboring information in different aspects and integrate the outputs of these aggregators through an attention mechanism. Furthermore, we design a graph-regularized loss to better preserve graph structural information. Additionally, we introduce an explicit feature interaction method to update the node embeddings. We conduct comprehensive experiments on two node-classification benchmarks and a real-world financial news dataset. The experiments demonstrate that our GAIN model outperforms current state-of-the-art performance on all the tasks.
Chapter
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available. While existing SSL methods assume that the labeled and unlabeled data share the same classes, we address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data. Instead of training an OOD detector and SSL separately, we propose a multi-task curriculum learning framework. First, to detect the OOD samples in unlabeled data, we estimate the probability of a sample belonging to OOD. We use a joint optimization framework, which updates the network parameters and the OOD scores alternately. Simultaneously, to achieve high performance on the classification of in-distribution (ID) data, we select ID samples in unlabeled data having small OOD scores, and use these data together with labeled data to train the deep neural networks to classify ID samples in a semi-supervised manner. We conduct several experiments, and our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
Article
In this paper, we focus on co-selection of instances and features in the semi-supervised learning scenario. In this context, co-selection becomes a more challenging problem as data contain labeled and unlabeled examples sampled from the same population. To carry out such semi-supervised co-selection, we propose a unified framework, called sCOs, which efficiently integrates the labeled and unlabeled parts into the co-selection process. The framework is based on introducing both a sparse regularization term and a similarity-preserving approach. It evaluates the usefulness of features and instances in order to select the most relevant ones simultaneously. We propose two efficient algorithms that work for both convex and nonconvex functions. To the best of our knowledge, this paper offers, for the first time, a study utilizing nonconvex penalties for co-selection in semi-supervised learning tasks. Experimental results on some known benchmark datasets are provided for validating sCOs and comparing it with some representative methods in the state of the art.
Article
Semi-supervised learning (SSL) aims to avoid the need for collecting prohibitively expensive labelled training data. While demonstrating impressive performance boosts, existing SSL methods artificially assume that small labelled data and large unlabelled data are drawn from the same class distribution. In a more realistic scenario with class distribution mismatch between the two sets, they often suffer severe performance degradation due to error propagation introduced by irrelevant unlabelled samples. Our work addresses this under-studied and realistic SSL problem with a novel algorithm named Uncertainty-Aware Self-Distillation (UASD). Specifically, UASD produces soft targets that avoid catastrophic error propagation and empowers effective learning from unconstrained unlabelled data with out-of-distribution (OOD) samples. This is based on joint self-distillation and OOD filtering in a unified formulation. Without bells and whistles, UASD significantly outperforms six state-of-the-art methods in more realistic SSL under class distribution mismatch on three popular image classification datasets: CIFAR10, CIFAR100, and TinyImageNet.
Article
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for building fair models with some sacrifice in accuracy, most are geared toward either supervised or unsupervised learning. Yet two observations inspired us to wonder whether semi-supervised learning might be useful for solving discrimination problems. First, a previous study showed that increasing the size of the training set may lead to a better trade-off between fairness and accuracy. Second, the most powerful models today require an enormous amount of data to train, which, in practical terms, is likely only possible with a combination of labeled and unlabeled data. Hence, in this paper, we present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo-labeling to predict labels for unlabeled data, a re-sampling method to obtain multiple fair datasets, and lastly ensemble learning to improve accuracy and decrease discrimination. A theoretical decomposition analysis of bias, variance, and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
Article
Automated screening of COVID-19 from chest CT is urgent and important during the worldwide outbreak of SARS-CoV-2 in 2020. However, accurate screening of COVID-19 is still a massive challenge due to the spatial complexity of 3D volumes, the labeling difficulty of infection areas, and the slight discrepancy between COVID-19 and other viral pneumonia in chest CT. While a few pioneering works have made significant progress, they either demand manual annotations of infection areas or lack interpretability. In this paper, we report our attempt towards achieving highly accurate and interpretable screening of COVID-19 from chest CT with weak labels. We propose an attention-based deep 3D multiple instance learning (AD3D-MIL) approach where a patient-level label is assigned to a 3D chest CT that is viewed as a bag of instances. AD3D-MIL can semantically generate deep 3D instances following the possible infection areas. AD3D-MIL further applies an attention-based pooling approach to the 3D instances to provide insight into the contribution of each instance to the bag label. AD3D-MIL finally learns Bernoulli distributions of the bag-level labels for more accessible learning. We collected 460 chest CT examples: 230 CT examples from 79 patients with COVID-19, 100 CT examples from 100 patients with common pneumonia, and 130 CT examples from 130 people without pneumonia. A series of empirical studies show that our algorithm achieves an overall accuracy of 97.9%, an AUC of 99.0%, and a Cohen kappa score of 95.7%. These advantages make our algorithm an efficient assisted tool in the screening of COVID-19.
Article
In this paper, we address the challenges of detecting instances from emerging classes over a non-stationary data stream during classification. In particular, instances from an entirely unknown class may appear in a data stream over time. Existing classification techniques utilize unsupervised clustering to identify the emergence of such data instances. Unfortunately, they make strong assumptions which are typically invalid in practice: (i) most instances associated with a class are closer to each other in feature space than to instances associated with different classes, (ii) covariates of data are normalized through an oracle to overcome the effect of a few data instances having large feature values, and (iii) labels of instances from emerging classes are readily available soon after detection. To address the challenges that occur in practice when the above assumptions are weak, we propose a practical semi-supervised emerging class detection framework. Particularly, we aim to identify similar data instances within local regions in feature space by incorporating a mutual graph clustering mechanism. Our empirical evaluation of this framework on real-world datasets demonstrates its superior classification performance compared to existing methods while using significantly fewer labeled instances.
Article
Semi-supervised learning constructs the predictive model by learning from a few labeled training examples and a large pool of unlabeled ones. It has a wide range of application scenarios and has attracted much attention in the past decades. However, it is noteworthy that although the learning performance is expected to be improved by exploiting unlabeled data, some empirical studies show that there are situations where the use of unlabeled data may degenerate the performance. Thus, it is advisable to be able to exploit unlabeled data safely. This article reviews some research progress of safe semi-supervised learning, focusing on three types of safeness issues: data quality, where the training data is risky or of low quality; model uncertainty, where the learning algorithm fails to handle the uncertainty during training; and measure diversity, where the safe performance could be adapted to diverse measures.
Article
Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. We find that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples. To help guide SSL research towards real-world applicability, we make our unified reimplementation and evaluation platform publicly available.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results with almost no additional computational cost. We also show improved performance in the low-data regime on the STL-10 dataset.
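A minimal NumPy sketch of cutout as described above, assuming the mask center is sampled uniformly and masked pixels are set to zero (the paper also allows the square to extend past the image border, which the clipping below permits):

```python
import numpy as np

def cutout(image, mask_size, rng=None):
    # Randomly mask out one square region of the input image (zeroed pixels);
    # the square is clipped where it would fall outside the image.
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y0, y1 = max(0, cy - mask_size // 2), min(h, cy + mask_size // 2)
    x0, x1 = max(0, cx - mask_size // 2), min(w, cx + mask_size // 2)
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out

# toy usage on a random 32x32x3 image with a 16x16 mask
img = np.random.rand(32, 32, 3).astype(np.float32)
print(cutout(img, mask_size=16).shape)
```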
Article
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance at tasks such as visual object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world. Using the state-of-the-art Convolutional Neural Networks (CNNs), we provide scene classification CNNs (Places-CNNs) as baselines, that significantly outperform the previous approaches. Visualization of the CNNs trained on Places shows that object detectors emerge as an intermediate representation of scene classification. With its high-coverage and high-diversity of exemplars, the Places Database along with the Places-CNNs offer a novel resource to guide future progress on scene recognition problems.
Article
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the output distribution. Virtual adversarial loss is defined as the robustness of the model's posterior distribution against local perturbation around each input data point. Our method is similar to adversarial training, but differs from adversarial training in that it determines the adversarial direction based only on the output distribution and that it is applicable to a semi-supervised setting. Because the directions in which we smooth the model are virtually adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward and backpropagations. In our experiments, we applied VAT to supervised and semi-supervised learning on multiple benchmark datasets. With additional improvement based on entropy minimization principle, our VAT achieves the state-of-the-art performance on SVHN and CIFAR-10 for semi-supervised learning tasks.
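A compact PyTorch sketch of the virtual adversarial loss, assuming image-shaped inputs (N, C, H, W) and a single power-iteration step; the hyperparameters xi and eps below are illustrative rather than the paper's tuned values:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=2.0, n_power=1):
    # Virtual adversarial loss: KL divergence between the prediction at x and
    # the prediction at x + r_adv, where r_adv is found by power iteration.
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)

    # Random unit direction, normalized per sample (assumes N,C,H,W inputs).
    d = torch.randn_like(x)
    d = d / (d.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-8)
    for _ in range(n_power):
        d.requires_grad_(True)
        log_p_hat = F.log_softmax(model(x + xi * d), dim=1)
        adv_dist = F.kl_div(log_p_hat, p, reduction='batchmean')
        grad = torch.autograd.grad(adv_dist, d)[0]
        d = grad / (grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-8)

    r_adv = eps * d.detach()
    log_p_hat = F.log_softmax(model(x + r_adv), dim=1)
    return F.kl_div(log_p_hat, p, reduction='batchmean')

# toy usage with a tiny model on random inputs
model = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.Flatten(),
                            torch.nn.Linear(4 * 30 * 30, 10))
print(vat_loss(model, torch.randn(8, 3, 32, 32)))
```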
Article
The recently proposed temporal ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, temporal ensembling becomes unwieldy when using large datasets. To overcome this problem, we propose a method that averages model weights instead of label predictions. As an additional benefit, the method improves test accuracy and enables training with fewer labels than earlier methods. We report state-of-the-art results on semi-supervised SVHN, reducing the error rate from 5.12% to 4.41% with 500 labels, and achieving 5.39% error rate with 250 labels. By using extra unlabeled data, we reduce the error rate to 2.76% on 500-label SVHN.
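A minimal sketch of the weight-averaging step, assuming a PyTorch student model and an exponential-moving-average teacher with decay alpha (in practice buffers such as batch-norm statistics also need to be tracked):

```python
import copy
import torch

def ema_update(teacher, student, alpha=0.99):
    # Mean-teacher style update: the teacher's weights are an exponential
    # moving average of the student's weights, instead of averaging labels.
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

# toy usage: initialize the teacher as a copy of the student, then update
student = torch.nn.Linear(4, 2)
teacher = copy.deepcopy(student)
ema_update(teacher, student, alpha=0.99)
```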