Article

Universal and Scalable Weakly-Supervised Domain Adaptation


Abstract

Domain adaptation leverages labeled data from a source domain to learn an accurate classifier for an unlabeled target domain. Since data collected in practical applications usually contain noise, weakly-supervised domain adaptation, which tolerates label noise and/or feature noise in the source domain, has attracted widespread attention from researchers. Several weakly-supervised domain adaptation methods have been proposed to mitigate the difficulty of obtaining high-quality source domains that are highly related to the target domain. However, these methods assume that an accurate noise rate is available in advance to reduce the negative transfer caused by noise in the source domain, which limits their application in the real world, where the noise rate is unknown. Meanwhile, since source data usually come from multiple domains, naively applying single-source domain adaptation algorithms may lead to sub-optimal results. We hence propose a universal and scalable weakly-supervised domain adaptation method called PDCAS to ease such assumptions and make the approach more general. Specifically, PDCAS consists of two stages: progressive distillation and domain alignment. In the progressive distillation stage, we iteratively distill out potentially clean samples whose annotated labels are highly consistent with the model's predictions, and correct the labels of noisy source samples. This process is unsupervised, exploiting intrinsic similarity to measure and extract the initial corrected samples. In the domain alignment stage, we use Class-Aligned Sampling, which balances the samples for both source and target domains along with the global feature distributions, to alleviate label-distribution shift. Finally, we apply PDCAS in the multi-source noisy scenario and propose a novel multi-source weakly-supervised domain adaptation method called MSPDCAS, which shows the scalability of our framework. Extensive experiments on the Office-31 and Office-Home datasets demonstrate the effectiveness and robustness of our method compared to state-of-the-art methods.
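The progressive-distillation step described in the abstract lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch (the function name and confidence threshold are illustrative assumptions, not taken from the paper) of keeping samples whose annotated labels agree with a confident model prediction and correcting the rest:

```python
# Hypothetical sketch of a progressive-distillation step: keep source samples
# whose annotated labels agree with a confident model prediction, and relabel
# the remaining confident samples with the prediction. The threshold and all
# names are assumptions for illustration only.
import torch
import torch.nn.functional as F

@torch.no_grad()
def distill_source_batch(model, images, noisy_labels, conf_threshold=0.9):
    """Split a noisy source batch into 'clean' and 'corrected' subsets."""
    probs = F.softmax(model(images), dim=1)
    conf, pred = probs.max(dim=1)

    # Annotation is consistent with a confident prediction -> likely clean
    clean_mask = (pred == noisy_labels) & (conf >= conf_threshold)
    # Confident disagreement -> correct the label with the prediction
    corrected_mask = (pred != noisy_labels) & (conf >= conf_threshold)

    clean = (images[clean_mask], noisy_labels[clean_mask])
    corrected = (images[corrected_mask], pred[corrected_mask])
    return clean, corrected
```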


Article
Adversarial example attacks are deemed to be a serious threat to deep neural network (DNN) models. Generating adversarial examples in white-box settings has been well-studied, however, it remains challenging to generate transferable adversarial examples that successfully attack black-box models. This work proposes Foolmix, a novel method for generating transferable adversarial examples for black-box attacks. The design of Foolmix is inspired by our observation that adversarial examples with high transferability usually carry multi-class features in the latent space of DNN models. Thus, we propose a dual-blending strategy that blends the image with a set of random pixel-blocks and blends the gradient by calculating the loss of the blended image for both the ground-truth label and a set of random labels. The dual-blending strategy pressures the example to penetrate multiple class regions and gain multi-class features in the latent space, greatly enhancing the transferability of the generated adversarial example. However, the randomness in the blending process might also pressure the example to approach the boundary of the original class region, which lowers the robustness of the example. To mitigate this problem, we further propose an update method in the starting forward direction to guide the generated adversarial example to go deep into multi-class adversarial regions while being globally far away from the original class region. Compared to state-of-the-art transformation-based attacks, Foolmix significantly enhances the transferability of generated adversarial examples, boosting the average transferable attack success rate by 13.2% and 16.9% on mainstream CNNs and ViTs respectively, while achieving better defense breakthrough ability.
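To make the dual-blending idea concrete, here is a rough, hypothetical PyTorch sketch of one attack iteration: the input is blended with random pixels, and the gradient comes from a loss that mixes the ground-truth label with a few random labels. The uniform-noise blend stands in for the paper's random pixel-blocks, and all hyper-parameters are assumptions, not the paper's exact algorithm:

```python
# Rough sketch of one dual-blending attack step. The uniform-noise blend
# approximates the random pixel-blocks; hyper-parameters are illustrative.
import torch
import torch.nn.functional as F

def dual_blend_step(model, x, x_adv, y_true, num_classes,
                    eps=8/255, alpha=2/255, blend_ratio=0.2, num_rand_labels=3):
    x_adv = x_adv.clone().detach().requires_grad_(True)

    # Image blending: mix the current adversarial image with random pixels
    x_blend = (1 - blend_ratio) * x_adv + blend_ratio * torch.rand_like(x_adv)
    logits = model(x_blend)

    # Gradient blending: ground-truth loss plus losses for random labels
    loss = F.cross_entropy(logits, y_true)
    for _ in range(num_rand_labels):
        y_rand = torch.randint(0, num_classes, y_true.shape, device=x.device)
        loss = loss + F.cross_entropy(logits, y_rand)

    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = x_adv + alpha * grad.sign()
    # Project back into the eps-ball around the clean image
    return torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1).detach()
```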
Conference Paper
Full-text available
As a vital problem in classification-oriented transfer, unsupervised domain adaptation (UDA) has attracted widespread attention in recent years. Previous UDA methods assume the marginal distributions of different domains are shifted while ignoring the discriminant information in the label distributions. This leads to classification performance degeneration in real applications. In this work, we focus on the conditional distribution shift problem which is of great concern to current conditional invariant models. We aim to seek a kernel covariance embedding for conditional distribution which remains yet unexplored. Theoretically, we propose the Conditional Kernel Bures (CKB) metric for characterizing conditional distribution discrepancy, and derive an empirical estimation for the CKB metric without introducing the implicit kernel feature map. It provides an interpretable approach to understand the knowledge transfer mechanism. The established consistency theory of the empirical estimation provides a theoretical guarantee for convergence. A conditional distribution matching network is proposed to learn the conditional invariant and discriminative features for UDA. Extensive experiments and analysis show the superiority of our proposed model.
Article
Full-text available
Domain Adaptation (DA) attempts to transfer knowledge in a labeled source domain to an unlabeled target domain without requiring target supervision. Recent advanced methods conduct DA mainly by aligning domain distributions. However, the performance of these methods suffers severely when the source and target domains encounter a large domain discrepancy. We argue this limitation may be attributed to insufficient exploration of domain-specialized features, because most works merely concentrate on domain-general feature learning while integrating totally-shared convolutional networks (convnets). In this paper, we relax the completely-shared convnets assumption and propose the Domain Conditioned Adaptation Network, which introduces a domain conditioned channel attention module to excite channel activation separately for each domain. Such a partially-shared convnets module allows low-level domain-specialized features to be explored appropriately. Furthermore, we develop the Generalized Domain Conditioned Adaptation Network to automatically determine whether domain channel activations should be modeled separately in each attention module. The critical domain-dependent knowledge can then be adaptively extracted according to the domain statistics gap. Meanwhile, to effectively align high-level feature distributions across the two domains, we further deploy feature adaptation blocks after task-specific layers, which explicitly mitigate the domain discrepancy. Extensive experiments on four cross-domain benchmarks demonstrate that our approaches outperform existing methods, especially on very tough cross-domain learning tasks.
Conference Paper
Full-text available
Large neural networks are difficult to deploy on mobile devices because of intensive computation and storage. To alleviate this, we study ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. In previous ternarized neural networks, a hard threshold Δ is introduced to determine the quantization intervals. Although the selection of Δ greatly affects the training results, previous works estimate Δ via an approximation or treat it as a hyper-parameter, which is suboptimal. In this paper, we present Soft Threshold Ternary Networks (STTN), which enable the model to automatically determine the quantization intervals instead of depending on a hard threshold. Concretely, we replace the original ternary kernel with the addition of two binary kernels at training time, where ternary values are determined by the combination of the two corresponding binary values. At inference time, we add up the two binary kernels to obtain a single ternary kernel. Our method dramatically outperforms the current state of the art, lowering the performance gap between full-precision networks and extreme low-bit networks. Experiments on ImageNet with AlexNet (Top-1 55.6%) and ResNet-18 (Top-1 66.2%) achieve new state-of-the-art results.
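The core parameterization is easy to sketch: during training the ternary kernel is the sum of two binarized kernels, and at inference the two are folded into one. A minimal PyTorch illustration follows; the straight-through gradient is an assumption commonly used for sign functions, not necessarily the paper's exact rule:

```python
# Minimal sketch: a training-time ternary kernel as the sum of two binary
# (sign) kernels. At inference the sum is precomputed into a single ternary
# kernel. The straight-through estimator below is an assumed gradient rule.
import torch

class BinarySign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # straight-through estimator (assumption)

def ternary_kernel(w1, w2):
    """Sum of two binarized kernels; disagreeing signs yield zeros."""
    return BinarySign.apply(w1) + BinarySign.apply(w2)  # values in {-2, 0, 2}
```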
Article
Full-text available
Recent work in domain adaptation bridges different domains by adversarially learning a domain-invariant representation that cannot be distinguished by a domain discriminator. Existing methods of adversarial domain adaptation mainly align the global images across the source and target domains. However, it is obvious that not all regions of an image are transferable, while forcefully aligning the untransferable regions may lead to negative transfer. Furthermore, some of the images are significantly dissimilar across domains, resulting in weak image-level transferability. To this end, we present Transferable Attention for Domain Adaptation (TADA), focusing our adaptation model on transferable regions or images. We implement two types of complementary transferable attention: transferable local attention generated by multiple region-level domain discriminators to highlight transferable regions, and transferable global attention generated by a single image-level domain discriminator to highlight transferable images. Extensive experiments validate that our proposed models exceed state-of-the-art results on standard domain adaptation datasets.
Article
Full-text available
While Unsupervised Domain Adaptation (UDA) algorithms, where labeled data are available only from source domains, have been actively studied in recent years, most algorithms and theoretical results focus on Single-source Unsupervised Domain Adaptation (SUDA). However, in practical scenarios, labeled data can typically be collected from multiple diverse sources, and they might differ not only from the target domain but also from each other. Thus, domain adapters from multiple sources should not be modeled in the same way. Recent deep learning based Multi-source Unsupervised Domain Adaptation (MUDA) algorithms focus on extracting common domain-invariant representations for all domains by aligning the distributions of all pairs of source and target domains in a common feature space. However, it is often very hard to extract the same domain-invariant representations for all domains in MUDA. In addition, these methods match distributions without considering domain-specific decision boundaries between classes. To solve these problems, we propose a new framework with two alignment stages for MUDA which not only aligns the distributions of each pair of source and target domains in multiple specific feature spaces, but also aligns the outputs of classifiers by utilizing the domain-specific decision boundaries. Extensive experiments demonstrate that our method achieves remarkable results on popular benchmark datasets for image classification.
Article
Full-text available
This paper addresses the limitations of previous training methods that emphasize either easy examples like self-paced learning or difficult examples like hard example mining. Inspired by active learning, we propose two alternatives to re-weight training samples based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold (or threshold closeness). Extensive experimental results on multiple datasets show that our methods reliably improve accuracy in various network architectures, including providing additional gains on top of other popular training tools, such as ADAM, dropout, and distillation.
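Both proposed signals are cheap to compute from per-iteration probabilities. A small illustrative PyTorch sketch, assuming a recorded history of correct-class probabilities (the bookkeeping and any final normalization are placeholders):

```python
# Illustrative computation of the two uncertainty signals described above:
# variance of the correct-class probability across recent SGD iterations,
# and closeness of that probability to the decision threshold.
import torch

def uncertainty_signals(prob_history, threshold=0.5):
    """prob_history: (num_iters, batch) correct-class probabilities."""
    variance = prob_history.var(dim=0, unbiased=False)    # prediction variance
    closeness = -(prob_history[-1] - threshold).abs()     # threshold closeness
    # Larger values mean more uncertain samples, which get larger weights
    return variance, closeness
```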
Article
Full-text available
The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can simultaneously learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax the shared-classifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into the deep network to explicitly learn the residual function with reference to the target classifier. We embed features of multiple layers into reproducing kernel Hilbert spaces (RKHSs) and match feature distributions for feature adaptation. The adaptation behaviors can be achieved in most feed-forward models by extending them with new residual layers and loss functions, which can be trained efficiently using standard back-propagation. Empirical evidence shows that the approach outperforms state-of-the-art methods on standard domain adaptation datasets.
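The residual relation between the two classifiers, fS(x) = fT(x) + Δf(x), can be sketched directly. A minimal PyTorch version follows; the layer shapes are placeholders, not the paper's architecture:

```python
# Sketch of the residual-classifier idea: the source classifier equals the
# target classifier plus a learned residual, fS(x) = fT(x) + Δf(x).
import torch.nn as nn

class ResidualClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.target_fc = nn.Linear(feat_dim, num_classes)  # fT: used on target
        self.residual = nn.Sequential(                     # Δf: fit on source
            nn.Linear(num_classes, num_classes), nn.ReLU(),
            nn.Linear(num_classes, num_classes),
        )

    def forward(self, feats, source=True):
        out_t = self.target_fc(feats)
        return out_t + self.residual(out_t) if source else out_t
```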
Article
Domain adaptation methods reduce domain shift typically by learning domain-invariant features. Most existing methods are built on distribution matching, e.g., adversarial domain adaptation, which tends to corrupt feature discriminability. In this paper, we propose Discriminative Radial Domain Adaptation (DRDR), which bridges source and target domains via a shared radial structure. It is motivated by the observation that as the model is trained to be progressively discriminative, features of different categories expand outwards in different directions, forming a radial structure. We show that transferring such an inherently discriminative structure enhances feature transferability and discriminability simultaneously. Specifically, we represent each domain with a global anchor and each category with a local anchor to form a radial structure and reduce domain shift via structure matching. This consists of two parts, namely an isometric transformation to align the structure globally and local refinement to match each category. To enhance the discriminability of the structure, we further encourage samples to cluster close to their corresponding local anchors based on optimal-transport assignment. In extensive experiments on multiple benchmarks, our method consistently outperforms state-of-the-art approaches on varied tasks, including typical unsupervised domain adaptation, multi-source domain adaptation, domain-agnostic learning, and domain generalization.
Chapter
Recent studies on learning with noisy labels have shown remarkable performance by exploiting a small clean dataset. In particular, model-agnostic meta-learning-based label correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard against label miscorrection, resulting in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, significantly slowing down the training speed. To mitigate these issues, we propose a robust and efficient method, FasTEN, which learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix every iteration within a single back-propagation, so that the estimated matrix closely follows the shifting noise distribution induced by label correction. Extensive experiments demonstrate that our FasTEN shows the best performance in training efficiency while having comparable or better accuracy than existing methods, especially achieving state-of-the-art performance on a real-world noisy dataset, Clothing1M. Keywords: Learning with noisy labels, Label correction, Transition matrix estimation
Article
Domain adaptation leverages rich knowledge from a related source domain so that it can be used to perform tasks in a target domain. For more knowledge to be obtained under relaxed conditions, domain adaptation methods have been widely used in pattern recognition and image classification. However, most of the existing domain adaptation methods only consider how to minimize different distributions of the source and target domains, which neglects what should be transferred for a specific task and suffers negative transfer from distribution outliers. To address these problems, in this paper, we propose a novel domain adaptation method called weighted correlation embedding learning (WCEL) for image classification. In the WCEL approach, we seamlessly integrate correlation learning, graph embedding, and sample reweighting into a unified learning model. Specifically, we extract the maximally correlated features from the source and target domains for image classification tasks. In addition, two graphs are designed to preserve the discriminant information from interclass samples and the neighborhood relations in intraclass samples. Furthermore, to prevent the negative transfer problem, we develop an efficient sample reweighting strategy to predict the target with different confidence levels. To verify the performance of the proposed method in image classification, extensive experiments were conducted on several benchmark databases, verifying the superiority of the WCEL method over other state-of-the-art domain adaptation algorithms.
Article
Universal domain adaptation (UniDA) aims to transfer knowledge learned from a labeled source domain to an unlabeled target domain under both domain shift and category shift. Without prior category-overlap information, it is challenging to simultaneously align the common categories between the two domains and separate their respective private categories. Additionally, previous studies utilize the source classifier's prediction to obtain various known labels and one generic "unknown" label for target samples. However, over-reliance on learned classifier knowledge is inevitably biased toward source data and ignores the intrinsic structure of the target domain. Therefore, in this paper, we propose a novel two-stage UniDA framework called MATHS based on the principles of mutual nearest neighbor contrast and hybrid prototype discrimination. In the first stage, we design an efficient mutual nearest neighbor contrastive learning scheme to achieve feature alignment, which exploits instance-level affinity relationships to uncover the intrinsic structure of the two domains. We introduce a bimodality hypothesis for the maximum discriminative probability distribution to detect possible target private samples, and present a data-based statistical approach to separate the common and private categories. In the second stage, to obtain more reliable label predictions, we propose an incremental pseudo-classifier for target data only, which is driven by hybrid representative prototypes. A confidence-guided prototype contrastive loss is designed to optimize the category-allocation uncertainty via a self-training mechanism. Extensive experiments on three benchmarks demonstrate that MATHS outperforms previous state-of-the-art methods in most UniDA settings.
Article
This paper studies a weakly supervised domain adaptation (WSDA) problem, where we only have access to a source domain with noisy labels, from which we need to transfer useful information to an unlabeled target domain. Although there have been a few studies on this problem, most of them only exploit unidirectional relationships from the source domain to the target domain. In this paper, we propose a universal paradigm called GearNet to exploit bilateral relationships between the two domains. Specifically, we take the two domains as different inputs to train two models alternately, and a symmetrical Kullback-Leibler loss is used for selectively matching the predictions of the two models in the same domain. This interactive learning schema enables implicit label-noise cancellation and exploits correlations between the source and target domains. Therefore, our GearNet has great potential to boost the performance of a wide range of existing WSDA methods. Comprehensive experimental results show that the performance of existing methods can be significantly improved by equipping them with our GearNet.
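The symmetric Kullback-Leibler matching term is straightforward to write down. A minimal PyTorch sketch (the selective matching and the alternating training schedule are omitted for brevity):

```python
# Symmetric KL divergence between the predictions of two models on the same
# batch, as used for matching in the interactive training schema above.
import torch.nn.functional as F

def symmetric_kl(logits_a, logits_b):
    log_p = F.log_softmax(logits_a, dim=1)
    log_q = F.log_softmax(logits_b, dim=1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p||q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q||p)
    return kl_pq + kl_qp
```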
Article
Most existing studies on unsupervised domain adaptation (UDA) assume that each domain's training samples come with domain labels (e.g., painting, photo). Samples from each domain are assumed to follow the same distribution, and the domain labels are exploited to learn domain-invariant features via feature alignment. However, such an assumption often does not hold true: there often exist numerous finer-grained domains (e.g., dozens of modern painting styles have been developed, each differing dramatically from those of the classic styles). Therefore, forcing feature distribution alignment across each artificially-defined and coarse-grained domain can be ineffective. In this paper, we address both single-source and multi-source UDA from a completely different perspective, which is to view each instance as a fine domain. Feature alignment across domains is thus redundant. Instead, we propose to perform dynamic instance domain adaptation (DIDA). Concretely, a dynamic neural network with adaptive convolutional kernels is developed to generate instance-adaptive residuals to adapt domain-agnostic deep features to each individual instance. This enables a shared classifier to be applied to both source and target domain data without relying on any domain annotation. Further, instead of imposing intricate feature alignment losses, we adopt a simple semi-supervised learning paradigm using only a cross-entropy loss for both labeled source and pseudo-labeled target data. Our model, dubbed DIDA-Net, achieves state-of-the-art performance on several commonly used single-source and multi-source UDA datasets including Digits, Office-Home, DomainNet, Digit-Five, and PACS.
Article
Unsupervised domain adaptation (UDA) enables a learning machine to adapt from a labeled source domain to an unlabeled target domain under distribution shift. Thanks to the strong representation ability of deep neural networks, recent remarkable achievements in UDA resort to learning domain-invariant features. Intuitively, the hope is that a good feature representation, together with the hypothesis learned from the source domain, can generalize well to the target domain. However, the learning processes of domain-invariant features and source hypotheses inevitably involve domain-specific information that degrades the generalizability of UDA models on the target domain. The lottery ticket hypothesis shows that only partial parameters are essential for generalization. Motivated by it, we find in this paper that only partial parameters are essential for learning domain-invariant information. Such parameters are termed transferable parameters, which can generalize well in UDA. In contrast, the remaining parameters tend to fit domain-specific details and often cause generalization to fail; these are termed untransferable parameters. Driven by this insight, we propose Transferable Parameter Learning (TransPar) to reduce the side effect of domain-specific information in the learning process and thus enhance the memorization of domain-invariant information. Specifically, according to the degree of distribution discrepancy, we divide all parameters into transferable and untransferable ones in each training iteration. We then perform separate update rules for the two types of parameters. Extensive experiments on image classification and regression tasks (keypoint detection) show that TransPar outperforms prior art by non-trivial margins. Moreover, experiments demonstrate that TransPar can be integrated into the most popular deep UDA networks and be easily extended to handle any data distribution shift scenarios.
Article
Unsupervised domain adaptation (UDA) has recently become an appealing research topic in visual recognition, since it exploits all accessible well-labeled source data to train a model that generalizes well on a target domain without any annotations. However, due to the significant domain discrepancy, the bottleneck for UDA is learning effective domain-invariant feature representations. To overcome this obstacle, we propose a novel cross-domain learning framework named Maximum Structural Generation Discrepancy (MSGD) to accurately estimate and mitigate domain shift by introducing an intermediate domain. First, the cross-domain topological structure is explored to propagate target samples and generate a novel intermediate domain paired with specific source instances. The intermediate domain serves as a bridge that gradually reduces the distribution divergence between the source and target domains. Concretely, the similar category semantics across source and intermediate features naturally support class-level alignment to eliminate their domain shift. Since the target domain has no annotations, domain-level alignment is suitable for narrowing the distance between the intermediate and target domains. Moreover, to produce high-quality generative instances, we develop the class-driven collaborative translation (CDCT) module to generate class-consistent cross-domain samples in each mini-batch with the assistance of pseudo-labels. Extensive experimental analyses on five domain adaptation benchmarks demonstrate the effectiveness of our MSGD in solving the UDA problem.
Article
Deep learning has achieved remarkable success in numerous domains with help from large amounts of big data. However, the quality of data labels is a concern because of the lack of high-quality labels in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the typically used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies.
Article
Unsupervised Domain Adaptation (UDA) aims to learn a classifier for an unlabeled target domain by leveraging knowledge from a labeled source domain with a different but related distribution. Many existing approaches typically learn a domain-invariant representation space by directly matching the marginal distributions of the two domains. However, they neither explore the underlying discriminative features of the target data nor align the cross-domain discriminative features, which may lead to suboptimal performance. To tackle these two issues simultaneously, this paper presents a Joint Clustering and Discriminative Feature Alignment (JCDFA) approach for UDA, which naturally unifies the mining of discriminative features and the alignment of class-discriminative features in a single framework. Specifically, in order to mine the intrinsic discriminative information of the unlabeled target data, JCDFA jointly learns a shared encoding representation for two tasks: supervised classification of labeled source data, and discriminative clustering of unlabeled target data, where the classification of the source domain can guide the clustering learning of the target domain to locate the object category. We then conduct cross-domain discriminative feature alignment by separately optimizing two new metrics: 1) an extended supervised contrastive learning, i.e., semi-supervised contrastive learning, and 2) an extended Maximum Mean Discrepancy (MMD), i.e., conditional MMD, explicitly minimizing the intra-class dispersion and maximizing the inter-class compactness. When these two procedures, i.e., discriminative feature mining and alignment, are integrated into one framework, they tend to benefit from each other to enhance the final performance from a cooperative learning perspective. Experiments are conducted on four real-world benchmarks (e.g., Office-31, ImageCLEF-DA, Office-Home and VisDA-C). All the results demonstrate that our JCDFA obtains remarkable margins over state-of-the-art domain adaptation methods. Comprehensive ablation studies also verify the importance of each key component of our proposed algorithm and the effectiveness of combining the two learning strategies into one framework.
Article
Existing domain adaptation approaches often try to reduce the distribution difference between source and target domains and respect domain-specific discriminative structures via distribution distances [e.g., maximum mean discrepancy (MMD)] and discriminative distances (e.g., intra-class and inter-class distances). However, they usually consider these losses together and trade off their relative importance by estimating parameters empirically. Their relationships to each other remain insufficiently explored, so we cannot manipulate them correctly and the model's performance degrades. To this end, this article theoretically proves two essential facts: 1) minimizing MMD is equal to jointly minimizing the data variance with some implicit weights while, respectively, maximizing the source and target intra-class distances, so that feature discriminability degrades; and 2) the intra-class and inter-class distances are related such that one falls as the other rises. Based on this, we propose a novel discriminative MMD with two parallel strategies to correctly restrain the degradation of feature discriminability or the expansion of the intra-class distance; specifically: 1) we directly impose a tradeoff parameter on the intra-class distance that is implicit in the MMD according to 1); and 2) we reformulate the inter-class distance with special weights, analogous to the implicit ones in the MMD, whose maximization also leads to the intra-class distance falling according to 2). Notably, we do not consider the two strategies in one model due to 2). Experiments on several benchmark datasets not only prove the validity of our theoretical results but also demonstrate that the proposed approach substantially outperforms several state-of-the-art methods. Our preliminary MATLAB code will be available at https://github.com/WWLoveTransfer/.
Article
As one of the most prevalent branches of transfer learning, domain adaptation is dedicated to generalizing the knowledge of a source domain to a target domain to perform machine learning tasks. In domain adaptation, the key strategy is to overcome the shift between different domains and learn shared features with domain invariance. However, most existing methods focus on extracting the common features of the source and target domains, and do not consider the shift of the class centers in the target domain caused by this process. Specifically, when we align the domain distributions, we often ignore the inherent feature attributes of the data or, under the guidance of false pseudo-labels, cause the target domain data to move far away from the class centers after projection. This is not conducive to the classification task. To address these problems, in this study, we propose a novel domain adaptation method, referred to as discriminative invariant alignment (DIA), for image representation. DIA enriches the knowledge matrix by combining the class-discriminative information of the source domain and the local data structure information of the target domain into a new framework. By introducing the maximum margin criterion of the source domain, the classification boundaries are expanded. To verify the performance of the proposed method, we compared DIA with several state-of-the-art methods on five benchmark databases. The experimental results show that DIA is superior to the state-of-the-art methods.
Chapter
Although unsupervised domain adaptation methods have been widely adopted across several computer vision tasks, it is more desirable if we can exploit a few labeled data from new domains encountered in a real application. The novel setting of the semi-supervised domain adaptation (SSDA) problem shares its challenges with the domain adaptation problem and the semi-supervised learning problem. However, a recent study shows that conventional domain adaptation and semi-supervised learning methods often result in less effective or negative transfer in the SSDA problem. In order to interpret this observation and address the SSDA problem, in this paper, we raise the intra-domain discrepancy issue within the target domain, which has not been discussed before. Then, we demonstrate that addressing the intra-domain discrepancy leads to the ultimate goal of the SSDA problem. We propose an SSDA framework that aims to align features via alleviation of the intra-domain discrepancy. Our framework mainly consists of three schemes, i.e., attraction, perturbation, and exploration. First, the attraction scheme globally minimizes the intra-domain discrepancy within the target domain. Second, we demonstrate the incompatibility of conventional adversarial perturbation methods with SSDA. We then present a domain-adaptive adversarial perturbation scheme, which perturbs the given target samples in a way that reduces the intra-domain discrepancy. Finally, the exploration scheme locally aligns features in a class-wise manner complementary to the attraction scheme, by selectively aligning unlabeled target features complementary to the perturbation scheme. We conduct extensive experiments on domain adaptation benchmark datasets such as DomainNet, Office-Home, and Office. Our method achieves state-of-the-art performance on all datasets.
Article
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks. However, in many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data. To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain. Unfortunately, direct transfer across domains often performs poorly due to the presence of domain shift or dataset bias. Domain adaptation (DA) is a machine learning paradigm that aims to learn a model from a source domain that can perform well on a different (but related) target domain. In this article, we review the latest single-source deep unsupervised DA methods focused on visual tasks and discuss new perspectives for future research. We begin with the definitions of different DA strategies and descriptions of existing benchmark datasets. We then summarize and compare different categories of single-source unsupervised DA methods, including discrepancy-based methods, adversarial discriminative methods, adversarial generative methods, and self-supervision-based methods. Finally, we discuss future research directions with challenges and possible solutions.
Chapter
The inadequacy of labeled data in various domains has limited the use of deep learning on several tasks. In many cases it is quite expensive in terms of both time and human effort to collect, annotate, and organize large datasets to train deep neural networks. In recent years, domain adaptation algorithms have been highly successful in leveraging labeled data from related but different datasets to build accurate classification models for unlabeled target datasets. In this chapter we present a deep learning based hashing model for domain adaptation. Hashing techniques are popular in computer vision for their efficiency in both data storage and data retrieval. We use hash-based image feature representations for robust similarity measures between features. We propose a Deep Hashing Network that is trained to learn unique hash codes by leveraging the data from both the labeled source domain and the unlabeled target domain, to correctly classify unlabeled target data. We present a detailed study of several transfer tasks across multiple datasets to corroborate the advantages of our framework.
Article
Unsupervised domain adaptation facilitates the unlabeled target domain by relying on well-established source domain information. Conventional methods that forcefully reduce the domain discrepancy in the latent space result in the destruction of the intrinsic data structure. To balance the mitigation of the domain gap and the preservation of the inherent structure, we propose a Bi-Directional Generation domain adaptation model with consistent classifiers, interpolating two intermediate domains to bridge the source and target domains. Specifically, two cross-domain generators are employed to synthesize one domain conditioned on the other. The performance of our proposed method can be further enhanced by the consistent classifiers and the cross-domain alignment constraints. We also design two classifiers which are jointly optimized to maximize the consistency of target sample predictions. Extensive experiments verify that our proposed model outperforms the state of the art on standard cross-domain visual benchmarks.
Article
Domain adaptation improves a target task by knowledge transfer from a source domain with rich annotations. It is not uncommon that “source-domain engineering” becomes a cumbersome process in domain adaptation: the high-quality source domains highly related to the target domain are hardly available. Thus, weakly-supervised domain adaptation has been introduced to address this difficulty, where we can tolerate the source domain with noises in labels, features, or both. As such, for a particular target task, we simply collect the source domain with coarse labeling or corrupted data. In this paper, we try to address two entangled challenges of weakly-supervised domain adaptation: sample noises of the source domain and distribution shift across domains. To disentangle these challenges, a Transferable Curriculum Learning (TCL) approach is proposed to train the deep networks, guided by a transferable curriculum informing which of the source examples are noiseless and transferable. The approach enhances positive transfer from clean source examples to the target and mitigates negative transfer of noisy source examples. A thorough evaluation shows that our approach significantly outperforms the state-of-the-art on weakly-supervised domain adaptation tasks.
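A curriculum of this kind can be approximated by combining two per-sample signals: a small-loss criterion as a proxy for "noiseless" and a domain-discriminator score as a proxy for "transferable". A hypothetical PyTorch sketch under these assumptions, not the paper's exact curriculum:

```python
# Hypothetical per-sample curriculum weight: small training loss suggests a
# clean label; a high target-likeness score from a domain discriminator
# suggests a transferable sample. The thresholding scheme is an assumption.
import torch

def curriculum_weights(ce_loss_per_sample, domain_prob_target, clean_quantile=0.5):
    tau = ce_loss_per_sample.quantile(clean_quantile)
    likely_clean = (ce_loss_per_sample <= tau).float()   # small-loss criterion
    return likely_clean * domain_prob_target             # weight in [0, 1]
```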
Conference Paper
Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep network, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on samples whose labels are probably correct. Unlike existing curricula that are usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images with real-world noisy labels. https://github.com/google/mentornet
Chapter
Recent deep networks achieve state-of-the-art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world “wild tasks” where large differences exist between labeled training/source data and unseen test/target data. In particular, such a difference is often referred to as a “domain gap”, and can cause significantly decreased performance that cannot be easily remedied by further increasing the representation power. Unsupervised domain adaptation (UDA) seeks to overcome such a problem without target domain labels. In this paper, we propose a novel UDA framework based on an iterative self-training (ST) procedure, where the problem is formulated as latent variable loss minimization, and can be solved by alternately generating pseudo-labels on target data and re-training the model with these labels. On top of ST, we also propose a novel class-balanced self-training (CBST) framework to avoid the gradual dominance of large classes in pseudo-label generation, and introduce spatial priors to refine the generated labels. Comprehensive experiments show that the proposed methods achieve state-of-the-art semantic segmentation performance under multiple major UDA settings.
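The class-balancing idea can be illustrated by replacing one global confidence threshold with per-class thresholds when selecting pseudo-labels. A simplified PyTorch sketch, where the per-class quantile is an assumption standing in for the paper's portion schedule:

```python
# Sketch of class-balanced pseudo-label selection: keep roughly the same
# portion of confident predictions per class, so large classes cannot
# dominate pseudo-label generation.
import torch

@torch.no_grad()
def class_balanced_pseudo_labels(probs, keep_portion=0.2):
    """probs: (N, C) softmax outputs on target data."""
    conf, pred = probs.max(dim=1)
    keep = torch.zeros_like(conf, dtype=torch.bool)
    for c in range(probs.size(1)):
        mask = pred == c
        if mask.any():
            thr = conf[mask].quantile(1 - keep_portion)  # per-class threshold
            keep |= mask & (conf >= thr)
    return pred, keep
```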
Article
Domain adaptation generalizes a learning machine across source domain and target domain under different distributions. Recent studies reveal that deep neural networks can learn transferable features generalizing well to similar novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, feature transferability drops significantly in higher task-specific layers with increasing domain discrepancy. To formally reduce the dataset shift and enhance the feature transferability in task-specific layers, this paper presents a novel framework for deep adaptation networks, which generalizes deep convolutional neural networks to domain adaptation. The framework embeds the deep features of all task-specific layers to reproducing kernel Hilbert spaces (RKHSs) and optimally match different domain distributions. The deep features are made more transferable by exploring low-density separation of target-unlabeled data and very deep architectures, while the domain discrepancy is further reduced using multiple kernel learning for maximal testing power of kernel embedding matching. This leads to a minimax game framework that learns transferable features with statistical guarantees, and scales linearly with unbiased estimate of kernel embedding. Extensive empirical evidence shows that the proposed networks yield state-of-the-art results on standard visual domain adaptation benchmarks.
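The discrepancy being minimized can be illustrated with a multi-Gaussian-kernel MMD estimate between source and target features. A compact sketch follows; fixed bandwidths and the biased estimator are simplifying assumptions rather than the paper's multiple kernel learning scheme:

```python
# Compact multi-kernel MMD estimate between source features xs and target
# features xt, averaged over several Gaussian bandwidths.
import torch

def mk_mmd(xs, xt, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    x = torch.cat([xs, xt], dim=0)
    d2 = torch.cdist(x, x).pow(2)  # pairwise squared Euclidean distances
    k = sum(torch.exp(-d2 / (2 * b ** 2)) for b in bandwidths) / len(bandwidths)
    n = xs.size(0)
    k_ss, k_tt, k_st = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()
```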
Article
In this paper, we study the problem of learning image classification models with label noise. Existing approaches depending on human supervision are generally not scalable as manually identifying correct or incorrect labels is timeconsuming, whereas approaches not relying on human supervision are scalable but less effective. To reduce the amount of human supervision for label noise cleaning, we introduce CleanNet, a joint neural embedding network, which only requires a fraction of the classes being manually verified to provide the knowledge of label noise that can be transferred to other classes. We further integrate CleanNet and conventional convolutional neural network classifier into one framework for image classification learning. We demonstrate the effectiveness of the proposed algorithm on both of the label noise detection task and the image classification on noisy data task on several large-scale datasets. Experimental results show that CleanNet can reduce label noise detection error rate on held-out classes where no human supervision available by 41.5% compared to current weakly supervised methods. It also achieves 47% of the performance gain of verifying all images with only 3.2% images verified on an image classification task.
Article
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
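The minimax game translates into a simple alternating training step. A minimal PyTorch sketch, assuming D ends in a sigmoid and using the common non-saturating generator loss:

```python
# One alternating GAN training step: D learns to separate real from generated
# samples; G learns to fool D (non-saturating variant of the minimax loss).
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z):
    # Discriminator update: maximize log D(x) + log(1 - D(G(z)))
    fake = G(z).detach()
    d_real, d_fake = D(real), D(fake)
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: maximize log D(G(z)) (non-saturating loss)
    d_gen = D(G(z))
    g_loss = F.binary_cross_entropy(d_gen, torch.ones_like(d_gen))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```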
Article
Adversarial learning methods are a promising approach to training robust deep networks, and can generate complex samples across diverse domains. They also can improve recognition despite the presence of domain shift or dataset bias: several adversarial approaches to unsupervised domain adaptation have recently been introduced, which reduce the difference between the training and test domain distributions and thus improve generalization performance. Prior generative approaches show compelling visualizations, but are not optimal on discriminative tasks and can be limited to smaller shifts. Prior discriminative approaches could handle larger domain shifts, but imposed tied weights on the model and did not exploit a GAN-based loss. We first outline a novel generalized framework for adversarial adaptation, which subsumes recent state-of-the-art approaches as special cases, and we use this generalized view to better relate the prior approaches. We propose a previously unexplored instance of our general framework which combines discriminative modeling, untied weight sharing, and a GAN loss, which we call Adversarial Discriminative Domain Adaptation (ADDA). We show that ADDA is more effective yet considerably simpler than competing domain-adversarial methods, and demonstrate the promise of our approach by exceeding state-of-the-art unsupervised adaptation results on standard cross-domain digit classification tasks and a new more difficult cross-modality object classification task.
Article
We propose the coupled generative adversarial network (CoGAN) framework for generating pairs of corresponding images in two different domains. It consists of a pair of generative adversarial networks, each responsible for generating images in one domain. We show that by enforcing a simple weight-sharing constraint, the CoGAN learns to generate pairs of corresponding images without the existence of any pairs of corresponding images from the two domains in the training set. In other words, the CoGAN learns a joint distribution of images in the two domains from images drawn separately from the marginal distributions of the individual domains. This is in contrast to existing multi-modal generative models, which require corresponding images for training. We apply the CoGAN to several pair image generation tasks. For each task, the CoGAN learns to generate convincing pairs of corresponding images. We further demonstrate the applications of the CoGAN framework to the domain adaptation and cross-domain image generation tasks.
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
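The residual reformulation, output = F(x) + x, is captured by the basic building block. A minimal PyTorch sketch with a projection shortcut for shape changes:

```python
# A basic residual block: the convolutional body learns F(x) and the block
# outputs F(x) + x, with a 1x1 projection shortcut when shapes change.
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (
            nn.Identity() if stride == 1 and in_ch == out_ch else
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                          nn.BatchNorm2d(out_ch))
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```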