Discovering Shared Space for Visual Recognition by
Cross-Domain Residual Learning
Jie Pan
Ningbo Polytechnic
Yufang Dan
Ningbo Polytechnic
Baoqi Zhao
Ningbo Polytechnic
Jianwen Tao
Ningbo Polytechnic
Research Article
Keywords: Shared space, Residual model, Domain adaptation, Visual recognition
Posted Date: July 4th, 2024
DOI: https://doi.org/10.21203/rs.3.rs-4568347/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Additional Declarations: No competing interests reported.
Jie Pan1, Yufang Dan1, Baoqi Zhao1, Jianwen Tao1
1School of Artificial Intelligence, Ningbo Polytechnic, Lushan Street,
Ningbo, 315100, Zhejiang, China.
Contributing authors: 2021026@nbpt.edu.cn;yufang dan@aliyun.com;
mosu local@163.com;jianwentao@aliyun.com;
Abstract
To address inconsistent data distributions in machine learning, feature-representation-based domain adaptation methods extract features from the source domain and transfer them to the target domain for classification. Existing methods of this kind mainly reduce the mismatch between the feature distributions of the source and target domain data, but few of them analyze the correlation of cross-domain features between the original space and the shared latent space, which limits domain adaptation performance. To this end, we propose a domain adaptation method with a residual module. Its main ideas are: (1) transfer the features of the source domain data to the target domain data through a shared latent space to achieve feature sharing; (2) build a cross-domain residual learning model that uses the latent feature space as the residual connection of the original feature space, which improves the propagation efficiency of features; (3) regularize the feature space to obtain a sparse feature representation, which improves the robustness of the model; and (4) give an optimization algorithm. Experiments on public visual datasets verify the effectiveness of the method.
1 Introduction
In machine learning, complete and sufficient training data are needed for parameter tuning to achieve good performance, so the amount of data and the distribution of data features are important factors. Traditional machine learning assumes that the training data and test data are independent and identically distributed. Under this assumption, most models work well. In practical applications, however, it is usually difficult to obtain a large amount of annotated data for a specific task, and data annotation is time-consuming and laborious, which leaves machine learning models with insufficient data. Moreover, the data distributions of different domains are often inconsistent, a problem known as sample selection bias [1] or covariate shift [2], [3]. These issues reduce the robustness and reliability of traditional machine learning models and degrade their performance.
Fig. 1 Schematic comparison of three typical feature-representation-based domain adaptation approaches.
To avoid repetitive data annotation and improve model performance, domain adaptation (DA) methods transfer knowledge between different domains, effectively addressing both the difference in data distributions and the lack of training data [4], [5], [6]. Domain adaptation methods can be broadly divided into instance-based methods, feature-representation-based methods, and classifier-based methods. This paper focuses on feature-representation-based methods, which assume that data from related domains have shared implicit features [7], that is, that the marginal distributions of data in different domains can be matched [8]. These methods use feature space information to align the domain data through a mapping and thereby find the features shared between domains. A reliable machine learning model can then be constructed using source domain data with sufficient labels [9]. By transferring features between the source and target domain data, the features common to both domains are used to enrich the features of the target domain data, which relaxes the strict requirements the model places on the data and improves its performance [10].
Early feature-representation-based domain adaptation methods transform the source domain data and the target domain data into a latent space and construct a classifier from the latent feature representation of the target domain, as illustrated in Figure 1(a). Gheisari et al. [11] proposed using mapping functions to map data from both the source and target domains into the same space, minimizing classification errors while maximizing manifold consistency. Zheng et al. [12] sought a new dimensionality reduction method that finds a latent space by reducing the original data from different domains to a low-dimensional space with the smallest distribution distance, thereby transferring features from the source domain data to the target domain data. Blitzer et al. [13] used structural correspondence learning to model the correlation between data features in different domains, and used key features for discrimination. Jiang et al. [14] further introduced the latent space into multi-view data and extended its features to address the differences caused by multi-view data.
Subsequent methods consider that information from the original space can guide the construction of the target domain classifier, so they combine the original space information with the latent space features and use the common information to build a classifier [15], as shown in Figure 1(b). Dong et al. [16] proposed a feature transfer method in which the domain data share a low-dimensional latent space. The latent space is embedded into a Support Vector Machine (SVM), and the source domain data are constrained to have the same latent space as the target domain data, which improves the performance of the SVM. Yao et al. [17] extended this idea to multi-source adaptation and constrained the predicted label matrix. However, these methods do not consider the latent space features from the perspective of distribution consistency. Differences in distribution reduce the effectiveness of the shared features and thereby the performance of domain adaptation.
Inspired by the Residual Network [18], we propose a shared latent space domain adaptation method with a residual model (LRDA). It makes full use of the relationship between the original features and the latent space features of the domains, and addresses the mismatched distribution of latent space features in domain adaptation. Specifically, LRDA first maps the source and target domain data into a shared latent feature space and mines the features common to both domains, so that the source domain data can be fully used to enhance the feature representation of the target domain data. It then minimizes the distribution difference between the source and target domains in the shared feature space, so that the latent space features have a more consistent distribution and the performance loss caused by inconsistent feature distributions is reduced. Finally, it combines the original space features with the latent space features in both the source and target domains to form a residual model, which makes the model easier to fit and yields a better classifier, as shown in Figure 1(c).
The main contributions of this paper are:
1. We introduce a latent space shared by the source domain features and the target domain features, and construct a latent space domain adaptation method. We measure the difference between the latent space feature distributions and constrain the shared feature space to have a consistent distribution across domains, which aligns the feature distributions in the shared latent space.
2. We build a residual model from the original feature space and the latent feature space, and optimize the residual function to reduce the difficulty of feature transfer and improve model performance.
3. We adopt the $\ell_{2,1}$-norm for feature selection to obtain a sparse representation of the original features, which increases the robustness of the model to the outliers and noise that naturally exist in datasets.
4. Experiments verify that our method performs well with both shallow and deep features and effectively improves the performance of cross-domain visual recognition tasks.
2 LRDA
2.1 Notation & Definition
We denote column vectors by lowercase letters and matrices by capital letters. Let $X_s = [x_1^s, x_2^s, \ldots, x_{n_s}^s] \in \mathbb{R}^{d \times n_s}$ be the matrix of source domain data $x_i^s$, where $d$ is the feature dimension and $n_s$ is the number of source domain samples. Let $Y_s = [y_1^s; y_2^s; \ldots; y_{n_s}^s] \in \mathbb{R}^{n_s \times c}$ be the source domain label matrix, where $y_i^s \in \{0,1\}^{1 \times c}$ is the one-hot label of $x_i^s$ and $c$ is the number of categories. The source domain is denoted as $D_s = \{(x_i^s, y_i^s) \in X_s \times Y_s : i = 1, 2, \cdots, n_s\}$. In the same way, the target domain is $D_t = \{(x_j^t, y_j^t) \in X_t \times Y_t : j = 1, 2, \cdots, n_t\}$, with $X_t = [x_1^t, x_2^t, \ldots, x_{n_t}^t] \in \mathbb{R}^{d \times n_t}$ and $Y_t = [y_1^t; y_2^t; \ldots; y_{n_t}^t] \in \mathbb{R}^{n_t \times c}$ the target domain data matrix and target domain label matrix, respectively. Note that the target domain data usually lack complete labels, but the source and target domains share the same label set of cardinality $c$.
Assume that the data distributions are known, so that the marginal probability distributions of the source and target domain data can be written as $P(X_s)$ and $Q(X_t)$, respectively. The goal of domain adaptation is to design a robust model that predicts the labels of target domain data using samples from the source domain. In practice, $D_s$ and $D_t$ are neither identical nor unrelated, so their marginal probability distributions differ. Because the source and target domain data come from different domains, it is assumed that $P(X_s) \neq Q(X_t)$. The class-conditional distributions of the source and target domain data, however, are assumed to be consistent, i.e. $P(X_s \mid Y_s) = Q(X_t \mid Y_t)$.
2.2 Formulation of LRDA
We design an objective function for latent space residual domain adaptation that simultaneously achieves the following goals: (1) finding a shared latent space in which the source and target domain data have more features in common; (2) regularizing the source and target domain models to reduce the complexity of the model structure; and (3) constraining the distributions of the source and target domains to be consistent in the latent space. Because the target domain lacks sufficient labels, we define a label matrix $F = [f_1^t; f_2^t; \ldots; f_{n_t}^t] \in \mathbb{R}^{n_t \times c}$ for the target domain samples $X_t$. If sample $x_i^t$ has a label, we set $f_i^t = y_i^t$; otherwise $f_i^t$ is a pseudo label. Our model can therefore be expressed with the following objective function:
$$R = \alpha \big( L(f_s(X_s), Y_s) + L(f_t(X_t), F) \big) + \beta \big( \Omega_r(f_s) + \Omega_r(f_t) \big) + \gamma\, \Omega_d(X_s, X_t), \tag{1}$$
where $f_s$ and $f_t$ are the source and target domain classifiers, respectively. $L(\cdot)$ is the regression loss in the source and target domains. $\Omega_r$ is the regularization term of the source and target domains, which reduces the generalization error of the model. $\Omega_d$ aligns the data distributions of the source and target domains. $\alpha$, $\beta$ and $\gamma$ are regularization hyperparameters that adjust the weights of the terms.
Definition 1 (Residual model). For data $x$, the objective function of the residual model can be expressed as $f(x) = x + F(x)$, where $F(x) = f(x) - x$ is the residual function. The structure of the residual model is shown in Figure 2(a).
He et al. [18] showed that fitting the residual function is noticeably easier than fitting the objective function $f(x)$ directly. The problem can therefore be restated as follows: let a model $F(x)$ approximate the residual function $f(x) - x$, and use $x + F(x)$ to express the objective function.
Fig. 2 Comparison of three typical feature-representation-based domain adaptation algorithms.
Inspired by the residual model, we use the features of the source and target domains to construct a shared latent feature space, which constrains the classifier to integrate the features common to both domains. The latent feature space and the original feature space together form a residual model, which makes the model easier to fit and improves performance, as shown in Figure 2(b). Specifically, the classifier consists of two parts, the decision function on the original data and the decision function on the shared features:
$$\begin{aligned} f_s(X_s) &= W_s^T X_s + V_s^T \theta^T X_s, \\ f_t(X_t) &= W_t^T X_t + V_t^T \theta^T X_t, \end{aligned} \tag{2}$$
where $W_s$ and $W_t$ are the classification model matrices of the source domain and the target domain in the original space, and $\theta \in \mathbb{R}^{d \times r}$ is the shared latent space transformation matrix that maps the source and target domains to the same feature space, so that $\theta^T X_s \in \mathbb{R}^{r \times n_s}$ and $\theta^T X_t \in \mathbb{R}^{r \times n_t}$. $r$ is the dimension of the shared feature space, and $V_s$ and $V_t$ are the classification model matrices applied to $\theta^T X_s$ and $\theta^T X_t$, respectively. The source and target domains share information through $\theta$.
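The regression losses of the residual model classifiers are then:
$$\begin{aligned} L(f_s(X_s), Y_s) &= \left\| W_s^T X_s + V_s^T \theta^T X_s - Y_s^T \right\|^2, \\ L(f_t(X_t), F) &= \left\| W_t^T X_t + V_t^T \theta^T X_t - F^T \right\|^2, \end{aligned} \tag{3}$$
where the label matrices are transposed to match the $c \times n_s$ and $c \times n_t$ outputs of Equation (2).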
To improve the robustness of the model, we perform sparse regression on the features extracted from the source and target domains [19]. We apply the $\ell_{2,1}$-norm [20] to the classification model matrices of the original space to make them sparse, which gives the regularization term $\Omega_r$:
$$\begin{aligned} \Omega_r(f_s) &= \|W_s\|_{2,1} + \|W_s + \theta V_s\|^2, \\ \Omega_r(f_t) &= \|W_t\|_{2,1} + \|W_t + \theta V_t\|^2. \end{aligned} \tag{4}$$
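As a small illustration of why the $\ell_{2,1}$-norm in Equation (4) favours row-sparse matrices, the sketch below computes it for two hand-made $3 \times 2$ matrices. The matrices and the helper name `l21_norm` are assumptions for illustration only. Each row of $W$ corresponds to one input feature, so a zero row means that feature is not used.

```python
import numpy as np

def l21_norm(A):
    """Sum of the Euclidean norms of the rows of A."""
    return float(np.sum(np.linalg.norm(A, axis=1)))

# Same first row; the second matrix zeroes out the other two features entirely.
W_dense = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
W_row_sparse = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])

dense_value = l21_norm(W_dense)            # 5 + 1 + 2
sparse_value = l21_norm(W_row_sparse)      # 5 + 0 + 0
```

The penalty is paid per non-zero row, which is what pushes whole features out of the model rather than individual entries.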
Because $P(X_s) \neq Q(X_t)$ in general, the source and target domains have inconsistent distributions. Simply applying a source domain classifier to the target domain may therefore degrade model performance because of the misaligned domain distributions. How to minimize the distribution gap between different domains is a key issue for DA. We use the maximum mean discrepancy (MMD) [21], which computes the distance between the mean vectors of the data in a reproducing kernel Hilbert space, to measure the distribution difference between domains. Minimizing Equation (5) aligns the feature distributions in the shared space, so that the transformed source and target domains have a common latent space distribution:
$$\Omega_d(X_s, X_t) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \theta^T x_i^s - \frac{1}{n_t} \sum_{j=1}^{n_t} \theta^T x_j^t \right\|^2 = \mathrm{tr}(\theta^T X D X^T \theta), \quad \text{s.t. } \theta^T \theta = I_{r \times r}, \tag{5}$$
where
$$D_{i,j} = \begin{cases} \dfrac{1}{n_s^2}, & x_i, x_j \in X_s, \\ \dfrac{1}{n_t^2}, & x_i, x_j \in X_t, \\ -\dfrac{1}{n_s n_t}, & \text{otherwise,} \end{cases}$$
$X = [X_s, X_t]$, and $r$ is the dimension of the shared latent space. By minimizing Equation (5), the features in the latent space have similar distributions, which also ensures that the model is both discriminative and domain-invariant.
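The equality between the mean-difference form and the trace form in Equation (5) can be checked numerically. The sketch below is a minimal illustration on random toy data; the helper name `mmd_matrix` and all sizes are assumptions. It builds $D$ as the outer product of the vector with entries $1/n_s$ for source samples and $-1/n_t$ for target samples, which reproduces the three cases of $D_{i,j}$.

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """Build D with blocks 1/n_s^2, 1/n_t^2 and -1/(n_s * n_t)."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

rng = np.random.default_rng(0)
d, r, n_s, n_t = 5, 2, 7, 4                       # toy sizes for illustration
X_s = rng.normal(size=(d, n_s))
X_t = rng.normal(loc=1.0, size=(d, n_t))          # shifted mean mimics a domain gap
theta, _ = np.linalg.qr(rng.normal(size=(d, r)))  # orthonormal columns

X = np.hstack([X_s, X_t])
D = mmd_matrix(n_s, n_t)

trace_form = np.trace(theta.T @ X @ D @ X.T @ theta)
mean_gap = theta.T @ (X_s.mean(axis=1) - X_t.mean(axis=1))
norm_form = float(mean_gap @ mean_gap)
```

Since $X D X^T = (Xe)(Xe)^T$ and $Xe$ is the difference of the two domain means, the two quantities agree up to floating-point error.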
2.3 Final Formulation
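In summary, we propose a latent space domain adaptation method with a residual model. The algorithm schematic is shown in Figure 1(c). Combining Equations (3)-(5), the objective function is:
$$\begin{aligned} R_1 = \arg\min_{W_s, W_t, V_s, V_t, F, \theta} \; & \alpha \left( \left\| X_s^T W_s + X_s^T \theta V_s - Y_s \right\|^2 + \left\| X_t^T W_t + X_t^T \theta V_t - F \right\|^2 \right) \\ & + \beta \left( \|W_s\|_{2,1} + \|W_t\|_{2,1} + \|W_s + \theta V_s\|^2 + \|W_t + \theta V_t\|^2 \right) \\ & + \gamma\, \mathrm{tr}(\theta^T X D X^T \theta), \\ \text{s.t. } \; & \theta^T \theta = I_{r \times r}, \quad F F^T = I_{r \times r}. \end{aligned} \tag{6}$$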
The objective function is minimized over $W_s$, $W_t$, $V_s$, $V_t$, $F$ and $\theta$, and the resulting parameters are applied to the decision functions $f_s$ and $f_t$. We then fuse the two decision functions linearly to obtain the final decision function. For a target sample $x_i^t$,
$$y_i^t = \phi f_s(x_i^t) + (1 - \phi) f_t(x_i^t), \qquad m = \arg\max_{m'} \, (y_i^t)_{m'}, \tag{7}$$
where $m$ is the predicted class and $\phi \in [0, 1]$ is a hyperparameter that adjusts the weight between the source and target domain classifiers. For simplicity, we set $\phi = 0.5$.
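The fusion rule in Equation (7) can be sketched as follows, reading the final step as taking the class with the largest fused score. The score matrices are made-up numbers and `fuse_and_predict` is a hypothetical helper name, not part of the original method description.

```python
import numpy as np

def fuse_and_predict(scores_s, scores_t, phi=0.5):
    """Fuse two (c, n) score matrices and return one class index per column."""
    fused = phi * scores_s + (1.0 - phi) * scores_t
    return fused.argmax(axis=0)

# Two classes (rows), two target samples (columns); values are illustrative only.
scores_s = np.array([[0.9, 0.2],
                     [0.1, 0.8]])
scores_t = np.array([[0.3, 0.1],
                     [0.7, 0.9]])

labels = fuse_and_predict(scores_s, scores_t)                       # phi = 0.5
labels_target_only = fuse_and_predict(scores_s, scores_t, phi=0.0)  # ignore f_s
```

With $\phi = 0.5$ the first sample is assigned to class 0 because the source classifier is confident, while with $\phi = 0$ the target classifier alone assigns it to class 1.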
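For the convenience of the subsequent solution, we rewrite Equation (6) as:
$$\begin{aligned} R_2 = \arg\min_{U_s, U_t, V_s, V_t, F, \theta} \; & \alpha \left( \left\| X_s^T U_s - Y_s \right\|^2 + \left\| X_t^T U_t - F \right\|^2 \right) \\ & + \beta \left( \|U_s\|^2 + \|U_t\|^2 + \|U_s - \theta V_s\|_{2,1} + \|U_t - \theta V_t\|_{2,1} \right) \\ & + \gamma\, \mathrm{tr}(\theta^T X D X^T \theta), \end{aligned} \tag{8}$$
where $U_s = W_s + \theta V_s$ and $U_t = W_t + \theta V_t$ are auxiliary variables that simplify the calculation.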
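The change of variables between Equations (6) and (8) can be checked numerically. The sketch below evaluates both objectives on random toy data and confirms that they coincide when $U = W + \theta V$. It rests on several assumptions: the unsubscripted norm is read as the squared Frobenius norm, the latent term is written as $X^T \theta V$ so that the dimensions agree, the MMD term is computed in its mean-difference form, and all function names and sizes are illustrative.

```python
import numpy as np

def l21(A):
    """l2,1 norm: sum of the Euclidean norms of the rows."""
    return float(np.linalg.norm(A, axis=1).sum())

def sq(A):
    """Squared Frobenius norm (assumed reading of the unsubscripted norm)."""
    return float(np.sum(A * A))

def mmd(theta, Xs, Xt):
    """Squared distance between the projected domain means."""
    gap = theta.T @ (Xs.mean(axis=1) - Xt.mean(axis=1))
    return float(gap @ gap)

def objective_w(Xs, Ys, Xt, F, Ws, Wt, Vs, Vt, theta, alpha, beta, gamma):
    """Objective of Equation (6), written in terms of W_s and W_t."""
    fit = sq(Xs.T @ Ws + Xs.T @ theta @ Vs - Ys) + sq(Xt.T @ Wt + Xt.T @ theta @ Vt - F)
    reg = l21(Ws) + l21(Wt) + sq(Ws + theta @ Vs) + sq(Wt + theta @ Vt)
    return alpha * fit + beta * reg + gamma * mmd(theta, Xs, Xt)

def objective_u(Xs, Ys, Xt, F, Us, Ut, Vs, Vt, theta, alpha, beta, gamma):
    """Objective of Equation (8), written in terms of U_s and U_t."""
    fit = sq(Xs.T @ Us - Ys) + sq(Xt.T @ Ut - F)
    reg = sq(Us) + sq(Ut) + l21(Us - theta @ Vs) + l21(Ut - theta @ Vt)
    return alpha * fit + beta * reg + gamma * mmd(theta, Xs, Xt)

rng = np.random.default_rng(0)
d, r, c, ns, nt = 6, 3, 4, 8, 5                     # toy sizes
Xs, Xt = rng.normal(size=(d, ns)), rng.normal(size=(d, nt))
Ys = np.eye(c)[rng.integers(0, c, size=ns)]         # one-hot source labels
F = rng.uniform(size=(nt, c))                       # stand-in pseudo labels
Ws, Wt = rng.normal(size=(d, c)), rng.normal(size=(d, c))
Vs, Vt = rng.normal(size=(r, c)), rng.normal(size=(r, c))
theta, _ = np.linalg.qr(rng.normal(size=(d, r)))

args = dict(alpha=1.0, beta=0.1, gamma=0.5)
r1 = objective_w(Xs, Ys, Xt, F, Ws, Wt, Vs, Vt, theta, **args)
r2 = objective_u(Xs, Ys, Xt, F, Ws + theta @ Vs, Wt + theta @ Vt, Vs, Vt, theta, **args)
```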
3 Optimization Algorithm
To facilitate optimization, we follow the alternating parameter optimization methods used in fuzzy clustering and divide the six variables into four groups: $\{U_s, U_t\}$, $\{V_s, V_t\}$, $\{F\}$ and $\{\theta\}$. We optimize these groups iteratively until the variables converge or the objective function falls below a threshold. Next, we optimize each part of Equation (8) in turn.
3.1 Optimization Procedure
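3.1.1 Optimize $V_s$, $V_t$ by Fixing $U_s$, $U_t$, $F$ and $\theta$
Theorem 1. When $V_s = \theta^T U_s$ and $V_t = \theta^T U_t$, the objective function $R_2$ attains its minimum with respect to the variables $V_s$ and $V_t$.
Proof. Assuming that the variables $U_s$, $U_t$, $F$ and $\theta$ are known, Equation (8) reduces to:
$$R_3 = \arg\min_{V_s, V_t} \beta \left( \|U_s - \theta V_s\|_{2,1} + \|U_t - \theta V_t\|_{2,1} \right). \tag{9}$$
Following the definition in Nie et al., for a matrix $A \in \mathbb{R}^{n \times d}$ we have $\|A\|_{2,1} = 2\,\mathrm{tr}(A^T M A)$, where $M$ is a diagonal matrix with $M_{ii} = \frac{1}{2\|A_{i,:}\|_2}$. Let $M_s$ and $M_t$ be the matrices obtained in this way from $U_s - \theta V_s$ and $U_t - \theta V_t$, and let:
$$\begin{aligned} L_1 &= \arg\min_{V_s} \beta\, \mathrm{tr}\left( (U_s - \theta V_s)^T M_s (U_s - \theta V_s) \right), \\ L_2 &= \arg\min_{V_t} \beta\, \mathrm{tr}\left( (U_t - \theta V_t)^T M_t (U_t - \theta V_t) \right). \end{aligned} \tag{10}$$
Setting the derivatives of Equation (10) with respect to $V_s$ and $V_t$ to zero gives:
$$\frac{\partial L_1}{\partial V_s} = 0 \Rightarrow V_s = \theta^T U_s, \qquad \frac{\partial L_2}{\partial V_t} = 0 \Rightarrow V_t = \theta^T U_t. \tag{11}$$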
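The reweighting identity $\|A\|_{2,1} = 2\,\mathrm{tr}(A^T M A)$ used in the proof can be verified on a small hand-made matrix. In the sketch below, the helper name `reweighting_matrix` and the `eps` guard against zero rows are assumptions added for numerical safety; they are not part of the derivation.

```python
import numpy as np

def reweighting_matrix(A, eps=1e-12):
    """Diagonal M with M_ii = 1 / (2 * ||A_i,:||_2); eps guards zero rows (assumption)."""
    row_norms = np.linalg.norm(A, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

A = np.array([[3.0, 4.0],
              [0.0, 2.0],
              [1.0, 0.0]])
M = reweighting_matrix(A)

l21_direct = np.linalg.norm(A, axis=1).sum()   # 5 + 2 + 1
l21_trace = 2.0 * np.trace(A.T @ M @ A)        # 2 * (25/10 + 4/4 + 1/2)
```

The identity holds exactly only when $M$ is computed from the same $A$; inside an iterative scheme, $M$ is typically held fixed from the previous iterate, which turns the non-smooth term into a quadratic one.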
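3.1.2 Optimize $U_s$, $U_t$ by Fixing $V_s$, $V_t$, $F$ and $\theta$
Theorem 2. Given the optimal values from Theorem 1, when $U_s = A_s^{-1} B_s$ and $U_t = A_t^{-1} B_t$, the objective function $R_2$ attains its minimum with respect to the variables $U_s$ and $U_t$.
Proof. With $V_s$, $V_t$, $F$ and $\theta$ fixed, Equation (8) is equivalent to:
$$\begin{aligned} R_4 = \arg\min_{U_s, U_t} \; & \alpha \left( \left\| X_s^T U_s - Y_s \right\|^2 + \left\| X_t^T U_t - F \right\|^2 \right) \\ & + \beta \left( \|U_s\|^2 + \|U_t\|^2 + \|U_s - \theta V_s\|_{2,1} + \|U_t - \theta V_t\|_{2,1} \right). \end{aligned} \tag{12}$$
Let:
$$\begin{aligned} L_3 &= \arg\min_{U_s} \alpha \left\| X_s^T U_s - Y_s \right\|^2 + \beta \left( \|U_s\|^2 + \|U_s - \theta V_s\|_{2,1} \right), \\ L_4 &= \arg\min_{U_t} \alpha \left\| X_t^T U_t - F \right\|^2 + \beta \left( \|U_t\|^2 + \|U_t - \theta V_t\|_{2,1} \right). \end{aligned} \tag{13}$$
Setting the derivatives of Equation (13) with respect to $U_s$ and $U_t$ to zero, we have:
$$U_s = A_s^{-1} B_s, \qquad U_t = A_t^{-1} B_t, \tag{14}$$
where $A_s = \alpha X_s X_s^T + \beta I + \beta M_s$, $B_s = \alpha X_s Y_s + \beta M_s \theta V_s$, $A_t = \alpha X_t X_t^T + \beta I + \beta M_t$ and $B_t = \alpha X_t F + \beta M_t \theta V_t$.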
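The closed-form update in Equation (14) amounts to solving one linear system per domain. The sketch below is a minimal illustration on random toy data, reading the bare $\beta$ in $A$ as $\beta I$ (an assumption) and using a hypothetical helper `update_u`. It checks that the result is a stationary point of the smooth surrogate $\alpha\|X^T U - Y\|^2 + \beta\|U\|^2 + \beta\,\mathrm{tr}((U - \theta V)^T M (U - \theta V))$ with $M$ held fixed.

```python
import numpy as np

def update_u(X, Y, theta, V, M, alpha, beta):
    """Solve A U = B with A = alpha X X^T + beta I + beta M, B = alpha X Y + beta M theta V."""
    d = X.shape[0]
    A = alpha * (X @ X.T) + beta * np.eye(d) + beta * M
    B = alpha * (X @ Y) + beta * (M @ theta @ V)
    return np.linalg.solve(A, B)          # avoids forming A^{-1} explicitly

rng = np.random.default_rng(0)
d, r, c, n = 6, 3, 4, 10                  # toy sizes for illustration
X = rng.normal(size=(d, n))
Y = np.eye(c)[rng.integers(0, c, size=n)]             # one-hot labels, shape (n, c)
theta, _ = np.linalg.qr(rng.normal(size=(d, r)))
V = rng.normal(size=(r, c))
previous_residual = rng.normal(size=(d, c))            # stands in for U - theta V from the last iterate
M = np.diag(1.0 / (2.0 * np.linalg.norm(previous_residual, axis=1)))
alpha, beta = 1.0, 0.1

U = update_u(X, Y, theta, V, M, alpha, beta)
# Gradient of the surrogate at U; it should vanish at the solution.
grad = 2 * alpha * X @ (X.T @ U - Y) + 2 * beta * U + 2 * beta * M @ (U - theta @ V)
```

Using `np.linalg.solve` rather than an explicit inverse is a common numerical choice; $A$ is symmetric positive definite whenever $\beta > 0$, so the system has a unique solution.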
3.1.3 Optimize $F$ by Fixing $U_s$, $U_t$, $V_s$, $V_t$ and $\theta$
Theorem 3. Given the optimal values from Theorem 1 and Theorem 2, the objective function $R_2$ attains its minimum when $F = X_t^T U_t$.
Proof. Assuming that the variables $U_s$, $U_t$, $V_s$, $V_t$ and $\theta$ are known, Equation (8) simplifies to:
$$R_5 = \arg\min_{F} \alpha \left\| X_t^T U_t - F \right\|^2. \tag{15}$$
Setting the derivative of Equation (15) with respect to $F$ to zero gives the predicted labels $F = X_t^T U_t$.
3.1.4 Optimize $\theta$ by Fixing $U_s$, $U_t$, $V_s$, $V_t$ and $F$
Theorem 4. When the variables $U_s$, $U_t$, $V_s$, $V_t$ and $F$ take the optimal solutions from Theorems 1-3, solving for $\theta$ can be transformed into a singular value decomposition (SVD) problem for the matrix $O$.
Proof. With $U_s$, $U_t$ and $F$ fixed, the optimal $\theta$ is obtained by solving:
$$R_6 = \arg\min_{V_s, V_t, \theta} \beta \left( \|U_s - \theta V_s\|_{2,1} + \|U_t - \theta V_t\|_{2,1} \right) + \gamma\, \mathrm{tr}(\theta^T X D X^T \theta). \tag{16}$$
To minimize Equation (16), let $V_s = \theta^T U_s$ and $V_t = \theta^T U_t$. Equation (16) can then be rewritten as the generalized eigen-decomposition problem:
$$\begin{aligned} R_7 &= \arg\min_{\theta} 2\beta\, \mathrm{tr}\left( -\theta^T U U^T N \theta - \theta^T N U U^T \theta \right) + \gamma\, \mathrm{tr}(\theta^T X D X^T \theta) \\ &= \arg\max_{\theta} \mathrm{tr}\left( \theta^T O O^T \theta \right), \quad \text{s.t. } \theta^T \theta = I_{r \times r}, \end{aligned} \tag{17}$$
where $U = [U_s, U_t]$ and $O O^T = 2\beta U U^T N + 2\beta N U U^T - \gamma X D X^T$. According to matrix theory, the solution of the above problem can be obtained by the singular value decomposition of the matrix $O$. Let $O = E \Sigma G^T$ with the diagonal elements of $\Sigma$ sorted in descending order; the first $r$ rows of $E^T$ give the optimal solution for $\theta$.
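The last step of the proof, maximizing $\mathrm{tr}(\theta^T O O^T \theta)$ under $\theta^T \theta = I$, can be sketched in NumPy. The example assumes that a matrix $O$ is already available and uses a random stand-in for it; how $O$ is built from $U$, $N$ and $X D X^T$ is not covered here. The helper name `solve_theta` is hypothetical.

```python
import numpy as np

def solve_theta(O, r):
    """Return the r leading left singular vectors of O (as columns) and all singular values."""
    E, sigma, _ = np.linalg.svd(O, full_matrices=False)  # sigma is sorted in descending order
    return E[:, :r], sigma

rng = np.random.default_rng(0)
d, r = 6, 2
O = rng.normal(size=(d, d))               # random stand-in for the matrix O
theta, sigma = solve_theta(O, r)

best = np.trace(theta.T @ O @ O.T @ theta)
# Compare against an arbitrary orthonormal candidate of the same shape.
Q, _ = np.linalg.qr(rng.normal(size=(d, r)))
other = np.trace(Q.T @ O @ O.T @ Q)
```

The maximum equals the sum of the $r$ largest squared singular values of $O$, and any other matrix with orthonormal columns gives a value that is no larger.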
3.2 Algorithm Description
According to the above optimization rules, the method adopts an iterative learning strategy to update and optimize the parameters. The procedure is described in Algorithm 1.
Algorithm 1 Latent Space Domain Adaptation with Residual Model
Require: Source domain data and labels $X_s \times Y_s$, target domain data and labels $X_t \times Y_t$, regularization hyperparameters $\alpha$, $\beta$, $\gamma$, shared feature dimension $r$, the maximal iteration number $N$.
1: Set $v = 0$; initialize $U_s$, $U_t$, $V_s$, $V_t$, $\theta$ as matrices with entries drawn from the uniform distribution $U[0,1]$; set the threshold $\varepsilon$ of the objective function
2: while $v < N$ and $R_2 \geq \varepsilon$ do
3: Compute $V_s = \theta^T U_s$ and $V_t = \theta^T U_t$
4: Compute $A_s = \alpha X_s X_s^T + \beta I + \beta M_s$ and $B_s = \alpha X_s Y_s + \beta M_s \theta V_s$
5: Compute $A_t = \alpha X_t X_t^T + \beta I + \beta M_t$ and $B_t = \alpha X_t F + \beta M_t \theta V_t$
6: Compute $U_s = A_s^{-1} B_s$ and $U_t = A_t^{-1} B_t$
7: Compute $F = X_t^T U_t$
8: Compute $O O^T = 2\beta U U^T N + 2\beta N U U^T - \gamma X D X^T$ and let $O = E \Sigma G^T$ with the diagonal elements of $\Sigma$ sorted in descending order; the first $r$ rows of $E^T$ give the optimal solution for $\theta$
9: Calculate the objective function $R_2$ according to Equation (8)
10: Let $v = v + 1$
11: end while
4 Experiment and Analysis
To verify the effectiveness of LRDA, we run experiments on six visual databases as benchmark domain adaptation datasets and compare LRDA with existing related methods.
4.1 Datasets
1. Office31 [22] contains images from three domains: Amazon (A), DSLR (D), and Webcam (W), covering 31 categories. Following [23], we use AlexNet-FC7 features obtained after fine-tuning.
2. Office-Caltech contains the 10 categories shared by Caltech-256 (C) [24] and Office31, and each category contains more than 80 images. The numbers of images from the sources C, A, W and D are 1123, 958, 295 and 157, respectively. Office-Caltech provides SURF features and DeCAF features [25].
3. Office-Home [26] includes 4 domains: Art (sketches, paintings, decorations and other forms, A), Clipart (collections of clipart images, Cl), Product (object images without a background, Pr) and Real-World (object images taken with ordinary cameras, Rw). Each domain consists of 65 categories.
4. PIE [27] is a face dataset covering pose, illumination and expression, including 68 people under 13 poses. Following [28], we select C05 (left), C07 (upward), C09 (down), C27 (front), and C29 (right).
5. MNIST-USPS is composed of the handwritten digit datasets MNIST [29] and USPS [30]. Consistent with [30], we select 2000 and 1800 images from MNIST and USPS, respectively, and the image size is 16 × 16.
6. COIL20 [31] contains 20 objects. Each object is rotated through 360° horizontally and an image is captured every 5°, giving 72 images per object. According to the shooting direction, the database is divided into two subsets, COIL1 and COIL2. Specifically, COIL1 includes all images captured in the directions [0°, 85°] and [180°, 265°], while COIL2 includes the remaining directions.
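The COIL20 split described in the list above can be reproduced with a few lines of arithmetic. The sketch below assumes that both angle intervals are inclusive and measured in degrees; it only counts views and does not load any images.

```python
# One view every 5 degrees over a full turn: 72 views per object.
angles = list(range(0, 360, 5))

# COIL1 takes the two inclusive ranges [0, 85] and [180, 265]; COIL2 takes the rest.
coil1_angles = [a for a in angles if 0 <= a <= 85 or 180 <= a <= 265]
coil2_angles = [a for a in angles if a not in coil1_angles]

num_objects = 20
coil1_images = num_objects * len(coil1_angles)
coil2_images = num_objects * len(coil2_angles)
```

Each subset receives 36 views per object, which gives 720 images per subset and agrees with the COIL20 rows of Table 1.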
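Table 1 Details of six domain adaptation datasets
Database Subset Abbr. Images Feature(size) Classes
Office31 Amazon A 2817 AlexNet-FC7(4096) 31
Office31 DSLR D 498 AlexNet-FC7(4096) 31
Office31 Webcam W 795 AlexNet-FC7(4096) 31
Office-Caltech Amazon A 958 SURF(800),DeCAF6(4096) 10
Office-Caltech Caltech C 1123 SURF(800),DeCAF6(4096) 10
Office-Caltech DSLR D 157 SURF(800),DeCAF6(4096) 10
Office-Caltech Webcam W 295 SURF(800),DeCAF6(4096) 10
Office-Home Art A 2421 ResNet101-P5,VGG-FC7 65
Office-Home Clipart Cl 4379 ResNet101-P5,VGG-FC7 65
Office-Home Product Pr 4428 ResNet101-P5,VGG-FC7 65
Office-Home Real-World Rw 4357 ResNet101-P5,VGG-FC7 65
PIE C05 P1 3332 Pixel(1024) 68
PIE C06 P2 1629 Pixel(1024) 68
PIE C07 P3 1632 Pixel(1024) 68
PIE C08 P4 3329 Pixel(1024) 68
PIE C09 P5 1632 Pixel(1024) 68
MNIST-USPS MNIST M 2000 VGG-FC7 10
MNIST-USPS USPS U 1800 VGG-FC7 10
COIL20 COIL1 C1 720 VGG-FC7 20
COIL20 COIL2 C2 720 VGG-FC7 20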
4.2 Benchmark Methods & Experimental Settings
To verify the effectiveness of LRDA comprehensively, we compare it with shallow and deep unsupervised domain adaptation methods. The shallow methods are: SVM, GFK [32], JDA [28], DIP [33], CDDA [34], JGSA [25], SA [35], CORAL [23], ATI [36], DICE [37]. The deep methods are: DDC [38], DAN [39], DANN [40], DRCN [41], RTN [42], WDAN [43], JAN [44], ADDA [45], AutoDIAL [42], CAN [46], SFDA [9], GVB-GD [47], GSDA [48], SRDC [49], DGA-DA [50]. In addition, we re-run the public code of JGSA [25] and CORAL [23], and re-implement DGA-DA. The results of the other methods are taken from the corresponding papers. We run the CORAL source code with LIBSVM, and for DICE we report its best-performing variant, DICESVM.
Parameter Setting LRDA has three hyperparameters to set, which balance the importance of the residual classifiers, feature selection, and feature distribution alignment. These three parameters therefore have a crucial impact on the final performance of the algorithm. How to set them is still an open issue in machine learning. Following Tao et al. [51], we use a grid search strategy. Specifically, we search each hyperparameter over the grid $\{10^{-6}, 10^{-5}, \ldots, 10^{5}, 10^{6}\}$ to find the parameter combination with the highest accuracy. The maximum number of iterations is set to 100. To achieve the best results, the parameters of all compared algorithms are also searched and tuned.
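The grid search over the three hyperparameters can be sketched as below. This is a minimal illustration rather than the authors' tuning script: `grid_search` and `toy_accuracy` are hypothetical names, and `toy_accuracy` is a stand-in score with a known peak, used only so that the example runs without training a model. With 13 values per hyperparameter, the full grid has $13^3 = 2197$ combinations.

```python
import itertools
import math

EXPONENTS = range(-6, 7)   # 10^-6, 10^-5, ..., 10^6: 13 values per hyperparameter

def grid_search(evaluate):
    """Try every (alpha, beta, gamma) on the grid and keep the best-scoring exponents."""
    best_score, best_exponents = float("-inf"), None
    for exps in itertools.product(EXPONENTS, repeat=3):
        alpha, beta, gamma = (10.0 ** e for e in exps)
        score = evaluate(alpha, beta, gamma)
        if score > best_score:
            best_score, best_exponents = score, exps
    return best_exponents, best_score

def toy_accuracy(alpha, beta, gamma):
    """Stand-in score peaking at (1e-2, 1, 1e3); NOT a real model evaluation."""
    return -(abs(math.log10(alpha) + 2) + abs(math.log10(beta)) + abs(math.log10(gamma) - 3))

num_combinations = len(list(itertools.product(EXPONENTS, repeat=3)))
best_exponents, best_score = grid_search(toy_accuracy)
```

In practice `evaluate` would train the model with the given weights and return a validation accuracy, so the cost of the search is 2197 training runs per task.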
Performance evaluation Because the target domain data have no labels, standard cross validation cannot be performed. To address this issue, we conduct p-fold cross validation on the labeled source domain data: the model is trained on p-1 folds of source domain data together with the target domain data, and the average accuracy over all categories is then computed on the remaining fold of source domain data. The optimal parameters are obtained when
Table 2 Recognition accuracies(%) on the Office31 with shallow DA methods
Method A→D A→W D→A D→W W→A W→D Avg.
SVM 57.8 56.9 47.2 95.8 45.5 98.6 67.0
GFK 58.2 59.4 45.9 95.6 43.8 98.6 66.9
JDA 66.5 68.8 56.3 97.7 53.5 99.6 73.7
DIP 56.0 51.9 44.0 95.3 42.3 98.8 64.7
CDDA 64.1 65.2 55.0 97.2 53.8 99.8 72.5
JGSA 67.5 62.3 55.6 98.1 52.0 99.8 72.5
SA 59.4 57.7 47.2 95.1 46.5 99.0 67.5
CORAL 60.4 57.0 47.6 96.2 46.3 99.0 67.8
ATI 70.3 68.7 55.3 95.0 56.9 98.7 74.2
DICE 68.5 72.5 58.1 97.2 60.3 100 76.1
LRDA 67.8 74.5 59.0 98.4 60.8 98.9 76.6
Table 3 Recognition accuracies(%) on the Office31 with deep DA methods
Method A→D A→W D→A D→W W→A W→D Avg.
DDC 64.4 61.8 52.1 95.0 52.2 98.5 70.6
DAN 67.0 68.5 54.0 96.0 53.1 99.0 72.9
DANN 72.3 73 53.4 96.4 51.2 99.2 74.3
DRCN 66.8 68.7 56 96.4 54.9 99.0 73.6
RTN 71.0 73.3 50.5 96.8 51.0 99.6 73.7
WDAN 64.5 66.8 53.8 95.9 52.7 98.7 72.1
JAN 71.8 74.9 58.3 96.6 55.0 99.5 76.0
ADDA 72.4 75.1 58.8 97.0 57.3 99.6 76.7
AutoDIAL 73.6 75.5 58.1 96.6 59.4 99.5 77.1
CAN 74.6 79.3 60.1 97.0 58.7 97.0 77.8
GVB-GD 75.0 74.8 53.4 98.7 53.7 100 75.9
GSDA 74.8 75.7 53.5 99.1 54.9 100 76.3
SRDC 75.8 75.7 56.7 99.2 57.1 100 77.4
SFDA 76.0 74.2 56.6 98.5 55.5 99.8 76.8
LRDA 74.4 76.8 58.7 99.1 58.8 99.5 77.9
Table 4 Recognition accuracies(%) on Office-Caltech with SURF features
Data JDA DIP OTGL CDDA JGSA SA CORAL DICE DGA-DA LRDA
A→C 39.4 40.0 34.6 39.6 41.5 44.3 45.1 44.3 38.0 45.2
A→D 39.5 36.9 38.9 38.2 45.2 36.3 39.5 52.2 42.0 52.1
A→W 38.0 35.9 37.0 46.1 45.1 38.3 44.4 53.2 52.2 54.1
C→A 44.8 40.8 44.2 49.4 53.1 54.8 54.3 53.8 50.3 54.6
C→D 45.2 45.2 44.5 49.7 48.4 45.2 36.3 51.6 52.9 50.9
C→W 41.7 37.3 38.9 38.6 48.5 44.4 38.6 54.9 45.8 55.2
D→A 33.1 31.5 37.2 32.8 38.7 39.4 37.7 42.6 37.1 43.7
D→C 31.5 30.6 32.4 33.7 30.3 34.3 33.8 34.2 33.5 34.5
D→W 89.5 83.7 81.1 89.8 93.2 85.1 84.7 84.1 91.2 91.2
W→A 32.8 27.6 39.4 36.7 40.8 36.3 35.9 39.5 32.4 40.5
W→C 31.2 28.8 36.0 32.0 33.6 33.2 33.7 38.6 31.2 38.7
W→D 89.2 91.7 84.0 91.1 88.5 83.4 86.6 87.3 91.7 88.7
Avg. 46.3 44.2 45.7 48.1 50.6 47.9 47.6 53.0 49.8 54.1
4.3 Results of Cross Domain Visual Recognition
4.3.1 Results on the Office31 Dataset
The comparison between LRDA and shallow domain adaptation methods on the Office31 dataset is shown in Table 2, where LRDA achieves the best recognition accuracy on most tasks. The results of deep domain adaptation methods are shown in Table 3. Note that, whether compared with shallow or deep models, the recognition accuracies of LRDA on tasks A→D and W→D are slightly lower than those of some comparison methods. The reason is that LRDA needs a certain amount of target domain data to inform the model's decisions; when such data are limited, the model relies more heavily on the source domain data. Overall, LRDA performs well against both shallow and deep models, with average accuracies of 76.6% and 77.9% respectively, indicating its good applicability.
Table 5 Recognition accuracies(%) on Office-Caltech with DeCAF6 features
Data GFK JDA DIP OTGL DGA-DA SA CORAL DICE JDO ATI RTN LRDA
A→C 82.1 85 78.9 85.5 86.7 85.0 83.6 87.6 85.2 86.5 87.8 86.8
A→D 82.2 86.6 80.9 85.0 89.2 87.3 84.7 91.1 87.9 92.8 92.9 91.5
A→W 73.2 83.1 69.8 83.1 86.4 76.6 74.2 88.1 84.8 88.7 93.8 92.1
C→A 92.1 91 89.8 92.2 92.3 92.2 91.2 93.4 91.5 93.8 93.2 93.5
C→D 92.4 87.9 84.7 87.3 92.4 88.5 87.9 95.5 89.8 89.6 93.9 93.4
C→W 84.4 82.4 72.2 84.2 89.8 82.0 80.7 95.3 88.8 93.6 96.6 96.9
D→A 88.4 91 85.3 92.3 92.4 85.6 83.8 92.5 88.1 93.4 93.6 93.2
D→C 80.3 85.1 75.5 84.1 86.5 76.9 71.6 88.5 84.3 85.9 83.4 88.8
D→W 99 99.7 98.6 96.3 99.0 96.9 97.6 99.0 96.6 98.9 98.6 98.4
W→A 84.3 91.1 72.4 90.6 90.7 83.6 72.1 91.1 90.7 93.6 92.7 92.9
W→C 76.5 85.3 70.3 81.5 85.6 74.3 67.4 88.0 82.6 86.3 84.8 85.4
W→D 100 100 99.4 96.3 100 99.4 100 100 98.1 100 100 100
Avg. 86.2 89.1 81.5 88.2 90.9 85.7 82.9 92.5 89.0 91.9 92.6 92.7
4.3.2 Results on the Office-Caltech Dataset
This section reports cross-domain recognition experiments on the Office-Caltech dataset, following the settings of [52]. The results with SURF and DeCAF6 features are shown in Table 4 and Table 5, respectively. With SURF features, LRDA achieves the highest recognition accuracy on 10 cross-domain recognition tasks, and is only slightly lower than DICE and DGA-DA on the C→D and W→D tasks. In addition, LRDA achieves the highest average accuracy, demonstrating the effectiveness of the method. With DeCAF6 features, the accuracy of LRDA is above 85% in all settings and its average accuracy is the best. Compared with RTN, which is based on the deep learning model AlexNet, LRDA achieves higher accuracy in five experimental settings and is 5.4% higher than RTN on D→C, a significant performance improvement. Because the distributions of images of different concepts differ significantly, it is difficult to align the domains; LRDA can further explore the features common to different domains, thereby enhancing the discriminative ability of the classifier.
4.3.3 Results on the Office-Home Dataset
Table 6and Table 7show the cross domain recognition results on Office-Home dataset
with ResNet101-P5 features. We fine-tune the parameters on the ResNet101 and
obtain the results by inputting the extracted fifth feature pooling layer into the unsu-
pervised domain for adaptive learning. Notice that LRDA achieves the best recognition
performance in five domain recognition tasks, with an average recognition accuracy
rate 0.2% lower than DICE. We consider that DICE excluding the features of differ-
ent types of data while incorporating the distribution of similar data. Table 6 shows
the cross domain recognition results on Office-Home with VGG-F7 features. This
experiment compares both shallow and deep models, and LRDA is the second-best per-
forming methods in average accuracy that is only inferior to JAN. Counter-intuitively,
LRDA achieves better result even than deep methods (e.g., DAN and DANN). In addi-
tion, the performance of GSDA, GVB-GD, SRDC, and SFDA methods is superior to
14
Table 6 Recognition accuracies(%) with on the Office-Home ResNet101-P5 features
Data GFK JDA DIP CDDA JGSA DGA-DA CNN SA CORAL DICE LRDA
Ar→Cl 35.8 40.5 35.5 40.8 40.8 40.8 36.0 36.7 36.3 42.6 42.5
Ar→Pr 54.4 58.9 54.3 57.7 58.2 57.7 53.7 54.7 54.1 61.1 60.9
Ar→Rw 65.0 67.5 64.9 66.3 67.5 66.3 64.6 65.3 65.3 68.3 68.6
Cl→Ar 39.2 40.8 39.5 41.3 40.8 41.3 39.2 39.9 39.2 43.3 43.1
Cl→Pr 48.4 51.9 48.0 51.7 52.0 51.7 48.9 48.3 47.9 54.3 54.7
Cl→Rw 51.8 55.2 51.9 53.9 54.6 53.9 51.7 51.4 51.5 57.1 57.5
Pr→Ar 42.3 45.1 41.9 46.1 45.3 46.1 41.2 41.4 41.5 48.3 48.4
Pr→Cl 32.5 33.3 32.1 35.4 33.5 35.4 32.9 33.0 32.7 35.9 35.5
Pr→Rw 64.1 67.2 63.7 66.0 66.4 66.0 63.2 64.4 64.1 69.2 68.8
Rw→Ar 58.1 58.8 58.2 59.1 58.5 59.1 57.9 57.8 57.7 60.2 59.3
Rw→Cl 39.5 44.2 39.2 45.3 43.8 45.3 39.5 40.0 39.5 46.2 46.0
Rw→Pr 69.6 72.4 69.6 71.6 72.4 71.6 68.9 69.5 69.3 73.5 72.7
Avg. 50.1 53.0 49.9 52.9 52.8 52.9 49.8 50.2 49.9 55.0 54.8
Table 7 Recognition accuracies(%) on the Office-Home with VGG-F7 features
Data GSDA GVB-GD SRDC SFDA JAN CDDA JGSA DICE DAN DANN LRDA
Ar→Cl 61.3 57.0 52.3 59.7 45.9 42.6 41.9 43.5 43.6 45.6 44.4
Ar→Pr 76.1 74.7 76.3 79.5 61.2 62.3 63.1 64.2 57.0 59.3 64.2
Ar→Rw 79.4 79.8 81.0 82.4 68.9 69.2 70.1 70.5 67 9 70.1 70.6
Cl→Ar 65.4 64.6 69.5 69.7 50.4 46.3 46.5 46.7 45.8 47.0 46.1
Cl→Pr 73.3 74.1 76.2 78.6 59.7 52.9 54.1 54.6 56.5 58.5 56.9
Cl→Rw 74.3 74.6 78.0 79.2 61.0 57.8 59.0 59.4 60.4 60.9 59.5
Pr→Ar 65.0 65.2 68.7 66.1 45.8 50.5 50.4 51.8 44 46.1 50.7
Pr→Cl 53.2 55.1 53.8 57.2 43.4 43.0 42.0 43.9 43.6 43.7 44.1
Pr→Rw 80.0 81.0 81.7 82.6 70.3 69.6 70.5 71.8 67.7 68.5 69.8
Rw→Ar 72.2 74.6 76.3 73.9 63.9 63.1 62.3 64.1 63.1 63.2 63.7
Rw→Cl 60.6 59.7 57.1 60.8 52.4 47.9 46.2 48 51.5 51.8 52.4
Rw→Pr 83.1 84.3 85 85.5 76.8 73.8 73.1 74.3 74.3 76.8 74.7
Avg. 70.3 70.4 71.3 72.9 58.3 56.6 56.6 57.7 56.3 57.6 58.1
that of LRDA in some tasks. These methods consider the differences between
different categories of data or maintain the distribution structure of intra-class
data when extracting domain features, which improves their performance.
Nevertheless, LRDA still performs well, in particular outperforming the deep methods
DAN and DANN. This suggests that shallow methods retain advantages in domain
adaptation tasks: they reduce algorithm complexity and improve model efficiency
while preserving accuracy.
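As a quick check of the averages quoted for Table 6 earlier in this subsection, the following Python sketch recomputes the mean accuracy of the DICE and LRDA columns from their per-task values. It assumes that "Avg." is the unweighted mean over the twelve tasks, which the text does not state explicitly; the helper name `mean_accuracy` is introduced here only for illustration.

```python
# Per-task accuracies (%) copied from the DICE and LRDA columns of Table 6,
# in row order Ar->Cl, Ar->Pr, Ar->Rw, Cl->Ar, ..., Rw->Pr.
dice = [42.6, 61.1, 68.3, 43.3, 54.3, 57.1, 48.3, 35.9, 69.2, 60.2, 46.2, 73.5]
lrda = [42.5, 60.9, 68.6, 43.1, 54.7, 57.5, 48.4, 35.5, 68.8, 59.3, 46.0, 72.7]


def mean_accuracy(values):
    """Return the unweighted mean over tasks, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


dice_avg = mean_accuracy(dice)
lrda_avg = mean_accuracy(lrda)
# Gap between the two averages, in percentage points.
gap = round(dice_avg - lrda_avg, 1)

print(dice_avg, lrda_avg, gap)
```

Under this assumption the recomputed averages match the "Avg." row of Table 6 and the 0.2% gap mentioned in the text.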
4.3.4 Results on the PIE Dataset
In the cross-domain facial recognition task, this experiment uses two data
preprocessing methods (l2 normalization and z-score standardization) [37]. The
experimental settings are divided into two groups, and the recognition accuracies are
shown in Tables 8 and 9. In the experiment using l2 normalization (Table 8), LRDA
achieves the highest recognition accuracy in four tasks compared to shallow
Table 8 Recognition accuracies (%) on the PIE dataset with l2 normalization
Data DIP OTGL CDDA JGSA SA CORAL DGA-DA GFK JDA CDDA DICE LRDA
P1→P2 29.9 59.4 76.3 62.2 32.8 31.8 76.4 42.2 76.1 77.5 85.1 83.6
P1→P3 32.8 58.7 72.3 60.0 34.5 31.9 72.5 53.7 74.1 77 86.9 86.4
P1→P4 36.7 0 92.1 80.6 43.5 41.8 92.1 69.8 92.8 90.3 96.2 95.9
P1→P5 12.7 48.4 60.7 45.1 22.5 19.9 60.8 43.9 70.8 67.8 71.8 72.0
P2→P1 25.8 61.9 77.0 68.2 27.7 26.6 77.0 43.2 79.8 77.3 76.8 77.1
P2→P3 53.4 64.4 77.5 64.9 37.3 35.0 77.5 54.0 80.2 81.1 79.1 78.6
P2→P4 50.1 0 87.1 77.6 58.5 59.7 87.1 69.1 90.4 88.5 93.5 93.6
P2→P5 29.5 52.7 64.3 52.3 27.1 25.9 63.6 42.6 68.3 70.4 72.3 71.6
P3→P1 22.7 57.9 80.8 62.9 29.1 25.1 80.8 51.6 78.3 81.2 80.1 80.7
P3→P2 36.3 64.7 72.2 60.3 37.0 36.5 72.2 52.0 81.2 82.0 77.7 80.4
P3→P4 45.8 0 84.7 71.0 54.8 54.0 84.7 72.1 92.3 91.7 95.1 94.8
Avg. 34.2 42.6 76.8 64.1 36.8 35.3 76.8 54.0 80.4 80.4 83.2 83.2
Table 9 Recognition accuracies(%) on the PIE Dataset with z-score standardization
Data DIP OTGL CDDA JGSA SA CORAL DGA-DA GFK JDA CDDA DICE LRDA
P3→P5 20.2 52.8 64.3 51.2 30.5 26.0 64.5 50.9 70.5 79.9 78.1 78.6
P4→P1 31.4 0 93.6 84.4 52.4 48.3 93.4 72.6 95.1 89.7 96.8 96.1
P4→P2 67.5 0 93.2 83.5 70.0 69.7 93.2 75.8 94.7 94.8 96.6 95.7
P4→P3 76.8 0 92.2 80.8 72.7 72.7 92.2 80.9 92.4 92.1 94.5 94.0
P4→P5 36.5 0 74.0 65.9 48.6 48.5 74.0 61.1 80.8 85.1 90.4 89.8
P5→P1 14.2 45.7 68.1 53.5 34.5 32.0 67.7 45.3 64.1 67.3 79.4 78.8
P5→P2 29.3 51.3 65.1 57.5 30.9 30.4 65.4 38.9 74.2 74.5 71.4 72.2
P5→P3 31.7 52.6 70.5 54.3 31.9 32.6 71.6 47.7 75.3 79.3 82.7 82.3
P5→P4 26.3 0 79.7 62.3 45.1 44.5 79.7 59.3 81.7 80.3 89.5 88.9
Avg. 37.1 22.5 77.9 65.9 46.3 45.0 78.0 59.2 81.0 82.3 86.6 86.3
and deep methods. Moreover, LRDA attains the highest average accuracy, tied with
DICE at 83.2%. In the z-score standardization experiment (Table 9), LRDA achieves
recognition performance comparable to DICE and far superior to the other methods.
Because LRDA focuses on how to effectively convert features between the source and
target domains, it may also align interfering features from the background area of
the facial images. Nevertheless, the recognition accuracy of LRDA on the PIE dataset
remains comparable to that of the best method.
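To make the two preprocessing options concrete, the sketch below implements l2 normalization and z-score standardization in plain Python. It is only an illustration under stated assumptions: the text does not say along which axis each operation is applied, so we assume l2 normalization per sample (row) and z-score standardization per feature (column) with the population standard deviation. The function names and the toy matrix are ours, not taken from the paper or from [37].

```python
import math
import statistics


def l2_normalize(samples):
    """Scale each sample (row) to unit Euclidean length (assumed per-sample)."""
    normalized = []
    for row in samples:
        norm = math.sqrt(sum(v * v for v in row))
        # Leave all-zero rows unchanged to avoid division by zero.
        normalized.append([v / norm for v in row] if norm > 0 else list(row))
    return normalized


def zscore_standardize(samples):
    """Shift and scale each feature (column) to zero mean and unit variance."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in samples
    ]


# Toy matrix with 3 samples and 2 features (made-up values, not PIE data).
X = [[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]]
X_l2 = l2_normalize(X)
X_z = zscore_standardize(X)
```

After `l2_normalize` every row has unit norm, so only the direction of each feature vector is kept; after `zscore_standardize` every column has zero mean and unit variance, so features with large raw ranges no longer dominate.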
4.3.5 Results on the MNIST-USPS & COIL20 Dataset
In order to further demonstrate the advantages of the proposed method in cross-domain
adaptation tasks on digital images, Table 10 shows the cross-domain recognition
performance on the MNIST-USPS and COIL20 datasets. As can be seen from the
first three rows, LRDA achieves the highest accuracy on the U→M task as well as the
highest average accuracy. Nevertheless, the performance of LRDA is slightly lower than
that of DGA-DA and DICE on the M→U task, possibly due to the significant differences
between these two domains. In the M→U task, the feature distributions are difficult to
bring close together, and the feature projection methods of DGA-DA and DICE are better
suited to this type of data. The comparison results in the last three rows of Table 10
indicate that the performance of LRDA is
Table 10 Recognition accuracies (%) on the MNIST-USPS and COIL20 datasets
Data DIP OTGL CDDA JGSA SA CORAL DGA-DA GFK JDA CDDA DICE LRDA
M→U 68.6 67.8 70.6 67.1 76.2 71.2 80.4 82.3 35.8 79.7 82.1 80.6
U→M 50.1 48.8 60.0 46.3 62.1 54.9 68.2 70.8 36.4 59.8 68.8 71.1
MU-Avg. 59.3 58.3 65.3 56.7 69.1 63.0 74.3 76.5 36.1 69.8 75.4 75.8
C1→C2 86.4 86.8 94.7 84.6 91.5 86.9 95.4 99.6 82.1 92.5 98.5 99.2
C2→C1 85.0 85.0 93.5 84.0 93.9 86.9 93.9 99.7 81.8 94.4 98.7 99.3
C1C2-Avg. 85.7 85.9 94.1 84.3 92.7 86.9 94.7 99.7 81.9 93.5 98.6 99.3
Table 11 Computation time (s) of each domain adaptation method on Office31
Task JDA DIP SA GFK DICE CDDA CORAL LRDA
A→D 71.23 65.08 74.71 77.55 126.28 116.58 64.2 114.79
D→W 45.52 54.4 59.64 72.29 71.33 96.37 48.37 92.77
W→A 147.12 220.33 135.51 234.72 240.91 319.08 238.18 225.88
Fig. 3 Convergence curve of LRDA algorithm.
close to 100%, only slightly lower than that of DGA-DA, with a recognition accuracy
reduction of approximately 0.5%. This is due to the exclusion of heterogeneous data by
the DGA-DA method. Overall, LRDA is still significantly superior to the other methods.
4.4 Convergence Analysis
In this section, we explore the convergence of the algorithm. LRDA adopts an iterative
learning strategy to update and optimize its parameters. The experiments run on a
computer with 64 GB of memory and an Intel i5-9600K CPU, with MATLAB as the
experimental platform. Figure 3 shows the convergence curve of LRDA. Three sets of
experiments use Amazon (A), DSLR (D), and Webcam (W) as the source and target
domains in the Office31 dataset. The results show that the algorithm converges after
40-50 iterations in all three sets of experiments, indicating that the objective
function of the model can be minimized through iterative parameter learning. This
shows empirically that LRDA converges well and can be optimized through parameter
iteration.
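The kind of convergence check described above can be sketched as follows. This is a generic illustration, not the LRDA update rule: the stopping criterion (relative change of the objective below a tolerance `tol`), the toy objective, and all function names are assumptions made for the example, and the paper's own experiments were run in MATLAB rather than Python.

```python
def run_until_converged(step, objective, params, tol=1e-6, max_iter=200):
    """Iterate `step` until the relative change of the objective is below `tol`.

    Returns the final parameters and the objective value after each iteration
    (the first entry is the value at the initial parameters).
    """
    history = [objective(params)]
    for _ in range(max_iter):
        params = step(params)
        history.append(objective(params))
        previous, current = history[-2], history[-1]
        # Relative change, guarded against a zero denominator.
        if abs(previous - current) <= tol * max(abs(previous), 1e-12):
            break
    return params, history


# Toy stand-in for a model objective: f(w) = (w - 3)^2 + 1, minimised by
# gradient descent with a fixed step size. This is NOT the LRDA objective.
def toy_objective(w):
    return (w - 3.0) ** 2 + 1.0


def toy_step(w, lr=0.1):
    # Gradient of the toy objective is 2 * (w - 3).
    return w - lr * 2.0 * (w - 3.0)


w_final, curve = run_until_converged(toy_step, toy_objective, params=0.0)
```

Plotting `curve` against the iteration index gives a convergence curve of the same kind as Fig. 3; the point at which the loop stops depends on the chosen tolerance.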
4.5 Complexity Comparison
Domain adaptation methods must extract features from two data domains, which
increases computation time. Table 11 compares the computation time of 8 domain
adaptation methods on the Office31 dataset, with the same experimental setup as in
Section 4.4. The results indicate that, because LRDA requires iterative optimization,
its time cost is relatively high, as is also the case for DICE, CDDA, and CORAL.
Because they have closed-form solutions, the computation times of GFK, DIP, and SA are
relatively low. JDA requires a feature decomposition of the data, so its computation
time grows with the data volume. However, according to the experiments in Section 4.3,
although LRDA requires a longer computation time, its performance is higher than that
of the other methods.
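A timing comparison of this kind can be sketched with a small harness such as the one below. It is a hypothetical illustration only: the reported times come from MATLAB runs, and the two functions timed here are placeholder workloads standing in for a closed-form method and an iterative method, not implementations of any method in Table 11.

```python
import time


def time_method(method, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        method(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder workloads: one single pass over the data (closed form) and one
# loop that repeats the same pass many times (iterative optimization).
def closed_form_method(data):
    return sum(x * x for x in data)


def iterative_method(data, iterations=50):
    total = 0.0
    for _ in range(iterations):
        total += sum(x * x for x in data)
    return total


data = [float(i) for i in range(10_000)]
timings = {
    "closed_form": time_method(closed_form_method, data),
    "iterative": time_method(iterative_method, data),
}
```

Taking the minimum over several runs reduces the influence of background load; the absolute numbers depend on hardware and are not comparable to those in Table 11.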
5 Conclusion
In this work, we address two problems: the mismatched distribution of latent-space
features and the neglected association between the original space and the latent
space. In response, we propose a latent space domain adaptation method with a
residual model (LRDA). LRDA reduces the difference in feature distribution in the
shared latent space to enhance feature expression. The objective function combines
the original feature space and the shared feature space as a residual model, which
facilitates parameter optimization and improves model performance. The experiments
compare LRDA with state-of-the-art shallow and deep domain adaptation algorithms on
six open datasets and verify that the proposed method is competitive. Because
decision functions must be calculated in the domains, the time cost of the algorithm
increases, and the trade-off between accuracy and time consumption remains difficult
to resolve. In subsequent work, we will further investigate how to perform batch
training of adaptive models on large-scale data to reduce the time cost of the
algorithm. In addition, the hyperparameters of the algorithm currently need to be
determined through grid search, which is also a problem that needs further exploration.
Declarations
•Funding
No
•Conflict of interest
Not applicable
•Data availability
No
References
[1] Huang, J., Gretton, A., Borgwardt, K., Schölkopf, B., Smola, A.: Correcting sample
selection bias by unlabeled data. Advances in neural information processing
systems 19 (2006)
[2] Sugiyama, M., Storkey, A.J.: Mixture regression for covariate shift. Advances in
neural information processing systems 19 (2006)
[3] Sugiyama, M., Nakajima, S., Kashima, H., Buenau, P., Kawanabe, M.: Direct
importance estimation with model selection and its application to covariate shift
adaptation. Advances in neural information processing systems 20 (2007)
[4] Ghifary, M., Balduzzi, D., Kleijn, W.B., Zhang, M.: Scatter component anal-
ysis: A unified framework for domain adaptation and domain generalization.
IEEE transactions on pattern analysis and machine intelligence 39(7), 1414–1430
(2016)
[5] Evgeniou, T., Micchelli, C.A., Pontil, M., Shawe-Taylor, J.: Learning multiple
tasks with kernel methods. Journal of machine learning research 6(4) (2005)
[6] Duan, L., Tsang, I.W., Xu, D.: Domain transfer multiple kernel learning. IEEE
Transactions on Pattern Analysis and Machine Intelligence 34(3), 465–479 (2012)
[7] Wang, S., Wang, B., Zhang, Z., Heidari, A.A., Chen, H.: Class-aware sample
reweighting optimal transport for multi-source domain adaptation. Neurocom-
puting 523, 213–223 (2023)
[8] Rostami, M., Bose, D., Narayanan, S., Galstyan, A.: Domain adaptation for
sentiment analysis using robust internal representations. In: Findings of the
Association for Computational Linguistics: EMNLP 2023, pp. 11484–11498
(2023)
[9] Ding, N., Xu, Y., Tang, Y., Xu, C., Wang, Y., Tao, D.: Source-free domain adap-
tation via distribution estimation. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pp. 7212–7222 (2022)
[10] Ge, C., Huang, R., Xie, M., Lai, Z., Song, S., Li, S., Huang, G.: Domain adapta-
tion via prompt learning. IEEE Transactions on Neural Networks and Learning
Systems (2023)
[11] Gheisari, M., Baghshah, M.S.: Unsupervised domain adaptation via represen-
tation learning and adaptive classifier learning. Neurocomputing 165, 300–311
(2015)
[12] Zheng, V.W., Pan, S.J., Yang, Q., Pan, J.J.: Transferring multi-device localization
models using latent multi-task learning. In: AAAI, vol. 8, pp. 1427–1432 (2008)
[13] Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural corre-
spondence learning. In: Proceedings of the 2006 Conference on Empirical Methods
in Natural Language Processing, pp. 120–128 (2006)
[14] Jiang, L., Hauptmann, A.G., Xiang, G.: Leveraging high-level and low-level
features for multimedia event detection. In: Proceedings of the 20th ACM
International Conference on Multimedia, pp. 449–458 (2012)
[15] Tao, J., Xu, H.: Discovering domain-invariant subspace for depression recogni-
tion by jointly exploiting appearance and dynamics feature representations. IEEE
Access 7, 186417–186436 (2019)
[16] Aimei, D., Shitong, W.: A shared latent subspace transfer learning algorithm
using svm. Acta Automatica Sinica 40(10), 2276–2287 (2014)
[17] Yao, Z., Tao, J.: Multi-source adaptation multi-label classification framework via
joint sparse feature selection and shared subspace learning. Computer Engineering
and Applications 53(7), 88–96 (2017)
[18] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recogni-
tion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 770–778 (2016)
[19] Shi, X., Guo, Z., Lai, Z., Yang, Y., Bao, Z., Zhang, D.: A framework of joint graph
embedding and sparse regression for dimensionality reduction. IEEE Transactions
on Image Processing 24(4), 1341–1355 (2015)
[20] Ma, Z., Nie, F., Yang, Y., Uijlings, J.R., Sebe, N.: Web image annotation via
subspace-sparsity collaborated feature selection. IEEE Transactions on Multime-
dia 14(4), 1021–1030 (2012)
[21] Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.: A kernel method
for the two-sample-problem. Advances in neural information processing systems
19 (2006)
[22] Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to
new domains. In: Computer Vision–ECCV 2010: 11th European Conference on
Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings,
Part IV 11, pp. 213–226 (2010). Springer
[23] Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation.
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
[24] Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset (2007)
[25] Zhang, J., Li, W., Ogunbona, P.: Joint geometrical and statistical alignment for
visual domain adaptation. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1859–1867 (2017)
[26] Venkateswara, H., Eusebio, J., Chakraborty, S., Panchanathan, S.: Deep hash-
ing network for unsupervised domain adaptation. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 5018–5027 (2017)
[27] Sim, T., Baker, S., Bsat, M.: The CMU pose, illumination, and expression database.
IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12), 1615–1618 (2003)
[28] Long, M., Wang, J., Ding, G., Sun, J., Yu, P.S.: Transfer feature learning
with joint distribution adaptation. In: Proceedings of the IEEE International
Conference on Computer Vision, pp. 2200–2207 (2013)
[29] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied
to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
[30] Hull, J.J.: A database for handwritten text recognition research. IEEE Transac-
tions on pattern analysis and machine intelligence 16(5), 550–554 (1994)
[31] Nene, S.A., Nayar, S.K., Murase, H., et al.: Columbia object image library (coil-
20) (1996)
[32] Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised
domain adaptation. In: 2012 IEEE Conference on Computer Vision and Pattern
Recognition, pp. 2066–2073 (2012). IEEE
[33] Baktashmotlagh, M., Harandi, M.T., Lovell, B.C., Salzmann, M.: Unsupervised
domain adaptation by domain invariant projection. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 769–776 (2013)
[34] Luo, L., Wang, X., Hu, S., Wang, C., Tang, Y., Chen, L.: Close yet distinctive
domain adaptation. arXiv preprint arXiv:1704.04235 (2017)
[35] Fernando, B., Habrard, A., Sebban, M., Tuytelaars, T.: Unsupervised visual
domain adaptation using subspace alignment. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 2960–2967 (2013)
[36] Panareda Busto, P., Gall, J.: Open set domain adaptation. In: Proceedings of the
IEEE International Conference on Computer Vision, pp. 754–763 (2017)
[37] Liang, J., He, R., Sun, Z., Tan, T.: Aggregating randomized clustering-promoting
invariant projections for domain adaptation. IEEE transactions on pattern
analysis and machine intelligence 41(5), 1027–1042 (2018)
[38] Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., Darrell, T.: Deep domain
confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474
(2014)
[39] Long, M., Cao, Y., Wang, J., Jordan, M.: Learning transferable features with
deep adaptation networks. In: International Conference on Machine Learning, pp.
97–105 (2015). PMLR
[40] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F.,
Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks.
The journal of machine learning research 17(1), 2096–2030 (2016)
[41] Ghifary, M., Kleijn, W.B., Zhang, M., Balduzzi, D., Li, W.: Deep reconstruction-
classification networks for unsupervised domain adaptation. In: Computer Vision–
ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October
11–14, 2016, Proceedings, Part IV 14, pp. 597–613 (2016). Springer
[42] Long, M., Zhu, H., Wang, J., Jordan, M.I.: Unsupervised domain adaptation with
residual transfer networks. Advances in neural information processing systems 29
(2016)
[43] Yan, H., Ding, Y., Li, P., Wang, Q., Xu, Y., Zuo, W.: Mind the class weight
bias: Weighted maximum mean discrepancy for unsupervised domain adapta-
tion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 2272–2281 (2017)
[44] Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint
adaptation networks. In: International Conference on Machine Learning, pp.
2208–2217 (2017). PMLR
[45] Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain
adaptation. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 7167–7176 (2017)
[46] Kang, G., Jiang, L., Wei, Y., Yang, Y., Hauptmann, A.: Contrastive adaptation
network for single-and multi-source domain adaptation. IEEE transactions on
pattern analysis and machine intelligence 44(4), 1793–1804 (2020)
[47] Cui, S., Wang, S., Zhuo, J., Su, C., Huang, Q., Tian, Q.: Gradually vanish-
ing bridge for adversarial domain adaptation. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pp. 12455–12464
(2020)
[48] Hu, L., Kan, M., Shan, S., Chen, X.: Unsupervised domain adaptation with hier-
archical gradient synchronization. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pp. 4043–4052 (2020)
[49] Tang, H., Chen, K., Jia, K.: Unsupervised domain adaptation via structurally
regularized deep clustering. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pp. 8725–8735 (2020)
[50] Luo, L., Chen, L., Hu, S., Lu, Y., Wang, X.: Discriminative and geometry-aware
unsupervised domain adaptation. IEEE transactions on cybernetics 50(9), 3914–
3927 (2020)
[51] Tao, J., Dan, Y., Zhou, D., He, S.: Robust latent multi-source adaptation for
encephalogram-based emotion recognition. Frontiers in Neuroscience 16, 850906
(2022)
[52] Lee, S., Kim, D., Kim, N., Jeong, S.-G.: Drop to adapt: Learning discriminative
features for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision, pp. 91–100 (2019)