Early Detection of Misinformation for Infodemic
Management: A Domain Adaptation Approach
Minjia Mao
University of Delaware
Xiaohang Zhao∗
Shanghai University of Finance and Economics
Xiao Fang∗
University of Delaware
∗Corresponding author
Abstract. An infodemic refers to an enormous amount of true information and misinformation disseminated during a disease
outbreak. Detecting misinformation at the early stage of an infodemic is key to manage it and reduce its harm to public health. An
early stage infodemic is characterized by a large volume of unlabeled information concerning a disease. As a result, conventional
misinformation detection methods are not suitable for this misinformation detection task because they rely on labeled information in
the infodemic domain to train their models. To address the limitation of conventional methods, state-of-the-art methods learn their
models using labeled information in other domains to detect misinformation in the infodemic domain. The efficacy of these methods
depends on their ability to mitigate both covariate shift and concept shift between the infodemic domain and the domains from
which they leverage labeled information. These methods focus on mitigating covariate shift but overlook concept shift, rendering
them less effective for the task. In response, we theoretically show the necessity of tackling both covariate shift and concept shift as
well as how to operationalize each of them. Built on the theoretical analysis, we develop a novel misinformation detection method
that addresses both covariate shift and concept shift. Using two real-world datasets, we conduct extensive empirical evaluations to
demonstrate the superior performance of our method over state-of-the-art misinformation detection methods as well as prevalent
domain adaptation methods that can be tailored to solve the misinformation detection task.
Key words: misinformation detection, infodemic management, domain adaptation, covariate shift, concept shift, deep learning,
contrastive learning, transfer learning
1. Introduction
An infodemic refers to an overwhelming volume of true and false information spread during a disease
outbreak (Van Der Linden 2022). Misinformation, or false information, in an infodemic misleads the
public about the disease and causes significant harm to public health (Imhoff and Lamberty 2020, Freeman
et al. 2022, Van Der Linden 2022). For example, the misinformation that the Ebola virus is intentionally
created by the Congolese government to eliminate people in the city of Beni has led to attacks on Ebola
clinics by local residents, hindering timely treatments for Ebola.1 More recently, during the outbreak of the
1See https://news.un.org/en/story/2019/03/1034381 (last accessed on May 28, 2024)
coronavirus disease (COVID-19), a significant amount of misinformation has been diffused on social
media. Examples include the misinformation that baking soda can cure coronavirus,2 the false claim that
injecting disinfectant can prevent the virus (Borah et al. 2022), and the incorrect attribution of the outbreak
in Italy to Middle East illegal immigrants.3 Such widespread dissemination of misinformation about COVID-19 has
caused social panic, led people to dismiss health guidance, and weakened their confidence in vaccines,
ultimately undermining pandemic response efforts (Bursztyn et al. 2020). Therefore, it is imperative to
manage an infodemic to mitigate its adverse impact on public health (Freeman et al. 2022, Hwang and Lee
2024).
According to the World Health Organization (WHO), infodemic management is the use of risk- and
evidence-based approaches to manage an infodemic and reduce its harm to public health.4 The key to
effective infodemic management is the detection of misinformation at the early stage of an infodemic
(Buchanan 2020, Van Der Linden 2022). Early identification of misinformation discourages people from
sharing it and prevents it from reaching a much larger population, thereby mitigating its potential harm to
public health (Buchanan 2020). Moreover, the likelihood that a person trusts misinformation increases
the longer the person is exposed to it (Zajonc 2001, Moravec et al. 2019). Early
identification of misinformation reduces individuals’ exposure time to misinformation, thereby lowering
their likelihood of trusting it. This, in turn, increases their chances of adhering to health guidance,
ultimately benefiting public health as a whole.
An early stage infodemic features two salient characteristics. First, an emerging health event triggers a
huge amount of true and false information spreading on various media platforms in a short time period.
For example, during the early stage of the COVID-19 infodemic, a significant volume of true and false
information concerning COVID-19 was diffused on social media (Yue et al. 2022), and the WHO declared a
worldwide infodemic (Van Der Linden 2022). Second, it requires expert knowledge to distinguish between
true information and misinformation in an infodemic. Moreover, during the early stage of a disease outbreak
and its ensuing infodemic, even experts have no or limited knowledge about the disease, making it even
more difficult to discern true information from misinformation. For example, during the early stage of
the Ebola outbreak, scientists lacked knowledge about the disease and its treatment (Adebimpe et al. 2015).
Consequently, it is common that information disseminated at the early stage of an infodemic is unlabeled.
As there is no labeled information (i.e., true or false), it is impossible to learn a misinformation detection
model using conventional misinformation detection methods such as Abbasi et al. (2010), Nan et al. (2021),
and Wei et al. (2022), which require labeled information in their training data.
2See https://apnews.com/article/archive-fact-checking-8736262219 (last accessed on May 28, 2024)
3See https://time.com/5789666/italy-coronavirus-far-right-salvini/ (last accessed on May 28, 2024)
4See https://www.who.int/health-topics/infodemic#tab=tab_1 (last accessed on May 28, 2024)
A viable solution is to leverage labeled information in other domains to build a misinformation detection
model for the infodemic domain. Indeed, popular fact-checking websites like PolitiFact5and Gossipcop6
contain a wealth of verified true and false information in domains such as politics and entertainment.
Moreover, misinformation across domains shares some common characteristics. For example, compared to
true information, misinformation spreads faster, farther, and more widely (Oh et al. 2013, Vosoughi et al.
2018). As another example, fake news, an important type of misinformation, often features emotional
headlines and demonstrates inconsistency between their headlines and contents (Siering et al. 2016).
Because of these common characteristics shared across domains, it is possible to learn a misinformation
detection model using labeled information in other domains and transfer it to the infodemic domain.
However, information in each domain also has its unique characteristics. Understandably, the choice of
words and the organization of words to convey sentiments and semantics differ across domains. For
example, less than 20% of the words used in politics news overlap with those found in entertainment news
(Shu et al. 2020). Because each domain possesses unique characteristics, a misinformation detection model
learned with labeled information in other domains usually performs poorly in the infodemic domain.
Domain adaptation is a widely employed technique to enhance the performance of a model trained in
one domain but applied to another (Ben-David et al. 2010). It aims to mitigate disparities between data in
different domains such that a model learned using data in one domain performs well in another domain (Ben-
David et al. 2010). Concretely, data in a domain is represented as a joint distribution p(x, y), x ∈ X, y ∈ Y,
where X denotes the feature space and Y is the label space (Liu et al. 2022). In the context of misinformation
detection, x might be explicit features, such as the number of words in a piece of information, or latent
features, such as the embedding of a piece of information (Zhou and Zafarani 2020). Label y denotes
whether a piece of information is true or false. The joint distribution p(x, y) can be expressed as the product
of p(x) and p(y|x), i.e., p(x, y) = p(x)p(y|x). Accordingly, to effectively alleviate disparities between data
in different domains, it is essential to reduce both their differences in terms of p(x) and their differences
in terms of p(y|x), where the former type of differences is referred to as covariate shift and the latter type
of differences is known as concept shift (Liu et al. 2022). Existing domain adaptation methods focus on
mitigating covariate shift between domains (Ganin et al. 2016, Long et al. 2017, Zhu et al. 2019, Peng et al.
2019). By applying these domain adaptation methods, Huang et al. (2021), Li et al. (2021), Yue et al. (2022)
develop state-of-the-art misinformation detection methods, which learn a model from labeled information
in one or more source domains to identify misinformation in a target domain with no labeled information.
Although these misinformation detection methods are applicable to our problem, their efficacy is limited
because they fail to account for concept shift.
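To make the distinction concrete, the following toy Python sketch (illustrative only, not from the paper) constructs two one-feature domains: under covariate shift the feature distribution p(x) differs while the labeling rule p(y|x) is shared, whereas under concept shift the labeling rule itself differs.

```python
# Toy illustration (not from the paper) of the decomposition p(x, y) = p(x) p(y|x).
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: p(x) differs across domains while p(y|x) is shared.
x_src = rng.normal(loc=0.0, scale=1.0, size=10000)  # source feature distribution
x_tgt = rng.normal(loc=2.0, scale=1.0, size=10000)  # target feature distribution

def shared_p_y_given_x(x):
    return 1.0 / (1.0 + np.exp(-x))  # same labeling rule in both domains

# Concept shift: p(y|x) differs across domains even if p(x) were identical.
def src_p_y_given_x(x):
    return 1.0 / (1.0 + np.exp(-x))

def tgt_p_y_given_x(x):
    return 1.0 / (1.0 + np.exp(-(x - 1.0)))  # shifted labeling rule

x = 0.5
print(f"covariate shift: mean of x is {x_src.mean():.2f} (source) vs {x_tgt.mean():.2f} (target)")
print(f"concept shift at x={x}: p_S(y=1|x)={src_p_y_given_x(x):.2f} vs p_T(y=1|x)={tgt_p_y_given_x(x):.2f}")
```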
5See https://www.politifact.com/ (last accessed on May 28, 2024)
6See https://en.wikipedia.org/wiki/Gossip_Cop (last accessed on May 28, 2024)
In response to the research gaps, this study contributes to the literature with a novel misinformation
detection method that tackles both covariate shift and concept shift. Our method design is rooted in our
theoretical analysis, which shows the necessity of addressing both covariate shift and concept shift for
effective domain adaptation as well as how to operationalize each of them. Guided by the theoretical
analysis, we develop two modules in our method: one for assessing and reducing covariate shift and the
other for evaluating and mitigating concept shift. Therefore, the primary methodological difference
between our method and state-of-the-art domain adaptation-based misinformation detection methods (e.g.,
Yue et al. (2022)) is the addressing of concept shift by our method. It also differs from conventional
misinformation detection methods such as Wei et al. (2022). Specifically, our method is a cross-domain
method that learns a model from labeled information in other domains to classify unlabeled information in
the infodemic domain. In contrast, conventional misinformation detection methods are single-domain
methods that learn a model from labeled information in a domain (e.g., infodemic domain) to classify
unlabeled information in the same domain.
2. Related Work
Existing methods for misinformation detection can be classified into three categories: single-domain,
multi-domain, and cross-domain approaches. In this section, we review each category of these methods
and highlight the methodological novelty of our study.
2.1. Single and Multi-domain Misinformation Detection Methods
Single-domain misinformation detection methods train a classification model using labeled information
from a specific domain (e.g., true and fake news from the domain of politics), and then employ the model
to classify unlabelled information in the same domain (e.g., unlabelled political news). In this category,
knowledge-based methods classify unlabelled information by comparing the content of the information
(e.g., textual content of news) against known facts (Ciampaglia et al. 2015). More specifically, these
methods retrieve pertinent facts from authoritative websites like Wikipedia, represent extracted facts as a
knowledge graph, and employ the knowledge graph to automatically detect misinformation. Additionally,
writing styles identified from the content of information, such as readability, sensationalism, informality,
and subjectivity, have been utilized to detect misinformation (Zhou and Zafarani 2020). Recent methods
consider the propagation of information on social media, in addition to the content of information
(Papanastasiou 2020). Some methods employ propagation patterns to detect misinformation because
misinformation spreads faster, farther, and more widely compared to true information (Vosoughi et al.
2018). Others harness the collective intelligence of crowds, such as likes, shares, and comments from
social media users (Atanasov et al. 2017, Pennycook and Rand 2021). For example, Wei et al. (2022)
employ both the content of news articles and social media users’ comments on these news articles to detect
fake news.
Multi-domain misinformation detection methods learn a classification model from labeled information
in various domains (e.g., true and fake news in the politics, health, and sports domains), and then employ
the model to classify unlabelled information in each of these domains (Nan et al. 2021, Zhu et al. 2023).
The rationale underlying these methods is to leverage labeled information in multiple domains to enhance
misinformation detection in each individual domain. However, because of variations in information
content among different domains, a model learned with labeled information in these domains exhibits
significant performance differences when applied to each of these domains (Zhu et al. 2023). That is, the
effectiveness of the learned model in identifying misinformation in one domain could be significantly
inferior to its effectiveness in detecting misinformation in another domain. To tackle this problem, several
methods incorporate domain-specific characteristics into the model training. For example, Nan et al.
(2021) propose to embed different domains as different vectors and integrate these vectors into the
learning of a multi-domain misinformation detection model. Zhu et al. (2023) introduce domain adapters
tailored to each specific domain; representations of news articles in each domain are then adjusted by their
corresponding domain adapter to train a multi-domain model.
2.2. Cross-domain Misinformation Detection Methods
Cross-domain misinformation detection methods leverage labeled information in one or more source
domains (e.g., true and fake news in the politics and sports domains) to classify unlabelled information in a
target domain (e.g., unlabelled news in the health domain). Unlike multi-domain methods, which identify
misinformation in various domains, cross-domain methods focus on detecting misinformation in one target
domain. Depending on whether training data contain labeled or unlabeled information in a target domain,
cross-domain methods can be further categorized into two subgroups.
One subgroup of cross-domain methods learns a model from training data consisting of labeled
information in source domains as well as labeled information in a target domain to classify unlabeled
information in the target domain (Mosallanezhad et al. 2022, Nan et al. 2022). These methods essentially
utilize labeled source domain information as out-of-distribution data to improve the performance of
misinformation detection in the target domain (Zhang et al. 2023). For example, Nan et al. (2022) pretrain
a misinformation detection model using labeled information in source and target domains. Next, a
language model is learned from labeled target domain information. Each piece of source domain
information is then assigned a transferability score based on the degree to which the language model can
predict its content. Finally, the pretrained model is fine-tuned using labeled source domain information
weighted by their corresponding transferability scores as well as labeled target domain information.
The other subgroup learns a model from training data consisting of labeled information in source domains
and unlabeled information in a target domain. Since there is no labeled target domain information, this
subgroup of methods aims to transfer a model learned from labeled source domain information to the target
domain. To this end, domain adaptation is a suitable and widely adopted technique. In the following, we
review domain adaptation methods and their applications in cross-domain misinformation detection. In the
literature of domain adaptation, a domain consists of a feature space X, a label space Y, and a probability
distribution p(x, y), x ∈ X and y ∈ Y (Liu et al. 2022). As the probability distribution in a source domain
usually differs from that in a target domain, a model learned from source domain data often performs poorly
in the target domain (Ben-David et al. 2010). To tackle this issue, it is necessary to reduce the disparity
between source and target domains in terms of p(x, y). Existing domain adaptation methods concentrate
on mitigating the difference in feature distribution (i.e., p(x)) between source and target domains, known
as covariate shift (Kouw and Loog 2018).7 In this vein, Ganin et al. (2016) propose a domain adversarial
network to learn domain-invariant features from data in source and target domains. Saito et al. (2018)
devise a mini-max mechanism to align feature distributions between source and target domains. In the
maximum stage, two classifiers are learned from source samples such that the discrepancy between their
classifications of target samples is maximized. In the minimum stage, features of target samples are adjusted
to minimize such discrepancy. Rangwani et al. (2022) develop a domain adaptation method that incorporates
a smoothing mechanism into the method proposed by Ganin et al. (2016). Rostami and Galstyan (2023)
construct a pseudo-dataset using a model learned from labeled source domain data and then mitigate feature
distribution difference between the pseudo-dataset and target domain data.8 While the methods reviewed
above implicitly minimize covariate shift between source and target domains, there are methods that explicitly
measure and minimize covariate shift. For example, Long et al. (2015) utilize a metric known as the multi-
kernel Maximum Mean Discrepancy (MMD) to quantify covariate shift, integrate this metric into a loss
function, and minimize covariate shift by minimizing the loss function. In a follow-up study, Long et al.
(2017) introduce another metric, known as the joint MMD, to measure covariate shift. In addition, Chen
et al. (2022) employ the nuclear-norm Wasserstein discrepancy to evaluate covariate shift between source
and target domains. Recent studies have employed domain adaptation techniques to solve the cross-domain
misinformation detection problem with unlabeled target domain information. For example, Li et al. (2021)
extend the work of Ganin et al. (2016) to detect misinformation in a target domain by leveraging labeled
information in multiple source domains whereas Ng et al. (2023) apply the domain adversarial network
7In the field of data stream mining, concept shift or concept drift refers to the phenomenon that data samples with the same features
could have different labels at different timestamps (Agrahari and Singh 2022, Roychowdhury et al. 2023). Given a model trained
with previous data samples and a continuous influx of new data samples, the objective of data stream mining methods is to determine
when and how to retrain the model (Vorburger and Bernstein 2006, Fang et al. 2013, Agrahari and Singh 2022, Roychowdhury
et al. 2023). These methods are not applicable to our problem because they require labeled data samples at all timestamps while
instances of target domain information are unlabeled in our problem. Moreover, these methods aim to decide when and how to
retrain a model in a dynamic environment, whereas the objective of our problem is to classify unlabeled target domain instances.
8Although the title of the paper contains the phrase "concept shift", the method actually mitigates covariate shift. This is evident in the
objective function of the proposed algorithm, i.e., Equation (3) in Rostami and Galstyan (2023). According to the equation, the
algorithm aims to mitigate the difference between the feature distribution p_T(X_T) of target domain data and the feature distribution
p̂_J(Z_p) of the pseudo-dataset.
Table 1 Comparison Between Our Method and Existing Misinformation Detection Methods.

Method | Applicable to our problem | Addressing covariate shift | Addressing concept shift
Single- and multi-domain misinformation detection methods, e.g., Wei et al. (2022), Zhu et al. (2023) | No | No | No
Cross-domain misinformation detection methods with labeled target domain information, e.g., Mosallanezhad et al. (2022), Nan et al. (2022) | No | Yes | No
Cross-domain misinformation detection methods with unlabeled target domain information, e.g., Li et al. (2021), Yue et al. (2022) | Yes | Yes | No
Our method | Yes | Yes | Yes
developed by Ganin et al. (2016) to identify fake news and reviews. Huang et al. (2021) utilize the MMD
metric to measure covariate shift between source domain information and target domain information, and
mitigate covariate shift by minimizing the metric. Yue et al. (2022) develop a variant of the MMD metric to
measure covariate shift between source and target domain information.
2.3. Key Novelty of Our Study
Our literature review suggests the following research gaps, as summarized in Table 1. First, existing single
and multi-domain misinformation detection methods as well as cross-domain methods with labeled target
domain information are not applicable to our problem. These methods require labeled target domain
information (i.e., the infodemic domain in our study) in their training datasets, while our problem involves
completely unlabeled information in the infodemic domain. Second, although existing cross-domain
methods with unlabeled target domain information are applicable to our problem, they are less effective in
solving the problem because they fail to address concept shift. In fact, none of the existing misinformation
detection methods is capable of tackling concept shift. However, to ensure the efficacy of a model learned
using labeled source domain information in classifying unlabeled target domain information, it is essential
to mitigate both covariate shift and concept shift between source and target domains (Liu et al. 2022). To
this end, we propose a misinformation detection method that tackles both covariate shift and concept shift,
in contrast to existing methods that focus on covariate shift solely. Therefore, the key methodological
novelty of our method lies in its addressing of concept shift. To implement this methodological novelty,
our study features two innovations. First, our theoretical analysis of domain adaptation not only
underscores the importance of addressing both covariate shift and concept shift but also shows how to
operationalize each of them. Second, informed by our theoretical analysis, we introduce a novel
misinformation detection method, two modules of which are respectively designed to tackle covariate shift
and concept shift.
3. Problem Formulation
We consider a misinformation detection problem at the early stage of an infodemic, when all instances
of information in this domain are unlabeled. Concretely, let D_T be a dataset of N_T pieces of unlabeled
information in the infodemic domain. The subscript T signifies the target domain, which is the infodemic
domain in this study. As an example, a piece of unlabeled information might encompass the textual content
of a news article about COVID-19. Let x_i^T denote a set of features extracted from a piece of unlabeled
information in D_T, i = 1, 2, ..., N_T. We aim to predict the label (true or false) for each piece of information
in D_T.
To accomplish this objective, we are given k datasets of labeled information, i.e., D_{S_1}, D_{S_2}, ..., D_{S_k}.
The subscript S refers to a source domain, which provides labeled information to facilitate the label
prediction in the target domain. Concretely, D_{S_j} consists of N_{S_j} pieces of labeled information,
j = 1, 2, ..., k. Considering an example of S_j being the politics domain, a piece of information in this
domain could comprise the textual content of a political news article. Each piece of labeled information in
D_{S_j} is represented as (x_i^{S_j}, y_i^{S_j}), where x_i^{S_j} denotes features extracted from the information, y_i^{S_j} is the
label of the information, and i = 1, 2, ..., N_{S_j}. Label y_i^{S_j} = 0 indicates that the information is true and
y_i^{S_j} = 1 shows that the information is false. We now formally define the problem of early detection of
misinformation for infodemic management.
Definition 1 (Early Detection of Misinformation Problem (EDM)). Given a dataset D_T of N_T pieces
of unlabeled information in the infodemic domain and k datasets of labeled information in various source
domains, D_{S_1}, D_{S_2}, ..., D_{S_k}, where D_{S_j} consists of N_{S_j} pieces of labeled information, j = 1, 2, ..., k, the
objective of the problem is to learn a model from the data to classify each piece of information in D_T as
true or false.
4. Method
In this section, we theoretically analyze the performance of a model learned using labeled source domain
instances but applied to classify unlabeled target domain instances. Guided by the insights from our
theoretical analysis, we then propose a novel method to solve the EDM problem.
4.1. Theoretical Analysis
We start with the setting of one target domain T and one source domain S. Let D_T be a dataset of N_T
unlabelled instances in T and D_S be a dataset of N_S labeled instances in S. In the context of misinformation
detection, an instance is a piece of information. We denote f_T as the true labeling function in the target
domain that maps an instance with features x into the probability of the instance being 1, i.e., f_T(x) =
p_T(y = 1|x). Similarly, f_S is the true labeling function in the source domain and f_S(x) = p_S(y = 1|x). Let
h be a hypothesis learned from labeled instances in D_S and h(x) = p_h(y = 1|x). Following Ben-David et al.
(2010), we define the source error ϵ_S(h) of hypothesis h as the expected difference between h and the true
labeling function f_S in the source domain:

ϵ_S(h) = E_{x∼𝒟_S}[ |h(x) − f_S(x)| ],   (1)

where 𝒟_S represents the feature distribution in the source domain. In a similar manner, we define the target
error ϵ_T(h) of hypothesis h as the expected difference between h and the true labelling function f_T in the
target domain:

ϵ_T(h) = E_{x∼𝒟_T}[ |h(x) − f_T(x)| ],   (2)

where 𝒟_T denotes the feature distribution in the target domain. Recall that domain adaptation aims to learn
h from labeled instances in D_S to classify each unlabelled instance in D_T. Thus, the objective of domain
adaptation is to minimize the target error ϵ_T(h). To achieve this objective, we show a bound of ϵ_T(h) in the
following theorem.
Theorem 1. For a source domain instance with features x_i^S, let c(x_i^S) be the features of its nearest
target domain instance, where i = 1, 2, ..., N_S and the distance between a pair of instances is measured
over their feature space. If f_T is L-Lipschitz continuous,9 then for any η ∈ (0, 1), with probability at least
(1 − η)^2,

ϵ_T(h) ≤ ϵ_S(h) + d̂_H(D_S, D_T) + (1/N_S) Σ_{i=1}^{N_S} |f_S(x_i^S) − f_T(c(x_i^S))| + (L/N_S) Σ_{i=1}^{N_S} ||x_i^S − c(x_i^S)|| + C_1,   (3)

where L is the Lipschitz constraint constant, C_1 is a constant, d̂_H(D_S, D_T) denotes the empirical
H-divergence between distributions 𝒟_S and 𝒟_T, estimated using features of source domain instances in
D_S and features of target domain instances in D_T, and ||x_i^S − c(x_i^S)|| represents the distance between x_i^S
and c(x_i^S).
Proof. See Appendix B.
Theorem 1 establishes an upper bound of ϵ_T(h), which can be empirically computed using datasets D_S
and D_T drawn from distributions 𝒟_S and 𝒟_T, respectively. By Theorem 1, to minimize ϵ_T(h), we need to
minimize the first three terms of its upper bound specified in Inequality (3). This is because the fourth term
of the upper bound is inherently minimized, given that c(x_i^S) is the closest to x_i^S, and the last term C_1 is a
constant.
The first term ϵ_S(h) can be minimized by learning a classification model from labeled source domain
instances. The second term d̂_H(D_S, D_T) is the empirical estimation of the H-divergence between the
source domain feature distribution 𝒟_S and the target domain feature distribution 𝒟_T. H-divergence has
been extensively utilized in domain adaptation studies to measure distances between distributions. Please
refer to Appendix A for a description of H-divergence and its empirical estimation d̂_H(D_S, D_T).
Fundamentally, minimizing the H-divergence between source and target domain feature distributions is to
minimize covariate shift between source and target domains. Therefore, existing domain adaptation
methods for mitigating covariate shift, such as Ganin et al. (2016) and Zhu et al. (2019), can be applied to
minimize d̂_H(D_S, D_T).
9The L-Lipschitz continuous assumption is commonly used in theoretical analyses of machine learning algorithms (e.g., Arjovsky
et al. 2017, Asadi et al. 2018, Kim et al. 2021).
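As an illustration of how d̂_H can be computed in practice, the sketch below uses the proxy commonly derived from Ben-David et al. (2010): train a classifier to separate source features from target features and convert its held-out error into a divergence estimate. The logistic-regression choice and the synthetic feature arrays are assumptions for illustration, not the paper's exact procedure.

```python
# A minimal sketch of an empirical H-divergence estimate via a domain classifier:
# the harder it is to tell source features from target features, the smaller the
# divergence. Feature arrays here are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def empirical_h_divergence(x_src: np.ndarray, x_tgt: np.ndarray) -> float:
    """Estimate d_H from samples; close to 2 means large shift, close to 0 small."""
    x = np.vstack([x_src, x_tgt])
    d = np.concatenate([np.zeros(len(x_src), dtype=int),
                        np.ones(len(x_tgt), dtype=int)])  # domain labels
    # Held-out domain-classification accuracy gives the error epsilon.
    acc = cross_val_score(LogisticRegression(max_iter=1000), x, d, cv=5).mean()
    eps = 1.0 - acc
    return 2.0 * (1.0 - 2.0 * eps)

# Example with synthetic embeddings standing in for news features.
rng = np.random.default_rng(0)
print(empirical_h_divergence(rng.normal(0, 1, (500, 16)),
                             rng.normal(1, 1, (500, 16))))
```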
For the third term (1/N_S) Σ_{i=1}^{N_S} |f_S(x_i^S) − f_T(c(x_i^S))|, recall that f_S(x_i^S) = p_S(y = 1|x_i^S) and
f_T(c(x_i^S)) = p_T(y = 1|c(x_i^S)). Hence, to minimize the third term, we need to minimize the summation of
differences between p_S(y = 1|x_i^S) and p_T(y = 1|c(x_i^S)), summed over all source domain instances and
their corresponding nearest target domain instances. Therefore, minimizing the third term is essentially to
minimize concept shift between source and target domains.
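A hedged numpy sketch of how this term can be evaluated empirically: each source instance is matched to its nearest target instance, and the matched differences in conditional label probabilities are averaged. The probability arrays stand in for estimates of f_S and f_T, which are not directly observable.

```python
import numpy as np

def concept_shift_term(x_src, x_tgt, f_src_probs, f_tgt_probs):
    """(1/N_S) * sum_i |f_S(x_i^S) - f_T(c(x_i^S))|, with c(.) the nearest target instance."""
    # Pairwise Euclidean distances between source and target feature vectors.
    dists = np.linalg.norm(x_src[:, None, :] - x_tgt[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)  # index of c(x_i^S) for each source instance i
    return np.abs(f_src_probs - f_tgt_probs[nearest]).mean()
```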
Next, we consider the setting of one target domain T and multiple source domains S_1, S_2, ..., S_k. Similar
to the notations used in the setting of one source domain, we denote D_T as a dataset of N_T unlabelled
instances in T and D_{S_j} as a dataset of N_{S_j} labeled instances in S_j, j = 1, 2, ..., k. Let h be a hypothesis
learned from labeled instances in D_{S_1}, D_{S_2}, ..., D_{S_k}. We denote f_T and f_{S_j} as the true labeling functions in
T and S_j, respectively, j = 1, 2, ..., k. For a hypothesis h, ϵ_T(h) and ϵ_{S_j}(h) represent its errors in T and S_j,
respectively, j = 1, 2, ..., k. Built on Theorem 1, the following proposition gives an upper bound of ϵ_T(h)
in the setting of multiple source domains.
Proposition 1. For a source domain instance with features x_i^{S_j} in D_{S_j}, let c(x_i^{S_j}) be the features of its
nearest instance in D_T, where j = 1, 2, ..., k and i = 1, 2, ..., N_{S_j}. If f_T is L-Lipschitz continuous, then for
any η ∈ (0, 1), with probability at least (1 − η)^{2k},

ϵ_T(h) ≤ (1/k) Σ_{j=1}^{k} { ϵ_{S_j}(h) + d̂_H(D_{S_j}, D_T) + (1/N_{S_j}) Σ_{i=1}^{N_{S_j}} |f_{S_j}(x_i^{S_j}) − f_T(c(x_i^{S_j}))| + (L/N_{S_j}) Σ_{i=1}^{N_{S_j}} ||x_i^{S_j} − c(x_i^{S_j})|| + C_j },   (4)

where L is the Lipschitz constraint constant, C_j is a constant, and d̂_H(D_{S_j}, D_T) denotes the empirical
H-divergence between the feature distribution in the source domain S_j and that in the target domain T.
Proof. See Appendix B.
Proposition 1 offers useful insights for solving the EDM problem, which aims to learn a model from
labeled information in multiple source domains to classify unlabeled information in the target infodemic
domain with minimum error (i.e., ϵ_T(h)). Informed by the proposition, we design a method that minimizes
ϵ_T(h) through its three modules, each dedicated to reducing one of the first three terms in Inequality (4).
Specifically, to reduce Σ_{j=1}^{k} ϵ_{S_j}(h), we implement a classification module, which is trained to accurately
predict the label for each piece of source domain information. A covariate alignment module is developed
to reduce Σ_{j=1}^{k} d̂_H(D_{S_j}, D_T), thus diminishing covariate shift between source domains and the target
domain. Finally, we propose a concept alignment module to mitigate Σ_{j=1}^{k} Σ_{i=1}^{N_{S_j}} |f_{S_j}(x_i^{S_j}) − f_T(c(x_i^{S_j}))|,
thereby reducing concept shift between source domains and the target domain. The concept alignment
module constitutes the main methodological novelty of our method, and we designate our method as
Domain Adaptation with Concept Alignment (DACA).
Figure 1 Overall Architecture of Domain Adaptation with Concept Alignment (DACA) Method
Note. Best viewed in color.
4.2. Method Overview
Grounded in our theoretical analysis, the DACA method adeptly alleviates disparities between source and
target domains through its covariate alignment and concept alignment modules. As a result, the model
learned from labeled source domain information by its classification module can effectively classify
unlabeled infodemic domain information. Figure 1 illustrates the overall architecture of the DACA
method. As shown, the inputs to the method consist of k datasets of labeled information in various source
domains, D_{S_1}, D_{S_2}, ..., D_{S_k}, and a dataset D_T of unlabeled information in the target infodemic domain. A
content embedding function extracts features from the content of each piece of input information. Various
content embedding functions (e.g., Kenton and Toutanova 2019, Zhu et al. 2023) could be used for this
purpose, and we describe the content embedding function employed in this study in Section 5.2. Let
X_{S_j} = {x_i^{S_j} | i = 1, 2, ..., N_{S_j}} and X_T = {x_i^T | i = 1, 2, ..., N_T} denote the set of features extracted from
input information in source domain S_j and target domain T, respectively, j = 1, 2, ..., k. These features
serve as the inputs to the three modules of the DACA method.
The classification module trains a label classifier to accurately predict the label of a piece of information
as either true (0) or false (1). Concretely, the label classifier takes the features x of each piece of information
as input and predicts its probability of being 1, i.e., p(ŷ = 1|x). Given that only source domain information
is labeled, the label classifier is trained using labelled source domain information. The covariate alignment
module trains a domain classifier to predict whether a piece of information is from a source domain or the
target domain. As the objective of this module is to alleviate the feature distribution difference between
source domains and the target domain, the domain classifier is trained to be perplexed in distinguishing
between source domain information and target domain information (Ganin et al. 2016).
Proposition 1 sheds light on the design of the concept alignment module, the main innovation of our
method. According to the proposition, for each piece of information in a source domain, we need to find its
nearest target domain information. Therefore, it is imperative to have a function that measures the
similarity between two pieces of information over their feature space. To this end, we introduce a
contrastive learning sub-module that learns a similarity function using labeled source domain information.
Utilizing the similarity function, a concept alignment sub-module identifies the nearest target domain
information for each piece of source domain information. By Proposition 1, this sub-module then mitigates
concept shift between source domains and the target domain by minimizing
Σ_{j=1}^{k} Σ_{i=1}^{N_{S_j}} |f_{S_j}(x_i^{S_j}) − f_T(c(x_i^{S_j}))|, where f_{S_j}(x_i^{S_j}) and f_T(c(x_i^{S_j})) are estimated using the label
classifier in the classification module.
4.3. Classification Module and Covariate Alignment Module
As depicted in Figure 1, the classification module and the covariate alignment module share a feature
transformation layer. This layer, denoted by F_h, takes each feature vector x in X_{S_1}, X_{S_2}, ..., X_{S_k}, X_T as
input and outputs the transformed vector h. Formally, we have

h = F_h(x) = MLP_1(x; θ_h),   (5)

where MLP_1 is a multi-layer perceptron (MLP) with trainable parameters θ_h.
The classification module features an MLP-based label classifier, denoted by F_y, which takes the
transformed feature vector h as input and outputs the probability of being labeled as 1 (i.e., false) for the
piece of information characterized by feature vector x. Concretely, we have

p(ŷ = 1|x) = F_y(h) = Sigmoid(MLP_2(h; θ_y)),   (6)

where ŷ is the predicted label for the piece of information, MLP_2 is an MLP with trainable parameters θ_y,
and Sigmoid() denotes the sigmoid function. The classification module is trained by minimizing the total
cross-entropy loss summed over all labeled source domain information:

L_y(θ_h, θ_y) = − Σ_{x∈X_S} [ y log p(ŷ = 1|x) + (1 − y) log(1 − p(ŷ = 1|x)) ]
             = − Σ_{x∈X_S} [ y log F_y(F_h(x)) + (1 − y) log(1 − F_y(F_h(x))) ],   (7)
where X_S = ∪_{j=1}^{k} X_{S_j} and y is the true label of x. Minimizing L_y trains the classification module to score
high probabilities for true labels, thereby inducing a model with low source error (i.e., Σ_{j=1}^{k} ϵ_{S_j}(h) in
Proposition 1).
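A minimal PyTorch sketch of the shared feature transformation layer F_h and the label classifier F_y together with the loss L_y; the layer widths and the batch below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FeatureTransform(nn.Module):  # F_h in Equation 5
    def __init__(self, in_dim=768, hid_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, hid_dim))

    def forward(self, x):
        return self.mlp(x)

class LabelClassifier(nn.Module):  # F_y in Equation 6
    def __init__(self, hid_dim=256):
        super().__init__()
        self.linear = nn.Linear(hid_dim, 1)

    def forward(self, h):
        return torch.sigmoid(self.linear(h)).squeeze(-1)  # p(y_hat = 1 | x)

# Cross-entropy loss L_y over labeled source domain information (Equation 7).
f_h, f_y = FeatureTransform(), LabelClassifier()
x_src = torch.randn(32, 768)                # hypothetical source feature vectors
y_src = torch.randint(0, 2, (32,)).float()  # 0 = true information, 1 = misinformation
loss_y = nn.functional.binary_cross_entropy(f_y(f_h(x_src)), y_src)
```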
The covariate alignment module aims to mitigate covariate shift between source domains and the target
domain, i.e., reducing Σ_{j=1}^{k} d̂_H(D_{S_j}, D_T) in Proposition 1. To this end, we adapt the domain adversarial
method developed by Ganin et al. (2016) to implement this module. Let d denote the true domain for a
piece of information characterized by feature vector x, where d = 0 if it belongs to a source domain and
d = 1 if it comes from the target domain. The module employs a domain classifier, denoted by F_d, that
takes the transformed feature vector h as input and outputs its probability of belonging to the target domain.
Mathematically, we have

p(d̂ = 1|x) = F_d(h) = Sigmoid(MLP_3(h; θ_d)),   (8)
where d̂ is the predicted domain (i.e., source or target) for x, and MLP_3 is an MLP with trainable parameters
θ_d. On the one hand, F_d is trained to accurately predict the domain for each piece of information. On the
other hand, to reduce covariate shift, the vector h produced by the feature transformation layer F_h should
be domain-invariant, i.e., when presented to F_d, the classifier cannot tell whether it is from a source domain
or the target domain (Li et al. 2021). With this reasoning in mind, the learning of domain-invariant features
can be formulated as a min-max game defined as follows (Ganin et al. 2016, Li et al. 2021):

min_{θ_h} max_{θ_d} Σ_{x∈X} [ d log p(d̂ = 1|x) + (1 − d) log(1 − p(d̂ = 1|x)) ]
 = min_{θ_h} max_{θ_d} Σ_{x∈X} [ d log F_d(F_h(x)) + (1 − d) log(1 − F_d(F_h(x))) ],   (9)
where X = (∪_{j=1}^{k} X_{S_j}) ∪ X_T and d is the true domain (i.e., source or target) that x belongs to. The
maximization objective in Equation 9 promotes a more accurate domain classifier, while the minimization
objective strengthens the domain invariance of the transformed feature vector h by making it hard even for
the most accurate domain classifier to discern its domain identity. Following Ganin et al. (2016), the two
adversarial objectives in Equation 9 can be unified using a gradient reversal layer R(z) defined as:

R(z) = z;  dR(z)/dz = −I,   (10)

where I is an identity matrix. According to Equation 10, the gradient reversal layer makes no change to its
input but flips the sign of the gradient passed through it. Equipped with the gradient reversal layer, solving
the min-max problem in Equation 9 is equivalent to minimizing the following loss (Ganin et al. 2016):

L_d(θ_h, θ_d) = − Σ_{x∈X} [ d log F_d(R(F_h(x))) + (1 − d) log(1 − F_d(R(F_h(x)))) ],   (11)
where we replace F_h(x) in Equation 9 with R(F_h(x)). To minimize the loss defined in Equation 11,
parameters θ_d of the domain classifier F_d are updated with gradient descent, leading to a more accurate
domain classifier. In contrast, because of the gradient reversal layer, parameters θ_h of the feature
transformation layer F_h are adjusted with gradient ascent, resulting in a more domain-invariant h (Ganin
et al. 2016).
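The gradient reversal layer has a standard PyTorch realization via a custom autograd function; the sketch below is that standard pattern from Ganin et al. (2016): identity in the forward pass, sign-flipped gradient in the backward pass.

```python
import torch

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, z):
        return z.view_as(z)  # R(z) = z: identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # dR(z)/dz = -I: flip the gradient's sign

def grl(z):
    return GradientReversal.apply(z)

# Feeding the domain classifier F_d with grl(F_h(x)) and minimizing L_d updates
# theta_d by gradient descent while pushing theta_h in the opposite direction.
```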
4.4. Concept Alignment Module
The key novelty of our method is the proposed concept alignment module, which mitigates concept shift
between source domains and the target domain. By Proposition 1, to mitigate concept shift, we need to
learn a function that measures the similarity between two pieces of information. To this end, we propose a
contrastive learning sub-module. Specifically, for two pieces of information respectively characterized by
feature vectors x_i and x_j, we define the similarity between them as

F_s(x_i, x_j) = ( MLP_4(x_i; θ_s) · MLP_4(x_j; θ_s) ) / ( ||MLP_4(x_i; θ_s)|| ||MLP_4(x_j; θ_s)|| ),   (12)

where F_s denotes the similarity function, MLP_4 is an MLP with learnable parameters θ_s that transforms
a feature vector (e.g., x_i) into a different representation space, notation · denotes the dot product of two
vectors, and || || represents the L2 norm of a vector.
According to Equation 12, learning the similarity function F_s is essentially learning the parameters θ_s of
MLP_4. The rationale underlying the learning is that information with identical labels exhibits greater
similarity than information with distinct labels in their representation space (Frosst et al. 2019). Therefore,
we design the contrastive learning sub-module that learns the parameters θ_s via the objective of
maximizing the similarity between identically labeled information in the transformed representation space
while minimizing the similarity between distinctly labeled information. More specifically, for each piece
of source domain information characterized by feature vector x, we randomly select another piece of
source domain information, bearing the same label, as its positive peer. We denote the feature vector of the
positive peer as x^+. In addition, we randomly select m other pieces of source domain information, each
having the opposite label of the focal information characterized by x, to serve as its negative peers. We
denote the feature vectors of these negative peers as x_l^−, l = 1, 2, ..., m. The objective is thus to maximize
the similarity F_s(x, x^+) between x and x^+, while minimizing the similarity F_s(x, x_l^−) between x and x_l^−,
l = 1, 2, ..., m. Accordingly, the contrastive learning sub-module learns the similarity function by
minimizing the following loss summed over all labelled source domain information:

L_s(θ_s) = − Σ_{x∈X_S} log [ exp(F_s(x, x^+)/τ) / ( exp(F_s(x, x^+)/τ) + Σ_{l=1}^{m} exp(F_s(x, x_l^−)/τ) ) ],   (13)
where X_S = ∪_{j=1}^{k} X_{S_j}, F_s is given in Equation 12, and 0 < τ < 1 is a hyperparameter.10 By minimizing
L_s w.r.t. θ_s, this sub-module learns F_s that measures the similarity between two pieces of information in
their transformed representation space such that identically labeled information are close to each other and
distinctly labeled information are distant from each other.
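A minimal PyTorch sketch of the similarity function F_s (Equation 12) and the contrastive loss L_s (Equation 13) for one batch; the width of MLP_4 and the batch construction are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

mlp4 = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 128))

def f_s(x_i, x_j):
    """Cosine similarity in the transformed representation space (Equation 12)."""
    z_i = F.normalize(mlp4(x_i), dim=-1)
    z_j = F.normalize(mlp4(x_j), dim=-1)
    return (z_i * z_j).sum(-1)

def contrastive_loss(x, x_pos, x_negs, tau=0.5):
    """L_s for a batch; x_negs has shape (batch, m, dim) with m negative peers."""
    sim_pos = f_s(x, x_pos) / tau                               # (batch,)
    sim_neg = f_s(x.unsqueeze(1), x_negs) / tau                 # (batch, m)
    logits = torch.cat([sim_pos.unsqueeze(1), sim_neg], dim=1)  # (batch, 1 + m)
    # Negative log-softmax of the positive pair against positive and negatives.
    return -F.log_softmax(logits, dim=1)[:, 0].mean()
```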
Applying the similarity function F_s, for each piece of source domain information characterized by
feature vector x_i^{S_j}, the concept alignment sub-module identifies its most similar target domain information
characterized by feature vector c(x_i^{S_j}), j = 1, 2, ..., k and i = 1, 2, ..., N_{S_j}. It then mitigates concept shift
between source domains and the target domain by minimizing Σ_{j=1}^{k} Σ_{i=1}^{N_{S_j}} |f_{S_j}(x_i^{S_j}) − f_T(c(x_i^{S_j}))| in
Proposition 1. To this end, f_{S_j}(x_i^{S_j}) and f_T(c(x_i^{S_j})) are estimated using the label classifier described in
Section 4.3. Specifically, we have

f̂_{S_j}(x_i^{S_j}) = p(ŷ = 1|x_i^{S_j}) = F_y(F_h(x_i^{S_j})),

where f̂_{S_j} is the estimation of f_{S_j}, and F_h and F_y are defined by Equations 5 and 6, respectively. Similarly,
f_T(c(x_i^{S_j})) is estimated by

f̂_T(c(x_i^{S_j})) = p(ŷ = 1|c(x_i^{S_j})) = F_y(F_h(c(x_i^{S_j}))).11

With the estimations of f_{S_j} and f_T, the concept alignment sub-module reduces concept shift by minimizing
Σ_{j=1}^{k} Σ_{i=1}^{N_{S_j}} |F_y(F_h(x_i^{S_j})) − F_y(F_h(c(x_i^{S_j})))|. Accordingly, this sub-module is trained by minimizing the
following loss:

L_c(θ_h, θ_y) = Σ_{x∈X_S} [ (F_y(F_h(x)) − F_y(F_h(c(x))))^2 − (F_y(F_h(x)) − F_y(F_h(d(x))))^2 ],   (14)
where X_S = ∪_{j=1}^{k} X_{S_j}. For each piece of source domain information with feature vector x ∈ X_S, c(x)
and d(x) respectively denote the feature vectors characterizing its most similar and most dissimilar target
domain information, both of which are identified using the similarity function F_s. Minimizing the first term
in L_c essentially minimizes Σ_{j=1}^{k} Σ_{i=1}^{N_{S_j}} |F_y(F_h(x_i^{S_j})) − F_y(F_h(c(x_i^{S_j})))|. Consequently, it enforces the
desired property that similar source and target domain information have similar label predictions (Liu et al.
2022), thus reducing concept shift between source domains and the target domain. Minimizing the second
term −(F_y(F_h(x)) − F_y(F_h(d(x))))^2 maximizes the label prediction difference between F_y(F_h(x)) and
F_y(F_h(d(x))). As a result, label predictions diverge for dissimilar source and target domain information,
which further mitigates concept shift.
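Putting the pieces together, a hedged PyTorch sketch of L_c: for each source instance, the most similar target instance c(x) and the most dissimilar one d(x) are selected with F_s, and label predictions are pulled together or pushed apart accordingly. The modules f_h, f_y, and f_s refer to the sketches above.

```python
import torch

def concept_alignment_loss(x_src, x_tgt, f_h, f_y, f_s):
    # Similarity of every source instance to every target instance.
    sim = f_s(x_src.unsqueeze(1), x_tgt.unsqueeze(0))  # (N_S, N_T)
    c_idx = sim.argmax(dim=1)  # most similar target instance, c(x)
    d_idx = sim.argmin(dim=1)  # most dissimilar target instance, d(x)
    p_src = f_y(f_h(x_src))
    p_sim = f_y(f_h(x_tgt[c_idx]))
    p_dis = f_y(f_h(x_tgt[d_idx]))
    # Align predictions of similar pairs; separate those of dissimilar pairs.
    return ((p_src - p_sim) ** 2 - (p_src - p_dis) ** 2).sum()
```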
10 Hyperparameter τ adjusts the emphasis on the two objectives (Frosst et al. 2019). A lower τ enables the learning of the similarity
function to place more emphasis on maximizing F_s(x, x^+), while a higher τ shifts the learning to emphasize more on minimizing
F_s(x, x_l^−).
11 The label classifier is trained to minimize the error of classifying source domain information. Thus, it is reasonable to estimate
f_{S_j} using the label classifier. Since target domain information is unlabeled, we proxy f_T using the label classifier, a strategy
whose effectiveness has been validated by previous studies (Long et al. 2013, 2018, Tachet des Combes et al. 2020).
4.5. The DACA Method
The DACA method is trained to minimize the combined losses of its three modules:

L_DACA(θ_h, θ_y, θ_d, θ_s) = L_y(θ_h, θ_y) + L_d(θ_h, θ_d) + L_s(θ_s) + L_c(θ_h, θ_y),   (15)

where L_y, L_d, L_s, and L_c are defined by Equations 7, 11, 13, and 14, respectively. There are two practical
considerations when training DACA. First, the DACA method is trained in a two-stage manner. In the first
stage (or the warmup stage), the method is trained by minimizing L_DACA − L_c. The rationale is that, to
accurately estimate the concept alignment loss L_c, DACA needs to learn a reliable similarity function first.
In the second stage, DACA is trained by minimizing L_DACA as defined by Equation 15. Second, to speed
up the training of the DACA method, it is beneficial to mitigate concept shift between each pair of source
domains. This is accomplished by minimizing a loss function for each pair of source domains. Concretely,
for a pair of source domains S_j and S_n, the loss function has the same form as Equation 14 but with S_j
treated as the source domain in the equation and S_n regarded as the target domain. Once trained, DACA
predicts the probability that a piece of target domain information characterized by feature vector x^T is false
as F_y(F_h(x^T)), where F_h and F_y are defined by Equations 5 and 6, respectively.
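A simplified sketch of the two-stage training procedure: a warmup stage minimizing L_DACA − L_c so that a reliable similarity function is learned first, followed by the full combined loss of Equation 15. The loss callables, modules, data loader, and epoch counts are placeholders for the components sketched above.

```python
import torch

def train_daca(modules, loader, loss_y, loss_d, loss_s, loss_c,
               num_epochs=20, num_warmup_epochs=5):  # epoch counts are illustrative
    params = [p for m in modules for p in m.parameters()]
    optimizer = torch.optim.Adam(params, lr=1e-4)    # learning rate from Section 5.2
    for epoch in range(num_epochs):
        warmup = epoch < num_warmup_epochs           # stage 1: minimize L_DACA - L_c
        for batch in loader:                         # batches of source + target info
            loss = loss_y(batch) + loss_d(batch) + loss_s(batch)
            if not warmup:                           # stage 2: add concept alignment
                loss = loss + loss_c(batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```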
5. Empirical Evaluation
5.1. Data
We evaluated the performance of our proposed DACA method using the publicly available datasets of
English news, which have been widely employed to assess the performance of misinformation detection
methods (Nan et al. 2021, Mosallanezhad et al. 2022, Zhu et al. 2023). One is the MM-COVID dataset,
which contains 4,750 pieces of true news (i.e., true information) and 1,317 pieces of fake news (i.e.,
misinformation) on COVID-19 as well as user comments on these news (Li et al. 2020). Specifically, 8%
of COVID news are accompanied by user comments. In our evaluation, we treated the COVID domain as
the infodemic or target domain. For source domains, we utilized the FakeNewsNet dataset, which consists
of true and fake news alongside their associated comments from the domains of entertainment and politics
(Shu et al. 2020). Specifically, there are 16,804 pieces of true news and 5,067 pieces of fake news from the
entertainment domain, and 1,583 instances of true news and 1,287 instances of fake news from the politics
domain. User comments accompany 27% of entertainment news and 59% of politics news. Table 2 reports
the summary statistics of the datasets used in our evaluation. Examples of true and fake news from the
source and target domains are given in Figure 2.
5.2. Evaluation Procedure and Benchmark Methods
We considered two scenarios of early detection of misinformation in our evaluation. During the early stage
of an infodemic, there is little understanding of the disease that causes the infodemic. Consequently, even
experts encounter difficulty distinguishing true news from fake news in the infodemic domain, leaving all
news in this domain unlabeled.

Table 2 Summary Statistics of Evaluation Datasets.

Domain | Number of true news | Number of fake news
COVID (Infodemic / Target) | 4,750 | 1,317
Entertainment (Source) | 16,804 | 5,067
Politics (Source) | 1,583 | 1,287

Figure 2 Examples of True and Fake News in Evaluation Datasets

Hence, the first evaluation scenario entails each method (ours or benchmark)
utilizing labeled news in the entertainment and politics domains to predict the label for each piece of news in
the COVID domain. Accordingly, the inputs to each method encompass labeled news in the entertainment
and politics domains, along with user comments on these news, as well as unlabeled news in the COVID
domain and their associated comments. In the second evaluation scenario, user comments accompanying
COVID news are dropped from the inputs. As a result, the inputs in this scenario consist of labeled news
in the entertainment and politics domains, alongside user comments on these news, and unlabeled news in
the COVID domain. This scenario emulates the onset of an infodemic, characterized by a large volume of
information, yet with very few or no comments accompanying this information.
In each evaluation scenario, we conducted 25 experiments for every method (ours or benchmark) and
measured its performance using the metrics recall and F1-score. To curb the spread of fake news and
minimize their societal impact, it is crucial to identify as many instances of fake news as possible (Zhu
et al. 2023). To this end, recall is an important metric as it measures the effectiveness of a method in
identifying fake news. In addition, F1-score evaluates a method’s performance in both identifying fake
news and avoiding predicting true news as fake. Concretely, let P be the number of fake news, TP be the
number of fake news that are predicted as fake, and FP be the number of true news that are predicted as
fake. Recall is defined as TP/P, and F1-score is computed as 2TP/(TP + FP + P).
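Written out directly from these definitions (with fake news as the positive class):

```python
def recall(tp: int, p: int) -> float:
    return tp / p  # share of all fake news that is correctly flagged

def f1_score(tp: int, fp: int, p: int) -> float:
    # 2TP/(TP + FP + P); equivalent to the standard F1 because P = TP + FN.
    return 2 * tp / (tp + fp + p)

print(recall(74, 100), f1_score(74, 20, 100))  # e.g., a recall of 0.74
```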
As reviewed in Section 2 and summarized in Table 1, among all existing misinformation detection
methods, only cross-domain misinformation detection methods with unlabeled target domain information
can solve the EDM problem investigated in this paper. Therefore, we benchmarked our method against
state-of-the-art methods in this category. Specifically, one benchmark is the MMD method, which
minimizes covariate shift between source and target domain information, measured using the maximum
mean discrepancy (MMD) metric (Huang et al. 2021). Another benchmark is the contrastive adaptation
network for misinformation detection (CANMD) method, which assesses covariate shift using a variant of
the MMD metric (Yue et al. 2022). In addition, we also compared our method against the domain
adversarial neural network (DANN) method; it minimizes covariate shift by learning domain-invariant
features from source and target domain information (Li et al. 2021). Moreover, general-purpose domain
adaptation methods that were not originally designed for misinformation detection can also be adapted to
solve our problem. As analyzed in Section 2.2, there are two categories of domain adaptation methods.
One category explicitly measures and minimizes covariate shift while the other category implicitly
minimizes covariate shift by learning domain-invariant features. Accordingly, we benchmarked against
representative methods in each category. For the category that explicitly measures and minimizes covariate
shift, we considered the multi-kernel MMD (MK-MMD) method (Long et al. 2015), the joint adaptation
network (JAN) method (Long et al. 2017), and the discriminator-free adversarial learning network
(DALN) method (Chen et al. 2022) as our baselines. MK-MMD and JAN (Long et al. 2015, 2017) are
commonly used domain adaption methods whereas DALN (Chen et al. 2022) is a state-of-the-art method
in this category. Representative methods in the other category include the maximum classifier discrepancy
(MCD) method (Saito et al. 2018) and the smooth domain adversarial training (SDAT) method (Rangwani
et al. 2022). MCD is widely used for domain adaptation and SDAT is a state-of-the-art implicit domain
adaptation method. Table 3 lists all methods compared in our evaluation.
In each evaluation scenario, all compared methods took identical inputs and employed the same content
embedding method to represent the textual contents of these inputs. Specifically, English words in the inputs
were embedded using the pretrained Roberta model (Liu et al. 2019). Next, the content embedding method
proposed by Zhu et al. (2023) was utilized to embed the textual contents of the inputs. This method was
developed based on BERT (Kenton and Toutanova 2019) and extracted three types of features from each
instance of textual content (e.g., a news article): (1) semantic features, such as word usage patterns, (2) style
features, such as writing styles, and (3) emotional features, such as emotional signals expressed in a news
article. It then aggregated these features into a single feature vector through an attention mechanism. We
trained our DACA method using the Adam optimizer with a learning rate of 0.0001 (Kingma and Ba 2015).
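As a hedged sketch of the embedding step, the snippet below obtains contextual Roberta representations with the HuggingFace transformers library; the roberta-base checkpoint and mean pooling over tokens are illustrative assumptions rather than the exact pipeline of Zhu et al. (2023).

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

texts = ["Baking soda cures coronavirus."]  # example news content
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_states = model(**inputs).last_hidden_state  # (batch, seq_len, 768)
embeddings = token_states.mean(dim=1)                 # one vector per article
```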
Table 3 Methods Compared in Our Evaluation.

Method | Notes
DACA | Our method
MMD | Cross-domain misinformation detection method based on the MMD metric (Huang et al. 2021)
CANMD | Cross-domain misinformation detection method based on a variant of the MMD metric (Yue et al. 2022)
DANN | Cross-domain misinformation detection method based on the learning of domain-invariant features (Li et al. 2021)
MK-MMD | Domain adaptation method based on the multi-kernel MMD metric (Long et al. 2015)
JAN | Domain adaptation method based on the joint MMD metric (Long et al. 2017)
DALN | Domain adaptation method using the Nuclear-norm Wasserstein Discrepancy (Chen et al. 2022)
MCD | Domain adaptation method utilizing a mini-max mechanism to align feature distributions between source and target domains (Saito et al. 2018)
SDAT | Domain adaptation method employing a smoothing mechanism to learn domain-invariant features (Rangwani et al. 2022)
The hyperparameters in Equation 13 of our method were set as follows: the number of instances of source
domain information m was set to 3, and the hyperparameter τ was set to 0.5. Implementation details of the
benchmark methods are given in Appendix C.
5.3. Evaluation Results
Table 4 presents the average recall and F1-score for each method in evaluation scenario 1, along with
standard deviations (in parentheses), across 25 experiments. As reported, our DACA method significantly
outperforms each benchmark method in both recall and F1-score. In particular, our method achieves an
average recall of 0.745, indicating that, on average, it correctly identifies 74.5% of fake news in the
COVID domain. Such performance is attained without any labeled COVID news in the training data,
highlighting the efficacy of our method in transferring a model learned from labeled news in the
entertainment and politics domains to predict the labels of news in the COVID domain. Moreover, our
method respectively outperforms three state-of-the-art misinformation detection methods—MMD,
CANMD, and DANN—by 5.82%, 6.13%, and 13.22% in recall, and by 6.05%, 8.77%, and 10.62% in
F1-score. Additionally, the performance advantages of our method over representative domain adaptation
methods—MK-MMD, JAN, DALN, MCD, and SDAT—range from 4.20% to 11.36% in recall and from
4.66% to 18.26% in F1-score. Since the key methodological difference between our method and the
benchmark methods lies in the mitigation of concept shift by its concept alignment module, the
performance improvements achieved by our method can be largely attributed to this module. Given the
huge volume of information generated at the early stage of an infodemic, such performance advantages
achieved by our method could result in substantially more instances of misinformation being identified by
our method, in comparison to the benchmarks. As a result, a greater volume of misinformation could be
prevented from dissemination, thereby significantly benefiting public health and society at large
(Buchanan 2020, Van Der Linden 2022).
Table 4 Performance Comparison between Our Method and Benchmark Methods (Evaluation Scenario 1).

Method | Recall | Improvement by DACA | F1-score | Improvement by DACA
DACA | 0.745 (0.049) | | 0.719 (0.034) |
MMD | 0.704** (0.044) | 5.82% | 0.678** (0.049) | 6.05%
CANMD | 0.702** (0.026) | 6.13% | 0.661** (0.030) | 8.77%
DANN | 0.658** (0.053) | 13.22% | 0.650** (0.047) | 10.62%
MK-MMD | 0.715* (0.038) | 4.20% | 0.687** (0.041) | 4.66%
JAN | 0.710** (0.030) | 4.93% | 0.686** (0.034) | 4.81%
DALN | 0.669** (0.039) | 11.36% | 0.660** (0.052) | 7.47%
MCD | 0.671** (0.020) | 11.03% | 0.608** (0.049) | 18.26%
SDAT | 0.712** (0.002) | 4.63% | 0.682** (0.015) | 5.43%

Note: Significance levels are denoted by * and ** for 0.05 and 0.01, respectively. Standard deviations are in parentheses.
Table 5 reports the average recall and F1-score for each method in evaluation scenario 2, with standard
deviations given in parentheses. Again, our method significantly outperforms each benchmark method in
both evaluation metrics. Specifically, it surpasses three state-of-the-art misinformation detection methods
with recall improvements ranging from 6.15% to 17.00% and F1-score enhancements ranging from 5.47%
to 13.92%. Moreover, in comparison to five representative domain adaptation methods, our method boosts
recall by a range of 4.73% to 12.74% and increases F1-score by a range of 4.13% to 13.47%.12 As a
robustness check, we conducted another empirical evaluation using a public dataset of Chinese news.
Evaluation results reported in Appendix D further support the superior performance of our method over the
benchmarks.
12 For each method, the performance change between the two evaluation scenarios is small because only 8% of COVID news is accompanied
by user comments. Further, different methods respond differently to noise and signal in user comments. As a result, dropping user
comments associated with COVID news leads to decreased performance for some methods and improved performance for others.
Table 5 Performance Comparison between Our Method and Benchmark Methods (Evaluation Scenario 2).

Method   | Recall (SD)     | Improvement by DACA | F1-score (SD)   | Improvement by DACA
DACA     | 0.757 (0.019)   | –                   | 0.720 (0.021)   | –
MMD      | 0.713∗∗ (0.048) | 6.15%               | 0.682∗∗ (0.046) | 5.47%
CANMD    | 0.701∗∗ (0.031) | 7.89%               | 0.657∗∗ (0.033) | 9.59%
DANN     | 0.647∗∗ (0.038) | 17.00%              | 0.632∗∗ (0.039) | 13.92%
MK-MMD   | 0.722∗∗ (0.031) | 4.73%               | 0.691∗∗ (0.033) | 4.13%
JAN      | 0.710∗∗ (0.026) | 6.63%               | 0.680∗∗ (0.036) | 5.79%
DALN     | 0.671∗∗ (0.040) | 12.74%              | 0.661∗∗ (0.043) | 8.87%
MCD      | 0.679∗∗ (0.021) | 11.50%              | 0.633∗∗ (0.025) | 13.47%
SDAT     | 0.695∗∗ (0.025) | 8.80%               | 0.665∗∗ (0.035) | 8.20%
Note: Significance levels are denoted by ∗ and ∗∗ for 0.05 and 0.01, respectively.
5.4. Performance Analysis
Having demonstrated the superior performance of our method over the benchmarks, we next analyze the
factors underlying its performance advantages. To this end, we conducted
ablation studies to investigate the contribution of each novel design artifact of our DACA method to its
performance. In particular, we focused on the key novelty of our method – the concept alignment module.
As elaborated in Section 4.4, it consists of two sub-modules: the contrastive learning sub-module and the
concept alignment sub-module. The former represents instances of information in a transformed space,
whereas the latter computes distances between these instances in the transformed space and mitigates
concept shift. To evaluate the overall contribution of the concept alignment module, we removed this
module from the DACA method and designated the resulting method as W/o CL CA (Without both the
Contrastive Learning and the Concept Alignment sub-modules). The performance difference between
DACA and W/o CL CA reveals the overall contribution of the concept alignment module to the
performance of our method. To further investigate the functioning mechanism of the concept alignment
module, we dropped its contrastive learning sub-module but kept its concept alignment sub-module. We
named the resulting method W/o CL (Without the Contrastive Learning sub-module). The performance
difference between DACA and W/o CL uncovers the role of the contrastive learning sub-module in
concept alignment.
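The two ablation variants can be viewed as configuration toggles over the two sub-modules, as sketched
below with hypothetical class and field names; the sketch only makes the ablation design explicit and is not
part of our implementation.

```python
from dataclasses import dataclass

@dataclass
class DACAVariant:
    """Hypothetical toggles for the concept alignment module's sub-modules."""
    name: str
    contrastive_learning: bool   # learn the representation space for similarity
    concept_alignment: bool      # penalize concept shift between paired instances

# Full model and the two ablation variants evaluated in Tables 6 and 7.
VARIANTS = [
    DACAVariant("DACA", contrastive_learning=True, concept_alignment=True),
    DACAVariant("W/o CL CA", contrastive_learning=False, concept_alignment=False),
    DACAVariant("W/o CL", contrastive_learning=False, concept_alignment=True),
]
```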
Table 6 Ablation Study of DACA (Evaluation Scenario 1).

Method    | Recall (SD)     | Improvement by DACA | F1-score (SD)   | Improvement by DACA
DACA      | 0.745 (0.049)   | –                   | 0.719 (0.034)   | –
W/o CL CA | 0.658∗∗ (0.053) | 13.22%              | 0.650∗∗ (0.047) | 10.62%
W/o CL    | 0.663∗∗ (0.047) | 12.37%              | 0.648∗∗ (0.052) | 10.96%
Note: Significance level is denoted by ∗∗ for 0.01.
Table 7 Ablation Study of DACA (Evaluation Scenario 2).

Method    | Recall (SD)     | Improvement by DACA | F1-score (SD)   | Improvement by DACA
DACA      | 0.757 (0.019)   | –                   | 0.720 (0.021)   | –
W/o CL CA | 0.647∗∗ (0.038) | 17.00%              | 0.632∗∗ (0.039) | 13.92%
W/o CL    | 0.663∗∗ (0.034) | 14.18%              | 0.638∗∗ (0.025) | 12.85%
Note: Significance level is denoted by ∗∗ for 0.01.
Table 6 presents the average performance of DACA, W/o CL CA, and W/o CL in evaluation scenario
1, along with their respective standard deviations (in parentheses), across 25 experiments. As reported, the
performance of W/o CL CA is significantly inferior to that of DACA, due to the removal of the concept
alignment module. More specifically, W/o CL CA trails DACA by 13.22% in recall and 10.62% in F1-
score, which collectively demonstrate the contribution of the concept alignment module to the performance
of DACA. Furthermore, W/o CL performs significantly worse than DACA but comparably to W/o CL CA.
Note that W/o CL retains the concept alignment sub-module while omitting the contrastive learning
sub-module. The comparison results thus show that, to realize the benefits of the concept alignment sub-
module, it must be coupled with the contrastive learning sub-module. This empirical finding is in line with
the theoretical analysis in Section 4.1, which defines concept shift over source domain instances and their
corresponding nearest target domain instances. Mitigating concept shift therefore requires a suitable
similarity function that measures similarities between instances in an appropriate representation space, and
this similarity function is learned by the contrastive learning sub-module. Table 7 reports the results of the
ablation study in evaluation scenario 2, which are consistent with our findings in evaluation scenario 1.
Overall, the ablation studies in both scenarios demonstrate the significant contribution of the concept
alignment module to the performance of our method. Moreover, they show that it is appropriate to design
this module with two sub-modules: contrastive learning and concept alignment.
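To make this mechanism concrete, the following is a minimal sketch of the pairing step described above:
each source instance is matched to its nearest target instance in the learned representation space, and a
penalty compares their predicted label distributions. The cosine similarity and the mean-squared-error
penalty are illustrative assumptions, not our exact formulation in Section 4.4.

```python
import torch
import torch.nn.functional as F

def nearest_target_indices(src_emb, tgt_emb):
    """For each source instance, find its nearest target instance by cosine
    similarity in the (contrastively learned) representation space."""
    src = F.normalize(src_emb, dim=-1)    # (Ns, d)
    tgt = F.normalize(tgt_emb, dim=-1)    # (Nt, d)
    return (src @ tgt.T).argmax(dim=1)    # (Ns,) index of each nearest target

def concept_alignment_penalty(src_probs, tgt_probs, nn_idx):
    """Illustrative penalty: discrepancy between the predicted label
    distribution of each source instance and that of its nearest target
    instance; reducing it pulls the conditional distributions together."""
    return F.mse_loss(src_probs, tgt_probs[nn_idx])
```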
6. Conclusions
6.1. Summary and Contributions
To contain harmful effects of an infodemic on public health, it is crucial to detect misinformation at the early
stage of the infodemic. An early stage infodemic is characterized by a large volume of unlabeled information
spread across various media platforms. Consequently, conventional misinformation detection methods are
not suitable for this misinformation detection task because they rely on labeled information in the infodemic
domain to train their models. State-of-the-art misinformation detection methods learn their models with
labeled information in other domains to detect misinformation in the infodemic domain, thereby applicable
to the task. The efficacy of these methods depends on their ability to mitigate both covariate shift and concept
shift between the infodemic domain and the domains from which they leverage labeled information. These
methods focus on mitigating covariate shift but overlook concept shift, making them less effective for the
task. In response, we propose a novel misinformation detection method that addresses both covariate shift
and concept shift. Through extensive empirical evaluations with two widely used datasets, we demonstrate
the superior performance of our method over state-of-the-art misinformation detection methods as well as
prevalent domain adaptation methods that can be tailored to solve the misinformation detection task.
Our study makes the following contributions to the extant literature. First, our study belongs to the area of
computational design science research in the Information Systems (IS) field (Rai et al. 2017, Padmanabhan
et al. 2022). This area of research develops computational algorithms and methods to solve business and
societal problems and aims at making methodological contributions to the literature, e.g., Abbasi et al.
(2010), Li et al. (2017), Zhao et al. (2023). In particular, the methodological contribution of our study lies
in its addressing of concept shift, in addition to covariate shift. More specifically, we theoretically show the
importance of addressing concept shift and how to operationalize it. Built on the theoretical analysis, we
develop a novel concept alignment module to mitigate concept shift, as described in Section 4.4. Second,
given its significant social and economic impact, misinformation detection and management has attracted
attention from IS scholars, e.g., Moravec et al. (2019), Wei et al. (2022), Hwang and Lee (2024). Our study
adds to this stream of IS research with a novel method that is effective in detecting misinformation at the
early stage of an infodemic.
6.2. Implications for Infodemic Management and Future Work
Every epidemic is accompanied by an infodemic, a phenomenon known since the Middle Ages
(Zarocostas 2020). The wide dissemination of misinformation during an infodemic misleads people to
dismiss health guidance and pursue unscientific treatments, resulting in substantial harm to public health
and significant social and economic consequences (Bursztyn et al. 2020, Romer and Jamieson 2020).
Furthermore, the pervasive reach of the Internet and social media platforms accelerates the spread of
misinformation and amplifies its harmful impacts on public health and society (Zarocostas 2020).
Detecting misinformation at the early stage of an infodemic discourages people from believing and sharing
it, thereby preventing it from going viral (Buchanan 2020). Hence, early detection of misinformation is
crucial for managing an infodemic and mitigating its adverse effects. Accordingly, a direct implication of
our study is to provide an effective early misinformation detection method for infodemic management. Our
method effectively overcomes two obstacles to early detection of misinformation. First, the vast amount of
information spread during the early stage of an infodemic makes manual identification of misinformation
impractical. In response, our deep learning-based method automatically learns to differentiate
misinformation from true information. Second, there is no labeled information at the early stage of an
infodemic, rendering traditional misinformation detection methods inapplicable. Accordingly, our method
leverages labeled information in other domains to detect misinformation in the infodemic domain.
Our study empowers infodemic management in several ways. Operators of media platforms can employ
our method to effectively detect misinformation in the infodemic domain that spreads on their platforms.
Subsequently, they can flag and debunk identified misinformation, which helps curb the diffusion of
misinformation and alleviate its negative impacts (Pennycook et al. 2020, Wei et al. 2022). Additionally,
flagged misinformation is less likely to be clicked, thereby reducing revenues for those who monetize
misinformation and diminishing their incentives to create more misinformation (Pennycook and Rand
2021). Moreover, platform operators can trace misinformation identified by our method back to its
producers. They could then restrict those who frequently produce misinformation from publishing on their
platforms. In this regard, platforms are encouraged to set up guidelines to regulate their content producers
(Hartley and Vu 2020). Platform operators can also trace misinformation detected by our method to its
spreaders. Understandably, blocking individuals who disseminate a large amount of misinformation is
important for combating misinformation on their platforms.
Furthermore, our study sheds light on the value of cross-domain data sharing for infodemic
management. At the early stage of an infodemic, even experts have difficulty in distinguishing between
misinformation and true information. Moreover, recruiting experts to verify and label information is costly
(Kim et al. 2018). Cross-domain data sharing enables us to utilize labeled information in other domains to
detect misinformation in the infodemic domain. Hence, it is a viable and cost-effective approach to
misinformation detection and infodemic management. To implement this approach, we need to address a
challenge. Information from different domains exhibits different marginal and conditional distributions,
known as covariate shift and concept shift, respectively. Thus, the challenge is how to mitigate covariate
shift and concept shift between the infodemic domain and source domains that provide labeled
information. Our proposed DACA method is effective in tackling this challenge. To facilitate cross-domain
data sharing for infodemic management, media platforms are recommended to establish data exchange
mechanisms to ensure that participating platforms are properly incentivized and labeled information is
securely shared.
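For reference, with P_S and P_T denoting the source and infodemic (target) distributions over information
features x and veracity label y, the two shifts named above can be written as:

```latex
% Covariate shift: the marginal feature distributions differ.
P_S(x) \neq P_T(x)
% Concept shift: the conditional label distributions differ.
P_S(y \mid x) \neq P_T(y \mid x)
```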
Our study has limitations and can be extended in several directions. First, our method employs
information content to detect misinformation. Future work could extend our method by incorporating
information propagation patterns. To this end, graph neural networks are useful tools for modeling such
patterns. Second, our empirical evaluations are based on fake news datasets, which have been widely
utilized to assess the performance of misinformation detection methods in the literature. It is also
interesting to evaluate the performance of our method in detecting other types of misinformation such as
hyperpartisan news and yellow journalism (Pennycook and Rand 2021). Third, our method could be
generalized to solve other domain adaptation problems, in addition to the misinformation detection
problem tackled in this study. Hence, another area worthy of future investigation is how to generalize our
method to solve the problem of classifying unlabeled target domain instances by leveraging labeled source
domain instances.
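As a starting point for such a generalization, the sketch below outlines a generic composite objective that
combines a supervised source loss with penalties for the two shifts. The function names, penalty choices,
and weights are hypothetical; the sketch conveys the recipe, not Equation 13.

```python
import torch.nn.functional as F

def adaptation_loss(encoder, classifier, src_x, src_y, tgt_x,
                    covariate_penalty, concept_penalty,
                    lambda_cov=1.0, lambda_con=1.0):
    """Generic objective for classifying unlabeled target instances with
    labeled source instances: supervised loss on the source plus penalties
    that shrink covariate shift and concept shift (all names hypothetical)."""
    src_z, tgt_z = encoder(src_x), encoder(tgt_x)
    src_logits, tgt_logits = classifier(src_z), classifier(tgt_z)
    loss = F.cross_entropy(src_logits, src_y)
    loss = loss + lambda_cov * covariate_penalty(src_z, tgt_z)
    loss = loss + lambda_con * concept_penalty(src_logits.softmax(-1),
                                               tgt_logits.softmax(-1))
    return loss
```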
References
Abbasi A, Zhang Z, Zimbra D, Chen H, Nunamaker Jr JF (2010) Detecting fake websites: The contribution of statistical learning theory. MIS
Quarterly 34(3):435–461.
Adebimpe WO, Adeyemi DH, Faremi A, Ojo JO, Efuntoye AE (2015) The relevance of the social networking media in Ebola virus disease prevention
and control in southwestern Nigeria. The Pan African Medical Journal 22(Suppl 1).
Agrahari S, Singh AK (2022) Concept drift detection in data stream mining: A literature review. Journal of King Saud University-Computer and
Information Sciences 34(10):9523–9540.
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. Proceedings of the 34th International Conference on Machine
Learning, 214–223.
Asadi K, Misra D, Littman M (2018) Lipschitz continuity in model-based reinforcement learning. Proceedings of the 35th International Conference
on Machine Learning, 264–273 (PMLR).
Atanasov P, Rescober P, Stone E, Swift SA, Servan-Schreiber E, Tetlock P, Ungar L, Mellers B (2017) Distilling the wisdom of crowds: Prediction
markets vs. prediction polls. Management Science 63(3):691–706.
Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan JW (2010) A theory of learning from different domains. Machine Learning
79:151–175.
Borah P, Austin E, Su Y (2022) Injecting disinfectants to kill the virus: Media literacy, information gathering sources, and the moderating role of
political ideology on misperceptions about COVID-19. Mass Communication and Society 1–27.
Buchanan T (2020) Why do people spread false information online? The effects of message and viewer characteristics on self-reported likelihood of
sharing social media disinformation. PLoS ONE 15(10):e0239666.
Bursztyn L, Rao A, Roth CP, Yanagizawa-Drott DH (2020) Misinformation during a pandemic. Technical report, National Bureau of Economic
Research.
Chen L, Chen H, Wei Z, Jin X, Tan X, Jin Y, Chen E (2022) Reusing the task-specific classifier as a discriminator: Discriminator-free adversarial
domain adaptation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7181–7190.
Ciampaglia GL, Shiralkar P, Rocha LM, Bollen J, Menczer F, Flammini A (2015) Computational fact checking from knowledge networks. PLoS
ONE 10(6):e0128193.
Fang X, Sheng ORL, Goes P (2013) When is the right time to refresh knowledge discovered from data? Operations Research 61(1):32–44.
Freeman D, Waite F, Rosebrock L, Petit A, Causier C, East A, Jenner L, Teale AL, Carr L, Mulhall S, et al. (2022) Coronavirus conspiracy beliefs,
mistrust, and compliance with government guidelines in England. Psychological Medicine 52(2):251–263.
Frosst N, Papernot N, Hinton G (2019) Analyzing and improving representations with the soft nearest neighbor loss. Proceedings of the 36th
International Conference on Machine Learning, 2012–2020 (PMLR).
Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural
networks. Journal of Machine Learning Research 17(59):1–35.
Hartley K, Vu MK (2020) Fighting fake news in the covid-19 era: policy insights from an equilibrium model. Policy Sciences 53(4):735–758.
Huang Y, Gao M, Wang J, Shu K (2021) Dafd: Domain adaptation framework for fake news detection. Proceedings of the 28th Neural Information
Processing Conference, 305–316.
Hwang EH, Lee S (2024) A nudge to credible information as a countermeasure to misinformation: Evidence from Twitter. Information Systems
Research.
Imhoff R, Lamberty P (2020) A bioweapon or a hoax? The link between distinct conspiracy beliefs about the coronavirus disease (COVID-19) outbreak
and pandemic behavior. Social Psychological and Personality Science 11(8):1110–1118.
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of
NAACL-HLT, 4171–4186.
Kim H, Papamakarios G, Mnih A (2021) The Lipschitz constant of self-attention. Proceedings of the 38th International Conference on Machine
Learning, 5562–5571 (PMLR).
Kim J, Tabibian B, Oh A, Schölkopf B, Gomez-Rodriguez M (2018) Leveraging the crowd to detect and reduce the spread of fake news and
misinformation. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 324–332.
Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations.
Kouw WM, Loog M (2018) An introduction to domain adaptation and transfer learning. arXiv preprint arXiv:1812.11806.
Li Y, Jiang B, Shu K, Liu H (2020) MM-COVID: A multilingual and multimodal data repository for combating COVID-19 disinformation. arXiv
preprint arXiv:2011.04088.
Li Y, Lee K, Kordzadeh N, Faber B, Fiddes C, Chen E, Shu K (2021) Multi-source domain adaptation with weak supervision for early fake news
detection. Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 668–676 (IEEE).
Li Z, Fang X, Bai X, Sheng ORL (2017) Utility-based link recommendation for online social networks. Management Science 63(6):1938–1952.
Liu X, Yoo C, Xing F, Oh H, El Fakhri G, Kang JW, Woo J, et al. (2022) Deep unsupervised domain adaptation: A review of recent advances and
perspectives. APSIPA Transactions on Signal and Information Processing 11(1).
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: A robustly optimized BERT
pretraining approach. arXiv preprint arXiv:1907.11692.
Long M, Cao Y, Wang J, Jordan M (2015) Learning transferable features with deep adaptation networks. Proceedings of the 32nd International
Conference on Machine Learning, 97–105 (PMLR).
Long M, Cao Z, Wang J, Jordan MI (2018) Conditional adversarial domain adaptation. Proceedings of the 32nd International Conference on Neural
Information Processing Systems, 1647–1657, NIPS’18 (Red Hook, NY, USA: Curran Associates Inc.).
Long M, Wang J, Ding G, Sun J, Yu PS (2013) Transfer feature learning with joint distribution adaptation. Proceedings of the IEEE International
Conference on Computer Vision, 2200–2207.
Long M, Zhu H, Wang J, Jordan MI (2017) Deep transfer learning with joint adaptation networks. Proceedings of the 34th International Conference
on Machine Learning, 2208–2217 (PMLR).
Moravec PL, Minas RK, Dennis AR (2019) Fake news on social media: People believe what they want to believe when it makes no sense at all.
MIS Quarterly 43(4).
Mosallanezhad A, Karami M, Shu K, Mancenido MV, Liu H (2022) Domain adaptive fake news detection via reinforcement learning. Proceedings
of the ACM Web Conference 2022, 3632–3640.
Nan Q, Cao J, Zhu Y, Wang Y, Li J (2021) MDFEND: Multi-domain fake news detection. Proceedings of the 30th ACM International Conference on
Information & Knowledge Management, 3343–3347.
Nan Q, Wang D, Zhu Y, Sheng Q, Shi Y, Cao J, Li J (2022) Improving fake news detection of influential domain via domain- and instance-level
transfer. Proceedings of the 29th International Conference on Computational Linguistics, 2834–2848.
Ng KC, Ke PF, So MK, Tam KY (2023) Augmenting fake content detection in online platforms: A domain adaptive transfer learning