Promoting online evaluation skills through educational chatbots
Nils Knoth a,*, Carolin Hahnel b, Mirjam Ebersbach a
a Institute for Psychology, University of Kassel, Holländische Straße 36-38, 34127, Kassel, Germany
b Faculty for Psychology, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany
* Corresponding author: Universität Kassel, Holländische Straße 36-38, 34127, Kassel, Germany. E-mail addresses: nils.knoth@uni-kassel.de (N. Knoth), carolin.hahnel@ruhr-uni-bochum.de (C. Hahnel), mirjam.ebersbach@uni-kassel.de (M. Ebersbach).
ARTICLE INFO
Keywords:
Online evaluation skills
Digital literacy
Information literacy
Educational chatbot
Scalable interventions
Hybrid intelligence
ABSTRACT
Online evaluation skills such as assessing the credibility and relevance of Internet sources are crucial for students' self-regulated learning on the Internet, yet many struggle to identify reliable information online. While AI-based chatbots have made progress in teaching various skills, their application in improving online evaluation skills remains underexplored. In this study, we present an educational chatbot designed to train university students to evaluate online information. Participants were assigned to one of three conditions: (1) training with the interactive chatbot, (2) training with a static checklist, or (3) no additional training (i.e., baseline condition). In an ecologically valid test that provided a simulated web environment, participants had to identify the most reliable and relevant websites among several non-target websites to solve given problems. Participants in the chatbot condition outperformed those in the baseline condition on this test, while participants in the checklist condition showed no significant advantage over the baseline condition. These findings suggest the potential of educational chatbots as effective tools for improving critical evaluation skills. The implications of using chatbots for scalable educational interventions are discussed, particularly in light of recent advances such as the integration of large language models into search engines and the potential for hybrid intelligence paradigms that combine human oversight with AI-driven learning tools.
1. Introduction
In recent years, several instructional approaches to promote students' critical evaluation of digital online information have been discussed in media education (Hobbs & Jensen, 2013; Spante et al., 2018), ranging from the provision of checklists to support the assessment of online resources (Blakeslee, 2004; Mandalios, 2013) through MOOCs on information literacy (Dreisiebner et al., 2021; Guggemos et al., 2022) to the pedagogical design of curricula that strengthen lateral reading skills on the Internet (Breakstone et al., 2021; Wineburg et al., 2022). More recently, a number of studies have shown that collaborative argumentation with another person can also be an effective way to stimulate the ability to evaluate online information (Mayweg-Paus et al., 2021; Mayweg-Paus & Macagno, 2016; Zimmermann & Mayweg-Paus, 2021).
Building on these findings, the current study investigates whether educational chatbots can also serve as such a partner for collaborative reasoning about online information, potentially providing a highly scalable solution to broadly and efficiently improve online evaluation skills. Literature reviews suggest that educational chatbots may be a promising technology for digital learning, offering unique opportunities in terms of scalability, personalization, and communicative mentoring (e.g., Wollny et al., 2021). This is especially true given chatbots' ability to engage users in dialogic reasoning and scaffold the argumentation process, e.g., through personalized feedback (Costello et al., 2024; Guo et al., 2022; Wambsganss et al., 2021). However, their suitability for promoting critical evaluation skills towards online information is not yet clear, leaving a research gap to be investigated in the current study. Therefore, this study aims to address the problem of underdeveloped online information evaluation skills and proposes an approach that uses an educational chatbot, built on instructional principles, to provide a scalable and interactive tool to stimulate and scaffold online evaluation skills.
2. Theoretical background
2.1. Evaluating online information
In a digitalized world, the Internet is individuals' central reference point for information (Gasser et al., 2012; Grothaus et al., 2021). However, the Internet as a highly dynamic media environment poses
numerous challenges and new demands on citizens, as there are systematic differences between online and offline environments (e.g., Kozyreva et al., 2020). The challenges lie in persuasive and manipulative choice architectures, promotion of sources of questionable quality, biased reporting, advertisements, and clickbait, which are further exacerbated by the distracting nature of Internet environments and general information overload (Kozyreva et al., 2020; Lewandowsky & van der Linden, 2021). This circumstance is partly attributed to the lack of gatekeepers, quality control, and regulation on the Internet, as well as low barriers to publishing information, which is further exacerbated by the use of social bots (Metzger & Flanagin, 2015; Meßmer et al., 2021; Shao et al., 2018). Under these circumstances, the ability to evaluate online information and to navigate the Internet competently is a critical requirement for successful self-directed knowledge acquisition based on this source of information (e.g., List & Alexander, 2017), and thus a transversal competence relevant for academic performance as a whole (Zlatkin-Troitschanskaia et al., 2021).
In addition, the Internet can be an environment of false and misleading information where students may encounter blatant lies and disinformation in many Internet sources such as wikis, social media, or mass media. Studies show that false information on Twitter spreads six times faster than true information and receives far more retweets and reactions than true posts, a worrisome finding that is exacerbated when it comes to political disinformation (Vosoughi et al., 2018). Furthermore, there is a continued influence effect (Lewandowsky et al., 2012; Rapp, 2016), which means that mere exposure to inaccurate "facts" can cause people to incorporate the expressed misinformation into their understanding, even if their pre-existing understanding was accurate and even if it is later debunked. These findings further highlight the importance of fostering critical evaluation skills to build resilience to an online environment that seeks to manipulate and polarize (e.g., Lewandowsky & van der Linden, 2021).
While the need for online evaluation skills is evident, individuals still show deficits in this skill. A representative large-scale study (Meßmer et al., 2021) found that only 22 % of the German population had (very) high digital news and information literacy skills, while 46 % had (very) low skills. In particular, participants showed difficulty distinguishing between information and disinformation, and between advertising and opinion. For example, 56 % of respondents considered advertising to be information, regardless of whether it was in online publications, social media, or YouTube. In addition, participants found it relatively difficult to assess the trustworthiness of a source and identify conflicts of interest, with only 65 % of respondents able to determine the neutrality of a source. Similar results have been repeatedly found in U.S. student populations, who struggled to distinguish news from advertising and generally misattributed credibility based on incorrect reasons, such as the "look" of a website or the amount of information provided (Breakstone et al., 2021; for similar findings, see Fraillon et al., 2020). In addition, students' practical use of information and communication technologies is also very heterogeneous and partly inadequate, with 20 % of first-year students and 52 % of advanced students failing to meet minimum standards in ICT skills (Senkbeil et al., 2019).
These ndings, coupled with the increased demands for condent
navigation of the Internet as outlined above, can have serious conse-
quences, as this information forms the basis for many beliefs and be-
haviors in peoples daily lives, whether in their studies, work, health,
politics, or consumer decisions. Furthermore, the ability to evaluate
online information has implications for many areas, such as digital
sovereignty (Kaloudis, 2021), democratic opinion and will formation
(Lorenz-Spreen et al., 2020), trust in social institutions (Kavanagh &
Rich, 2018), academic success (Zlatkin-Troitschanskaia et al., 2021), job
opportunities (S´
anchez-Canut et al., 2023), and social participation
(Kerres, 2020). Therefore, new and innovative solutions are needed to
address this issue, as compensating for the lack of these critical skills
through formal education alone is not scalable.
2.2. Past approaches towards fostering individuals' evaluation of online information
Several information literacy interventions have been proposed in the literature, some dating back to the early days of the Internet. Since a comprehensive review of these approaches is beyond the scope of this study, only a selection of them will be discussed.
For example, Blakeslee (2004) developed the CRAAP checklist. The acronym-structured checklist is designed to encourage students to check online information for currency, relevance, authority, accuracy, and purpose. Each aspect of the checklist is supported by a series of prompts designed to help students evaluate the information they encounter. For example, questions may be asked about relevance ("Does the information relate to your topic and answer your questions?") or accuracy ("Is the information supported by evidence?"). Today, many universities have embraced the use of this checklist approach to foster information evaluation (Leeder & Shah, 2016), providing similar lists or advocating the use of these lists for first-year media and information literacy learning (Lenker, 2017). However, the checklist approach has also been criticized because checklists can encourage students to focus exclusively on individual websites rather than the broader Web, or even on superficial cues and signals of credibility that can be easily faked in today's digital landscape (e.g., a site's "look" or top-level domain, an organization's nonprofit status, or links to seemingly authoritative sources) (Kozlowska-Barrios, 2023; Wierzbicki, 2018; Wineburg et al., 2022).
An alternative approach, lateral reading, was introduced in the Civic Online Reasoning Curriculum (Breakstone et al., 2021; McGrew et al., 2019; Wineburg et al., 2022). These authors argue that checklists like CRAAP (Blakeslee, 2004) are outdated, based on analog-era assumptions that emphasize careful reading of a single source (Wineburg et al., 2022). Instead, they posit that the Internet requires a different skill set. Lateral reading treats the web as a network, where evaluating one source involves consulting others linked to the topic (McGrew et al., 2019; Wineburg et al., 2022). The strategy stems from how professional fact-checkers verify online information. In this sense, using the lateral reading heuristic in a digital environment means turning to the broader Web to determine the credibility of online information, rather than reading vertically, from top to bottom, as proposed by checklist approaches, and spending too much time on potentially deceptive sites. This approach has also repeatedly found empirical support, as experiments have shown that students trained in lateral reading judged online information more accurately than controls (Breakstone et al., 2021; McGrew et al., 2019). Massive open online courses (MOOCs) were also developed as an educational tool to promote information literacy and online evaluation skills (Dreisiebner et al., 2021; Guggemos et al., 2022). However, evaluations of these courses have shown that students are often unable to cope with the open, self-directed learning environment of MOOCs, which is reflected in low completion rates (Dreisiebner et al., 2021).
Another promising approach for critical reflection on online information is collaborative argumentation. Drawing on findings from computer-supported collaborative learning (Chen et al., 2018; Noroozi et al., 2012), Zimmermann and Mayweg-Paus (2021) hypothesized that collaborative argumentation with another person promotes the critical elaboration of online information. Their experiment took place in the context of teacher training and focused on the selection of online
information based on either individual or collaborative argumentation in the form of a chat with another participant. Both groups had to search for and decide on four web items that supported a sound argumentation regarding the educational use of mobile phones in the classroom. After selecting each web item, participants were asked to indicate how and what criteria they used to make their selections. Results showed that participants in the collaborative argumentation group expressed significantly more elaborate reasoning behaviors when selecting online information than participants in the individual condition. This effect remained significant even after controlling for participants' epistemic beliefs.
Accordingly, collaboratively engaged argumentation while searching for and evaluating online information appears to provide a unique benefit for deeper, elaborative online reasoning and critical reflection on one's online search activities. This benefit is attributed to the underlying cognitive mechanisms of expressing and critically scrutinizing different perspectives and arguments while engaging in collaborative argumentation (Mayweg-Paus & Macagno, 2016; Noroozi et al., 2012; Zimmermann & Mayweg-Paus, 2021). Subsequently, scrutinizing an interlocutor may trigger a different cognitive "state of vigilance" than when confronted with and acquiring online information on one's own, which is unlikely to lead to metacognitive activation (Chinn & Clark, 2013; Molerov et al., 2020). In this way, approaches aiming at training critical reflection guided by prompting questions (Blakeslee, 2004; Leeder & Shah, 2016) could also be enriched by such collaborative aspects. In this respect, chatbots may serve as communication partners that elicit and scaffold students' critical reflection processes as they search for information online.
In summary, there is a variety of approaches to promoting online information evaluation skills. Many of them contain too much content and require several hours of instruction (Breakstone et al., 2021; Dreisiebner et al., 2021; McGrew et al., 2019) or lack the feasibility requirements for use in independent searches (Blakeslee, 2004; Mayweg-Paus & Macagno, 2016; Zimmermann & Mayweg-Paus, 2021). Therefore, the present study takes a learner-centered approach. Building on the aforementioned research, the existing knowledge and pedagogical materials on online reasoning and evaluation are extended with the possibilities of applying artificial intelligence in the form of an educational chatbot, with the aim of creating a personalized, scalable, and interactive learning tool for developing online evaluation skills.
2.3. Educational chatbots
The progress and prominent rise of artificial intelligence (AI) systems in recent years has raised questions about the impact of AI-based technologies on education (Chiu et al., 2023; Ninaus & Sailer, 2022; Zawacki-Richter et al., 2019). Although many opportunities and risks are discussed in this context, some practical implementations of AI-based tools in education have already taken root. The most common applications are based on learning analytics (Ahmad et al., 2022) and the design, development, and implementation of conversational agents and chatbots (Wollny et al., 2021), following the tradition of intelligent tutoring systems (D'Mello & Graesser, 2023; Graesser et al., 2005, 2007). In addition, the discussion about the potential of chatbots for education has accelerated dramatically since the launch of the large language model (LLM) ChatGPT by the company OpenAI in November 2022 (Kasneci et al., 2023; Laato et al., 2023).
Although the launch of ChatGPT (OpenAI, 2023) has generated a lot of public interest in chatbots and related technologies, conversational agents for educational purposes (so-called pedagogical conversational agents) have already been developed and investigated in recent years, given the widespread availability of speech-based artificial intelligence technologies (Google Assistant, Amazon's Alexa, Apple's Siri, etc.) and corresponding platforms for hosting and distribution. Conversational agents are software that allow users to interact with computers through conversation, for example in the form of chatbots, virtual agents, or artificial conversational entities (Zierau et al., 2020). They can engage in "natural" conversations with humans through speech- or text-based modalities. Speech-based conversational agents can use speech as an interaction channel, based on machine learning and natural language processing (NLP) methods, through which machines can learn to converse with users and perform tasks (Schmitt et al., 2021). In contrast, text-based conversational agents, including chatbots, are mostly based on a set of fixed rules (called intents) or flows to respond to user requests and thus control the conversation (Budiu, 2018). Such chatbots that follow a specific predetermined learning path of intent-response pairs can be described as "flow-based chatbots" (Winkler & Soellner, 2018), in the sense that the conversational flow of the user-chatbot interaction is predefined. In contrast, the paradigm is currently shifting with the proliferation of LLMs, which, based on the machine learning techniques of deep learning and reinforcement learning, can serve as the basis for "artificially intelligent chatbots" (Winkler & Soellner, 2018). In the present study, we used a conversational agent (CA) that is text-based and follows predefined intents, thus focusing on the field of educational "flow-based chatbots", as only these offer the possibility of highly standardized interventions.
The effectiveness of pedagogical conversational agents and educational chatbots has been widely studied and discussed in recent years (Hobert & Berens, 2024; Kuhail et al., 2023; Pérez-Marín, 2021; Winkler & Soellner, 2018; Wollny et al., 2021). Due to their high scalability, accessibility, and possible personalization, educational chatbots can offer multiple opportunities as tools for digital learning. However, practical, in-depth experiences with concrete implementations of educational chatbots and their sound evaluation are largely lacking (Wollny et al., 2021). Nevertheless, some educational chatbots had a significant impact on the learning of various skills, such as language learning (Annamalai et al., 2023), mathematics (Cheng et al., 2024), or programming (Daud, 2020), as well as fostering general motivation and self-efficacy for learning (Fryer et al., 2020; Huang, 2019) or improving domain-specific skills through formative feedback (Vijayakumar et al., 2019).
Table 1
Cognitive Apprenticeship model that served as major design principle for the chatbot of the current study.
Principles for designing Cognitive Apprenticeship interactions (Collins, 2005):
Modeling: Teacher performs a task, so students can observe.
Coaching: Teacher observes and facilitates while students perform a task.
Scaffolding: Teacher provides support to help the student perform a task.
Articulation: Teacher encourages students to verbalize their knowledge and thinking.
Reflection: Teacher enables students to compare their performance with others.
Exploration: Teacher invites students to pose and solve their own problems.
Nevertheless, research also reveals open potentials and gaps, especially with regard to the promotion of metacognitive skills, such as self-regulation and reflection (Pérez-Marín, 2021). Similarly, as far as the promotion of online evaluation skills is concerned, to the best of our knowledge, there is no research to date that examines the usefulness of educational chatbots in promoting such skills. However, research has already harnessed the capabilities of chatbots and dialogue systems to promote individuals' argumentative reasoning (Costello et al., 2024; Guo et al., 2022; Wambsganss et al., 2021), which is conceptually close to online evaluation skills. Considering the different pedagogical roles that chatbots can take, Wollny et al. (2021) suggest that more complex skills may be trainable through educational chatbots if they take on a mentoring role that goes beyond simply providing information and allows for self-regulated learning and scaffolding when needed (Cabales, 2019). The authors also point out that research on educational chatbots is mainly driven by technology and lacks a clear pedagogical focus of learning support. Therefore, the present study aims to address this
research gap and uses an educational chatbot to explore its usefulness for teaching online evaluation skills. To ensure that the chatbot is grounded in learning theories, the cognitive apprenticeship model (Collins et al., 1987; Collins, 2005) was used as a key design principle. This model provides concrete principles for the design of didactic interactions provided by the chatbot, as it addresses several pedagogical roles that include learning, mentoring, and assisting processes based on a clear sequence of activities (Table 1). The implementation of such a chatbot also provides an opportunity to incorporate the collaborative argumentation paradigm (Noroozi et al., 2012; Zimmermann & Mayweg-Paus, 2021) in a scalable conversational agent. A chatbot following this approach could act as an assistant that accompanies information search processes and enables situated learning in an interactive way.
2.4. Design principle: Cognitive apprenticeship
The educational use of chatbots is mainly based on the notion of didactic interactivity (Chi & Wylie, 2014). Following didactic approaches to interactive learning, conversational agents can be used because they are able to correctly interpret users' natural language expressions and generate appropriate outputs based on natural language processing methods (Kuhail et al., 2023; Ouyang & Jiao, 2021). Ideally, certain aspects of human tutoring behavior can be replicated by machines, allowing for a time-flexible, individualized, and scalable form of teaching and practice that would not be possible with current human staffing levels in universities. However, for a more rigorous design of conversational capabilities, a chatbot should be built on concrete design principles derived from established scientific theories that best meet the requirements of the given application domain (Gregor et al., 2020; Hevner, 2007).
In order to provide a framework for the development of educational chatbots, design principles should be based on a learning theory (or knowledge base) that explains a process for teaching complex cognitive skills. For this purpose, this study refers to the Cognitive Apprenticeship model (Collins et al., 1987; Collins, 2005) as the main design principle. Cognitive Apprenticeship is a theoretical approach based on the situated learning model (Brown et al., 1989), which emphasizes the "making visible" of underlying cognitive processes through verbalization and reflection in dialogue as the central learning process (Levin et al., 2021). The concrete learning process is designed according to six sequential steps: Modeling, Coaching, Scaffolding, Articulation, Reflection, and Exploration (Table 1). Therefore, according to the resulting design principles, an information literacy educational chatbot should be able to enhance users' evaluative skills through realistic and concrete material ("situated learning") in a collaborative dialogue-oriented interaction that models, coaches, and scaffolds. Furthermore, it should provide opportunities for users to articulate and reflect on their thoughts, as well as to explore a given problem on their own. These principles of cognitive apprenticeship were integrated into the chatbot interaction design and embedded in a context of situated learning within the simulated web environment of the EVON assessment (Hahnel et al., 2020), making visible the cognitive processes required for skilled online evaluation behavior.
Compared to traditional checklist approaches, the instructional steps of coaching, scaffolding, and articulation were implemented within the chatbot interactions. In addition, reflection was partially implemented through scripted feedback. Exploration was partially realized through the study design, as participants had to solve comparable tasks on their own after interacting with the chatbot. Modeling was partially implemented in that the chatbot in Task 1 described in detail which actions were to be performed, but did not perform them autonomously (additional details can be found in Appendix B).
In terms of instructional content related to online evaluation skills, participants in the present study were provided with guiding questions and links to sources to evaluate (as hints) to support their thinking. These guiding questions provided by the chatbot were based on various checklist approaches (Blakeslee, 2004; Mandalios, 2013) and elements of MOOCs (Dreisiebner et al., 2021). The types of criteria addressed were not exhaustive, but rather tailored to the nature of the EVON task items (Hahnel et al., 2020), providing concrete and standardized cues for judging information quality (Hilligoss & Rieh, 2008; List et al., 2016; Lucassen & Schraagen, 2013; Molerov et al., 2020; Rieh, 2002). The questions were in the form of short evaluation prompts and could serve as simple decision aids, following the idea of fast and frugal decision trees, a series of yes-or-no questions leading to a recommended action (Hafenbrädl et al., 2016; Kozyreva et al., 2020; Luan et al., 2011). Such a criteria-based approach to information processing could serve to develop heuristic strategies that enable information-literate evaluation of online information. As a result of such training, participants may no longer need to consciously elaborate, but can heuristically orient themselves to these criteria (Bråten & Braasch, 2018; Bromme & Goldman, 2014; Rousseau & Gunia, 2016). In addition, each learning step included the opportunity for participants to share their reasoning or decision for a source, allowing the chatbot to monitor the process and provide feedback and/or support (see Appendices A and B for more detailed information).
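To illustrate the fast-and-frugal logic behind such evaluation prompts, the following minimal sketch in R shows how a short chain of yes-or-no checks can lead to a recommended action. The criteria, function name, and wording are hypothetical illustrations and do not reproduce the actual prompts used in the chatbot or checklist condition.

# Minimal sketch of a fast-and-frugal evaluation heuristic: a short chain of
# yes-or-no checks leading to a recommended action (criteria are hypothetical).
evaluate_source <- function(author_identifiable, claims_supported, mostly_advertising) {
  if (!author_identifiable)  return("Reject: authorship and expertise are unclear.")
  if (!claims_supported)     return("Reject: claims are not supported by evidence.")
  if (mostly_advertising)    return("Reject: commercial purpose dominates the content.")
  "Consider: the source passes the basic credibility checks."
}
evaluate_source(author_identifiable = TRUE, claims_supported = TRUE, mostly_advertising = FALSE)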
2.5. Present study
The present experiment investigated the effect of a chatbot intervention on students' online evaluation skills. In a simulated web environment (EVON, Hahnel et al., 2020), students had to critically examine given websites and choose the most appropriate one to solve given problems. Three problems were presented in the training phase and five problems served as a transfer test. During the training phase, students were supported by an interactive chatbot, a static checklist, or received no further support (i.e., baseline condition).
Building on previous findings on collaborative argumentation (Zimmermann & Mayweg-Paus, 2021), educational chatbots may be more effective in promoting online evaluation skills due to their interactivity compared to static materials such as checklists. Therefore, it was expected that interacting with an educational chatbot in a training phase would lead to better performance on subsequent test items, compared to both the checklist and the baseline condition in which participants received no further support (Hypothesis 1).
In addition, the effectiveness of the checklist approach, in the form of
a simple decision aid or a fast and frugal decision tree (Kozyreva et al.,
2020), was tested against the baseline condition to account for differ-
ences in test performance that could be attributed to the availability of
additional information per se. Therefore, it was expected that partici-
pants provided with static checklist prompts during the training phase
would outperform participants in the baseline condition on the test
(Hypothesis 2).
In addition to behavioral measures, a measure of self-efficacy concerning information searching behavior, the SWE-IV-16 (Behm, 2018), was employed. Individual self-efficacy concerning information searching behavior is an essential aspect of competent and successful sourcing of online information, reflecting individuals' evaluations of their own competence in sourcing online information (Andreassen & Bråten, 2013; Bronstein, 2014; Hendriks et al., 2020). It captured participants' level of information literacy prior to the intervention and additionally allowed this variable to be considered as a potentially relevant moderator of the differential effectiveness of the intervention. Accordingly, it was
hypothesized that participants' level of self-efficacy for information-searching behavior would moderate the effectiveness of the implemented interventions in relation to the test items, such that the greater the self-efficacy for information-searching behavior, the smaller the influence of the intervention (Hypothesis 3).¹
¹ In the pre-registration of this study, it was proposed to conduct an ANCOVA. After methodological clarification, it was deemed appropriate to disregard this plan and to conduct a moderation analysis instead, as this method was suitable to test Hypothesis 3.
Because the primary goal of the present study was to test whether participants acquired specific online evaluation skills by interacting with the chatbot, all hypotheses were based on performance on the test items where no assistance was available. However, we also explored differences in performance during the training phase, when either the chatbot or the checklist was still available. This line of reasoning follows Salomon et al.'s (1991) distinction between effects with technology that resemble an intellectual partnership between human and technology, and effects of technology that reflect the achievement of cognitive residue (i.e., learning) from this interaction.² In addition, cognitive load and judgments of learning were explored for differences between conditions. Finally, a small set of human-computer interaction variables was collected in an evaluation of the chatbot interaction, such as chatbot acceptance and adoption, student perceptions of the chatbot, and measures of usability experience.
² This exploratory analysis was not preregistered. The important work of Salomon et al. (1991) in this context was not discovered until after the pre-registration.
3. Method
This study was preregistered (https://osf.io/fndvy).
3.1. Design
The present laboratory study followed a single-factorial, between-
subjects design with training condition (i.e., chatbot, checklist, baseline)
as the independent variable, and performance on a subsequent practice
test as the dependent variable.
3.2. Sample
To determine the target sample size, we used G*Power (Faul et al.,
2007) for three t-tests with a power of 1-β =0.80 to detect an effect size
of d =0.50, with a Bonferroni adjusted alpha of 0.016. The required
sample size for each condition was N =73, resulting in a total sample
size of N =219. Undergraduate psychology students receiving partial
course credit were recruited. Participants were required to be at least 18
years old and to have a B1 level of prociency in German.
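The reported a priori computation was carried out in G*Power; an equivalent calculation can be sketched in R with the pwr package (illustrative only, not part of the original study).

# A priori power analysis equivalent to the reported G*Power computation,
# sketched with the 'pwr' package (illustrative; G*Power was used in the study).
library(pwr)
pwr.t.test(d = 0.50,             # expected effect size (Cohen's d)
           sig.level = 0.016,    # Bonferroni-adjusted alpha for three comparisons
           power = 0.80,         # desired power (1 - beta)
           type = "two.sample",
           alternative = "greater")
# yields approximately n = 73 per group, i.e., N = 219 across the three conditions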
The initial sample consisted of 198 students. Data from 23 participants were excluded because they either did not complete all test items, the data transfer failed during the test, or participants in the chatbot condition stated after the study that they did not click on the solution in the training phase, but instead told it to the chatbot. Considering the manipulation check ("Did you interact with the chatbot and explore information together?"), 17 participants indicated that they didn't use the chatbot during the training phase and therefore had to be excluded. For outlier analysis, z-scores were calculated for test performance across all participants. Two data points with absolute z-scores > 3 were identified and considered outliers, as this threshold is commonly used to identify extreme values. Consequently, these data points were excluded prior to analysis. Unfortunately, we did not reach our target sample size due to attrition, which compromised the statistical power of our design. This is discussed in more detail in Section 5.1.
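As a sketch, the outlier screening described above can be expressed in R as follows (the data frame and column names are hypothetical).

# Standardize test scores and flag absolute z-scores above 3 as outliers
# (data frame and column names are hypothetical).
dat$z_test <- as.numeric(scale(dat$test_score))   # (x - mean) / sd
dat_clean  <- dat[abs(dat$z_test) <= 3, ]         # retain non-outlying cases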
The nal sample included N =156 participants (n =55 in the
chatbot condition, n =49 in the checklist condition, and n =52 in the
baseline condition). Participants had a mean age of M =21.8 years (SD
=3.36; range: 1841 years), of which n =125 were female, n =28 were
male, n =2 considered themselves as non-binary, and n =1 did not
specify gender.
3.3. Material
3.3.1. Educational chatbot based on cognitive apprenticeship
The educational chatbot used in this study was built in Google Dialogflow ES (https://cloud.google.com/dialogflow). Dialogflow allows the design of chatbot dialogues based on the construction of intent-response pairs. Intents describe the user's utterances to which the chatbot will respond with a helpful response. Several potential intents were collected based on task requirements and encountered website characteristics. These can be generalized to some extent by the chatbot using machine learning techniques. Most importantly, the chatbot used in this study was not based on a Large Language Model (LLM), but was a script-based chatbot. While AI chatbots based on LLMs may offer a more dynamic and responsive mode of interaction, the implementation of a script-based chatbot, involving intent-response pairs, allows for a high degree of standardization needed for a rigorous experimental setup. The dialogue was designed to follow the principles of cognitive apprenticeship (Collins et al., 1987; Collins, 2005) (see Section 2.4). The chatbot never gave the answer to a task on its own but was intended to stimulate a critical stance on the part of the participants by asking questions. For a more detailed insight into the construction of the chatbot, see Appendices A and B.
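To make the intent-response principle concrete, the following minimal R sketch mimics a single flow-based turn: a user utterance is matched against keyword-based intents and answered with a scripted prompt. The intents, training phrases, and responses are hypothetical illustrations and do not reproduce the actual Dialogflow ES configuration used in the study.

# Minimal illustration of the intent-response principle behind a flow-based
# chatbot (hypothetical intents and wording; not actual Dialogflow ES syntax).
intents <- list(
  check_author = list(
    phrases  = c("who wrote this", "is the author an expert", "author unclear"),
    response = "Good point. Can you find out who is behind this website and what qualifies them?"
  ),
  check_ads = list(
    phrases  = c("many ads", "looks commercial", "advertising"),
    response = "What might heavy advertising tell you about the purpose of this page?"
  )
)

reply <- function(user_input) {
  for (intent in intents) {
    if (any(sapply(intent$phrases, grepl, x = tolower(user_input), fixed = TRUE)))
      return(intent$response)
  }
  "Can you tell me more about what you noticed on this page?"  # fallback intent
}

reply("This site has many ads on it")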
3.3.2. Checklist intervention material
The guiding questions for the checklist were derived from materials
such as the CRAAP checklist (Blakeslee, 2004) and instructional mate-
rials from an information literacy MOOC (Dreisiebner et al., 2021). No
existing checklist framework was used in its original form. Instead, the
question prompts of the checklist condition were the same as those of the
chatbot condition, which were manually selected to stimulate a critical
stance appropriate for the particular training task, with the question
prompts adapted in a checklist-like format, aspect by aspect. This was
done to allow for maximum comparability between the chatbot and the
checklist condition, so that the expected effectiveness of the chatbot
would be due to its interactive nature, rather than differences in se-
mantic content (see Appendix B for the checklist items).
3.3.3. EVON test for the evaluation of online information
To evaluate the intervention's effectiveness, a psychometrically sound and ecologically valid performance-based assessment was used: the Evaluation of Online Information test environment (EVON; Hahnel et al., 2020). The EVON is an interactive, computer-based test environment that includes eight different online information evaluation tasks as items (see Appendices A and B for examples), as well as a tutorial to familiarize users with the simulation environment. As such, the EVON is used to assess the ability to evaluate and judge online information in terms of relevance and credibility for a given information problem by identifying the appropriate link or web page for the given search task.
Each item contains a simulated search engine results page and various links and simulated Web pages that can be explored. The exploration of the various Web pages depends on the learner's choices and may vary from student to student. The items differ in the number of links presented (3 versus 5 links), the attractiveness of the links in the search engine results (i.e., the extent to which the links have characteristics that influence their perceived informational value for solving a search task), and the congruence between the links and their corresponding Web pages (i.e., the extent to which the links can create expectations that may not be met by the information on the Web page).
Such a simulated environment offers two unique advantages over
self-report questionnaires or knowledge tests. First, the controlled
setting of a simulation lends itself well to experimental research,
ensuring standardization for each participant. Second, the near-realistic
nature of a simulated web environment allows for the assessment of
behavioral indicators of online evaluation skills that are very similar to
the behavior used in everyday online searches, and aligns assessment
procedures with implementation goals (Kuhail et al., 2023; Wollny et al.,
2021).
In the present study, the chatbot (or checklist) was designed to provide didactic support for the first three items of the EVON (Hahnel et al., 2020) by training participants to learn specific characteristics of online information (e.g., lack of credibility due to excessive advertising) that were also found in the remaining tasks (effects with technology). In the baseline condition, the first three items had to be solved without any instructional support. The remaining five items had to be completed by all three groups during the test independently of any assistance (effects of technology). Training and transfer items were similar in structure and complexity. Only the topics and occasionally the number of available links differed. The topics were chosen broadly in the construction of the original assessment and did not require specialized domain knowledge. The dependent variable was the number of correct choices of target links that were the most reliable and relevant among the other non-target links in the simulated search engine environment. The number of correct choices (sum scores) on the first three EVON items was analyzed exploratively as effects with technology (training), and the number on the remaining five items was analyzed as effects of technology (transfer) to test the hypotheses.
3.3.4. Self-efcacy concerning information searching behavior
The SWE-IV-16, a scale assessing self-efficacy concerning information searching behavior (Behm, 2018), is conceptually based on the process model of information-related problem solving (Brand-Gruwel et al., 2009) and consists of 16 items. Behm (2018) reported a Cronbach's α = .85 and a retest reliability (6–18 months) of rtt = 0.47 to 0.70. This reliability was replicated in the present study (Cronbach's α = .88). An example item from the SWE-IV-16 reads: "When I am looking for information on a certain topic or a specific question, I am able to judge the quality of information (e.g., a specific web page or a journal article) based on distinct criteria." Individual missing values on this scale were handled by imputation using the R package mice (van Buuren & Groothuis-Oudshoorn, 2011).
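A sketch of this imputation step, assuming the 16 item responses are stored in a data frame; the object names, number of imputations, and seed shown here are illustrative, as the study reports only that mice was used.

# Impute individual missing SWE-IV-16 responses with the 'mice' package
# (object names, m, and seed are illustrative).
library(mice)
imp                <- mice(swe_items, m = 5, seed = 123, printFlag = FALSE)  # multiple imputation
swe_items_complete <- complete(imp)                                          # one completed data set
swe_score          <- rowMeans(swe_items_complete)                           # person-level scale score
# Internal consistency could then be checked, e.g., with psych::alpha(swe_items_complete).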
3.3.5. Additional measures
Demographic variables (i.e., age, gender, field of study, semester of study, and German proficiency) were collected. In addition, cognitive load was measured with three items (modeled after Klepsch & Seufert, 2021), and judgment of learning (JOL; modeled after Zaromb et al., 2010) was measured with a single item: "Please estimate the percentage of tasks you solved correctly." These measures were assessed because it is valuable to observe how cognitive load varies according to different learning modalities, as cognitive load is central to information processing and learning. Both constructs were analyzed only exploratorily for potential differences between treatments.
Participants in the chatbot condition were additionally given nine items that exploratorily evaluated their experience with the chatbot (see Appendix C). Usability and user experience are important to determine the perceived usefulness of this learning tool and to identify issues that require refinement in the potential future development of comparable tools. Participants were asked whether they had interacted with the chatbot at all, which served as a manipulation check. In addition, they rated the extent to which they used the chatbot to complete tasks, and whether they felt the chatbot interacted with them and responded to their needs. Participants were also asked to rate their user experience (i.e., whether interacting with the chatbot was pleasant, motivating, or difficult). There was also a free-text field to mention any technical difficulties in interacting with the chatbot. Finally, participants' general evaluations of the learning experience with the chatbot, its usefulness for learning, the perceived emotional valence during the interaction, and a global evaluation were assessed. All items were Likert-scaled from 1 to 5.
3.4. Procedure
The study was conducted as a laboratory experiment. After participants selected a seat, initial randomization was performed using the SoSciSurvey platform. After randomization, all remaining data were collected using the EVON (Hahnel et al., 2020) environment itself, where participants were first asked to provide informed consent to participate. They then completed the SWE-IV-16 questionnaire (Behm, 2018) to assess their self-efficacy concerning information searching behavior. Next, participants started the EVON web simulation, in which they were asked to solve eight information problems. Participants in the chatbot condition were asked to solve the first three problems in collaboration with the chatbot, which appeared in the right corner of the screen. Participants in the checklist condition were shown the checklist items next to the problems, allowing for simultaneous processing. In the baseline condition, participants solved the EVON items without any assistance. After completing the first three problems, the remaining five problems had to be solved without any assistance in all three conditions. Afterwards, cognitive load and judgments of learning were measured. In addition, participants who interacted with the chatbot were asked additional questions regarding their evaluation and perception of the chatbot. The study concluded with the collection of socio-demographic information. All participants had to wait until everyone in the lab was finished to minimize incentives to "rush" the experiment.
4. Results
4.1. Main analysis
To test H1 and H2, three one-tailed Welch t-tests were performed,
including Bonferroni correction for multiple testing. H1 tested whether
the chatbot condition would outperform the checklist condition and the
baseline condition on test performance (number of items solved
correctly). H2 tested whether the checklist condition would be able to
outperform the baseline condition. To test H3, an additional multiple
regression model with dummy-coded interaction terms was computed,
considering SWE-IV-16 as a potential moderator. All analyses were
performed with R statistical software (v4.2.3; R Core Team, 2023).
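The following sketch outlines these analyses in R; the data frame and variable names are hypothetical, and the one-tailed Welch test is shown for one exemplary contrast.

# Sketch of the main analyses (data frame and variable names are hypothetical).
# One-tailed Welch t-test for H1 (chatbot vs. baseline), evaluated against the
# Bonferroni-adjusted alpha of .016:
t.test(x = dat$test_score[dat$condition == "chatbot"],
       y = dat$test_score[dat$condition == "baseline"],
       alternative = "greater",   # chatbot expected to outperform baseline
       var.equal = FALSE)         # Welch correction for unequal variances

# Moderation model for H3: dummy-coded conditions (baseline as reference)
# interacting with self-efficacy (SWE-IV-16 score):
dat$condition <- relevel(factor(dat$condition), ref = "baseline")
summary(lm(test_score ~ condition * swe, data = dat))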
4.1.1. Testing the hypotheses
Welch t-tests were conducted to test the three pre-registered hypotheses, with a Bonferroni-adjusted alpha level of 0.016. Hypothesis 1 was partially supported, showing that the mean number of correctly solved items was significantly higher in the chatbot condition than in the baseline condition, t(86.01) = 2.33, p = .011, d = 0.46, 95 % CI [0.07, 0.85], indicating a small to medium effect. However, performance in the chatbot condition was not significantly better than in the checklist condition, t(86.33) = 1.42, p = .08. In addition, Hypothesis 2, that participants in the checklist condition would outperform those in the baseline condition, was not supported, t(98.66) = 0.87, p = .19. Result patterns regarding the differences in test performance on the EVON items between the three conditions are presented in Fig. 1 (on the right).
For Hypothesis 3, the intervention groups were dummy coded with
the baseline control group as the reference. A multiple regression analysis was conducted to examine whether the effectiveness of interventions on test performance was moderated by participants' self-efficacy, F(5, 150) = 2.54, p = .03, R² = 0.078, adjusted R² = 0.047. The predictor self-efficacy concerning information searching behavior was significant, b = −0.87, SE = 0.38, t(150) = −2.28, p = .024. The negative estimate suggests that individuals who report higher levels of self-reported ability to find and evaluate information online actually score lower on a performance-based test. The chatbot condition × self-efficacy interaction term was included in the model to test for moderation effects, but it was not significant, b = 0.76, SE = 0.49, t(150) = 1.55, p = .123. Similarly, the checklist condition × self-efficacy interaction was not statistically significant, b = 0.53, SE = 0.46, t(150) = 1.14, p = .255. Overall, these results suggest that self-efficacy did not moderate the relationship between training condition and test performance.
4.2. Exploratory analyses
4.2.1. Effects with technology
As described in Section 2.5, we also examined performance differences during the training phase, following Salomon et al.'s (1991) distinction between effects of technology and effects with technology. Thus, we repeated the statistical procedure reported earlier, but this time with the number of correct solutions during the training phase as the dependent variable. The t-tests revealed that participants in the chatbot condition outperformed those in the baseline condition, t(97.99) = 2.51, p = .006, d = 0.49, 95 % CI [0.09, 0.90], indicating a small to medium effect. Furthermore, no significant difference was found between the chatbot condition and the checklist condition, t(98.71) = 1.65, p = .051, and no difference was found between the checklist condition and the baseline condition, t(108.32) = 0.95, p = .17, similar to the relationships found in the main analysis (see Fig. 1, on the left).
Fig. 1. Barplots of correct solutions in the training and test phase, separately for each condition. Note. Maximum score in the training phase: 3; in the test phase: 5. *p < .05, **p < .01.
Fig. 2. Barplots of cognitive load and judgement of learning between each experimental condition. Note. Cognitive load was measured by three items, ranging from 1 to 5; judgement of learning was measured by one item. *p < .05, **p < .01.
4.2.2. Comparing cognitive load between training conditions
It was expected that the perceived cognitive load during task
processing would be lower for the chatbot and checklist conditions compared to the baseline condition, as they were assisted in information evaluation procedures during the training phase, potentially providing heuristics for the test items. However, the ANOVA revealed no significant differences between the three conditions in terms of cognitive load, F(2, 153) = 0.15, p = .861. Descriptive data indicated that cognitive load was moderate across conditions (see Fig. 2, on the left).
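A minimal sketch of this comparison (data frame and variable names are hypothetical):

# One-way ANOVA on cognitive load across the three training conditions
# (data frame and variable names are hypothetical).
summary(aov(cognitive_load ~ condition, data = dat))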
4.2.3. Comparing Judgement of Learning between training conditions
It was also expected that judgments of learning (JOL) would be higher in the chatbot and checklist conditions compared to the baseline condition, as these conditions were designed to support learning. The ANOVA revealed a significant effect of training condition on judgment of learning, F(2, 151) = 3.17, p = .045. Pairwise post-hoc Welch t-tests revealed significantly higher JOL in the chatbot condition compared to the baseline condition, t(99.97) = 2.29, p = .012, as well as compared to the checklist condition, t(98.11) = 2.03, p = .022, consistent with the originally proposed Hypothesis 1. The difference between the checklist and baseline conditions on JOL was not significant, t(96.91) = 0.31, p = .377, reflecting the performance data (see Fig. 2, on the right).
4.2.4. Evaluation of chatbot interactions
With a maximum of ve points on all scales, the studentsevaluation
of the learning experience with the chatbot was rather positive, M =
3.97, SD =0.88. The rating of the perceived helpfulness of the chatbot
for the specic task of evaluating online information, M =3.71, SD =
0.90, as well as the emotional valence during the interaction (sympathy,
fulllment of expectations, etc.), M =3.67, SD =0.93, were also slightly
above the mean. The general evaluation of the chatbot was positive as
well, M =3.88, SD =0.88. An examination of the free-text responses
that participants using the chatbot were asked to provide for feedback
purposes was largely positive (e.g., "I am now more aware of the criteria I
use to evaluate information/sources"). However, some critical feedback
was also found (e.g., 1. "He did not always understand my written answer";
2. "The chatbot did not actively intervene"; 3. "No problems, but in general my
answers were answered with very long texts, so I was somewhat inundated";
4. "You had to go through his script to be able to answer the next question with
him"). These evaluations were mainly related to the fact that the chatbot
constructed for this study followed a scripted dialogue and was addi-
tionally trained on words/phrases that the bot is likely to receive. Par-
ticipants may have expected capabilities that they have experienced
with large language models.
5. Discussion
5.1. Effect of the chatbot intervention on online evaluation skills
This study investigated whether an educational chatbot, as a highly personalized and scalable intervention technique, can promote students' online evaluation skills. Based on the collaborative argumentation paradigm, the dialogue with the chatbot was expected to improve the ability to critically evaluate online information (Zimmermann & Mayweg-Paus, 2021), which is also in line with the assumptions of the ICAP framework (Chi & Wylie, 2014) that interactive forms of learning are superior to purely constructive, active, or passive forms. The performance of participants who trained their online information evaluation skills with a chatbot was compared to the performance of participants who trained with a semantically similar but non-interactive checklist, and to participants in a baseline condition who received no further support during training.
Participants in the chatbot condition performed significantly better on an ecologically valid test than participants in the baseline condition, suggesting that the chatbot intervention was able to promote more sophisticated evaluation skills for online information, even in its short duration. This finding is particularly compelling given that no benefit was found for the static checklist condition compared to the baseline condition. However, performance in the chatbot and checklist conditions was not significantly different. This may be due to the limited interactive qualities of the chatbot resulting from its script-based nature. In its current form, it is merely a more interactive approach to the use of checklists, but does not include question prompting that adapts to individual skill levels, and it provides very limited real-time feedback. These features could be enhanced by using an LLM-based chatbot, which in turn lacks standardization due to its probabilistic nature, thus creating a trade-off point for future research (see also Section 5.3). Statistically, to further investigate why we did not find significant differences, neither between the checklist and baseline conditions nor between the chatbot and checklist conditions, we conducted a post-hoc sensitivity analysis. Using G*Power (Faul et al., 2007) for one-tailed t-tests, a Bonferroni-adjusted alpha of 0.016, group sizes of approximately n = 50, and an empirical power of 1-β = 0.80, the sensitivity analysis suggested that it would be possible to detect effect sizes of d = 0.60. The effect sizes for the non-significant comparisons were d = 0.17 and d = 0.32. Thus, our experimental setup was not suitable for detecting such small effects. However, in real-world educational settings, at least moderate effect sizes are desirable to reflect relevant impact (Fritz et al., 2012; Hattie, 2008). Therefore, the effect of our chatbot intervention compared to the baseline condition is still relevant for practical settings, suggesting that an interactive educational chatbot can serve as an effective and efficient intervention strategy to promote online evaluation skills.
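The reported sensitivity analysis was run in G*Power; an equivalent computation can be sketched with the R pwr package (illustrative, not part of the original analysis).

# Post-hoc sensitivity analysis: solve for the detectable effect size given
# n = 50 per group, one-tailed alpha = .016, and power = .80 (illustrative).
library(pwr)
pwr.t.test(n = 50, sig.level = 0.016, power = 0.80,
           type = "two.sample", alternative = "greater")
# returns the minimally detectable effect size, d, of roughly 0.60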
Furthermore, a moderation model was calculated on the assumption that individual differences in information-seeking self-efficacy prior to the study would significantly affect the effectiveness of the interventions. However, no such moderation effect was found, suggesting that the chatbot intervention is effective regardless of individuals' subjective self-efficacy. Interestingly, self-efficacy concerning information-searching behavior was negatively correlated with actual test performance. This suggests that participants' metacognition may be systematically biased in the context of online evaluation skills. The higher their actual skills, the lower their self-efficacy. This has implications for future education and training efforts, as it can be assumed that people with greater needs will not participate in such efforts because they tend to overestimate their actual abilities (Kruger & Dunning, 1999; Tempelaar et al., 2020).
We analyzed not only participants' performance on the test (i.e., effects of technology), but also their performance during training (i.e., effects with technology; Salomon et al., 1991), depending on the experimental condition. This exploratory analysis mirrored the pattern of results from the main analysis. While participants interacted with the chatbot, they showed significantly better performance on the training items than in the baseline condition, whereas there were no significant differences between the chatbot and checklist conditions, nor between the checklist and baseline conditions. Furthermore, participants in the chatbot condition also rated their subjective level of performance higher in retrospect than in the checklist and baseline conditions (i.e., judgments of learning).
5.2. Limitations
The current work has limitations that should be considered in future research. A simulated environment such as the EVON (Hahnel et al., 2020), as used in this study, lends itself well to a controlled setting and a rigorous experimental design, and this work therefore aimed to standardize both the information search environment and the chatbot
interactions to ensure comparability. However, this approach potentially limits the generalizability of the findings, since in the context of real-world Internet searches, users are typically confronted with more than three or five target links, making the pool of potentially conflicting information much larger. Moreover, the transformation of media platforms in recent years has led to an increase in algorithmic processes and automation in the production, curation, and filtering of information (Valtonen et al., 2019). These dynamic aspects, rooted in machine learning techniques used for targeted advertising or the spread of false information, could not be replicated and represented in this simulation. Nevertheless, these aspects may pose greater challenges to users' online evaluation skills, making the processes of information acquisition, critical evaluation, and reasoning based on the acquired information more complex in real life than the evaluative competencies operationalized in a simulation-based approach (Molerov et al., 2020).
In addition, the EVON items were on average quite easy, which was already noted when the test was constructed (Hahnel et al., 2020). In the present study, test performance was also quite high (see Fig. 1). More difficult items would allow for a higher degree of differentiation and a reduction of ceiling effects, and might also reveal a significant difference in performance between the chatbot and checklist conditions. Thus, future studies should either pay attention to the construction of more difficult items when working with simulation environments or integrate their information literacy intervention into the real Internet, accepting the loss of standardization. However, another limitation that may have influenced the high average test scores is the sample of this study - university psychology students, a typical WEIRD (Western, Educated, Industrialized, Rich, and Democratic) population (Henrich et al., 2010) - which may not generalize well to other populations, especially those with lower baseline digital literacy. In particular, these populations could be the ones who would benefit most from the proposed scalable solution of the educational chatbot.
Regarding the non-significant difference between the chatbot and checklist conditions, this could be due to low statistical power; that is, we may have assumed too large an effect in our sampling design, whereas the true effect may be much smaller (see Section 5.1). Alternatively, the chatbot intervention may have been too short to fully unfold its effectiveness. For example, Wineburg et al. (2022) found that high school students who received about 6 h of lateral reading instruction significantly improved their judgments about the credibility of different online sources. Thus, it may take a "heavier lift" to undo students' online habits that have been built up over thousands of hours (Wineburg et al., 2022). Accordingly, longer interactions with an educational chatbot may have similar effects, so developing longer interventions than the one in the current study and monitoring students' skill development longitudinally may be a promising way to better understand the impact of chatbots. While not feasible within the scope of this research, a paradigm shift in the way we search for information, for example through assisted information retrieval such as AI-powered chatbots, could have a lasting impact if designed with information literacy instructional design principles in mind.
5.3. Prospects and future research
The use of chatbots has recently become much more common,
largely due to the widespread availability of large language models such
as ChatGPT or Google Gemini. This has numerous implications for the
information architectures that individuals navigate, as well as for the
nature of information itself. For example, information evaluation pro-
cesses may evolve in the future, making encounters with AI-generated
information and synthetic media much more likely. The development
of automated content creation will change the face of the Internet, likely
making it even more difficult to navigate the vast amount of content. In
the wake of such changes, the importance of educating students in
classic aspects of information and media literacy (e.g., attending to the
authorship or semantic quality of online information) may diminish, and
other factors, such as the development of AI literacy (Knoth, Decker,
et al., 2024; Long & Magerko, 2020; Ng et al., 2021), may become
increasingly important as complementary competencies.
On the other hand, the rapid rise of AI-based chatbots, such as
ChatGPT, and their increasing integration into search engines could also
represent a great opportunity to help citizens of all ages and walks of life
build their online evaluation skills. This study has shown that, in prin-
ciple, chatbots have the potential to foster important critical evaluation
skills if they are constructed in alignment with pedagogical processes. In this regard, AI-based chatbots should not replace instruction but enrich it through their adaptive and flexible mode. Implementations such as Microsoft's Bing AI may change the way individuals search, write, or
think about digital information altogether. If these chatbots are con-
structed to encourage critical evaluation on the part of the end user, this
development could provide an opportunity to empower citizens across
the board and contribute a valuable element to dealing with infodemic,
conspiratorial, disinformation-infested, or simply low-quality informa-
tion landscapes. The efficacy of such chatbots depends, of course, also on
the quality of the underlying LLMs, which could also be an issue for
future research. The argument sketched here is visualized in Fig. 3,
which compares the simulated websites and constructed educational
chatbot used in this study with a similar website found on the real
Internet and a Bing AI chatbot interface. As such, Fig. 3 not only visu-
alizes the ecological validity of the current study, but also suggests
promising ways to transfer theory to practice in real search engines,
where AI chatbots could be valuable tools for thought if designed
properly.
Fig. 3. Simulated website with the educational chatbot (left) and a similar real website with GPT-based Bing AI (right).
Although the scenarios described above are possible in principle,
further research is needed to test the ability of LLM-based chatbots to support individuals' online evaluation skills in real Internet environments. While such LLM-based chatbots may address users' partial dissatisfaction with the scripted chatbot of the current study (see Section 4.2.4), building educational chatbots based on LLMs limits their experimental standardization and may lead to hallucinations and confabulations. Methodologically, future research that integrates LLMs into instructional scaffolding procedures, such as educational chatbots, should keep in mind that the quality of LLMs' output largely depends on the prompts given by the user (Knoth, Tolzin, et al., 2024). Therefore, an individual's skill level in prompt engineering may be key to how much one can benefit from these tools.
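To make this point concrete, the sketch below shows one way an LLM could be wrapped in a scaffolding-oriented system prompt via an OpenAI-compatible chat completions endpoint, using the httr2 package in R. The endpoint, model name, and prompt wording are illustrative assumptions and were not part of the chatbot used in the present study.

# Illustrative sketch: a scaffolding-oriented system prompt for an LLM tutor.
# Endpoint, model, and prompt wording are assumptions, not the study's setup.
library(httr2)
scaffold_prompt <- paste(
  "You are a tutor for online evaluation skills.",
  "Never judge a website for the learner.",
  "Ask one question at a time about authorship, purpose, advertising,",
  "and relevance, and give brief corrective feedback on the learner's answers."
)
resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))) |>
  req_body_json(list(
    model = "gpt-4o-mini",
    messages = list(
      list(role = "system", content = scaffold_prompt),
      list(role = "user",
           content = "Is the site 'Cough, cold, common cold?' trustworthy?")
    )
  )) |>
  req_perform()
resp_body_json(resp)$choices[[1]]$message$content  # tutor-style reply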
Moreover, the construction and orientation of the chatbot needs to be geared towards stimulating critical evaluation in the user, rather than taking this critical part away from the user, which could lead to deskilling in this important competency (Rafner et al., 2021). As information literacy and online evaluation skills play a crucial role in democratic functioning and opinion formation, the idea of machines telling us what information is true (i.e., the cognitive offloading of critical evaluative processes) must be rejected. Such a path could lead to high levels of abuse by bad actors or autocratic systems (Filgueiras, 2022), or could lead to overreliance on such systems (Zhai et al., 2024). AI tools that promote online evaluation skills should not act as gatekeepers of information, but rather as facilitators of critical and reflective digital navigation (Jakesch et al., 2023). As such, information evaluation processes should be enriched by chatbots in a hybrid-intelligent manner (Dellermann et al., 2019; Järvelä et al., 2023), empowering users rather than disempowering them, and keeping human decision-making in the loop rather than further automating information display.
6. Conclusions
The primary goal of this study was not to promote online evaluation skills through artificial intelligence per se, but to foster what has been termed a hybrid intelligence relationship (Akata et al., 2020; Dellermann et al., 2019; Järvelä et al., 2023). This approach advocates a collaborative problem-solving model in which humans and AI systems interact synergistically, enhancing the strengths and mitigating the weaknesses of each (Akata et al., 2020). Through the use of educational chatbots, our study explored the potential of these hybrid intelligence relationships in cultivating online evaluation skills by acting as a supporting tutor for evaluating the credibility of web content (Wierzbicki, 2018). Our findings suggest that educational chatbots, which aim to train online evaluation skills, can serve as effective tools to support information search and evaluation processes during active use. Notably, even in the transfer test, participants who had interacted with the chatbot showed better skills for evaluating online information when solving problems on their own. Despite the brevity of the intervention, participants in the chatbot condition selected significantly more credible and relevant websites than participants in the baseline condition. However, comparisons with the checklist condition suggest that this chatbot-based approach, when implemented in a short-term study, may not be strong enough to outperform this traditional approach. Nevertheless, its interactive and potentially personalized design holds promising pedagogical advantages. As a result, educational chatbots could be a valuable addition to our toolbox for teaching online evaluation skills, although more research is needed to address the open questions. In particular, our findings underscore the promise of hybrid intelligent systems in educational contexts, suggesting that such tools can serve not only as repositories of information, but also as active, collaborative partners in the learning process.
Looking ahead, it is critical that further research be conducted to refine these AI-based educational tools. Future studies should focus on creating robust, user-centered designs that evaluate the effectiveness of these tools not only in terms of efficiency and information retrieval, but also in terms of their ability to foster co-constructive human-AI relationships, for example through longitudinal studies that examine the long-term consequences of intensive LLM use and through comparisons of standard tools with tools fine-tuned to follow pedagogical approaches. These relationships should foster critical thinking and guard against potential harm, particularly in complex domains such as political elections or climate change management. The current study provides a first glimpse into the potential configurations of AI-enabled digital information environments. It is imperative that future developments in this field prioritize the enhancement of human cognitive capabilities through AI integration, rather than allowing our critical faculties to be subsumed by an algorithm-driven attention economy. This balanced approach is essential to address the many challenges of the AI-infused digital information age.
CRediT authorship contribution statement
Nils Knoth: Writing - original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Carolin Hahnel: Writing - review & editing, Software, Resources, Methodology, Data curation. Mirjam Ebersbach: Writing - review & editing, Supervision, Resources, Project administration, Funding acquisition.
Declaration of Generative AI and AI-assisted technologies in the
writing process
Statement: During the preparation of this work the author(s) used
DEEPL Write in order to improve readability and streamline language.
After using this tool/service, the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the content of the
publication.
Funding
The results presented in this article were developed in the research
project Komp-HI funded by the German Federal Ministry of Education
and Research (BMBF, grant 16DHBKI073). We thank the BMBF for
supporting our research.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendices.
Appendix A
Snippets of the interaction with the chatbot during task processing (EVON items 1 to 3).
Appendix B
Used Scaffolds (Chatbot & Checklist) - Cognitive Apprenticeship Modeling of Relevance and Trustworthiness Evaluations.

Task: Battery (Item 1)
Scenario: The laptop battery can no longer be fully charged. Before buying a new battery, participants should first try to repair it themselves. For this purpose, they should select the website with the most relevant and trustworthy information.
Designed to train: Evaluation of the relevance and usefulness of various options on a search engine results page.
Used items: "Do the search results match my topic?"; "Does this information meet my current needs?"; "What is it primarily about there?"; "Does this information answer my questions?"

Task: Cold (Item 2)
Scenario: There is a cold epidemic on campus, and participants have caught it from their classmates. Now they are searching the Internet for helpful tips on how to get well quickly. For this purpose, they should select the website with the most useful and trustworthy information.
Designed to train: Evaluation of the objectivity, purpose, and authority of the various presented websites.
Used items: "Take a closer look at the two websites with the titles 'Cough, cold, common cold?' and 'Pharmaceutical newspaper online'."; "Read the presented texts and pay attention to the design features of the websites."; "What do you notice about the trustworthiness of the sites?"*

Task: Stress (Item 3)
Scenario: In order to get through the upcoming exam period as stress-free as possible, the participants search the Internet for helpful tips. For this purpose, they should select the website with the most useful and trustworthy information.
Designed to train: This task was specifically designed to train the evaluation of both the relevance and trustworthiness of the various presented websites.
Used items: Relevance (selection): "Is the amount of information sufficient to answer your question?"; "Can you identify a target group?" Trustworthiness (selection): "Is there an author?"; "What is the purpose of the information: Is it to inform, sell, entertain, or persuade?"

* In the chatbot condition, users received additional feedback on their answers. Because this question required users to provide a written response to the chatbot regarding their evaluation of the trustworthiness of both sites, they received either positive feedback or a corrective response depending on their answer.
* It was key to determine that the abundance of advertising and sponsorship on the "Cough, cold, common cold?" page was a sign that this page did not contain trustworthy information. In contrast, it was key to determine that the lack of advertising, the authority and transparency of the author, and the self-applicable evidence-based tips on the "Pharmaceutical newspaper online" page were indicators of trustworthiness and relevance to the information problem.
Appendix C

Complete Survey Instrument.

Construct: Self-Efficacy Scale for Information Searching Behavior (scale adapted from Behm, 2018)
Scale responses: [strongly disagree - strongly agree]
Item stem: "When I am looking for information on a certain topic or a specific question ..."
SES-IB-16_item1: ..., I know precisely how to select relevant information which is most helpful to answer my question.
SES-IB-16_item2: ..., I am able to quickly identify the information which is most meaningful and should therefore be preferred.
SES-IB-16_item3: ..., I find it easy to integrate new information with prior knowledge.
SES-IB-16_item4: ..., I am often unsure when it comes to accessing the information sources that I would like to use. [reverse coded]
SES-IB-16_item5: ..., I am quickly clear about the type of information (e.g., scientific publications, statistics, expert opinions, technical data) required for answering my question.
SES-IB-16_item6: ..., I am well able to weigh contradictory information adequately.
SES-IB-16_item7: ..., I am usually able to estimate how much time and effort I still have to invest at any point in time.
SES-IB-16_item8: ..., I am able to judge the quality of information (e.g., a specific web page or a journal article) based on distinct criteria.
SES-IB-16_item9: ..., I quickly capture which aspects of my topic or question are more or less important.
SES-IB-16_item10: ... and try to appraise whether I have gained a sufficient overview, I am usually right.
SES-IB-16_item11: ..., I am able to use different information sources in a way that makes me obtain a maximum of relevant information.
SES-IB-16_item12: ... and find some new information [e.g., a website, a book or an expert opinion], I am able to decide quickly whether it is worth to be considered in detail.
SES-IB-16_item13: ..., I unerringly recognize how to proceed best to answer my question.
SES-IB-16_item14: ..., I constantly have the same problems with information searching and have no ideas for improvement. [reverse coded]
SES-IB-16_item15: ..., I know exactly which sources of information I should use to find relevant information on my topic.
SES-IB-16_item16: ..., I can easily assess whether I chose the optimal course of action or should better change my approach.

Construct: Cognitive Load (modeled after Klepsch and Seufert, 2021)
Scale responses: [strongly disagree - strongly agree]
Load_1: Please estimate how much mental effort it cost you to complete the tasks.
Load_2: Please estimate how much mental effort you invested in completing the tasks.
Load_3: Please estimate how difficult it was for you to complete the tasks.

Construct: Judgements of Learning (JoL) (modeled after Zaromb et al., 2010)
JoL: Please estimate, in percent, the proportion of tasks you solved correctly (i.e., 0 % = "I did not solve any tasks" to 100 % = "I was able to solve all tasks").

Construct: Evaluation of Chatbot Interaction (items developed by the authors)
Scale responses unless otherwise indicated: [strongly disagree - strongly agree]
Manipulation check: "Have you interacted with the chatbot and explored information together?" [yes - no]
Interaction_1: "How often did you use the chatbot to complete tasks?" [not at all - very often]
Interaction_2: "Did you feel that the chatbot interacted with you and responded to you?" [never - always]
Learning experience_1: I found the interaction with the chatbot pleasant.
Learning experience_2: I found the interaction with the chatbot motivating.
Learning experience_3: I found the interaction with the chatbot difficult. [reverse coded]
Interaction difficulties: "Were there any problems in the interaction with the chatbot? If yes, please outline these in bullet points."
Helpfulness_1: The chatbot helped me to process the tasks better.
Helpfulness_2: In general, I find the chatbot useful for this type of task processing.
Helpfulness_3: The chatbot was a competent help in processing the task.
Valence_1: The chatbot met my expectations.
Valence_2: Personally, I like the chatbot.
Valence_3: I felt comfortable interacting with the chatbot.
Valence_4: I would like to learn with a chatbot more often.
Global evaluation: My overall assessment of my experience with this chatbot. [very bad - very good]

Construct: Demographics
Age: Please indicate your age.
Gender: Please specify your gender.
Language proficiency: Please indicate your level of proficiency in the German language.
Study subject: Please indicate your study subject.
Semester count: Please indicate the number of semesters you have been studying.
References
Ahmad, A., Schneider, J., Griffiths, D., Biedermann, D., Schiffner, D., Greller, W., & Drachsler, H. (2022). Connecting the dots - a literature review on learning analytics indicators from a learning design perspective. Journal of Computer Assisted Learning. https://doi.org/10.1111/jcal.12716. Article jcal.12716. Advance online publication.
Akata, Z., Balliet, D., Rijke, M. de, Dignum, F., Dignum, V., Eiben, G., Fokkens, A., Grossi, D., Hindriks, K., Hoos, H., Hung, H., Jonker, C., Monz, C., Neerincx, M., Oliehoek, F., Prakken, H., Schlobach, S., van der Gaag, L., van Harmelen, F., Welling, M. (2020). A research agenda for hybrid intelligence: Augmenting human intellect with collaborative, adaptive, responsible, and explainable artificial intelligence. Computer, 53(8), 18–28. https://doi.org/10.1109/MC.2020.2996587
Andreassen, R., & Bråten, I. (2013). Teachers' source evaluation self-efficacy predicts their use of relevant source features when evaluating the trustworthiness of web sources on special education. British Journal of Educational Technology, 44(5), 821–836. https://doi.org/10.1111/j.1467-8535.2012.01366.x
Annamalai, N., Eltahir, M. E., Zyoud, S. H., Soundrarajan, D., Zakarneh, B., & Salhi, N. R. A. (2023). Exploring English language learning via chabot: A case study from a self determination theory perspective. Computers and Education: Artificial Intelligence, 5, Article 100148. https://doi.org/10.1016/j.caeai.2023.100148
Behm, T. (2018). SWE-IV-16 - Skala zur Erfassung der Informationsverhaltensbezogenen
Selbstwirksamkeitserwartung (SWS-IV-16). https://doi.org/10.23668/PSYCHARCH
IVES.4598.
Blakeslee, S. (2004). The CRAAP test. LOEX Quarterly, 31(3). https://commons.emich.
edu/loexquarterly/vol31/iss3/4.
Brand-Gruwel, S., Wopereis, I., & Walraven, A. (2009). A descriptive model of
information problem solving while using internet. Computers & Education, 53(4),
12071217. https://doi.org/10.1016/j.compedu.2009.06.004
Bråten, I., & Braasch, J. L. G. (2018). The role of conflict in multiple source use. In Handbook of multiple source use (pp. 184–201). Routledge. https://doi.org/10.4324/9781315627496-11.
Breakstone, J., Smith, M., Wineburg, S., Rapaport, A., Carle, J., Garland, M., & Saavedra, A. (2021). Students' civic online reasoning: A national portrait. Educational Researcher, 50(8), 505–515. https://doi.org/10.3102/0013189X211017495
Bromme, R., & Goldman, S. R. (2014). The public's bounded understanding of science. Educational Psychologist, 49(2), 59–69. https://doi.org/10.1080/00461520.2014.921572
Bronstein, J. (2014). The role of perceived self-efficacy in the information seeking behavior of library and information science students. The Journal of Academic Librarianship, 40(2), 101–106. https://doi.org/10.1016/j.acalib.2014.01.010
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of
learning. Educational Researcher, 18(1), 32. https://doi.org/10.2307/1176008
Budiu, R. (2018, November 25). The user experience of chatbots. Nielsen Norman Group.
https://www.nngroup.com/articles/chatbots/.
Cabales, V. (2019). Muse. In Extended abstracts of the 2019 CHI conference on human
factors in computing systems (pp. 16). ACM. https://doi.org/10.1145/
3290607.3308450.
Chen, J., Wang, M., Kirschner, P. A., & Tsai, C.-C. (2018). The role of collaboration,
computer use, learning environments, and supporting strategies in CSCL: A meta-
analysis. Review of Educational Research, 88(6), 799843. https://doi.org/10.3102/
0034654318791584
Cheng, L., Croteau, E., Baral, S., Heffernan, C., & Heffernan, N. (2024). Facilitating
student learning with a chatbot in an online math learning platform. Journal of
Educational Computing Research, 62(4), 907937. https://doi.org/10.1177/
07356331241226592
Chi, M. T. H., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to
active learning outcomes. Educational Psychologist, 49(4), 219243. https://doi.org/
10.1080/00461520.2014.965823
Chinn, C. A., & Clark, D. B. (2013). Learning through collaborative argumentation. In
C. E. Hmelo-Silver (Ed.), Educational psychology handbook series. The international
handbook of collaborative learning. Routledge. https://doi.org/10.4324/
9780203837290.ch18.
Chiu, T. K., Xia, Q., Zhou, X., Chai, C. S., & Cheng, M. (2023). Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education. Computers and Education: Artificial Intelligence, 4, Article 100118. https://doi.org/10.1016/j.caeai.2022.100118
Collins, A. (2005). Cognitive apprenticeship. In The Cambridge handbook of the learning
sciences (pp. 4760). Cambridge University Press. https://doi.org/10.1017/
CBO9780511816833.005.
Collins, A., Brown, J. S., & Newman, S. E. (1987). Cognitive apprenticeship: Teaching the
crafts of reading, writing, and mathematics. In L. B. Resnick, & R. Glaser (Eds.),
Psychology of education and instruction series. Knowing, learning, and instruction: Essays
in honor of Robert Glaser (pp. 453494). Routledge. https://doi.org/10.4324/
9781315044408-14.
Costello, T. H., Pennycook, G., & Rand, D. G. (2024). Durably reducing conspiracy beliefs
through dialogues with AI. Science, 385(6714). https://doi.org/10.1126/science.
adq1814
D'Mello, S. K., & Graesser, A. (2023). Intelligent tutoring systems. In Handbook of educational psychology (pp. 603–629). Routledge. https://doi.org/10.4324/9780429433726-31.
Daud, S. H. (2020). E-JAVA chatbot for learning programming language: A post-pandemic alternative virtual tutor. International Journal of Emerging Trends in Engineering Research, 8(7), 3290–3298. https://doi.org/10.30534/ijeter/2020/67872020
Dellermann, D., Ebel, P., Söllner, M., & Leimeister, J. M. (2019). Hybrid intelligence. Business & Information Systems Engineering, 61(5), 637–643. https://doi.org/10.1007/s12599-019-00595-2
Dreisiebner, S., Polzer, A. K., Robinson, L., Libbrecht, P., Boté-Vericad, J.-J., Urbano, C., Mandl, T., Vilar, P., Žumer, M., Juric, M., Pehar, F., & Stričević, I. (2021). Facilitation of information literacy through a multilingual MOOC considering cultural aspects. Journal of Documentation, 77(3), 777–797. https://doi.org/10.1108/JD-06-2020-0099
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/bf03193146
Filgueiras, F. (2022). The politics of AI: Democracy and authoritarianism in developing
countries. Journal of Information Technology & Politics, 19(4), 449464. https://doi.
org/10.1080/19331681.2021.2016543
Fraillon, J., Ainley, J., Schulz, W., Friedman, T., & Duckworth, D. (2020). Preparing for
life in a digital world: Iea international computer and information literacy study 2018
international report (1st ed.). Springer International Publishing, 2020 https://library.
oapen.org/handle/20.500.12657/39546, 10.1007/978-3-030-38781-5.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18. https://doi.org/10.1037/a0024338
Fryer, L. K., Thompson, A., Nakao, K., Howarth, M., & Gallacher, A. (2020). Supporting
self-efficacy beliefs and interest as educational inputs and outcomes: Framing AI and
Human partnered task experiences. Learning and Individual Differences, 80, Article
101850. https://doi.org/10.1016/j.lindif.2020.101850
Gasser, U., Cortesi, S. C., Malik, M., & Lee, A. (2012). Youth and digital media: From
credibility to information quality. https://doi.org/10.2139/ssrn.2005272.
Graesser, A. C., Chipman, P., Haynes, B. C., & Olney, A. (2005). Autotutor: An intelligent
tutoring system with mixed-initiative dialogue. IEEE Transactions on Education, 48
(4), 612618. https://doi.org/10.1109/te.2005.856149
Graesser, A. C., Wiley, J., Goldman, S. R., O'Reilly, T., Jeon, M., & McDaniel, B. (2007). Seek Web tutor: Fostering a critical stance while exploring the causes of volcanic eruption. Metacognition and Learning, 2(2–3), 89–105. https://doi.org/10.1007/s11409-007-9013-x
Gregor, S., Kruse, L., & Seidel, S. (2020). Research perspectives: The anatomy of a design
principle. Journal of the Association for Information Systems, 21, 16221652. https://
doi.org/10.17705/1jais.00649
Grothaus, C., Dolch, C., & Zawacki-Richter, O. (2021). Use of digital media in higher
education across country contexts: A comparison between Germany and Thailand.
International Journal of Emerging Technologies in Learning (IJET), 16(20), 64. https://
doi.org/10.3991/ijet.v16i20.24263
Guggemos, J., Moser, L., & Seufert, S. (2022). Learners don't know best: Shedding light on the phenomenon of the K-12 MOOC in the context of information literacy. Computers & Education, 188, Article 104552. https://doi.org/10.1016/j.compedu.2022.104552
Guo, K., Wang, J., & Chu, S. K. W. (2022). Using chatbots to scaffold EFL students' argumentative writing. Assessing Writing, 54, Article 100666. https://doi.org/10.1016/j.asw.2022.100666
Hafenbrädl, S., Waeger, D., Marewski, J. N., & Gigerenzer, G. (2016). Applied decision making with fast-and-frugal heuristics. Journal of Applied Research in Memory and Cognition, 5(2), 215–231. https://doi.org/10.1016/j.jarmac.2016.04.011
Hahnel, C., Eichmann, B., & Goldhammer, F. (2020). Evaluation of online information in
university students: Development and scaling of the screening instrument EVON.
Frontiers in Psychology, 11, 562128. https://doi.org/10.3389/fpsyg.2020.562128
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to
achievement. Routledge.
Hendriks, F., Mayweg-Paus, E., Felton, M., Iordanou, K., Jucks, R., & Zimmermann, M. (2020). Constraints and affordances of online engagement with scientific information - A literature review. Frontiers in Psychology, 11, Article 572744. https://doi.org/10.3389/fpsyg.2020.572744
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature,
466(7302), 29. https://doi.org/10.1038/466029a
Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian Journal
of Information Systems, 19(2). https://aisel.aisnet.org/sjis/vol19/iss2/4.
Hilligoss, B., & Rieh, S. Y. (2008). Developing a unifying framework of credibility
assessment: Construct, heuristics, and interaction in context. Information Processing &
Management, 44(4), 14671484. https://doi.org/10.1016/j.ipm.2007.10.001
Hobbs, R., & Jensen, A. (2013). The past, present, and future of media literacy education.
Journal of Media Literacy Education, 1(1). https://doi.org/10.23860/jmle-1-1-1
Hobert, S., & Berens, F. (2024). Developing a digital tutor as an intermediary between
students, teaching assistants, and lecturers. Educational Technology Research &
Development, 72(2), 797818. https://doi.org/10.1007/s11423-023-10293-2
Huang, W. (2019). Designing and evaluating three chatbot-enhanced activities for a flipped graduate course. International Journal of Mechanical Engineering and Robotics Research, 813–818. https://doi.org/10.18178/ijmerr.8.5.813-818
Jakesch, M., Bhat, A., Buschek, D., Zalmanson, L., & Naaman, M. (2023). Co-writing with opinionated language models affects users' views. In R. Albers, S. Sadeghian, M. Laschke, & M. Hassenzahl (Eds.), Dying, death, and the afterlife in human-computer interaction. A scoping review (pp. 1–15). Universitätsbibliothek der Universität Siegen. https://doi.org/10.1145/3544548.3581196.
Järvelä, S., Nguyen, A., & Hadwin, A. (2023). Human and artificial intelligence collaboration for socially shared regulation in learning. British Journal of Educational Technology, 54(5), 1057–1076. https://doi.org/10.1111/bjet.13325
Kaloudis, M. (2021). Sovereignty in the digital age how can we measure digital
sovereignty and support the EUs action plan? New Global Studies, 16(3), 275299.
https://doi.org/10.1515/ngs-2021-0015
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F.,
Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G.,
Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T.,
Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of Large
Language models for education. Center for Open Science. https://doi.org/10.35542/
osf.io/5er8f
Kavanagh, & Rich, M. D. (2018). Truth decay: An initial exploration of the diminishing role
of facts and analysis in American public life. Rand Corporation.
Kerres, M. (2020). Bildung in der digitalen Welt: Über Wirkungsannahmen und die soziale Konstruktion des Digitalen. MedienPädagogik: Zeitschrift für Theorie und Praxis der Medienbildung, 132. https://doi.org/10.21240/mpaed/jb17/2020.04.24.x
Klepsch, M., & Seufert, T. (2021). Making an effort versus experiencing load. Frontiers in Education, 6. https://doi.org/10.3389/feduc.2021.645284
Knoth, N., Decker, M., Laupichler, M. C., Pinski, M., Buchholtz, N., Bata, K., & Schultz, B. (2024). Developing a holistic AI literacy assessment matrix bridging generic, domain-specific, and ethical competencies. Computers and Education Open, 6, Article 100177. https://doi.org/10.1016/j.caeo.2024.100177
Knoth, N., Tolzin, A., Janson, A., & Leimeister, J. M. (2024). AI literacy and its implications for prompt engineering strategies. Computers and Education: Artificial Intelligence, 6, Article 100225. https://doi.org/10.1016/j.caeai.2024.100225
Kozlowska-Barrios, A. (2023). Media and information literacy (MIL) in library
classrooms: Content analysis of news evaluative criteria in instructional worksheets
and checklists. The Journal of Academic Librarianship, 49(3), Article 102680. https://
doi.org/10.1016/j.acalib.2023.102680
Kozyreva, A., Lewandowsky, S., & Hertwig, R. (2020). Citizens versus the internet:
Confronting digital challenges with cognitive tools. Psychological Science in the Public
Interest: A Journal of the American Psychological Society, 21(3), 103156. https://doi.
org/10.1177/1529100620946707
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. https://doi.org/10.1037/0022-3514.77.6.1121
Kuhail, M. A., Alturki, N., Alramlawi, S., & Alhejori, K. (2023). Interacting with educational chatbots: A systematic review. Education and Information Technologies, 28(1), 973–1018. https://doi.org/10.1007/s10639-022-11177-3
Laato, S., Morschheuser, B., Hamari, J., & Björne, J. (2023). AI-assisted learning with ChatGPT and Large Language models: Implications for higher education. In 2023 IEEE international conference on advanced learning technologies (ICALT) (pp. 226–230). IEEE. https://doi.org/10.1109/icalt58122.2023.00072.
Leeder, C., & Shah, C. (2016). Practicing critical evaluation of online sources improves
student search behavior. The Journal of Academic Librarianship, 42(4), 459468.
https://doi.org/10.1016/j.acalib.2016.04.001
Lenker, M. (2017). Developmentalism: Learning as the basis for evaluating information.
Portal: Libraries and the Academy, 17(4), 721737. https://doi.org/10.1353/
pla.2017.0043
Levin, D., De La Paz, S., Lee, Y., & Escola, E. N. (2021). Use of cognitive apprenticeship
models of instruction to support middle school studentsconstruction and critique of
written scientic explanations and arguments. Learning Disabilities: A
Multidisciplinary Journal, 26(1). https://doi.org/10.18666/ldmj-2021-v26-i1-10380
Lewandowsky, S., Ecker, U. K. H., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest: A Journal of the American Psychological Society, 13(3), 106–131. https://doi.org/10.1177/1529100612451018
Lewandowsky, S., & van der Linden, S. (2021). Countering misinformation and fake news
through inoculation and prebunking. European Review of Social Psychology, 32(2),
348384. https://doi.org/10.1080/10463283.2021.1876983
List, A., & Alexander, P. A. (2017). Cognitive affective engagement model of multiple
source use. Educational Psychologist, 52(3), 182199. https://doi.org/10.1080/
00461520.2017.1329014
List, A., Grossnickle, E. M., & Alexander, P. A. (2016). Undergraduate students' justifications for source selection in a digital academic context. Journal of Educational Computing Research, 54(1), 22–61. https://doi.org/10.1177/0735633115606659
Long, D., & Magerko, B. (2020). What is AI literacy? Competencies and design
considerations. In ACM conferences, proceedings of the 2020 CHI conference on human
factors in computing systems (pp. 116). Association for Computing Machinery.
https://doi.org/10.1145/3313831.3376727.
Lorenz-Spreen, P., Lewandowsky, S., Sunstein, C. R., & Hertwig, R. (2020). How
behavioural sciences can promote truth, autonomy and democratic discourse online.
Nature Human Behaviour, 4(11), 11021109. https://doi.org/10.1038/s41562-020-
0889-7
Luan, S., Schooler, L. J., & Gigerenzer, G. (2011). A signal-detection analysis of fast-and-
frugal trees. Psychological Review, 118(2), 316338. https://doi.org/10.1037/
a0022684
Lucassen, T., & Schraagen, J. M. (2013). The inuence of source cues and topic
familiarity on credibility evaluation. Computers in Human Behavior, 29(4),
13871392. https://doi.org/10.1016/j.chb.2013.01.036
Mandalios, J. (2013). Radar: An approach for helping students evaluate Internet sources.
Journal of Information Science, 39(4), 470478. https://doi.org/10.1177/
0165551513478889
Mayweg-Paus, E., & Macagno, F. (2016). How dialogic settings inuence evidence use in
adolescent students. Zeitschrift für Padagogische Psychologie, 30(23), 121132.
https://doi.org/10.1024/1010-0652/a000171
Mayweg-Paus, E., Zimmermann, M., Le, N.-T., & Pinkwart, N. (2021). A review of
technologies for collaborative online information seeking: On the contribution of
collaborative argumentation. Education and Information Technologies, 26(2),
20532089. https://doi.org/10.1007/s10639-020-10345-7
McGrew, S., Smith, M., Breakstone, J., Ortega, T., & Wineburg, S. (2019). Improving
university studentsweb savvy: An intervention study. British Journal of Educational
Psychology, 89(3), 485500. https://doi.org/10.1111/bjep.12279
Metzger, M. J., & Flanagin, A. J. (2015). Psychological approaches to credibility
assessment online. In The handbook of the psychology of communication technology (pp.
445466). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118426456.ch20.
Meßmer, A., Sängerlaub, A., & Schulz, L. (2021). Quelle: Internet? Digitale Nachrichten- und Informationskompetenzen der deutschen Bevölkerung im Test. Stiftung Neue Verantwortung e.V. https://www.stiftung-nv.de/sites/default/files/studie_quelleinternet.pdf.
Molerov, D., Zlatkin-Troitschanskaia, O., Nagel, M.-T., Brückner, S., Schmidt, S., & Shavelson, R. J. (2020). Assessing university students' critical online reasoning ability: A conceptual and assessment framework with preliminary evidence. Frontiers in Education, 5, Article 577843. https://doi.org/10.3389/feduc.2020.577843.
Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2, Article 100041. https://doi.org/10.1016/j.caeai.2021.100041
Ninaus, M., & Sailer, M. (2022). Closing the loop - the human role in artificial intelligence for education. Frontiers in Psychology, 13, Article 956798. https://doi.org/10.3389/fpsyg.2022.956798
Noroozi, O., Weinberger, A., Biemans, H. J., Mulder, M., & Chizari, M. (2012).
Argumentation-based computer supported collaborative learning (ABCSCL): A
synthesis of 15 years of research. Educational Research Review, 7(2), 79106. https://
doi.org/10.1016/j.edurev.2011.11.006
OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.
com/chat.
Ouyang, F., & Jiao, P. (2021). Artificial intelligence in education: The three paradigms. Computers and Education: Artificial Intelligence, 2, Article 100020. https://doi.org/10.1016/j.caeai.2021.100020
Pérez-Marín, D. (2021). A review of the practical applications of pedagogic conversational agents to be used in school and university classrooms. Digit, 1(1), 18–33. https://doi.org/10.3390/digital1010002
R Core Team. (2023). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL https://www.R-project.org/.
Rafner, J., Dellermann, D., Hjorth, A., Verasztó, D., Kampf, C., Mackay, W., & Sherson, J. (2021). Deskilling, upskilling, and reskilling: A case for hybrid intelligence. Morals & Machines, 1(2), 24–39. https://doi.org/10.5771/2747-5174-2021-2-24
Rapp, D. N. (2016). The consequences of reading inaccurate information. Current
Directions in Psychological Science, 25(4), 281285. https://doi.org/10.1177/
0963721416649347
Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the Web.
Journal of the American Society for Information Science and Technology, 53(2),
145161. https://doi.org/10.1002/asi.10017
Rousseau, D. M., & Gunia, B. C. (2016). Evidence-based practice: The psychology of EBP
implementation. Annual Review of Psychology, 67, 667692. https://doi.org/
10.1146/annurev-psych-122414-033336
Salomon, G., Perkins, D. N., & Globerson, T. (1991). Partners in cognition: Extending
human intelligence with intelligent technologies. Educational Researcher, 20(3), 29.
https://doi.org/10.3102/0013189X020003002
Sánchez-Canut, S., Usart-Rodríguez, M., Grimalt-Álvaro, C., Martínez-Requejo, S., & Lores-Gómez, B. (2023). Professional digital competence: Definition, frameworks, measurement, and gender differences: A systematic literature review. Human Behavior and Emerging Technologies, 2023(1), 1–22. https://doi.org/10.1155/2023/8897227
Schmitt, A., Zierau, N., Janson, A., & Leimeister, J. M. (2021). Voice as a contemporary
frontier of interaction design. In European conference on information systems (ECIS),
virtual, 2021. SSRN. https://ssrn.com/abstract=3910609.
Senkbeil, M., Ihme, J. M., & Schöber, C. (2019). Wie gut sind angehende und fortgeschrittene Studierende auf das Leben und Arbeiten in der digitalen Welt vorbereitet? Ergebnisse eines Standard Setting-Verfahrens zur Beschreibung von ICT-bezogenen Kompetenzniveaus. Zeitschrift für Erziehungswissenschaft, 22(6), 1359–1384. https://doi.org/10.1007/s11618-019-00914-z
Shao, C., Ciampaglia, G. L., Varol, O., Yang, K.-C., Flammini, A., & Menczer, F. (2018).
The spread of low-credibility content by social bots. Nature Communications, 9(1),
4787. https://doi.org/10.1038/s41467-018-06930-7
Spante, M., Hashemi, S. S., Lundin, M., & Algers, A. (2018). Digital competence and
digital literacy in higher education research: Systematic review of concept use.
Cogent Education, 5(1), Article 1519143. https://doi.org/10.1080/
2331186X.2018.1519143
Tempelaar, D., Rienties, B., & Nguyen, Q. (2020). Subjective data, objective data and the
role of bias in predictive modelling: Lessons from a dispositional learning analytics
application. PLoS One, 15(6), Article e0233977. https://doi.org/10.1371/journal.
pone.0233977
Valtonen, T., Tedre, M., Mäkitalo, K., & Vartiainen, H. (2019). Media literacy education in the age of machine learning. Journal of Media Literacy Education, 11(2). https://doi.org/10.23860/jmle-2019-11-2-2
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3). https://doi.org/10.18637/jss.v045.i03
Vijayakumar, B., Höhn, S., & Schommer, C. (2019). Quizbot: Exploring formative feedback with conversational interfaces. In S. Draaijer, D. J. Brinke, & E. Ras (Eds.), Communications in computer and information science: Vol. 1014, technology enhanced assessment: 21st international conference, TEA 2018, Amsterdam, The Netherlands, December 10–11, 2018, revised selected papers (1st ed. 2019, pp. 102–120). Springer International Publishing; Imprint: Springer. https://doi.org/10.1007/978-3-030-25264-9_8.
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science,
359(6380), 11461151. https://doi.org/10.1126/science.aap9559
Wambsganss, T., Kueng, T., Soellner, M., & Leimeister, J. M. (2021). ArgueTutor: An adaptive dialog-based learning system for argumentation skills. In CHI '21: Proceedings of the 2021 CHI conference on human factors in computing systems. https://doi.org/10.1145/3411764.3445781
Wierzbicki, A. (2018). Web content credibility. Springer eBooks. https://doi.org/10.1007/978-3-319-77794-8
Wineburg, S., Breakstone, J., McGrew, S., Smith, M. D., & Ortega, T. (2022). Lateral reading on the open internet: A district-wide field study in high school government classes. Journal of Educational Psychology, 114(5), 893–909. https://doi.org/10.1037/edu0000740
Winkler, R., & Soellner, M. (2018). Unleashing the potential of chatbots in education: A
state-of-the-art analysis. Academy of Management Proceedings, 2018(1), Article
15903. https://doi.org/10.5465/AMBPP.2018.15903abstract
Wollny, S., Schneider, J., Di Mitri, D., Weidlich, J., Rittberger, M., & Drachsler, H.
(2021). Are we there yet? - a systematic literature review on chatbots in education.
Frontiers in Articial Intelligence, 4, 654924. https://doi.org/10.3389/
frai.2021.654924
Zaromb, F. M., Karpicke, J. D., & Roediger, H. L. (2010). Comprehension as a basis for
metacognitive judgments: Effects of effort after meaning on recall and
metacognition. Journal of Experimental Psychology: Learning, Memory, and Cognition,
36(2), 552557. https://doi.org/10.1037/a0018277
Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education - where are the educators? International Journal of Educational Technology in Higher Education, 16(1), 1–27. https://doi.org/10.1186/s41239-019-0171-0
Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students' cognitive abilities: A systematic review. Smart Learning Environments, 11(1). https://doi.org/10.1186/s40561-024-00316-7
Zierau, N., Wambsganss, T., Janson, A., Schöbel, S., & Leimeister, J. M. (2020). The anatomy of user experience with conversational agents: A taxonomy and propositions of service clues. In International conference on information systems (ICIS 2020), Hyderabad, India, December 13–16, 2020. SSRN. https://ssrn.com/abstract=3921754.
Zimmermann, M., & Mayweg-Paus, E. (2021). The role of collaborative argumentation in future teachers' selection of online information. Zeitschrift für Pädagogische Psychologie, 35(2–3), 185–198. https://doi.org/10.1024/1010-0652/a000307
Zlatkin-Troitschanskaia, O., Hartig, J., Goldhammer, F., & Krstev, J. (2021). Students' online information use and learning progress in higher education - a critical literature review. Studies in Higher Education, 46(10), 1996–2021. https://doi.org/10.1080/03075079.2021.1953336
... Non-experts seem to have benefitted from tools like chatbots when it comes to factchecking [29,14] and understanding their medical information [30]. However, medical experts assisted by GPT-4 have performed worse, on average, than both the model itself and unassisted experts [22]. ...
Preprint
Full-text available
By late 20th century, the rationality wars had launched debates about the nature and norms of intuitive and reflective thinking. Those debates drew from mid-20th century ideas such as bounded rationality, which challenged more idealized notions of rationality observed since the 19th century. Now that 21st century cognitive scientists are applying the resulting dual process theories to artificial intelligence, it is time to dust off some lessons from this history. So this paper synthesizes old ideas with recent results from experiments on humans and machines. The result is Strategic Reflectivism, which takes the position that one key to intelligent systems (human or artificial) is pragmatic switching between intuitive and reflective inference to optimally fulfill competing goals. Strategic Reflectivism builds on American Pragmatism, transcends superficial indicators of reflective thinking such as model size or chains of thought, and becomes increasingly actionable as we learn more about the value of intuition and reflection.
Article
Full-text available
The growing integration of artificial intelligence (AI) dialogue systems within educational and research settings highlights the importance of learning aids. Despite examination of the ethical concerns associated with these technologies, there is a noticeable gap in investigations on how these ethical issues of AI contribute to students’ over-reliance on AI dialogue systems, and how such over-reliance affects students’ cognitive abilities. Overreliance on AI occurs when users accept AI-generated recommendations without question, leading to errors in task performance in the context of decision-making. This typically arises when individuals struggle to assess the reliability of AI or how much trust to place in its suggestions. This systematic review investigates how students’ over-reliance on AI dialogue systems, particularly those embedded with generative models for academic research and learning, affects their critical cognitive capabilities including decision-making, critical thinking, and analytical reasoning. By using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, our systematic review evaluated a body of literature addressing the contributing factors and effects of such over-reliance within educational and research contexts. The comprehensive literature review spanned 14 articles retrieved from four distinguished databases: ProQuest, IEEE Xplore, ScienceDirect, and Web of Science. Our findings indicate that over-reliance stemming from ethical issues of AI impacts cognitive abilities, as individuals increasingly favor fast and optimal solutions over slow ones constrained by practicality. This tendency explains why users prefer efficient cognitive shortcuts, or heuristics, even amidst the ethical issues presented by AI technologies.
Article
Full-text available
Artificial intelligence technologies are rapidly advancing. As part of this development, large language models (LLMs) are increasingly being used when humans interact with systems based on artificial intelligence (AI), posing both new opportunities and challenges. When interacting with LLM-based AI system in a goal-directed manner, prompt engineering has evolved as a skill of formulating precise and well-structured instructions to elicit desired responses or information from the LLM, optimizing the effectiveness of the interaction. However, research on the perspectives of non-experts using LLM-based AI systems through prompt engineering and on how AI literacy affects prompting behavior is lacking. This aspect is particularly important when considering the implications of LLMs in the context of higher education. In this present study, we address this issue, introduce a skill-based approach to prompt engineering, and explicitly consider the role of non-experts' AI literacy (students) in their prompt engineering skills. We also provide qualitative insights into students’ intuitive behaviors towards LLM-based AI systems. The results show that higher-quality prompt engineering skills predict the quality of LLM output, suggesting that prompt engineering is indeed a required skill for the goal-directed use of generative AI tools. In addition, the results show that certain aspects of AI literacy can play a role in higher quality prompt engineering and targeted adaptation of LLMs within education. We, therefore, argue for the integration of AI educational content into current curricula to enable a hybrid intelligent society in which students can effectively use generative AI tools such as ChatGPT.
Article
Full-text available
Motivated by a holistic understanding of AI literacy, this work presents an interdisciplinary effort to make AI literacy measurable in a comprehensive way, considering generic and domain-specific AI literacy as well as AI ethics. While many AI literacy assessment tools have been developed in the last 2-3 years, mostly in the form of self-assessment scales and less frequently as knowledge-based assessments, previous approaches only accounted for one specific area of a comprehensive understanding of AI competence, namely cognitive aspects within generic AI literacy. Considering the demand for AI literacy development for different professional domains and reflecting on the concept of competence in a way that goes beyond mere cognitive aspects of conceptual knowledge, there is an urgent need for assessment methods that capture domain-specific AI literacy on each of the three competence dimensions of cognition, behavior, and attitude. In addition, competencies for AI ethics are becoming more apparent, which further calls for a comprehensive assessment of AI literacy for this very matter. This conceptual paper aims to provide a foundation upon which future AI literacy assessment instruments can be built and provides insights into what a framework for item development might look like that addresses both generic and domain-specific aspects of AI literacy as well as AI ethics literacy, and measures more than just knowledge-related aspects based on a holistic approach.
Article
Full-text available
People tend to hold overly favorable views of their abilities in many social and intellectual domains. The authors suggest that this overestimation occurs, in part, because people who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it. Across 4 studies, the authors found that participants scoring in the bottom quartile on tests of humor, grammar, and logic grossly overestimated their test performance and ability. Although their test scores put them in the 12th percentile, they estimated themselves to be in the 62nd. Several analyses linked this miscalibration to deficits in metacognitive skill, or the capacity to distinguish accuracy from error. Paradoxically, improving the skills of participants, and thus increasing their metacognitive competence, helped them recognize the limitations of their abilities.
Article
Full-text available
Individualized learning support is an essential part of formal educational learning processes. However, in typical large-scale educational settings, resource constraints result in limited interaction among students, teaching assistants, and lecturers. Due to this, learning success in those settings may suffer. Inspired by current technological advances, we transfer the concept of chatbots to formal educational settings to support not only a single task but a full lecture period. Grounded on an expert workshop and prior research, we design a natural language-based digital tutor acting as an intermediary among students, teaching assistants, and lecturers. The aim of the digital tutor is to support learners automated during the lecture period in natural language-based chat conversations. We implement a digital tutor in an iterative design process and evaluate it extensively in a large-scale field setting. The results demonstrate the applicability and beneficial support of introducing digital tutors as intermediaries in formal education. Our study proposes the concept of using digital tutors as intermediaries and documents the development and underlying principles.
Article
In the current context of increasing digitization, professionals need to be digitally competent. In addition, women’s low participation in the technology field indicates the persistence of a digital gender gap in the economic and social spheres. A key lever for reducing digital inequality is the role that digital competence (DC) plays in the professional development of women, allowing them to enter a job market still dominated by men. The current systematic literature review, following the PRISMA protocol, analyzes existing definitions of professional DC, the frameworks used to develop it in the workplace, and the gender differences observed. Four main ideas emerge from the review of the 41 selected articles: (1) the need for an enabling definition of professional DC that helps explain how it operates specifically in professional environments; (2) the expanding role of the DigComp framework in initiatives for assessing, training, developing, advising on, or certifying digital competence in professional environments; (3) the identification of seven key dimensions of professional DC; and (4) the need for future studies that go further in measuring women’s professional DC, in response to the lack of data on gender differences in this field. Despite the limitations of a systematic literature review, such as publication and database bias, these results aim to foster a shared definition and framework of professional DC that standardizes the measurement and development of this competence, allowing workers, and women in particular, to adapt to the digital transformation and ensuring equal access to qualified jobs.
Article
This study applied Self-Determination Theory to understand 25 undergraduate students' motivation to learn the English language via chatbot. The interview data were categorized according to the three psychological needs of learners (autonomy, competence, and relatedness) and analyzed following the thematic analysis approach suggested by Braun and Clarke. The findings revealed that chatbots support competence, autonomy, and relatedness. However, the findings also revealed that chatbots lack an emotional environment and sometimes provide inaccurate information about English language learning. To address these problems, students suggested that chatbots be used solely for assessment during teaching. They also recommended a blended learning approach or traditional classroom teaching to clear up their doubts after using chatbots. Overall, this study adds to the body of knowledge on chatbots and English language learning by highlighting their potential as useful teaching aids and providing guidance for researchers, educators, and developers on how to further improve chatbot-based language learning.
Article
Conspiracy theory beliefs are notoriously persistent. Influential hypotheses propose that they fulfill important psychological needs, thus resisting counterevidence. Yet previous failures in correcting conspiracy beliefs may be due to counterevidence being insufficiently compelling and tailored. To evaluate this possibility, we leveraged developments in generative artificial intelligence and engaged 2190 conspiracy believers in personalized evidence-based dialogues with GPT-4 Turbo. The intervention reduced conspiracy belief by ~20%. The effect remained 2 months later, generalized across a wide range of conspiracy theories, and occurred even among participants with deeply entrenched beliefs. Although the dialogues focused on a single conspiracy, they nonetheless diminished belief in unrelated conspiracies and shifted conspiracy-related behavioral intentions. These findings suggest that many conspiracy theory believers can revise their views if presented with sufficiently compelling evidence.
Article
Chatbots represent a promising technology for engaging students in math learning. Guided by Jerome Bruner’s constructivism and Lev Vygotsky’s Zone of Proximal Development, we designed and developed a chatbot that incorporates scaffolding strategies and social-emotional considerations, and we integrated it into ASSISTments, an online math learning platform. We conducted an experimental study to examine the influence of learning math with the chatbot compared to traditional learning with hints. This study involved 85 middle and high school students from three diverse school settings in the United States. The results revealed no significant differences between the chatbot and traditional hints conditions in students' math learning performance, perceived helpfulness, or interest. However, students in the chatbot condition displayed significantly lower confidence in solving a similar problem after the intervention, likely due to the removal of the high level of support provided by the chatbot. Despite this, students’ open responses indicated that a significantly higher number of them held positive attitudes towards chatbots. They appreciated the chat feature, the breaking down of problems into steps, and the real-time support. The study concludes with a discussion of the findings and implications for chatbot designers and developers and presents avenues for future research and practice in chatbot-assisted learning. In support of Open Science, this study has been preregistered, and both the data and the analysis code used in this study are publicly available at https://osf.io/am3p8/.