Conversate: Supporting Reflective Learning in Interview
Practice Through Interactive Simulation and Dialogic
Feedback
TAUFIQ DARYANTO, Computer Science, Virginia Tech, United States
XIAOHAN DING, Computer Science, Virginia Tech, United States
LANCE T WILHELM, Computer Science, Virginia Tech, United States
SOPHIA STIL, Computer Science, Virginia Tech, United States
KIRK MCINNIS KNUTSEN, Computer Science, Virginia Tech, United States
EUGENIA H RHO, Computer Science, Virginia Tech, United States
Fig. 1. The overview of Conversate supporting reflective learning in interview practice. Step 1: Interview
Simulation. The AI agent conducts an interactive interview simulation, asking questions and dynamically
generating contextually relevant follow-up questions based on the user’s responses using an LLM. Step 2:
AI-Assisted Annotation. After the simulation, the system analyzes the user’s responses and suggests
areas for improvement by highlighting the corresponding timestamps on the playback bar (B). Users can
then select specific transcript sections (C, D), either based on their own assessment or the LLM-highlighted
areas, and provide self-reflections (E). Step 3: Dialogic Feedback. The user can then engage in dialogic
feedback (F) focused on the annotated moments, learning from and iteratively refining their answers (G)
based on the feedback from the AI agent.
Job interviews play a critical role in shaping one’s career, yet practicing interview skills can be challenging,
especially without access to human coaches or peers for feedback. Recent advancements in large language
models (LLMs) present an opportunity to enhance the interview practice experience. Yet, little research has
explored the effectiveness and user perceptions of such systems or the benefits and challenges of using LLMs
for interview practice. Furthermore, while prior work and recent commercial tools have demonstrated the
potential of AI to assist with interview practice, they often deliver one-way feedback, where users only
receive information about their performance. By contrast, dialogic feedback, a concept developed in learning
sciences, is a two-way interaction feedback process that allows users to further engage with and learn from the
provided feedback through interactive dialogue. This paper introduces Conversate, a web-based application
that supports reflective learning in job interview practice by leveraging large language models (LLMs) for
interactive interview simulations and dialogic feedback. To start the interview session, the user provides the
title of a job position (e.g., entry-level software engineer) in the system. Then, our system will initialize the
LLM agent to start the interview simulation by asking the user an opening interview question and following
up with questions carefully adapted to subsequent user responses. After the interview session, our back-end
LLM framework will then analyze the user’s responses and highlight areas for improvement. Users can then
annotate the transcript by selecting specific sections and writing self-reflections. Finally, the user can interact
with the system for dialogic feedback, conversing with the LLM agent to learn from and iteratively refine
their answers based on the agent’s guidance. To evaluate Conversate, we conducted a user study with
19 participants to understand their perceptions of using LLM-supported interview simulation and dialogic
feedback. Our findings show that participants valued the adaptive follow-up questions from LLMs, as they
enhanced the realism of interview simulations and encouraged deeper thinking. Participants also appreciated
the AI-assisted annotation, as it reduced their cognitive burden and mitigated excessive self-criticism in their
own evaluation of their interview performance. Moreover, participants found the LLM-supported dialogic
feedback to be beneficial, as it promoted personalized and continuous learning, reduced feelings of judgment,
and allowed them to express disagreement.
CCS Concepts: • Human-centered computing → Human computer interaction (HCI); Empirical studies in HCI.
Additional Key Words and Phrases: Job interviews, interview practice, large language models, conversational
AI, interactive simulations, dialogic feedback, reflective learning, user study
ACM Reference Format:
Tauq Daryanto, Xiaohan Ding, Lance T Wilhelm, Sophia Stil, Kirk McInnis Knutsen, and Eugenia H Rho.
2024. Conversate: Supporting Reective Learning in Interview Practice Through Interactive Simulation and
Dialogic Feedback. In .ACM, New York, NY, USA, 32 pages. https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION
Job interviews are a critical first step in shaping an individual’s career trajectory. Research shows
that interview performance is a significant determinant of employment offers and subsequent
work success, which plays a crucial role in hiring decisions [50, 79, 96]. Performing well in job
interviews, however, is not without its challenges for many people. Many individuals struggle with
low condence [
39
], anxiety [
43
,
78
], and imposter syndrome [
97
] despite having the required
qualications for a job position. Such obstacles often diminish the candidate’s ability to convey
their abilities for the position, leading to poor interview performances [44].
Research has shown that mock interviews can help alleviate some of these barriers by providing
an opportunity for individuals to practice answering interview questions and receive feedback with
the help of a practice interviewer [58]. The process typically involves simulating real interview
scenarios, where the practice interviewer acts as the employer, asking interview questions and
providing critique based on the responses [52]. This helps most interviewees to learn and assess
their strengths and areas for improvement in a lower-pressure environment, allowing them to
improve their performance prior to an actual interview [58]. In fact, engaging in mock interviews
is generally associated with higher confidence and a greater likelihood of success in real interviews
[76, 77] across most industry sectors.
Yet despite such benefits, mock interview settings, which typically require another person
as a practice interviewer, can have limitations. For instance, not all individuals have access to
career counselors or expert interviewers [108], especially in underserved communities and rural
areas where professional staff may be limited [80]. Furthermore, mock interviews can take up a
substantive amount of time [75]. Logistical issues, such as scheduling conflicts, also restrict access
to qualified mock interviewers [55]. In addition, hiring mock interviewers can be very expensive.
Platforms like Blind [17] or Interviewing.Io [60] allow individuals to connect with
current employees at top technology companies like Amazon, Google, and Facebook with whom
they can engage in mock interview sessions. These sessions, however, can cost anywhere from
$100 per hour to $1800 per package, which consists of several practice interview sessions over time.
A more financially viable option would be to practice with friends or peers. However, research
shows that practicing with peers can be associated with discomfort due to personality differences
[26].
To address some of these challenges, both industry and academia have begun to explore the
potential of leveraging artificial intelligence (AI), particularly large language models (LLMs), to
facilitate mock interviews [7, 35, 55, 59, 69, 84, 100, 118]. For instance, tech companies have
developed interview practice systems, such as Meta.AI Mock Interview [2], FinalRoundAI [1],
and Yoodli [3], which are supported by LLMs that facilitate interview simulations and provide
feedback.
Such LLM-mediated interview practice systems offer great potential to overcome some
of the traditional barriers in mock interviews with human partners, and are gradually amassing
public excitement and anticipation [13]. However, our knowledge of how users interact with or
perceive such systems remains limited. Furthermore, a key limitation of commercially available
LLM-mediated mock interview systems is that their feedback is unidirectional: the system
outputs feedback to the user based on the user’s performance without further opportunity for
in-depth engagement as part of the user’s learning process. For instance, Yoodli.AI [3] gives users a
general assessment of their performance with suggested improvement strategies based on their
responses to the questions. Although these insights are helpful in supporting lower levels of
reflection in learning [47], researchers argue that feedback should actively involve the user in the
learning process rather than just delivering assessments [19].
To make learning through feedback more interactive and meaningful to the user, some studies
have developed and investigated other feedback approaches, such as the dialogic feedback method
[27, 28, 86, 114, 116]. Dialogic feedback refers to learning about and from feedback through dialogue
by actively involving learners in the sense-making and interpreting of the provided feedback [114].
Dialogic feedback incorporates two-way interaction in the feedback process [28] where the learner
is not a mere recipient of feedback but an active participant who intentionally seeks feedback for
learning by engaging in dialogue with the feedback provider [19]. Studies have shown that this
interactive approach allows users to engage more deeply with the provided feedback, leading to
more efficient up-skilling [40] and better learning experiences and outcomes [53].
Some studies in Human-Computer Interaction (HCI) have demonstrated the benefits of using
interactive discourse mediated by LLM agents to improve users’ learning experiences [34, 107].
Such studies are in line with the idea of dialogic feedback, which highlights the importance of
dialogue as a means of encouraging users to actively engage with the feedback process. Currently,
most commercial mock interview systems do not adopt such approaches. While the nature of
open-domain conversation capabilities in LLMs theoretically allows users to inquire further about
their performance, the provided feedback is often prescriptive [3], limiting the extent to which
users can take a proactive learning approach [19] and engage in agentic sense-making [115] from
the feedback process.
To address this gap, we introduce Conversate, an LLM-supported web application that facilitates
job interview practice with dialogic feedback. Conversate incorporates three main steps: 1)
Interview Simulation. An AI agent conducts an interactive interview simulation, asking initial
questions (e.g., “Tell me about yourself?”) and dynamically generating contextually relevant follow-
up questions based on the user’s responses using an LLM. 2) AI-Assisted Annotation. After
the mock interview session, the system analyzes the user’s audio-recorded responses and suggests
areas for improvement by highlighting the corresponding timestamps on the playback
bar. Users can also identify potential improvement areas by selecting timestamps corresponding
to responses they felt were weak. As part of their annotations on the selected timestamps, users
can write self-assessment notes, or engage with the LLM agent by asking questions or requesting
targeted feedback on the selected areas. 3) Dialogic Feedback. Users can then engage in interactive
discussions around the feedback provided by the LLM agent based on their annotations. For
instance, users can ask the LLM agent to provide further feedback, clarify its feedback, or request
examples that demonstrate the feedback by conversing with the LLM agent. After this interactive
sense-making discussion, the user can then immediately apply their refined understanding of the
feedback by revising their responses, allowing them to not only put their learning into practice
right away, but also to iteratively improve their skills through repeated practice.
To evaluate Conversate, we conducted a user study with 19 participants to learn about their
perception and use of our system. Our findings show that one major benefit of practicing with an AI
as opposed to a human partner was the affordance to make mistakes in a low-stakes environment
for repeated practice. Participants also found the AI-assisted annotation particularly helpful in
reducing their cognitive burden by prioritizing potential areas that needed improvement. Moreover,
dialogic feedback allowed participants to engage in continuous back-and-forth interaction with the
AI, enabling them to engage with the feedback actively, seek to understand areas for improvement,
and promote continuous learning. Additionally, participants valued the ability to revise their
answers while receiving dialogic feedback through the system’s revision feature, as it enabled
them to apply the feedback immediately. Furthermore, our study also discusses the necessity to
balance low-stakes AI practice with high-pressure realism. Moreover, while LLM-based dialogic
feedback has its potential, several challenges exist; for example, when users express disagreement
during the dialogic feedback, the LLM readily concedes to the user’s disagreement, which is related
to the sycophantic behavior of LLMs (i.e., LLMs tend to respond to users with answers that align
with the user’s preferences [94]).
The key contributions of this work are as follows:
• Conversate, an interactive web-based application that leverages LLMs to facilitate reflective
learning through interactive interview simulation, AI-assisted annotation, and dialogic
feedback for job interview practice.
• Insights into users’ perceptions of using LLM-powered conversational AI for interview
practice.
• Discussions on the benefits of dialogic feedback, comparison of LLM-based dialogic feedback
with dialogic feedback from humans, challenges of using LLMs for dialogic feedback, and
implications of dialogic feedback.
2 BACKGROUND
2.1 Technology-Assisted Interview Practice
Practicing for job interviews is important for job seekers to improve their performance and reduce
anxiety during actual interviews [124]. One practical approach to enhancing interview readiness
is interview simulations [102]. Early virtual agents for interview simulation, such as MACH [55]
and TARDIS [7], focused on asking common questions and recognizing user responses through
nonverbal cues. However, they lacked the ability to actively listen and respond appropriately to
the conversation context, which is essential for effective interview practice [106, 122]. To enhance
interactivity, recent studies have enabled agents to ask follow-up questions by leveraging machine
learning models. These models classify responses and generate relevant questions based on the
user’s preceding response [59, 99], consider dialog history, resumes, and job descriptions [69], and
incorporate external knowledge from the web [98]. Notably, studies demonstrate that practicing
with such virtual agents can be as effective as practicing with human interviewers [38, 73, 74],
suggesting their potential as valuable interview preparation tools.
Researchers have also studied various mechanisms for providing feedback following practice
sessions. For example, Chou et al. [35] evaluated the interviewees’ verbal responses from the
context of soft skills, while Naim et al. [84] evaluated their responses on fluency and structure.
Advancements in multimodal deep learning have also allowed researchers to develop systems that
can evaluate users’ non-verbal communication skills during interview sessions [118]. Similarly,
recent advancements in LLMs and conversational AI like ChatGPT [88] enable more human-like
dialogue capabilities, presenting an opportunity to enhance the interview practice experience by
facilitating natural conversations akin to actual interviews [62]. Commercial tools like Meta.AI Mock
Interview [2], FinalRoundAI [1], and Yoodli [3] have leveraged LLMs to simulate interactive mock
interviews. While promising, how users perceive such LLM-powered AI systems for interactive
interview practice is unclear. Building upon prior studies and recent technological developments in
mock interview systems, we aim to examine how users perceive and interact with LLM-supported
systems for mock interview practice and feedback.
2.2 Facilitating Reflective Learning Through Technological Design
The concept of reflective learning refers to critically examining self-experiences to gain a deeper
understanding and learn from them [20]. By doing so, reflective learning helps users to identify
areas for improvement and develop their skills. In the context of interview practice, reflection
is important for individuals to think about their experience and how they respond to interview
questions [67]. As such, studies have demonstrated how incorporating reflective learning in mock
interviews aids not only people’s learning experience and process but also their interview skills
and performance [67].
Prior HCI researchers have incorporated various technological designs in training systems to
support reflective learning [9, 23, 33, 128]. Some of these features include facilitating role-play
[33], reviewing recordings [9, 128], annotation [9, 33], and feedback [33]. For example, Pin-MI
[33] facilitates reflective learning in mental health counseling through role-playing with real-time
annotation and peer feedback. Additionally, Zhou et al. [128] used virtual reality (VR) to enable
individuals to review their public speaking recordings, facilitating reflective learning.
Furthermore, designing effective reflective learning systems requires three key components
proposed by Slovak et al. [111]: explicit, social, and personal. The explicit component refers to
structuring experiences for learners to simulate their practice, such as through role-playing scenarios
or simulations. The social component recognizes the value of a supportive learning environment,
for instance, through supportive mentors who can provide feedback. The personal component
relies on learners’ motivation to actively engage with and learn from their experiences [111].
When simulating experiences in reflective learning, Slovak et al. [111] emphasize that the
experience should be realistic enough [110] while ensuring that learners are not overwhelmed by
emotions. Hence, to support the explicit component of reflective learning, Conversate aims to
make interview simulations more realistic by generating contextually relevant follow-up questions.
For the personal component, users can annotate and self-reflect on their transcripts. Furthermore,
to incorporate the social component, Conversate engages users in a dialogic feedback process
with the LLM agent.
2.2.1 Annotation to Support Reflective Learning. Annotation has been widely used to facilitate
reflective learning [9, 33, 37, 81] as it helps people to review and analyze their own recordings [103].
In the context of interview practice, annotation can be beneficial as it helps individuals recognize
important moments and identify their strengths and weaknesses [33]. However, annotations during
reflection can be overwhelming, especially when the practice session is lengthy [24]. To address
this challenge, several studies have explored leveraging machine learning to identify significant
moments [9, 10]. For instance, INWARD [9] uses machine learning to detect important visual
cues during video-recorded executive coaching sessions, assisting users in annotating relevant
segments for reflective learning. Inspired by these studies, Conversate incorporates an AI-assisted
annotation feature. It uses AI to highlight user responses that require improvement, facilitating the
annotation process during reflective learning.
2.2.2 Feedback to Support Reflective Learning. Feedback is essential in supporting reflective learning
in interview practice, as it provides external evaluation and identifies areas for improvement
[8, 124]. Several studies in reflective learning have explored the use of feedback to enhance the
learning experience, such as incorporating peer feedback [33, 126] or AI-generated feedback [57].
While peer feedback can be effective by offering new perspectives and facilitating discussions [33],
it involves the participation of other individuals, which may not be accessible or convenient for
individual interview practice [55]. Additionally, prior work that provides AI feedback for reflective
learning [57] typically offers one-way feedback, limiting the ability for users to engage in dialogue
and seek clarification or additional guidance. To address these limitations, Conversate aims to
implement dialogic feedback to support reflective learning in interview practice.
2.3 Dialogic Feedback
Feedback is important to enhance interview performance as it identifies areas of improvement [54].
However, feedback involves more than simply providing information to learners about their practice
[19]. Effective feedback necessitates engaging users in a way that results in noticeable changes [19].
Prior work by Boud et al. [19] emphasizes that for feedback to be effective, the feedback loop must
be closed, meaning that feedback is not only provided but also received, understood, and acted upon
by the learners. Such studies emphasize that meaningful engagement with feedback for learners
is a process rather than an outcome, where learners are not merely recipients of information but
active agents who seek information for learning. One way to achieve this is through a
two-way interaction between the learner and the feedback provider, where the learner plays a
more active role in seeking and using the feedback [19]. This two-way interaction in the feedback
process can be termed “Dialogic Feedback.”
Several studies have focused on conceptualizing and exploring dialogic feedback [27, 28, 86, 114, 116].
Based on Steen et al. [114], dialogic feedback refers to learning about and from feedback
through dialogue, involving learners in actively interpreting and making meaning from the feedback.
It is a process where feedback occurs through a conversational exchange, enabling learners to seek
clarification, ask questions, and engage with the feedback to construct a deeper understanding
[114].
To conceptualize dialogic feedback, Steen et al. [114] suggested four main dimensions that are
important to support learners: (1) emotional and relational support, where dialogic feedback is
viewed as a relational and emotional process, thus facilitating a supportive learning environment
using empathetic language is important; (2) maintenance of the dialogue, ensuring the continuation
of the feedback dialogue to enrich the interactive exchange; (3) opportunities for users to express
themselves, allowing learners to articulate their thoughts and reflect on their understandings within
the dialogue; and (4) contribution to individual growth, creating opportunities where learners
receive feedback as they express themselves, leading to cognitive development.
Similar to the concept of dialogic feedback, some related works in HCI align with the principles
of interactive dialogue and feedback exchange aimed at enhancing users’ learning experiences.
For instance, a study by Chopra et al. [34] developed an LLM-based conversational AI to assist in
software debugging and demonstrated that multi-turn conversational interaction provides users
with better explanations than one-way interaction. Another study by Ross et al. [107] showed
that users valued an LLM-based conversational programming assistant as it helped them ask
follow-up questions and better understand code. While these prior works have built conversational
AI to facilitate learning, they are not grounded in the literature on dialogic feedback theories
[27, 28, 86, 114, 116], meaning that they may overlook key elements that make dialogic feedback
more effective for learners (e.g., the four dimensions mentioned above based on Steen et al. [114]).
Hence, most of these studies do not empirically assess the benefits and challenges of
incorporating LLM-based dialogic feedback in learning systems. By contrast, the novelty of our
work lies in integrating dialogic feedback in an LLM-based learning system by incorporating key
components, grounding it firmly in the literature, and empirically examining both the benefits and
challenges of leveraging LLMs to facilitate dialogic feedback in a learning context.
Inspired by prior studies demonstrating the benefits of AI-powered conversations in learning
contexts, Conversate incorporates a dialogic feedback approach to facilitate an interactive dialogue
between the AI agent and the user. By doing so, we aim to improve the user’s meaning-making
process by enabling them to engage more deeply, and discursively, with the provided feedback. Our
design implementation is also grounded in the four main dimensions of dialogic feedback identified
by Steen et al. [114]. With this design approach, our work examines how users interact with and
perceive LLM-facilitated dialogic feedback in our mock interview system.
3 CONVERSATE DESCRIPTION
There are three main components of Conversate: 1) Interview Simulation, 2) AI-Assisted Annotation,
and 3) Dialogic Feedback. These three components align with the key components in the
reflective learning system proposed by Slovak et al. [111]: explicit, personal, and social (§2.2).
Conversate supports the explicit component of reflective learning through the interactive Interview
Simulation. For the personal component, users can annotate and self-reflect on their transcripts
using the AI-assisted annotation feature. Furthermore, to incorporate the social component,
Conversate engages users in a Dialogic Feedback process with the LLM agent. The design decision
for each component was informed by existing literature, as explained in the related work section
and described further in each subsection.
3.1 User Flow Overview
When using Conversate, the user flow begins with the user entering the job title they want to
practice an interview for. They then engage in an Interview Simulation session with an AI agent.
The AI agent asks pre-defined interview questions one by one. After each question, the user records
their verbal response, and the AI agent dynamically generates and asks a contextually relevant
follow-up question based on the user’s response. This back-and-forth interaction continues until
the user has answered all the questions.
Upon completing the interview simulation session, the user proceeds to the AI-Assisted Annotation
step. In this step, the system analyzes the user’s recorded responses and highlights
areas for improvement. The user can then view the transcript, select specific sections based on the
highlighted areas or their own preferences, and provide self-reflections on their performance.
Next, the user engages in Dialogic Feedback focused on the annotated moments. The user
interacts with the dialogic feedback to help them learn from the feedback and iteratively refine
their answers. Once this process is finished, the user can save this session and get feedback on
other moments as desired.
Fig. 2. Step 1: Interview Simulation. The AI agent conducts an interactive interview simulation, asking
initial questions (e.g., “Tell me about yourself?”) and dynamically generating contextually relevant follow-up
questions based on the user’s responses using an LLM. Note: The pixelated area shows the participant who
appeared on camera. It has been pixelated to comply with anonymization rules.
3.2 Interview Simulation
The rst step provides an interactive simulation for users to practice answering interview questions
with the help of an AI agent. The AI agent verbally asks interview questions, and the user records
their answers. Leveraging GPT-3.5-Turbo from OpenAI, combined with their text-to-speech and
speech-to-text models, the AI agent can dynamically generate and ask contextually relevant follow-
up questions based on the user’s responses. This aims to enhance the interactivity and realism of
the simulated interview experience.
For the interview simulation, we curated a primary set of common interview questions drawing
from prior research [
6
] and consultations with two career counselors and a career center director
at an R1 university. In the simulation, we prompted GPT-3.5-Turbo to act as the interviewer agent,
asking questions one by one to the user. In response to each question, the user records their response.
Subsequently, GPT-3.5-Turbo generates a contextually relevant follow-up question that is adapted
based on the user’s previous responses and conversation history. This approach aims to make the
interview simulation more realistic and aligned with real-world interview settings, as follow-up
questions from the interviewer in actual interviews are commonly based on interviewee responses
[
68
]. During the user study, we restricted the number of follow-up questions for each main question
to one in order to maintain a manageable duration for the simulated interview session.
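To make this mechanism concrete, the following is a minimal sketch in Python of how such an interviewer agent could be driven through the OpenAI chat API. The system prompt wording, function name, and temperature are illustrative assumptions; the exact prompts used by Conversate are given in its appendix.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Illustrative system prompt; Conversate's actual interviewer prompt is in its appendix.
    INTERVIEWER_PROMPT = (
        "You are a professional interviewer for the position of {job_title}. "
        "Based on the candidate's latest answer and the conversation so far, ask exactly "
        "one concise, contextually relevant follow-up question. Do not give feedback."
    )

    def generate_follow_up(job_title: str, history: list[dict], latest_answer: str) -> str:
        """Generate one follow-up question adapted to the user's previous responses."""
        messages = [{"role": "system", "content": INTERVIEWER_PROMPT.format(job_title=job_title)}]
        messages += history                                           # earlier question/answer turns
        messages.append({"role": "user", "content": latest_answer})   # transcribed latest answer
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0.7,
        )
        return response.choices[0].message.content.strip()

Under this sketch, the application would invoke such a call once per main question, matching the one-follow-up-per-question restriction used in the study.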
3.3 AI-Assisted Annotation
Fig. 3. Step 2: AI-Assisted Annotation. After the simulation, the system analyzes the user’s responses and
highlights areas for improvement (A). Users can then select specific transcript sections (B, C), either based on
their own preferences or the highlighted areas, and provide self-reflections (D).
Transcript-Navigation & Playback Bar: After the interview session, the system shows a
timestamped transcription of the interview, as shown in the left-hand panel of Fig 3. Additionally,
we implement a transcript-based navigation feature to support user interaction, allowing users to
navigate through the audio recording by clicking on sentences in the generated transcript. When a
user clicks on a sentence, the audio playback bar on the top moves its cursor to the corresponding
timestamp. This enables users to access specific timestamps by reading the transcript and clicking
on sentences to hear the corresponding audio recordings. This navigation approach is similar to a
feature found on platforms like edX [41] and TED talk platforms [117], which allow users to click
on words in the transcript to jump to the specific timestamp [93].
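One plausible way to support this kind of click-to-seek navigation is to keep the segment-level timestamps that the speech-to-text model returns alongside each transcript sentence. The sketch below assumes the openai Python client and Whisper’s verbose_json response format; the helper itself is illustrative rather than Conversate’s actual code.

    from openai import OpenAI

    client = OpenAI()

    def transcribe_with_timestamps(audio_path: str) -> list[dict]:
        """Transcribe a recorded answer and keep per-segment timestamps for navigation."""
        with open(audio_path, "rb") as audio_file:
            result = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="verbose_json",  # returns segments with start/end times
            )
        # Each entry ties a transcript segment to its position in the recording, so clicking
        # the sentence in the UI can move the playback cursor to `start`.
        return [
            {"text": segment.text.strip(), "start": segment.start, "end": segment.end}
            for segment in result.segments
        ]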
3.3.1 AI-Highlighted Hints. After the interview simulation, the system analyzes the user’s recorded
responses using GPT-3.5-Turbo and highlights areas as hints to potential improvements in the user’s
mock interview performance. Specifically, we prompt-engineered the LLM to evaluate each of the
user’s answers based on these criteria: answer detail, structure, and adherence to good communication
principles, including effectiveness, appropriateness, efficiency, clarity, and competence in
answering the interview question correctly [112]. Answers that lack sufficient detail, have a weak
structure, or fail to follow these communication principles are classified by the LLM as needing
improvement. These evaluation criteria were developed based on prior research [35, 84] and in
consultation with the director of the career center and two career counselors at the university. We
did not do any additional fine-tuning for our model. We chose simple prompt engineering for our
approach since the focus of this paper is more on the early-stage exploration of user perceptions
towards LLMs for interview practice (the detailed prompt is written in Appendix B.2). As such, we
are aware of some limitations related to simple prompt engineering and have mentioned them in
the limitation section.
After the LLM evaluates the user’s answer, the system then highlights the corresponding sections
in the transcript, as well as the relevant time ranges on the audio recording’s playback bar in orange,
as shown in Fig. 3A. This highlighting helps users during the annotation process by visually
indicating portions of the transcript and audio recording where the AI suggests the user’s answers
could be refined or expanded upon. By doing so, this feature aims to help users identify and revisit
important moments quickly, reducing the cognitive overload of reviewing the entire transcript,
which can be time-consuming and tedious [9, 24].
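As a rough illustration of this screening step, the sketch below sends one prompt-engineered evaluation call per answer and flags those the model classifies as needing improvement. The prompt wording and JSON schema here are assumptions for illustration; the actual prompt appears in Appendix B.2.

    import json
    from openai import OpenAI

    client = OpenAI()

    # Illustrative evaluation prompt; Conversate's actual prompt is given in Appendix B.2.
    EVALUATION_PROMPT = (
        "You are an interview coach. Judge the candidate's answer on detail, structure, and "
        "good communication (effectiveness, appropriateness, efficiency, clarity, competence). "
        'Reply with JSON only: {"needs_improvement": true or false, "reason": "<one sentence>"}'
    )

    def evaluate_answer(question: str, answer: str) -> dict:
        """Classify a single answer as needing improvement (to be highlighted) or not."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=[
                {"role": "system", "content": EVALUATION_PROMPT},
                {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
            ],
        )
        return json.loads(response.choices[0].message.content)

Answers flagged this way would then be mapped back to their transcript segments and playback-bar time ranges for the orange highlights described above.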
3.3.2 Annotation. During this annotation process (Fig 3), users can select specific time ranges
from the recording that they want to reflect on, either based on the highlighted moments or their
own preferences. After selecting the relevant time ranges, users can provide comments as part of a
self-assessment, reflecting on their performance during those specified moments. After that, they
will engage in dialogic feedback focused on the specified moment for further reflection.
This time-range selection also aims to make the subsequent feedback more focused on the specific
moments that users wish to analyze. Additionally, the time-range selection aims to shorten
the input context for the LLM, as LLMs often struggle to comprehend and analyze lengthy input
contexts, especially when relevant information is in the middle [72]. Directly inputting the entire
practice interview transcript and asking the LLM to provide feedback can be problematic due to
the potential length of the sessions. By enabling users to select specific time ranges, we aim to
improve the quality and relevance of the feedback and the overall user experience by allowing users
to focus on specific moments in the interview session for targeted feedback.
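A small sketch of how a user-selected time range could be reduced to a compact feedback context, assuming the timestamped segments produced at transcription time (the helper name is illustrative):

    def transcript_for_range(segments: list[dict], start_s: float, end_s: float) -> str:
        """Keep only the transcript segments that overlap the user-selected time range,
        so the LLM receives a short, focused context for the annotated moment."""
        selected = [
            seg["text"]
            for seg in segments
            if seg["end"] >= start_s and seg["start"] <= end_s
        ]
        return " ".join(selected)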
3.4 Dialogic Feedback
After selecting specific time ranges, users engage in dialogic feedback (Fig 4) with the AI agent
powered by GPT-4. This interaction begins with users asking questions (e.g., “How can I improve my
answer?”). The AI agent provides feedback by considering the user’s transcript from the specified
time ranges selected by the user or the areas highlighted by the LLM agent. This dialogic feedback
takes the form of a chat-based interaction, allowing users to ask questions and engage in discussion.
A complete example of a dialogic feedback interaction, based on a sample annotated transcript, can
be found in Appendix C.
This dialogic feedback feature is designed to provide a more interactive way for users to participate
in their feedback-learning process, thereby improving their learning experience [19, 28]. This
approach positions the user as an active learner who agentically seeks and learns from feedback
through question-answering [19]. When designing this feature, we consider four main dimensions
of dialogic feedback (§2.3) to enhance learning [114]:
(1) Emotional and relational support: The LLM is prompted to acknowledge the user’s answer
in a supportive manner by offering constructive feedback to enhance their responses. This
approach aims to boost learners’ self-esteem [87] and to avoid negative self-perceptions [15].
(2) Maintenance of the dialogue: Upon receiving feedback, users can ask additional questions to
deepen their understanding of the LLM agent’s feedback. The LLM agent will then further
engage by responding to the user, thereby maintaining a continuous dialogue with the user.
(3) Opportunities for users to express themselves: During the dialogic feedback process, we
added a revision feature for users to revise their answers by clicking the record button
and revising their responses post-feedback. This provides users a chance to immediately
apply their understanding of the feedback into action [19].
(4) Contribution to individual growth: After users revise their answers, the LLM will provide
feedback again, and users can continue revising repeatedly. This iterative process offers users
multiple opportunities to improve their answers, as repeated practice has been significantly
associated with improved learning uptake and outcomes [25, 61].
Additionally, we designed our back-end prompting framework such that the LLM agent provides
feedback grounded in communication theories [112] and the STAR (Situation, Task, Action, Result)
method [21]. The STAR method is one of the most widely used frameworks for answering job
interview questions [6], where candidates break down their responses into distinct components:
the situation they faced, the task at hand, the specific actions they took, and the resulting outcomes.
By offering a step-by-step breakdown with examples, we align with the effective feedback principle
of clarifying what constitutes good performance [87]. This approach helps learners understand the
criteria for a well-structured, comprehensive answer.
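Below is a minimal sketch of how such a multi-turn feedback exchange could be structured: a GPT-4 agent whose system prompt encodes the supportive tone, the selected answer, and the STAR framing, with the conversation history retained across turns. The prompt wording and the session class are illustrative assumptions rather than Conversate’s exact implementation.

    from openai import OpenAI

    client = OpenAI()

    # Illustrative system prompt grounded in a supportive tone and the STAR method.
    FEEDBACK_PROMPT = (
        "You are a supportive interview coach. The candidate's answer for the selected moment "
        "is given below. Acknowledge its strengths first, then give constructive, specific "
        "feedback structured around STAR (Situation, Task, Action, Result). Answer follow-up "
        "questions and evaluate any revised answers the candidate records.\n\n"
        "Selected answer: {selected_answer}"
    )

    class DialogicFeedbackSession:
        """Keeps the full conversation history so feedback remains a two-way dialogue."""

        def __init__(self, selected_answer: str) -> None:
            self.messages = [
                {"role": "system",
                 "content": FEEDBACK_PROMPT.format(selected_answer=selected_answer)}
            ]

        def ask(self, user_turn: str) -> str:
            """Send a user question, or a revised answer transcript, and return the agent's reply."""
            self.messages.append({"role": "user", "content": user_turn})
            response = client.chat.completions.create(model="gpt-4-0613", messages=self.messages)
            reply = response.choices[0].message.content
            self.messages.append({"role": "assistant", "content": reply})
            return reply

Because a user turn can be either a clarification question (“How can I improve this part?”) or the transcript of a newly recorded revision, the same loop supports both maintaining the dialogue and the revise-and-re-evaluate cycle described above.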
3.5 Implementation Notes
The Conversate web application is implemented using HTML, CSS, Bootstrap, and TypeScript,
with the VueJS framework [121] used for building UI components. The backend uses Python with
the Flask framework [46]. It incorporates OpenAI’s text-to-speech model “tts-1” [89] for converting
interview questions into voice and uses the “whisper-1” speech-to-text model [89] for transcribing
user voice recordings into text. For simulating interviews, OpenAI’s GPT-3.5-Turbo [90] is used
to generate follow-up questions based on user responses. Additionally, “GPT-4-0613” [90] is used
to provide dialogic feedback. User data is stored in MongoDB [83] for textual data and Firebase
cloud storage [51] for voice recordings. The production version of the web application is hosted on
Vercel [120].
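To illustrate the voice side of this stack, the sketch below synthesizes an interview question with the “tts-1” model (the transcription direction was sketched in §3.3). The voice choice and file handling are illustrative assumptions.

    from openai import OpenAI

    client = OpenAI()

    def speak_question(question_text: str, out_path: str = "question.mp3") -> str:
        """Convert an interview question to speech with OpenAI's "tts-1" model and save it."""
        response = client.audio.speech.create(
            model="tts-1",
            voice="alloy",        # illustrative voice choice
            input=question_text,
        )
        with open(out_path, "wb") as f:
            f.write(response.content)  # raw audio bytes returned by the API
        return out_path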
Fig. 4. Step 3: Dialogic Feedback. The user engages in dialogic feedback focused on annotated moments,
learning from and iteratively refining their understanding based on the AI agent’s guidance. (1) The user
initiates the interaction by asking a question (e.g., “How can I improve this part?”), followed by AI feedback.
(2) The user refines their answer based on the feedback by clicking the microphone button to record their
revised answer verbally. (3) The user receives further feedback or affirmation from the AI. This iterative
process of refining and receiving feedback can continue if the user’s answer is not satisfactory yet or if the
user wants to improve their answer further.
4 USER STUDY
4.1 Participants
Nineteen participants were recruited through various channels, including university announcements,
online advertisements (Craigslist), social media platforms (Reddit and Facebook groups), and
word of mouth. During the recruitment process, we initially distributed a screening questionnaire
containing inquiries about participants’ prior experience with interview practice and demographic
information. Subsequently, we selected individuals with prior experience with interview practice
while considering demographic diversity. At the end of the user study, participants were
compensated with a $20 Amazon e-gift card.
Our selected participants were diverse in terms of demographic factors such as age, gender, and
occupation. We recruited 19 participants, including 9 university students and 10 non-students (see
Table 1). All participants were over 18 years old and had experience preparing for job interviews.
We also included participants who had not attended any job interviews yet as long as they had
experience preparing for one.
4.2 Procedure
We conducted a semi-structured interview with the participants. First, we explained the Conversate
tool and showed the participants a demo of the tool. We then asked the participants to engage
with the tool by using the interview simulation, AI-assisted annotation, and dialogic feedback.
Participants interacted with the AI agent by answering four main interview questions, each followed
by a follow-up question, for approximately 10 minutes [33]. Subsequently, they used the AI-assisted
annotation and dialogic feedback features within a 15-minute timeframe while using our system.
We then followed up with questions to gather insights into their perception and experience with
our system. During this session, we also asked questions about their prior experiences in preparing
for and practicing job interviews and using AI assistance for interview practice.
Table 1. Participant Demographics
ID Gender Age Occupation Interview Count Practice Frequency
P1 Male 26 PhD student 1-5 Occasionally
P2 Other 25 Receptionist 1-5 Occasionally
P3 Male 30 Social worker 6-10 At least once a week
P4 Male 25 Clerk at a store 1-5 Everyday
P5 Male 22 Undergraduate student 1-5 At least once a week
P6 Female 28 Undergraduate student 0 At least once a week
P7 Male 25 Undergraduate student 1-5 Everyday
P8 Male 21 Undergraduate student 1-5 At least once a week
P9 Male 30 Master student >10 Occasionally
P10 Female 19 Undergraduate student 1-5 At least once a week
P11 Female 25 Unemployed >10 Everyday
P12 Female 23 Paralegal at law company 6-10 Everyday
P13 Male 20 Undergraduate student 1-5 Everyday
P14 Female 34 Unemployed >10 Occasionally
P15 Female 31 Unemployed 6-10 Occasionally
P16 Female 21 Undergraduate student 1-5 Occasionally
P17 Female 33 Operations specialist >10 Everyday
P18 Male 40 Government consultant >10 Everyday
P19 Male 23 Unemployed 1-5 Occasionally
4.3 Data Collection
Audio-Video Recordings & Researcher Notes: Participants were recorded when using the tool. The
researchers additionally made observational notes during the sessions.
User Input: While participants were using the tool, all their user inputs, such as the interaction
between participants and the AI during the practice, the moments they selected during the
annotation, and their interactions with the AI in dialogic feedback, were collected and stored in the
database.
Interview: After participants completed using the tool, the researchers conducted an audio-recorded
semi-structured interview with them. The interview included semi-structured questions
and additional questions pertaining to the observations made by the researchers during the
participants’ use of the tool. We also asked participants to recall notable moments when interacting
with our system. Through this approach, we were able to identify specific interactions that were
mentioned during the interview.
4.4 Data Analysis
The interview data was transcribed using Zoom, and then a researcher manually verified the
accuracy of the transcript against the original recording. We used a thematic analysis [22] approach
to analyze the interviews. The first author conducted an initial round of open coding [31] on the
interview transcripts. Subsequently, this researcher presented the open codes to the three other
researchers for discussion and finalization. Afterward, the researcher used the finalized list of codes
to code the interviews.
We also complemented our qualitative analysis by observing user interactions with the AI based
on the user interaction data logged on our server. Specifically, we noted how participants used the
AI feedback, including the questions they asked the AI, the AI’s responses, and the revisions made
to the users’ answers.
5 FINDINGS
5.1 Perceived Benefits of Interview Practice Simulation With AI
5.1.1 Fostering Improvement Through Repeated Practice in a Low-Stakes Environment. One major
benet of practicing with an AI as opposed to a human partner was the aordance to make mistakes
in a low-stakes environment for repeated practice. For example, according to P17, Conversate
allows you to “make your mistakes behind the scenes." Additionally, for many like P6, blundering in
front of a computer as opposed to a human counterpart was less nerve-wracking. Similarly, those
with high interview anxiety, such as P7, who worries about “losing my cool in front of an actual
person, perceived AI as a “non-judgmental machine.
As such, the absence of a critical human observer is associated with reduced fear of judgment
among participants (6 | P1-P2, P6-P8, P17). In turn, this allows participants to concentrate solely
on formulating their responses rather than managing interpersonal reactions or expectations (3 |
P2, P6, P7). P6 emphasized that practicing with AI helped her focus more on “how I thought about
those words” during interview practice:
“When I’m practicing with people, I’m just thinking about the words I said rather than how
I thought about saying those words. [But] this tool helps me think about how I thought
about those words in the first place” - P6
By not "feeling nervous," P6 reported that she could delve deeper into her thought process to come
up with better responses while interacting with the system. Similarly, according to P7, practicing
with something non-judgmental allows one to "prepare for something judgmental" (P7), namely
the real job interview itself.
Furthermore, the low-stakes nature of an AI-facilitated practice interview encouraged participants
to engage in repeated practice, easing their fears of judgment (3 | P2, P6, P7). P10 perceived such
opportunity for repeated practice on specic improvement areas as one of the primary advantages
of Conversate, in contrast to human-led sessions where time constraints and the one-take nature
of the practice sessions make revision interview responses in-situ nearly impossible.
5.1.2 Perception of AI as a More Professional Interviewer Than Peers. Even though participants felt
less pressure when practicing with AI, this does not negate their perception that AI can provide a
more formal and serious setting for interview practice:
"Sometimes I feel like with friends, it can be a little bit more casual, and this kind of [AI]
simulates a more serious environment. I actually prefer this a little bit more. [...] this
experience itself was, I think, a good practice for interviews." - P10
This aligns with how P5 feels when practicing with friends: he is “very comfortable with [them],”
and practicing with them therefore does not provide a “professional setting.” By contrast, some
participants (P5, P10) were impressed by how professional our LLM agent came across as an interview
facilitator. P5 attributed Conversate’s professional demeanor as a mock interviewer to the kinds of
questions it asked and how the questions were asked, noting how the system generated “very detailed
questions” similar to those asked in real-life interviews for the position P5 was currently seeking.
5.1.3 Contextually Relevant Follow-up Questions Encourage Deeper Thinking. A key feature of
Conversate that enhances the interview practice is the contextually relevant follow-up questions
that are generated from the LLM. Overall, participants mentioned that the follow-up questions
were helpful (19 | P1-P19) and made the practice more engaging (8 | P1-P2, P4-P7, P14-P16) and
felt more like a real interview (13 | P1-P2, P4-P11, P14-P15, P19). This engagement also helped
participants think more deeply about their responses (7 | P1, P4, P6, P9, P13, P15, P17). For instance,
P1 felt the follow-ups were able to “test different angles” and prompted him to elaborate more:
“I think the follow-up questions are pretty good because it’s keeping the conversation
engaging, and I think the system is asking really good [follow-up] questions based on
what I have said, testing different angles as well as picking context from my answer” - P1
Based on the sample illustration in Fig 5, we observed that the generated follow-up questions
from the LLM contained repeated keywords from the user’s response. Those repeated keywords that
resonated in the conversation made participants perceive that the follow-up questions were contex-
tually relevant. Additionally, by using follow-up questions, the AI interviewer helped participants
like P9 to further elaborate on their initial response. P9 felt the follow-up question "delves further
[into my answer] and creates an opportunity for me to explain [more]". Additionally, participants
found the follow-up questions were valuable, especially when they sometimes struggled to provide
good answers, particularly when feeling nervous:
"I know that sometimes, even now, I can get nervous and blank out. So the [follow-up]
questions help me think more" - P15
For some participants, the follow-up questions posed by our LLM agent nudged them to "think
on the spot":
Fig. 5. Sample Conversation During Interview Practice (P9): The text highlighted in orange indicates repeated
keywords that made participants feel the follow-up questions were contextually relevant. These follow-up
questions encouraged participants to delve deeper into their responses.
"The follow-up questions, in between the main questions, would take part of my answer
and then [build on it]. I think that’s great practice because the follow-up questions are
not usually the kind of things you can find online easily. It felt like it really helped me
practice thinking on the spot [about] my real experiences." - P6
Some participants expressed that they often felt unprepared when confronted with a question they
had not anticipated or practiced in actual interviews (3 | P5, P8, P19). However, for participants
like P6 and P5, the system’s follow-up questions encouraged them to think on the spot instead of
reciting "responses from your memory" (P5). Furthermore, P13 mentioned that sometimes he does
not receive follow-up questions when practicing with human partners.
5.2 Perceived Benefits of AI-Assisted Annotation
5.2.1 Reducing Cognitive Burden by Prioritizing Focus Areas. Participants found the AI-Highlighted
Hints feature particularly helpful in reducing their cognitive burden by prioritizing potential areas
that needed improvement (6 | P1, P6, P8, P11, P12, P16). According to P1, this feature was helpful
because reviewing his mock interview session recordings was often overwhelming:
"I think [this feature] is doing a pretty good job because when I clicked on the marked
orange bar, it took me to the part of the conversation where I didn’t answer very clearly. I
think the suggested part was helpful because sometimes it [can be] overwhelming to read
[through] a lot of transcript[s]." - P1
Similarly, P6 appreciated not having to "hear the entire recording" as she typically did in her
own mock interviews with human partners, finding her usual method “more tedious.” Overall,
participants agreed that the AI-suggested hints allowed them to allocate their learning efforts more
efficiently, sparing them from having to review from the very beginning.
Additionally, AI-Highlighted Hints helped participants to identify blindspots, as P17 explained:
"I don’t think I would think too heavily about those questions, like with the question ’tell
us about yourself.’ I gave a very short answer, and I don’t think I would have gone back to
that specific part unless the orange bar popped up to indicate it.” - P17
Previously, P17 did not consider giving a short answer to “Tell me about yourself” as problematic.
Hence, the visual cue prompted her to review that part of the conversation. Some participants
recalled reviewing and identifying weaknesses in mock interview practice as challenging, especially
when practicing alone or with peers. According to P8, people usually only remember "the beginning
and the end" of the practice session, leading them to miss other potential areas of improvement.
5.2.2 Mitigating Excessive Self-Criticism. Beyond the practical benefits, AI highlighting areas for
improvement can also mitigate excessive self-criticism during the reflection. Our study found that
some participants (3 | P12, P15, P17) were overly critical of themselves during the reflection and
thought their answers were mostly inadequate. P12 mentioned:
“I agree with the AI’s suggestions about the areas I would need to practice more. [...]
Honestly, I probably would have marked more [weak] areas for me. So I think [the AI] is
being generous.” - P12
Similarly, P17 felt the “whole bar should be orange,” indicating she thought her entire interview
was poor. These responses suggest the participants were overly harsh on themselves and unable to
evaluate their performance objectively. For some, seeing that the AI highlighted only a few areas
helped mitigate their tendencies to be overly self-critical (3 | P12, P15, P17).
Interestingly, participants who received numerous highlighted areas did not feel discouraged as
long as it helped them improve. As P16 mentioned, despite having many areas marked:
"I don’t feel necessarily disappointed because it’s just things I need to work on, so I don’t
feel discouraged at all" - P16
The sense of acceptance when receiving suggestions from AI can be partially explained by how
participants perceive AI as a non-judgmental agent, providing a safe space to make mistakes in a
low-stakes environment (§5.1.1). This also relates to P6’s view that AI analyzes answers objectively
without any emotional judgment, thus removing the "social fatigue" of receiving feedback from a
person:
"When people give feedback, It’s often quite weighty, almost emotional, you know? And
there’s none of that here. They’re just analyzing what you said objectively. So I basically
don’t have the social fatigue of listening to how someone else feels about my answer" - P6
Some hypothesized that receiving extensive highlighted areas could potentially discourage them,
as P14 stated:
"If [the AI] says the whole thing was bad, then there’s going to be less motivation [for me]
to click on every single thing and improve upon it again [...] Even though I think that the
whole point is to improve. It is humbling cause we all want to be perfect the first time
around [...] I think it needs to have that balance there" - P14
According to P14, feeling discouraged may stem from wanting to be "perfect" initially. She perceived
that the issue lay not with the AI’s suggestions but rather with her internal expectations of flawless
performance. Nonetheless, P14 mentioned that there should be a "balance" when the AI provides
feedback. If all responses were highlighted as weak areas, some participants might feel discouraged
and unmotivated to address everything. Moreover, marking every moment as needing improvement
would also negate the feature’s practical benefit of helping users focus on specific moments (§5.2.1).
5.3 Perceived Benefits of Dialogic Feedback
5.3.1 Dialogic Feedback Promotes Personalized, Continuous Learning. A major benefit of using
dialogic feedback with AI is enabling a personalized learning experience. The ability to "ask your
own question" and receive "specific feedback" (P10) made the feedback feel like a "personalized
give-and-take dialogue" (P18). According to P6, this dialogue approach allowed her to ask specific
questions, which helped overcome "tiny little blockers" hindering her understanding, something
generalized feedback alone might not address:
"I think the [dialogue format] is better because if I have any specific or strange questions,
the AI can [...] answer that. Usually, I have small little blocks in my brain that [lead to]
weird questions. The AI is good at answering those in dialogue rather than when you get
generalized feedback." - P6
This dialogue approach also helped participants keep learning (3 | P13, P17, P18). For instance, P17
mentioned that getting feedback through dialogue allowed her to "continue to ask questions [to]
keep advancing your response." Additionally, P18 appreciated this approach as "[it] showed me
how to improve, and then gave me a chance to do it, and then it gave me more feedback on that."
Participants perceived that this continuous back-and-forth interaction allowed them to actively
engage with the feedback and understand areas for improvement, promoting continuous learning.
5.3.2 Dialogic Feedback Helps Users Learn Without Feeling Judged. The participants’ willingness to
learn continuously was also attributed to their perception that dialogic feedback allowed them to
learn without feeling judged. As P8 stated, engaging in dialogue with the AI "felt less like being
evaluated and more like helping you grow." Participants preferred this dialogue approach over
being graded with percentages, as P2 mentioned:
"[...] Through the dialogue, I am able to get feedback if [I am] doing well as opposed
to [getting] maybe a percentage. [If] I’m graded in terms of percentage, that’s not good
[because] I wouldn’t know where I was wrong right [or] where I need to improve." - P2
As such, the interactive nature of dialogic feedback allowed participants to ask in-depth questions
multiple times, helping them concentrate on learning and gain detailed insights into what was
right, wrong, or in need of improvement, without the pressure of feeling assessed.
5.3.3 Dialogic Feedback Enables Users to Express Disagreement. While most participants agreed
with the AI’s feedback, a few (4 | P6, P10, P11, P14) disagreed with certain parts. However, the
dialogic format enabled them to express their disagreement. For instance, P6 disagreed when the AI
suggested using the STAR method to answer "What is your greatest strength?" (Fig. 6). P6 reported
that the AI "insisted on using the STAR method in situations where it should not have."
Similarly, P10 expressed disagreement through dialogue when receiving incorrect feedback due
to a transcription error during the interview recording. She felt clarifying this made the ensuing
feedback more helpful:
"I think if I weren’t able to say that [disagreement], then I feel like the feedback [that]
I would have gotten after that wouldn’t have been as helpful. So I think that’s a good
feature to have" - P10
These examples from P6 and P10 illustrate how dialogic feedback allows participants to express
disagreement, receive more relevant feedback, and feel satisfied with it. On the other hand, P11,
who mentioned her disagreement with certain feedback after using the tool, did not express her
disagreement in the dialogue. This was because she was unsure if that was an option. She perceived
that feedback is usually only a one-way interaction and stated:
Fig. 6. Expressing Disagreement in Dialogic Feedback (P6)
"I didn’t even know that was an option for me to type in ’I don’t agree with this’ [...]
having a conversation or dialogue [like that] isn’t something that I thought would be an
option" - P11.
The concept of dialogic feedback thus gave P11 a new perspective, challenging her assumption
that feedback is usually a one-way interaction.
5.3.4 Revision in Dialogic Feedback Helps to Reinforce Learning. Participants especially valued the
ability to revise their answers while receiving dialogic feedback through the system’s revision
feature, as it allowed them to immediately apply what was discussed in the feedback:
“I actually do like the feature because it’s like a do-over and you can get a critique on your
[revision]. So, you can keep practicing answering the same question over and over until
you can get a perfect [answer]." - P7
The revision feature facilitated a continuous feedback loop, enabling participants to iteratively
improve their responses until achieving a satisfactory outcome.
Participants perceived revision as essential in reinforcing the feedback, ensuring it was understood
and internalized. For instance, P18 remarked that the revision feature "allows you to actually
apply the feedback and see if you’ve actually learned it or not." Moreover, participants felt the
revision process enhanced retention of the feedback itself. P10 contrasted practicing with dialogic
feedback to peer practice sessions, noting that:
"Sometimes I’ll get feedback [from friends], but we won’t necessarily go back and revise
the [answer]. So I don’t always remember [their] feedback. But having the opportunity to
kind of go back and re-answer the question helps me to remember that specic advice that
I was given." - P10
Furthermore, participants felt that revision allowed them to end the practice on a positive
note. Ending practice attempts on a positive note after successful revisions nurtured participants’
confidence, as mentioned by P12: "You can end your practice on a positive note...you’re not thinking
badly about yourself." Receiving affirmation from the AI after an effective revision further boosted
self-assurance, as P6 remarked: "Getting affirmation was great [when] I wasn’t 100% sure."
5.4 Perceived Challenges in Using AI for Interview Simulation
5.4.1 Low-Stakes Environment May Not Fully Reflect Real Interviews. While 17 out of 19 participants
felt practicing with Conversate would adequately prepare them for future interviews, two (P9, P19)
argued that supplemental practice with humans was still necessary. This stemmed from practicing
in a low-stakes environment with "less pressure" (P19) and "no emotion from the AI" (P9).
For instance, while P19 found the tool helpful for initial practice and preparation, he preferred
higher-pressure practice, stating that in real life there were many "environmental factors and inner
factors" that caused anxiety during actual interviews, even after thorough preparation. Hence,
P19 preferred what he described as a "perfect pressure interview" to handle the anticipated stress
better. Additionally, P19 noted that when the AI asked follow-up questions, he felt it was too
"positive or receptive to every answer," unlike human interviewers who sometimes move on
without such positive transitions. To address these concerns, P19 suggested adding variability to the
AI interviewer’s tone. He also suggested involving a human-in-the-loop approach during interview
practice.
"Maybe there can be a sort of like option where instead of the bot speaking out the question.
I can show this [tool] to my friend, and he would act as the interview[er] [...] and follow the
script on a screen." - P19
According to P19, this approach would incorporate a human element while still leveraging the
AI’s generated questions and feedback capabilities, potentially replicating the pressure of a real
interview better. Moreover, he felt this would enhance the experience when practicing with peers,
making his peer act as an "expert interviewer" comparable to career counselors.
5.5 Perceived Challenges in Using LLM-Based Dialogic Feedback
5.5.1 Sycophantic Behavior of the LLM During Dialogic Feedback. We observed that when participants
expressed disagreement with the AI during dialogic feedback (§5.3.3), the LLM sometimes exhibited
its inherent sycophantic behavior. For instance, P14 mentioned:
"I thought that it makes sense for you to just say [if] I disagree. But then it just kind of
ended up agreeing with me. [...] But it’s a good option." - P14
While participants generally perceived the ability to express disagreement with the AI’s feedback to
be valuable (§5.3.3), if the AI simply always agreed with them, it could leave them confused, as P14
implied above. Instead of simply agreeing with her, P14 expected the AI to push back and explain
"what triggered this [feedback]," providing more detailed reasoning for its feedback.
5.5.2 Never-Ending Revision May Trigger Self-Frustration. Two users (P2, P11) felt disappointed
when they continued to receive feedback prompting further improvements, even after multiple
revisions during dialogic feedback. For instance, P2 felt annoyed with themselves for not providing
satisfactory answers even after multiple revisions:
“I feel disappointed sometimes when I know that I’ve not quite improved. However, I feel
happy because at least I’ve got a [feedback], because if I’m not really doing well, that’s
the truth, and I need to improve. [...] But sometimes I’m so annoyed with myself, not the
AI but myself. Maybe because I didn’t get something right, or I didn’t improve as
much as I wished to." - P2
As mentioned above, although P2 appreciated the AI feedback, the cycle of revisions triggered
self-directed frustration at falling short of their own expectations. Additionally, P11 noted that this
cycle may cause a shift from learning the core concepts to simply seeking AI approval: "[I would think
about] what would get the AI to sign off, not what is the best answer to the interview question."
6 DISCUSSION
6.1 Understanding the Role of LLMs in Supporting Reflective Learning in Interview Practice
Drawing upon the framework proposed by Slovák et al. [111], reflective learning requires three key
components: explicit, social, and personal (§2.2). Since the personal component relies on learners’
own motivation, it is harder to influence through technology. However, prior work suggests this can
be compensated for by supporting the explicit and social components [111]. Therefore, in this section,
we discuss the role of LLMs in supporting the explicit and social components of reflective learning
for interview practice. For each component, we also discuss key insights into why LLMs are effective
in supporting it and what considerations remain.
6.1.1 LLMs to support the explicit component. The explicit component refers to reshaping experiences
for learners through practice simulation [111]. Based on our findings, LLMs can support the explicit
component by providing a realistic simulation, acting as a mock interviewer that generates adaptive
follow-up questions (§5.1.3). Participants in our study often found the LLM’s follow-up questions
contextually relevant, largely due to the repetition of keywords throughout the conversation (§5.1.3;
Fig. 5). Interestingly, the LLM’s natural responses, which echoed some of the keywords from users’
answers, were similar to prior approaches using keyword extraction methods [59]. However, unlike
traditional keyword extraction methods, the LLM’s ability to generate contextually nuanced responses
enables a more dynamic interaction [16].
On the other hand, a remaining challenge is that the level of realism created by the LLM’s
follow-up questions may fall short for some users because the AI lacks emotional feedback (§5.4.1).
While enhancing the realism of interview practice is important, the experience must strike a careful
balance to ensure reflective learning. If the simulation is "too real," it can become overwhelming and
detract from the learning focus [111]. This highlights the need to balance low-stakes AI practice
with preparation for the high-pressure environment of real-world interviews. Future work in this
area could explore how to balance emotional responses in LLM-mediated interview training systems.
6.1.2 LLMs to support the social component. The social component in reflective learning refers to
providing support in the learning process [111]. In Conversate, the LLM scaffolds the role of a
mentor to support the learning process by providing evaluation and feedback. Our study found that
participants appreciated the LLM-based evaluation. Moreover, an important factor observed during
the user study was the trust participants placed in the LLM’s evaluation, viewing its judgment
as ’objective’ (§5.2.2). While overly trusting an LLM may pose risks, as we discuss further in
Section 6.4.2, trust is crucial for making feedback effective for learning [12]. Hence, efforts to
enhance user trust in the LLM are crucial. One factor that can build trust is explainability [45].
Although we did not incorporate specific components in the AI-assisted annotation to provide
explainability, the dialogic feedback feature fulfills this role. This feature allows users to ask for
specific explanations of the feedback through dialogue (§5.3.1), thereby providing explainability. As such,
LLM-based dialogic feedback is essential for ensuring explainability, which can complement the
LLM’s evaluation and contribute to user trust.
Additionally, the dialogic feedback feature allows users to revise their answers and receive
additional feedback on the revision. This supports reflective learning, as it helps users engage in
active experimentation and reflect on their actions [111]. Prior work on reflective learning systems
[9, 33], in contrast, did not focus on this aspect. A key property of LLMs that facilitates this revision
feature is their ability to provide personalized feedback by adapting to the context of the conversation
[11, 16]. The revision feature in dialogic feedback also allows users to reinforce their learning (§5.3.4),
effectively completing the feedback loop [19].
6.2 Leveraging LLMs for Dialogic Feedback
Our work demonstrates several benefits of dialogic feedback facilitated by large language models
(LLMs), such as promoting personalized and continuous learning (§5.3.1), which supports previous
findings on the value of conversational AI agents [107]. Additionally, compared to one-way feedback,
dialogic feedback offers several advantages. Our findings show that dialogic feedback reduces
feelings of judgment among users compared to one-way feedback (§5.3.2). While participants
already viewed AI as non-judgmental, dialogic feedback further reduces the feeling of judgment
because the user is not simply being graded but can engage in a two-way conversation to learn in
more detail how to improve. Feeling less judged when receiving feedback can potentially increase
the acceptance and internalization of the feedback [125]. This relates to prior research highlighting
that for feedback to be effective, users should be receptive and open to it [125]. In contrast, one-way
feedback can discourage users, especially when presented as an assessment. For instance, a study by
Bridgeman and Giraldez-Hayes [23] on using AI-enhanced video review software for reflective
learning in coaching sessions shows that participants can be overly nervous due to being assessed
("I was conscious of ... being assessed by the criteria").
We have highlighted the benefits of LLM-based dialogic feedback in our findings and discussed
its potential to support reflective learning in §6.1.2. Prior work has also conceptualized some
general benefits of dialogic feedback [19, 114]. The remaining question is: what are the specific
characteristics of LLM-based dialogic feedback, and what are its implications? Hence, in this section,
we discuss 1) how LLM-based dialogic feedback compares with dialogic feedback from humans,
2) the challenges of using LLMs for dialogic feedback, and 3) the implications of LLM-based
dialogic feedback.
6.2.1 Comparing LLM-Based Dialogic Feedback With Dialogic Feedback From Humans. Dialogic
feedback facilitated by large language models (LLMs) and by humans has distinct advantages and
trade-offs. With human feedback providers, there is no guarantee of positivity or constructive
framing, as individuals may be influenced by personal biases or emotions [82]. Hence, negative
feedback can discourage users [33]. LLMs, conversely, can be explicitly prompted to provide
feedback in a positive and supportive manner, as we did in Conversate (§3.4). Participants valued
the positive feedback that Conversate gave them, as P11 mentioned:
the positive feedback that Conversate gave them, as P11 mentioned:
"If it [were] just negative ..., I would not go back to that, knowing that it would just tear
me down or rip apart my responses. So, being able to [get] positive feedback ... is super
imperative." - P11
However, excessive or unrealistic positivity can make the feedback seem untrustworthy [18, 48].
Moreover, prior studies show that AI agents can be perceived as non-judgmental [101]. This
relates to our finding that users perceive AI as a non-judgmental agent, which helps them practice
without fear of judgment (§5.3.1). Hence, receiving feedback from AI can alleviate the emotional
burden that people may face when receiving feedback from humans, as mentioned
by P6: "When people give feedback, It’s often quite weighty, almost emotional, ... And there’s none of
that here."
While LLM-based dialogic feedback offers consistency, scalability, and the potential for positive
framing, it may lack the nuanced understanding that human feedback can provide [85]. A study by
Chandra et al. [29] shows that human-like competencies in conversational AI agents, specifically
cognitive, relational, and emotional competencies, play an important role in fostering user engagement.
Hence, the ideal approach may lie in striking a balance: designing dialogic feedback systems
that leverage the consistent nature of AI while incorporating mechanisms to infuse human-like
nuance and emotional intelligence, fostering a feedback experience that combines the best of both
worlds.
6.2.2 Challenges of Using LLMs for Dialogic Feedback. While LLM-based dialogic feedback has
potential, several challenges exist. For example, when users expressed disagreement during dialogic
feedback (§5.3.3), the LLM easily agreed with them. This phenomenon relates to the sycophantic
behavior of LLMs [56, 95], as they tend to provide users with their preferred answer. This is
challenging in the context of dialogic feedback: when the user expresses disagreement in the
dialogue, the LLM may simply concede and fail to maintain its objectivity (§5.5.1). In our study,
several users reported that this behavior could cause confusion. Several works have aimed to reduce
sycophancy in LLMs, such as fine-tuning on synthetic data [123] or using activation steering [104].
Based on our findings (§5.5.1), one consideration for reducing sycophancy during dialogic feedback
is to have the LLM state the detailed reasoning behind its initial response. This approach can make
the feedback discussion more substantive, providing clear feedback rather than merely acquiescing
to the user’s stance without justification.
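To illustrate this consideration, the snippet below is a minimal sketch, not Conversate’s actual prompt, of how a feedback agent could be instructed to justify its feedback rather than simply concede when the user pushes back. It assumes an OpenAI-style chat completion backend; the model name and prompt wording are illustrative placeholders.

```python
# Minimal illustration (not Conversate's actual prompt) of instructing the
# feedback agent to explain the reasoning behind its feedback instead of
# simply agreeing when the user disagrees. Model name and wording are
# placeholder assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = """You are an interview coach giving feedback on a candidate's answer.
When the user disagrees with your feedback:
1. Point to the specific part of their answer that triggered the feedback.
2. Explain the reasoning behind your original suggestion.
3. Revise your feedback only if the user provides concrete new information;
   do not agree merely because the user objected."""

def feedback_turn(history: list[dict], user_message: str, model: str = "gpt-4o") -> str:
    """Run one dialogic-feedback turn; `history` holds prior user/assistant messages."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history + [
        {"role": "user", "content": user_message}
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

In practice, such an instruction would need to be balanced against the positive, supportive framing discussed in §6.2.1, so that the agent remains open to legitimate corrections such as the transcription error reported by P10 (§5.3.3).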
6.2.3 Implications of LLM-based Dialogic Feedback. While our work explores the potential of
LLM-based dialogic feedback in the context of interview practice, we envision that it can also
be applied to various other systems, particularly those facilitating learning through AI feedback.
Traditional AI feedback mechanisms often rely on one-way communication [35, 84]. However, the
study by Escalante et al. [42] demonstrated that half of the students still preferred human tutor
feedback over one-way AI-generated feedback for learning, due to its interactive nature and ability
to facilitate discussion. In this case, LLM-based dialogic feedback can potentially make AI feedback
more comparable to human tutoring by engaging users in discussion. Therefore, the integration of
LLM-based dialogic feedback represents a shift towards more dynamic and human-like interactions
between users and AI systems, especially in learning contexts.
Furthermore, integrating LLM-based dialogic feedback into a learning system is not simply
about creating an LLM-based chatbot for users to interact with. To ensure an effective learning
process, several considerations must be followed based on dialogic feedback theory [114] (§2.3).
Four components need to be considered [114]: 1) emotional and relational support; 2)
maintenance of the dialogue; 3) opportunities for users to express themselves; and 4) contribution
to individual growth. An interaction that addresses these four components is therefore crucial for
effective LLM-based dialogic feedback.
6.3 Reshaping Learners as Active Agents in Human-AI Collaborative Learning
Human-AI collaboration has received attention from HCI researchers, especially within GROUP
communities [49, 109]. The concept of human-AI collaboration in learning involves collaboration
between human learners and AI systems to better achieve learning outcomes [63]. To support
learning activities, there are various forms of human-AI collaboration, such as personalized learning
platforms [32] and intelligent tutoring systems [113]. Conversate can be considered a system
supporting human-AI collaboration in learning, as it facilitates collaboration between human
learners and AI to help practice and improve interview skills.
An often overlooked aspect of human-AI collaborative learning systems is the role of humans as
active agents in the learning process [4]. Having the learner as an active agent is important for making
the learning process more adaptive and personalized [63, 91]. Conversate incorporates AI-Assisted
Annotation and Dialogic Feedback, both of which position learners as active agents. For instance, in
AI-assisted annotation, while the AI highlights areas for improvement, learners engage in self-assessment
of their performance. Our study found that this self-reflection process enables users to actively assess
their performance (3 | P13, P15, P17). Learners also act as active agents in the dialogic feedback process:
using the feature, they can actively express their specific needs, which results in a more personalized
learning experience (§5.3.1).
These examples demonstrate that positioning humans as active agents can improve the learning
process in human-AI collaboration. This aligns with the principle that effective human-AI collaboration
should leverage the strengths of both humans and AI, with AI augmenting rather than replacing the
learner’s role [127]. Furthermore, this approach connects to reflective learning theory, where the
’personal’ component, which relies on the learner’s active engagement in the learning process, plays a
crucial role in enhancing learning [111].
However, putting humans as active agents in the learning process can present challenges. Since
control over the learning process relies heavily on the learner, maintaining the learner’s motivation
to engage consistently in the process can be challenging [64]. For instance, during our user study,
although participants did not mention this explicitly, we observed that during the annotation process,
a few participants wrote only short self-assessments instead of more detailed ones. We hypothesize
that this is partially caused by a lack of motivation to engage fully in the learning process [70]. To
address this issue, we recommend that AI systems be designed to detect when learners start to
disengage from the learning process and nudge them to re-engage [105]. This could involve prompts
for more detailed responses or adaptive feedback based on the user’s level of engagement. Future
work could focus on finding the right balance between having learners as active agents and
determining when the AI should take a more proactive role in engaging learners in the process.
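As a purely illustrative example of such a nudge, not a feature of Conversate, a system could flag very brief self-assessments during annotation and prompt the learner for more detail; the word-count threshold and wording below are arbitrary assumptions.

```python
# Illustrative heuristic (not part of Conversate): nudge learners whose written
# self-assessments look too brief to support meaningful reflection.
MIN_WORDS = 15  # arbitrary placeholder threshold

def nudge_if_disengaged(self_assessment: str) -> str | None:
    """Return a nudge message when a self-assessment looks too short, else None."""
    if len(self_assessment.split()) < MIN_WORDS:
        return ("Your reflection is quite short. What specifically worked or "
                "didn't work in this answer, and what would you try differently?")
    return None

# Example: a one-line self-assessment triggers the nudge
print(nudge_if_disengaged("My answer was too vague."))
```

Richer variants could draw on engagement signals beyond length, such as whether the learner opened any of the highlighted moments before annotating.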
6.4 Socio-Technical Implications of LLM-Assisted Interview Practice
6.4.1 Positive Implications. There are several positive implications, such as increased accessibility
to interview practice, as users can engage in practice sessions at any time without the need for
a human partner. Additionally, as users perceived the evaluation from the LLM as non-judgmental,
this potentially lowers the barrier to repeated practice, allowing users to focus on refining their
interview responses. Furthermore, incorporating LLM-mediated practice alongside traditional
interview preparation can create a more holistic preparation strategy. For instance, the scalability
of LLM-mediated practice allows institutions to offer personalized interview training to a larger
audience without the constraints of scheduling or limited access to career counselors. Additionally,
the data collected from AI-mediated practice sessions can provide valuable insights into common
interview challenges and trends, informing the development of better training materials and
strategies [65].
While our study shows user appreciation for LLM-based agents in interview practice, we cannot
conclude that users would prefer LLM-based agents over human interviewers in actual job
interviews. For example, previous research on AI for interview practice has shown similar user
appreciation [38], but studies on AI in real job interviews reveal user uncertainty and disengagement
[71]. This disparity highlights the difference between user perspectives on AI for practice versus
actual interviews. While our findings suggest that LLMs provide human-like interaction, potentially
increasing engagement, they do not address the uncertainty users feel when being assessed by AI
in real interviews [71]. In our case, when using AI for interview practice, dialogic feedback lets
users ask for further explanation about their assessment, thus mitigating uncertainty. However,
such feedback is typically unavailable in actual interviews [14]. Therefore, further research is
necessary to understand users’ perspectives on LLM-based interviewers conducting real job
interviews.
6.4.2 Negative Implications. However, there are also negative implications to consider. While
users in our study perceived the suggestions from the LLM positively (§5.2; §5.3), users were
not necessarily trained to recognize best interview practices. Hence, it is possible that the LLM may
produce unhelpful suggestions that sound valid, as LLMs are known to create seemingly credible
misinformation [92]. Lastly, since our system uses an LLM, there is a risk of inherent bias
in the model [66]. This risk may be amplified in our context, as some participants view LLM-based
systems as "objective" and trust their judgment over their own (§5.2.2). Therefore, evaluation of the
LLM output in AI-mediated interview practice is important to consider in future work.
Assessment Plan. Considering the potential risks mentioned above that may arise when leveraging
LLMs for interview practice, future work can consider an assessment of the model’s reliability.
Currently, the most widely accepted form of assessment for LLM-based applications in research is
human evaluation [5, 119], despite some limitations [36]. Hence, one possible way of assessing our
LLM-based interview practice is using external evaluators, for example, involving multiple expert
interviewers. For the assessment, we can test the model using the MIT Interview Dataset [55, 84],
which consists of 138 recorded mock interviews, and the Asynchronous Written Interview dataset
[99], which consists of 1000 triplets of an interview question, response, and follow-up.
Given that we use the LLM in multiple aspects of our application, the assessment can
consider three specific use cases: 1) using the LLM to generate follow-up questions, 2) using the
LLM to evaluate user answers, and 3) using the LLM to provide feedback on user answers. The LLM
can be evaluated based on several criteria [30]. For instance, prior work on AI-mediated interview
practice suggests evaluating the model based on relevance [99] (i.e., how well the model’s generated
follow-up questions align with the context of the conversation and how relevant the feedback is)
and accuracy [35] (i.e., the model’s accuracy in evaluating users’ answers). Another criterion to
consider is the identification of biases [30]. Multiple expert interviewers could assess the model
using these criteria for each use case. Additionally, we can report the inter-annotator agreement
score with confidence intervals to ensure the reliability of our evaluation [119].
7 LIMITATIONS
It is important to note that our study was qualitative and exploratory in nature. We aimed to
gain insights into users’ experiences and perceptions of Conversate rather than to quantitatively
measure the outcomes or efficacy of the system for interview practice. Hence, future work could
conduct a quantitative experiment to evaluate whether Conversate improves users’ interview skills.
Another limitation of our study is that we used a simple prompt engineering approach, as our
focus was on early exploration of how users perceive such an LLM-based system. While participants
appreciated the follow-up questions generated by the LLM and the feedback provided, future work
could explore more robust prompting frameworks or incorporate additional fine-tuning to improve
model performance.
8 CONCLUSION
In this paper, we introduced Conversate, an AI-mediated interview practice system that facilitates
reflective learning through interview simulation, AI-assisted annotation, and dialogic feedback.
Through a qualitative study with 19 participants, we evaluated Conversate for interview practice
and explored the benefits and challenges of leveraging large language models (LLMs) for interview
simulation and dialogic feedback. Our study showed that participants valued the adaptive follow-up
questions from LLMs for enhancing the realism of interview simulations and encouraging them to
think more deeply. While interactive simulations facilitated by LLMs were useful in helping participants
focus on preparing their answers, we also found drawbacks, such as the inability to replicate
the pressure of a real interview. Participants also found the LLM-based dialogic feedback
beneficial, as it promoted personalized and continuous learning, reduced feelings of judgment,
and allowed them to express disagreement. However, we also identified a challenge in leveraging
LLMs for dialogic feedback: their inherent sycophantic behavior. Moreover, our findings on dialogic
feedback suggest that HCI researchers could reconsider how we design feedback-based learning
systems in future work, offering a new perspective on how AI feedback can involve two-way
interaction instead of the usual one-way approach.
REFERENCES
[1] [n. d.]. Final Round AI. https://www.finalroundai.com/. Accessed: April 21, 2024.
[2] [n. d.]. Meta AI. https://www.meta.ai/. Accessed: April 21, 2024.
[3] [n. d.]. Yoodli AI. https://app.yoodli.ai/use-cases/interview-preparation//. Accessed: April 21, 2024.
[4]
Babak Abedin, Christian Meske, Iris Junglas, Fethi Rabhi, and Hamid R Motahari-Nezhad. 2022. Designing and
managing human-AI interactions. Information Systems Frontiers 24, 3 (2022), 691–697.
[5]
Bhashithe Abeysinghe and Ruhan Circi. 2024. The Challenges of Evaluating LLM Applications: An Analysis of
Automated, Human, and LLM-Based Approaches. arXiv:2406.03339 [cs.CL] https://arxiv.org/abs/2406.03339
[6]
Shamim Akhtar. 2012. What to Expect from Across the Table: 25 Most Frequently Asked General Interview Questions.
The IUP Journal of Soft Skills 6, 1 (2012), 51–63.
[7]
Keith Anderson, Elisabeth André, T. Baur, Sara Bernardini, M. Chollet, E. Chryssadou, I. Damian, C. Ennis, A. Egges,
P. Gebhard, H. Jones, M. Ochs, C. Pelachaud, Kaśka Porayska-Pomsta, P. Rizzo, and Nicolas Sabouret. 2013. The
TARDIS Framework: Intelligent Virtual Agents for Social Coaching in Job Interviews. In Advances in Computer
Entertainment, Dennis Reidsma, Haruhiro Katayose, and Anton Nijholt (Eds.). Springer International Publishing,
Cham, 476–491.
[8]
Khairani Dian Anisa, Lailatun Nurul Aniq, et al
.
2020. Oral Peer Feedback in A Flipped Speaking Job Interview Class:
Practice And Learners’ Attitudes. JEELS (Journal of English Education and Linguistics Studies) 7, 2 (2020), 245–268.
[9]
Riku Arakawa and Hiromu Yakura. 2020. INWARD: A Computer-Supported Tool for Video-Reflection Improves
Efficiency and Effectiveness in Executive Coaching. In Proceedings of the 2020 CHI Conference on Human Factors in
Computing Systems. 1–13.
[10]
Riku Arakawa and Hiromu Yakura. 2022. Human-AI communication for human-human communication: Applying
interpretable unsupervised anomaly detection to executive coaching. arXiv preprint arXiv:2206.10987 (2022).
[11]
Riku Arakawa, Hiromu Yakura, and Masataka Goto. 2023. CatAlyst: domain-extensible intervention for preventing
task procrastination using large generative models. In Proceedings of the 2023 CHI Conference on Human Factors in
Computing Systems. 1–19.
[12]
Adelle Atkinson, Christopher J. Watling, and Paul L. P. Brand. 2021. Feedback and coaching. European Journal of
Pediatrics (2021). https://doi.org/10.1007/s00431-021-04118-8
[13]
Stephen Atlas. 2023. ChatGPT for higher education and professional development: A guide to conversational AI.
(2023).
[14]
Mahnaz Behroozi, Shivani Shirolkar, Titus Barik, and Chris Parnin. 2020. Debugging hiring: what went right and
what went wrong in the technical interview process. In Proceedings of the ACM/IEEE 42nd International Conference on
Software Engineering: Software Engineering in Society (Seoul, South Korea) (ICSE-SEIS ’20). Association for Computing
Machinery, New York, NY, USA, 71–80. https://doi.org/10.1145/3377815.3381372
[15]
Marit Bentvelzen, Paweł W Woźniak, Pia SF Herbes, Evropi Stefanidi, and Jasmin Niess. 2022. Revisiting reflection in
hci: Four design resources for technologies that support reflection. Proceedings of the ACM on Interactive, Mobile,
Wearable and Ubiquitous Technologies 6, 1 (2022), 1–27.
[16]
Ananya Bhattacharjee, Yuchen Zeng, Sarah Yi Xu, Dana Kulzhabayeva, Minyi Ma, Rachel Kornfield, Syed Ishtiaque
Ahmed, Alex Mariakakis, Mary P Czerwinski, Anastasia Kuzminykh, et al
.
2024. Understanding the Role of Large
Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination. In Proceedings of
the CHI Conference on Human Factors in Computing Systems. 1–18.
[17] Blind. [n. d.]. Blind. https://www.teamblind.com. Accessed on: May 8, 2024.
[18]
Lauren Borden, Paul E Levy, and Stanley B Silverman. 2018. Leader arrogance and subordinate outcomes: The role of
feedback processes. Journal of Business and Psychology 33 (2018), 345–364.
[19] David Boud. 2015. Feedback: ensuring that it leads to enhanced learning. Clin. Teach. 12, 1 (Feb. 2015), 3–7.
[20] David Boud, Rosemary Keogh, and David Walker. 2013. Reflection: Turning experience into learning. Routledge.
[21]
Lori Boyer, Jamie Pleasant, and Donald Vest. [n. d.]. S.T.A.R. Performance: A Quantitative Exploration of Behavioral
Responses in Simulated Selection Interviews. College of Business E-Journal 14, 1 ([n. d.]), Article 3.
[22] Virginia Braun and Victoria Clarke. 2012. Thematic analysis. American Psychological Association.
[23]
James Bridgeman and Andrea Giraldez-Hayes. 2024. Using artificial intelligence-enhanced video feedback for reflective
practice in coach development: benefits and potential drawbacks. Coaching: An International Journal of Theory,
Research and Practice 17, 1 (2024), 32–49.
[24] Ian Brooks and Sarah Wright. 2007. A survey of executive coaching practices in New Zealand. (2007).
[25] Robert F Bruner. 2001. Repetition is the rst principle of all learning. Available at SSRN 224340 (2001).
[26]
David F. Caldwell and Jerry M. Burger. 1998. PERSONALITY CHARACTERISTICS OF JOB APPLICANTS AND
SUCCESS IN SCREENING INTERVIEWS. Personnel Psychology 51, 1 (March 1998), 119–136. https://doi.org/10.1111/
j.1744-6570.1998.tb00718.x
[27]
David Carless. 2012. Trust and its role in facilitating dialogic feedback. In Feedback in higher and professional education.
Routledge, 90–103.
[28] David Carless. 2016. Feedback as dialogue. Encyclopedia of educational philosophy and theory (2016), 1–6.
[29]
Shalini Chandra, Anuragini Shirish, and Shirish C Srivastava. 2022. To be or not to be... human? Theorizing the
role of human-like competencies in conversational artificial intelligence agents. Journal of Management Information
Systems 39, 4 (2022), 969–1005.
[30]
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang,
Yidong Wang, et al
.
2024. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems
and Technology 15, 3 (2024), 1–45.
[31] Kathy Charmaz. 2006. Constructing grounded theory: A practical guide through qualitative analysis. sage.
[32]
Muhammad Ali Chaudhry and Emre Kazim. 2022. Artificial Intelligence in Education (AIEd): A high-level academic
and industry note 2021. AI and Ethics 2, 1 (2022), 157–165.
[33]
Tianying Chen, Michael Xieyang Liu, Emily Ding, Emma O’Neil, Mansi Agarwal, Robert E Kraut, and Laura Dabbish.
2023. Facilitating Counselor Reflective Learning with a Real-time Annotation tool. In Proceedings of the 2023 CHI
Conference on Human Factors in Computing Systems. 1–17.
[34]
Bhavya Chopra, Yasharth Bajpai, Param Biyani, Gustavo Soares, Arjun Radhakrishna, Chris Parnin, and Sumit
Gulwani. 2024. Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants.
arXiv preprint arXiv:2402.06229 (2024).
[35]
Yi-Chi Chou, Felicia R Wongso, Chun-Yen Chao, and Han-Yen Yu. 2022. An AI mock-interview platform for interview
performance analysis. In 2022 10th International Conference on Information and Education Technology (ICIET) (Matsue,
Japan). IEEE.
[36]
Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, and Noah A Smith. 2021. All that’s
’human’ is not gold: Evaluating human evaluation of generated text. arXiv preprint arXiv:2107.00061 (2021).
[37]
Meg Colasante. 2011. Using video annotation to reflect on and evaluate physical education pre-service teaching
practice. Australasian Journal of Educational Technology 27, 1 (2011).
[38]
Ionut Damian, Tobias Baur, Birgit Lugrin, Patrick Gebhard, Gregor Mehlmann, and Elisabeth André. 2015. Games are
better than books: In-situ comparison of an interactive job interview game with conventional training. In Artificial
Intelligence in Education: 17th International Conference, AIED 2015, Madrid, Spain, June 22-26, 2015. Proceedings 17.
Springer, 84–94.
[39]
Andreas Dimopoulos. 2020. Applicant’s Self Confidence Influence in Employment Interview Process According to
Recruiters Perceptions. An Exploratory Study in Greece. International Journal of Human Resource Studies 10, 2 (March
2020), 82. https://doi.org/10.5296/ijhrs.v10i2.16701
[40]
Shokhsanam Djalilova. 2019. Dialogic feedback and its effects on English language learners’ writing development: a
case study. (2019).
[41] edX. n.d.. edX: Online courses from the world’s best universities. https://www.edx.org/
[42]
Juan Escalante, Austin Pack, and Alex Barrett. 2023. AI-generated feedback on writing: insights into efficacy and ENL
student preference. International Journal of Educational Technology in Higher Education 20, 1 (2023), 57.
[43]
Justin R Feeney, Julie M McCarthy, and Richard Goffin. 2015. Applicant anxiety: Examining the sex-linked anxiety
coping theory in job interview contexts. International journal of selection and assessment 23, 3 (2015), 295–305.
[44]
Amanda R. Feiler and Deborah M. Powell. 2016. Behavioral Expression of Job Interview Anxiety. Journal of Business
and Psychology 31, 1 (March 2016), 155–171. https://doi.org/10.1007/s10869-015-9403-z
[45]
Andrea Ferrario and Michele Loi. 2022. How explainability contributes to trust in AI. In Proceedings of the 2022 ACM
Conference on Fairness, Accountability, and Transparency. 1457–1466.
[46] Flask. [n. d.]. https://flask.palletsprojects.com/.
[47]
Rowanne Fleck and Geraldine Fitzpatrick. 2010. Reflecting on reflection: framing a design landscape. In Proceedings of
the 22nd conference of the computer-human interaction special interest group of australia on computer-human interaction.
216–223.
[48]
Nathan M Fulham, Kori L Krueger, and Taya R Cohen. 2022. Honest feedback: Barriers to receptivity and discerning
the truth in feedback. Current Opinion in Psychology 46 (2022), 101405.
[49]
Robert P. Gauthier, Catherine Pelletier, Laurie-Ann Carrier, Maude Dionne, Ève Dubé, Samantha Meyer, and James R.
Wallace. 2022. Agency and Amplication: A Comparison of Manual and Computational Thematic Analyses by
Public Health Researchers. Proc. ACM Hum.-Comput. Interact. 7, GROUP, Article 2 (dec 2022), 22 pages. https:
//doi.org/10.1145/3567552
[50]
Caren Goldberg and Ann Perry. 1998. Who Gets Hired: Interviewing Skills Are a Prehire Variable. Journal of Career
Planning & Employment 58, 2 (1998), 47–50. ERIC Number: EJ566053.
[51] Google. [n. d.]. https://firebase.google.com/docs/storage.
[52]
Katharine Hansen, Gary C Oliphant, Becky J Oliphant, and Randall S Hansen. 2009. Best practices in preparing
students for mock interviews. Business Communication Quarterly 72, 3 (2009), 318–327.
[53]
Jennifer Hill and Harry West. 2019. Improving the student learning experience through dialogic feed-forward
assessment. Assessment & Evaluation in Higher Education (2019).
[54]
Monika Hooda, Chhavi Rana, Omdev Dahiya, Ali Rizwan, and Md Shamim Hossain. 2022. Artificial intelligence for
assessment and feedback to enhance student success in higher education. Math. Probl. Eng. 2022 (May 2022), 1–19.
[55]
Mohammed (Ehsan) Hoque, Matthieu Courgeon, Jean-Claude Martin, Bilge Mutlu, and Rosalind W. Picard. 2013.
MACH: my automated conversation coach. In Proceedings of the 2013 ACM International Joint Conference on Pervasive
and Ubiquitous Computing (Zurich, Switzerland) (UbiComp ’13). Association for Computing Machinery, New York,
NY, USA, 697–706. https://doi.org/10.1145/2493432.2493502
[56]
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng,
Xiaocheng Feng, Bing Qin, et al
.
2023. A survey on hallucination in large language models: Principles, taxonomy,
challenges, and open questions. arXiv preprint arXiv:2311.05232 (2023).
[57]
Zeyuan Huang, Qiang He, Kevin Maher, Xiaoming Deng, Yu-Kun Lai, Cuixia Ma, Sheng-feng Qin, Yong-Jin Liu, and
Hongan Wang. 2023. SpeechMirror: A multimodal visual analytics system for personalized reflection of online public
speaking effectiveness. IEEE Transactions on Visualization and Computer Graphics (2023).
[58]
Robyn Huss, Tasnuva Jhileek, and Judy Butler. 2017. Mock Interviews in the Workplace: Giving Interns the Skills
They Need for Success. 17, 3 (2017).
[59]
Koji Inoue, Kohei Hara, Divesh Lala, Kenta Yamamoto, Shizuka Nakamura, Katsuya Takanashi, and Tatsuya Kawahara.
2020. Job interviewer android with elaborate follow-up question generation. In Proceedings of the 2020 International
Conference on Multimodal Interaction. 324–332.
[60] InterviewingIO. [n. d.]. Interviewing.io. https://interviewing.io/. Accessed on: May 8, 2024.
[61] Sehoon Jung et al. 2013. The effect of task repetition and corrective feedback in L2 writing: A pilot study. (2013).
[62]
Enkelejda Kasneci, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser,
Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia
Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen
Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. ChatGPT for good? On opportunities and challenges of large language
models for education. Learning and Individual Differences 103 (2023), 102274. https://doi.org/10.1016/j.lindif.2023.
102274
[63]
Jinhee Kim, Hyunkyung Lee, and Young Hoan Cho. 2022. Learning design to support student-AI collaboration:
Perspectives of leading teachers for AI in education. Education and Information Technologies 27, 5 (2022), 6069–6104.
[64]
Kyong-Jee Kim. 2009. Motivational challenges of adult learners in self-directed e-learning. Journal of Interactive
Learning Research 20, 3 (2009), 317–335.
[65]
Minsun Kim, SeonGyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Hyungseung Lim, Jieun
Han, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong, and Tak Yeon Lee. 2024. Designing Prompt
Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing. arXiv:2405.19691 [cs.HC] https:
//arxiv.org/abs/2405.19691
[66]
Hadas Kotek, Rikker Dockum, and David Sun. 2023. Gender bias and stereotypes in large language models. In
Proceedings of the ACM collective intelligence conference. 12–24.
[67]
Christine Lackner and Tanya Martini. 2017. Helping University Students Succeed at Employment Interviews: The
Role of Self-Reflection in E-Portfolios. Teaching & Learning Inquiry 5, 2 (2017), 3–15.
[68]
Julia Levashina, Christopher J Hartwell, Frederick P Morgeson, and Michael A Campion. 2014. The structured
employment interview: Narrative and quantitative review of the research literature. Personnel Psychology 67, 1 (2014),
241–293.
[69]
Mingzhe Li, Xiuying Chen, Weiheng Liao, Yang Song, Tao Zhang, Dongyan Zhao, and Rui Yan. 2023. EZInterviewer:
To Improve Job Interview Performance with Mock Interview Generator. arXiv:2301.00972 [cs.CL]
[70]
Raymond Boon Tar Lim, Claire Gek Ling Tan, Kenneth Wee Beng Hoe, Cecilia Woon Chien Teng, Andre Matthias
Müller, Julian Azfar, Suganthi Narayanasamy, and Chee Hsiang Liow. 2022. Correlates, motivating factors, and
barriers of engaging in regular self-reflection among public health students in higher education—A mixed methods
approach. Frontiers in Public Health 10 (2022), 1023439.
[71]
Bingjie Liu, Lewen Wei, Mu Wu, and Tianyi Luo. 2023. Speech production under uncertainty: how do job applicants
experience and communicate with an AI interviewer? Journal of Computer-Mediated Communication 28, 4 (2023),
zmad028.
[72]
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2023.
Lost in the Middle: How Language Models Use Long Contexts. arXiv:2307.03172 [cs.CL]
[73]
Gale M Lucas, Jonathan Gratch, Aisha King, and Louis-Philippe Morency. 2014. It’s only a computer: Virtual humans
increase willingness to disclose. Computers in Human Behavior 37 (2014), 94–100.
[74]
Gale M Lucas, Albert Rizzo, Jonathan Gratch, Stefan Scherer, Giota Stratou, Jill Boberg, and Louis-Philippe Morency.
2017. Reporting mental health symptoms: breaking down barriers to care with virtual human interviewers. Frontiers
in Robotics and AI 4 (2017), 51.
[75]
Melanie Marks and Abigail H O’Connor. 2006. The round-robin mock interview: Maximum learning in minimum
time. Business Communication Quarterly 69, 3 (2006), 264–275.
[76]
Todd Maurer, Jerry Solamon, and Deborah Troxtel. 1998. Relationship of coaching with performance in situational
employment interviews. Journal of Applied Psychology 83, 1 (1998), 128–136. https://doi.org/10.1037/0021-9010.83.1.
128 Place: US Publisher: American Psychological Association.
[77]
Todd J. Maurer, Jerry M. Solamon, Kimberly D. Andrews, and Deborah D. Troxtel. 2001. Interviewee coaching,
preparation strategies, and response strategies in relation to performance in situational employment interviews:
An extension of Maurer, Solamon, and Troxtel (1998). Journal of Applied Psychology 86, 4 (2001), 709–717. https:
//doi.org/10.1037/0021-9010.86.4.709 Place: US Publisher: American Psychological Association.
[78]
Julie McCarthy and Richard Goffin. 2004. Measuring Job Interview Anxiety: Beyond Weak Knees and Sweaty
Palms. Personnel Psychology 57, 3 (2004), 607–637. https://doi.org/10.1111/j.1744-6570.2004.00002.x _eprint:
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1744-6570.2004.00002.x.
[79]
Michael A McDaniel, Deborah L Whetzel, Frank L Schmidt, and Steven D Maurer. [n.d.]. The Validity of Employment
Interviews: A Comprehensive Review and Meta-Analysis. ([n. d.]).
[80]
Fiona Haslam McKenzie. 2007. Attracting and retaining skilled and professional sta in remote locations. Number
DKCRC Report 21. CRC for Desert Knowledge.
[81]
Negin Mirriahi, Srećko Joksimović, Dragan Gašević, and Shane Dawson. 2018. Effects of instructional conditions and
experience on student reflection: a video annotation study. Higher Education Research & Development 37, 6 (2018),
1245–1259.
[82]
Elizabeth Molloy, Francesc Borrell-Carrio, and Ron Epstein. 2012. The impact of emotions in feedback. In Feedback in
higher and professional education. Routledge, 50–71.
[83] MongoDB. [n. d.]. https://www.mongodb.com/products/platform/atlas-database.
[84]
Iftekhar Naim, Md. Iftekhar Tanveer, Daniel Gildea, and Mohammed Ehsan Hoque. 2018. Automated Analysis and
Prediction of Job Interview Performance. IEEE Transactions on Affective Computing 9, 2 (2018), 191–204. https:
//doi.org/10.1109/TAFFC.2016.2614299
[85]
Fawad Naseer, Muhammad Usama Khalid, Nafees Ayub, Akhtar Rasool, Tehseen Abbas, and Muhammad Waleed Afzal.
2024. Automated Assessment and Feedback in Higher Education Using Generative AI. In Transforming Education
With Generative AI: Prompt Engineering and Synthetic Content Creation. IGI Global, 433–461.
[86]
David Nicol. 2014. From monologue to dialogue: improving written feedback processes in mass higher education. In
Approaches to assessment that enhance learning in higher education. Routledge, 11–27.
[87]
David J Nicol and Debra Macfarlane-Dick. 2006. Formative assessment and self-regulated learning: A model and
seven principles of good feedback practice. Studies in higher education 31, 2 (2006), 199–218.
[88] OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
[89] OpenAI. 2024. https://platform.openai.com/docs/guides.
[90] OpenAI. 2024. https://platform.openai.com/docs/models.
[91]
Fan Ouyang and Pengcheng Jiao. 2021. Artificial intelligence in education: The three paradigms. Computers and
Education: Artificial Intelligence 2 (2021), 100020.
[92]
Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Yang Wang. 2023. On the risk
of misinformation pollution with large language models. arXiv preprint arXiv:2305.13661 (2023).
[93]
Amy Pavel, Colorado Reed, Björn Hartmann, and Maneesh Agrawala. 2014. Video digests: a browsable, skimmable
format for informational lecture videos.. In UIST, Vol. 10. Citeseer, 2642918–2647400.
[94] Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson,
Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations.
arXiv preprint arXiv:2212.09251 (2022).
[95] Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson,
Sandipan Kundu, Saurav Kadavath, et al. 2022. Discovering language model behaviors with model-written evaluations.
arXiv preprint arXiv:2212.09251 (2022).
[96]
Nathan P. Podsakoff, Steven W. Whiting, Philip M. Podsakoff, and Paresh Mishra. 2011. Effects of organizational
citizenship behaviors on selection decisions in employment interviews. Journal of Applied Psychology 96, 2 (2011),
310–326. https://doi.org/10.1037/a0020948
[97]
Richard A. Posthuma, Frederick P. Morgeson, and Michael A. Campion. 2002. Beyond Employment Interview Validity:
A Comprehensive Narrative Review of Recent Research and Trends Over Time. Personnel Psychology 55, 1 (2002), 1–81.
https://doi.org/10.1111/j.1744-6570.2002.tb00103.x _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1744-
6570.2002.tb00103.x.
[98]
Chuan Qin, Hengshu Zhu, Dazhong Shen, Ying Sun, Kaichun Yao, Peng Wang, and Hui Xiong. 2023. Automatic
Skill-Oriented Question Generation and Recommendation for Intelligent Job Interviews. ACM Trans. Inf. Syst. 42, 1,
Article 27 (aug 2023), 32 pages. https://doi.org/10.1145/3604552
[99]
Pooja Rao SB, Manish Agnihotri, and Dinesh Babu Jayagopi. 2021. Improving asynchronous interview interaction
with follow-up question generation. (2021).
[100]
Sowmya Rasipuram and Dinesh Babu Jayagopi. 2020. Automatic multimodal assessment of soft skills in social
interactions: a review. Multimedia Tools and Applications 79, 19-20 (2020), 13037–13060.
[101]
Samira Rasouli, Garima Gupta, Moojan Ghafurian, and Kerstin Dautenhahn. 2022. Proposed Applications of Social
Robots in Interventions for Children and Adolescents with Social Anxiety. In Proceedings of the Sixteenth International
Conference on Tangible, Embedded, and Embodied Interaction. 1–7.
[102]
Gregory Reddan. 2008. The benefits of job-search seminars and mock interviews in a work experience course.
International Journal of Work-Integrated Learning 9, 2 (2008), 113.
[103]
Peter J Rich and Michael Hannan. 2009. Video annotation tools: Technologies to scaffold, structure, and transform
teacher reflection. Journal of teacher education 60, 1 (2009), 52–67.
[104]
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Matt Turner. 2023. Steering
llama 2 via contrastive activation addition. arXiv preprint arXiv:2312.06681 (2023).
[105]
M Elena Rodríguez, Ana Elena Guerrero-Roldán, David Baneres, and Abdulkadir Karadeniz. 2022. An intelligent
nudging system to guide online learners. International Review of Research in Open and Distributed Learning 23, 1
(2022), 41–62.
[106] Carl R. Rogers and Richard E. Farson. 1984. Active Listening. Organizational Psychology 4th Ed. (1984), 255–266.
[107]
Steven I Ross, Fernando Martinez, Stephanie Houde, Michael Muller, and Justin D Weisz. 2023. The programmer’s
assistant: Conversational interaction with a large language model for software development. In Proceedings of the
28th International Conference on Intelligent User Interfaces. 491–514.
[108]
Michael Schaub and David M. Tokar. 2005. The role of personality and learning experiences in social cognitive career
theory. Journal of Vocational Behavior 66, 2 (April 2005), 304–325. https://doi.org/10.1016/j.jvb.2004.09.005
[109]
Beau G Schelble, Christopher Flathmann, Nathan J McNeese, Guo Freeman, and Rohit Mallick. 2022. Let’s think
together! Assessing shared mental models, performance, and trust in human-agent teams. Proceedings of the ACM on
Human-Computer Interaction 6, GROUP (2022), 1–29.
[110]
Donald A Schön. 1987. Educating the reflective practitioner: Toward a new design for teaching and learning in the
professions. Jossey-Bass.
[111]
Petr Slovák, Christopher Frauenberger, and Geraldine Fitzpatrick. 2017. Reflective practicum: A framework of
sensitising concepts to design for transformative reflection. In Proceedings of the 2017 CHI Conference on Human
Factors in Computing Systems. 2696–2707.
[112]
Brian H Spitzberg. 2000. What is good communication? Journal of the Association for Communication Administration
29, 1 (2000), 7.
[113]
John Stamper, Ruiwei Xiao, and Xinying Hou. 2024. Enhancing llm-based feedback: Insights from intelligent tutoring
systems and the learning sciences. In International Conference on Articial Intelligence in Education. Springer, 32–43.
[114] Anna Steen-Utheim and Anne Line Wittek. 2017. Dialogic feedback and potentialities for student learning. Learning, Culture and Social Interaction 15 (2017), 18–30.
[115] Anna Therese Steen-Utheim. 2019. Students’ sense making of feedback. Dialogue, interaction and emotions. (2019).
[116] Paul Sutton. 2009. Towards dialogic feedback. Critical and Reflective Practice in Education 1, 1 (2009).
[117] TED. [n. d.]. TED: Ideas worth spreading. https://www.ted.com/
[118] Jinal H Thakkar, Chinchu Thomas, and Dinesh Babu Jayagopi. 2023. Automatic assessment of communication skill in real-world job interviews: A comparative study using deep learning and domain adaptation. In Proceedings of the Fourteenth Indian Conference on Computer Vision, Graphics and Image Processing. 1–11.
[119] Chris Van Der Lee, Albert Gatt, Emiel Van Miltenburg, Sander Wubben, and Emiel Krahmer. 2019. Best practices for the human evaluation of automatically generated text. In Proceedings of the 12th International Conference on Natural Language Generation. 355–368.
[120] Vercel. [n. d.]. https://vercel.com/.
[121] VueJS. [n. d.]. https://vuejs.org/.
[122] Harry Weger Jr, Gina R Castle, and Melissa C Emmett. 2010. Active listening in peer interviews: The influence of message paraphrasing on perceptions of listening skill. The International Journal of Listening 24, 1 (2010), 34–49.
[123] Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, and Quoc V Le. 2023. Simple synthetic data reduces sycophancy in large language models. arXiv preprint arXiv:2308.03958 (2023).
[124] Kate Z Williams. 2008. Effects of practice and feedback on interview performance. Ph.D. Dissertation. Clemson University.
[125] Naomi E Winstone, Robert A Nash, James Rowntree, and Michael Parker. 2017. ‘It’d be useful, but I wouldn’t use it’: barriers to university students’ feedback seeking and recipience. Studies in Higher Education 42, 11 (2017), 2026–2041.
[126] Ying Xie, Fengfeng Ke, and Priya Sharma. 2008. The effect of peer feedback for blogging on college students’ reflective learning processes. The Internet and Higher Education 11, 1 (2008), 18–25.
[127] Yunfeng Zhang, Q Vera Liao, and Rachel KE Bellamy. 2020. Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 295–305.
[128] Hangyu Zhou, Yuichiro Fujimoto, Masayuki Kanbara, and Hirokazu Kato. 2021. Virtual reality as a reflection technique for public speaking training. Applied Sciences 11, 9 (2021), 3988.
A MAIN QUESTIONS DURING INTERVIEW SIMULATION
(1) "Tell me about yourself?",
(2) "How has your previous education and experience prepared you for this job?",
(3) "What do you consider to be your greatest strength and why?",
(4) "What do you consider to be your greatest weakness? How are you going about improving upon it?",
Note: each main question was followed by a generated follow-up question; a minimal sketch of how this question list might be represented is shown below.
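For concreteness, the sketch below shows one way the four main questions could be stored on the back end. The variable name MAIN_QUESTIONS and the plain-list structure are illustrative assumptions, not taken from the Conversate implementation.

```python
# Hypothetical representation of the four main questions above; the name
# MAIN_QUESTIONS is an assumption for illustration only.
MAIN_QUESTIONS = [
    "Tell me about yourself?",
    "How has your previous education and experience prepared you for this job?",
    "What do you consider to be your greatest strength and why?",
    "What do you consider to be your greatest weakness? "
    "How are you going about improving upon it?",
]
```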
B CONVERSATE PROMPTS
B.1 Prompt for Interview Simulation
B.1.1 First Prompt. You have a role as an interviewer for a Behavioral Job Interview for the job position <input_job>. Act naturally as an interviewer with a dynamic yet professional approach. Begin by saying ’Hi, nice to meet you,’ then introduce yourself as the Hiring Manager. Afterward, ask this initial question as the first question for the interview: <initial_question_1>
B.1.2 Subsequent Prompt for the Follow-up Question. As an interviewer, ask a relevant follow-
up question about the job based on the user’s previous answers and the ongoing conversation.
Ensure that your follow-up question is distinct from the questions listed [<initial_question_1>,
<initial_question_2>, ...], and avoid repeating your previous questions.
B.1.3 Subsequent Prompt for the Next Initial Questions. As an interviewer, smoothly transition to
the next question. Ask the interviewee the following question: <initial_question_i>.
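To show how the three B.1 prompts could fit together at run time, the sketch below interleaves the first prompt, the follow-up prompt, and the transition prompt into a single chat loop. The call_llm helper, the message format, and the function names are assumptions for illustration; this is a rough sketch, not the paper’s actual implementation.

```python
# Minimal sketch of the B.1 prompt flow, assuming a chat-style message list and
# the MAIN_QUESTIONS constant sketched in Appendix A.

def call_llm(messages):
    """Placeholder for whatever chat-completion endpoint the back end uses."""
    raise NotImplementedError

def first_prompt(job_title, question):
    # B.1.1: role instruction plus the opening interview question.
    return (
        "You have a role as an interviewer for a Behavioral Job Interview for the "
        f"job position {job_title}. Begin by saying 'Hi, nice to meet you,' then "
        "introduce yourself as the Hiring Manager. Afterward, ask this initial "
        f"question as the first question for the interview: {question}"
    )

def follow_up_prompt(asked_questions):
    # B.1.2: request one follow-up question distinct from those already asked.
    return (
        "As an interviewer, ask a relevant follow-up question about the job based on "
        "the user's previous answers and the ongoing conversation. Ensure that your "
        f"follow-up question is distinct from the questions listed {asked_questions}, "
        "and avoid repeating your previous questions."
    )

def next_question_prompt(question):
    # B.1.3: smooth transition to the next scripted main question.
    return (
        "As an interviewer, smoothly transition to the next question. "
        f"Ask the interviewee the following question: {question}."
    )

def simulate_interview(job_title, get_user_answer):
    """Each main question is followed by one generated follow-up, as noted in Appendix A."""
    messages = [{"role": "system", "content": first_prompt(job_title, MAIN_QUESTIONS[0])}]
    for i, question in enumerate(MAIN_QUESTIONS):
        if i > 0:
            messages.append({"role": "system", "content": next_question_prompt(question)})
        messages.append({"role": "assistant", "content": call_llm(messages)})
        messages.append({"role": "user", "content": get_user_answer()})
        messages.append({"role": "system", "content": follow_up_prompt(MAIN_QUESTIONS[: i + 1])})
        messages.append({"role": "assistant", "content": call_llm(messages)})
        messages.append({"role": "user", "content": get_user_answer()})
    return messages
```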
B.2 Prompt for AI-Highlighted Hints
Your task is to decide whether the way the user answers the interview is ’good’ or ’needs improvement’ based on the User Answer delimited by triple backticks. Format your answer as a string of ’good’ or ’needs improvement’. The answer is defined as ’needs improvement’ if it lacks some of these points:
(1) Lack of details
(2) If the interview can be answered using the STAR method but the user does not do it. For other questions like the introduction, the STAR method is not required.
(3) Also consider the interview answer in terms of Effectiveness, Appropriateness, Efficiency, Clarity, and Competence in answering the interview correctly.
User Answer: ```<answer>```
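As a rough sketch of how this prompt could drive the AI-highlighted hints, the snippet below classifies each user answer and collects the timestamps to highlight on the playback bar. The call_llm placeholder, the segment dictionary fields, and the string matching on the reply are illustrative assumptions rather than the paper’s implementation.

```python
# Sketch only: the B.2 criteria are abbreviated here; see the full prompt above.
HINT_PROMPT = (
    "Your task is to decide whether the way the user answers the interview is 'good' "
    "or 'needs improvement' based on the User Answer delimited by triple backticks. "
    "Format your answer as a string of 'good' or 'needs improvement'.\n"
    "User Answer: ```{answer}```"
)

def needs_improvement(answer):
    """Ask the LLM to label one answer; treat anything other than 'good' as needing work."""
    reply = call_llm([{"role": "system", "content": HINT_PROMPT.format(answer=answer)}])
    return "improvement" in reply.lower()

def hint_timestamps(transcript_segments):
    """Return playback-bar timestamps of user answers flagged for improvement."""
    return [
        segment["start_time"]
        for segment in transcript_segments
        if segment["speaker"] == "user" and needs_improvement(segment["text"])
    ]
```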
B.3 Prompt for Dialogic Feedback
You are a mentor, and your task is to respond to the conversation based on the transcript of an interview and considering user comments (although your answer may differ from the user’s comments). Ensure the response is concise and presented in bullet points.
(1) Ignore user typos and grammatical errors
(2) If the user seeks to enhance a segment of their interview response that is already satisfactory in terms of detail, effectiveness, appropriateness, efficiency, clarity, and task achievement, using the STAR method (Situation, Task, Action, Result) when applicable, affirm its adequacy and do not ask for further revisions.
(3) When improvement is necessary, offer constructive feedback.
(4) Provide an evaluation of the interview performance, kindly but honestly pointing out areas of weakness.
(5) Help the user to answer the question using the STAR method for feedback on performance-based questions, excluding it for introduction (tell me about yourself).
(6) Instead of suggesting the use of the STAR method, demonstrate it by breaking it down step by step.
(7) Start with the positives in the user’s response before offering areas for improvement.
(8) Limit your response to 200 words and format it in HTML.
Transcript: ```<transcript>``` Comment: ```<comment>```
<APPEND CONVERSATION>
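The sketch below illustrates one way the B.3 prompt and the <APPEND CONVERSATION> placeholder could be combined into a single dialogic-feedback turn. The message structure, history handling, and function name are assumptions for illustration, not the paper’s code.

```python
def dialogic_feedback_turn(transcript, comment, history, user_message):
    """One dialogic-feedback turn: B.3 system prompt + annotated context + prior chat."""
    system = (
        "You are a mentor, and your task is to respond to the conversation based on the "
        "transcript of an interview and considering user comments. "  # rules (1)-(8) omitted
        f"Transcript: ```{transcript}``` Comment: ```{comment}```"
    )
    messages = [{"role": "system", "content": system}]
    messages += history                       # <APPEND CONVERSATION>: earlier feedback turns
    messages.append({"role": "user", "content": user_message})
    reply = call_llm(messages)                # placeholder LLM call, as in the B.1 sketch
    history.append({"role": "user", "content": user_message})
    history.append({"role": "assistant", "content": reply})
    return reply
```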
C DIALOGIC FEEDBACK EXAMPLE
Sample Annotated Transcript
Assistant: Can you tell me about a specific project that you worked on during your coursework that presented a particularly challenging programming problem and how you went about solving it?
User: I had challenges debugging a specific program that involved a particular game, and I was unable to finish the program.
Sample Dialogic Feedback Interaction
User: How can I improve this part?
Assistant: Indeed, your response could use more detail. Let’s use the STAR method to improve it:
Situation: Describe the context of the project. What was the game about? Why was it
significant?
Task: What was your specic role or task in this project? What was the problem you were
trying to solve?
Action: What steps did you take to debug the program? Did you use any specific strategies
or tools?
Result: What was the outcome of your actions? Did you manage to debug the program
eventually?
By providing these details, you’ll give a more comprehensive and engaging answer.
Assistant: Do you want to try saying this part again in a better way? I can give you feedback again
based on that.
User: One of the biggest challenges I had in my coursework was a particular debugging issue in one
of my programs where I was developing a game system. I had challenges in the implementation and
I had trouble debugging a feature that I had never used before. What I learned from this experience
was that I was able to collaborate with my team and although we didn’t finish the program in time,
I ended up a better programmer and a better debugger.
Assistant: Great improvement! You’ve provided more context and shared what you learned from
the experience. However, let’s make it even better:
Situation: During my coursework, I was developing a game system.
Task: I encountered a challenging debugging issue in a feature I had never used before.
Action: Despite the difficulties, I collaborated with my team to try and resolve the issue.
Result: Although we didn’t finish the program in time, I ended up becoming a better programmer and debugger from the experience.
This format provides a clear and concise response to the interviewer’s question.
User: During my coursework, I was developing a game system. I had challenges in the implementation and I had trouble debugging a feature that I had never used before. I tried to resolve the issue by collaborating with my team. Although we didn’t finish the program in time, I ended up learning
a lot to be a better programmer and debugger from the experience.
Assistant: Excellent! Your response is now clear, concise, and informative. You’ve effectively used the STAR method to describe the situation, your task, the actions you took, and the results of your efforts. This gives the interviewer a good understanding of your problem-solving skills and ability
to learn from challenging situations. Well Done!
Received 8 May 2024; revised 21 August 2024