AI-based Multilingual Interactive Exam Preparation
Tim Schlippe and Jörg Sawatzki
IU International University of Applied Sciences
tim.schlippe@iu.org
Abstract. Our previous analysis of 26 languages, which represent over 2.9 billion speakers and 8 language families, demonstrated that cross-lingual automatic short answer grading allows students to write exam answers in their native language and graders to rely on the scores of the system [1]. With deviations lower than 14% (0.72 points out of 5 points) on the short answer grading data set of the University of North Texas [2], our investigated natural language processing (NLP) models perform better than the human grader variability (0.75 points, 15%). In this paper, we describe our latest analysis of the integration and application of these NLP models in interactive training programs to optimally prepare students for exams: We present a multilingual interactive conversational artificial intelligence tutoring system for exam preparation. Our approach leverages and combines learning analytics, crowdsourcing, and gamification to automatically evaluate and adapt the system as well as to motivate students and improve their learning experience. To achieve an optimal learning effect and enhance the user experience, we also tackle the challenge of explainability with the help of keyword extraction and highlighting techniques. Our system is based on Telegram, since it can be easily integrated into massive open online courses and other online study systems and already has more than 400 million users worldwide [3].
Keywords: artificial intelligence in education, cross-lingual short answer grading, conversational AI, keyword extraction, natural language processing.
1 Introduction
Access to education is one of people's most important assets, and ensuring inclusive and equitable quality education is Goal 4 of the United Nations' Sustainable Development Goals [4]. Massive open online courses and other online study opportunities are providing easier access to education for more and more people around the world. However, one big challenge remains the language barrier: Most courses are offered in English, but only 16% of the world population speaks English [5]. To reach the remaining people with online study opportunities, courses would need to support more languages. The linguistic challenge is especially evident in written exams, which are usually not provided in the student's native language. To overcome these inequities, we present and analyze a multilingual interactive conversational artificial intelligence tutoring system for exam preparation (multilingual exam trainer).
Our system is based on a Multilingual Bidirectional Encoder Representations from Transformers model (M-BERT) [6] and is able to fairly score free-text answers in 26 languages in a fully-automatic way (en, ceb, sv, de, fr, nl, ru, it, es, pl, vi, ja, zh, ar, uk, pt, fa, ca, sr, id, no, ko, fi, hu, cs, sh) [1]. Thus, foreign students have the possibility to write answers in their native language during exam preparation. Since the underlying multilingual NLP model was pre-trained on a total of 104 languages, our exam trainer can easily be extended with new languages.
Fig. 1 illustrates the concept of our multilingual exam trainer: Iteratively, an exam question is displayed to the user. The user enters the answer (student answer) using our chatbot interface. Then the student answer is processed with two AI models, the multilingual automatic short answer grading (ASAG) model and the keyword matching model, which deliver quantitative feedback in terms of a score and qualitative feedback by displaying the model answer and highlighting the keywords that match between the student answer and the model answer.
Fig. 1. Concept of the AI-based Interactive Exam Preparation.
To evaluate our approach, we conducted a study where students, former students, and
people who enjoy continuing education used our implementation and then completed
a questionnaire.
In the next section, we present other researchers' latest approaches to the components of our multilingual exam trainer. Sec. 3 describes our specific implementation. Sec. 4 delineates our experimental setup. Our experiments and results are outlined in Sec. 5. We conclude our work in Sec. 6 and suggest further steps.
2 Related Work
The research area "AI in Education" addresses the application and evaluation of Artificial Intelligence (AI) methods in the context of education and training [7]. One of the main focuses of this research is to analyze and improve teaching and learning processes with natural language processing (NLP) models. In the following sections, we describe the use of NLP components in related work for multilingual NLP, ASAG, conversational AI, and keyword extraction to address the challenges of our system.
2.1 Multilingual Natural Language Processing Models
To allow users of our system to answer the exam questions in their native language, we used a multilingual NLP model and adapted it to the task of ASAG. Multilingual NLP models are provided by multiple institutions, e.g., M-BERT [6], RoBERTa [8], or XLM-R [9]. They have the benefit that they can be adapted to a certain task with task-specific labeled text data in one or more languages (transfer learning) and then perform the learned task in other languages (cross-lingual transfer) [10].
To give the users of our system qualitative feedback on their answers, we used M-BERT, which is pre-trained on monolingual corpora in 104 languages [10], as the basic multilingual model and adapted it to the task of cross-lingual ASAG.
2.2 Automatic Short Answer Grading
ASAG helps us provide feedback on the student answer in the form of a score. The field of ASAG is becoming more relevant since many educational institutions, public and private, already conduct their courses and examinations online [1,12].
A good overview of approaches in ASAG before the deep learning era is given in [11]. [12] investigate and compare state-of-the-art deep learning techniques for ASAG. Their experiments demonstrate that systems based on BERT performed best for English and German. [13] report that their multilingual RoBERTa model [8] shows stronger generalization across languages on English and German.
We extend ASAG to 26 languages and use the smaller M-BERT model [10] to conduct a larger study of cross-lingual transfer [1].
2.3 Conversational AI
For the interaction with the users, we used a conversational AI that takes the input from the users and sends messages based on a dialog flow. The messages of the conversational AI contain the exam question, the score of the student answer, the model answer with highlighted keywords, information about the progress, and motivational phrases.
Conversational assistants in education enable learners to access data and services and exchange information by simulating human-like conversations in the form of a natural language dialogue on a given topic [14]. There are various technologies, frameworks, and services for building a conversational AI, such as Rasa [15], Google Dialogflow [16], or Telegram [17].
Our conversational AI is based on Telegram [17] since it can be easily integrated into massive open online courses and other online study systems and already has more than 400 million users worldwide [3]. However, it can be ported to other chatbot technologies as well. To provide our conversational AI's messages in the students' native languages, we translated them into our 26 languages using Google's Neural Machine Translation System [23]. An overview of the system's BLEU scores across languages is given in [23]. We did not post-correct the translations, as we wanted to check whether our system already delivers a good user interface in different languages out of the box.
2.4 Keyword Extraction and Semantic Similarity
To show our users the difference between the student answer and the model answer, we highlight the keywords and their synonyms which are contained in both the student answer and the model answer. This combines two tasks: keyword extraction and semantic similarity.
Good overviews of automatic keyword and keyphrase extraction are provided in [18] and [19]. A survey of the evolution of semantic similarity is given in [20]. The latest trend for both tasks is to embed words into a semantic vector space, i.e., to work with word embeddings, since semantically similar words are located close to each other in this vector space.
In our system, spaCy [21] is used to exclude stop words, convert the remaining words into vectors, and compute the word similarities.
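As a simple illustration, the following sketch compares word vectors with spaCy; the model name en_core_web_md and the example words are chosen for illustration only:

import spacy

# Load a spaCy model that ships with word vectors
nlp = spacy.load("en_core_web_md")
doc = nlp("stack queue banana")
stack, queue, banana = doc[0], doc[1], doc[2]

# Semantically related words yield a higher cosine similarity
print(stack.similarity(queue))   # relatively high
print(stack.similarity(banana))  # relatively low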
3 AI-based Interactive Exam Preparation
In this section, we describe the components of our multilingual exam trainer and how they were implemented.
3.1 Dialog Flow of our Conversational AI
Fig. 2 shows the dialog flow of our exam trainer with the following steps; a minimal code sketch of this loop follows Fig. 2:
1. The user activates the conversational AI with the /start command.
2. The conversational AI welcomes the user and presents a list of 26 languages to select from.
3. The conversational AI asks the user a question in the selected language.
4. The user types the answer (any of the 104 languages used in M-BERT is possible).
5. The conversational AI gives feedback in terms of a score and highlights similarities between the student answer and the model answer.
6. If the total points collected are equal to or greater than a given threshold, the goal is reached and the game ends.
7. Otherwise, the user is presented with another student's answer that he or she needs to score, considering the given model answer.
8. The dialog proceeds with step 3.
Fig. 2. Dialog Flow of the AI-based Interactive Exam Preparation.
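The following minimal sketch shows how such a dialog loop could be wired up with the python-telegram-bot library (v13-style synchronous API); the bot token, texts, question pool, and the score_answer stub are illustrative placeholders, not our actual implementation:

from telegram.ext import Updater, CommandHandler, MessageHandler, Filters

QUESTIONS = ["What is a stack?"]  # placeholder question pool

def score_answer(answer):
    # Placeholder: the real system calls the fine-tuned M-BERT ASAG model
    return 4.0

def start(update, context):
    # Steps 1-3: greet the user, offer the language selection, ask a question
    update.message.reply_text("Welcome! Please select your language.")
    update.message.reply_text(QUESTIONS[0])

def handle_answer(update, context):
    # Steps 4-5: score the student answer and send feedback
    score = score_answer(update.message.text)
    update.message.reply_text("Your answer scored %.1f out of 5 points." % score)
    update.message.reply_text(QUESTIONS[0])  # continue with the next question

updater = Updater("BOT_TOKEN")  # placeholder token
updater.dispatcher.add_handler(CommandHandler("start", start))
updater.dispatcher.add_handler(
    MessageHandler(Filters.text & ~Filters.command, handle_answer))
updater.start_polling()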
3.2 Gamification and Motivation
Users have the motivation to use our multilingual exam trainer to improve answering
open exam questions. However, studies have shown that gamification creates another
incentive in learning [22]. To give the users of our system this further incentive, we
came up with the following gamification approach: Users are in space and have the
goal to fly with their spaceship from Earth to Mars. To get closer to Mars with the
spaceship, the users have to answer the displayed exam questions. The points for the
answers are converted into kilometers. With better answers, the users get more points
and get to Mars faster. Based on the achievement in the student answer, the user is
praised and motivated by certain phrases, e.g., “Awesome, that gives us fuel for 3
million more kilometers” and with information of the distance to go. Fig. 3 illustrates
our gamification in the conversation between a Dutch user and our conversational AI.
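A minimal sketch of this points-to-kilometers logic is given below; the total distance and the conversion factor are illustrative assumptions rather than the exact values used in our system:

DISTANCE_EARTH_MARS_KM = 78_000_000  # rough average Earth-Mars distance
KM_PER_POINT = 3_000_000             # hypothetical conversion factor

def progress_message(score, km_travelled):
    # Convert the achieved points into kilometers towards Mars
    km_gained = int(score * KM_PER_POINT)
    km_travelled += km_gained
    km_left = max(DISTANCE_EARTH_MARS_KM - km_travelled, 0)
    message = ("Awesome, that gives us fuel for {:,} more kilometers! "
               "{:,} km to Mars.".format(km_gained, km_left))
    return message, km_travelled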
3.3 Quantitative Feedback: Multilingual Automatic Short Answer Grading
The AI model that processes the student answers in their native language and provides the user with quantitative feedback in terms of a score is based on M-BERT. The model was downloaded and fine-tuned through the Transformers library. We trained 6 epochs with a batch size of 8, using the AdamW optimizer with an initial learning rate of 0.00004. We supplemented each fine-tuned BERT model with a linear regression layer that outputs a prediction of the score given an answer. The model expects the model answer and the student answer as input.
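The following minimal sketch with the Hugging Face Transformers library mirrors this setup; the example answer pair is illustrative, and data loading as well as the training loop are omitted:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
# num_labels=1 yields a single linear output, i.e., a regression head
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-5)

# The model receives the model answer and the student answer as a pair
inputs = tokenizer("A stack is a last-in-first-out data structure.",
                   "It stores elements and removes the newest one first.",
                   return_tensors="pt", truncation=True, padding=True)
predicted_score = model(**inputs).logits  # predicted points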
Fig. 3. Conversation with greeting, language selection, exam question,
student answer, scoring, model answer and motivation.
The ASAG data set of the University of North Texas [2] provided the exam questions, model answers, and training data for fine-tuning M-BERT. It contains 87 questions with corresponding model answers and on average 28.1 manually graded student answers per question on the topic of data structures from undergraduate studies.
After fine-tuning with this original English ASAG data, our model would be able to receive a model answer together with a student answer in one of the other 103 languages and return a score in terms of points, without the need for fine-tuning with ASAG data in those languages (cross-lingual transfer). However, since we found that adding translations of the ASAG data in more languages even improves fine-tuning, we added translations in the 5 languages German, Dutch, Finnish, Japanese, and Chinese [1]. With Mean Absolute Errors between 0.41 and 0.72 points out of 5 points in our analysis of the 26 covered languages, our model has even less discrepancy than the two graders who graded the ASAG corpus of the University of North Texas with a discrepancy of 0.75 points [2].
To provide the exam questions and the model answers in our multilingual exam trainer in 26 languages and to obtain the translations in the 5 listed languages for fine-tuning M-BERT, we used Google's Neural Machine Translation System [23]. This system is also used by other researchers who experiment with multilingual NLP models, since it comes close to the performance of professional translators [24].
3.4 Qualitative Feedback: Keyword Extraction and Highlighting
Fig. 5 shows the keyword highlighting in a snippet of the conversation between the user and the chatbot. For simplicity, we have implemented keyword extraction and highlighting for English only in our prototype. Porting the method to other languages is possible using word vectors and a distance measure.
Our algorithm for keyword extraction and highlighting is shown in Fig. 6. The input consists of the word tokens of the model answer and the word tokens of the student answer.
Fig. 5. Conversation with Keyword Highlighting.
import spacy

# Requires a spaCy model with word vectors, e.g. en_core_web_md
nlp = spacy.load("en_core_web_md")
model_answer = nlp(model_answer_text)
student_answer = nlp(student_answer_text)

# Iterate through all tokens in the model answer
for model_token in model_answer:
    # Process only alphabetic tokens that are not in the stop word list
    if not model_token.is_stop and model_token.is_alpha:
        # Iterate through the tokens in the student's answer
        for answer_token in student_answer:
            # Highlight both tokens if the answer token is alphabetic and
            # not a stop word, and the cosine similarity of the two word
            # vectors exceeds the given threshold
            if not answer_token.is_stop and answer_token.is_alpha and \
                    model_token.similarity(answer_token) > THRESHOLD:
                highlight(model_token)
                highlight(answer_token)
Fig. 6. Algorithm for keyword extraction and highlighting.
We iterate over the word tokens in the model answer and over the word tokens in the student answer and skip stop words and word tokens which do not consist of alphabetic characters. Each remaining word token of the model answer and the student answer is converted into a word vector. Then the word vectors of the model answer are compared with the word vectors of the student answer. If the similarity between two word vectors exceeds a threshold, we consider the corresponding words as synonyms and highlight them.
3.5 Crowdsourcing and Peer-Reviewing
In order to continuously improve our multilingual ASAG model with high-quality human-labeled training data in a crowdsourcing approach, the user also has the task of scoring another student's answer as part of the game (step 7 in Sec. 3.1). Studies such as [25] have shown that peer-based grading can be as effective as grading by a professional. Consequently, the same student answer is shown to different users. This peer-review process makes it possible to detect and filter outliers which would have a negative impact on the model. However, this process also has another advantage for the user: The student deals with the question again, but this time from a different perspective.
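A minimal sketch of such outlier filtering is given below; the median-based rule and the tolerance of 1.5 points are illustrative assumptions:

from statistics import median

def aggregate_peer_scores(scores, tolerance=1.5):
    m = median(scores)
    # Drop peer scores far from the median; such outliers would add
    # noisy labels to the ASAG training data
    kept = [s for s in scores if abs(s - m) <= tolerance]
    return sum(kept) / len(kept)

print(aggregate_peer_scores([4.0, 4.5, 5.0, 0.5]))  # -> 4.5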
4 Experimental Setup
In this section, we describe the structure of our questionnaire and the participants.
4.1 Questionnaire Design
To evaluate our approach, we conducted a study where students, former students, and people who enjoy continuing education first tried our exam trainer and then completed a questionnaire. The study was conducted on a subset of the possible languages and examined 5 different aspects: learning experience, user experience, motivation, quality of the NLP models, and gamification. Our questionnaire contains the following 4 parts:
1. General questions about the scenario of a multilingual interactive conversational AI tutoring system for exam preparation.
2. Specific questions concerning our implementation.
3. Specific questions concerning extensions and improvements.
4. Personal questions (profile and demographic information).
To obtain detailed results, we asked for a score where it makes sense. The score range follows the rules of a forced-choice Likert scale, ranging from (1) strongly disagree to (5) strongly agree.
4.2 Participants
51 people from 6 countries filled out our questionnaire, giving us a first impression of the quality and impact of our system. Most were students from the University of Osnabrück, IU International University of Applied Sciences, Karlsruhe Institute of Technology, and Karlsruhe University of Applied Sciences. These people tested our exam trainer in German (64.7%), English (21.6%), Dutch (3.9%), Italian (3.9%), French (3.9%), and Spanish (2.0%).
5 Experiments and Results
As described, our study examined 5 different aspects: learning experience, user experience, motivation, quality of the NLP models, and gamification.
5.1 Learning Experience
Fig. 7 shows that participants responded positively to questions about improving the learning experience, meaningfulness, use, and helping fellow students, both in general and for our implementation. The majority also believe that our implementation can accelerate the learning process and that scoring other students' answers is helpful. Opinions are more divided on the questions of whether the exam trainer is useful for first getting familiar with the subject in one's native language when the actual exam is in English anyway, and whether it can help elderly people to study online. The difference in distribution for the last question about support for elderly people shows that most participants generally rate it as "neutral", while our system scores a bit lower. This feedback, plus comments from the participants on this topic, leads us to believe that it is possible to optimize such a system in cooperation with elderly people.
Fig. 7. Learning Experience.
Fig. 8. Motivation.
Fig. 9. User Experience.
5.2 Motivation
Fig. 8 shows a tendency for such an exam trainer in general and for our implementation to motivate people to prepare more for exams.
5.3 User Experience
Fig. 9 indicates that the clear majority of participants find that our interface is easy to
use and that operating our exam trainer is fun.
5.4 Quality of Natural Language Processing Models
Fig. 10 shows that the clear majority of participants rate the machine-translated questions as linguistically correct and understandable. This suggests that post-correcting the translations is not necessary. The scoring with the help of the ASAG model was only rated as average. Through the users' comments, we learned that many users had entered random words as answers and received points for them. This was because such random answers did not appear in the training data of our ASAG model and therefore could not be scored correctly. The training data was taken from exams, where usually no student dares to enter "I don't know" as an answer. Here we see potential for improvement through the evaluation by other students and through simple rules that score such entries with 0 points. Our explainability approach with keyword highlighting was well rated. However, we did not get as much feedback on it because it was only implemented in English.
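A minimal sketch of such a rule is given below, reusing spaCy word vectors; the similarity threshold is an illustrative assumption:

import spacy

nlp = spacy.load("en_core_web_md")

def gets_zero_points(student_answer, model_answer, threshold=0.3):
    if not student_answer.strip():
        return True  # empty answers get 0 points
    # doc.similarity compares averaged word vectors; random or
    # off-topic answers score well below on-topic ones
    return nlp(student_answer).similarity(nlp(model_answer)) < threshold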
5.5 Gamification
Fig. 11 shows that the clear majority of participants like the story and the theme of the game. This demonstrates that even with a simple story, like the trip to Mars, and without special graphical features, good gamification can be created.
Fig. 10. Quality of NLP Models.
Fig. 11. Gamification.
6 Conclusion and Future Work
We presented a multilingual interactive conversational AI tutoring system for exam preparation which combines multilingual NLP components, ASAG, conversational AI, keyword extraction, learning analytics, crowdsourcing, and gamification. With this multilingual exam trainer, we received positive feedback in a survey regarding learning experience, user experience, motivation, quality of NLP models, and gamification. The results of our survey support our proof of concept, in which users have tested 6 languages so far. Future work may include the extension to other languages. In addition, we would like to further address the issue of explainability to provide even better support to the users of our multilingual exam trainer. To optimize the dialog, it could be investigated how to create a more emotional dialog in written form, e.g., by visualizing voice characteristics and emotions in the textual representation [26,27].
References
1. Sawatzki, J., Schlippe, T.: Cross-Lingual Automatic Short Answer Grading. In: AIED, Utrecht, Netherlands (2021).
2. Mohler, M., Bunescu, R., Mihalcea, R.: Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments. In: ACL-HLT, Portland, Oregon, USA (2011).
3. Statista: Number of Monthly Active Telegram Users Worldwide from March 2014 to April 2020 (2021), https://www.statista.com/statistics/234038/telegram-messenger-mau-users
4. United Nations: Sustainable Development Goals: 17 Goals to Transform our World (2021), https://www.un.org/sustainabledevelopment/sustainable-development-goals
5. Statista: The Most Spoken Languages Worldwide in 2019 (2020), https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide
6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: NAACL-HLT, Minneapolis, Minnesota (2019).
7. Libbrecht, P., Declerck, T., Schlippe, T., Mandl, T., Schiffner, D.: NLP for Student and Teacher: Concept for an AI based Information Literacy Tutoring System. In: CIKM, Galway, Ireland (2020).
8. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 (2019).
9. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised Cross-lingual Representation Learning at Scale. arXiv:1911.02116 (2019).
10. Pires, T., Schlinger, E., Garrette, D.: How Multilingual is Multilingual BERT? In: ACL, Florence, Italy (2019).
11. Burrows, S., Gurevych, I., Stein, B.: The Eras and Trends of Automatic Short Answer Grading. IJAIED 25, 60–117 (2014).
12. Camus, L., Filighera, A.: Investigating Transformers for Automatic Short Answer Grading. In: AIED, Cyberspace (2020).
13. Sawatzki, J., Schlippe, T., Benner-Wickner, M.: Deep Learning Techniques for Automatic Short Answer Grading: Predicting Scores for English and German Answers. In: AIET, Wuhan, China (2021).
14. Wölfel, M.: Towards the Automatic Generation of Pedagogical Conversational Agents from Lecture Slides. In: EAI ICMTEL, Cyberspace (2021).
15. Bocklisch, T., Faulkner, J., Pawlowski, N., Nichol, A.: Rasa: Open Source Language Understanding and Dialogue Management. arXiv:1712.05181 (2017).
16. Reyes, R., Garza, D., Garrido, L., De la Cueva, V., Ramirez, J.: Methodology for the Implementation of Virtual Assistants for Education Using Google Dialogflow. In: MICAI, Xalapa, Mexico (2019).
17. Setiaji, H., Paputungan, I.V.: Design of Telegram Bots for Campus Information Sharing. In: ICITDA, Yogyakarta, Indonesia (2017).
18. Hasan, K.S.: Automatic Keyphrase Extraction: A Survey of the State of the Art. In: ACL, Baltimore, Maryland, USA (2014).
19. Alami Merrouni, Z., Frikh, B., Ouhbi, B.: Automatic Keyphrase Extraction: A Survey and Trends. JIIS 54, 391–424 (2020).
20. Chandrasekaran, D., Mago, V.: Evolution of Semantic Similarity - A Survey. arXiv:2004.13820 (2021).
21. Honnibal, M., Montani, I.: spaCy, https://spacy.io
22. de Sousa Borges, S., Durelli, V.H.S., Reis, H.M., Isotani, S.: A Systematic Mapping on Gamification Applied to Education. In: SAC, New York, NY, USA (2014).
23. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144 (2016).
24. Aiken, M.: An Updated Evaluation of Google Translate Accuracy. Studies in Linguistics and Literature 3, 253 (2019).
25. Luo, H., Robinson, A., Park, J.-Y.: Peer Grading in a MOOC: Reliability, Validity, and Perceived Effects. Online Learning 18(2), 1–14 (2014).
26. Wölfel, M., Schlippe, T., Stitz, A.: Voice Driven Type Design. In: SpeD, Bucharest, Romania (2015).
27. Schlippe, T., Alessai, S., El-Taweel, G., Wölfel, M., Zaghouani, W.: Visualizing Voice Characteristics with Type Design in Closed Captions for Arabic. In: Cyberworlds, Caen, France (2020).
Recent advancements in the field of deep learning for natural language processing made it possible to use novel deep learning architectures, such as the Transformer, for increasingly complex natural language processing tasks. Combined with novel unsupervised pre-training tasks such as masked language modeling, sentence ordering or next sentence prediction, those natural language processing models became even more accurate. In this work, we experiment with fine-tuning different pre-trained Transformer based architectures. We train the newest and most powerful, according to the glue benchmark, transformers on the SemEval-2013 dataset. We also explore the impact of transfer learning a model fine-tuned on the MNLI dataset to the SemEval-2013 dataset on generalization and performance. We report up to 13% absolute improvement in macro-average-F1 over state-of-the-art results. We show that models trained with knowledge distillation are feasible for use in short answer grading. Furthermore, we compare multilingual models on a machine-translated version of the SemEval-2013 dataset.