AI-based Multilingual Interactive Exam Preparation
Tim Schlippe and Jörg Sawatzki
IU International University of Applied Sciences
tim.schlippe@iu.org
Abstract. Our previous analysis on 26 languages, which represent over 2.9 billion speakers and 8 language families, demonstrated that cross-lingual automatic short answer grading allows students to write exam answers in their native language and graders to rely on the scores of the system [1]. With deviations lower than 14% (0.72 points out of 5 points) on the short answer grading data set of the University of North Texas [2], our investigated natural language processing (NLP) models perform better than the human grader variability (0.75 points, 15%). In this paper we describe our latest analysis of the integration and application of these NLP models in interactive training programs to optimally prepare students for exams: We present a multilingual interactive conversational artificial intelligence tutoring system for exam preparation. Our approach leverages and combines learning analytics, crowdsourcing, and gamification, which allows us to automatically evaluate and adapt the system as well as to motivate students and increase their learning experience. To achieve an optimal learning effect and enhance the user experience, we also tackle the challenge of explainability with the help of keyword extraction and highlighting techniques. Our system is based on Telegram since it can be easily integrated into massive open online courses and other online study systems and already has more than 400 million users worldwide [3].
Keywords: artificial intelligence in education, cross-lingual short answer grading, conversational AI, keyword extraction, natural language processing.
1 Introduction
Access to education is one of people's most important assets, and ensuring inclusive and equitable quality education is Goal 4 of the United Nations' Sustainable Development Goals [4]. Massive open online courses and other online study opportunities are
providing easier access to education for more and more people around the world.
However, one big challenge is still the language barrier: Most courses are offered in
English, but only 16% of the world population speaks English [5]. To reach the rest of
the people with online study opportunities, courses would need to better support more
languages. The linguistic challenge is especially evident in written exams, which are
usually not provided in the student's native language. To overcome these inequities,
we present and analyze a multilingual interactive conversational artificial intelligence
tutoring system for exam preparation (multilingual exam trainer).
Our system is based on a Multilingual Bidirectional Encoder Representations from Transformers model (M-BERT) [6] and is able to fairly score free-text answers in 26 languages in a fully automatic way (en, ceb, sv, de, fr, nl, ru, it, es, pl, vi, ja, zh, ar, uk, pt, fa, ca, sr, id, no, ko, fi, hu, cs, sh) [1]. Thus, foreign students have the possibility to write answers in their native language during exam preparation. Since the multilingual NLP model we use has been pre-trained on a total of 104 languages, our exam trainer can easily be extended with new languages.
Fig. 1 illustrates the concept of our multilingual exam trainer: Iteratively, an exam question is displayed to the user. The user enters the answer (student answer) using our chatbot interface. Then the student answer is processed with two AI models, the multilingual automatic short answer grading (ASAG) model and the keyword matching model, which deliver quantitative feedback in terms of a score and qualitative feedback by displaying the model answer and highlighting the keywords that match between the student answer and the model answer.
Fig. 1. Concept of the AI-based Interactive Exam Preparation.
To evaluate our approach, we conducted a study where students, former students, and
people who enjoy continuing education used our implementation and then completed
a questionnaire.
In the next section, we present the latest approaches of other researchers for the components of our multilingual exam trainer. Sec. 3 describes our specific implementation. Sec. 4 delineates our experimental setup. Our experiments and results are outlined in Sec. 5. We conclude our work in Sec. 6 and suggest further steps.
2 Related Work
The research area “AI in Education” addresses the application and evaluation of Artificial Intelligence (AI) methods in the context of education and training [7]. One of the main focuses of this research is to analyze and improve teaching and learning processes with natural language processing (NLP) models. In the following sections, we describe the use of NLP components in related work for multilingual NLP, ASAG, conversational AI, and keyword extraction to address the challenges of our system.
2.1 Multilingual Natural Language Processing Models
To allow users of our system to answer the exam questions in their native language, we used a multilingual NLP model and adapted it to the task of ASAG. Multilingual NLP models are provided by multiple institutions, e.g., M-BERT [6], RoBERTa [8], or XLM-R [9]. They have the benefit that they can be adapted to a certain task with task-specific labeled text data in 1 or more languages (transfer learning) and then perform this learned task in other languages (cross-lingual transfer) [10].
To give the users of our system quantitative feedback on their answers, we used M-BERT, which is pre-trained on monolingual corpora in 104 languages [10], as the basic multilingual model and adapted it to the task of cross-lingual ASAG.
2.2 Automatic Short Answer Grading
ASAG helps us provide feedback on the student answer in the form of a score. The field of ASAG is becoming more relevant since many educational institutions, public and private, already conduct their courses and examinations online [1,12].
A good overview of approaches in ASAG before the deep learning era is given in [11]. [12] investigate and compare state-of-the-art deep learning techniques for ASAG. Their experiments demonstrate that systems based on BERT performed best for English and German. [13] report that their multilingual RoBERTa model [8] shows a stronger generalization across languages for English and German.
We extend ASAG to 26 languages and use the smaller M-BERT [10] model to
conduct a larger study concerning the cross-lingual transfer [1].
2.3 Conversational AI
For the interaction with the users, we used a conversational AI that takes the input from the users and sends messages based on a dialog flow. The messages of the conversational AI contain the exam question, the student answer score, the model answer with highlighted keywords, information about the progress, and motivational phrases.
Conversational assistants in education enable learners to access data and services
and exchange information by simulating human-like conversations in the form of a
natural language dialogue on a given topic [14]. There are various technologies,
frameworks, and services for building a conversational AI, such as Rasa [15], Google
DialogFlow [16] or Telegram [17].
Our conversational AI is based on Telegram [17] since it can be easily integrated into massive open online courses and other online study systems and already has more than 400 million users worldwide [3]. However, it can be ported to other chatbot technologies as well. To provide our conversational AI's messages in the students' native languages, we translated them into our 26 languages using Google's Neural Machine Translation System [23]. An overview of the system's BLEU scores over languages is given in [23]. We did not post-correct the translations, as we wanted to check whether our system already delivers a good user interface in different languages out of the box.
2.4 Keyword Extraction and Semantic Similarity
To explain to our users the difference between the student answer and the model answer, we highlight the keywords and their synonyms which are contained in both the student answer and the model answer. This combines two tasks: Keyword extraction and semantic similarity.
Good overviews of automatic keyword and keyphrase extraction are provided in [18] and [19]. A survey of the evolution in semantic similarity is given in [20]. The latest trend for both tasks is to embed the words into a semantic vector space, i.e., to work with word embeddings, since semantically similar words are located close to each other in vector space.
In our system, spaCy [21] is used to exclude stop words, convert the remaining words into vectors, and compute the word similarities.
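As an illustration of this approach, the following minimal spaCy sketch (assuming the en_core_web_md pipeline with word vectors is installed; the example words are ours, not from our system) computes word similarities after filtering stop words:

import spacy

# Load a spaCy pipeline that ships with word vectors
nlp = spacy.load("en_core_web_md")

# Semantically similar words lie close together in vector space,
# so their cosine similarity is high
doc = nlp("array list")
print(doc[0].similarity(doc[1]))

# Stop words and non-alphabetic tokens are excluded beforehand
tokens = [t for t in nlp("A stack is a LIFO data structure.")
          if not t.is_stop and t.is_alpha]
print([t.text for t in tokens])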
3 AI-based Interactive Exam Preparation
In this section we describe the components of our multilingual exam trainer and how they were implemented.
3.1 Dialog Flow of our Conversational AI
Fig. 2 shows the dialog flow of our exam trainer with the following steps (a minimal bot sketch is given below Fig. 2):
1. The user activates the conversational AI with the /start command.
2. The conversational AI welcomes the user and presents a list of 26 languages to select from.
3. The conversational AI asks the user a question in the selected language.
4. The user types the answer (any of the 104 languages covered by M-BERT is possible).
5. The conversational AI gives feedback in terms of a score and highlights similarities between the student answer and the model answer.
6. If the total points collected are equal to or greater than THRESHOLD, the goal is reached and the game ends.
7. Otherwise, the user is presented with another student's answer that he or she needs to score, considering the given model answer.
8. Proceed with step 3.
Fig. 2. Dialog Flow of the AI-based Interactive Exam Preparation.
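This dialog flow can be realized with a simple message loop. The following sketch is based on the python-telegram-bot library (version 13 style); the bot token and the grade_answer/next_question helpers are hypothetical placeholders, not our production code:

from telegram.ext import Updater, CommandHandler, MessageHandler, Filters

def grade_answer(student_answer):
    # Hypothetical stub: in the real system, the M-BERT ASAG model and
    # the keyword matcher produce the score and the qualitative feedback
    return 3.5, "Model answer: A stack is a LIFO data structure."

def next_question():
    # Hypothetical stub for the next exam question (steps 3 and 8)
    return "What is a stack?"

def start(update, context):
    # Steps 1-3: welcome the user, offer the language selection,
    # then ask the first exam question
    update.message.reply_text("Welcome! Please select your language.")
    update.message.reply_text(next_question())

def handle_answer(update, context):
    # Steps 4-5: receive the student answer, score it, send feedback
    score, feedback = grade_answer(update.message.text)
    update.message.reply_text(f"Score: {score}/5\n{feedback}")
    update.message.reply_text(next_question())

updater = Updater("BOT_TOKEN")  # placeholder token
updater.dispatcher.add_handler(CommandHandler("start", start))
updater.dispatcher.add_handler(
    MessageHandler(Filters.text & ~Filters.command, handle_answer))
updater.start_polling()
updater.idle()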
3.2 Gamification and Motivation
Users are already motivated to use our multilingual exam trainer to improve at answering open exam questions. However, studies have shown that gamification creates an additional incentive in learning [22]. To give the users of our system this further incentive, we came up with the following gamification approach: Users are in space and have the goal to fly with their spaceship from Earth to Mars. To get closer to Mars with the spaceship, the users have to answer the displayed exam questions. The points for the answers are converted into kilometers. With better answers, the users get more points and reach Mars faster. Based on the quality of the student answer, the user is praised and motivated with certain phrases, e.g., “Awesome, that gives us fuel for 3 million more kilometers”, and with information about the remaining distance. Fig. 3 illustrates our gamification in the conversation between a Dutch user and our conversational AI.
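A minimal sketch of this conversion, with a purely illustrative distance and conversion factor rather than the exact values of our implementation, could look as follows:

TOTAL_DISTANCE_KM = 78_000_000  # illustrative Earth-Mars distance
KM_PER_POINT = 3_000_000        # hypothetical conversion factor

def travel(points, covered_km):
    # Convert the score of the answer into kilometers of progress
    gained_km = points * KM_PER_POINT
    covered_km += gained_km
    remaining_km = max(TOTAL_DISTANCE_KM - covered_km, 0)
    print(f"Awesome, that gives us fuel for {gained_km:,} more kilometers!")
    print(f"{remaining_km:,} km to go until Mars.")
    return covered_km

covered_km = travel(points=4, covered_km=0)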
3.3 Quantitative Feedback: Multilingual Automatic Short Answer Grading
The AI model which processes the student answers in their native language and provides the user with quantitative feedback in terms of a score is based on M-BERT. The model was downloaded and fine-tuned through the Transformers library. We trained for 6 epochs with a batch size of 8 using the AdamW optimizer with an initial learning rate of 0.00004. We supplemented each fine-tuned BERT model with a linear regression layer that outputs a prediction of the score given an answer. The model expects the model answer and the student answer as input.
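A condensed sketch of this setup with the Transformers library is shown below. We realize the regression layer as a 1-label sequence classification head, which is one common way to implement it; the hyperparameters follow the values above, while the toy batch and the reduced training loop are simplifying assumptions:

import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
# num_labels=1 turns the classification head into a score regressor
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=1)
optimizer = AdamW(model.parameters(), lr=0.00004)

# The model receives the model answer and the student answer as a pair
batch = tokenizer(
    ["A stack is a LIFO data structure."],       # model answer (English)
    ["Ein Stack ist eine LIFO-Datenstruktur."],  # student answer (German)
    padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([5.0])  # human-assigned score in points

model.train()
for epoch in range(6):  # 6 epochs; the full setup uses batches of 8
    loss = model(**batch, labels=labels).loss  # MSE loss for 1 label
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()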
Fig. 3. Conversation with greeting, language selection, exam question,
student answer, scoring, model answer and motivation.
The ASAG data set of the University of North Texas [2] provided the exam questions, model answers, and training data for fine-tuning M-BERT. It contains 87 questions with corresponding model answers and on average 28.1 manually graded student answers per question on the topic of data structures from undergraduate studies.
After fine-tuning with this original English ASAG data, our model is able to receive a model answer together with a student answer in 1 of the other 103 languages and return a score in terms of points, without the need of fine-tuning with ASAG data in the other languages (cross-lingual transfer). However, since we found that adding translations of the ASAG data in more languages improves fine-tuning even further, we added translations in the 5 languages German, Dutch, Finnish, Japanese, and Chinese [1]. With Mean Absolute Errors between 0.41 and 0.72 points out of 5 points in our analysis of the 26 covered languages, our model has even less discrepancy than the 2 graders who graded the ASAG corpus of the University of North Texas with a discrepancy of 0.75 points [2].
To provide the exam questions and the model answers in our multilingual exam trainer in 26 languages and to obtain the translations in the 5 listed languages for fine-tuning M-BERT, we used Google's Neural Machine Translation System [23]. This system is also used by other researchers who experiment with multilingual NLP models since it comes close to the performance of professional translators [24].
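As an illustration only: such translations can be obtained programmatically, e.g., via the Google Cloud Translation API. The sketch below uses its v2 Python client under the assumption that Google Cloud credentials are configured; it is not the exact tooling of our pipeline:

from google.cloud import translate_v2 as translate

client = translate.Client()  # requires configured Google Cloud credentials

# Translate an English model answer into German
result = client.translate(
    "A stack is a last-in-first-out data structure.",
    target_language="de")
print(result["translatedText"])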
3.4 Qualitative Feedback: Keyword Extraction and Highlighting
Fig. 5 shows the keyword highlighting in a snippet of the conversation between the
user and the chatbot. For simplicity, we have implemented keyword extraction and
highlighting for English only in our prototype. Porting the method to other languages
is possible using word vectors and a distance measure.
Our algorithm for keyword extraction and highlighting is shown in Fig. 6. Its inputs are the word tokens of the model answer and the word tokens of the student answer.
Fig. 5. Conversation with Keyword Highlighting.
# model_answer and student_answer are spaCy Doc objects;
# stop word flags, is_alpha, and word vectors come from spaCy [21]

# Iterate through all tokens in the model answer
for model_token in model_answer:
    # Process only tokens that are alphabetic and not stop words
    if not model_token.is_stop and model_token.is_alpha:
        # Iterate through the tokens in the student's answer
        for answer_token in student_answer:
            # Highlight both tokens if the answer token is also
            # alphabetic and not a stop word, and the cosine similarity
            # of the two word vectors exceeds the given threshold
            if not answer_token.is_stop and answer_token.is_alpha and \
                    model_token.similarity(answer_token) > THRESHOLD:
                highlight(model_token)
                highlight(answer_token)
Fig. 6. Algorithm for keyword extraction and highlighting.
We iterate over the word tokens in the model answer and over the word tokens in the student answer and skip stop words as well as word tokens which are not purely alphabetic. Each remaining word token of the model answer and the student answer is converted into a word vector. Then the word vectors of the model answer are compared with the word vectors of the student answer. If the cosine similarity between two word vectors exceeds the threshold, we consider the tokens as synonyms and highlight both.
3.5 Crowdsourcing and Peer-Reviewing
In order to continuously improve our multilingual ASAG model with high-quality human-labeled training data in a crowdsourcing approach, the user also has the task of scoring another student's answer as part of the game (step 7 in Sec. 3.1). Studies such as [25] have shown that peer-based grading can be as effective as grading by a professional. Consequently, the same student answer is shown to several users. This peer-review process makes it possible to detect and filter outliers which would have a negative impact on the model. However, this process also has another advantage for the user: The student deals with the question again, but this time from a different perspective.
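Assuming each student answer collects several peer scores, a simple median-based filter can serve as such an outlier check; the threshold and scores below are illustrative:

from statistics import median

def filter_outliers(peer_scores, max_deviation=1.5):
    # Keep only peer scores close to the median of all peer scores
    m = median(peer_scores)
    return [s for s in peer_scores if abs(s - m) <= max_deviation]

peer_scores = [4.0, 4.5, 3.5, 0.0]   # one peer grader is far off
print(filter_outliers(peer_scores))  # [4.0, 4.5, 3.5]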
4 Experimental Setup
In this section, we describe the structure of our questionnaire and the participants.
4.1 Questionnaire Design
To evaluate our approach, we conducted a study where students, former students, and people who enjoy continuing education first tried our exam trainer and then completed a questionnaire. The study was conducted on a subset of the possible languages and examined 5 different aspects: Learning experience, user experience, motivation, quality of NLP models, and gamification. Our questionnaire contains the following 4 parts:
1. General questions about the scenario of a multilingual interactive conversational AI tutoring system for exam preparation.
2. Specific questions concerning our implementation.
3. Specific questions concerning extensions and improvements.
4. Personal questions (profile and demographic information).
To obtain detailed results, we asked for a rating on a score range where it makes sense. The score range follows the rules of a forced-choice Likert scale, ranging from (1) strongly disagree to (5) strongly agree.
4.2 Participants
51 people from 6 countries filled out our questionnaire, giving us a first impression of the quality and impact of our system. Most were students from the University of Osnabrück, IU International University of Applied Sciences, Karlsruhe Institute of Technology, and Karlsruhe University of Applied Sciences. These people tested our exam trainer in German (64.7%), English (21.6%), Dutch (3.9%), Italian (3.9%), French (3.9%), and Spanish (2.0%).
5 Experiments and Results
As described, our study examined 5 different aspects: Learning experience, user experience, motivation, quality of NLP models, and gamification.
5.1 Learning Experience
Fig. 7 shows that participants responded positively to questions about improving the learning experience, meaningfulness, use, and helping fellow students, both in general and for our implementation. The majority also believe that our implementation can accelerate the learning process and that scoring other students' answers is helpful. Opinion is more divided on the questions whether the exam trainer is useful to first get familiar with the subject in the native language when the actual exam is in English anyway, and whether it can help elderly people to study online. The difference in distribution for the last question about support for elderly people shows that most participants generally rate it as "neutral", while our system scores a bit lower. This feedback, plus comments from the participants on this topic, leads us to believe that it is possible to optimize such a system in cooperation with elderly people.
Fig. 7. Learning Experience.
Fig. 8. Motivation.
Fig. 9. User Experience.
5.2 Motivation
Fig. 8 shows a tendency for such an exam trainer in general and for our implementation to motivate people to prepare more for exams.
5.3 User Experience
Fig. 9 indicates that the clear majority of participants find that our interface is easy to
use and that operating our exam trainer is fun.
5.4 Quality of Natural Language Processing Models
Fig. 10 shows that the clear majority of participants rate the machine-translated questions as linguistically correct and understandable. This suggests that post-correcting the translations is not necessary. The scoring with the help of the ASAG model was rated only average. Through the users' comments, we learned that many users had entered random words as answers and received points for this. This was because such random answers did not appear in the training data of our ASAG model and therefore could not be scored correctly. The training data was taken from exams, where usually no student dares to enter "I don't know" as an answer. Here we see potential for improvement through the evaluation by other students and through simple rules that score such entries with 0 points. Our explainability approach with keyword highlighting was well rated. However, we did not get as much feedback on it because it was only implemented in English.
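Such a rule could, for example, intercept empty or evasive answers before the ASAG model is invoked. The following guard is a hypothetical sketch, not part of the evaluated system:

def score_with_guard(student_answer, model_answer, asag_score):
    # Hypothetical rule: award 0 points for empty or evasive answers
    normalized = student_answer.strip().lower()
    if not normalized or normalized in {"i don't know", "no idea", "?"}:
        return 0.0
    # Otherwise fall back to the ASAG model's prediction
    return asag_score(student_answer, model_answer)

print(score_with_guard("I don't know", "A stack is ...", lambda s, m: 4.2))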
5.5 Gamification
Fig. 11 shows that the clear majority of participants like the story and the theme of the game. This demonstrates that good gamification can be created even with a simple story, like the trip to Mars, and without special graphical features.
Fig. 10. Quality of NLP Models.
Fig. 11. Gamification.
6 Conclusion and Future Work
We presented a multilingual interactive conversational AI tutoring system for exam preparation which combines multilingual NLP components, ASAG, conversational AI, keyword extraction, learning analytics, crowdsourcing, and gamification. With this multilingual exam trainer, we received positive feedback in a survey regarding learning experience, user experience, motivation, quality of NLP models, and gamification. The results of our survey support our proof of concept, in which users have tested 6 languages so far. Future work may include the extension to other languages. In addition, we would like to further address the issue of explainability to provide even better support to the users of our multilingual exam trainer. To optimize the dialog, it could be investigated how to create a more emotional dialog in written form, e.g., by visualizing voice characteristics and emotions in the textual representation [26,27].
References
1. Sawatzki, J., Schlippe, T.: Cross-Lingual Automatic Short Answer Grading. In: AIED, Utrecht, Netherlands (2021).
2. Mohler, M., Bunescu, R., Mihalcea, R.: Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments. In: ACL-HLT, Portland, Oregon, USA (2011).
3. Statista: Number of Monthly Active Telegram Users Worldwide from March 2014 to April 2020 (2021), https://www.statista.com/statistics/234038/telegram-messenger-mau-users
4. United Nations: Sustainable Development Goals: 17 Goals to Transform our World (2021), https://www.un.org/sustainabledevelopment/sustainable-development-goals
5. Statista: The Most Spoken Languages Worldwide in 2019 (2020), https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide
6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: NAACL-HLT, Minneapolis, Minnesota (2019).
7. Libbrecht, P., Declerck, T., Schlippe, T., Mandl, T., Schiffner, D.: NLP for Student and Teacher: Concept for an AI based Information Literacy Tutoring System. In: CIKM, Galway, Ireland (2020).
8. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 (2019).
9. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised Cross-lingual Representation Learning at Scale. arXiv:1911.02116 (2019).
10. Pires, T., Schlinger, E., Garrette, D.: How Multilingual is Multilingual BERT? In: ACL, Florence, Italy (2019).
11. Burrows, S., Gurevych, I., Stein, B.: The Eras and Trends of Automatic Short Answer Grading. IJAIED 25, 60–117 (2014).
12. Camus, L., Filighera, A.: Investigating Transformers for Automatic Short Answer Grading. In: AIED, Cyberspace (2020).
13. Sawatzki, J., Schlippe, T., Benner-Wickner, M.: Deep Learning Techniques for Automatic Short Answer Grading: Predicting Scores for English and German Answers. In: AIET, Wuhan, China (2021).
14. Wölfel, M.: Towards the Automatic Generation of Pedagogical Conversational Agents from Lecture Slides. In: EAI ICMTEL, Cyberspace (2021).
15. Bocklisch, T., Faulkner, J., Pawlowski, N., Nichol, A.: Rasa: Open Source Language Understanding and Dialogue Management. arXiv:1712.05181 (2017).
16. Reyes, R., Garza, D., Garrido, L., De la Cueva, V., Ramirez, J.: Methodology for the Implementation of Virtual Assistants for Education Using Google Dialogflow. In: MICAI, Xalapa, Mexico (2019).
17. Setiaji, H., Paputungan, I.V.: Design of Telegram Bots for Campus Information Sharing. In: ICITDA, Yogyakarta, Indonesia (2017).
18. Hasan, K.S.: Automatic Keyphrase Extraction: A Survey of the State of the Art. In: ACL, Baltimore, Maryland, USA (2014).
19. Alami Merrouni, Z., Frikh, B., Ouhbi, B.: Automatic Keyphrase Extraction: A Survey and Trends. JIIS 54, 391–424 (2020).
20. Chandrasekaran, D., Mago, V.: Evolution of Semantic Similarity - A Survey. arXiv:2004.13820 (2021).
21. Honnibal, M., Montani, I.: spaCy. https://spacy.io
22. de Sousa Borges, S., Durelli, V.H.S., Reis, H.M., Isotani, S.: A Systematic Mapping on Gamification Applied to Education. In: SAC, New York, NY, USA (2014).
23. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. CoRR abs/1609.08144 (2016).
24. Aiken, M.: An Updated Evaluation of Google Translate Accuracy. Studies in Linguistics and Literature 3, 253 (2019).
25. Luo, H., Robinson, A., Park, J.-Y.: Peer Grading in a MOOC: Reliability, Validity, and Perceived Effects. Online Learning 18(2), 1–14, doi:10.24059/olj.v18i2.429 (2014).
26. Wölfel, M., Schlippe, T., Stitz, A.: Voice Driven Type Design. In: SpeD, Bucharest, Romania (2015).
27. Schlippe, T., Alessai, S., El-Taweel, G., Wölfel, M., Zaghouani, W.: Visualizing Voice Characteristics with Type Design in Closed Captions for Arabic. In: Cyberworlds, Caen, France (2020).
Chapter
Recent advancements in the field of deep learning for natural language processing made it possible to use novel deep learning architectures, such as the Transformer, for increasingly complex natural language processing tasks. Combined with novel unsupervised pre-training tasks such as masked language modeling, sentence ordering or next sentence prediction, those natural language processing models became even more accurate. In this work, we experiment with fine-tuning different pre-trained Transformer based architectures. We train the newest and most powerful, according to the glue benchmark, transformers on the SemEval-2013 dataset. We also explore the impact of transfer learning a model fine-tuned on the MNLI dataset to the SemEval-2013 dataset on generalization and performance. We report up to 13% absolute improvement in macro-average-F1 over state-of-the-art results. We show that models trained with knowledge distillation are feasible for use in short answer grading. Furthermore, we compare multilingual models on a machine-translated version of the SemEval-2013 dataset.