TipTopTalk! Mobile application for speech
training using minimal pairs and gamification
Cristian Tejedor-García1, David Escudero-Mancebo1,
César González-Ferreras1, Enrique Cámara-Arenas2, and
Valentín Cardeñoso-Payo1
1Department of Computer Science
2Department of English Philology
University of Valladolid
Abstract. This demonstration describes the TipTopTalk! mobile application, a serious game for foreign language (L2) pronunciation training based on the minimal-pairs technique. Multiple Spoken Language Technologies (SLT), such as speech recognition and text-to-speech conversion, are integrated into our system. The user's interaction consists of a sequence of challenges over time, including exposure, discrimination and production exercises. The application implements gamification resources with the aim of promoting continued practice. Specific feedback is also given to the user in order to avoid the performance drop detected after protracted use of the tool. The application can be used with different languages: Spanish, Portuguese (European and Brazilian), English, Chinese, and German.
Keywords: serious game, speech technology, computer assisted pronunciation training, gamification, learning analytics, L2 pronunciation, minimal pairs
1 Introduction
There are many software tools in the field of Computer Assisted Pronunciation Training (CAPT) that rely on speech technologies to provide users with L2 pronunciation training [4]. While such tools undoubtedly engage users in learning-oriented practice, there have been very few attempts to objectively assess the actual improvement they produce [8][7]. The volume of technological services for smartphones and other smart devices is growing every day [1]. Currently, the most popular mobile and desktop operating systems grant users free access to several Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) systems. Moreover, the combination of adequate teaching methods and gamification strategies can increase user engagement, provide adequate feedback and, at the same time, keep users active and comfortable [10][9].
This paper describes the software tool TipTopTalk! [5][11][12], a second-generation serious game application designed for L2 pronunciation training and testing. It is a two-year project focused on advanced research in speech training technology, such as speech recognition and text-to-speech conversion, and their successful joint integration in a multilingual and multimodal information retrieval system. The languages considered in the project are Spanish, Portuguese (European and Brazilian), English, simplified Chinese, and German.
The rest of the paper is structured as follows. Section 2 offers an overview of our system, the application dynamics and the user interface. Section 3 describes the demonstration's script. Finally, Section 4 provides the conclusions and future work.
2 Description of the system
2.1 General overview of TipTopTalk!
Three main elements are involved in our system: an Android client application, our own web server, and external services provided by Google. See references [13][12][11] for more specific details. Figure 1 represents the conceptual architecture of the Android client application. The Control module includes the application's business logic. The minimal pairs database is accessed by the Control component in order to extract the minimal-pair lists for each language. The Game Interface component presents each pair to the users in accordance with the game dynamics. The Control component makes use of an ASR component that translates spoken words into text. When the patterns produced by the ASR component match those of the target words, the pronunciation is considered correct. The TTS component is used to generate a spoken version of any required word. It allows users to listen to a model pronunciation of the words before they try to pronounce them themselves. We use Google's free ASR and TTS systems. However, TipTopTalk! adapts to any ASR or TTS that works with Android.
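The matching step above can be sketched as a simple comparison between the ASR's n-best word list and the members of the current minimal pair. This is a minimal illustration, not the tool's actual implementation; the three-way outcome labels are an assumption for clarity:

```python
def check_pronunciation(asr_hypotheses, target, distractor):
    """Classify a recording from the ASR output.

    asr_hypotheses: list of recognized words (n-best list).
    target: the word the user was asked to pronounce.
    distractor: the other member of the minimal pair.
    Returns "correct" if the target was recognized, "confused" if the
    other pair member was recognized instead, "unrecognized" otherwise.
    """
    hyps = [h.strip().lower() for h in asr_hypotheses]
    if target.lower() in hyps:
        return "correct"
    if distractor.lower() in hyps:
        return "confused"
    return "unrecognized"
```

For the pair bet-bed, a recognition result containing "bed" when the target is "bet" would signal exactly the phoneme confusion the minimal-pairs technique targets.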
A Configuration component selects the language in which the ASR and TTS components operate. Furthermore, it allows selecting among different sets of minimal pairs according to the language to be tested. Results will show the crucial importance of a proper selection of minimal pairs. The minimal pairs database, which constitutes the knowledge database of the system, can be updated in order to improve the system or to include new challenges.
Finally, a Game Report is generated at the end of each game. This report registers user dynamics, including the timing of the oral turns (both for recognition and for synthesis) and the results obtained. We gather relevant quantitative data from all events emerging in the visual interface of the application, with which we feed a daily log for each user in order to determine whether his or her pronunciation skills are improving. In addition, we send depersonalized user-interaction events to our Google Analytics account in order to compute how often a given event has occurred.
2.2 Pedagogical activities cycle
TipTopTalk! follows a learning methodology based on the sequencing of three different learning stages: exposure, discrimination and pronunciation [3].

Fig. 1. Conceptual components of the client's system.

It relies on the use of minimal pairs, which raise users' awareness of the potential risk of generating wrong meanings when phonemes are not properly produced [2]. The lists of minimal pairs used by the tool are selected by expert linguists in order to obtain the best possible results. Since it is a serious game, TipTopTalk! adapts this methodology with gamification elements.
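The minimal-pair property itself, two words whose phoneme sequences differ in exactly one position, can be checked mechanically. The following sketch uses a simplified equal-length definition, which is an assumption on our part rather than the selection criterion the paper's linguists applied:

```python
def is_minimal_pair(phonemes_a, phonemes_b):
    """True if two phoneme sequences have the same length and differ
    in exactly one position (simplified minimal-pair definition)."""
    if len(phonemes_a) != len(phonemes_b):
        return False
    differences = sum(1 for a, b in zip(phonemes_a, phonemes_b) if a != b)
    return differences == 1
```

For example, bet /b e t/ and bed /b e d/ qualify, whereas identical sequences or sequences differing in two phonemes do not.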
As a consequence, there are three main game modes. The first is the exposure mode, in which players become familiar with the distinctive phonemes within sequences of minimal pairs selected by a native linguist and presented at random. The aural correlate of each word is played a maximum of five times. Then, users decide whether to move on to the next round of words, or to record their own realization of the words and compare it with the TTS version.
Secondly, in the discrimination mode, users test their ability to discriminate between the elements of minimal pairs. They listen to the aural correlate of one of the words in each pair and must match it with the correct written form on the screen. As part of the gamification strategy, the game randomly asks users to pick the word that has not been uttered, rather than the uttered one. At higher levels of difficulty, the phonetic transcription of each word, otherwise visible, is removed. These strategies aim to promote user adaptation and engagement.
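The assembly of one discrimination round, choosing which word to play, whether to invert the prompt, and whether to show transcriptions, could look like the sketch below. The 50% inversion probability and the difficulty threshold are illustrative assumptions, not values stated in the paper:

```python
import random

def build_discrimination_round(pair, difficulty, rng=random):
    """Assemble one discrimination round for a minimal pair.

    Returns the word to synthesize, whether the prompt is inverted
    (the user must pick the word that was NOT played), the answer the
    user must select, and whether phonetic transcriptions are shown.
    """
    played = rng.choice(pair)
    other = pair[0] if played == pair[1] else pair[1]
    inverted = rng.random() < 0.5  # sometimes ask for the un-uttered word
    correct_answer = other if inverted else played
    return {
        "play": played,
        "inverted": inverted,
        "correct_answer": correct_answer,
        "show_transcription": difficulty < 2,  # hidden at higher levels
    }
```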
Finally, in the pronunciation mode, participants are asked to separately read aloud (and record) both words of each minimal pair. Real-time feedback is provided instantly. Native model pronunciations of each word can be played as many times as the user needs. Speech is recorded and played using third-party ASR and TTS applications.
2.3 Gamification
TipTopTalk! adapts to the player as a function of the interaction results, giving specific feedback. New training modes are suggested based on the results of the current one. For instance, in discrimination mode, if a user achieves the maximum score, advancement to the pronunciation mode will be suggested. Otherwise, going back to exposure mode will be automatically recommended after a low score has been attained in discrimination. Each TipTopTalk! teaching strategy has its own visual user interface containing different game elements. Figure 3 shows three visual user interface screenshots of the main game modes, that is, exposure, discrimination and pronunciation.
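The mode-suggestion rule above can be sketched as a small decision function. The paper only fixes two of the branches (perfect discrimination score advances to pronunciation, a low score falls back to exposure); the 0.5 low-score threshold and the remaining transitions are assumptions made here for completeness:

```python
def recommend_next_mode(current_mode, score, max_score):
    """Suggest the next training mode in the
    exposure -> discrimination -> pronunciation cycle."""
    ratio = score / max_score
    if current_mode == "discrimination":
        if ratio == 1.0:
            return "pronunciation"   # maximum score: advance
        if ratio < 0.5:
            return "exposure"        # low score: go back to listening
        return "discrimination"      # otherwise keep practicing
    if current_mode == "exposure":
        return "discrimination"
    return "pronunciation"
```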
Gamification is an informal umbrella term for the use of video game elements in non-gaming systems to improve user experience (UX) and user engagement [6]. In TipTopTalk!, users add points to their phonetic level and unlock several achievements depending on the mode and difficulty level (see Figure 2 (b)). There are also different language-dependent leaderboards, based on scores attained and the number of completed rounds, where all players are ranked to increase engagement through competition (see Figure 2 (a)).
Fig. 2. Examples of gamification elements: a leaderboard (a) and a list of the user's trophies (b).
Sharing results via social networks plays an important role in the gamification strategy by virtue of the competitiveness that it promotes. There are other gamification elements, such as a limited time to complete the current round or game; the granting of more or fewer points depending on the difficulty level and the number of attempts required for completion; the allotting of a number of reserve lives to allow further playing; the dispensation of an amount of clear tickets, which allow users to skip the current round and move on to the next one; and the graphical display of the percentage completed of a game list. Finally, we incorporate a system of push notifications that sends motivational and challenging messages to users in order to boost their engagement.
3 Activities in the demonstration
The demonstration will consist of an interactive session showing all the different modes in the client application (see Section 2.2). People will be able to ask for help during the presentation. At the beginning, all attendees can download the application via a given URL or by scanning a QR code. Once it is downloaded, the demonstration begins by choosing the Spanish language. The first step is to complete an exposure activity, listening to and repeating all words. The first image (a) of Figure 3 shows a basic round of the exposure training mode. There is a menu-options bar at the top with which users can exit the current game, go forward to the next round or go back. There is also a status bar below the menu-options bar that indicates the current round to users. The system allows us to register whether users play the model for both words at the beginning of each round. Orthographic forms and phonetic transcriptions are displayed at the center of the screen. We keep track of the number of times users synthesize a word or record themselves. We save the recorded voice in a file for subsequent analyses and corpus compilation.
The second screenshot (b) of Figure 3 (discrimination mode) includes new elements, such as a timer at the top and counters for both correct and wrong discriminations. The background colour acts as a gamification element: if the colour is green, users must choose the word they think is being played; however, if the background colour is red, they must choose the wrong one. In the bottom-right corner there is a button that plays the sound of the word again.
The third screen capture (c) in Figure 3 represents a snapshot of a pronunciation mode round. This part of the game introduces more feedback elements than the previous ones. When the user utters the test word correctly, the related elements change their base colour to green, and the word is disabled as a positive feedback message appears. Otherwise, a message appears containing the words recognized by the ASR (different from the test word) together with non-positive feedback. The mispronounced word changes its base colour to red and remains active until it is disabled after five unrecognized realizations by the user; there is a limit of five wrong attempts per word.
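The per-word attempt bookkeeping described above can be sketched as a small state holder. The class name and return labels are ours; only the five-failure limit and the enable/disable behaviour come from the paper:

```python
class WordAttemptTracker:
    """Track pronunciation attempts for one test word: the word is
    disabled on a correct recognition or after five failures."""
    MAX_FAILURES = 5

    def __init__(self, word):
        self.word = word
        self.failures = 0
        self.disabled = False
        self.correct = False

    def register(self, recognized_words):
        if self.disabled:
            return "disabled"
        if self.word in recognized_words:
            self.correct = True
            self.disabled = True      # green feedback, word disabled
            return "correct"
        self.failures += 1
        if self.failures >= self.MAX_FAILURES:
            self.disabled = True      # red feedback, no attempts left
        return "wrong"
```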
The last screenshot (d) represents a round of the Infinite Mode with the discrimination-mode variant. The aim of this mode is to complete the highest possible number of rounds. There are new elements, such as the number of remaining lives at the top-left corner, the current round at the top-right corner and a skip-round button at the bottom-left corner. Discrimination and pronunciation challenges are presented randomly in each round. Users start with a finite number of lives that decreases by one each time they fail. Also, the game's difficulty level increases with each round. For instance, from the tenth round on, the chance that the orthographic representation of a word is substituted by asterisks is raised to 50%. From the twentieth round on, a 50% chance that the TTS button is absent is introduced. The amount of time allotted for round completion is also progressively reduced.

Fig. 3. Visual user interface of exposure (a), discrimination (b), pronunciation (c) and Infinite (discrimination variant) (d) modes.
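The Infinite Mode difficulty progression can be sketched as a function of the round number. The round-10 and round-20 thresholds and the 50% probabilities come from the paper; the base time, the decay rate and the 10-second floor are illustrative assumptions:

```python
import random

def round_settings(round_number, base_time=30.0, rng=random):
    """Difficulty settings for one Infinite Mode round: from round 10 on,
    the orthographic form may be masked with asterisks (50% chance); from
    round 20 on, the TTS button may be absent (50% chance); the time
    limit shrinks steadily with the round number."""
    mask_word = round_number >= 10 and rng.random() < 0.5
    hide_tts = round_number >= 20 and rng.random() < 0.5
    time_limit = max(10.0, base_time - 0.5 * round_number)
    return {"mask_word": mask_word, "hide_tts": hide_tts,
            "time_limit": time_limit}
```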
4 Conclusions and future work
In this demonstration we presented a serious game implemented as a mobile application relying on third-party services. The main goal of our system is to provide a tool for improving L2 pronunciation with gamification elements. The client application was developed for Android version 2.3.3 using the Eclipse development environment. On the one hand, it connects to our own web server, which runs under a GNU/Linux operating system and gathers data such as log files, messages and picture files. On the other hand, it relies on several Google services, for instance Google Voice Search, Google Analytics and Google Play Games.
TipTopTalk!'s dependence on external ASR and TTS systems for assessing speech production may be a long-term problem, since they are black-box systems. We are considering the possibility of using other open-source platforms or creating a new, specifically adapted one.
There are some points that can be improved in future versions. We are now working on some international collaborations to expand the range of available languages. We are also working on porting the application to other mobile operating systems. Finally, despite the introduction of gamification elements, a habituation factor leads to a fall in interest and performance after protracted use. This suggests the need to incorporate mechanisms that provide real, particularized feedback based on automatically identified errors.
Acknowledgements. This work was partially funded by the Ministerio de Economía y Competitividad y Fondos FEDER (project key: TIN2014-59852-R, Videojuegos Sociales para la Asistencia y Mejora de la Pronunciación de la Lengua Española) and the Junta de Castilla y León (project key: VA145U14, Evaluación Automática de la Pronunciación del Español Como Lengua Extranjera para Hablantes Japoneses). We would like to thank Andreia Rauber, Anabela Rato and Junming Yao for their contribution of the minimal pairs lists.
References

1. Campbell, S.W., Park, Y.J.: Social implications of mobile telephony: The rise of personal communication society. Sociology Compass 2(2), 371–387 (2008)
2. Celce-Murcia, M., Brinton, D.M., Goodwin, J.M.: Teaching pronunciation: A reference for teachers of English to speakers of other languages. Cambridge University Press (1996)
3. Cámara-Arenas, E.: Native Cardinality: on teaching American English vowels to Spanish students. S. de Publicaciones de la Universidad de Valladolid (2012)
4. Escudero-Mancebo, D., Carranza, M.: Nuevas propuestas tecnológicas para la práctica y evaluación de la pronunciación del español como lengua extranjera. Actas del L Congreso de la Asociación Europea de Profesores de Español, Burgos
5. Escudero-Mancebo, D., Cámara-Arenas, E., Tejedor-García, C., González-Ferreras, C., Cardeñoso-Payo, V.: Implementation and test of a serious game based on minimal pairs for pronunciation training. SLaTE-2015, pp. 125–130 (2015)
6. Kapp, K.M.: What is Gamification? The Gamification of Learning and Instruction: Game-based Methods and Strategies for Training and Education, San Francisco, CA: Pfeiffer 13, 1–24 (2014)
7. Kartushina, N., Hervais-Adelman, A., Frauenfelder, U.H., Golestani, N.: The effect of phonetic production training with visual feedback on the perception and production of foreign speech sounds. The Journal of the Acoustical Society of America 138(2), 817–832 (2015)
8. Linebaugh, G., Roche, T.: Evidence that L2 production training can enhance perception. Journal of Academic Language & Learning 9(1), A1–A17 (2015)
9. McFarlane, A., Sparrowhawk, A., Heald, Y.: Report on the educational use of games. TEEM (Teachers Evaluating Educational Multimedia), Cambridge (2002)
10. Muntean, C.I.: Raising engagement in e-learning through gamification. In: Proc. 6th International Conference on Virtual Learning ICVL, pp. 323–329 (2011)
11. Tejedor-García, C., Cardeñoso-Payo, V., Cámara-Arenas, E., González-Ferreras, C., Escudero-Mancebo, D.: Playing around minimal pairs to improve pronunciation training. IFCASL (2015)
12. Tejedor-García, C., Cardeñoso-Payo, V., Cámara-Arenas, E., González-Ferreras, C., Escudero-Mancebo, D.: Measuring pronunciation improvement in users of CAPT tool TipTopTalk! Interspeech, pp. 1178–1179 (2016)
13. Tejedor-García, C., Escudero-Mancebo, D., Cámara-Arenas, E., González-Ferreras, C., Cardeñoso-Payo, V.: Improving L2 production with a gamified computer-assisted pronunciation training tool, TipTopTalk! IberSpeech 2016: IX Jornadas en Tecnologías del Habla and the V Iberian SLTech Workshop events
... Este experimento fue el primero en intentar responder a las preguntas de investigación RQ2 (e Issue 2.1, Issue 2.2, Issue 2.3) y RQ3, junto a la pregunta de investigacion RQ1 (e Issue 1.1), tratando aspectos relativos a los objetivos RO1, RO2, RO3 y RO4. Los principales resultados han sido publicados en [18], [19], [20], [21], [22]. En la Sección 7.3 se encuentra la descripción completa de dicho experimento. ...
... FIGURE 7.7: TipTopTalk! CAPT system screenshots of the minimal pair lists selection (first picture), the main leaderboard (second picture), and the list of achieved trophies (third picture), adapted from [19]. ...
... Finally, results from the questionnaire provided at the end of the experiment. The most important results are reported in the following subsections and discussed in Chapter 8. Performance-related results have been partially published in [18], [19], [20]; whereas results related to the gamification elements included in the CAPT system have been published in [21], [22]. ...
Full-text available
The quality of speech technology (automatic speech recognition, ASR, and text–to–speech, TTS) has considerably improved and, consequently, an increasing number of computer-assisted pronunciation (CAPT) tools has included it. However, pronunciation is one area of teaching that has not been developed enough since there is scarce empirical evidence assessing the effectiveness of tools and games that include speech technology in the field of pronunciation training and teaching. This PhD thesis addresses the design and validation of an innovative CAPT system for smart devices for training second language (L2) pronunciation. Particularly, it aims to improve learner's L2 pronunciation at the segmental level with a specific set of methodological choices, such as learner's first and second language connection (L1–L2), minimal pairs, a training cycle of exposure–perception–production, individualistic and social approaches, and the inclusion of ASR and TTS technology. The experimental research conducted applying these methodological choices with real users validates the efficiency of the CAPT prototypes developed for the four main experiments of this dissertation. Data is automatically gathered by the CAPT systems to give an immediate specific feedback to users and to analyze all results. The protocols, metrics, algorithms, and methods necessary to statistically analyze and discuss the results are also detailed. The two main L2 tested during the experimental procedure are American English and Spanish. The different CAPT prototypes designed and validated in this thesis, and the methodological choices that they implement, allow to accurately measuring the relative pronunciation improvement of the individuals who trained with them. Both rater's subjective scores and CAPT's objective scores show a strong correlation, being useful in the future to be able to assess a large amount of data and reducing human costs. 
Results also show an intensive practice supported by a significant number of activities carried out. In the case of the controlled experiments, students who worked with the CAPT tool achieved better pronunciation improvement values than their peers in the traditional in-classroom instruction group. In the case of the challenge-based CAPT learning game proposed, the most active players in the competition kept on playing until the end and achieved significant pronunciation improvement results.
... The basic dynamics consists of the iteration of exposure-discriminationproduction cycles. We use Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) technology within a gamified environment, as described in [16,17,18,19]. On this occasion, we present the results of a controlled experiment for which an adaptation of the tool was required. ...
... In phonology, a pair of words is considered minimal when they differ in only one phoneme, as in bet-bed; primarily devised as a technique for elucidating the phonological system of unknown languages, minimal-pairs have been used for increasing phonemic awareness in second language teaching for more than half a century [20]. For the adapted version of the tool we reduced the gaming component and made it mandatory for participants to watch a set of videos containing articulatory instructions for each vowel and carefully designed exposure cycles, closely following here the native cardinality approach [19]. The training program included a fixed number of compulsory discrimination and production exercises within each working session. ...
... We have developed an Android application that is technologically similar to the prototypes used in previous work [16,17,18,19]. In this version, we have eliminated several game elements and turned it into a strictly guided pedagogical tool. ...
Conference Paper
Full-text available
Feedback is an important concern in Computer-Assisted Pronunciation Training (CAPT), inasmuch as it bears on a sys-tem's capability to correct users' input and promote improved L2 pronunciation performance in the target language. In this paper, we test the use of synthetic voice as a corrective feedback resource. A group of students used a CAPT tool for carrying out a battery of minimal-pair discrimination-production tasks; to those who failed in production routines, the system offered the possibility of undergoing extra training by using synthetic voice as a model in a round of exposure exercises. Participants who made use of this resource significantly outperformed those who directly repeated the previously failed exercise. Results suggest that the Text-To-Speech systems offered by current operating systems (Android in our case) must be considered a relevant feedback resource in pronunciation training, especially when combined with efficient teaching methods.
... Additional work is available regarding ASR systems for the Portuguese language. The app TipTopTalk! [27] by Tejedor-García et al. uses Google's ASR systems to implement a pronunciation training application for various languages including both European and Brazilian variants of Portuguese. ...
Full-text available
Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the respective sequence of words. This paper presents a deep learning ASR system optimization and evaluation for the European Portuguese language. We present a pipeline composed of several stages for data acquisition, analysis, pre-processing, model creation, and evaluation. A transfer learning approach is proposed considering an English language-optimized model as starting point; a target composed of European Portuguese; and the contribution to the transfer process by a source from a different domain consisting of a multiple-variant Portuguese language dataset, essentially composed of Brazilian Portuguese. A domain adaptation was investigated between European Portuguese and mixed (mostly Brazilian) Portuguese. The proposed optimization evaluation used the NVIDIA NeMo framework implementing the QuartzNet15×5 architecture based on 1D time-channel separable convolutions. Following this transfer learning data-centric approach, the model was optimized, achieving a state-of-the-art word error rate (WER) of 0.0503.
... In particular, game-based results of the Alpha experiment have been partially published in [15]. Performance-related results of the TipTopTalk! prototype have been partially published in [16,20,21]; whereas results related to the gamification elements included in the CAPT tool have been published in [22,23]. Regarding the latest game-based prototype, COP, the main results about performance, motivation, and pronunciation improvement have been published in [17] (Journal Citation Reports, JCR Q1). ...
Conference Paper
Recent advances on speech technologies (automatic speech recognition, ASR, and text-to-speech, TTS, synthesis) have led to their integration in computer-assisted pronunciation training (CAPT) tools. However, pronunciation is an area of teaching that has not been developed enough since there is scarce empirical evidence assessing the effectiveness of CAPT tools and games that include ASR/TTS. In this manuscript, we summarize the findings presented in Cristian Tejedor-García's Ph.D. Thesis (University of Valladolid, 2020). In particular, this dissertation addresses the design and validation of an innovative CAPT system for smart devices for training second language (L2) pronunciation at the segmental level with a specific set of methodological choices, such as the inclusion of ASR/TTS technologies with minimal pairs, learner's native-foreign language connection, a training cycle of exposure-perception-production, and individual/social approaches. The experimental research conducted applying these methodological choices with real users validates the efficiency of the CAPT prototypes developed for the four main experiments of this dissertation about English and Spanish as L2. We were able to accurately measure the relative pronunciation improvement of the individuals who trained with them. Expert raters on phonetics' subjective scores and CAPT's objective scores showed a strong correlation, being useful in the future to be able to assess a large amount of data and reducing human costs.
... All the minimal pairs of the tool were tested following a protocol developed for similar tools like Japañol (Tejedor-García et al. 2018b) and TipTopTalk! (Tejedor- García et al. 2016) to ensure that the speech synthesizer and recognition utilities manage the material of the tool without problems. First, the pairs were tested with the synthesizer EKI kõnesüntesaator 2 (Mihkla et al. 2012 andEesti Keele Instituut 2017), and only well-synthesized words (i.e., those with native-like pronunciation, good quality, correct stress, and quantity) were included in the tool. ...
Full-text available
Over the past few years the number of online language teaching materials for non-native speakers of Estonian has increased. However, they focus mainly on vocabulary and pay little attention to pronunciation. In this study we introduce a computerassisted pronunciation training tool, Estoñol, developed to help native speakers of Spanish to train their perception and production of Estonian vowels. The tool’s training program involves seven vowel contrasts, /i-y/, /u-y/, /ɑ-o/, /ɑ-æ/, /e-æ/, /o-ø/, and /o-ɤ/, which have proven to be difficult for native speakers of Spanish. The training activities include theoretical videos and four training modes (exposure, discrimination, pronunciation, and mixed) in every lesson. The tool is integrated into a pre/post-test design experiment with native speakers of Spanish and Estonian to assess the language learners’ perception and production improvement. It is expected that the tool will have a positive effect on the results, as has been shown in previous studies using similar methodology. Kokkuvõte. Katrin Leppik ja Cristian Tejedor-García: Estoñol, mobiilirakendus hispaania emakeelega eesti keele õppijatele vokaalide häälduse ja taju treenimiseks. Eesti keele õppimiseks on loodud mitmeid e-kursusi ja mobiilirakendusi, kuid need keskenduvad peamiselt sõnavara ja gram matika õpetamisele ning pööravad väga vähe tähelepanu hääldusele. Eesti keele häälduse omandamise lihtsustamiseks töötati välja mobiilirakendus Estoñol, mis on mõeldud hispaania emakeelega eesti keele õppijatele. Varasemad uurimused on näidanud, et hispaania emakeelega eesti keele õppijatele valmistab raskusi vokaalide /ɑ, y, ø, æ, ɤ/ hääldamine. Mobiilirakenduse sisu on jagatud seitsmeks peatükiks, kus on võimalik harjutada vokaalipaaride /i-y/, /u-y/, /ɑ-o/, /ɑ-æ/, /e-æ/, /o-ø/, /o-ɤ/ tajumist ja hääldamist. Iga peatükk algab teoreetilise videoga, millele järgnevad taju- ja hääldusharjutused. 
Mobiilirakenduse mõju hindamiseks keeleõppija hääldusele ja tajule plaanitakse läbi viia eksperiment. Märksõnad: CAPT, eesti keel, hispaania keel, L2, hääldus, taju, vokaalid, Estoñol
... No que diz respeito a avaliação automática de fluência, percebe-se que existe uma tendência em aplicar gamificação na avaliação da fluência em leitura com foco no aprendizado de uma segunda língua. Nesse sentido, [Tejedor-García et al. 2016] propõem uma ferramenta chamada TipTop Talk!, que auxilia no desenvolvimento da pronúncia de uma segunda língua. A gamificaçãoé aplicada através de pontos, troféus e outras conquistas com o intuito de motivar o estudo continuado do idioma. ...
Conference Paper
Full-text available
Being able to read fluently directly affects the individual's interaction with society. However, few tools help in the process of developing fluency and in the diagnosis of problems in a playful way. Although applied to foreign language learning, automatic speech recognition (ASR) techniques are not widely used in the literature to support mother-tongue development. In this sense, this work proposes a gamified computational approach to diagnose failures in literacy. The proof of concept shows that the tool developed is capable of automatically producing reports that are consistent with reality and useful for teachers and managers in decision-making.
Full-text available
Un par mínimo es un conjunto de dos palabras que difieren en sólo uno de los fonemas que constituyen su producción oral, cambiando por completo su significado. Existen programas informáticos que emplean pares mínimos para el entrenamiento de la pronunciación de lengua extranjera, principalmente para el inglés. En este artículo se presenta una herramienta que utiliza voz sintética y un sistema de reconocimiento automática del habla para, en combinación con un ciclo de exposición, discriminación y producción de pares mínimos, entrenar la pronunciación de idiomas extranjeros. Para adaptar estas herramientas al español es necesario, primero disponer de una lista de pares mínimos, y segundo, elegir los pares mínimos que pueden ser interesantes para determinados tipos de hablantes de español como lengua extranjera, en función siempre de su idioma nativo. En este trabajo se presenta una estrategia de selección de pares mínimos basada en métodos automáticos. Un algoritmo encuentra en un diccionario y en una serie de textos el conjunto total de pares mínimos del español. Un nuevo algoritmo, filtra y elige los pares mínimos más adecuados en función del par de fonemas concreto con el que se quiera trabajar. Finalmente, el profesor selecciona las actividades más apropiadas de entre aquellas propuestas por el programa informático. El resultado final es un programa informático de fácil manejo que permite configurar actividades de pronunciación del español para estudiantes ELE de diversas lenguas maternas.
Conference Paper
We present an L2 pronunciation training serious game based on the minimal-pairs technique, incorporating sequences of exposure, discrimination and production, and using text-to-speech and speech recognition systems. We measured the quality of users' production over a period of time in order to assess improvement after using the application. Substantial improvement is found among users with poorer initial performance levels. The program's gamification resources manage to engage a high percentage of users. Future versions should include feedback for users, with the purpose of increasing their performance and avoiding the performance drop detected after protracted use of the tool.
Conference Paper
We present a foreign language (L2) pronunciation training serious game, TipTopTalk!, based on the minimal-pairs technique. We carried out a three-week test experiment where participants had to overcome several challenges including exposure, discrimination and production, while using Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) systems in a mobile application. The quality of users' production is measured in order to assess their improvement. The application implements gamification resources with the aim of promoting continued practice. Preliminary results show that users with poorer initial performance levels make relatively more progress than the rest. However, it is desirable to include specific and individualized feedback in future versions so as to avoid the performance drop detected after the protracted use of the tool.
Conference Paper
Computer Assisted Pronunciation Training (CAPT) apps are becoming widespread as aids to learning new languages. However, they are still criticized for lacking the irreplaceable direct feedback of a human expert. The combination of the right learning methodology with a gamification design strategy can, nevertheless, increase engagement and provide adequate feedback while keeping users active and comfortable. In this paper, we introduce the second generation of a serious game [1] designed to aid pronunciation training for non-native students of English, Spanish or Chinese. The new version of the game supports a learning methodology based on the combination of three learning strategies: exposure, discrimination and pronunciation [2]. In exposure mode, players are helped to become familiar with the sounds of sequences of minimal pairs or trios, selected by a native linguist and presented at random. In discrimination mode, users test their ability to discriminate between the phonetics of minimal pairs: they listen to the sound of one of the words in the pair and have to choose the right word on screen. In pronunciation mode, finally, subjects are asked to separately read aloud (and record) both words of each round of minimal-pair lists. The native pronunciation of a word can be played as many times as a user needs. When the test word is correctly uttered by the user, the corresponding icon changes its base colour to green and gets disabled as a positive feedback message appears. Otherwise, a message with the recognized words appears on the graphical interface together with a non-positive feedback message; the word changes its base colour to red and gets disabled after five failures. Speech is recorded and played using commercial off-the-shelf ASR and TTS. Our game adapts to the player as a function of right and wrong answers.
Users collect points to reach a "phonetic level" and obtain different achievements, in order to encourage their engagement. Language-dependent leaderboards, also based on points, further increase the desire to play, and sharing results on social networks is another option that is under way. From a pedagogical point of view, the use of minimal pairs [3] raises users' awareness of the potential risk of producing wrong meanings when the correct phonemes are not properly realized. Discriminating the words that make up a minimal pair is a challenging task for the ASR, since the phonetic distance between the two words can be very small, although clearly perceptible to a native speaker. To be efficient, minimal-pair lists are to be selected by expert linguists for each language. Real-use data acquisition and processing is still ongoing, but preliminary results are promising and show that this learning-and-gaming strategy provides measurable improvement of learners' pronunciation. The app offers an enjoyable opportunity for anywhere-anytime self-learning, and a tool for teachers to design challenging games.
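The pronunciation-mode feedback loop described in this abstract (word turns green when the ASR hypothesis matches it, turns red and is retired after five failed attempts) can be sketched as a small state machine. This is an illustrative reconstruction, not the app's actual code; the `WordChallenge` class and its states are hypothetical, and the ASR is stood in for by whatever string the recognizer returns.

```python
class WordChallenge:
    """Minimal sketch of one word in pronunciation mode: the word stays
    active until it is recognized correctly (green) or until five failed
    attempts have accumulated (red); either way it is then disabled."""
    MAX_FAILURES = 5

    def __init__(self, target):
        self.target = target
        self.failures = 0
        self.state = "active"   # "active" | "green" | "red"

    def submit(self, asr_hypothesis):
        """Feed one ASR hypothesis for the target word and return the
        resulting state. Disabled words ignore further attempts."""
        if self.state != "active":
            return self.state
        if asr_hypothesis.strip().lower() == self.target.lower():
            self.state = "green"          # positive feedback, disable word
        else:
            self.failures += 1
            if self.failures >= self.MAX_FAILURES:
                self.state = "red"        # disable after five failures
        return self.state
```

For example, a learner whose "ship" is repeatedly recognized as "sheep" would see the word retired in red after the fifth miss, while a single correct recognition turns it green immediately.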
Conference Paper
This paper introduces the architecture and interface of a serious game intended for pronunciation training and assessment for Spanish students of English as a second language. Users confront a challenge consisting of pronouncing a battery of minimal-pair words. Android ASR and TTS tools prove useful in discerning three pronunciation proficiency levels, ranging from basic to native. Results also provide evidence of the weaknesses and limitations of present-day technologies, which must be taken into account when defining game dynamics for pedagogical purposes.
Second-language learners often experience major difficulties in producing non-native speech sounds. This paper introduces a training method that uses a real-time analysis of the acoustic properties of vowels produced by non-native speakers to provide them with immediate, trial-by-trial visual feedback about their articulation alongside that of the same vowels produced by native speakers. The Mahalanobis acoustic distance between non-native productions and target native acoustic spaces was used to assess L2 production accuracy. The experiment shows that 1 h of training per vowel improves the production of four non-native Danish vowels: the learners' productions were closer to the corresponding Danish target vowels after training. The production performance of a control group remained unchanged. Comparisons of pre- and post-training vowel discrimination performance in the experimental group showed improvements in perception. Correlational analyses of training-related changes in production and perception revealed no relationship. These results suggest, first, that this training method is effective in improving non-native vowel production. Second, training purely on production improves perception. Finally, it appears that improvements in production and perception do not systematically progress at equal rates within individuals.
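The abstract above assesses L2 vowel production by the Mahalanobis distance between a learner's token and the native acoustic space of the target vowel. A minimal sketch of that measurement, under the assumption that each vowel token is summarized by an F1/F2 formant vector in Hz, might look like this (the formant values below are made up for illustration):

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """Mahalanobis distance of a token x from a vowel category modelled
    by its mean vector and covariance matrix: sqrt((x-m)^T S^-1 (x-m))."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical native F1/F2 measurements (Hz) for one target vowel.
native = np.array([[300.0, 2200.0],
                   [320.0, 2150.0],
                   [310.0, 2250.0],
                   [290.0, 2180.0]])
mean = native.mean(axis=0)
cov = np.cov(native, rowvar=False)   # sample covariance over tokens

# A learner's production, far from the native cluster.
learner_token = np.array([400.0, 1900.0])
d = mahalanobis_distance(learner_token, mean, cov)
```

Because the distance is normalized by the native category's covariance, it accounts for the fact that natural variability differs along F1 and F2; a shrinking distance across training sessions is then a direct proxy for improved production accuracy.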
Full-text available
It is often readily accepted that perception precedes production in second language acquisition. According to Flege's (1995) Speech Learning Model and Broselow and Park's (1995) Split Parameter Setting Hypothesis, accurate second language (L2) perception necessarily precedes accurate L2 production. This paper examines whether, contrary to that assumption, production can inform perception: whether training in the production of problematic L2 sounds can enhance perception of those sounds. Participants were XXXX Arabic-speaking learners of English who took part in a between-groups experiment. They were assigned to either an articulatory training or a focused exposure condition for learning three problematic English contrasts: /æ, ʌ/, /ɜ, ɔ/ and /g, ʤ/. Performance on pre-, post- and post-post-condition perceptual discrimination tests was used to assess participants' improvement in the ability to perceptually discriminate the sounds after training in production or after focused aural exposure. Results point to the efficacy of the articulatory training, and thereby provide strong evidence that production can inform perception and that L2 acquisition can be facilitated through targeted training in articulation.
Games are part of day-to-day life, entertaining users but at the same time modelling behaviors. By applying game mechanics and dynamics to tasks and e-learning processes, we can increase user engagement with an e-learning application and its specific tasks. Although widely used in commercial practice, gamification relies on well-established techniques similar to those found in games. We take a closer look at the ones that are appropriate to the learning process, and to e-learning in particular, and analyze relevant examples.
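The gamification mechanics recurring throughout these abstracts (points, levels, achievements, leaderboards) reduce to a small amount of state per player. The sketch below is a generic illustration of those mechanics, not code from any of the cited systems; the class names and the level thresholds are invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical point thresholds for levels 0-3; a real app would tune these.
LEVEL_THRESHOLDS = [0, 100, 250, 500]

@dataclass
class Player:
    points: int = 0
    achievements: set = field(default_factory=set)

    def award(self, points, achievement=None):
        """Add points for a completed exercise and optionally unlock an
        achievement, the two basic reward mechanics."""
        self.points += points
        if achievement:
            self.achievements.add(achievement)

    @property
    def level(self):
        """Highest level whose threshold the player's points have reached."""
        return max(i for i, t in enumerate(LEVEL_THRESHOLDS)
                   if self.points >= t)

def leaderboard(players):
    """Rank players by points, highest first."""
    return sorted(players, key=lambda p: p.points, reverse=True)
```

Keeping the reward logic this separable from the learning content is what lets the same mechanics be reused across exposure, discrimination and production exercises.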
An exploration by TEEM of the contribution which games can make to the education process.