Conference Paper

MIAPARLE: Online training for discrimination and production of stress contrasts

Authors:
To read the full-text of this research, you can request a copy directly from the authors.


... The web application MIAPARLE, developed by [1], aims to address this gap and hence provides a range of tools to train the perception and production of lexical stress. Lexical stress describes the phenomenon that not all syllables within an uttered word are perceived as equally prominent [2]. ...
... This means that the way prosody is produced in a learner's mother tongue (L1) leads to inaccurate or erroneous production in the target language (L2). Lexical stress poses difficulty for speakers whose L1 is a fixed-stress language like French when learning a free-stress language such as English [1]. In fixed-stress languages, stress placement is determined by a fixed rule for all words, whereas in free-stress languages, stress patterns differ from word to word and might be assigned according to more complex rules. ...
... With respect to traditional classroom instruction, CAPT systems go beyond merely addressing the individual segmental aspects of a language. They extend their reach into the realm of suprasegmental elements, including word accent, sentence stress, rhythm, and intonation (see, a.o., Donaldson 2009; Sztahó et al. 2018; Goldman and Schwab 2018). ...
Article
Full-text available
Little attention is paid to prosody in second language (L2) instruction, but computer-assisted pronunciation training (CAPT) offers learners solutions to improve the perception and production of L2 suprasegmentals. In this study, we add an acoustic analysis to previous research showing the effectiveness of self-imitation training on the prosodic improvement of Japanese learners of Italian. In light of the increased degree of correct match between intended and perceived pragmatic functions (e.g., speech acts), we aimed at quantifying the degree of prosodic convergence towards the L1 Italian speakers used as a model for self-imitation training. To measure convergence, we calculated the syllable-wise difference in duration, F0 mean, and F0 max between L1 utterances and the corresponding L2 utterances produced before and after training. The results showed that after self-imitation training, L2 learners converged to the L1 speakers. The extent of the effect, however, varied based on the speech act, the acoustic measure, and the distance between L1 and L2 speakers before the training. The findings from perceptual and acoustic investigations, taken together, show the potential of self-imitation prosodic training as a valuable tool to help L2 learners communicate more effectively.
... Oral exercises require additional technologies, speech recognition for productive exercises and speech generation for receptive ones, which add to the likelihood of the software making a mistake when generating the exercise or assessing the user input. There are, however, existing tools for supporting the oral part of language learning, e.g. in the area of computer-assisted pronunciation training (CAPT) (Fouz-González, 2015; Schwab and Goldman, 2018). ...
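The syllable-wise convergence measure described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the per-syllable differences are aggregated, so the mean absolute difference used here is an assumption, as are the function names and the duration values, which are made up for illustration.

```python
from statistics import mean


def syllable_distance(l1_values, l2_values):
    """Mean absolute per-syllable difference between two utterances.

    Assumption: both utterances contain the same number of syllables,
    and averaging absolute differences is only one possible aggregation.
    """
    if len(l1_values) != len(l2_values):
        raise ValueError("utterances must have the same number of syllables")
    return mean(abs(a - b) for a, b in zip(l1_values, l2_values))


def convergence(l1_values, l2_before, l2_after):
    """Reduction in distance to the L1 model after training.

    A positive value means the learner moved closer to the L1 model.
    """
    return syllable_distance(l1_values, l2_before) - syllable_distance(l1_values, l2_after)


# Hypothetical syllable durations in milliseconds (illustrative values only).
l1_model = [120, 180, 150]
pre_training = [160, 140, 190]
post_training = [130, 170, 160]

gain = convergence(l1_model, pre_training, post_training)
print(gain)
```

The same two functions could be applied unchanged to per-syllable F0 mean or F0 max values, since they only assume one number per syllable.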
Article
Full-text available
This work describes a blueprint for an application that generates language learning exercises from parallel corpora. Word alignment and parallel structures allow for the automatic assessment of sentence pairs in the source and target languages, while users of the application continuously improve the quality of the data with their interactions, thus crowdsourcing parallel language learning material. Through triangulation, their assessment can be transferred to language pairs other than the original ones if multiparallel corpora are used as a source. Several challenges need to be addressed for such an application to work, and we will discuss three of them here. First, the question of how adequate learning material can be identified in corpora has received some attention in the last decade, and we will detail what the structure of parallel corpora implies for that selection. Secondly, we will consider which type of exercises can be generated automatically from parallel corpora such that they foster learning and keep learners motivated. And thirdly, we will highlight the potential of employing users, that is both teachers and learners, as crowdsourcers to help improve the material.
... A series of early studies reported that learners following such a method improved their pronunciation (deBot, 1983; Weltens & deBot, 1984a, 1984b). Further beneficial effects were found in terms of accentedness (Hardison, 2004), global oral proficiency (Gorjian et al., 2013), intonation (Hincks & Edlund, 2009; Ramirez Verdugo, 2006), and the accuracy of stress patterns (Schwab & Goldman, 2018; Tanner & Landon, 2009). ...
Thesis
Full-text available
This thesis proposes an overview of the theoretical background on embodied cognition, gesture studies, and L2 phonological acquisition to motivate the use of embodied prosodic training with hand gestures and kinesthetic movements as an efficient method to improve L2 learners' perception and pronunciation. It is composed of three independent empirical studies looking at three different techniques in different learning contexts.
... Their scale can vary from complex ranges of numbers to simple binary outputs reporting user's success or failure in the activity. In particular for pronunciation training, this important resource could be used to assess the goodness of the user interaction and to measure the user proficiency [150], [154], [155]. For instance, in [151], an overall score (0 to 100) from a user's karaoke performance is shown to the learner. ...
Thesis
Full-text available
The quality of speech technology (automatic speech recognition, ASR, and text-to-speech, TTS) has considerably improved and, consequently, an increasing number of computer-assisted pronunciation training (CAPT) tools have included it. However, pronunciation is one area of teaching that has not been developed enough, since there is scarce empirical evidence assessing the effectiveness of tools and games that include speech technology in the field of pronunciation training and teaching. This PhD thesis addresses the design and validation of an innovative CAPT system for smart devices for training second language (L2) pronunciation. In particular, it aims to improve learners' L2 pronunciation at the segmental level with a specific set of methodological choices, such as the connection between learners' first and second languages (L1–L2), minimal pairs, a training cycle of exposure–perception–production, individualistic and social approaches, and the inclusion of ASR and TTS technology. The experimental research conducted by applying these methodological choices with real users validates the efficiency of the CAPT prototypes developed for the four main experiments of this dissertation. Data is automatically gathered by the CAPT systems to give immediate, specific feedback to users and to analyze all results. The protocols, metrics, algorithms, and methods necessary to statistically analyze and discuss the results are also detailed. The two main L2s tested during the experimental procedure are American English and Spanish. The different CAPT prototypes designed and validated in this thesis, and the methodological choices that they implement, allow the relative pronunciation improvement of the individuals who trained with them to be measured accurately. Raters' subjective scores and the CAPT systems' objective scores correlate strongly, which could be useful in the future for assessing large amounts of data and reducing human costs.
Results also show intensive practice supported by a significant number of activities carried out. In the case of the controlled experiments, students who worked with the CAPT tool achieved better pronunciation improvement values than their peers in the traditional in-classroom instruction group. In the case of the challenge-based CAPT learning game proposed, the most active players in the competition kept on playing until the end and achieved significant pronunciation improvement results.
... They serve as a resource to evaluate the goodness of user interventions and permit user proficiency to be measured. Most of the time, however, the result is a simple binary output, reporting whether the user correctly performed or failed the activity [69], [73], [74]. In Murad et al. [70], an overall score (0 to 100) from a user's karaoke performance is shown to the learner. ...
Article
Full-text available
Learning games have a remarkable potential for education. They provide an emergent form of social participation that deserves the assessment of their usefulness and efficiency in learning processes. This study describes a novel learning game for foreign pronunciation training in which players can challenge each other. Native Spanish speakers performed several pronunciation activities during a one-month competition using a mobile application, designed under a minimal pairs approach, to improve their pronunciation of English as a foreign language. This game took place in a competitive scenario in which students had to challenge other participants in order to get high scores and climb up a leaderboard. Results show intense practice supported by a significant number of activities and playing regularity, so the most active and motivated players in the competition achieved significant pronunciation improvement results. The integration of automatic speech recognition (ASR) and text-to-speech (TTS) technology allowed users to improve their pronunciation while being immersed in a highly motivational game.
Conference Paper
Full-text available
It has been claimed that a correlation does not exist between how accurately experienced late learners produce and perceive phonetic segments in a second language (L2). According to one theory, learners of an L2 are no longer able to align segmental production and perception after the close of a critical period. This contribution reviews studies that have examined L2 production and perception. All of the studies yielded significant, albeit modest, correlations. Possible explanations for why stronger correlations have not been observed are presented.
Article
Full-text available
We tested the usability of prosody visualization techniques for second language (L2) learners. Eighteen Danish learners realized target sentences in German based on different visualization techniques. The sentence realizations were annotated by means of the phonological Kiel Intonation Model and then analyzed in terms of (a) prosodic-pattern consistency and (b) correctness of the prosodic patterns. In addition, the participants rated the usability of the visualization techniques. The results from the phonological analysis converged with the usability ratings in showing that iconic techniques, in particular the stylized “hat pattern” visualization, performed better than symbolic techniques, and that marking prosodic information beyond intonation can be more confusing than instructive. In discussing our findings, we also provide a description of the new Danish-German learner corpus we created: DANGER. It is freely available for interested researchers upon request.
Conference Paper
Full-text available
Lexical stress plays an important role in the prosody of German, and presents a considerable challenge to native speakers of languages such as French who are learning German as a foreign language. These learners stand to benefit greatly from Computer-Assisted Pronunciation Training (CAPT) systems which can offer individualized corrective feedback on such errors, and reliable automatic detection of these errors is a prerequisite for developing such systems. With this motivation, this paper presents an exploration of the use of machine learning methods to classify non-native German lexical stress errors. In classification experiments using a manually-annotated corpus of German word utterances by native French speakers, the highest observed agreement between the classifier's output and the gold-standard labels exceeded the inter-annotator agreement between humans asked to classify lexical stress errors in the same data. These results establish the viability of classification-based diagnosis of lexical stress errors for German CAPT.
Article
Full-text available
EasyAlign is a user-friendly automatic phonetic alignment tool for continuous speech. It is developed as a plug-in of Praat, and it is freely available. Its main advantage is that one can easily align speech from an orthographic transcription. It requires a few minor manual steps and the result is a multi-level annotation within a TextGrid composed of phonetic, syllabic, lexical and utterance tiers. Evaluation of EasyAlign was performed according to three approaches: a boundary-based, a duration-based and a segment-based approach. Results are very promising, showing, on the one hand, little difference between EasyAlign and human alignment, and, on the other, a good generalization of the training. EasyAlign is fully available for French and Spanish, while other languages such as English and Taiwan Min are under development thanks to a growing interest of community users.
Conference Paper
Full-text available
Work in the last decade shows that Computer-Assisted Pronunciation Teaching (CAPT) systems are useful, flexible tools for giving pronunciation instructions and evaluating a subject's speech. This paper describes a newly developed CAPT system that intends to address appropriate teaching of such supra-segmental parameters as intonation, stress and speech rhythm. Two modules are implemented: (1) intonation and stress teaching, and (2) rhythm teaching using dynamic time warping. The automatic feedback of the system is evaluated by using speech samples from hard of hearing children. The automatic assessment methods give automatic feedback that is consistent with the subjective decisions of teachers. Visual feedback was also proposed which is based on the dynamic time warping algorithm and gives simple and understandable visualization of the intonation and rhythm of the subject's utterance.
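Dynamic time warping, which this abstract names as the basis of its rhythm module and visual feedback, can be sketched in a few lines. This is the textbook algorithm on one-dimensional sequences, not the system's actual implementation; the function name, the absolute-difference local cost, and the contour values are all assumptions for illustration.

```python
def dtw_distance(reference, attempt):
    """Classic dynamic time warping cost between two 1-D sequences.

    Assumption: the local cost is the absolute difference between samples,
    and the allowed steps are match, insertion, and deletion.
    """
    n, m = len(reference), len(attempt)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning reference[:i] with attempt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - attempt[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference advances alone
                cost[i][j - 1],      # attempt advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical stylized contours (arbitrary units, illustrative only).
teacher = [1, 2, 3]
slower_learner = [1, 2, 2, 3]   # same shape, one sample held longer
shifted_learner = [2, 3, 4]     # same shape, shifted upward

print(dtw_distance(teacher, slower_learner))
print(dtw_distance(teacher, shifted_learner))
```

Because the warping absorbs differences in timing, a learner who produces the same contour more slowly incurs no cost here, while a contour with different values does.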
Article
Full-text available
This paper evaluates the relative contribution of two prosodic cues, lengthening and f0 contour, in the processes of speech segmentation and storage of new words. More precisely, we investigate the role of prosodic information in the acquisition by French learners of a mini-language constructed for the experiment. The results show that presence of prosodic information facilitates the speech segmentation and therefore, the acquisition of the new language. Indeed, lengthening or f0 rise on the word final syllable is used by listeners to infer the presence of a boundary. However, the presence of the two cues manipulated does not improve performance; and when only one cue is present, f0 induces slightly more accurate segmentation than lengthening. Finally, the storage of the "stressable" property of the word-final syllables in French is discussed.
Article
Full-text available
The ability to discern the use of a nonstandard dialect is often enough information to also determine the speaker’s ethnicity, and speakers may consequently suffer discrimination based on their speech. This article, detailing four experiments, shows that housing discrimination based solely on telephone conversations occurs, dialect identification is possible using the word hello, and phonetic correlates of dialect can be discovered. In one experiment, a series of telephone surveys was conducted; housing was requested from the same landlord during a short time period using standard and nonstandard dialects. The results demonstrate that landlords discriminate against prospective tenants on the basis of the sound of their voice during telephone conversations. Another experiment was conducted with untrained participants to confirm this ability; listeners identified the dialects significantly better than chance. Phonetic analysis reveals that phonetic variables potentially distinguish the dialects.
Conference Paper
Full-text available
So far, applied research aiming at computer-assisted pronunciation training has normally concentrated on segmental aspects. Here, we present a database with realizations of non-native English speakers with German, French, Spanish, and Italian as native language. We concentrate on the acoustic-prosodic modelling of word accent position and use a large prosodic feature vector to automatically recognize erroneous word accent positions produced by non-native English speakers.
Conference Paper
Full-text available
This paper describes a system for semi-automatic transcription of prosody based on a stylization of the fundamental frequency data (contour) for vocalic (or syllabic) nuclei. The stylization is a simulation of tonal perception of human listeners. The system requires a time-aligned phonetic annotation. The transcription has been applied to several speech corpora.
Article
Full-text available
In recent years the application of computer software to the learning process has been acknowledged as an indisputably effective tool supporting traditional teaching methods. A particular focus has been put on the application of computational techniques based on speech and language processing to second language learning. At present, a number of commercial self-study programs using speech synthesis and recognition are available. Most of them, however, focus on segmental features only. The paper presents technical and linguistic specifications for the Euronounce project [1] which aims at creating an intelligent tutoring system with multimodal feedback functions for acquiring not only foreign languages' pronunciation but also prosody. The project focuses on German as a target language for native speakers of Polish, Slovak, Czech and Russian and vice versa. The paper outlines the Euronounce feedback system and presents the Pitch Line program which can be implemented in the prosody training module of the Euronounce tutoring system.
Article
Full-text available
We recorded non-native English productions of 55 speakers; a subset of these productions was assessed by 60 native English speakers for their quality with respect to intelligibility, rhythm, etc. Applying multiple linear regression on a large prosodic feature vector, which models approaches known from the literature as well as generic prosody, we can automatically predict the listeners' assessments with correlations of up to .85. We discuss the most important features and the limitations of this approach.
Article
Full-text available
Using the verbal-guise technique, 190 Anglo and Hispanic adolescents listened to and evaluated a series of Anglo- and Hispanic-accented speakers reading an ethnically neutral radio announcement across a broad range of seven judgmental dimensions. Anglo-accented speakers were evaluated more favorably across all dimensions, although the effect was attenuated for Hispanic raters. The reported linguistic landscape of the raters was also investigated to determine its role in predicting language attitudes. While this had no effect on Anglo raters, the linguistic landscape significantly affected Hispanic ratings; the more Spanish the perceived local climate (e.g., in terms of road signs, media available), the less favorably Anglo-accented speakers were rated, whereas the more English their perceived landscape, the more favorably Anglo speakers were rated.
Conference Paper
Full-text available
Seventeen French-English bilinguals read aloud a set of English sentences and performed an ABX discrimination task that assessed their perception of the English /ɪ/-/i/ contrast. Global nativelikeness in production correlated with pronunciation accuracy for the vowels /ɪ/ and /i/, and both production measures correlated with self-estimated pronunciation skills. However, performance on the perception task did not correlate with either global nativelikeness or /ɪ, i/ pronunciation accuracy. These results are discussed in light of theories about the relation between perception and production in L2 phonological processing.
Article
Spanish but not French uses accent to distinguish between words (e.g., tópo vs topó). Two populations of subjects were tested on the same materials to determine whether this difference has an impact on the perceptual capacities of listeners. In Experiment 1, using an ABX paradigm, we found that French subjects had significantly more difficulties than Spanish subjects in performing an ABX classification task based on accent. In Experiment 2, we found that Spanish subjects were unable to ignore irrelevant differences in accent in a phoneme-based ABX task, whereas French subjects had no difficulty at all. In Experiment 3, we replicated the basic French finding and found that Spanish subjects benefited from redundant accent information even when phonemic information alone was sufficient to perform the task. In our final experiment, we showed that French subjects can be made to respond to the acoustic correlates of accent; therefore their difficulty in Experiment 1 seems to be located at the level of short-term memory. The implications of these findings for language-specific processing and acquisition are discussed.
Article
Segmentation of continuous speech into its component words is a nontrivial task for listeners. Previous work has suggested that listeners develop heuristic segmentation procedures based on experience with the structure of their language; for English, the heuristic is that strong syllables (containing full vowels) are most likely to be the initial syllables of lexical words, whereas weak syllables (containing central, or reduced, vowels) are nonword-initial, or, if word-initial, are grammatical words. This hypothesis is here tested against natural and laboratory-induced missegmentations of continuous speech. Precisely the expected pattern is found: listeners erroneously insert boundaries before strong syllables but delete them before weak syllables; boundaries inserted before strong syllables produce lexical words, while boundaries inserted before weak syllables produce grammatical words.
Article
This study investigated the effect of foreign accent and speaking rate on native speaker comprehension. The speakers for the study were three native speakers of Chinese, with TSE (Test of Spoken English) comprehensibility scores of 180, 200, and 260, and one native speaker of American English. The speakers each read passages at three different speaking rates. The tape-recorded passages were then presented to native speakers of American English who responded to them by taking a listening comprehension test and rating the speech samples. The results showed that the comprehension scores were significantly higher for the native passages than for the nonnative passages and significantly higher at the regular rate than at the fast rate for all speakers. It was also found that the increase in speaking rate from the regular to the fast rate resulted in a greater decrease in comprehension for the most heavily accented speaker than for the other speakers, indicating that speaking rate is more critical for the comprehension of heavily accented speech. In addition, the results suggested that prosodic deviance may affect comprehension more adversely than does segmental deviance.
Article
Non-native speech is harder to understand than native speech. We demonstrate that this “processing difficulty” causes non-native speakers to sound less credible. People judged trivia statements such as “Ants don't sleep” as less true when spoken by a non-native than a native speaker. When people were made aware of the source of their difficulty they were able to correct when the accent was mild but not when it was heavy. This effect was not due to stereotypes of prejudice against foreigners because it occurred even though speakers were merely reciting statements provided by a native speaker. Such reduction of credibility may have an insidious impact on millions of people, who routinely communicate in a language which is not their native tongue.
Barquero, Mª A., Racine, I., Baqué, L. & Schwab, S. (2014). La estructuración acentual: estudio comparativo en la interlengua español-francés. Caso de la lectura. In: Y. Congosto Martín, M. L. Montero Curiel & A. Salvador Plans (Eds.), Fonética experimental, educación superior e investigación. (Vol. II, pp. 9-28). Madrid: Arco/Libros.
Muñoz Garcia, M. (2010). La perception et la production de l'accent lexical de l'espagnol par des francophones: aspects phonétiques et psycholinguistes. Thèse de doctorat, U. Toulouse 2/Universitat Autònoma de Barcelona.
Cardeñoso-Payo, V., González-Ferreras, C. & Escudero-Mancebo, D. (2014). Assessment of Non-native Prosody for Spanish as L2 using quantitative scores and perceptual evaluation. LREC 2014.
Schwab, S. & Dellwo, V. (2018). Explicit and implicit training methods for the learning of stress contrasts in Spanish. In: J. M. Lahoz-Bengoechea, R. Pérez Ramón & J. Villa Villa (Eds.), Subsidia. Tools and resources for speech sciences. Malaga: Universidad de Malaga.