Conference Paper

Designing Pronunciation Learning Tools: The Case for Interactivity against Over-Engineering


Abstract

Paired role-play is a common collaborative activity in language learning classrooms, adding meaning and cultural context to the learning process. This is complemented by teachers' immediate and explicit feedback. Interactive tools that provide explicit feedback during collaborative learning are scarce, however. More commonly, supporting dialogue practice takes the form of computer-aided single-student read-and-record activities. This limitation is partly due to the complexity of processing language learners' speech in unconstrained tasks. In this paper, we assess the value of pronunciation error detection algorithms within a realistic, software-aided, paired role-playing task with beginning learners of French. We found that students' pronunciations improve regardless of the type of error detector employed -- even for those using simple heuristics. We suggest that speech technologies for language learning have been too focused on engineering goals. Instead, new interactive designs supporting collaboration may be used to overcome engineering limitations and properly support students' engagement.


... They also provided a fundamental evaluation of the effectiveness of computer-assisted language learning systems. Robertson et al. [77] advocated that new interactive designs supporting collaboration can be used to overcome engineering limitations. In our work, we also propose interactive courses in our system. ...
... different pronunciation proficiencies) and how to define the best set of feedback. Our idea of involving exaggerated feedback and our system of determining the best set of exaggeration parameters can be beneficial for the general area of computer-aided language learning [25,29,66,77] and exaggerated feedback [28] systems in HCI. ...
... different pronunciation proficiencies), and 2) how to define the best set of feedback. Our idea of involving exaggerated feedback and our system of determining the best set of exaggeration parameters would be an important finding to the general area of computer-aided language learning [25,29,66,77] and exaggerated feedback [28] systems in HCI. ...
Preprint
Second language (L2) English learners often find it difficult to improve their pronunciations due to the lack of expressive and personalized corrective feedback. In this paper, we present Pronunciation Teacher (PTeacher), a Computer-Aided Pronunciation Training (CAPT) system that provides personalized exaggerated audio-visual corrective feedback for mispronunciations. Though the effectiveness of exaggerated feedback has been demonstrated, it is still unclear how to define the appropriate degrees of exaggeration when interacting with individual learners. To fill in this gap, we interview 100 L2 English learners and 22 professional native teachers to understand their needs and experiences. Three critical metrics are proposed for both learners and teachers to identify the best exaggeration levels in both audio and visual modalities. Additionally, we incorporate the personalized dynamic feedback mechanism given the English proficiency of learners. Based on the obtained insights, a comprehensive interactive pronunciation training course is designed to help L2 learners rectify mispronunciations in a more perceptible, understandable, and discriminative manner. Extensive user studies demonstrate that our system significantly promotes the learners' learning efficiency.
... In studies that explore the user's demands, they may report that some systems can be used to simply entertain themselves [63]. When researchers develop a system or tool, they set a hypothesis that their method can lead to enjoyable experiences [92]. Researchers can consider a goal by hypothesizing and assessing enjoyment for their work. ...
Preprint
An experience of fun can be an important factor for validating the value of games. Research on non-game HCI has been attempted to measure the enjoyment of work. However, a majority of the studies do not discuss the importance and value of the result. It is not clear as to how the term fun is understood in a non-game context. To analyze this shortcoming, we reviewed extant studies, and explored as to how researchers determine if the value of an activity is fun. Consequently, we discussed and categorized the usage of the terms and analyzed the methodologies that are used in extant studies that evaluate the effects of fun and related terms. To gain a better understanding of fun in HCI, we provided several directions that can be discussed for strengthening enjoyable HCI research beyond applications involving games.
... By analyzing human ratings in the pre-test, mid-test, and post-test, Fleiss' kappa (κ) was used to measure the degree of internal variability. For the pre-test, κ = 0.44 indicates that the level of agreement between ... Our study uses a Multilevel Model (MLM) based on Robertson et al. (2018) to assess the students' pronunciation improvement. The MLM hierarchy of "levels" can be used to define the vowels (level 1) spoken by individual participants (level 2) belonging to different L1 languages (level 3). ...
Article
Learning a foreign language pronunciation is the most challenging task for non-native speakers. Improving pronunciation based on feedback on pronunciation error scores is also not easy for learners. Our goal is to develop an alternative approach to pronunciation training that can point out articulation errors at the phoneme level to users through a smartphone-based system. Using self-correction as a means of ensuring learners' engagement, we focus on self-correction of pronunciation by adjusting articulation. In order to identify articulation concerning pronunciation, the system evaluates both audible and inaudible acoustic signals to examine fine-grained frequency-shifting direction of mouth movements and tongue position simultaneously. The result shows that the system can reach an average accuracy of 99.09% and is robust in different scenarios and genders. Additionally, the evaluation of the user study reveals that the proposed system provides positive user experiences and allows learners to improve their pronunciation more efficiently.
... Based on the numerous previous studies on designing interfaces for older adults, e.g. [4,11] followed by the most recent detailed exploration in this field of study [2,3,15] alongside with depicted tendencies of mainstream voice-based interfaces [14], based on numerous research and studies on direct intergenerational engaging of older adults into technology and participatory design [9,7,8,17] we decided to explore the area of older adults interactions with voice assistants. Especially that a recent study by Pradhan et al. [14] demonstrated that voice assistant technology has a great potential for becoming a useful tool for multiple groups of users with varying needs, as voice assistants are already used by people with disabilities. ...
Chapter
Voice User Interfaces (VUIs) owing to recent developments in Artificial Intelligence (AI) and Natural Language Processing (NLP), are becoming increasingly intuitive and functional. They are especially promising for older adults, also with special needs, as VUIs remove some barriers related to access to Information and Communications Technology (ICT) solutions. In this pilot study we examine interdisciplinary opportunities in the area of VUIs as assistive technologies, based on an exploratory study with older adults, and a follow-up in-depth pilot study with two participants regarding the needs of people who are gradually losing their sight at a later age.
... By studying how the animation compares to the game, we simplified the level of interactivity rather than adding a layer of more sophisticated interaction. In contrast, in the CHI community, most new work adds layers of interactivity (e.g., speech recognition [41] and augmented reality [16]). More research needs to be done with new technology to advance our understanding of the effect of increasing but also decreasing interactivity on learning outcomes, especially in the interplay with cognitive load and enjoyment. ...
... There have been many works on providing meaningful feedback for spoken language learners [8,21,22]. On the practical side, Rosetta Stone provides both waveform and spectrograph feedback for pronunciation mistakes by comparing acoustic waves of a learner to that of a native speaker [31]. There has also been some recent work [36]. ...
Conference Paper
Languages are best learned in immersive environments with rich feedback. This is especially true for signed languages due to their visual and poly-componential nature. Computer Aided Language Learning (CALL) solutions successfully incorporate feedback for spoken languages, but no such solution exists for signed languages. Current Sign Language Recognition (SLR) systems are not interpretable and hence not applicable to provide feedback to learners. In this work, we propose a modular and explainable machine learning system that is able to provide fine-grained feedback on location, movement and hand-shape to learners of ASL. In addition, we also propose a waterfall architecture for combining the sub-modules to prevent cognitive overload for learners and to reduce computation time for feedback. The system has an overall test accuracy of 87.9% on real-world data consisting of 25 signs with 3 repetitions each from 100 learners.
... • Interactivity: A student in a traditional lecture is merely an observer rather than an active part of the system. Students should be able to take control and guide themselves through the learning system and interact with others to increase their involvement in the class [5]. This ability depends crucially on the interactivity present in the laboratory exercise. ...
Article
An extensive metamorphosis is currently taking place in the education industry due to the rapid adoption of different technologies and the proliferation of new student-instructor and student–student interaction models. While traditional face-to-face interaction is still the norm, mobile, online and virtual augmentations are increasingly adopted worldwide. Moreover, with the advent of gaming technology besides the 3D visual paradigm, the “touch” and “feel” paradigm is slowly taking its place in the user interface design through gamification. While haptic (force feedback) devices were barely available a decade ago outside research laboratories, the rapid rise in gaming technology has driven the cost significantly lower enabling the spread of these devices in many households and the wide public. This article presents a novel haptic-based training tool implemented as a gaming scenario to assist students in learning of abstract concepts in Physics. The focus is on electromagnetism as one of the fundamental forces in nature and specifically the abstractions used as building blocks around the Lorentz force. Experimental results suggest that by introducing well designed visual-haptic interfaces in presenting abstract concepts, students become better engaged in the classrooms and superior learning outcomes can be achieved.
... Conversational Agents supporting skills related to communication, emotion expression and socialization, are still scarce [29], and we are still far away from having clear evidence of the therapeutic effectiveness of this technology in the NDD area. Rachel [30], for instance, is an embodied CA designed for autistic children's skills that aims to create semantically emotionful narratives. ...
Conference Paper
Our research aims at exploiting the advances in conversational technology to support people with Neurodevelopmental Disorder (NDD). NDD is a group of conditions that are characterized by severe deficits in the cognitive, emotional and motor areas and produce severe impairments in communication and social functioning. This paper presents the design, technology and exploratory evaluation of Emoty, a spoken Conversational Agent (CA) created specifically for individuals with NDD. The goal of Emoty is to help these persons enhance communication abilities related to emotional recognition and expression, which are fundamental in any form of human relationship. The system exploits emotion detection capabilities based on the semantics of the speech by calling the IBM Watson Tone Analyzer API and from the harmonic features of the audio thanks to an "all-of-us" Deep Learning model. The design and evaluation of Emoty are based on the close collaboration among computer engineers and specialists in NDD (psychologists, neurological doctors, educators).
... Voice-based interfaces can also support collaborative learning. Recent work describes an approach to supporting collaborative, computer-assisted language learning by speaking to/with a mobile application, or how "speech-in-the-background" can help people complete tasks [11]. This approach could also transfer to supporting people with language and learning disabilities. ...
Conference Paper
Voice interfaces such as in-home and mobile digital assistants, mobile screen readers, and chatbots are tools that can support communication, collaboration, and information seeking, and are becoming increasingly commonplace. Because they don't require the motor skills needed for text input through a keyboard, the barriers of entry and use for older adults and people with disabilities are lowered. Yet, accessibility of speech interaction can still be a challenge. Using and designing voice interfaces is radically different from graphical interfaces, redefining how we must think about accessibility and what it means for a conversation to be accessible. This workshop invites submissions from researchers whose work advances the study of, design, and use of voice-based interfaces by older adults and people with disabilities. At the workshop, we will 1) explore recent advances in accessibility and voice interface research, 2) situate voice-based accessibility in prior work and existing theoretical frameworks, 3) discuss open challenges in the design of voice-based systems, and 4) identify opportunities for interdisciplinary collaboration to continue research in this field.
Conference Paper
Blind people often need to identify objects around them, from packages of food to items of clothing. Automatic object recognition continues to provide limited assistance in such tasks because models tend to be trained on images taken by sighted people with different background clutter, scale, viewpoints, occlusion, and image quality than in photos taken by blind users. We explore personal object recognizers, where visually impaired people train a mobile application with a few snapshots of objects of interest and provide custom labels. We adopt transfer learning with a deep learning system for user-defined multi-label k-instance classification. Experiments with blind participants demonstrate the feasibility of our approach, which reaches accuracies over 90% for some participants. We analyze user data and feedback to explore effects of sample size, photo-quality variance, and object shape; and contrast models trained on photos by blind participants to those by sighted participants and generic recognizers.
Conference Paper
Many authentication schemes ask users to manually compare compact representations of cryptographic keys, known as fingerprints. If the fingerprints do not match, that may signal a man-in-the-middle attack. An adversary performing an attack may use a fingerprint that is similar to the target fingerprint, but not an exact match, to try to fool inattentive users. Fingerprint representations should thus be both usable and secure. We tested the usability and security of eight fingerprint representations under different configurations. In a 661-participant between-subjects experiment, participants compared fingerprints under realistic conditions and were subjected to a simulated attack. The best configuration allowed attacks to succeed 6% of the time; the worst 72%. We find the seemingly effective compare-and-select approach performs poorly for key fingerprints and that graphical fingerprint representations, while intuitive and fast, vary in performance. We identify some fingerprint representations as particularly promising.
Conference Paper
Learning from captioned foreign language videos is highly effective, but the availability of such videos is limited. By using speech-to-text technology to generate partially correct transcripts as a starting point, we see an opportunity for learners to build accurate foreign language captions while learning at the same time. We present a system where learners correct captions using automatic transcription and machine-generated suggested alternative words for scaffolding. In a lab study of 49 participants, we found that compared to watching the video with accurate captions, learning and quality of experience were not significantly impaired by the secondary caption correction task using interface designs either with or without scaffolding from speech-to-text generated alternative words. Nevertheless, aggregating corrections reduced word error rate from 19% to 5.5% without scaffolding from suggested alternatives, and to 1.8% with scaffolding. Feedback from participants suggests that emphasizing the learning community contribution aspect is important for motivating learners and reducing frustration.
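For readers unfamiliar with the metric quoted above, word error rate is commonly computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below is a minimal illustration under that common definition; the function name and sample sentences are hypothetical, and the abstract does not state how the authors tokenized or normalized their transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Assumption: whitespace tokenization and exact string matching; real
    evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical caption pair: one substitution and one deletion over six words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because the denominator is the reference length, a hypothesis with many inserted words can produce a rate above 1.0.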
Conference Paper
Existing pronunciation error detection research assumes that second language learners' speech is advanced enough that its segments are generally well articulated. However, learners just beginning their studies, especially when those studies are organized according to western, dialogue-driven pedagogies, are unlikely to abide by those assumptions. This paper presents an evaluation of pronunciation error detectors on the utterances of second language learners just beginning their studies. A corpus of nonnative speech data is collected through an experimental application teaching beginner French. Word-level binary labels are acquired through successive pairwise comparisons made by language experts with years of experience teaching. Six error detectors are trained to classify these data: a classifier inspired by phonetic distance algorithms; the Goodness of Pronunciation classifier [1]; and four GMM-based discriminative classifiers modelled after [2]. Three partitioning strategies for 4-fold cross-validation are tested: one based on corpus distribution, another leaving speakers out, and another leaving annotators out. The best error detector, a log-likelihood ratio of native versus nonnative GMMs, achieved detector-annotator agreement of up to κ = .41, near the expected between-annotator agreement.
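The agreement figure above is a chance-corrected statistic. As a rough illustration, the sketch below computes Cohen's kappa between a detector's word-level binary labels and one annotator's labels. This assumes Cohen's kappa for two raters; the abstract does not say which kappa variant was used, and the label sequences here are invented for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Proportion of items on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = word flagged as mispronounced, 0 = accepted.
detector_labels = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_labels = [1, 0, 0, 0, 1, 1, 1, 0]
kappa = cohens_kappa(detector_labels, annotator_labels)
```

In this toy case the raters agree on 6 of 8 words while chance agreement is 0.5, so raw accuracy of 0.75 corresponds to a noticeably lower kappa.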
Article
The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to gain insight into the potential of this prototype and the possible ways in which it could be further adapted or improved, with a view to developing specific language learning products. The evaluation revealed that domain experts and users (teachers and students) are generally positive about the system and intend to use it if they get the opportunity. In addition, recommendations have been made which range from specific changes and additions to the system to more general statements about the pedagogical and technological issues involved. These recommendations can be useful to improve this prototype and to develop other ASR-based systems, which can be deployed either as language courseware or as research tools to investigate design hypotheses and language acquisition processes.
Article
We surveyed 67 ESL programs in Canada to determine to what extent pronunciation is taught and which resources are most often used. The survey also requested demographic information about the respondents and their ESL programs, classes and students, methods of teaching, and participants' attitudes. The respondents from approximately half the programs offer stand-alone pronunciation courses, and the balance reported that they integrate pronunciation teaching in their general ESL classes. The majority of respondents said that it was important to teach pronunciation at all levels, although few teachers have special training in this area. Resources preferred by the participants are discussed with regard to their emphases on segmental and suprasegmental aspects of pronunciation.
Conference Paper
Many HCI and ubiquitous computing systems are characterized by two important properties: their output is uncertain (it has an associated accuracy that researchers attempt to optimize) and this uncertainty is user-facing (it directly affects the quality of the user experience). Novel classifiers are typically evaluated using measures like the F1 score, but given an F-score of (e.g.) 0.85, how do we know whether this performance is good enough? Is this level of uncertainty actually tolerable to users of the intended application, and do people weight precision and recall equally? We set out to develop a survey instrument that can systematically answer such questions. We introduce a new measure, acceptability of accuracy, and show how to predict it based on measures of classifier accuracy. Our tool allows us to systematically select an objective function to optimize during classifier evaluation, but can also offer new insights into how to design feedback for user-facing classification systems (e.g., by combining a seemingly-low-performing classifier with appropriate feedback to make a highly usable system). It also reveals potential issues with the ubiquitous F1-measure as applied to user-facing systems.
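The question of whether precision and recall should be weighted equally can be made concrete with the standard F-beta family, of which F1 is the beta = 1 case. The sketch below is only an illustration of that textbook formula, not the acceptability-of-accuracy measure the abstract introduces; the precision and recall values are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: recall is treated as beta times as important as precision."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when both terms vanish
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical classifier with precision 0.9 and recall 0.8.
f1 = f_beta(0.9, 0.8)            # equal weighting
f2 = f_beta(0.9, 0.8, beta=2.0)  # recall weighted more heavily
f_half = f_beta(0.9, 0.8, beta=0.5)  # precision weighted more heavily
```

The same classifier scores differently under each weighting, which is one way to see why a single F1 value may not reflect what users of a given application care about.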
Article
Evaluating the motivational impact of CALL systems: current practices and future directions
Stephen Bodnar, Catia Cucchiarini, Helmer Strik, and Roeland van Hout
A major aim of computer-assisted language learning (CALL) is to create computer environments that facilitate students’ second language (L2) acquisition. To achieve this aim, CALL employs technological innovations to create novel types of language practice. Evaluations of the new practice types serve the important role of distinguishing effective practice environments from less effective environments, while simultaneously informing educational practices and second language acquisition (SLA) theory. Accordingly, evaluations of CALL systems necessarily deal with multiple criteria. Most researchers would probably agree that motivation is an important criterion in CALL evaluations: a system can provide sufficient L2 input and opportunities for L2 output, yet fail to be pedagogically effective if learners are unwilling to participate. Furthermore, knowledge of the motivational impact of practice can provide valuable context linking individual language learners, practice effort and learning outcomes. From the perspective of recent theoretical developments in L2 motivation theory, this paper surveys a representative sample of CALL system evaluations that include motivational impact. Our analysis suggests not only that CALL needs to do more to align its treatment of motivation with recent L2 motivation theories, but also that it is well positioned to do so. We find that (1) few CALL studies treat motivation as it relates to practice as a dynamic variable, (2) behavioural practice logs are underexploited and (3) very few evaluations take into account learners’ individual interests and goals. Drawing on these and other findings, we suggest four new directions for developing the motivation dimension in CALL evaluations.
Keywords: CALL; evaluation; motivation; review
Article
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
Article
This review summarizes evidence for the effectiveness of technology use in foreign language (FL) learning and teaching, with a focus on empirical studies that compare the use of newer technologies with more traditional methods or materials. The review of over 350 studies (including classroom-based technologies, individual study tools, network-based social computing, and mobile and portable devices) revealed that, in spite of an abundance of publications available on the topic of technology use in FL learning and teaching, evidence of efficacy is limited. However, strong support for the claim that technology made a measurable impact in FL learning came from studies on computer-assisted pronunciation training, in particular, automatic speech recognition (ASR). These studies demonstrated that ASR can facilitate the improvement of pronunciation and can provide feedback effectively. Additional studies provided strong support for the use of chat in FL learning. These studies showed that, with chat, both the amount of learners' language production and its complexity significantly increased. The literature revealed moderate support for claims that technology enhanced learners' output and interaction, affect and motivation, feedback, and metalinguistic knowledge.
Conference Paper
This presentation gives a review of the large amount of research on automatic pronunciation error detection that has been conducted over the past 10-15 years. The goal is to provide a linkage between the various research approaches and work streams in order to aid development of the next generation of algorithms. A vision of an ideal pronunciation error detection system is presented and used as a reference to determine current challenges and possible next steps in research efforts. Lastly, an extensive list of references on the field is provided.
Article
This study assessed the effect of English-language experience on non-native speakers' production and perception of English vowels. Twenty speakers each of German, Spanish, Mandarin, and Korean, as well as a control group of 10 native English (NE) speakers, participated. The non-native subjects, who were first exposed intensively to English when they arrived in the United States (mean age = 25 years), were assigned to relatively experienced or inexperienced subgroups based on their length of residence in the US (M = 7.3 vs. 0.7 years). The 90 subjects' accuracy in producing English /i I ε æ/ was assessed by having native English-speaking listeners attempt to identify which vowels had been spoken, and through acoustic measurements. The same subjects also identified the vowels in synthetic beat-bit (/i/-/I/) and bat-bet (/æ/-/ε/) continua. The experienced non-native subjects produced and perceived English vowels more accurately than did the relatively inexperienced non-native subjects. The non-native subjects' degrees of accuracy in producing and perceiving English vowels were related. Finally, both production and perception accuracy varied as a function of native language (L1) background in a way that appeared to depend on the perceived relation between English vowels and vowels in the L1 inventory.
Article
Computer Mediated Communication (CMC) permits users to engage in purposeful exchanges with other humans (and with on-line databases) both synchronously and asynchronously. Yet, disappointment with previous technological "revolutions" may cause language teachers to be less receptive to the pedagogical uses of this new medium. A historical review of some of the pedagogical claims of Computer Assisted Language Learning (CALL), multimedia applications, and their eventual outcomes, as well as new research in Second Language Acquisition (SLA), supports the proposition that CMC gives second language learners the opportunity to enhance their learning experience. A theoretical framework is suggested for the development of pedagogical tasks based on CMC environments.
Article
Since the late 1980s there has been a top-down movement to reform English language teaching (ELT) in the People's Republic of China (PRC). An important component of this reform has been an effort to import communicative language teaching (CLT) in the Chinese context. CLT, however, has failed to make the expected impact on ELT in the PRC. This paper examines one of the most important potential constraints on the adoption of CLT in the Chinese classroom, namely, the Chinese culture of learning. It argues that CLT and the Chinese culture of learning are in conflict in several important respects, including philosophical assumptions about the nature of teaching and learning, perceptions of the respective roles and responsibilities of teachers and students, learning strategies encouraged, and qualities valued in teachers and students. In view of such fundamental differences, the paper contends that it is counterproductive to take an 'autonomous' attitude, rather than an 'ideological' one, to pedagogical innovations developed in a different sociocultural milieu. It concludes by arguing for the necessity of taking a cautiously eclectic approach and making well-informed pedagogical choices that are grounded in an understanding of sociocultural influences.
Article
This article is organised in five main sections. First, the sub-area of task-based instruction is introduced and contextualised. Its origins within communicative language teaching and second language acquisition research are sketched, and the notion of a task in language learning is defined. There is also brief coverage of the different and sometimes contrasting groups who are interested in the use of tasks. The second section surveys research into tasks, covering the different perspectives (interactional, cognitive) which have been influential. Then a third section explores how performance on tasks has been measured, generally in terms of how complex the language used is, how accurate it is, and how fluent. There is also discussion of approaches to measuring interaction. A fourth section explores the pedagogic and interventionist dimension of the use of tasks. The article concludes with a survey of the various critiques of tasks that have been made in recent years.
Article
Some researchers suggest that recasts are effective in showing learners how their current interlanguage differs from the target (Long & Robinson, 1998). Others have argued that recasts are ambiguous and may be perceived by the learner as confirmation of meaning rather than feedback on form (Lyster, 1998a). We review research on the effectiveness of recasts in first and second language acquisition, paying particular attention to how recasts have been defined and how their impact has been assessed in observational and experimental studies. We conclude that recasts appear to be most effective in contexts where it is clear to the learner that the recast is a reaction to the accuracy of the form, not the content, of the original utterance.
Conference Paper
A series of novel capabilities have been designed to extend the repertoire of Ville, a virtual language teacher for Swedish, created at the Centre for Speech Technology at KTH. These capabilities were tested by twenty-seven language students at KTH. This paper reports on qualitative surveys and quantitative performance from these sessions which suggest some general lessons for automated language training.
Conference Paper
The availability of real-time continuous speech recognition on mobile and embedded devices has opened up a wide range of research opportunities in human-computer interactive applications. Unfortunately, most of the work in this area to date has been confined to proprietary software, or has focused on limited domains with constrained grammars. In this paper, we present a preliminary case study on the porting and optimization of CMU Sphinx-II, a popular open source large vocabulary continuous speech recognition (LVCSR) system, to hand-held devices. The resulting system operates in an average 0.87 times real-time on a 206 MHz device, 8.03 times faster than the baseline system. To our knowledge, this is the first hand-held LVCSR system available under an open-source license.
Conference Paper
Current practice in Human Computer Interaction, as encouraged by educational institutes, academic review processes, and institutions with usability groups, advocates usability evaluation as a critical part of every design process. This is for good reason: usability evaluation has a significant role to play when conditions warrant it. Yet evaluation can be ineffective and even harmful if naively done 'by rule' rather than 'by thought'. If done during early stage design, it can mute creative ideas that do not conform to current interface norms. If done to test radical innovations, the many interface issues that would likely arise from an immature technology can quash what could have been an inspired vision. If done to validate an academic prototype, it may incorrectly suggest a design's scientific worthiness rather than offer a meaningful critique of how it would be adopted and used in everyday practice. If done without regard to how cultures adopt technology over time, then today's reluctant reactions by users will forestall tomorrow's eager acceptance. The choice of evaluation methodology - if any - must arise from and be appropriate for the actual problem or research question under consideration.
Conference Paper
We studied a group of immigrants who were following regular, teacher-fronted Dutch classes, and who were assigned to three groups using either a) Dutch CAPT, an ASR-based Computer Assisted Pronunciation Training (CAPT) system that provides feedback on a number of Dutch speech sounds that are problematic for L2 learners; b) a CAPT system without feedback; or c) no CAPT system. Participants were tested before and after the training. The results show that the ASR-based feedback was effective in correcting the errors addressed in the training.
Article
The psychological and statistical literature contains several proposals for calculating and plotting confidence intervals (CIs) for within-subjects (repeated measures) ANOVA designs. A key distinction is between intervals supporting inference about patterns of means (and differences between pairs of means, in particular) and those supporting inferences about individual means. In this report, it is argued that CIs for the former are best accomplished by adapting intervals proposed by Cousineau (Tutorials in Quantitative Methods for Psychology, 1, 42-45, 2005) and Morey (Tutorials in Quantitative Methods for Psychology, 4, 61-64, 2008) so that nonoverlapping CIs for individual means correspond to a confidence interval for their difference that does not include zero. CIs for the latter can be accomplished by fitting a multilevel model. In situations in which both types of inference are of interest, the use of a two-tiered CI is recommended. Free, open-source, cross-platform software for such interval estimates and plots (and for some common alternatives) is provided in the form of R functions for one-way within-subjects and two-way mixed ANOVA designs. These functions provide an easy-to-use solution to the difficult problem of calculating and displaying within-subjects CIs.
Article
Presenting confidence intervals around means is a common method of expressing uncertainty in data. Loftus and Masson (1994) describe confidence intervals for means in within-subjects designs. These confidence intervals are based on the ANOVA mean squared error. Cousineau (2005) presents an alternative to the Loftus and Masson method, but his method produces confidence intervals that are smaller than those of Loftus and Masson. I show why this is the case and offer a simple correction that makes the expected size of Cousineau confidence intervals the same as that of Loftus and Masson confidence intervals.
Article
Within-subject ANOVAs are a powerful tool to analyze data because the variance associated with differences between the participants is removed from the analysis. Hence, small differences, when present for most of the participants, can be significant even when the participants are very different from one another. Yet, graphs showing standard error or confidence interval bars are misleading since these bars include the between-subject variability. Loftus and Masson (1994) noticed this fact and proposed an alternate method to compute the error bars. However, i) their approach requires that the ANOVA be performed first, which is paradoxical since a graph is an aid to decide whether to perform analyses or not; ii) their method provides a single error bar for all the conditions, masking information such as the heterogeneity of variances across conditions; iii) the method proposed is difficult to implement in commonly-used graphing software. Here we propose a simpler alternative and show how it can be implemented in SPSS.
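As an illustration only, the general idea behind this family of within-subject error bars can be sketched in Python. The sketch assumes the commonly described normalization (subtract each participant's own mean and add the grand mean) and the Morey (2008) correction factor sqrt(M / (M - 1)) for M conditions. The function names and the data layout (one row per participant, one column per condition) are hypothetical and do not come from the cited articles, which provide SPSS and R implementations instead.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_scores(data):
    """Remove between-participant variability from a participants x conditions table.

    Each score has its participant's mean subtracted and the grand mean added,
    so every participant ends up with the same overall mean.
    """
    grand = mean(x for row in data for x in row)
    return [[x - mean(row) + grand for x in row] for row in data]


def within_subject_se(data, morey=True):
    """Per-condition standard errors computed on the normalized scores.

    With morey=True, each standard error is scaled by sqrt(M / (M - 1)),
    where M is the number of conditions (assumed here to be at least 2).
    """
    norm = normalized_scores(data)
    n = len(norm)      # number of participants
    m = len(norm[0])   # number of within-subject conditions
    correction = sqrt(m / (m - 1)) if morey else 1.0
    ses = []
    for j in range(m):
        column = [row[j] for row in norm]
        ses.append(correction * stdev(column) / sqrt(n))
    return ses


# Hypothetical data: 3 participants, 2 conditions.
example = [[1, 2], [2, 4], [3, 3]]
print(within_subject_se(example))
```

To turn each standard error into a confidence interval half-width, it would be multiplied by the critical t value for n - 1 degrees of freedom, which is left to the caller here so the sketch needs only the standard library.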
Article
PLASER is a multimedia tool with instant feedback designed to teach English pronunciation to high-school students in Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach correct pronunciation and not to assess a student's overall pronunciation quality. Major challenges related to speech recognition technology include: allowance for non-native accent, reliable and corrective feedback, and visualization of errors.
Article
We investigate the suitability of deploying speech technology in computer-based systems that can be used to teach foreign language skills. In reviewing the current state of speech recognition and speech processing technology and by examining a number of voice-interactive CALL applications, we suggest how to create robust interactive learning environments that exploit the strengths of speech technology while working around its limitations. In conclusion, we will draw on our review of these applications to identify directions of future research that might improve both the design and the overall performance of voice-interactive CALL systems. 1.0 Introduction During the past two decades, the exercise of spoken language skills has received increasing attention among educators. Foreign language curricula focus on productive skills with special emphasis on communicative competence. Students' ability to engage in meaningful conversational interaction in the target language is considered an ...
Conference Paper
In addition to simple form filling, there is an increasing need for crowdsourcing workers to perform freeform interactions directly on content in microtask crowdsourcing (e.g. proofreading articles or specifying object boundary in an image). Such microtasks are often organized within well-designed workflows to optimize task quality and workload distribution. However, designing and implementing the interface and workflow for such microtasks is challenging because it typically requires programming knowledge and tedious manual effort. We present ReTool, a web-based tool for requesters to design and publish interactive microtasks and workflows by demonstrating the microtasks for text and image content. We evaluated ReTool against a task-design tool from a popular crowdsourcing platform and showed the advantages of ReTool over the existing approach.
Conference Paper
Mobile wayfinding and guide apps have become indispensable tools for navigating unfamiliar urban spaces. Such applications address targeted, "just-in-time" queries, but are not optimally designed for multi-point expeditions that can quickly build route and survey-level familiarity with a neighbourhood. We first conducted an experimental simulation involving a homebuying scenario to assess the usefulness of a popular mobile wayfinding and search application (Google Maps) for exploring a neighbourhood. We then designed a prototype application called Block Party that addresses a number of limitations of Google Maps for this purpose, and evaluated it in a second replica study. The results suggested that application designs that facilitate switching among distinct but synchronized navigation views such as Block Party might support more efficient usage and the selection of task-appropriate views, leading to better overall spatial awareness.
Conference Paper
This paper presents results from a user study designed to evaluate the effectiveness of Korean text entry methods for smartwatches. Specifically, the study compares the four popular text entry methods for smartphones in the context of smartwatch use (three multi-tap 3x4 keypad methods and a QWERTY-like method). A distinctive feature of text entry in Korea is that traditionally different manufacturers have developed their own text entry methods starting with particular physical layouts on feature phones that are now available as soft keypads on smartphones. This research considers the next step in this progression by studying the viability of adopting these text entry methods on smartwatches. The results from the user study indicate that existing methods can be effective for text entry on smartwatches; analysis of the data offers suggestions for improving the effectiveness of the methods.
Although corrective feedback (CF) has received much interest in the second language acquisition literature, relatively little research has investigated the relationship between CF and learner affect in concrete practice situations. The present study investigates learners’ affective states and practice behaviour in a novel context: oral grammar practice with a computer-assisted language learning (CALL) system employing automatic speech recognition (ASR) technology to analyse learners’ speech and provide feedback. Thirty-one adult learners of Dutch practiced with this system in one of two conditions: the no-feedback condition (NOCF) and the feedback condition (CF) which provided immediate CF through ASR. Despite concerns that CF can elicit negative affective reactions and although practice with feedback forced learners to reformulate more often, CF did not appear to have a negative impact. Our analysis finds no significant differences between the NOCF and CF groups. A significant correlation between practice performance and self-efficacy was found in the CF group only. These findings suggest that ASR-enabled CALL systems may be suitable environments for oral grammar practice where CF on oral productions can be provided without negative affective responses, and that without feedback, learners may develop self-efficacy beliefs which do not necessarily reflect their actual performance.
Article
Although information workers may complain about meetings, they are an essential part of their work life. Consequently, busy people spend a significant amount of time scheduling meetings. We present Calendar.help, a system that provides fast, efficient scheduling through structured workflows. Users interact with the system via email, delegating their scheduling needs to the system as if it were a human personal assistant. Common scheduling scenarios are broken down using well-defined workflows and completed as a series of microtasks that are automated when possible and executed by a human otherwise. Unusual scenarios fall back to a trained human assistant who executes them as unstructured macrotasks. We describe the iterative approach we used to develop Calendar.help, and share the lessons learned from scheduling thousands of meetings during a year of real-world deployments. Our findings provide insight into how complex information tasks can be broken down into repeatable components that can be executed efficiently to improve productivity.
Conference Paper
This paper outlines the development and testing of a novel, feedback-enabled attention allocation aid (AAAD), which uses real-time physiological data to improve human performance in a realistic sequential visual search task. Indeed, by optimizing over search duration, the aid improves efficiency, while preserving decision accuracy, as the operator identifies and classifies targets within simulated aerial imagery. Specifically, using experimental eye-tracking data and measurements about target detectability across the human visual field, we develop functional models of detection accuracy as a function of search time, number of eye movements, scan path, and image clutter. These models are then used by the AAAD in conjunction with real time eye position data to make probabilistic estimations of attained search accuracy and to recommend that the observer either move on to the next image or continue exploring the present image. An experimental evaluation in a scenario motivated from human supervisory control in surveillance missions confirms the benefits of the AAAD.
Oral production is an important part of English learning. The lack of a language environment with efficient instruction and feedback is a major obstacle to improving non-native speakers’ spoken English. A computer-assisted language learning system can provide many potential benefits to language learners. It allows adequate instruction and instant feedback to the student, thus facilitating self-study and encouraging interactive use of the language. In this paper, we first analyse Chinese college students' speaking requirements and then, based on these requirements and usability criteria, provide a detailed description of an oral skills development system. In this system, supported by Automatic Speech Recognition technology, special interactive activities and feedback forms were designed, and a prototype was set up and tested to verify the effectiveness of this proposal.
This study investigates the different learning opportunities enabled by text-based and video-based synchronous computer-mediated communication (SCMC) from an interactionist perspective. Six Chinese-speaking learners of English and six English-speaking learners of Chinese were paired up as tandem (reciprocal) learning dyads. Each dyad participated in four kinds of interactions, namely, English text-based SCMC, Chinese text-based SCMC, English video-based SCMC and Chinese video-based SCMC. Their use of communication strategies (CSs) was analyzed along with an after-task questionnaire and with stimulated reflection to explore systematically and comprehensively the differences between text-based and video-based SCMC. In addition to the main role of qualitative analysis, a quantitative analysis was undertaken to provide an overview of the relative frequencies of the occurrence of the different strategies and to understand their distribution in the different conditions. A MANOVA was applied to understand to what extent the differences are likely to have occurred by chance. The results showed that learners used CSs differently in text-based and video-based SCMC and indicated different learning opportunities provided by these two modes of SCMC. While text-based SCMC appears to have greater potential for learning target-like language forms, video-based SCMC seems particularly effective for fluency development as well as pronunciation improvement.
Article
Sandra Joy Savignon is a professor in the Program in Linguistics and Applied Language Studies at the Pennsylvania State University. A past president of the American Association for Applied Linguistics and the founder and long-time director of the multidisciplinary doctoral program in Second Language Acquisition and Teacher Education at the University of Illinois, she has traveled widely in North and South America, Europe, and Asia, consulting and giving seminars on communicative language teaching. Her books include Communicative competence: theory and classroom practice, winner of the Modern Language Association of America Mildenberger Medal for an outstanding research publication in the field of second/foreign language teaching. Her most recent book is Interpreting communicative language teaching: contexts and concerns in teacher education. She and her husband Gabriel are the parents of three bilingual children, now grown with families of their own.
Article
This article reports the views of 24 Chinese (People's Republic of China) teachers of English on the appropriateness and effectiveness of “Western” language-teaching methods (here defined according to Canale & Swain, 1980) for use in Chinese situations. The Chinese teachers believed that the communicative approach was mainly applicable in China only for those students who planned to go to an English-speaking country, and, as nonnative speakers, they noted their limitations with respect to the sociolinguistic and strategic competence in English that is required for using this approach effectively. The teachers also cited various constraints on implementing Western language-teaching methods, including the context of the wider curriculum, traditional teaching methods, class sizes and schedules, resources and equipment, and the low status of teachers who teach communicative rather than analytic skills. An examination of these views in light of the context and theory of Western language teaching demonstrates that the Chinese teachers' concerns have considerable justification. Various suggestions are made as possible means of adapting Western language-teaching methods to the situation in China.
Article
This article is a revised version of a keynote address given by the author at CALICO '98, the fifteenth annual CALICO symposium, in San Diego, California in July 1998. CALICO wishes to thank Dr. Clifford for his keynote address at CALICO '98 and his permission to publish it here. The opinions expressed in this article are those of the author and do not necessarily represent those of the Defense Language Institute Foreign Language Center.
Article
The purpose of this study is to assess the potential of technology for improving language education. A review of the effectiveness of past and current practices in the application of information and communication technology (ICT) in language education and the availability as well as capacities of current ICTs was conducted. The review found that existing literature on the effectiveness of technology uses in language education is very limited in four aspects: a) The number of systematic, well-designed empirical evaluative studies of the effects of technology uses in language learning is very small, b) the settings of instruction where the studies were conducted were limited to higher education and adult learners, c) the languages studied were limited to common foreign languages and English as a foreign or second language, and d) the experiments were often short-term and about one or two aspects of language learning (e.g., vocabulary or grammar). However, the limited number of available studies shows a pattern of positive effects. They found technology-supported language learning is at least as effective as human teachers, if not more so.
Article
The relative effects of various types of negative feedback on the acquisition of the English dative alternation by 100 adult Spanish-speaking learners of English as a second language were investigated. Our objective was to determine empirically whether feedback can help learners learn the appropriate abstract constraints on an overgeneral rule. All subjects were trained on the alternation, which was presented in terms of a simple structural change. Subjects were divided into groups according to the type of feedback they received when they made an error. Specifically, upon making an error, Group A subjects were given explicit metalinguistic information about the generalization we hoped they would learn. Group B subjects were told that their response was wrong. Group C subjects were corrected when they erred, giving them a model of the response desired along with implicit negative evidence that their response was incorrect. Group D subjects, having made an error, were asked if they were sure about their response. The comparison group received no feedback.
Article
A meta-analysis was conducted to investigate the effects of explicit and implicit instruction on the acquisition of simple and complex grammatical features in English. The target features in the 41 studies contributing to the meta-analysis were categorized as simple or complex based on the number of criteria applied to arrive at the correct target form (Hulstijn & de Graaff, 1994). The instructional treatments were classified as explicit or implicit following Norris and Ortega (2000). The results indicate larger effect sizes for explicit over implicit instruction for simple and complex features. The findings also suggest that explicit instruction positively contributes to learners’ controlled knowledge and spontaneous use of complex and simple forms.
Article
This update to Garrett (1991), “Technology in the Service of Language Learning: Trends and Issues,” explores current uses of technology to facilitate the teaching and assessment of second languages. In this article, I discuss the changes that have taken place over the last 18 years regarding selected topics from the 1991 article, including the relationship between pedagogy, theory, and technology, physical infrastructure, efficacy, copyright concerns, categories of software (e.g., tutorial, authentic materials engagement, communication uses of technology), and evaluation. I then explore the most challenging issues facing computer-assisted language learning (CALL) scholarship and practice today, that is, new demands in language education (based on the conclusions of the 2007 report of the Modern Language Association and Jackson & Malone, 2009), the need to rethink grammar instruction, online learning, social computing, teacher training and professional development, and CALL research. Like the original 1991 article, this work contains an appendix with links to information resources for CALL research and practice. I conclude by saying that new initiatives are needed to promote the use of technology for research on CALL and for facilitating second language acquisition, such as support for institutional language centers, streamlining of the work of professional organizations dedicated to CALL, and the establishment of a national CALL center.
Article
This paper reviews research in spoken language technology for education and more specifically for language learning. It traces the history of the domain and then groups main issues in the interaction with the student. It addresses the modalities of interaction and their implementation issues and algorithms. Then it discusses one user population – children – and an application for them. Finally it has a discussion of overall systems. It can be used as an introduction to the field and a source of reference materials.
Article
Restricted maximum likelihood (REML) is now well established as a method for estimating the parameters of the general Gaussian linear model with a structured covariance matrix, in particular for mixed linear models. Conventionally, estimates of precision and inference for fixed effects are based on their asymptotic distribution, which is known to be inadequate for some small-sample problems. In this paper, we present a scaled Wald statistic, together with an F approximation to its sampling distribution, that is shown to perform well in a range of small sample settings. The statistic uses an adjusted estimator of the covariance matrix that has reduced small sample bias. This approach has the advantage that it reproduces both the statistics and F distributions in those settings where the latter is exact, namely for Hotelling T²-type statistics and for analysis of variance F-ratios. The performance of the modified statistics is assessed through simulation studies of four different REML analyses and the methods are illustrated using three examples.
Lesson Nine GmbH. 2010. Tech Background: Babbel Speech Recognition. (June 2010).
B.G. Tabachnick and L.S. Fidell. 2012. Using Multivariate Statistics. Pearson Education, Limited. http://books.google.ca/books?id=ucj1ygAACAAJ
Ray Clifford. 1998. Mirror, Mirror, on the Wall: Reflections on Computer Assisted Language Learning. CALICO Journal 16, 1 (1998), 1. http://search.proquest.com/docview/750443820?accountid=14771
Google. 2018. Google Cloud Speech API. (2018).