Conference Paper · PDF available

The Architecture of Why2-Atlas: A Coach for Qualitative Physics Essay Writing

Abstract

The Why2-Atlas system teaches qualitative physics by having students write paragraph-long explanations of simple mechanical phenomena. The tutor uses deep syntactic analysis and abductive theorem proving to convert the student's essay to a proof. The proof formalizes not only what was said, but the likely beliefs behind what was said. This allows the tutor to uncover misconceptions as well as to detect missing correct parts of the explanation. If the tutor finds such a flaw in the essay, it conducts a dialogue intended to remedy the missing or misconceived beliefs, then asks the student to correct the essay. It often takes several iterations of essay correction and dialogue to get the student to produce an acceptable explanation. Pilot subjects have been run, and an evaluation is in progress. After explaining the research questions that the system addresses, the bulk of the paper describes the system's architecture and operation.
... Why?) and initiate discussions based on the responses (Graesser et al., 2001). AutoTutor and its derivatives (Nye et al., 2014; VanLehn et al., 2002; Graesser et al., 1999, 2004) arose from Graesser et al.'s (1995) investigation of human tutoring behaviors and modeled the common approach of helping students improve their answers by way of a conversation. These systems rely on natural language processing (NLP) techniques, such as regular expressions, templates, semantic composition (VanLehn et al., 2002), LSA (Graesser et al., 1999; Person, 2003), and other semantic analysis tools (Graesser et al., 2007). ...
... Nye et al. (2018) added conversational routines to the online mathematics ITS ALEKS by attaching mini-dialogues to individual problems but left navigation to be done via a website. ...
Article
Full-text available
To emulate the interactivity of in-person math instruction, we developed MathBot, a rule-based chatbot that explains math concepts, provides practice questions, and offers tailored feedback. We evaluated MathBot through three Amazon Mechanical Turk studies in which participants learned about arithmetic sequences. In the first study, we found that more than 40% of our participants indicated a preference for learning with MathBot over videos and written tutorials from Khan Academy. The second study measured learning gains, and found that MathBot produced comparable gains to Khan Academy videos and tutorials. We solicited feedback from users in those two studies to emulate a real-world development cycle, with some users finding the lesson too slow and others finding it too fast. We addressed these concerns in the third and main study by integrating a contextual bandit algorithm into MathBot to personalize the pace of the conversation, allowing the bandit to either insert extra practice problems or skip explanations. We randomized participants between two conditions in which actions were chosen uniformly at random (i.e., a randomized A/B experiment) or by the contextual bandit. We found that the bandit learned a similarly effective pedagogical policy to that learned by the randomized A/B experiment while incurring a lower cost of experimentation. Our findings suggest that personalized conversational agents are promising tools to complement existing online resources for math education, and that data-driven approaches such as contextual bandits are valuable tools for learning effective personalization.
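The pacing decision described in this abstract, choosing between inserting extra practice and skipping explanations based on the learner's context, can be illustrated with a minimal contextual bandit. This is a sketch of the general technique, not MathBot's actual implementation; the action names, context labels, and epsilon-greedy strategy here are our own illustrative assumptions.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Toy contextual bandit: per-(context, action) running reward averages."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, action) -> number of pulls
        self.values = defaultdict(float)  # (context, action) -> mean reward

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        key = (context, action)
        self.counts[key] += 1
        # Incremental running-mean update.
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# Hypothetical actions and context; reward could be a post-lesson quiz score.
bandit = EpsilonGreedyBandit(["insert_practice", "skip_explanation", "continue"])
action = bandit.choose(context="struggling")
bandit.update("struggling", action, reward=1.0)
```

In practice a richer learner (e.g. LinUCB over real-valued context features) would replace the tabular averages, but the control flow — observe context, choose action, observe reward, update — is the same.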
... Examples of successful animated pedagogical agents include AutoTutor [27], CIRCSIM-Tutor [28], Why2-Atlas [29], etc. Such systems foster deep learning as students are prompted to explain their reasoning and reflect on their problem-solving activities. ...
Article
Full-text available
In second-language communication, emotional feedback plays a preponderant role in instilling positive emotions and thereby facilitating the production of the target language by second-language learners. In contrast, facial expressions help convey emotion, intent, and sometimes even desired actions more effectively. Additionally, according to the facial feedback hypothesis, a major component of several contemporary theories of emotion, facial expressions can regulate emotional behavior and experience. The aim of this study was to determine whether and to what extent emotional expressions reproduced by virtual agents could provide empathetic support to second-language learners during communication tasks. To do so, using the Facial Action Coding System, we implemented a prototype virtual agent that can display a collection of nonverbal feedback behaviors, including Ekman's six basic universal emotions as well as gazing and nodding. Then, we designed a Wizard of Oz experiment in which second-language learners were assigned independent speaking tasks with a virtual agent. In this paper, we outline our proposed method and report on an initial experimental evaluation which validated the meaningfulness of our approach. Moreover, we present our next steps for improving the system and validating its usefulness through large-scale experiments.
... By contrast, prior work in the field of intelligent tutoring dialogues has widely relied on large rule-based systems injected with human-crafted domain knowledge (Anderson et al., 1995; Aleven et al., 2001; Graesser et al., 2001; VanLehn et al., 2002; Rus et al., 2015b). Many of these systems involve students answering multiple-choice or fill-in-the-blank questions and being presented with a hint or explanation when they answer incorrectly. ...
Chapter
Enabled by advanced technologies, smart learning is growing into a popular trend. Smart learning can involve combining the Internet, mobile, and context-aware technologies with the analysis of large datasets and information about a specific learner to create learner-specific, meaningful learning activities. A smart learning application can potentially play the role of an informed and capable companion who seeks opportunities to advise others about a wide variety of things that come up in school, at home, or elsewhere. Given the movement toward smart learning applications, this chapter takes a historical view toward the development of inquiry learning and critical thinking skills, reviewing prior approaches and examining the gains and shortfalls of those efforts. We develop a research-based framework for a transformative approach to support the development of inquiry learning and critical thinking skills in young children using smart technologies. Such a transformation could be achieved by incorporating personalized learning, adaptive learning, affective considerations, and mixed realities in a series of steps to help a young learner develop habits of mind that include noticing unusual things, seeking explanations for those things, testing various explanations, and reflecting on progress and process. A framework for a conversational application aimed at developing critical thinking systematically, especially promoting continuing inquiry and reasoning, is presented along with suggestions for determining impact. An initial prototype and results are briefly described.
Conference Paper
Full-text available
To meet the expectations of stakeholders in Education 4.0, the inclination toward acceptance, testing, and implementation of voice assistants has increased at a faster pace in Western universities. This study was done to understand the present level of performance and capability of voice assistants in answering the basic queries of users from the Indian higher education sector. Three voice assistants, namely Amazon Alexa, Microsoft Cortana, and Google Assistant, were considered in a survey in which 100 students participated and tested the performance of these three voice assistants by asking basic questions related to admission and examination. A set of 14 questions in total was selected to be asked by users across two categories: Category 1, related to admissions, contained 7 somewhat complex questions, while Category 2, related to examinations, had 7 direct and simpler questions. Responses of the users were noted down and, after careful analysis, explanations were deduced about the performance of these voice assistants on the factors considered: understanding of the user's voice and language, understanding of the basic meaning of the question, requests to repeat the question, whether an answer was given by the voice assistant, and the quality of the answer received. As per the results, various observations have been noted that may further help artificial intelligence developers and programmers to understand the successes and failures of voice assistants and accordingly improve their performance. This study also sheds light on the many applications, contributions, and benefits these voice assistants offer to stakeholders.
Once the improvements are incorporated by developers, marketing professionals can boost the adoption and use of voice assistants in the massive higher education sector of India by convincing the important stakeholders, which also paves a way to tap into more undiscovered possibilities for the use of artificial intelligence in higher education systems at a global level. Keywords: Artificial Intelligence, Voice Assistants, Higher Education, Stakeholders, Education 4.0
Article
Full-text available
This article discusses the usefulness of Toulmin's model of arguments for structuring an assessment of different types of wrongness in an argument. We discuss the usability of the model within a conversational agent that aims to support users in developing a good argument. In the article, we present a study and the development of classifiers that identify the existence of the structural components of a good argument, namely a claim, a warrant (underlying understanding), and evidence. Based on a dataset (three sub-datasets with 100, 1,026, and 211 responses, respectively) in which users argue about the intelligence or non-intelligence of entities, we have developed classifiers for these components: the existence and direction (positive/negative) of claims can be detected with a weighted average F1 score over all classes (positive/negative/unknown) of 0.91. The existence of a warrant (with warrant/without warrant) can be detected with a weighted F1 score over all classes of 0.88. The existence of evidence (with evidence/without evidence) can be detected with a weighted average F1 score of 0.80. We argue that these scores are high enough to be of use within a conditional dialogue structure based on Bloom's taxonomy of learning, and we show, by way of example, a conditional dialogue structure that allows us to conduct coherent learning conversations. While our experiments show how Toulmin's model of arguments can be used to identify structural problems with argumentation, we also discuss how the model could be used in conjunction with a content-wise assessment of correctness, especially of the evidence component, to identify more complex types of wrongness in arguments, where argument components are not well aligned. Given progress in argument mining and conversational agents, the next challenge could be developing agents that support learning argumentation.
These agents could identify more complex types of wrongness in arguments that result from wrong connections between argumentation components.
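The weighted average F1 scores reported in this abstract combine per-class F1 values in proportion to each class's support (its frequency in the gold labels). A minimal stdlib sketch of that metric follows; the function name and example labels are ours, not from the article.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by class support."""
    labels = set(y_true)
    support = Counter(y_true)
    total = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[c] * f1
    return total / len(y_true)
```

Unlike macro-averaged F1, this weighting keeps a rare class (e.g. "unknown" claims) from dominating the reported score.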
Article
Full-text available
Intelligent tutoring systems (ITSs) are computer programs that provide instruction adapted to the needs of individual students. Dialog systems are computer programs that communicate with human users by using natural language. This paper presents a systematic literature review to address ITSs that incorporate dialog systems and have been implemented in the last twenty years. The review found 33 ITSs and focused on answering the following five research questions. a) What ITSs with natural language dialogue have been developed? b) What is the main purpose of the tutoring dialogue in each system? c) What are the pedagogical features of the teaching process performed by the ITSs with natural language dialogue? d) What natural language understanding approach does each system employ to understand students' utterances? e) What evidence exists related to the evaluation of ITSs with natural language dialogue? The results of this review reveal that most ITSs are directed toward science, technology, engineering, and mathematics (STEM) domains at the university level, and the majority of the selected ITSs implement the expectations and misconceptions tailored approach. Furthermore, most ITSs use dialog to help students learn how to solve a problem by applying rules, laws, etc. (the apply level in Bloom's taxonomy). With regard to the instructional approach, the selected ITSs help students write correct explanations or answers for deep questions; assist students in problem solving; or support a reflective dialogue motivated by either previously provided content or the result of a simulation. Additionally, we found empirical evaluations for 90.91% of the selected ITSs that measure the learning gains and/or assess the impacts of different tutoring strategies.
Chapter
The use of speech and spoken dialogue is a relatively recent addition to instructional systems. As, almost invariably, human instructors and students talk during teaching and training, spoken dialogue would seem to be an important factor in systems that emulate aspects of human instruction. In this chapter, we describe the origins and state of the art of spoken multimodal instruction. We then discuss strengths and weaknesses of the speech modality, key roles of spoken dialogue in multimodal instruction, functional issues in current spoken teaching and training systems, commercial prospects, and some main challenges ahead.
Conference Paper
Automatic knowledge acquisition is a rather complex and challenging task. This paper focuses on the description and evaluation of a semi-automatic authoring tool (SAAT) that has been developed as a part of the Adaptive Courseware based on Natural Language AC&NL Tutor project. The SAAT analyzes a natural language text and, as a result of the declarative knowledge extraction process, generates domain knowledge presented in the form of natural language sentences, questions, and domain knowledge graphs. The generated domain knowledge represents expert knowledge in the intelligent tutoring system Tutomat. Natural language processing techniques are applied, and the tool's functionalities are thoroughly explained. This tool is, to our knowledge, the only one that enables natural language question and sentence generation at different levels of complexity. Using an unstructured and unprocessed Wikipedia text in computer science, an evaluation of the domain knowledge extraction algorithm, i.e. the correctness of extraction outcomes and the effectiveness of extraction methods, was performed. The SAAT outputs were compared with a gold standard manually developed by two experts. The results showed that 68.7% of detected errors referred to the performance of the integrated linguistic resources, such as CoreNLP, Senna, and WordNet, whereas 31.3% of errors referred to the proposed extraction algorithms.
Chapter
This chapter discusses the design of a dialog-based intelligent tutoring system for the domain of Business Information Systems education. The system is designed to help students work on group projects, maintain their motivation, and provide subtle hints for self-directed discovery. We analyze the domain of Business Information Systems—which we find to be “ill-defined” in the sense that e.g. multiple conflicting solutions may exist and be acceptable for a given task. Based on an extensive collection of requirements derived from previous work, we propose a solution that helps both groups find solutions and individuals reflect on these solutions. This combination ensures that not only the group’s result is valid, but also that all group members reach the defined learning goals. We show how the complexity of the domain can be captured in a rather simple way via constraint-based engineering and how machine learning can help map student utterances to these constraints. We demonstrate the intended working principles of the system with some example dialogs and some first thoughts about backend implementation principles.
Article
Full-text available
A survey of pre/post-test data using the Halloun–Hestenes Mechanics Diagnostic test or the more recent Force Concept Inventory is reported for 62 introductory physics courses enrolling a total of N = 6542 students. A consistent analysis over diverse student populations in high schools, colleges, and universities is obtained if a rough measure of the average effectiveness of a course in promoting conceptual understanding is taken to be the average normalized gain <g>. The latter is defined as the ratio of the actual average gain (%post − %pre) to the maximum possible average gain (100 − %pre). Fourteen "traditional" (T) courses (N = 2084) which made little or no use of interactive-engagement (IE) methods achieved an average gain <g>_T-ave = 0.23 ± 0.04 (std dev). In sharp contrast, 48 courses (N = 4458) which made substantial use of IE methods achieved an average gain <g>_IE-ave = 0.48 ± 0.14 (std dev), almost two standard deviations of <g>_IE-ave above that of the traditional courses. Results for 30 (N = 3259) of the above 62 courses on the problem-solving Mechanics Baseline test of Hestenes–Wells imply that IE strategies enhance problem-solving ability. The conceptual and problem-solving test results strongly suggest that the classroom use of IE methods can increase mechanics-course effectiveness well beyond that obtained in traditional practice. © 1998 American Association of Physics Teachers.
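The normalized gain defined in this abstract is simple arithmetic on class-average pre- and post-test percentages; a one-function sketch (the example scores are illustrative, not from the survey):

```python
def normalized_gain(pre_pct, post_pct):
    """Average normalized gain <g> = (%post - %pre) / (100 - %pre):
    actual average gain over maximum possible average gain (Hake, 1998)."""
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# A class averaging 40% pre-test and 70% post-test realizes half of its
# possible improvement, regardless of where it started:
g = normalized_gain(40.0, 70.0)  # 0.5
```

Dividing by the headroom (100 − %pre) is what makes the measure comparable across courses with very different incoming populations.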
Conference Paper
Full-text available
CIRCSIM-Tutor version 2, a dialogue-based intelligent tutoring system (ITS), is nearly five years old. It conducts a conversation with a student to help the student learn to solve a class of problems in cardiovascular physiology dealing with the regulation of blood pressure. It uses natural language for both input and output, and can handle a variety of syntactic constructions and lexical items, including sentence fragments and misspelled words.
Article
Full-text available
AutoTutor is a fully automated computer tutor that assists students in learning about hardware, operating systems, and the Internet in an introductory computer literacy course. AutoTutor presents questions and problems from a curriculum script, attempts to comprehend learner contributions that are entered by keyboard, formulates dialog moves that are sensitive to the learner's contributions (such as prompts, elaborations, corrections, and hints), and delivers the dialog moves with a talking head. Latent Semantic Analysis (LSA) is a major component of the mechanism that evaluates the quality of student contributions in the tutorial dialog. LSA's evaluations of college students' answers to deep reasoning questions are equivalent to the evaluations provided by intermediate experts of computer literacy, but not as high as those of more accomplished experts in computer science. LSA is capable of discriminating different classes of student ability (good, vague, erroneous, or mute students) and of tracking the quality of contributions in tutorial dialog.
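LSA-style answer evaluation, as described above, ultimately scores a student contribution by its cosine similarity to an ideal answer in a reduced semantic space. True LSA first projects texts through an SVD of a large term-document matrix; as a hedged stand-in, the sketch below applies the same cosine comparison directly to raw term counts (the function names are ours).

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def match_score(student_answer, ideal_answer):
    """Score a student answer against an expected answer, 0.0 to ~1.0."""
    return cosine(Counter(student_answer.lower().split()),
                  Counter(ideal_answer.lower().split()))
```

The SVD step that this sketch omits is what lets real LSA credit synonyms and paraphrases rather than only exact word overlap.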
Book
The editor of this volume, who is also author or coauthor of five of the contributions, has provided an introduction that not only affords an overview of the separate articles but also interrelates the basic issues in linguistics, psycholinguistics, and cognitive studies that are addressed in this volume. The twelve articles are grouped into three sections, as follows. I. Lexical Representation: The Passive in Lexical Theory (J. Bresnan); On the Lexical Representation of Romance Reflexive Clitics (J. Grimshaw); and Polyadicity (J. Bresnan). II. Syntactic Representation: Lexical-Functional Grammar: A Formal Theory for Grammatical Representation (R. Kaplan and J. Bresnan); Control and Complementation (J. Bresnan); Case Agreement in Russian (C. Neidle); The Representation of Case in Icelandic (A. Andrews); Grammatical Relations and Clause Structure in Malayalam (K. P. Mohanan); and Sluicing: A Lexical Interpretation Procedure (L. Levin). III. Cognitive Processing of Grammatical Representations: A Theory of the Acquisition of Lexical Interpretive Grammars (S. Pinker); Toward a Theory of Lexico-Syntactic Interactions in Sentence Perception (M. Ford, J. Bresnan, and R. Kaplan); and Sentence Planning Units: Implications for the Speaker's Representation of Meaningful Relations Underlying Sentences (M. Ford).
Article
Large practical NLP applications require robust analysis components that can effectively handle input that is disfluent or extra-grammatical. The effectiveness and efficiency of any robust parser are a direct function of three main factors: (1) Flexibility: what types of disfluencies and deviations from the grammar can the parser handle? (2) Search: how does the parser search the space of possible interpretations, and what techniques are applied to prune the search space? (3) Parse selection and disambiguation: what methods and resources are used to evaluate and rank potential parses and sub-parses, and how does the parser cope with the extreme levels of ambiguity introduced by its flexibility parameters? In this chapter we describe our investigations of how to balance flexibility and efficiency in the context of two different robust parsers, a GLR parser and a left-corner chart parser, both based on a unification-augmented context-free grammar formalism. We demonstrate how the combination of a beam search together with ambiguity packing and statistical disambiguation provides a flexible framework for achieving a good balance between robustness and efficiency in such parsers. Our investigations are based on experimental results and comparative performance evaluations of both parsers using a grammar for the spoken-language ESST (English Spontaneous Scheduling Task) domain.
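The beam-search pruning mentioned in this abstract (keeping only the top-scoring partial analyses as the input is consumed) is generic and easy to sketch. This is not the cited parsers' implementation; the interface below, where an `extend` callback proposes scored continuations for each token, is our own assumption.

```python
def beam_search(tokens, extend, beam_width=3):
    """Generic beam search over partial analyses.

    `extend(state, token)` yields (new_state, incremental_log_score) pairs;
    only the `beam_width` best-scoring hypotheses survive each step.
    """
    beam = [((), 0.0)]  # (partial analysis, cumulative log score)
    for tok in tokens:
        candidates = []
        for state, score in beam:
            for new_state, inc in extend(state, tok):
                candidates.append((new_state, score + inc))
        # Prune: keep only the beam_width highest-scoring hypotheses.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam
```

In a robust parser the pruning score would come from a statistical disambiguation model, and ambiguity packing would additionally merge hypotheses that share structure so equivalent analyses are not carried separately.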
Article
Abduction is inference to the best explanation. In the TACITUS project at SRI we have developed an approach to abductive inference, called “weighted abduction”, that has resulted in a significant simplification of how the problem of interpreting texts is conceptualized. The interpretation of a text is the minimal explanation of why the text would be true. More precisely, to interpret a text, one must prove the logical form of the text from what is already mutually known, allowing for coercions, merging redundancies where possible, and making assumptions where necessary. It is shown how such “local pragmatics” problems as reference resolution, the interpretation of compound nominals, the resolution of syntactic ambiguity and metonymy, and schema recognition can be solved in this manner. Moreover, this approach of “interpretation as abduction” can be combined with the older view of “parsing as deduction” to produce an elegant and thorough integration of syntax, semantics, and pragmatics, one that spans the range of linguistic phenomena from phonology to discourse structure. Finally, we discuss means for making the abduction process efficient, possibilities for extending the approach to other pragmatics phenomena, and the semantics of the weights and costs in the abduction scheme.
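The core idea of weighted abduction, prove what you can and assume the rest at a cost, preferring the minimal-cost explanation, can be shown in a toy backward-chainer. This is only an illustration of the cost calculus, not the TACITUS system; the predicate names, rule, and weights are invented, and real weighted abduction also handles variables, coercions, and merging of redundant assumptions.

```python
# A literal costs 0 if it is a known fact, its assumption weight if assumed,
# or the cost of proving some rule body for it, whichever is cheapest.
FACTS = {"wrote_essay"}
RULES = {  # head -> list of alternative rule bodies (tuples of literals)
    "understands_force": [("mentions_newton3", "applies_correctly")],
}
ASSUMPTION_COST = {"mentions_newton3": 1.0, "applies_correctly": 2.0}

def best_cost(goal, depth=5):
    """Cheapest explanation cost for a literal under weighted abduction."""
    if goal in FACTS:
        return 0.0
    # Option 1: assume the goal outright (infinite cost if unassumable).
    options = [ASSUMPTION_COST.get(goal, float("inf"))]
    # Option 2: back-chain through a rule and explain each body literal.
    if depth > 0:
        for body in RULES.get(goal, []):
            options.append(sum(best_cost(b, depth - 1) for b in body))
    return min(options)
```

Interpreting an essay then amounts to finding the cheapest proof of its logical form; the assumptions that proof is forced to make are exactly the candidate hidden beliefs (or misconceptions) behind what the student wrote.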
Conference Paper
This paper describes AUTOSEM, a robust semantic interpretation framework that can operate both at parse time and at repair time. The evaluation demonstrates that AUTOSEM achieves a high level of robustness efficiently and without requiring any hand-coded knowledge dedicated to repair.