Carolyn Penstein Rosé's research while affiliated with Carnegie Mellon University and other places

Publications (335)

Article
Full-text available
Natural language understanding (NLU) has made massive progress driven by large benchmarks, but benchmarks often leave a long tail of infrequent phenomena underrepresented. We reflect on the question: Have transfer learning methods sufficiently addressed the poor performance of benchmark-trained models on the long tail? We conceptualize the long tai...
Article
To date, many AI initiatives (e.g., AI4K12, CS for All) have developed standards and frameworks as guidance for educators to create accessible and engaging Artificial Intelligence (AI) learning experiences for K‐12 students. These efforts revealed a significant need to prepare youth to gain a fundamental understanding of how intelligence is created, appli...
Article
The StoryQ environment provides an intuitive graphical user interface for middle and high school students to create features from unstructured text data and train and test classification models using logistic regression. StoryQ runs in a web browser, is free and requires no installation. AI concepts addressed include: features, weights, accuracy, t...
Conference Paper
Full-text available
Collaborating effectively with people with diverse professional/cultural backgrounds is a core skill in a globalized world. To prepare students, we need to understand how skills for interprofessional and multicultural collaboration can best be (1) defined, (2) measured, (3) developed, and (4) fostered. Thus, this symposium seeks to identify shared...
Article
Forums are essential components facilitating interactions in online courses. However, in large-scale courses, many posts are generated, which creates difficulties for learners. First, the posts are poorly organized and some deviate from the topic, making knowledge acquisition difficult for learners. Second, learners cannot receive timely feedback a...
Article
We present a script for conversational reflection guidance embedded in reflective practice. Rebo Junior, a non-adaptive conversational agent, was evaluated in a 12-week field study with apprentices. We analysed apprentices' interactions with Rebo Junior in terms of reflectivity, and measured the development of their reflection competence via reflec...
Conference Paper
Full-text available
Socially shared regulation of learning (SSRL) is essential for the success of collaborative learning, yet learners often neglect needed regulation while facing challenges. In order to provide targeted support when needed, it is critical to identify the precise events that trigger regulation. Multimodal collaborative learning data may offer opportun...
Preprint
Full-text available
Natural language understanding (NLU) has made massive progress driven by large benchmarks, paired with research on transfer learning to broaden its impact. Benchmarks are dominated by a small set of frequent phenomena, leaving a long tail of infrequent phenomena underrepresented. In this work, we reflect on the question: have transfer learning meth...
Conference Paper
Full-text available
This paper reports a design science research methodology (DSRM) study that develops, demonstrates, and evaluates a deep learning model utilizing multimodal data to automatically detect types of interactions for regulation in collaborative learning (RegCL) by using features extracted from electrodermal activity (EDA), video, and audio data involving...
Preprint
Full-text available
As an important task in multimodal context understanding, Text-VQA (Visual Question Answering) aims at question answering through reading text information in images. It differs from the original VQA task in that Text-VQA requires large amounts of scene-text relationship understanding, in addition to the cross-modal grounding capability. In this p...
Article
Full-text available
Objectives: Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction: extracting medical information of all types in a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best cand...
Conference Paper
Knowledge Graph (KG) completion research usually focuses on densely connected benchmark datasets that are not representative of real KGs. We curate two KG datasets that include biomedical and encyclopedic knowledge and use an existing commonsense KG dataset to explore KG completion in the more realistic setting where dense connectivity is not guara...
Article
Full-text available
Professional and lifelong learning are a necessity for workers. This is true both for re-skilling from disappearing jobs, as well as for staying current within a professional domain. AI-enabled scaffolding and just-in-time and situated learning in the workplace offer a new frontier for future impact of AIED. The hallmark of this community’s work ha...
Preprint
Full-text available
Contributing to the literature on aptitude-treatment interactions between worked examples and problem-solving, this paper addresses differential learning from the two approaches when students are positioned as domain experts learning new concepts. Our evaluation is situated in a team project that is part of an advanced software engineering course....
Preprint
Knowledge Graph (KG) completion research usually focuses on densely connected benchmark datasets that are not representative of real KGs. We curate two KG datasets that include biomedical and encyclopedic knowledge and use an existing commonsense KG dataset to explore KG completion in the more realistic setting where dense connectivity is not guara...
Article
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important quest...
Article
This column raises the question: as we begin to emerge from COVID-19, what is the role of the field of AI in this emerging reality? We specifically consider this in the face of tremendous learning loss and widening achievement gaps. In this wake, what specifically is the role of AI in the future of education as we move forward? This question bridge...
Preprint
Teacher Moments is an open source platform that allows the authoring of simulations used for education which we recently revised to integrate intelligent coaching agents. The initial simulation development for Teacher Moments focused on teacher education, but the platform is actively used for professional development with nurses, psychologists, pol...
Preprint
Full-text available
Despite achieving state-of-the-art accuracy on temporal ordering of events, neural models showcase significant gaps in performance. Our work seeks to fill one of these gaps by leveraging an under-explored dimension of textual semantics: rich semantic information provided by explicit textual time cues. We develop STAGE, a system that consists of a n...
Preprint
Full-text available
Recent work on entity coreference resolution (CR) follows current trends in Deep Learning applied to embeddings and relatively simple task-related features. SOTA models do not make use of hierarchical representations of discourse structure. In this work, we leverage automatically constructed discourse parse trees within a neural approach and demons...
Preprint
Full-text available
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important quest...
Article
Full-text available
Practice is essential for learning. However, for many interpersonal skills, there often are not enough opportunities and venues for novices to repeatedly practice. Role-playing simulations offer a promising framework to advance practice-based professional training for complex communication skills, in fields such as teaching. In this work, we...
Article
The learning analytics dashboard (LAD) has been recognised as a useful tool to facilitate teachers’ diagnosis and intervention in collaborative learning. However, little empirical evidence has been reported concerning the effects of LAD on teachers in computer-supported collaborative learning (CSCL). The purpose of this study was to evaluate the ef...
Preprint
Full-text available
Modelling persuasion strategies as predictors of task outcome has several real-world applications and has received considerable attention from the computational linguistics community. However, previous research has failed to account for the resisting strategies employed by an individual to foil such persuasion attempts. Grounded in prior literature...
Chapter
Full-text available
In this chapter we provide a survey of language quantification practices in CSCL. We begin by defining quantification of language and providing an overview of the different purposes it serves. We situate language quantification within the spectrum of more to less quantitative research designs to help the reader understand that both quantitative and...
Chapter
Full-text available
The CSCL community has traditionally focused on collaborative learning in small groups or communities. Given the rise of mass collaboration and learning at scale, the community is facing unprecedented opportunity to expand its views to advance collaborative learning at scale. In this chapter, we first explicate the history and development of collab...
Article
Objectives: Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambigu...
Preprint
Full-text available
Coreference resolution (CR) is an essential part of discourse analysis. Most recently, neural approaches have been proposed to improve over SOTA models from earlier paradigms. So far none of the published neural models leverage external semantic knowledge such as type information. This paper offers the first such model and evaluation, demonstrating...
Preprint
Full-text available
Information extraction from conversational data is particularly challenging because the task-centric nature of conversation allows for effective communication of implicit information by humans, but is challenging for machines. The challenges may differ between utterances depending on the role of the speaker within the conversation, especially when...
Preprint
The notion of "face" refers to the public self-image of an individual that emerges both from the individual's own actions and from the interaction with others. Modeling face and understanding its state changes throughout a conversation is critical to the study of maintenance of basic human needs in and through interaction. Grounded in t...
Preprint
Full-text available
We tackle the task of adapting event extractors to new domains without labeled data, by aligning the marginal distributions of source and target domains. As a testbed, we create two new event extraction datasets using English texts from two medical domains: (i) clinical notes, and (ii) doctor-patient conversations. We test the efficacy of three mar...
Chapter
Dynamic conversational agent-based support for collaborative learning has shown significant positive effects on learning over no-support or static-support control conditions in prior studies. In order to understand the boundary between human-led and AI-led support for collaboration, we compare in this study an approach where the agent’s primary rol...
Article
Purpose: In response to the evolving COVID-19 pandemic, many universities have transitioned to online instruction. With learning promising to be online, at least in part, for the near future, instructors may be thinking of providing online collaborative learning opportunities to their students who are increasingly isolated from their peers because o...
Preprint
Full-text available
We tackle the task of building supervised event trigger identification models which can generalize better across domains. Our work leverages the adversarial domain adaptation (ADA) framework to introduce domain-invariance. ADA uses adversarial training to construct representations that are predictive for trigger identification, but not predictive o...
Preprint
Full-text available
Medical entity linking is the task of identifying and standardizing concepts referred to in a scientific article or clinical record. Existing methods adopt a two-step approach of detecting mentions and identifying a list of candidate concepts for them. In this paper, we probe the impact of incorporating an entity disambiguation step in existing entity...
Conference Paper
Full-text available
Conversational user interfaces open up new opportunities for reflection guidance. This paper presents a computer-mediated dialogue structure for reflecting on learning tasks, Rebo Junior, and its evaluation in the context of apprenticeship training. We answer three research questions. Firstly, how apprentices react to Rebo Junior; secondly, whether...
Preprint
Full-text available
Compared to other helping professions, teacher training typically lacks sufficient opportunities for novices to practice new skills. When teachers learn, they listen to people talk about teaching, or talk about teaching themselves, but they very rarely do the work of teaching. Games and simulations offer a promising framework to advance practice-ba...
Article
Full-text available
Using data to understand learning and improve education has great promise. However, the promise will not be achieved simply by AI and Machine Learning researchers developing innovative models that more accurately predict labeled data. As AI advances, modeling techniques and the models they produce are getting increasingly complex, often involving t...
Article
Since its launch in 1977, Computers & Chemical Engineering has published numerous papers on the application of computing technology to chemical engineering problems. In this paper, we present a topic analysis of the journal using various text mining techniques. In particular, we examine the dramatic growth of the journal’s topic coverage since the...
Article
Full-text available
Background. The decision to receive a permanent left ventricular assist device (LVAD) to treat end-stage heart failure (HF) involves understanding and weighing the risks and benefits of a highly invasive treatment. The goal of this study was to characterize end-stage HF patients across parameters that may affect their decision making and to inform...
Conference Paper
In schools and colleges around the world, open-ended homework assignments are commonly used. However, such assignments require substantial instructor effort for grading, and tend not to support opportunities for repeated practice. We propose UpGrade, a novel learnersourcing approach that generates scalable learning opportunities using prior studen...
Chapter
This paper reports on work adapting an industry standard team practice referred to as Mob Programming into a paradigm called Online Mob Programming (OMP) for the purpose of encouraging teams to reflect on concepts and share work in the midst of their project experience. We present a study situated within a series of three course projects in a large...
Preprint
This paper addresses a key challenge in Educational Data Mining, namely to model student behavioral trajectories in order to provide a means for identifying students most at-risk, with the goal of providing supportive interventions. While many forms of data including clickstream data or data from sensors have been used extensively in time series mo...
Article
Full-text available
The patient decision to receive a left ventricular assist device (LVAD) to address end stage heart failure is an area of much research activity, with two paper-based decision support tools recently validated. In the development of a new, interactive support application for these patients, we sought to record issues experienced by patients and careg...
Conference Paper
Analysis of student writing, both for assessment and for enabling feedback, has been of interest to the field of learning analytics. While much progress can be made through detection of local cues in writing, structured prediction approaches offer capabilities that are particularly well tailored to the needs of models aiming to offer substantive fe...
Preprint
Full-text available
Quantitative reasoning is an important component of reasoning that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new dataset to evaluate the ability of models to reason with quantities in textual entailment (incl...