
John Stamper
Carnegie Mellon University | CMU · Human-Computer Interaction Institute
Doctor of Philosophy
106 Publications
18,149 Reads
2,097 Citations
Publications (106)
The Doer Effect states that completing more active learning activities, like practice questions, is more strongly related to positive learning outcomes than passive learning activities, like reading, watching, or listening to course materials. Although broad, most evidence has emerged from practice with tutoring systems in Western, Industrialized,...
Interest in K-12 AI Literacy education has surged in the past year, yet large-scale learning data remains scarce despite considerable efforts in developing learning materials and running summer programs. To make larger-scale datasets available and enable more replicable findings, we developed an intelligent online learning platform featuring AI Lite...
Exposing students to low-quality assessments such as multiple-choice questions (MCQs) and short answer questions (SAQs) is detrimental to their learning, making it essential to accurately evaluate these assessments. Existing evaluation methods are often challenging to scale and fail to consider their pedagogical value within course materials. Onlin...
Evaluating multiple-choice questions (MCQs) involves either labor intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing...
Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language...
Knowledge Components (KCs) linked to assessments enhance the measurement of student learning, enrich analytics, and facilitate adaptivity. However, generating and linking KCs to assessment items requires significant effort and domain-specific knowledge. To streamline this process for higher-education courses, we employed GPT-4 to generate KCs for m...
In response to the growing need for frequent, high-quality assessments in the expanding field of online learning and the significant time burden their manual creation places on educators, this study proposes Focal, an end-to-end assessment generation pipeline. Focal employs large language models, notably Text-to-Text Transfer Transformers, fine-tra...
Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom usage. Existing methods for evaluating multiple-choice questions often focus on machine readability metri...
Clinical decision support systems have been increasingly utilized in the healthcare industry to improve patient outcomes and enhance clinical decision-making, taking advantage of the growing digital medical data. Despite their potential, there are still obstacles in an extensive adoption of these systems, such as low usability and human factors. In...
Introduction: Emergency department visits have increased substantially, leading to a significant rise in waiting time for patients. Several kiosk-based solutions have been introduced to reduce waiting times in healthcare facilities and to increase efficacy and user satisfaction. Purpose of the Study: This systematic review aims to identify the most...
We propose the first annual workshop on Empowering Education with LLMs - the Next-Gen Interface and Content Generation. This full-day workshop explores ample opportunities in leveraging humans, AI, and learning analytics to generate content, particularly appealing to instructors, researchers, learning engineers, and many other roles. The process of...
While self-explanation prompts have been shown to promote robust learning in several knowledge domains, there is less research on how different self-explanation formats benefit each skill set in a given domain. To address this gap, our work investigates 214 students’ problem-solving performance in a learning game for decimal numbers as they perform...
Engaging students in creating novel content, also referred to as learnersourcing, is increasingly recognised as an effective approach to promoting higher-order learning, deeply engaging students with course material and developing large repositories of content suitable for personalized learning. Despite these benefits, some common concerns and crit...
Generating short answer questions is a popular form of learnersourcing with benefits for both the students’ higher-order thinking and the instructors’ collection of assessment items. However, assessing the quality of the student-generated questions can involve significant efforts from instructors and domain experts. In this work, we investigate the...
Students learn more from doing activities and practicing their skills on assessments, yet it can be challenging and time consuming to generate such practice opportunities. In our work, we examine how advances in natural language processing and question generation may help address this issue. In particular, we present a pipeline for generating and e...
In this paper we show how we can utilize human-guided machine learning techniques coupled with a learning science practitioner interface (DataShop) to identify potential improvements to existing educational technology. Specifically, we provide an interface for the classification of underlying Knowledge Components (KCs) to better model student learn...
A key issue in mathematics education is supporting students in developing general problem-solving skills that can be applied to novel, non-routine situations. However, typical mathematics instruction in the U.S. too often is dominated by rote learning, without exposing students to the underlying reasoning or alternate ways to solve problems. As a f...
Associating assessment items with hypothesized knowledge components (KCs) enables us to gain fine-grained data on students’ performance within an ed-tech system. However, creating this association is a time consuming process and requires substantial instructor effort. In this study, we present the results of crowdsourcing valuable insights into the...
When students are given agency in playing and learning from a digital learning game, how do their decisions about sequence of gameplay impact learning and enjoyment? We explored this question in the context of Decimal Point, a math learning game that teaches decimals to middle-school students. Our analysis is based on students in a high-agency cond...
In this research, we explore how expertise is shown in both humans and AI agents. Human experts follow sets of strategies to complete domain-specific tasks, while AI agents follow a policy. We compare machine-generated policies to human strategies in two game domains; using these examples, we show how human strategies can be seen in agents. We believ...
Determining the impact of belief bias on everyday reasoning is critical for understanding how our beliefs can influence how we judge arguments. We examined the impact of belief bias on the user’s ability to identify logical fallacies in political arguments. We found that participants had more difficulty identifying logical fallacies in arguments th...
The Hint Factory is a method of automatic hint generation that has been used to augment hints in a number of educational systems. Although the previous implementations were done in domains with largely deterministic environments, the methods are inherently useful in stochastic environments with uncertainty. In this work, we explore the game Connect...
This demo will showcase Tigris, an online workflow tool developed as part of the LearnSphere project. LearnSphere is a community data infrastructure to support learning improvement online, and brings together a number of data repositories including DataShop (Stamper et al., 2010) and DiscourseDB (Rosé & Ferschke, 2016). Instruction is a data-rich ac...
Increasingly, student work is being conducted on computers and online, producing vast amounts of learning‐related data. The educational analytics fields have produced many insights about learning based solely on tutoring systems' automatically logged data, or “log data.” But log data leave out important contextual information about the learning exp...
The proliferation of fake news has underscored the importance of critical thinking in the civic education curriculum. Despite this recognized importance, systems designed to foster these kinds of critical thinking skills are largely absent from the educational technology space. In this work, we utilize an instructional factors analysis in conjuncti...
Temporal analyses are critical to understanding learning processes, yet understudied in education research. Data from different sources are often collected at different grain sizes, which are difficult to integrate. Making sense of data at many levels of analysis, including the most detailed levels, is highly time-consuming. In this paper, we descr...
We demonstrate that, by using a small set of hand-graded student work, we can automatically generate rubric criteria with a high degree of validity, and that a predictive model incorporating these rubric criteria is more accurate than a previously reported model. We present this method as one approach to addressing the often challenging problem of...
Systematic endeavors to take computer science (CS) and computational thinking (CT) to scale in middle and high school classrooms are underway with curricula that emphasize the enactment of authentic CT skills, especially in the context of programming in block-based programming environments. There is, therefore, a growing need to measure students’ l...
Bayesian Knowledge Tracing (BKT) has been employed successfully in intelligent learning environments to individualize curriculum sequencing and help messages. Standard BKT employs four parameters, which are estimated separately for individual knowledge components, but not for individual students. Studies have shown that individualizing the paramete...
In this age of fake news and alternative facts, the need for a citizenry capable of critical thinking has never been greater. While teaching critical thinking skills in the classroom remains an enduring challenge, research on an ill-defined domain like critical thinking in the educational technology space is even more scarce. We propose a difficult...
We demonstrate that, by using a small set of hand-graded students, we can automatically generate rubric parameters with a high degree of validity, and that a predictive model incorporating these rubric parameters is more accurate than a previously reported model. We present this method as one approach to addressing the often challenging problem of...
K-12 classrooms use block-based programming environments (BBPEs) for teaching computer science and computational thinking (CT). To support assessment of student learning in BBPEs, we propose a learning analytics framework that combines hypothesis- and data-driven approaches to discern students' programming strategies from BBPE log data. We use a pr...
Learning Analytics courses and degree programs, both on- and offline, have begun to proliferate over the last three years. As a result of this growth in interest from students, university administrators, researchers, and instructors, we believe it is a good time to review how these educational efforts are impacting the field, how synergy between instruc...
This workshop will explore community based repositories for educational data and analytic tools that are used to connect researchers and reduce the barriers to data sharing. Leading innovators in the field, as well as attendees, will identify and report on bottlenecks that remain toward our goal of a unified repository. We will discuss these as wel...
Many introductory programming environments generate a large amount of log data, but making insights from these data accessible to instructors remains a challenge. This research demonstrates that student outcomes can be accurately predicted from student program states at various time points throughout the course, and integrates the resulting predict...
In the spring of 2010, the Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data-mining (KDD) selected a dataset from an educational technology for its annual competition. The competition, titled “Educational Data Mining Challenge”, tasked participants with predicting the correctness of student answers to...
This study examines how accurately individual student differences in learning can be predicted from prior student learning activities. Bayesian Knowledge Tracing (BKT) predicts learner performance well and has often been employed to implement cognitive mastery. Standard BKT individualizes parameter estimates for knowledge components, but not for le...
Past studies have shown that Bayesian Knowledge Tracing (BKT) can predict student performance and implement Cognitive Mastery successfully. Standard BKT individualizes parameter estimates for skills, also referred to as knowledge components (KCs), but not for students. Studies deriving individual student parameters from the data logs of student tut...
How do learners in middle and high school enact computational thinking (CT) practices as they build computational artifacts in open-ended programming environments? What configurations and patterns of student behavior in open-ended programming environments provide evidence of their learning of CT process and practices? How can we design personalized...
The amount of data available to build simulation models of schools is immense, but using these data effectively is difficult. Traditional methods of computer modeling of educational systems often lack transparency in their implementation, are complex, and do not natively simulate non-linear systems. In response, we advocate a Complex A...
We present a new ITS system called SCALE (Student Centered Adaptive Learning Engine), which is focused on improving learning outcomes by using data collected from existing and emerging educational technology systems combined with machine learning techniques to automatically generate adaptive capabilities. This allows for the creation of intelligent...
Editor's Introduction: Advanced educational technologies are developing rapidly and online MOOC courses are becoming more prevalent, creating an enthusiasm for the seemingly limitless data-driven possibilities to affect advances in learning and enhance the learning experience. For these possibilities to unfold, the expertise and collaboration of man...
Increasing widespread use of educational technologies is producing vast amounts of data. Such data can be used to help advance our understanding of student learning and enable more intelligent, interactive, engaging, and effective education. In this article, we discuss the status and prospects of this new and powerful opportunity for data-driven dev...
Deep analysis of domain content yields novel insights and can be used to produce better courses. Aspects of such analysis can be performed by applying AI and statistical algorithms to student data collected from educational technology and better cognitive models can be discovered and empirically validated in terms of more accurate predictions of st...
We examine a large dataset collected by the Marmoset system in a CS2 course. The dataset gives us a richly detailed portrait of student behavior because it combines automatically collected program snapshots with unit tests that can evaluate the correctness of all snapshots. We find that students who start earlier tend to earn better scores, which i...
Automatically-tested online programming exercises can be useful in introductory programming courses as self-tests to accompany readings, for in-class assessment, for skills development, and to provide additional practice for students who need it. CloudCoder (http://cloudcoder.org) is an effort to build a community based on an open-source programmin...
We are starting to integrate Carnegie Learning's Cognitive Tutor (CT) into the Army Research Laboratory's Generalized Intelligent Framework for Tutoring (GIFT), with the aim of extending the tutoring systems to understand the impact of integrating non-cognitive factors into our tutoring. As part of this integration, we focus on ways in which non-co...
We describe a new technique to represent, classify, and use programs written by novices as a base for automatic hint generation for programming tutors. The proposed linkage graph representation is used to record and reuse student work as a domain model, and we use an overlay comparison to compare in-progress work with complete solutions in a twist...
Using the online educational game Battleship Numberline, we have collected over 8 million number line estimates from hundreds of thousands of players. Using random assignment, we evaluate the effects of various adaptive sequencing algorithms on player engagement and learning.
Time pressure helps students practice efficient strategies. We report strong effects from using games to promote fluency in mathematics.
Traditional experimental paradigms have focused on executing experiments in a lab setting and eventually moving successful findings to larger experiments in the field. However, data from field experiments can also be used to inform new lab experiments. Now, with the advent of large student populations using internet-based learning software, online...
This panel is proposed as a means of promoting mutual learning and continued dialogue between the Educational Data Mining and Learning Analytics communities. EDM has been developing as a community for longer than the LAK conference, so what, if anything, makes the LAK community different, and where is the common ground?
Student modeling plays a critical role in developing and improving instruction and instructional technologies. We present a technique for automated improvement of student models that leverages the DataShop repository, crowd sourcing, and a version of the Learning Factors Analysis algorithm. We demonstrate this method on eleven educational technolog...
An ideal scenario for educational research is to perform an experiment, report and publish results, make the results and data available for verification, and finally allow the data to be used in follow up experiments or for secondary analyses. Unfortunately, this scenario often fails after the results are published. Researchers move on to new data...
In our prior work we showed it was feasible to augment a logic tutor with a data-driven Hint Factory that uses data to automatically generate context-specific hints for an existing computer aided instructional tool. Here we investigate the impact of automatically generated hints on educational outcomes in a robust experiment that shows that hints h...
We show how data visualization and modeling tools can be used with human input to improve student models. We present strategies for discovering potential flaws in existing student models and use them to identify improvements in a Geometry model. A key discovery was that the student model should distinguish problem steps requiring problem decomposit...
The Hint Factory is an implementation of our novel method to automatically generate hints using past student data for a logic tutor. One disadvantage of the Hint Factory is the time needed to gather enough data on new problems in order to provide hints. In this paper we describe the use of expert sample solutions to “seed” the hint generation proce...
The Pittsburgh Science of Learning Center’s DataShop is an open data repository and set of associated visualization and analysis tools. DataShop has data from thousands of students deriving from interactions with on-line course materials and intelligent tutoring systems. The data is fine-grained, with student actions recorded roughly every 20 secon...
One function of a student model in tutoring systems is to select future tasks that will best meet student needs. If the inference procedure that updates the model is inaccurate, the system may select non-optimal tasks for enhancing students' learning. Poor selection may arise when the model assumes multiple knowledge components are required for a s...
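Several abstracts in this list refer to Bayesian Knowledge Tracing (BKT) and its four per-knowledge-component parameters. For readers unfamiliar with the model, the following is a minimal sketch of the textbook BKT update, not code from any of the listed papers. The class name `BKTParams`, the function names, and the example parameter values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BKTParams:
    """The four standard BKT parameters for one knowledge component."""
    p_init: float   # P(L0): probability the skill is known before practice
    p_learn: float  # P(T): probability of learning at each opportunity
    p_guess: float  # P(G): probability of a correct answer when not known
    p_slip: float   # P(S): probability of an incorrect answer when known


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next response is correct."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian update on one observed response, then the learning step."""
    if correct:
        known_and_obs = p_known * (1 - params.p_slip)
        unknown_and_obs = (1 - p_known) * params.p_guess
    else:
        known_and_obs = p_known * params.p_slip
        unknown_and_obs = (1 - p_known) * (1 - params.p_guess)
    total = known_and_obs + unknown_and_obs
    # Guard against a zero denominator for degenerate parameter values.
    posterior = known_and_obs / total if total > 0 else p_known
    # The student may also learn the skill at this opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: List[bool], params: BKTParams) -> List[float]:
    """Mastery estimate after each response in a practice sequence."""
    p_known = params.p_init
    history = []
    for correct in responses:
        p_known = update(p_known, correct, params)
        history.append(p_known)
    return history


if __name__ == "__main__":
    # Illustrative (made-up) parameter values.
    example = BKTParams(p_init=0.2, p_learn=0.1, p_guess=0.2, p_slip=0.1)
    print(trace([True, True, False, True], example))
```

In this sketch the parameters are shared by all students for a given knowledge component, which matches the "standard BKT" described in the abstracts; individualizing them per student would mean fitting a separate `BKTParams` (or a per-student adjustment) from each student's data.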