Kristy Elizabeth Boyer’s research while affiliated with University of Florida and other places


Publications (192)


Dialogue Act Taxonomy Used in this Analysis
Evaluation of Classifier Performance for TF-IDF and Sentence-BERT Datasets at Group Level
Predicting and Analyzing Students’ Higher-Order Questions in Collaborative Problem-Solving
  • Conference Paper

November 2024 · 18 Reads · Toni V Earle-Randell · [...]

Question-asking is a crucial learning and teaching approach. It reveals different levels of students' understanding, application, and potential misconceptions. Previous studies have categorized question types into higher and lower orders, finding positive and significant associations between higher-order questions and students' critical thinking ability and their learning outcomes in different learning contexts. However, the diversity of higher-order questions, especially in collaborative learning environments, has left open the question of how they may be different from other types of dialogue that emerge from students' conversations. To address this question, our study utilized natural language processing techniques to build a model and investigate the characteristics of students' higher-order questions. We interpreted these questions using Bloom's taxonomy, and our results reveal three types of higher-order questions during collaborative problem-solving. Students often use "Why", "How" and "What If" questions to 1) understand the reason and thought process behind their partners' actions; 2) explore and analyze the project by pinpointing the problem; and 3) propose and evaluate ideas or alternative solutions. In addition, we found dialogue labeled 'Social', 'Question-other', 'Directed at Agent', and 'Confusion/Help Seeking' shows similar underlying patterns to higher-order questions. Our findings provide insight into the different scenarios driving students' higher-order questions and inform the design of adaptive systems to deliver personalized feedback based on students' questions.


Figure 2: A screenshot of the original HoloOrbits environment (Rajarathinam, Palaguachi, and Kang 2024) with the keypoints annotated.
Figure 3: The Learner Model Edit Graph used in our experiments to evaluate LLM robustness across five distinct edit operations to the learner model. Each node represents a "snapshot" of the learner model after specific edits by the developer. Inside each node, the MDHyps comprising the learner model snapshot are listed. Green nodes indicate calibrated snapshots, while yellow nodes represent states untested for calibration. Each MDHyp in the learner model is annotated with a superscript: '?' for untested calibration status and '*' for confirmed calibration. (1) Ex-Situ Transfer: Tests if an MDHyp that is calibrated alongside other MDHyps remains calibrated when tested alone. (2) Combine Hypotheses: Assesses if two separately calibrated hypotheses remain stable when combined. (3) Variable Swap: Involves swapping a single variable within a hypothesis. (4) LC Swap: Evaluates if a prompt template calibrated for one learner characteristic works for another in the same class. (5) Calibration Regression: Tests if a calibrated hypothesis remains stable when a new hypothesis is added to the model.
Figure 4: This figure depicts the hierarchical composition of the learner simulation prompt template, Î_sim, which integrates global fragments (Î_global), environment descriptions (Î_environment), and learner persona values (Î_learner) to provide contextual grounding. The template also includes Learner Characteristic (LC) Models, Î_LC(M), which are parameterized to simulate responses under different hypotheses, H_{i,j}, evaluated within individual LC models (M_1, M_2). These components collectively facilitate the generation of contextually appropriate actions in the simulation, reflecting the interplay between the environment and the learner's characteristics.
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments

October 2024 · 44 Reads

Simulating learner actions helps stress-test open-ended interactive learning environments and prototype new adaptations before deployment. While recent studies show the promise of using large language models (LLMs) for simulating human behavior, such approaches have not gone beyond rudimentary proof-of-concept stages due to key limitations. First, LLMs are highly sensitive to minor prompt variations, raising doubts about their ability to generalize to new scenarios without extensive prompt engineering. Moreover, apparently successful outcomes can often be unreliable, either because domain experts unintentionally guide LLMs to produce expected results, leading to self-fulfilling prophecies; or because the LLM has encountered highly similar scenarios in its training data, meaning that models may not be simulating behavior so much as regurgitating memorized content. To address these challenges, we propose Hyp-Mix, a simulation authoring framework that allows experts to develop and evaluate simulations by combining testable hypotheses about learner behavior. Testing this framework in a physics learning environment, we found that GPT-4 Turbo maintains calibrated behavior even as the underlying learner model changes, providing the first evidence that LLMs can be used to simulate realistic behaviors in open-ended interactive learning environments, a necessary prerequisite for useful LLM behavioral simulation.


Figure 1: Two settings are illustrated for IURs: restaurant-reservation and home-automation.
Figure 2: The five-stage IUR generation pipeline.
Figure 5: We report the qualities of the IURs generated using smaller, open-source Llama 2 models of three different sizes (7B, 13B, 70B). All the evaluation results are obtained using the best-performing GPT-4 proxy evaluation model (as described in Section 5).
IndirectRequests: Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests

June 2024 · 8 Reads

Existing benchmark corpora of task-oriented dialogue are collected either using a "machines talking to machines" approach or by giving template-based goal descriptions to crowdworkers. These methods, however, often produce utterances that are markedly different from natural human conversations, in which people often convey their preferences in indirect ways, such as through small talk. We term such utterances Indirect User Requests (IURs). Understanding such utterances demands considerable world knowledge and reasoning capabilities on the listener's part. Our study introduces an LLM-based pipeline to automatically generate realistic, high-quality IURs for a given domain, with the ultimate goal of supporting research in natural language understanding (NLU) and dialogue state tracking (DST) for task-oriented dialogue systems. Our findings show that while large LLMs such as GPT-3.5 and GPT-4 generate high-quality IURs, achieving similar quality with smaller models is more challenging. We release IndirectRequests, a dataset of IURs that advances beyond the initial Schema-Guided Dialog (SGD) dataset in that it provides a challenging testbed for testing the "in the wild" performance of NLU and DST models.



An Automated, Unobtrusive, Formative Assessment of Creativity in a Computer Science and Music Remixing Learning Environment

Psychology of Aesthetics, Creativity, and the Arts

Creativity is one of the most crucial skills for success in the 21st-century workforce. Specifically, creativity is an important skill to have in science, technology, engineering, and mathematics (STEM)-related fields, and more empirical studies are needed to assess and improve creativity in STEM-related learning environments. In this study, we designed and validated an automated, unobtrusive, formative assessment of creativity in EarSketch, a computational music remixing platform where students learn to write Python or JavaScript code to create pieces of music. Using an existing data set of EarSketch projects (n = 53), we addressed two research questions: (Research Question 1) To what extent is the automated assessment of creativity that we designed in EarSketch psychometrically sound (focusing on validity and reliability), and (Research Question 2) what variables (i.e., divergent thinking, complexity, and self-report variables) predict students’ creativity in EarSketch? Our main findings show that (a) the automated assessment of creativity has reasonable convergent validity (r = .47) and discriminant validity; (b) the automated assessment of creativity has a reliability estimate of .70; and (c) divergent thinking and the students’ confidence in learning how to code significantly predicted students’ creativity scores in an external, consensual assessment of creativity by EarSketch experts. Providing learning environments that can assess and support essential skills such as creativity alongside other STEM-related skills such as programming and computational thinking holds great promise for developing the next generation of the workforce who is not merely aware of STEM concepts and principles, but is creative and innovative in pursuing STEM solutions.



Investigating Linguistic Alignment in Collaborative Dialogue: A Study of Syntactic and Lexical Patterns in Middle School Students

March 2024 · 57 Reads

Language and Speech

Linguistic alignment, the tendency of speakers to share common linguistic features during conversations, has emerged as a key area of research in computer-supported collaborative learning. While previous studies have shown that linguistic alignment can have a significant impact on collaborative outcomes, there is limited research exploring its role in K–12 learning contexts. This study investigates syntactic and lexical linguistic alignments in a collaborative computer science–learning corpus from 24 pairs (48 individuals) of middle school students (aged 11–13). The results show stronger effects of self-alignment than partner alignment on both syntactic and lexical levels, with students often diverging from their partners on task-relevant words. Furthermore, student self-alignment on the syntactic level is negatively correlated with partner satisfaction ratings, while self-alignment on the lexical level is positively correlated with their partner’s satisfaction.





Citations (75)


... Although the diverse roles of AI in creating mathematical writing and its benefits in the K-12 learning context have been explored, designing and implementing effective AI-assisted creative mathematics writing solutions face challenges. These challenges include technology (un)acceptance of the stakeholders (Song, Weisberg et al., 2024), overreliance on AI-generated content (Kim, Yu et al., 2024; Chan & Hu, 2023), reduced opportunities for social learning and peer/teacher feedback (Zimmerman et al., 2024; Guilherme, 2019), and exposure to inappropriate content (Kim, Yu et al., 2024). To better address such limitations and maximize the pedagogical benefits of creative mathematical writing with GenAI, it is important to account for both students' and teachers' perspectives of the lived reality of the classroom and their needs on the ground to create more constructive and meaningful interactions among students, teachers, and the AI system (Heilporn et al., 2021). ...

Reference:

Elementary school students’ and teachers’ perceptions toward creative mathematical writing with Generative AI
A framework for inclusive AI learning design for diverse learners

Computers and Education Artificial Intelligence

... The continuous development of innovative techniques has driven significant growth and progress in the field of AI-based NLP, with a particularly relevant impact on secondary education. As the field of NLP evolves, its role becomes increasingly important in a variety of applications, shaping the future of human-computer interaction and influencing how we process and understand language [Katuka et al. 2024]. ...

Integrating Natural Language Processing in Middle School Science Classrooms: An Experience Report
  • Citing Conference Paper
  • March 2024

... Collaborative learning involves two or more learners working together on a shared learning goal through information sharing and negotiation (Dillenbourg, 1999; Roschelle & Teasley, 1995). As a form of collaborative learning, pair programming has been particularly effective in K-12 Computer Science (CS) Education and has been demonstrated to positively impact problem-solving skills and CS knowledge (Wei et al., 2021). There is a growing body of knowledge on the use of collaborative programming in K-12 classrooms (Earle-Randell et al., 2024; Zhong et al., 2016), but understanding at a more granular level the collaborative behaviors that emerge during pair programming activities, and how we can support learners during these tasks, is still an open question. Utilizing natural language processing has been a successful strategy for researchers to model the collaborative discourse between learners (Earle-Randell, 2023), but looking deeply into the higher-order questions that students ask during these collaborative problem-solving tasks could provide valuable insight into the behaviors that drive collaboration between K-12 learners. ...

The impact of near-peer virtual agents on computer science attitudes and collaborative dialogue
  • Citing Article
  • March 2024

International Journal of Child-Computer Interaction

... The researcher trained all the teachers who agreed to participate in this study. The Cellular agent-based programming environment (Meyer et al., 2012), an extension of block-based programming, Snap! (Garcia et al., 2015), was used as a platform on which students built their computational models (Houchins et al., 2021; Lytle et al., 2019). In the activity, students learned the concept of energy transfer within a simple food web model. ...

How Use-Modify-Create Brings Middle Grades Students To Computational Thinking

International Journal of Designs for Learning

... We took these suggestions and proceeded to develop the first version of the NLP4Science platform (see Figure 1). The platform provides visualizations of analyzed data, including pie charts, histograms, word clouds, bar charts, and tables to display information about the sentiment and keywords of the text [9]. ...

NLP4Science: Designing a Platform for Integrating Natural Language Processing in Middle School Science Classrooms
  • Citing Conference Paper
  • October 2023

... In this paper, we utilized a dataset that was collected as part of a larger project to investigate collaborative CS learning with virtual agents for upper elementary school children and has previously been published by Earle-Randell et al. (2023; and Ma et al. (2023). The dataset we analyzed consists of video and audio recordings of 44 fourth-grade learners in an elementary school in the southeastern United States who provided assent and parental consent. ...

How Noisy is Too Noisy? The Impact of Data Noise on Multimodal Recognition of Confusion and Conflict During Collaborative Learning
  • Citing Conference Paper
  • October 2023

... Researchers and practitioners design and develop new technological advancements for cyclists by augmenting helmets, bicycles, and the environment around them [4, 8, 10, 15-18, 20, 32-35]. Recent works aimed to measure and evaluate subjective experiences [1,3,5,6,9,11,13,19,20,23,27], address safe and potentially realistic bicycle simulators [7,14,22,25,26,37], and even propose self-driving bikes [24,39]. Given the increasing popularity of bicycle research in the HCI community reflected in a high number of previous workshops and cycling events [2,21,30,31,36,38], we believe there is a lot we can learn from these emerging research projects for interaction design regarding cycling for researchers, designers, and practitioners. ...

Exploring Real-Time Collaborative Heart Rate Displays for Cycling Partners
  • Citing Conference Paper
  • September 2023

... LLMs are also transforming the landscape for authoring educational agents such as PAs, intelligent tutors (Sottilare et al., 2015), and even simulated learners (Käser and Alexandron, 2023). Before the widespread adoption of modern LLMs, agent authoring was bottlenecked by supervised and reinforcement learning methods that required machine learning expertise (Mannekote et al., 2023;Liu and Chilton, 2022), lots of data, labor-intensive manual annotation, or some combination of these factors. In contrast, the recent development of instruction-tuned LLMs (Wang et al., 2023) enables educational experts to define agent behaviors using natural language instructions in "zero-shot" or "few-shot" setups (i.e., using no annotated examples or only a few, respectively). ...

Exploring Usability Issues in Instruction-Based and Schema-Based Authoring of Task-Oriented Dialogue Agents
  • Citing Conference Paper
  • July 2023

... Our current sample size does not allow us to estimate tutor-specific models of SRL reliably to investigate the effects of these differences in scaffolding on learning, which should be an area of future work. predicted in text classification tasks, for example, by using BERT, which has been fruitfully used for similar prediction tasks in learning analytics (e.g., [21]), or recent large language models. Future work could also investigate if the SRL codes used in this study can be reliably predicted from automated transcriptions of think-aloud, which we used to automate the process of transcribing think-aloud for human labeling, which was still labor-intensive. ...

Enhancing Engagement Modeling in Game-Based Learning Environments with Student-Agent Discourse Analysis

Communications in Computer and Information Science